Description
Use TaskScheduler instead of WorkerPool in nss_util.cc.
The following traits are used:
Priority: Inherited (default)
The priority is inherited from the calling context (i.e. TaskTraits
are initialized with the priority of the current task).
Shutdown behavior: CONTINUE_ON_SHUTDOWN
Tasks posted with this mode which have not started executing before
shutdown is initiated will never run. Tasks with this mode running at
shutdown will be ignored (the worker will not be joined).
Note: Tasks that were previously posted to base::WorkerPool should
use this shutdown behavior because this is how base::WorkerPool
handles all its tasks.
May Block:
Tasks posted with MayBlock() may block. This includes but is not
limited to tasks that wait on synchronous file I/O operations:
read or write a file from disk, interact with a pipe or a socket,
rename or delete a file, enumerate files in a directory, etc. This
trait isn't required for the mere use of locks.
BUG=659191
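
For reference, a minimal sketch of what posting with these traits looked like under the TaskScheduler API of the time (the function and task names here are illustrative stand-ins, not the actual patch):

  // Sketch only: a post using the traits described above.
  #include "base/bind.h"
  #include "base/location.h"
  #include "base/task_scheduler/post_task.h"
  #include "base/task_scheduler/task_traits.h"

  void DoBlockingNSSInit() {
    // Synchronous, potentially blocking NSS/TPM initialization goes here.
  }

  void PostNSSInitTask() {
    base::PostTaskWithTraits(
        FROM_HERE,
        base::TaskTraits()
            .MayBlock()  // The task may wait on synchronous I/O.
            .WithShutdownBehavior(
                base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN),
        base::Bind(&DoBlockingNSSInit));
    // No priority is specified, so it is inherited from the calling context.
  }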
Patch Set 1
Total comments: 1
Messages
Total messages: 24 (9 generated)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
fdoray@chromium.org changed reviewers: + davidben@chromium.org
PTAL. WorkerPool is being deprecated in favor of TaskScheduler.
davidben@chromium.org changed reviewers: + rsleevi@chromium.org
+rsleevi is probably a better person to review this.
This is one of those changes that Gab and I had discussed as benefiting from/needing worker pool, rather than switching to TaskScheduler. I don't believe this is a change we want, but I think it may be easier to chat over GVC to discuss.
On 2017/01/03 19:03:47, Ryan Sleevi wrote:
> This is one of those changes that Gab and I had discussed as benefiting
> from/needing worker pool, rather than switching to TaskScheduler.
>
> I don't believe this is a change we want, but I think it may be easier to chat
> over GVC to discuss.

I know you talked with Gab about crypto operations which need to run on a single thread.

However, this CL just migrates NSSInitSingleton::InitializeTPMTokenOnWorkerThread from WorkerPool to TaskScheduler. Both WorkerPool and TaskScheduler are dynamic thread pools. WorkerPool can run the tasks it receives concurrently, in any order. Tasks posted to TaskScheduler with base::PostTask*() can also run concurrently, in any order. However, TaskScheduler provides more advanced features: sequences of tasks, priorities, shutdown behaviors, single-threaded TaskRunners [1]...

If you really think that NSSInitSingleton::InitializeTPMTokenOnWorkerThread benefits from something that is provided by WorkerPool but not TaskScheduler, we can schedule a GVC meeting.

[1] TaskScheduler offers the CreateSingleThreadTaskRunner() function to run multiple tasks on the same thread, but it is not used in this CL.
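
To make the comparison concrete, here is a minimal sketch of the two posting APIs being discussed (the task body is a hypothetical stand-in; both calls post a parallel, unsequenced task):

  // Sketch only: old vs. new way to post the same parallel task.
  #include "base/bind.h"
  #include "base/location.h"
  #include "base/task_scheduler/post_task.h"
  #include "base/threading/worker_pool.h"

  void DoNSSWork() {}  // Hypothetical task body.

  void PostOldAndNew() {
    // Old: base::WorkerPool, an OS-style dynamic pool. Tasks may run
    // concurrently, in any order, and all effectively CONTINUE_ON_SHUTDOWN.
    base::WorkerPool::PostTask(FROM_HERE, base::Bind(&DoNSSWork),
                               true /* task_is_slow */);

    // New: TaskScheduler. Same parallel semantics, plus explicit traits.
    base::PostTaskWithTraits(
        FROM_HERE,
        base::TaskTraits().MayBlock().WithShutdownBehavior(
            base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN),
        base::Bind(&DoNSSWork));
  }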
fdoray@chromium.org changed reviewers: + gab@chromium.org
On 2017/01/04 16:30:22, fdoray wrote:
> On 2017/01/03 19:03:47, Ryan Sleevi wrote:
> > This is one of those changes that Gab and I had discussed as benefiting
> > from/needing worker pool, rather than switching to TaskScheduler.
> >
> > I don't believe this is a change we want, but I think it may be easier to
> > chat over GVC to discuss.
>
> I know you talked with Gab about crypto operations which need to run on a
> single thread.
>
> However, this CL just migrates
> NSSInitSingleton::InitializeTPMTokenOnWorkerThread from WorkerPool to
> TaskScheduler. Both WorkerPool and TaskScheduler are dynamic thread pools.
> WorkerPool can run the tasks it receives concurrently, in any order. Tasks
> posted to TaskScheduler with base::PostTask*() can also run concurrently, in
> any order. However, TaskScheduler provides more advanced features: sequences
> of tasks, priorities, shutdown behaviors, single-threaded TaskRunners [1]...

Right, @rsleevi: this is different from all the work running on the IO thread which we discussed. The WorkerPool API is being deprecated in favor of TaskScheduler per the latter's API being a superset of the former (plus it allows Chrome to own more of its threads and as such be in a better position for scheduling). Is there anything WorkerPool gives you here that TaskScheduler doesn't?

> If you really think that NSSInitSingleton::InitializeTPMTokenOnWorkerThread
> benefits from something that is provided by WorkerPool but not TaskScheduler,
> we can schedule a GVC meeting.
>
> [1] TaskScheduler offers the CreateSingleThreadTaskRunner() function to run
> multiple tasks on the same thread, but it is not used in this CL.
On 2017/01/11 16:13:57, gab wrote:
> Right, @rsleevi: this is different from all the work running on the IO thread
> which we discussed. The WorkerPool API is being deprecated in favor of
> TaskScheduler per the latter's API being a superset of the former (plus it
> allows Chrome to own more of its threads and as such be in a better position
> for scheduling). Is there anything WorkerPool gives you here that
> TaskScheduler doesn't?

Yes, it guarantees the creation of distinct threads to run the task. That is, a task submitted to the WorkerPool will *always* start on a thread (whether reusing an existing one or starting a new one) within a turn of the pump. This ensures we don't have any systemic deadlocks due to multiple resources (which may include "surprising" resources, like the GPU tasks) being blocked on the NSS mutex.

Put differently, we don't want to 'queue' these messages. They need to start, and start independently, because they may block other, user-sensitive tasks from completing (as a consequence of the NSS global mutex).

base::WorkerPool was chosen over manually managing threads (e.g. base::Thread) because base::WorkerPool automatically handled any necessary shutdown semantics w/o joining.
On 2017/01/11 18:17:44, Ryan Sleevi wrote:
> Yes, it guarantees the creation of distinct threads to run the task. That is,
> a task submitted to the WorkerPool will *always* start on a thread (whether
> reusing an existing one or starting a new one) within a turn of the pump. This
> ensures we don't have any systemic deadlocks due to multiple resources (which
> may include "surprising" resources, like the GPU tasks) being blocked on the
> NSS mutex.
>
> Put differently, we don't want to 'queue' these messages. They need to start,
> and start independently, because they may block other, user-sensitive tasks
> from completing (as a consequence of the NSS global mutex).
>
> base::WorkerPool was chosen over manually managing threads (e.g. base::Thread)
> because base::WorkerPool automatically handled any necessary shutdown
> semantics w/o joining.

When using base::PostTask, the work is posted in parallel: it will be picked up by one of the TaskScheduler's worker threads regardless of other tasks (i.e. it's not sequenced with any other work (nor even with other tasks posted from that same callsite) and will be scheduled the second a worker picks it as its next task), so I don't see the difference. If you mean that the WorkerPool's thread count grows without bounds to accommodate more work: (1) that isn't quite true (it's capped at 256 threads on Windows), and (2) it's also not a desirable property; we'd much rather own our threads and decide when we're okay running one more task.

You get the same shutdown semantics from the TaskScheduler via the TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN trait.
On 2017/01/11 18:57:38, gab wrote: > When using base::PostTask, the work is posted in parallel, it will be picked up > by one of the TaskScheduler's worker threads regardless of other tasks (i.e. > it's not sequenced with any other work (nor even other tasks posted by from that > same callsite) and will be scheduled the second a worker picks it as its next > task) so I don't see the difference. If you mean that the WorkerPool's thread > count grows without bounds to accommodate for more work this (1) isn't quite > true (i.e. it's capped at 256 threads on Windows) and this (2) is also not a > desirable property, we'd much rather own our threads and decide when we're okay > running one more task. Right, considering that the NSS issues only manifest on Linux, I'm ignoring (1), and suggesting that (2) is a desirable property on some dimensions - such as it ensures reasoning about what threads will do what. As an explanation of the convolution and why I'm uneasy with the migration (or more aptly, I'm uncertain that this isn't going to cause all sorts of explosions 3 weeks into a stable because it involves a user's smart card that we fundamentally cannot test with - which has happened, including affecting things like 1.1 million such smartcards used by citizens in a country to do all government interaction online...): - Chromium code grabs a lock - Chromium code calls an NSS function - NSS code internally locks - NSS sees smartcard needs authentication - NSS re-enters Chromium (via function pointer to handle smart card auth) - Chromium code needs to PostTask to the UI thread to do some work - UI thread needs to PostTask to the GPU thread to do some work - [God help me if the GPU thread needs the lock here - I think we shook out all those cases] - Additional tasks may be queued on any number of threads - Eventually UI posts back to the worker (if it's running a message loop) or tickles the semaphore (if it's blocking) - Chromium code finishes, returning control to NSS code - NSS code does the needful - NSS code unlocks and returns control to Chromium code - Chromium code unlocks Basically, whenever an NSS function does a WorkerPool PostTask, it's explicitly doing so to try to ensure that a new thread will be started to complete the work. That's the desired behaviour - because it then cuts out many of the scenarios where tasks are being starved because all worker threads are waiting on events that can only be processed when more tasks are drained, but cannot complete until those tasks drain. Starvation. There's a lot that can be said wrong about this, and I don't disagree that this is not the optimal position. But in the steady state it's historically represented little cost (versus the significant cost of the alternatives), except in the places where people suddenly violate these assumptions. And when those assumptions are violated, they have to be owned by //net folks to fix. So that's why I'm pushing back here and trying to see if we can't find a solution that allows those invariants, and if we can't, that we can quantify why and what the cost would be to allow that. I'm also trying to figure out if it's something you're fundamentally opposed to, or whether it's something that can be accepted as "not ideal, but acceptable"
On 2017/01/11 20:29:42, Ryan Sleevi wrote:
> Basically, whenever an NSS function does a WorkerPool PostTask, it's
> explicitly doing so to try to ensure that a new thread will be started to
> complete the work. That's the desired behaviour - because it then cuts out
> many of the scenarios where tasks are being starved because all worker threads
> are waiting on events that can only be processed when more tasks are drained,
> but cannot complete until those tasks drain. Starvation.

AFAIK, waiting on another thread is an anti-pattern in Chromium. I understand that it might be necessary sometimes, e.g. in a reentrant call on a random thread that needs to respond synchronously but requires state from a specific thread. It should however be so infrequent that having all worker threads blocked on work which is behind it in the pool's queue is extremely unlikely (if it did get into that state, Chrome would stall regardless of NSS).

The one thing we just discussed here offline that makes this more likely is that in the current design, one can request a SingleThreadTaskRunner for thread-affine work, and that binds a single worker thread as the only thread allowed to process that queue. Today that thread can also process other work when idle, and as such could unluckily pick up a task doing NSS which could re-enter and block for a response meant to be provided by the very thread being blocked. Based on this we just decided to tweak the TaskScheduler to give SingleThreadTaskRunners their very own (unshared) thread.

In that world, a parallel task posted to the TaskScheduler is guaranteed to be processed, and we can even give it .WithPriority(TaskPriority::USER_BLOCKING) if it must be processed as fast as possible. USER_BLOCKING tasks will never starve (if they somehow did, Chrome would have major issues much beyond NSS).

> I'm also trying to figure out if it's something you're fundamentally opposed
> to, or whether it's something that can be accepted as "not ideal, but
> acceptable".

We're trying to coalesce all PostTask APIs into TaskScheduler, and as such keeping WorkerPool around for a single use case is highly undesirable. We instead want to tweak TaskScheduler to handle all use cases. With the aforementioned tweak to the scheduler, would you be okay with this change?
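
A sketch of the two TaskScheduler mechanisms mentioned above, assuming the post_task.h API of that era (the exact helper spellings varied during this period, and the task body is a hypothetical stand-in):

  // Sketch only: thread-affine work vs. a high-priority parallel task.
  #include "base/bind.h"
  #include "base/location.h"
  #include "base/single_thread_task_runner.h"
  #include "base/task_scheduler/post_task.h"
  #include "base/task_scheduler/task_traits.h"

  void DoNSSWork() {}  // Hypothetical task body.

  void SchedulerExamples() {
    // Thread-affine work: a SingleThreadTaskRunner. Per the tweak discussed
    // above, its worker becomes dedicated (unshared), removing the hazard of
    // that thread picking up an NSS task that re-enters and blocks on itself.
    scoped_refptr<base::SingleThreadTaskRunner> runner =
        base::CreateSingleThreadTaskRunnerWithTraits(
            base::TaskTraits().MayBlock());
    runner->PostTask(FROM_HERE, base::Bind(&DoNSSWork));

    // A parallel task that must be picked up as fast as possible.
    base::PostTaskWithTraits(
        FROM_HERE,
        base::TaskTraits().MayBlock().WithPriority(
            base::TaskPriority::USER_BLOCKING),
        base::Bind(&DoNSSWork));
  }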
On 2017/01/11 21:29:10, gab wrote:
> AFAIK, waiting on another thread is an anti-pattern in Chromium.

It is, but it's required and extremely common when interacting with the OS or third-party libraries. Every OS API call can be seen as potentially blocking on another thread.

> I understand that it might be necessary sometimes, e.g. in a reentrant call on
> a random thread that needs to respond synchronously but requires state from a
> specific thread. It should however be so infrequent that having all worker
> threads blocked on work which is behind it in the pool's queue is extremely
> unlikely (if it did get into that state, Chrome would stall regardless of
> NSS).

At present, it would only stall if the critical threads were blocked - UI, GPU, IPC, etc. However, it's precisely to avoid this issue that we dispatch NSS calls to worker threads which automatically grow with the work.

> The one thing we just discussed here offline that makes this more likely is
> that in the current design, one can request a SingleThreadTaskRunner for
> thread-affine work, and that binds a single worker thread as the only thread
> allowed to process that queue. Today that thread can also process other work
> when idle, and as such could unluckily pick up a task doing NSS which could
> re-enter and block for a response meant to be provided by the very thread
> being blocked.

That is explicitly something we do not want to do here, as it significantly serializes tasks in a performance-degrading way. This isn't just gut feeling - we actively experimented with this and paid 20-100ms of latency depending on device and activity. For some Google internal resources, it added 30s to startup. And while these cases were influenced by secondary decisions and unfortunate design tradeoffs by Google services, it's unquestionable that it does happen.

> We're trying to coalesce all PostTask APIs into TaskScheduler, and as such
> keeping WorkerPool around for a single use case is highly undesirable. We
> instead want to tweak TaskScheduler to handle all use cases.

While I understand your goal, I don't think the solution is workable. Worst case, I feel like the answer is "Reimplement worker pool in //net so you can deprecate what you want, even though the functionality is still needed." Understandably, I don't think either of us sees that as particularly desirable, so I'm trying to see if there are options we can find. If not, though, I do believe that would be the right solution, versus using TaskScheduler.

> With the aforementioned tweak to the scheduler, would you be okay with this
> change?

No.
On 2017/01/11 21:36:35, Ryan Sleevi wrote:
> That is explicitly something we do not want to do here, as it significantly
> serializes tasks in a performance-degrading way. This isn't just gut feeling -
> we actively experimented with this and paid 20-100ms of latency depending on
> device and activity. For some Google internal resources, it added 30s to
> startup. And while these cases were influenced by secondary decisions and
> unfortunate design tradeoffs by Google services, it's unquestionable that it
> does happen.

I'm not sure what you mean by "this" and "it" here, but I don't think we're talking about the same thing..? And as I mentioned, processing other work on SingleThreadTaskRunners while idle is something we're looking to drop.

> While I understand your goal, I don't think the solution is workable. Worst
> case, I feel like the answer is "Reimplement worker pool in //net so you can
> deprecate what you want, even though the functionality is still needed."
> Understandably, I don't think either of us sees that as particularly
> desirable, so I'm trying to see if there are options we can find. If not,
> though, I do believe that would be the right solution, versus using
> TaskScheduler.

Let's settle on exactly what the problem is, though:

Why do you absolutely need another thread? To make sure it runs, right? We could provide you a TaskTraits::MustRun() trait or something that would create another thread for that task if all workers are busy (and there are already cases where we're considering going beyond |max_threads| in the scheduler's impl). But I'm not sure we need this if we can already guarantee you that your task will run..?

Can there be enough NSS tasks to flood the pool itself such that all workers would block on the NSS lock and the one with the lock would do a reentrant call that would also block on something in the queue? If not, I don't see what can preempt your task from running? And since they're all using the same NSS lock, parallelizing them also doesn't make much sense? We could sequence all NSS tasks to make sure to avoid this problem?

Worse comes to worst: base::Thread::Options now has a |joinable| field; could this simply use its own non-joinable thread?
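
For reference, a sketch of the non-joinable-thread option raised above, assuming the base::Thread::Options |joinable| field mentioned here (names are illustrative):

  // Sketch only: a dedicated, never-joined thread for NSS work.
  #include "base/bind.h"
  #include "base/location.h"
  #include "base/threading/thread.h"

  void DoNSSWork() {}  // Hypothetical task body.

  void StartNonJoinableNSSThread() {
    // Intentionally leaked: a non-joinable thread must outlive shutdown
    // without anyone ever joining it.
    base::Thread* thread = new base::Thread("nss_dedicated");
    base::Thread::Options options;
    options.joinable = false;  // Shutdown will not block waiting for it.
    thread->StartWithOptions(options);
    thread->task_runner()->PostTask(FROM_HERE, base::Bind(&DoNSSWork));
  }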
On 2017/01/11 21:55:05, gab wrote:
> I'm not sure what you mean by "this" and "it" here, but I don't think we're
> talking about the same thing..? And as I mentioned, processing other work on
> SingleThreadTaskRunners while idle is something we're looking to drop.

it = serializing NSS tasks through a bounded set of threads.

> Let's settle on exactly what the problem is, though:
>
> Why do you absolutely need another thread? To make sure it runs, right?

I'm not sure how to parse "to make sure it runs" (and your remarks below re: guarantee expand on that).

> We could provide you a TaskTraits::MustRun() trait or something that would
> create another thread for that task if all workers are busy (and there are
> already cases where we're considering going beyond |max_threads| in the
> scheduler's impl). But I'm not sure we need this if we can already guarantee
> you that your task will run..?

As with "guarantee", I'm not sure if you're suggesting "eventually" or "without regard to other tasks or pools". A different way to put it might be that I believe we *don't* want NSS to participate in any Lucky Luke prioritization or reordering, which may explain some of the disconnect.

> Can there be enough NSS tasks to flood the pool itself such that all workers
> would block on the NSS lock and the one with the lock would do a reentrant
> call that would also block on something in the queue?

This is the thing I'm afraid of, and which has happened in the past, and which is why we moved NSS tasks like this to the WorkerPool - to ensure they're consuming a different resource from 'any' other task in Chrome (since, at least on the platforms where it matters - Linux/ChromeOS - the worker pool will grow indefinitely and thus is inherently not in contention).

> If not, I don't see what can preempt your task from running? And since they're
> all using the same NSS lock, parallelizing them also doesn't make much sense?
> We could sequence all NSS tasks to make sure to avoid this problem?

While I grossly oversimplified with my remarks about the NSS lock, it's definitely the case that there are many locks in play here, and because it also involves third-party code (the smartcard), there may be non-NSS locks involved as well (on Windows, we've seen this with respect to GDI resources).

> Worse comes to worst: base::Thread::Options now has a |joinable| field; could
> this simply use its own non-joinable thread?

Maybe. We (//net) explored that around 6 years ago, both through base::Thread and base::PlatformThread. At the time, the issue was that significant parts of //base had dependencies on the base::AtExitManager (logging & message passing were the two I'm sure of), through the use of (non-leaky) Singletons and LazyInstances. The consequence of this was that even though we explicitly weren't joining the thread (such as through forced leaks), when the base::AtExitManager shut down, it was corrupting those structures and causing shutdown crashes. Obviously, a lot has changed (and been rewritten), and these dependencies may no longer exist. Or people replaced Singletons/LazyInstances with ::Leaky variations, and base::AtExitManager can (maybe, finally) die in a fire.

But that's part of the context on why we ended up using a base::SequencedWorkerPool (in the case of NSS sockets, since removed) for the socket operations - as the socket operations were at least sequenceable on a single thread by virtue of the fact that they were already sequenced on the IO thread - and base::WorkerPool for the non-socket NSS interactions (which itself was a result of originally using the same pool for both the socket and non-socket operations, but then finding out that the socket operations were deadlocking on the non-socket operations, due to the introduction of the GPU process, which depended on the IO thread).

Again, I want to stress that I'm not trying to defend it as a good design (the ways in which I hate third-party code have no end, especially the threading implications of NSS), and if the feeling is that this is an entirely unreasonable (or otherwise significant) code cost for y'all on the scheduler side to take care of, then I want to try to find out how we can make it easier. We're already working to make it easier for //net folks to understand (largely by excising NSS), but knowing that it's a bit of a delicate system re: threading, my goal in pushing back is mostly to make sure that in the process of replacing the jet engines while we're flying, people don't start trying to replace the landing gear as well - since we'll need that if we crash... OK, terrible metaphor. Put differently, I'm trying to minimize the scope of change/behaviour difference as much as possible, so that we don't have to firedrill if it blows up, rather than continuing work on replacing it so we no longer have to worry :)
Thanks for the extra context.

The issues with base::Thread you mention are no more: a non-joinable base::Thread works, and unless your own Run() implementation uses things that might go away during shutdown it'll be fine (and if it somehow doesn't, that's a P0 I'm happy to own). So that's at least one option.

But I'm still unsure why your task must run right away. I understand the reentrancy problem, but (1) you only ever post one of these tasks at a time (as highlighted in the comment below) and (2) the reentrancy problem is only present when the task is running, not while it's in the queue, so why is it more important to execute it right away (with respect to all other Chromium tasks)?

As for "participating in Lucky Luke prioritization and reordering", I'm not sure what you mean, but running on any thread is participating in some kind of prioritization (i.e. Lucky Luke's or the kernel's), and reordering-wise you're posting in parallel to WorkerPool, so you have no ordering guarantees already..? If your task is crucial to respond to user queries then you mark it with TaskPriority::USER_BLOCKING and it will run shortly (most likely sooner than if you created a new thread, in the vast majority of cases).

https://codereview.chromium.org/2610463002/diff/1/crypto/nss_util.cc
File crypto/nss_util.cc (right):

https://codereview.chromium.org/2610463002/diff/1/crypto/nss_util.cc#newcode346
crypto/nss_util.cc:346: DCHECK(!initializing_tpm_token_);
According to |initializing_tpm_token_| I'm led to believe you only ever have at most one pending InitializeTPMTokenOnWorkerThread() task.
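
A minimal sketch of what that guard implies (the flag name and the DCHECK come from the review comment; the surrounding class is invented for illustration):

  // Illustration only: |initializing_tpm_token_| ensures at most one pending
  // InitializeTPMTokenOnWorkerThread() task at a time.
  #include "base/logging.h"

  class TPMTokenInitGuard {
   public:
    void WillPostInitTask() {
      DCHECK(!initializing_tpm_token_);  // Never more than one in flight.
      initializing_tpm_token_ = true;
    }
    void OnInitTaskDone() { initializing_tpm_token_ = false; }

   private:
    bool initializing_tpm_token_ = false;
  };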
davidben@chromium.org changed reviewers: - davidben@chromium.org
On 2017/01/12 15:44:50, gab wrote:
> But I'm still unsure why your task must run right away. I understand the
> reentrancy problem, but (1) you only ever post one of these tasks at a time
> (as highlighted in the comment below) and (2) the reentrancy problem is only
> present when the task is running, not while it's in the queue, so why is it
> more important to execute it right away (with respect to all other Chromium
> tasks)?

Ah, perhaps the disconnect is that I'm discussing any NSS-using functions being dispatched to the worker pool if they cause the issue. That is, intentionally, rather than making a giant omnibus object that tried to manage all NSS interactions, individual places that run into this issue use base::WorkerPool. That is, we treat it like an OS library that should be 'understood' (for some value of understood), as it is part of the LSB, and instead treat it at the callsites.

Put differently, the task here is not just sequenced with initializing the TPM - it's sequenced with all other NSS tasks that need to be sequenced like this.

> As for "participating in Lucky Luke prioritization and reordering", I'm not
> sure what you mean, but running on any thread is participating in some kind of
> prioritization (i.e. Lucky Luke's or the kernel's), and reordering-wise you're
> posting in parallel to WorkerPool, so you have no ordering guarantees
> already..?

It seems my attempts to simplify the explanation (since I think we're largely just not on the same page yet) may be adding to the conflict. Here again, the concept I was trying to communicate is "We need this task to run, and never be sequenced against/waiting on any of the other NSS tasks that have been posted".

As I understand Lucky Luke, independent sequences may be reordered or deprioritized, while tasks within a sequence may block on an earlier task completing. Both are behaviours we don't want here.

> If your task is crucial to respond to user queries then you mark it with
> TaskPriority::USER_BLOCKING and it will run shortly (most likely sooner than
> if you created a new thread, in the vast majority of cases).

I'm afraid I'm not communicating well, because that's both not the issue here and still has the major downsides I've been talking about.
On 2017/01/12 19:11:46, Ryan Sleevi wrote:
> Put differently, the task here is not just sequenced with initializing the
> TPM - it's sequenced with all other NSS tasks that need to be sequenced like
> this.
>
> Here again, the concept I was trying to communicate is "We need this task to
> run, and never be sequenced against/waiting on any of the other NSS tasks that
> have been posted".
>
> As I understand Lucky Luke, independent sequences may be reordered or
> deprioritized, while tasks within a sequence may block on an earlier task
> completing. Both are behaviours we don't want here.

Ok, then I guess that, as much as I think we both dislike the current design, the only solution is to move WorkerPool (POSIX only) to //net once we're done migrating all other callsites, to keep the status quo for NSS tasks.

We'll be happy to revisit this design with you later if you wish. For example, we could provide a TaskTraits entry that guarantees a new thread for a task bearing it if all worker threads are currently busy (which is seldom the case, so overall you'd avoid creating new threads in the vast majority of cases). We're not tempted to provide that just yet, though, for fear of abuse while the API is brand new.

Thanks for your time going through this complicated issue.