|
|
Description
Use TaskScheduler instead of WorkerPool in multi_threaded_cert_verifier.cc.
The following traits are used:
Priority: Inherited (default)
The priority is inherited from the calling context (i.e. TaskTraits
are initialized with the priority of the current task).
Shutdown behavior: CONTINUE_ON_SHUTDOWN
Tasks posted with this mode which have not started executing before
shutdown is initiated will never run. Tasks with this mode running at
shutdown will be ignored (the worker will not be joined).
Note: Tasks that were previously posted to base::WorkerPool should
use this shutdown behavior because this is how base::WorkerPool
handles all its tasks.
May Block:
Tasks posted with MayBlock() may block. This includes but is not
limited to tasks that wait on synchronous file I/O operations:
read or write a file from disk, interact with a pipe or a socket,
rename or delete a file, enumerate files in a directory, etc. This
trait isn't required for the mere use of locks.
BUG=659191
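For reference, a minimal sketch of what a post with these traits might look like (a sketch only, using the base/task_scheduler/post_task.h API of this era; the function names below are illustrative, not the actual code in multi_threaded_cert_verifier.cc):

#include "base/bind.h"
#include "base/location.h"
#include "base/task_scheduler/post_task.h"

// Hypothetical stand-in for the real verification work.
void DoVerifyOnWorkerThread() {}

void PostVerification() {
  // Priority is inherited by default; CONTINUE_ON_SHUTDOWN matches the old
  // base::WorkerPool semantics; MayBlock() because verification can call
  // blocking OS certificate APIs.
  base::PostTaskWithTraits(
      FROM_HERE,
      base::TaskTraits().MayBlock().WithShutdownBehavior(
          base::TaskShutdownBehavior::CONTINUE_ON_SHUTDOWN),
      base::Bind(&DoVerifyOnWorkerThread));
}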
Patch Set 1
Patch Set 2: add ScopedTaskSchedulers
Patch Set 3: add ScopedTaskSchedulers
Patch Set 4: Fix Linux tests.
Patch Set 5: Remove ScopedTaskScheduler from TokenBindingURLRequestTest
Patch Set 6: self-review
Total comments: 3
Messages
Total messages: 37 (28 generated)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
fdoray@chromium.org changed reviewers: + eroman@chromium.org
PTAL
eroman@chromium.org changed reviewers: + rsleevi@chromium.org
+rsleevi

https://codereview.chromium.org/2602243002/diff/100001/net/quic/chromium/quic...
File net/quic/chromium/quic_stream_factory_test.cc (right):

https://codereview.chromium.org/2602243002/diff/100001/net/quic/chromium/quic...
net/quic/chromium/quic_stream_factory_test.cc:4976: base::test::ScopedTaskScheduler scoped_task_scheduler(
What is the recipe for including ScopedTaskScheduler throughout this changelist? It is flimsy to have this everywhere that transitively depends on the verifier code. Is this something that can be added generically to, say, NetTestSuite?
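One possible shape for that, as a sketch only (assuming NetTestSuite's Initialize() override and the base/test/scoped_task_scheduler.h API; the member name is made up):

#include <memory>

#include "base/test/scoped_task_scheduler.h"
#include "base/test/test_suite.h"

class NetTestSuite : public base::TestSuite {
 public:
  void Initialize() override {
    base::TestSuite::Initialize();
    // Install a TaskScheduler once for every test in the suite, instead of a
    // per-fixture base::test::ScopedTaskScheduler member.
    scoped_task_scheduler_ =
        std::make_unique<base::test::ScopedTaskScheduler>();
  }

 private:
  std::unique_ptr<base::test::ScopedTaskScheduler> scoped_task_scheduler_;
};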
rsleevi@chromium.org changed reviewers: + gab@chromium.org
CC'ing gab.

I started partially through, but I think overall this is probably something we don't want - this has many of the issues and concerns I raised to gab regarding //net and NSS usage.

I feel awful for "Not LGTM"ing this though, because I know that's quite demoralizing. Perhaps it might be worthwhile to set up some time to go through the places that you're looking at moving //net over, discuss the risks/concerns, and then make sure that TaskScheduler is appropriate.

More concretely: these calls may (and do!) synchronously block on
- The UI thread
- The IO thread
- The browser thread
- Threads created by third parties which may be blocked on other tasks in the TaskScheduler

So this overall seems super-dangerous to convert. I might be misunderstanding, though, so it seems useful to sync up; that might help avoid work that ends up not being workable, or figure out whether there are changes needed to the TaskScheduler to capture these.

https://codereview.chromium.org/2602243002/diff/100001/net/cert_net/nss_ocsp_...
File net/cert_net/nss_ocsp_unittest.cc (right):

https://codereview.chromium.org/2602243002/diff/100001/net/cert_net/nss_ocsp_...
net/cert_net/nss_ocsp_unittest.cc:128: // Required by MultiThreadedCertVerifier. Needs to be a
This seems to be an undesirable coupling. It also seems to suggest that TaskScheduler is having the dependency issues that I highlighted (re: net threads and IO threads and worker threads). Perhaps we should set up another GVC to figure out how to look at this holistically?
On 2017/02/07 22:52:09, Ryan Sleevi wrote:
> CC'ing gab.
>
> I started partially through, but I think overall this is probably something we don't want - this has many of the issues and concerns I raised to gab regarding //net and NSS usage.
>
> I feel awful for "Not LGTM"ing this though, because I know that's quite demoralizing. Perhaps it might be worthwhile to set up some time to go through the places that you're looking at moving //net over, discuss the risks/concerns, and then make sure that TaskScheduler is appropriate.
>
> More concretely: these calls may (and do!) synchronously block on
> - The UI thread
> - The IO thread
> - The browser thread
> - Threads created by third parties which may be blocked on other tasks in the TaskScheduler
>
> So this overall seems super-dangerous to convert. I might be misunderstanding, though, so it seems useful to sync up; that might help avoid work that ends up not being workable, or figure out whether there are changes needed to the TaskScheduler to capture these.

That's unfortunate. //net is the only component in the codebase whose paradigms differ so wildly :(. I understand that this is in part due to being a component that interfaces with old OS interfaces across multiple platforms, but it's sad nonetheless.

As for blocking on our own threads, this is something in our control; a better option might be to have the dependent tasks registered as observers of the blocking task and only posted by it when it completes. Having an unbounded number of unowned WorkerPool threads is undesirable.

Let's stick to not migrating //net for now, I guess (we'll soon be moving base::WorkerPool to net::WorkerPool...). Longer term, the TaskScheduler might support self-monitoring and replacing blocked threads in its pool with new ones, which I think should alleviate your concern (we're thinking of building this feature to help parallelize more file I/O, but it happens to address this too).

> https://codereview.chromium.org/2602243002/diff/100001/net/cert_net/nss_ocsp_...
> File net/cert_net/nss_ocsp_unittest.cc (right):
>
> https://codereview.chromium.org/2602243002/diff/100001/net/cert_net/nss_ocsp_...
> net/cert_net/nss_ocsp_unittest.cc:128: // Required by MultiThreadedCertVerifier. Needs to be a
> This seems to be an undesirable coupling.
>
> It also seems to suggest that TaskScheduler is having the dependency issues that I highlighted (re: net threads and IO threads and worker threads).
>
> Perhaps we should set up another GVC to figure out how to look at this holistically?

https://codereview.chromium.org/2602243002/diff/100001/net/quic/chromium/quic...
File net/quic/chromium/quic_stream_factory_test.cc (right):

https://codereview.chromium.org/2602243002/diff/100001/net/quic/chromium/quic...
net/quic/chromium/quic_stream_factory_test.cc:4976: base::test::ScopedTaskScheduler scoped_task_scheduler(
On 2017/02/07 22:45:45, eroman (slow) wrote:
> What is the recipe for including ScopedTaskScheduler throughout this changelist?
>
> It is flimsy to have this everywhere that transitively depends on the verifier code.
>
> Is this something that can be added generically to, say, NetTestSuite?

We could (although we'd need a way to ask for a ScopedAsyncTaskScheduler for the odd cases where that's required).
On 2017/02/08 16:01:42, gab wrote:
> That's unfortunate. //net is the only component in the codebase whose paradigms differ so wildly :(. I understand that this is in part due to being a component that interfaces with old OS interfaces across multiple platforms, but it's sad nonetheless.

Yes, it's not ideal, but it's unfortunately necessary. We interact with a large number of blocking (OS, third-party) APIs in ways that may be re-entrant with or conflict with calls to (OS, third-party) APIs elsewhere in the codebase. We can't and don't want to centralize every interaction with those APIs and manage the interactions ourselves, since that's an impossible task, given that the (OS, third-party) APIs explicitly don't provide guarantees about their behaviours as implemented, and those behaviours can change in an OS release.

> As for blocking on our own threads, this is something in our control; a better option might be to have the dependent tasks registered as observers of the blocking task and only posted by it when it completes. Having an unbounded number of unowned WorkerPool threads is undesirable.

I'm not sure I understand your remark about observers; I read it to suggest that we know when the activity of Task A may block on Task B, but that's effectively the same as what I was describing above (as the impossible task). However, I very well may have misunderstood.

Can you explain your concerns regarding an unbounded number of unowned worker threads? As currently implemented, these threads are reaped when idle (or were; perhaps that implementation has changed or I'm misremembering), but grow to match the task load. Many of these (OS, third-party) APIs are themselves blocking on I/O events (network or hardware), and thus their contention, from both CPU and task-switching overhead, is minimal (compared to, say, an unbounded number of audio processing threads). I can understand why it might be unappealing on pure design aspects, but I was curious whether there was a pragmatic or measured reason why it's undesirable.

> Longer term, the TaskScheduler might support self-monitoring and replacing blocked threads in its pool with new ones, which I think should alleviate your concern (we're thinking of building this feature to help parallelize more file I/O, but it happens to address this too).

Indeed, that might
On 2017/02/08 16:09:21, Ryan Sleevi wrote:
> On 2017/02/08 16:01:42, gab wrote:
> > That's unfortunate. //net is the only component in the codebase whose paradigms differ so wildly :(. I understand that this is in part due to being a component that interfaces with old OS interfaces across multiple platforms, but it's sad nonetheless.
>
> Yes, it's not ideal, but it's unfortunately necessary. We interact with a large number of blocking (OS, third-party) APIs in ways that may be re-entrant with or conflict with calls to (OS, third-party) APIs elsewhere in the codebase. We can't and don't want to centralize every interaction with those APIs and manage the interactions ourselves, since that's an impossible task, given that the (OS, third-party) APIs explicitly don't provide guarantees about their behaviours as implemented, and those behaviours can change in an OS release.
>
> > As for blocking on our own threads, this is something in our control; a better option might be to have the dependent tasks registered as observers of the blocking task and only posted by it when it completes. Having an unbounded number of unowned WorkerPool threads is undesirable.
>
> I'm not sure I understand your remark about observers; I read it to suggest that we know when the activity of Task A may block on Task B, but that's effectively the same as what I was describing above (as the impossible task). However, I very well may have misunderstood.

It seems the need for ScopedAsyncTaskScheduler in this CL was caused by a task blocking on a WaitableEvent to be signaled. Whoever signals the event could instead notify observers, which would be responsible for dispatching the work (i.e. dispatch the work when it's ready, instead of posting it ahead of time and blocking on the dependent result).

> Can you explain your concerns regarding an unbounded number of unowned worker threads? As currently implemented, these threads are reaped when idle (or were; perhaps that implementation has changed or I'm misremembering), but grow to match the task load. Many of these (OS, third-party) APIs are themselves blocking on I/O events (network or hardware), and thus their contention, from both CPU and task-switching overhead, is minimal (compared to, say, an unbounded number of audio processing threads). I can understand why it might be unappealing on pure design aspects, but I was curious whether there was a pragmatic or measured reason why it's undesirable.

Each task posted to WorkerPool may create a new thread (and will if no thread is available). It is thus possible to end up with many worker threads (and I assume this happens, given your concern that it's possible to flood all of TaskScheduler's threads). This is even more likely under high resource contention, as each thread is less likely to complete its work and each task (new thread) merely adds more contention... Piling up work for the OS while under contention is one of the core things TaskScheduler is aiming to solve (by owning the tasks and underlying threads).

I understand that in your case the work is mostly blocking, so each thread is very low overhead (and that one-thread-per-blocking-async-task is a common paradigm on POSIX). But TaskScheduler aims to unify the place where such implementation decisions are taken and implemented (e.g. one thread per blocking task on POSIX). There were previously many users of base::WorkerPool and other APIs hacking their own way at what they wanted from the system (e.g. v8 posting |num_core| computing tasks to base::WorkerPool and using those as "workers" precisely to avoid flooding with |num_tasks| threads; see the sketch below). We should decide what's best on a given platform, implement it in one place (TaskScheduler), and have every component benefit from it.

> > Longer term, the TaskScheduler might support self-monitoring and replacing blocked threads in its pool with new ones, which I think should alleviate your concern (we're thinking of building this feature to help parallelize more file I/O, but it happens to address this too).
>
> Indeed, that might
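A compressed illustration of the pattern gab describes (purely illustrative, not v8's actual code): rather than posting one WorkerPool task per work item, post |num_cores| long-lived "worker" tasks that drain a shared queue, so the thread count stays bounded.

#include <queue>

#include "base/bind.h"
#include "base/callback.h"
#include "base/location.h"
#include "base/synchronization/lock.h"
#include "base/sys_info.h"
#include "base/threading/worker_pool.h"

base::Lock g_lock;
std::queue<base::Closure> g_work;  // Guarded by |g_lock|.

// Each "worker" task pops and runs items until the queue is empty.
void DrainQueue() {
  while (true) {
    base::Closure item;
    {
      base::AutoLock auto_lock(g_lock);
      if (g_work.empty())
        return;
      item = g_work.front();
      g_work.pop();
    }
    item.Run();
  }
}

// Post only one task per core, regardless of how many items are queued.
void StartWorkers() {
  for (int i = 0; i < base::SysInfo::NumberOfProcessors(); ++i) {
    base::WorkerPool::PostTask(FROM_HERE, base::Bind(&DrainQueue),
                               true /* task_is_slow */);
  }
}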
On 2017/02/08 16:45:31, gab wrote:
> It seems the need for ScopedAsyncTaskScheduler in this CL was caused by a task blocking on a WaitableEvent to be signaled. Whoever signals the event could instead notify observers, which would be responsible for dispatching the work (i.e. dispatch the work when it's ready, instead of posting it ahead of time and blocking on the dependent result).

Ah, ok. Perhaps you may not have looked closely (hopefully that doesn't come off as negative! I do it all the time in reviews), or perhaps the code was underdocumented (if so, we can fix that! Let me know :D), but the reason the code blocks on a WaitableEvent is that we're implementing a third-party library delegate interface (that is, the 3P library calls into us when it needs us to perform some work), and the 3P library requires that we MUST complete that work synchronously.

Since the work to be completed synchronously is "perform a network request", and since we don't and can't block the IO thread for that, that's part of why the call *into* the 3P library (which will then call back into us) is done on a worker thread; in effect, the worker thread MUST block on the IO thread, because of the 3P API contract, and the IO thread MUST NOT block on the 3P API call, which is why the worker thread exists :)

> I understand that in your case the work is mostly blocking, so each thread is very low overhead (and that one-thread-per-blocking-async-task is a common paradigm on POSIX). But TaskScheduler aims to unify the place where such implementation decisions are taken and implemented (e.g. one thread per blocking task on POSIX). There were previously many users of base::WorkerPool and other APIs hacking their own way at what they wanted from the system (e.g. v8 posting |num_core| computing tasks to base::WorkerPool and using those as "workers" precisely to avoid flooding with |num_tasks| threads). We should decide what's best on a given platform, implement it in one place (TaskScheduler), and have every component benefit from it.

Right, and for some cases (e.g. calling a blocking system API), I totally agree that having a centralized place to manage those growth, scaling, and distribution decisions is not a bad outcome. It may be that some of the disconnect comes from the fact that most of the WorkerPool usage in //net (at least related to certificates or crypto) is coming from the fact that we have a sequence of calls such as

  Chromium -> 3P library blocking API call

But we also have

  3P library -> Chromium implementing a 'blocking' method as part of the 3P API

That is, NSS provides hooks for UI decisions (which must be implemented using a blocking/synchronous method), and for network fetching (which must be implemented using a blocking/synchronous method). Unlike interactions with other 3P libraries, like, say, glib, NSS does not have any notion of a message loop or task queue, so there's no way we can indicate an asynchronous completion.

It used to be more substantial, but our shift to BoringSSL has allowed us to make these interaction points async methods (at least for other platforms); that is, when BoringSSL calls into Chromium, we can say "we'll get back to you later", which allows BoringSSL's state machine to pause until we (Chromium) complete the work _and then signal_ to BoringSSL that the work has completed.

I don't know how well I communicated that pattern/paradigm in the past, but since you mentioned observers above, I suspect I didn't do so cleanly.

There are also a few interaction points we have on Windows, but luckily, none of the interfaces we're implementing would require us to block, so we're able to just synchronously complete the work. On macOS, we largely just suffer from the "Chromium -> 3P library blocking API call" pattern (meaning we might be able to use TaskScheduler there), except that the way it's implemented varies across our supported macOS releases with respect to how the library internally implements its task queue, so we can't quite generalize it safely.

Does that help explain the unfortunate sadness?
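To make the pattern above concrete, here is an illustrative sketch of the shape of such a hook (all names are hypothetical; this is not the actual nss_ocsp code): the 3P library invokes a synchronous hook on a worker thread, the fetch is posted to the IO thread, and the worker blocks on a WaitableEvent until the IO thread signals completion.

#include <string>

#include "base/bind.h"
#include "base/location.h"
#include "base/memory/ref_counted.h"
#include "base/single_thread_task_runner.h"
#include "base/synchronization/waitable_event.h"

struct FetchResult {};  // Hypothetical stand-in for the fetched data.

// Runs on the IO thread; issues the request and signals when done.
void StartFetchOnIOThread(const std::string& url,
                          FetchResult* result,
                          base::WaitableEvent* done) {
  // ... start the network request, fill |result| on completion ...
  done->Signal();
}

// Called *by* the 3P library on a worker thread. The 3P API contract
// requires a synchronous answer, so the worker blocks here; the IO thread
// itself never blocks on the 3P call.
void SynchronousFetchHook(
    scoped_refptr<base::SingleThreadTaskRunner> io_task_runner,
    const std::string& url,
    FetchResult* result) {
  base::WaitableEvent done(base::WaitableEvent::ResetPolicy::MANUAL,
                           base::WaitableEvent::InitialState::NOT_SIGNALED);
  io_task_runner->PostTask(
      FROM_HERE, base::Bind(&StartFetchOnIOThread, url, result, &done));
  done.Wait();
}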
On 2017/02/08 16:56:33, Ryan Sleevi wrote:
> Ah, ok. Perhaps you may not have looked closely (hopefully that doesn't come off as negative! I do it all the time in reviews), or perhaps the code was underdocumented (if so, we can fix that! Let me know :D), but the reason the code blocks on a WaitableEvent is that we're implementing a third-party library delegate interface (that is, the 3P library calls into us when it needs us to perform some work), and the 3P library requires that we MUST complete that work synchronously.
>
> Since the work to be completed synchronously is "perform a network request", and since we don't and can't block the IO thread for that, that's part of why the call *into* the 3P library (which will then call back into us) is done on a worker thread; in effect, the worker thread MUST block on the IO thread, because of the 3P API contract, and the IO thread MUST NOT block on the 3P API call, which is why the worker thread exists :)
>
> Right, and for some cases (e.g. calling a blocking system API), I totally agree that having a centralized place to manage those growth, scaling, and distribution decisions is not a bad outcome. It may be that some of the disconnect comes from the fact that most of the WorkerPool usage in //net (at least related to certificates or crypto) is coming from the fact that we have a sequence of calls such as
>
>   Chromium -> 3P library blocking API call
>
> But we also have
>
>   3P library -> Chromium implementing a 'blocking' method as part of the 3P API
>
> That is, NSS provides hooks for UI decisions (which must be implemented using a blocking/synchronous method), and for network fetching (which must be implemented using a blocking/synchronous method). Unlike interactions with other 3P libraries, like, say, glib, NSS does not have any notion of a message loop or task queue, so there's no way we can indicate an asynchronous completion.
>
> It used to be more substantial, but our shift to BoringSSL has allowed us to make these interaction points async methods (at least for other platforms); that is, when BoringSSL calls into Chromium, we can say "we'll get back to you later", which allows BoringSSL's state machine to pause until we (Chromium) complete the work _and then signal_ to BoringSSL that the work has completed.
>
> I don't know how well I communicated that pattern/paradigm in the past, but since you mentioned observers above, I suspect I didn't do so cleanly.
>
> There are also a few interaction points we have on Windows, but luckily, none of the interfaces we're implementing would require us to block, so we're able to just synchronously complete the work. On macOS, we largely just suffer from the "Chromium -> 3P library blocking API call" pattern (meaning we might be able to use TaskScheduler there), except that the way it's implemented varies across our supported macOS releases with respect to how the library internally implements its task queue, so we can't quite generalize it safely.
>
> Does that help explain the unfortunate sadness?

Aaaaaaah, ok, I think I finally understand your use case :) (I had indeed not read the code closely enough.)

This is very specific to //net, given that it has to deal with such third-party APIs, and isn't an issue anywhere else in our codebase. Let's move base::WorkerPool to net::WorkerPool when //net becomes its only user, and we can eventually discuss how TaskScheduler could support these use cases as well.

Thanks,
Gab