|
|
Created:
6 years ago by loislo Modified:
5 years, 10 months ago CC:
chromium-reviews, wfh+watch_chromium.org, dsinclair+watch_chromium.org, erikwright+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionDevTools: Parallelize trace messages serialization.
Move serialization into a worker thread. As a result
IO thread will be able to send messages to the browser.
The original implementation did serialization on IO thread
and was not able to send the messages because ipc had
is_blocked_on_write_ = true and had no chance to check
the actual state of the channel. So the messages were
collected in output_queue. Also the messages could be quite
big and could block the IO thread for a long time.
BUG=
Committed: https://crrev.com/d58e06aafd6572e681f9d30f313bf5393db9f2bc
Cr-Commit-Position: refs/heads/master@{#308773}
Committed: https://crrev.com/e7d8d8b60e331469a7af4936e9fcc16630fb8fdf
Cr-Commit-Position: refs/heads/master@{#315957}
Patch Set 1 #
Total comments: 6
Patch Set 2 : comments addressed #
Total comments: 2
Patch Set 3 : comments addressed #Patch Set 4 : tests were fixed #
Total comments: 4
Patch Set 5 : many unittests did fail because their callbacks were called on the worker thread. #Patch Set 6 : minor change #Patch Set 7 : unnecessary changes were removed #Patch Set 8 : rebased #
Messages
Total messages: 45 (14 generated)
loislo@chromium.org changed reviewers: + caseq@chromium.org, nduca@chromium.org, pfeldman@chromium.org, yurys@chromium.org
ptal
dsinclair@chromium.org changed reviewers: + dsinclair@chromium.org, skyostil@google.com, wangxianzhu@chromium.org
+wangxianzhu, +skyostil who did a lot of work on trace buffer locking.
skyostil@chromium.org changed reviewers: + skyostil@chromium.org
Is the motivation here to make a tab more responsive while it is flushing trace events? I don't think I've ever seen it as a problem, but if we can improve it without too much additional complexity I'm all for it. https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.cc File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.cc:1789: async_task_started = WorkerPool::PostTask( Is there any reason to expect this to fail in practice? That is, should we just DCHECK that it worked? https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.h File base/debug/trace_event_impl.h (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.h:752: scoped_ptr<TraceBuffer> previous_logged_events_; I'm not sure I understand why this should become a member? Couldn't we just pass the events around like we did before, i.e., through the TraceLog::ConvertTraceEventsToTraceFormat bind closure?
I recorded timeline for a few seconds of scrolling with pictures and layers. As result 10% of tracing buffer took about 440Mb of tracing data. (number of compressed PNG) I used http://lenta.ru as a test site and did scroll by pressing scroll down button on the scrollbar. After pressing stop we start the events serialization process. Right now it takes about 15 seconds of the serialization in ConvertTraceEventsToTraceFormat. When it finishes the IPC code gets a chance to send all these ipc messages that were accumulated in output_queue_ to the browser process. Due to another problem in DevTools code the browser process spends 11 seconds for processing these messages. See https://codereview.chromium.org/784513002/. And finally DevTools renderer does its own part of work. As a result the relatively short timeline session needs about 30 seconds for post processing. And it is almost impossible to use the browser at the time. The situation gets much worse when trace log reaches 50% of trace buffer. In this case we easily get more than 2Gb of trace data and the time increases proportionally. With the patch I see a much better picture. Renderer Browser and DevTools processes eats 100% cpu each and finish the work twice faster. Of course there are other ways for optimization but this one looks simple. https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.cc File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.cc:1789: async_task_started = WorkerPool::PostTask( On 2014/12/05 16:33:01, Sami wrote: > Is there any reason to expect this to fail in practice? That is, should we just > DCHECK that it worked? As far as I see it has different behavior on the POSIX and non POSIX platforms. See https://code.google.com/p/chromium/codesearch#chromium/src/net/dns/serial_wor... https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.h File base/debug/trace_event_impl.h (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.h:752: scoped_ptr<TraceBuffer> previous_logged_events_; On 2014/12/05 16:33:01, Sami wrote: > I'm not sure I understand why this should become a member? Couldn't we just pass > the events around like we did before, i.e., through the > TraceLog::ConvertTraceEventsToTraceFormat bind closure? The compilation fails with error deeply in Bind templates because scoped_ptr has to transfer the ownership. ../base/memory/scoped_ptr.h:312:51: note: candidate constructor not viable: 1st argument ('const scoped_ptr<TraceBuffer, base::DefaultDeleter<TraceBuffer> >') would lose const qualifier MOVE_ONLY_TYPE_WITH_MOVE_CONSTRUCTOR_FOR_CPP_03(scoped_ptr)
Thanks for the benchmark numbers. I'm guessing that was on some kind of desktop PC? https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.cc File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.cc:1789: async_task_started = WorkerPool::PostTask( On 2014/12/05 21:38:45, loislo wrote: > On 2014/12/05 16:33:01, Sami wrote: > > Is there any reason to expect this to fail in practice? That is, should we > just > > DCHECK that it worked? > > As far as I see it has different behavior on the POSIX and non POSIX platforms. > See > https://code.google.com/p/chromium/codesearch#chromium/src/net/dns/serial_wor... Interesting. Looks like QueueUserWorkItem can fail on Windows for unspecified reasons. I guess we need to be prepared for that then. https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.h File base/debug/trace_event_impl.h (right): https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... base/debug/trace_event_impl.h:752: scoped_ptr<TraceBuffer> previous_logged_events_; On 2014/12/05 21:38:45, loislo wrote: > On 2014/12/05 16:33:01, Sami wrote: > > I'm not sure I understand why this should become a member? Couldn't we just > pass > > the events around like we did before, i.e., through the > > TraceLog::ConvertTraceEventsToTraceFormat bind closure? > > The compilation fails with error deeply in Bind templates because scoped_ptr has > to transfer the ownership. > > ../base/memory/scoped_ptr.h:312:51: note: candidate constructor not viable: 1st > argument ('const scoped_ptr<TraceBuffer, base::DefaultDeleter<TraceBuffer> >') > would lose const qualifier > MOVE_ONLY_TYPE_WITH_MOVE_CONSTRUCTOR_FOR_CPP_03(scoped_ptr) You can use Passed() to have the bind transfer ownership: https://code.google.com/p/chromium/codesearch#chromium/src/base/bind_helpers....
On 2014/12/08 14:35:36, Sami wrote: > Thanks for the benchmark numbers. I'm guessing that was on some kind of desktop > PC? yep. It was Z600 corp linux. > > https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.cc > File base/debug/trace_event_impl.cc (right): > > https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... > base/debug/trace_event_impl.cc:1789: async_task_started = WorkerPool::PostTask( > On 2014/12/05 21:38:45, loislo wrote: > > On 2014/12/05 16:33:01, Sami wrote: > > > Is there any reason to expect this to fail in practice? That is, should we > > just > > > DCHECK that it worked? > > > > As far as I see it has different behavior on the POSIX and non POSIX > platforms. > > See > > > https://code.google.com/p/chromium/codesearch#chromium/src/net/dns/serial_wor... > > Interesting. Looks like QueueUserWorkItem can fail on Windows for unspecified > reasons. I guess we need to be prepared for that then. > > https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.h > File base/debug/trace_event_impl.h (right): > > https://codereview.chromium.org/750183008/diff/1/base/debug/trace_event_impl.... > base/debug/trace_event_impl.h:752: scoped_ptr<TraceBuffer> > previous_logged_events_; > On 2014/12/05 21:38:45, loislo wrote: > > On 2014/12/05 16:33:01, Sami wrote: > > > I'm not sure I understand why this should become a member? Couldn't we just > > pass > > > the events around like we did before, i.e., through the > > > TraceLog::ConvertTraceEventsToTraceFormat bind closure? > > > > The compilation fails with error deeply in Bind templates because scoped_ptr > has > > to transfer the ownership. > > > > ../base/memory/scoped_ptr.h:312:51: note: candidate constructor not viable: > 1st > > argument ('const scoped_ptr<TraceBuffer, base::DefaultDeleter<TraceBuffer> >') > > would lose const qualifier > > MOVE_ONLY_TYPE_WITH_MOVE_CONSTRUCTOR_FOR_CPP_03(scoped_ptr) > > You can use Passed() to have the bind transfer ownership: > https://code.google.com/p/chromium/codesearch#chromium/src/base/bind_helpers.... done
Patchset #2 (id:20001) has been deleted
https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:1792: Bind(&TraceLog::ConvertTraceEventsToTraceFormat, Could you turn ConvertTraceEventsToTraceFormat into a static function to make it clearer that it doesn't need to access member variables and hence can be run on a different thread? Please also add a comment there mentioning that it can be on a different thread.
https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:1792: Bind(&TraceLog::ConvertTraceEventsToTraceFormat, On 2014/12/10 14:56:01, Sami wrote: > Could you turn ConvertTraceEventsToTraceFormat into a static function to make it > clearer that it doesn't need to access member variables and hence can be run on > a different thread? Please also add a comment there mentioning that it can be on > a different thread. Done.
On 2014/12/10 15:31:38, loislo wrote: > https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... > File base/debug/trace_event_impl.cc (right): > > https://codereview.chromium.org/750183008/diff/40001/base/debug/trace_event_i... > base/debug/trace_event_impl.cc:1792: > Bind(&TraceLog::ConvertTraceEventsToTraceFormat, > On 2014/12/10 14:56:01, Sami wrote: > > Could you turn ConvertTraceEventsToTraceFormat into a static function to make > it > > clearer that it doesn't need to access member variables and hence can be run > on > > a different thread? Please also add a comment there mentioning that it can be > on > > a different thread. > > Done. PTAL
lgtm (not an owner).
The CQ bit was checked by dsinclair@chromium.org
lgtm
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/60001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_dbg_simulator on tryserver.chromium.mac (http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by loislo@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/80001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by loislo@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/80001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:77: const size_t kTraceEventBufferSize = 10 * 1024 * 1024; Can you rename it to kTraceEventBufferSizeInBytes as in other places size refers to the number of events rather than bytes (e.g. kTraceBufferChunkSize)? https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:1768: flush_output_callback.Run(json_events_str_ptr, has_more_events); Is it OK to run this callback on a worker thread?
https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... File base/debug/trace_event_impl.cc (right): https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:77: const size_t kTraceEventBufferSize = 10 * 1024 * 1024; On 2014/12/17 08:50:30, yurys wrote: > Can you rename it to kTraceEventBufferSizeInBytes as in other places size refers > to the number of events rather than bytes (e.g. kTraceBufferChunkSize)? Done. https://codereview.chromium.org/750183008/diff/80001/base/debug/trace_event_i... base/debug/trace_event_impl.cc:1768: flush_output_callback.Run(json_events_str_ptr, has_more_events); On 2014/12/17 08:50:30, yurys wrote: > Is it OK to run this callback on a worker thread? Done.
The CQ bit was checked by loislo@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/120001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by loislo@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/140001
Message was sent while issue was closed.
Committed patchset #7 (id:140001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/d58e06aafd6572e681f9d30f313bf5393db9f2bc Cr-Commit-Position: refs/heads/master@{#308773}
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:140001) has been created in https://codereview.chromium.org/804083004/ by fmeawad@chromium.org. The reason for reverting is: Speculatively reverting this CL as it might be related to redness in the perf bots. The new crashes on the bots all had this CL in their range, and the crashes are all related to MemoryError while receiving trace data. (crbug.com/443398) Feel free to reland if the revert does not fix the issue..
The CQ bit was checked by loislo@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/750183008/160001
On 2014/12/19 21:01:02, fmeawad wrote: > A revert of this CL (patchset #7 id:140001) has been created in > https://codereview.chromium.org/804083004/ by mailto:fmeawad@chromium.org. > > The reason for reverting is: Speculatively reverting this CL as it might be > related to redness in the perf bots. > > The new crashes on the bots all had this CL in their range, and the crashes are > all related to MemoryError while receiving trace data. > > (crbug.com/443398) > > Feel free to reland if the revert does not fix the issue.. I'll reland it and in case of the problem with WinXP I'll revert it. Unfortunately I have no logs from the previous run.
Message was sent while issue was closed.
Committed patchset #8 (id:160001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/e7d8d8b60e331469a7af4936e9fcc16630fb8fdf Cr-Commit-Position: refs/heads/master@{#315957}
Message was sent while issue was closed.
A revert of this CL (patchset #8 id:160001) has been created in https://codereview.chromium.org/923693002/ by loislo@chromium.org. The reason for reverting is: looks like it breaks image_decoding_measurements http://build.chromium.org/p/chromium.perf/builders/Win%20XP%20Perf%20%283%29/....
Message was sent while issue was closed.
On 2015/02/12 20:10:46, loislo wrote: > A revert of this CL (patchset #8 id:160001) has been created in > https://codereview.chromium.org/923693002/ by mailto:loislo@chromium.org. > > The reason for reverting is: looks like it breaks image_decoding_measurements > http://build.chromium.org/p/chromium.perf/builders/Win%20XP%20Perf%20%283%29/.... Benchmark code can't deal with the flow of tracing data. C:\b\depot_tools\python276_bin\python.exe C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\run_benchmark -v --output-format=chartjson --upload-results inbox_benchmark.Inbox --browser=release --output-dir=c:\users\chrome~1\appdata\local\temp\tmpiox06u File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\user_story\user_story_runner.py", line 100, in _RunUserStoryAndProcessErrorIfNeeded state.RunUserStory(results) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\page\shared_page_state.py", line 242, in RunUserStory self._test.RunPage(self._current_page, self._current_tab, results) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\page\page_test.py", line 239, in RunPage self.ValidateAndMeasurePage(page, tab, results) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 274, in ValidateAndMeasurePage self._measurement.Measure(tracing_controller, results) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 240, in Measure trace_result = tracing_controller.Stop() File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\platform\tracing_controller.py", line 32, in Stop return self._tracing_controller_backend.Stop() File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\platform\tracing_controller_backend.py", line 68, in Stop agent.Stop(trace_data_builder) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\platform\tracing_agent\chrome_tracing_agent.py", line 97, in Stop devtools_client.StopChromeTracing(trace_data_builder) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\backends\chrome_inspector\devtools_client_backend.py", line 149, in StopChromeTracing return self._tracing_backend.StopTracing(trace_data_builder, timeout) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\backends\chrome_inspector\tracing_backend.py", line 88, in StopTracing self._inspector_websocket.DispatchNotificationsUntilDone(timeout) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\backends\chrome_inspector\inspector_websocket.py", line 137, in DispatchNotificationsUntilDone if self._Receive(timeout): File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\perf\..\telemetry\telemetry\core\backends\chrome_inspector\inspector_websocket.py", line 162, in _Receive data = self._socket.recv() File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\telemetry\third_party\websocket-client\websocket.py", line 596, in recv opcode, data = self.recv_data() File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\telemetry\third_party\websocket-client\websocket.py", line 606, in recv_data frame = self.recv_frame() File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\telemetry\third_party\websocket-client\websocket.py", line 661, in recv_frame payload = self._recv_strict(self._frame_length) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\telemetry\third_party\websocket-client\websocket.py", line 746, in _recv_strict bytes = self._recv(shortage) File "C:\b\build\slave\Win_7_Perf__5_\build\src\tools\telemetry\third_party\websocket-client\websocket.py", line 730, in _recv bytes = self.sock.recv(bufsize) MemoryError |