|
|
Description[chromedriver] Prevent GetLog from returning more than 100,000 at a time.
BUG=681892
Review-Url: https://codereview.chromium.org/2644033004
Cr-Commit-Position: refs/heads/master@{#445429}
Committed: https://chromium.googlesource.com/chromium/src/+/ea5734bd3270765de34fc7f37b64759fe1d1c1bd
Patch Set 1 #
Total comments: 9
Patch Set 2 : add comment #
Messages
Total messages: 28 (8 generated)
samuong@chromium.org changed reviewers: + stgao@chromium.org
See bug for context. I tried implementing an end-to-end test but it takes too long to run (~50s on my Z620) and I think it might time out on the bots. So we've just got a unit test instead, which doesn't exercise the HttpServer buffer, but it does test the change to GetAndClearEntries.
klm@google.com changed reviewers: + klm@google.com
Thanks for a quick turnaround and diligent unit testing! https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.cc (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:166: if (!entries_->Remove(0, &entry)) I wonder if there is a race between this and the event listener that adds events as they come in from the browser... This loop could be relatively long-running, increasing the chance of an actual race. Or is there an implicit sequencing between this method and the code paths that can add, e.g. because they always run on the same thread? If they may run on different threads, is ListValue thread safe? https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.h (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.h:30: static const size_t kMaxReturnedEntries; Is this for testing only? I don't believe there is any use for actual callers. Is there an equivalent of e.g. @VisibleForTesting in Chromium C++, or some coding pattern for exposing a constant to tests only, without exposing it to real callers?
On 2017/01/19 19:39:59, samuong wrote: > See bug for context. I tried implementing an end-to-end test but it takes too > long to run (~50s on my Z620) and I think it might time out on the bots. So > we've just got a unit test instead, which doesn't exercise the HttpServer > buffer, but it does test the change to GetAndClearEntries. I wonder if there may be a better data structure to hold the events than a ListValue – maybe just a vector... Before, we've never done rapid-fire 100K adds/removes on a ListValue, and looks like the performance of it is not great – 50s is very long. OTOH even if we use a different structure, we would still have to ultimately return a ListValue, and a ListValue does not currently have a fast way of initializing from e.g. a vector. On the third hand, ListValue does have a vector append for strings, so maybe it would be straightforward to add a vector append of Values to ListValue. And then we could keep a vector here and convert it to a ListValue when we return it - constructed from a range of the first 100K removed from the vector.
ListValue is actually already backed by a std::vector (see the definition of base::ListValue::Storage here: https://cs.chromium.org/chromium/src/base/values.h?q=base::ListV&sq=package:c...). I think it's meant to represent a JSON list/array, more than a linked list. I don't really like the O(n) performance of this function either, but unfortunately ListValue doesn't have a constant-time method to split lists. But at least n has a maximum upper bound of 10k. Also, the 50s time was for the end-to-end test, which includes much more than just calls to the ListValue object. The unit test takes about 4s, so the other 46s is for calling JS console.error 100,000 times, and sending devtools console logging events for each call to chromedriver (and all the serializing and deserializing), etc. I think the performance tracing code is more efficient than this (i.e. it shouldn't take more than 46s?) but I didn't have a hermetic test page to use so I just used many calls to console.error instead. https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.cc (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:166: if (!entries_->Remove(0, &entry)) On 2017/01/19 20:47:30, klm wrote: > I wonder if there is a race between this and the event listener that adds events > as they come in from the browser... This loop could be relatively long-running, > increasing the chance of an actual race. Or is there an implicit sequencing > between this method and the code paths that can add, e.g. because they always > run on the same thread? If they may run on different threads, is ListValue > thread safe? I don't think there is a race condition - the WebDriverLogs are owned by the Session object, which is thread local. Shuotao can correct me if I'm wrong. https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.h (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.h:30: static const size_t kMaxReturnedEntries; On 2017/01/19 20:47:30, klm wrote: > Is this for testing only? I don't believe there is any use for actual callers. > Is there an equivalent of e.g. @VisibleForTesting in Chromium C++, or some > coding pattern for exposing a constant to tests only, without exposing it to > real callers? We could make it private and then friend a test class, wdyt?
s/10k/100k
On 2017/01/19 21:15:37, samuong wrote: > ListValue is actually already backed by a std::vector (see the definition of > base::ListValue::Storage here: > https://cs.chromium.org/chromium/src/base/values.h?q=base::ListV&sq=package:c...). > I think it's meant to represent a JSON list/array, more than a linked list. Yes. And vector is not linked underneath either. I was just saying that if we 1) kept a plain std::vector<base::Value> here instead of a ListValue, 2) added a bulk append of values to a ListValue – AppendValues(const std::vector<base::Value>& in_values), then this op may speed up by orders of magnitude when it runs for 100K's of entries, and have no tangible performance change for a low number of entries. > > I don't really like the O(n) performance of this function either, but > unfortunately ListValue doesn't have a constant-time method to split lists. But > at least n has a maximum upper bound of 100k. > > Also, the 50s time was for the end-to-end test, which includes much more than > just calls to the ListValue object. The unit test takes about 4s, so the other > 46s is for calling JS console.error 100,000 times, and sending devtools console > logging events for each call to chromedriver (and all the serializing and > deserializing), etc. > > I think the performance tracing code is more efficient than this (i.e. it > shouldn't take more than 46s?) but I didn't have a hermetic test page to use so > I just used many calls to console.error instead. If the unit test takes 4s, then that's acceptable. You are not touching how trace events make their way into the log, so that's out of scope of this CL.
lgtm https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.h (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.h:30: static const size_t kMaxReturnedEntries; On 2017/01/19 21:15:37, samuong wrote: > On 2017/01/19 20:47:30, klm wrote: > > Is this for testing only? I don't believe there is any use for actual callers. > > Is there an equivalent of e.g. @VisibleForTesting in Chromium C++, or some > > coding pattern for exposing a constant to tests only, without exposing it to > > real callers? > > We could make it private and then friend a test class, wdyt? I have no strong opinion, was just pointing out something that I didn't quite like:) Up to you and Shuotao what to do with it, if anything at all.
On 2017/01/19 21:26:00, klm wrote: > On 2017/01/19 21:15:37, samuong wrote: > > ListValue is actually already backed by a std::vector (see the definition of > > base::ListValue::Storage here: > > > https://cs.chromium.org/chromium/src/base/values.h?q=base::ListV&sq=package:c...). > > I think it's meant to represent a JSON list/array, more than a linked list. > > Yes. And vector is not linked underneath either. I was just saying that if we 1) > kept a plain std::vector<base::Value> here instead of a ListValue, 2) added a > bulk append of values to a ListValue – AppendValues(const > std::vector<base::Value>& in_values), then this op may speed up by orders of > magnitude when it runs for 100K's of entries, and have no tangible performance > change for a low number of entries. I think that would work, but unfortunately base::ListValue doesn't have a bulk AppendValues method, and I don't think we'd be able to add one, especially if its just for a workaround like this. An alternative would be to keep a std::vector<base::ListValue>, where each ListValue is capped at 100,000 log entries, then we would be fast in all cases. But again, I'm not sure it's worth the complexity; any effort beyond this CL should probably go into making net::HttpServer stream its responses. Perhaps we should revisit performance only if/when the latency lab finds that this is too slow for large traces?
On 2017/01/19 23:07:34, samuong wrote: > On 2017/01/19 21:26:00, klm wrote: > > On 2017/01/19 21:15:37, samuong wrote: > > > ListValue is actually already backed by a std::vector (see the definition of > > > base::ListValue::Storage here: > > > > > > https://cs.chromium.org/chromium/src/base/values.h?q=base::ListV&sq=package:c...). > > > I think it's meant to represent a JSON list/array, more than a linked list. > > > > Yes. And vector is not linked underneath either. I was just saying that if we > 1) > > kept a plain std::vector<base::Value> here instead of a ListValue, 2) added a > > bulk append of values to a ListValue – AppendValues(const > > std::vector<base::Value>& in_values), then this op may speed up by orders of > > magnitude when it runs for 100K's of entries, and have no tangible performance > > change for a low number of entries. > > I think that would work, but unfortunately base::ListValue doesn't have a bulk > AppendValues method, and I don't think we'd be able to add one, especially if > its just for a workaround like this. > > An alternative would be to keep a std::vector<base::ListValue>, where each > ListValue is capped at 100,000 log entries, then we would be fast in all cases. > But again, I'm not sure it's worth the complexity; any effort beyond this CL > should probably go into making net::HttpServer stream its responses. > > Perhaps we should revisit performance only if/when the latency lab finds that > this is too slow for large traces? SGTM.
Per the discussion in this CL, it lgtm https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.cc (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:164: for (size_t i = 0; i < kMaxReturnedEntries; i++) { Maybe leave a TODO to revisit whether it is possible to avoid a for loop like this by modifying std::vector-based ListValue. https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:166: if (!entries_->Remove(0, &entry)) On 2017/01/19 21:15:36, samuong wrote: > On 2017/01/19 20:47:30, klm wrote: > > I wonder if there is a race between this and the event listener that adds > events > > as they come in from the browser... This loop could be relatively > long-running, > > increasing the chance of an actual race. Or is there an implicit sequencing > > between this method and the code paths that can add, e.g. because they always > > run on the same thread? If they may run on different threads, is ListValue > > thread safe? > > I don't think there is a race condition - the WebDriverLogs are owned by the > Session object, which is thread local. Shuotao can correct me if I'm wrong. IIRC, each session has its own thread.
https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... File chrome/test/chromedriver/logging.cc (right): https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:164: for (size_t i = 0; i < kMaxReturnedEntries; i++) { On 2017/01/20 03:40:12, stgao (OOO until Feb 23) wrote: > Maybe leave a TODO to revisit whether it is possible to avoid a for loop like > this by modifying std::vector-based ListValue. Done. https://codereview.chromium.org/2644033004/diff/1/chrome/test/chromedriver/lo... chrome/test/chromedriver/logging.cc:166: if (!entries_->Remove(0, &entry)) On 2017/01/20 03:40:12, stgao (OOO until Feb 23) wrote: > On 2017/01/19 21:15:36, samuong wrote: > > On 2017/01/19 20:47:30, klm wrote: > > > I wonder if there is a race between this and the event listener that adds > > events > > > as they come in from the browser... This loop could be relatively > > long-running, > > > increasing the chance of an actual race. Or is there an implicit sequencing > > > between this method and the code paths that can add, e.g. because they > always > > > run on the same thread? If they may run on different threads, is ListValue > > > thread safe? > > > > I don't think there is a race condition - the WebDriverLogs are owned by the > > Session object, which is thread local. Shuotao can correct me if I'm wrong. > > IIRC, each session has its own thread. Done.
The CQ bit was checked by samuong@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from stgao@chromium.org, klm@google.com Link to the patchset: https://codereview.chromium.org/2644033004/#ps20001 (title: "add comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Sam, just making sure: 1. Have you actually tried my repro example with your fix, and it worked? 2. Would you mind hack-logging the buffer size for the boston.com run? No need to submit that, but just to see how large it is for 100K log entries, to make sure we're not getting too close to the 100M threshold. Thanks again for a quick turnaround!
The CQ bit was unchecked by samuong@chromium.org
It works when I run it against www.boston.com. The JSON object for the first call to GetLog is about 38 MB. However, I am seeing some entries with "Chrome's trace buffer filled while collecting events, so some trace events may have been lost", so this would need to be looked at as well. Can you let me know if you think this needs to be looked into before landing this CL?
On 2017/01/23 18:44:52, samuong wrote: > It works when I run it against http://www.boston.com. The JSON object for the first > call to GetLog is about 38 MB. Great, thanks! I think this is good to go then. > > However, I am seeing some entries with "Chrome's trace buffer filled while > collecting events, so some trace events may have been lost", so this would need > to be looked at as well. Can you let me know if you think this needs to be > looked into before landing this CL? That's a separate issue. Chrome collects trace events in a ring buffer in RAM. IIRC we ask Chrome to dump the collected events on each WD command (completely forgot how exactly it's hooked up in chromedriver, it was a couple of years ago, and then there were some refactorings:), but for large pages that's not enough – the /url alone apparently exceeds the ring buffer capacity. I vaguely recall that there may have been a new way to get the trace events more dynamically, e.g. whenever the ring buffer overflows or some such, but I no longer remember the details. Definitely separate from this CL, but would be very good to investigate and fix.
On 2017/01/23 18:57:00, klm wrote: > On 2017/01/23 18:44:52, samuong wrote: > > It works when I run it against http://www.boston.com. The JSON object for the > first > > call to GetLog is about 38 MB. > > Great, thanks! I think this is good to go then. > > > > > However, I am seeing some entries with "Chrome's trace buffer filled while > > collecting events, so some trace events may have been lost", so this would > need > > to be looked at as well. Can you let me know if you think this needs to be > > looked into before landing this CL? > > That's a separate issue. Chrome collects trace events in a ring buffer in RAM. > IIRC we ask Chrome to dump the collected events on each WD command (completely > forgot how exactly it's hooked up in chromedriver, it was a couple of years ago, > and then there were some refactorings:), but for large pages that's not enough > – the /url alone apparently exceeds the ring buffer capacity. > > I vaguely recall that there may have been a new way to get the trace events more > dynamically, e.g. whenever the ring buffer overflows or some such, but I no > longer remember the details. > > Definitely separate from this CL, but would be very good to investigate and fix. Cool, that's what I thought too, just wanted to check.
The CQ bit was checked by samuong@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 20001, "attempt_start_ts": 1485198452148620, "parent_rev": "8f4bf2e0d584f7b56059629037d714bf9d4b5abf", "commit_rev": "ea5734bd3270765de34fc7f37b64759fe1d1c1bd"}
Message was sent while issue was closed.
Description was changed from ========== [chromedriver] Prevent GetLog from returning more than 100,000 at a time. BUG=681892 ========== to ========== [chromedriver] Prevent GetLog from returning more than 100,000 at a time. BUG=681892 Review-Url: https://codereview.chromium.org/2644033004 Cr-Commit-Position: refs/heads/master@{#445429} Committed: https://chromium.googlesource.com/chromium/src/+/ea5734bd3270765de34fc7f37b64... ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://chromium.googlesource.com/chromium/src/+/ea5734bd3270765de34fc7f37b64...
Message was sent while issue was closed.
A revert of this CL (patchset #2 id:20001) has been created in https://codereview.chromium.org/2655503002/ by waffles@chromium.org. The reason for reverting is: Consistently causes test timeouts in Logging.GetLotsOfLogs of Mac 10.9 [ RUN ] Logging.GetLotsOfLogs [240/240] Logging.GetLotsOfLogs (TIMED OUT) https://build.chromium.org/p/chromium.mac/builders/Mac10.9%20Tests%20%28dbg%2....
Message was sent while issue was closed.
On 2017/01/19 23:07:34, samuong wrote: > On 2017/01/19 21:26:00, klm wrote: > > On 2017/01/19 21:15:37, samuong wrote: > > > ListValue is actually already backed by a std::vector (see the definition of > > > base::ListValue::Storage here: > > > > > > https://cs.chromium.org/chromium/src/base/values.h?q=base::ListV&sq=package:c...). > > > I think it's meant to represent a JSON list/array, more than a linked list. > > > > Yes. And vector is not linked underneath either. I was just saying that if we > 1) > > kept a plain std::vector<base::Value> here instead of a ListValue, 2) added a > > bulk append of values to a ListValue – AppendValues(const > > std::vector<base::Value>& in_values), then this op may speed up by orders of > > magnitude when it runs for 100K's of entries, and have no tangible performance > > change for a low number of entries. > > I think that would work, but unfortunately base::ListValue doesn't have a bulk > AppendValues method, and I don't think we'd be able to add one, especially if > its just for a workaround like this. > > An alternative would be to keep a std::vector<base::ListValue>, where each > ListValue is capped at 100,000 log entries, then we would be fast in all cases. > But again, I'm not sure it's worth the complexity; any effort beyond this CL > should probably go into making net::HttpServer stream its responses. > > Perhaps we should revisit performance only if/when the latency lab finds that > this is too slow for large traces? Indeed we do see a very substantial slowdown with this fix, CC'd you on that bug. |