|
|
Descriptioncachetool: Implement batch mode to speed-up cache processing.
Before, Sandwich was modifying local cache directories by issuing
cachetool commands sequentially. But it was painfully slow.
This CL implement the cachetool's batch mode that process sequence
of command received from the stdin and return the commands results
in the stdout. This way, Sandwich just have to launch cachetool once
in the background and push to its stdin commands to process.
On cache directories populated by big webpages such as cnn.com's
home page, the Sandwich cache processing for NoState-Prefetch
emulation was taking 491s when spawning one cachetool process
for each commands versus 3s with this new online mode.
BUG=582080
Committed: https://crrev.com/fbe837c8513395e7f304999eaaf67c4cbfb77eea
Cr-Commit-Position: refs/heads/master@{#405157}
Patch Set 1 #
Total comments: 27
Patch Set 2 : Addresses egor comments #Patch Set 3 : s/CachetoolIOBase/CommandParser #Patch Set 4 : Addressing egor's comments #Patch Set 5 : Rebase #Patch Set 6 : Fixes comments #Patch Set 7 : s/CommandParser/CommandMarshal/g #Messages
Total messages: 28 (15 generated)
Description was changed from ========== cachetool: Implement online mode to speed cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's online mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ========== to ========== cachetool: Implement online mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's online mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ==========
gabadie@chromium.org changed reviewers: + gavinp@chromium.org
Hey Gavin, slight refactor of cachetool to speed up cache processing. PTAL.
The CQ bit was checked by gabadie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
pasko@chromium.org changed reviewers: + pasko@chromium.org
https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... File net/tools/cachetool/cachetool.cc (right): https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:28: const int kResponseInfoIndex = 0; constexpr is preferred now as of https://chromium-cpp.appspot.com/ https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:31: "stop", "get_size", "list_keys", "get_stream_for_key", any reason for these to be separated by multiple spaces? One column would be preferable IMO https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:35: // Print the command line help. s/Print/Prints/ https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:37: std::cout << "cachetool <cache_path> <cache_backend_type> <subcommand> ..." The '...' is slightly confusing, maybe should be: cachetool <cache_path> <cache_backend_type> <subcommand> [<subcommand>]... https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:50: std::cout << " online: Starts cachetool to process serialized commands " Making "online" a subcommand is somewhat confusing, as it is not clear what should happen with multiple "online" subcommands, should better be --online, and an optional positional argument, like: cachetool <cache_path> <cache_backend_type> [--online | <subcommand> [<subcommand>]...] This is unless one absolutely needs some commands executed prior to the online commands start playing. I am not sure we have this usecase, probably not. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:61: std::cout << " 2 (compiled content)" << std::endl; there can be other things in stream 2, so we can say: (metadata or compiled JS code) https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:65: class CachetoolIOBase { Ha, now I understand why you call it 'IO', basically you want the single class to control how we read the subcommands and how they are output. This makes sense because we want to pass the choice as a parameter to various functions. On the other hand, is there value in making the streamed format to be different from stdout? I imagine grabbing newlines and parsing ints from the python side would not slow us down considerably. Another use seems to be to do DCHECK on input when doing output. I am not sure why this is essential, can you explain? If we join the output format for the tool we would have two benefits: 1. less code 2. more intuitive naming of the classes Naming that sounds more familiar to me (and I guess a general reader) is to have a CommandParser, and ProgramArgumentCommandParser/StreamCommandParser to inherit from it. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:110: // Returns weather the command has failed. s/weather/whether/ https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:221: std::cin.read(reinterpret_cast<char*>(&integer), sizeof(integer)); read 4 bytes? why? why not std::cin >> integer? https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:435: // Execute all command from the |common_input|. what is |common_input|?
Thanks Egor. One other idea I just had was to rename "online" command to "batch". https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... File net/tools/cachetool/cachetool.cc (right): https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:28: const int kResponseInfoIndex = 0; On 2016/07/05 15:13:30, pasko wrote: > constexpr is preferred now as of https://chromium-cpp.appspot.com/ Done. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:31: "stop", "get_size", "list_keys", "get_stream_for_key", On 2016/07/05 15:13:30, pasko wrote: > any reason for these to be separated by multiple spaces? One column would be > preferable IMO looks like `git cl format net` did it by itself... https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:35: // Print the command line help. On 2016/07/05 15:13:31, pasko wrote: > s/Print/Prints/ No, Gavin wanted in the previous reviews the comments of the functions to be infinitive. So I will keep. following his request. More over this here I just moved the PrintHelp() up in the file to avoid a declaration but with one change regarding the online and stop commands. So I won't make any changes in this function if it is not on these commands' help to preserve what Gavin originally wanted. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:37: std::cout << "cachetool <cache_path> <cache_backend_type> <subcommand> ..." On 2016/07/05 15:13:30, pasko wrote: > The '...' is slightly confusing, maybe should be: > > cachetool <cache_path> <cache_backend_type> <subcommand> [<subcommand>]... No. the ... stands for sub-command specific parameters. Have not been modified since last CL therefore staying as is. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:50: std::cout << " online: Starts cachetool to process serialized commands " On 2016/07/05 15:13:31, pasko wrote: > Making "online" a subcommand is somewhat confusing, as it is not clear what > should happen with multiple "online" subcommands, should better be --online, and > an optional positional argument, like: > > cachetool <cache_path> <cache_backend_type> [--online | <subcommand> > [<subcommand>]...] > > This is unless one absolutely needs some commands executed prior to the online > commands start playing. I am not sure we have this usecase, probably not. Command line tool only have one subcommand. Either you do the the subcommand directly, or you do the online subcommand to accept the multiple commands in the stdin. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:61: std::cout << " 2 (compiled content)" << std::endl; On 2016/07/05 15:13:31, pasko wrote: > there can be other things in stream 2, so we can say: (metadata or compiled JS > code) Interesting, waiting for Gavin's opinion to change this. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:65: class CachetoolIOBase { On 2016/07/05 15:13:30, pasko wrote: > Ha, now I understand why you call it 'IO', basically you want the single class > to control how we read the subcommands and how they are output. This makes sense > because we want to pass the choice as a parameter to various functions. > > On the other hand, is there value in making the streamed format to be different > from stdout? I imagine grabbing newlines and parsing ints from the python side > would not slow us down considerably. This is not for speed, but rather for reliability. For instance the raw response headers contains null chars. So I definitively want to by passs all the null-ended assumed by non binary stream. > > Another use seems to be to do DCHECK on input when doing output. I am not sure > why this is essential, can you explain? Have not understood. > > If we join the output format for the tool we would have two benefits: > 1. less code Not conviced. > 2. more intuitive naming of the classes Why? And would also be a non binary protocol that would need to support binary passdown -> a pain. I definitely prefer the be full binary. > > Naming that sounds more familiar to me (and I guess a general reader) is to have > a CommandParser, and ProgramArgumentCommandParser/StreamCommandParser to inherit > from it. Nice names. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:110: // Returns weather the command has failed. On 2016/07/05 15:13:31, pasko wrote: > s/weather/whether/ Done. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:221: std::cin.read(reinterpret_cast<char*>(&integer), sizeof(integer)); On 2016/07/05 15:13:30, pasko wrote: > read 4 bytes? why? > > why not std::cin >> integer? Because binary protocol, not ascii protocol. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:435: // Execute all command from the |common_input|. On 2016/07/05 15:13:30, pasko wrote: > what is |common_input|? Oups, variable renaming artifact.
yeah, 'batch' is good with me, thank you https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... File net/tools/cachetool/cachetool.cc (right): https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:31: "stop", "get_size", "list_keys", "get_stream_for_key", On 2016/07/06 09:25:53, gabadie wrote: > On 2016/07/05 15:13:30, pasko wrote: > > any reason for these to be separated by multiple spaces? One column would be > > preferable IMO > > looks like `git cl format net` did it by itself... weird :) works for me then.. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:35: // Print the command line help. On 2016/07/06 09:25:53, gabadie wrote: > On 2016/07/05 15:13:31, pasko wrote: > > s/Print/Prints/ > > No, Gavin wanted in the previous reviews the comments of the functions to be > infinitive. So I will keep. following his request. > > More over this here I just moved the PrintHelp() up in the file to avoid a > declaration but with one change regarding the online and stop commands. So I > won't make any changes in this function if it is not on these commands' help to > preserve what Gavin originally wanted. Ack, this goes back to Gavin. Though I am not sure why Gavin would propose something against the style guide. Was it based on consistency within the file? directory? puzzled :) https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:37: std::cout << "cachetool <cache_path> <cache_backend_type> <subcommand> ..." On 2016/07/06 09:25:53, gabadie wrote: > On 2016/07/05 15:13:30, pasko wrote: > > The '...' is slightly confusing, maybe should be: > > > > cachetool <cache_path> <cache_backend_type> <subcommand> [<subcommand>]... > > No. the ... stands for sub-command specific parameters. Have not been modified > since last CL therefore staying as is. Then the description is still incorrect. Subcommands below are defined as things with parameters. This would be fixed by removing the '...'. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:50: std::cout << " online: Starts cachetool to process serialized commands " On 2016/07/06 09:25:53, gabadie wrote: > On 2016/07/05 15:13:31, pasko wrote: > > Making "online" a subcommand is somewhat confusing, as it is not clear what > > should happen with multiple "online" subcommands, should better be --online, > and > > an optional positional argument, like: > > > > cachetool <cache_path> <cache_backend_type> [--online | <subcommand> > > [<subcommand>]...] > > > > This is unless one absolutely needs some commands executed prior to the online > > commands start playing. I am not sure we have this usecase, probably not. > > Command line tool only have one subcommand. Either you do the the subcommand > directly, or you do the online subcommand to accept the multiple commands in the > stdin. Ah, ok, thanks for explaining. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:65: class CachetoolIOBase { On 2016/07/06 09:25:53, gabadie wrote: > On 2016/07/05 15:13:30, pasko wrote: > > Ha, now I understand why you call it 'IO', basically you want the single class > > to control how we read the subcommands and how they are output. This makes > sense > > because we want to pass the choice as a parameter to various functions. > > > > On the other hand, is there value in making the streamed format to be > different > > from stdout? I imagine grabbing newlines and parsing ints from the python side > > would not slow us down considerably. > This is not for speed, but rather for reliability. For instance the raw response > headers contains null chars. So I definitively want to by passs all the > null-ended assumed by non binary stream. Oh, that's actually a very good point, which I totally missed. Yes, binary format is preferable then. > > > > Another use seems to be to do DCHECK on input when doing output. I am not sure > > why this is essential, can you explain? > Have not understood. > > > > > If we join the output format for the tool we would have two benefits: > > 1. less code > Not conviced. > > 2. more intuitive naming of the classes > Why? > > And would also be a non binary protocol that would need to support binary > passdown -> a pain. I definitely prefer the be full binary. > > > > > Naming that sounds more familiar to me (and I guess a general reader) is to > have > > a CommandParser, and ProgramArgumentCommandParser/StreamCommandParser to > inherit > > from it. > Nice names. yeah, but they would be incorrect if we combine parsing with producing the output.
Thanks Egor! Let's wait for Gavin's review. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... File net/tools/cachetool/cachetool.cc (right): https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:37: std::cout << "cachetool <cache_path> <cache_backend_type> <subcommand> ..." On 2016/07/06 15:18:55, pasko wrote: > On 2016/07/06 09:25:53, gabadie wrote: > > On 2016/07/05 15:13:30, pasko wrote: > > > The '...' is slightly confusing, maybe should be: > > > > > > cachetool <cache_path> <cache_backend_type> <subcommand> [<subcommand>]... > > > > No. the ... stands for sub-command specific parameters. Have not been modified > > since last CL therefore staying as is. > > Then the description is still incorrect. Subcommands below are defined as things > with parameters. This would be fixed by removing the '...'. Done. https://codereview.chromium.org/2114933002/diff/1/net/tools/cachetool/cacheto... net/tools/cachetool/cachetool.cc:65: class CachetoolIOBase { On 2016/07/06 15:18:55, pasko wrote: > On 2016/07/06 09:25:53, gabadie wrote: > > On 2016/07/05 15:13:30, pasko wrote: > > > Ha, now I understand why you call it 'IO', basically you want the single > class > > > to control how we read the subcommands and how they are output. This makes > > sense > > > because we want to pass the choice as a parameter to various functions. > > > > > > On the other hand, is there value in making the streamed format to be > > different > > > from stdout? I imagine grabbing newlines and parsing ints from the python > side > > > would not slow us down considerably. > > This is not for speed, but rather for reliability. For instance the raw > response > > headers contains null chars. So I definitively want to by passs all the > > null-ended assumed by non binary stream. > > Oh, that's actually a very good point, which I totally missed. Yes, binary > format is preferable then. > > > > > > > Another use seems to be to do DCHECK on input when doing output. I am not > sure > > > why this is essential, can you explain? > > Have not understood. > > > > > > > > If we join the output format for the tool we would have two benefits: > > > 1. less code > > Not conviced. > > > 2. more intuitive naming of the classes > > Why? > > > > And would also be a non binary protocol that would need to support binary > > passdown -> a pain. I definitely prefer the be full binary. > > > > > > > > Naming that sounds more familiar to me (and I guess a general reader) is to > > have > > > a CommandParser, and ProgramArgumentCommandParser/StreamCommandParser to > > inherit > > > from it. > > Nice names. > > yeah, but they would be incorrect if we combine parsing with producing the > output. Ok let's wait for Gavin's opinion. But your offline proposal for Marshal is a nice idea in my mind.
Description was changed from ========== cachetool: Implement online mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's online mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ========== to ========== cachetool: Implement batch mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's batch mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ==========
The CQ bit was checked by gabadie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by gabadie@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by gavinp@chromium.org
lgtm lgtm
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== cachetool: Implement batch mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's batch mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ========== to ========== cachetool: Implement batch mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's batch mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed from ========== cachetool: Implement batch mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's batch mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 ========== to ========== cachetool: Implement batch mode to speed-up cache processing. Before, Sandwich was modifying local cache directories by issuing cachetool commands sequentially. But it was painfully slow. This CL implement the cachetool's batch mode that process sequence of command received from the stdin and return the commands results in the stdout. This way, Sandwich just have to launch cachetool once in the background and push to its stdin commands to process. On cache directories populated by big webpages such as cnn.com's home page, the Sandwich cache processing for NoState-Prefetch emulation was taking 491s when spawning one cachetool process for each commands versus 3s with this new online mode. BUG=582080 Committed: https://crrev.com/fbe837c8513395e7f304999eaaf67c4cbfb77eea Cr-Commit-Position: refs/heads/master@{#405157} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/fbe837c8513395e7f304999eaaf67c4cbfb77eea Cr-Commit-Position: refs/heads/master@{#405157} |