|
|
Created:
6 years, 10 months ago by shashi Modified:
6 years, 10 months ago CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionStore page no for distilled pages undergoing distillation.
In order to support distillation of previous pages and enable
meaningful incremental updates for viewers, distiller needs
to maintain page number information for pages under distillation.
This information will be used to add support for incremental updates
and distilling previous pages of an article.
BUG=288015
TEST=Covered by existing tests + added tests for failure and
page limit for distiller.
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=251546
Patch Set 1 : #Patch Set 2 : #
Total comments: 34
Patch Set 3 : Fix compile on Android clang: s/const_iterator/iterator. #Patch Set 4 : Address Chris' comments. #Patch Set 5 : Add tests for distillation failure. #Patch Set 6 : Partition internal states into 3 sets: started, pending, finished. #
Total comments: 16
Patch Set 7 : Address comments. #
Total comments: 11
Patch Set 8 : fix nitz. #
Messages
Total messages: 14 (0 generated)
Comments on the description: My understanding is that this is actually only needed to support starting a distillation on something other than the first page, but isn't really needed for incremental distillation. (Tracking when the image fetchers are done for a single page is required for that, though). https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.cc (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:64: if (!IsValidPageNo(page_no) && url.is_valid() && Maybe rename IsValidPageNo to IsUsedPageNumber or IsPageNumberInUse. I found "valid" confusing here since if the number isn't valid I would expect you to reject it. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:65: article_proto_->pages_size() < kMaxPagesInArticle && Should this account for pages in the queue but not yet in the article proto? Or should we just add everything to the queue and only dequeue the first kMaxPagesInArticle? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:65: article_proto_->pages_size() < kMaxPagesInArticle && Also, article_proto_ isn't actually created until running the distiller callback so I don't think we'll ever hit the max pages. We should add a test that kMaxPagesInArticle actually works. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:108: DCHECK(pages_to_be_distilled_.end() != pages_to_be_distilled_.find(page_no)); nit: swap the order of this comparison: foo.find(_) != foo.end() (generally in chrome the constant goes on the right in a comparison like this) https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:141: in_progress_pages_.erase(page_no); I don't think this is properly handled in RunDistillerCallbackIfDone. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:145: DCHECK(pages_to_be_distilled_.end() == pages_to_be_distilled_.find(page_no)); nit: swap order of this comparison https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:193: DCHECK(in_progress_pages_.end() != in_progress_pages_.find(page_no)); nit: swap order https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.h (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:134: std::map<int, size_t> distilled_pages_index_; Couldn't this just point right to the DPD in distilled_pages? i.e. std::map<int, DistilledPageData*>? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:139: // The page number of pages that are either waiting for distillation or image This includes pages waiting in pages_to_be_distilled? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; It's unclear how a page distillation works through these steps. Could we rename these to something like: distilled_pages_ in_progress_pages_ waiting_pages_ other options distilled/finished in_progress/working/distilling/started waiting/pending/queued Maybe waiting/started/finished is best? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:144: // Set to keep track of which urls are already seen by the distiller. This comment should say what this is used for. Also, this should probably be seen_urls_ or something and we should be adding things to it when we first see them (i.e. when they are added to pages_to_be_distilled). https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:147: scoped_ptr<DistilledArticleProto> article_proto_; This is only actually created and used in RunDistillerCallbackIfDone, so why not just make it local in there?
PTAL https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.cc (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:64: if (!IsValidPageNo(page_no) && url.is_valid() && On 2014/02/12 20:39:09, cjhopman wrote: > Maybe rename IsValidPageNo to IsUsedPageNumber or IsPageNumberInUse. I found > "valid" confusing here since if the number isn't valid I would expect you to > reject it. Done. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:65: article_proto_->pages_size() < kMaxPagesInArticle && Good catch, I missed that, added a test. On 2014/02/12 20:39:09, cjhopman wrote: > Should this account for pages in the queue but not yet in the article proto? Or > should we just add everything to the queue and only dequeue the first > kMaxPagesInArticle? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:108: DCHECK(pages_to_be_distilled_.end() != pages_to_be_distilled_.find(page_no)); I also thought so, but replicated the style in this particular file :). I have changed it at all places. On 2014/02/12 20:39:09, cjhopman wrote: > nit: swap the order of this comparison: foo.find(_) != foo.end() > > (generally in chrome the constant goes on the right in a comparison like this) https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:141: in_progress_pages_.erase(page_no); On 2014/02/12 20:39:09, cjhopman wrote: > I don't think this is properly handled in RunDistillerCallbackIfDone. Sorry, I do not understand the comment, can you be more specific. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:145: DCHECK(pages_to_be_distilled_.end() == pages_to_be_distilled_.find(page_no)); On 2014/02/12 20:39:09, cjhopman wrote: > nit: swap order of this comparison Done. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:193: DCHECK(in_progress_pages_.end() != in_progress_pages_.find(page_no)); On 2014/02/12 20:39:09, cjhopman wrote: > nit: swap order Done. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:193: DCHECK(in_progress_pages_.end() != in_progress_pages_.find(page_no)); On 2014/02/12 20:39:09, cjhopman wrote: > nit: swap order Done. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.h (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:134: std::map<int, size_t> distilled_pages_index_; It could but then I have to manually manage the lifetime of distilled pages, instead of using ScopedVector, I can switch to STLDeleteSecondElements? On 2014/02/12 20:39:09, cjhopman wrote: > Couldn't this just point right to the DPD in distilled_pages? i.e. std::map<int, > DistilledPageData*>? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:139: // The page number of pages that are either waiting for distillation or image On 2014/02/12 20:39:09, cjhopman wrote: > This includes pages waiting in pages_to_be_distilled? Yes, it does, any unfinished pages. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; Done, hopefully more clear now. On 2014/02/12 20:39:09, cjhopman wrote: > It's unclear how a page distillation works through these steps. Could we rename > these to something like: > distilled_pages_ > in_progress_pages_ > waiting_pages_ > > other options > distilled/finished > in_progress/working/distilling/started > waiting/pending/queued > > Maybe waiting/started/finished is best? https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:144: // Set to keep track of which urls are already seen by the distiller. On 2014/02/12 20:39:09, cjhopman wrote: > This comment should say what this is used for. Also, this should probably be > seen_urls_ or something and we should be adding things to it when we first see > them (i.e. when they are added to pages_to_be_distilled). Done. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:147: scoped_ptr<DistilledArticleProto> article_proto_; Because I was concerned about the change in lifetime of article_proto_ Note the api documentation for ViewRequestDelegate: https://code.google.com/p/chromium/codesearch#chromium/src/components/dom_dis... On 2014/02/12 20:39:09, cjhopman wrote: > This is only actually created and used in RunDistillerCallbackIfDone, so why not > just make it local in there?
https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... File components/dom_distiller/core/distiller.cc (right): https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... components/dom_distiller/core/distiller.cc:141: in_progress_pages_.erase(page_no); On 2014/02/13 01:03:11, shashi wrote: > On 2014/02/12 20:39:09, cjhopman wrote: > > I don't think this is properly handled in RunDistillerCallbackIfDone. > > Sorry, I do not understand the comment, can you be more specific. I mean that I wasn't sure what happens if one page of a multi-page article fails distillation (maybe there should be a test for that). Looking at it again, it looks like it will work, we just sort of lose the fact that a page failed (I don't know what we should do in that case). https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... File components/dom_distiller/core/distiller.h (right): https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... components/dom_distiller/core/distiller.h:134: std::map<int, size_t> distilled_pages_index_; On 2014/02/13 01:03:11, shashi wrote: > It could but then I have to manually manage the lifetime of distilled pages, > instead of using ScopedVector, I can switch to STLDeleteSecondElements? > > On 2014/02/12 20:39:09, cjhopman wrote: > > Couldn't this just point right to the DPD in distilled_pages? i.e. > std::map<int, > > DistilledPageData*>? > I mean it would point into distilled_pages. I.e. the ScopedVector still owns them, and this maps relative page number directly to a DPD*. Both of these approaches basically require that we don't remove something from distilled_pages_ without also removing them from the index map. Maybe it is actually simpler to just use a map to pointers and just manually handle the memory; at least then it is obvious that you need to be careful. Another option is to use a std::map<int, linked_ptr<DPD> >. Or an std::map<int, scoped_refptr<RefCountedData<DPD> > >. That last one might be good because we will probably want to use RefCountedData for incremental updates. https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; This isn't really more clear, particularly because pending and waiting mean almost the same thing. My intention was that a particular page_no should only be in one of three states: pending, started, or finished. These three containers should correspond to those three states. Then the flow of a page is more obvious. On 2014/02/13 01:03:11, shashi wrote: > Done, hopefully more clear now. > On 2014/02/12 20:39:09, cjhopman wrote: > > It's unclear how a page distillation works through these steps. Could we > rename > > these to something like: > > distilled_pages_ > > in_progress_pages_ > > waiting_pages_ > > > > other options > > distilled/finished > > in_progress/working/distilling/started > > waiting/pending/queued > > > > Maybe waiting/started/finished is best? > https://codereview.chromium.org/130543003/diff/70001/components/dom_distiller... components/dom_distiller/core/distiller.h:147: scoped_ptr<DistilledArticleProto> article_proto_; Ownership of it is passed to the DistillerCallback. On 2014/02/13 01:03:11, shashi wrote: > Because I was concerned about the change in lifetime of article_proto_ > Note the api documentation for ViewRequestDelegate: > https://code.google.com/p/chromium/codesearch#chromium/src/components/dom_dis... > > On 2014/02/12 20:39:09, cjhopman wrote: > > This is only actually created and used in RunDistillerCallbackIfDone, so why > not > > just make it local in there? >
PTAL https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.cc (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.cc:141: in_progress_pages_.erase(page_no); I am not sure about the correct behavior either, currently we just truncate the article. I think the right behavior will depend on the capabilities of the distiller, currently we can only get the next url once distillation is finished for the current page, so we have no choice but to truncate the article but if distiller can provide us a list of all next_urls, then we can just add them to the queue and maybe show a 'Failed to convert this page' error. Good point about tests, added two more: to check for single as well as multiple page failures. On 2014/02/13 02:48:48, cjhopman wrote: > On 2014/02/13 01:03:11, shashi wrote: > > On 2014/02/12 20:39:09, cjhopman wrote: > > > I don't think this is properly handled in RunDistillerCallbackIfDone. > > > > Sorry, I do not understand the comment, can you be more specific. > > I mean that I wasn't sure what happens if one page of a multi-page article fails > distillation (maybe there should be a test for that). Looking at it again, it > looks like it will work, we just sort of lose the fact that a page failed (I > don't know what we should do in that case). https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.h (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:134: std::map<int, size_t> distilled_pages_index_; On 2014/02/13 02:48:48, cjhopman wrote: > On 2014/02/13 01:03:11, shashi wrote: > > It could but then I have to manually manage the lifetime of distilled pages, > > instead of using ScopedVector, I can switch to STLDeleteSecondElements? > > > > On 2014/02/12 20:39:09, cjhopman wrote: > > > Couldn't this just point right to the DPD in distilled_pages? i.e. > > std::map<int, > > > DistilledPageData*>? > > > I mean it would point into distilled_pages. I.e. the ScopedVector still owns > them, and this maps relative page number directly to a DPD*. Both of these > approaches basically require that we don't remove something from > distilled_pages_ without also removing them from the index map. Maybe it is > actually simpler to just use a map to pointers and just manually handle the > memory; at least then it is obvious that you need to be careful. > > Another option is to use a std::map<int, linked_ptr<DPD> >. Or an std::map<int, > scoped_refptr<RefCountedData<DPD> > >. That last one might be good because we > will probably want to use RefCountedData for incremental updates. When I will add incremental updates in my later patch I can switch it to linked_ptr/RefCounted Data as appropriate. Currently the distiller just copies these pages to outgoing article_proto and then deletes these pages once distillation finishes. I think advantage of having them in a ScopedVector, is there is not need to delete these pages when distillation finishes. I do not have strong opinions about storing pointers in the map vs indices. One advantage that I can think of the index based approach is: if there is an invalid index in the map, and the code tries to reference it, the code will crash in an *obvious* way at the point of reference, since the ScopedVector will fail with index out of bounds, DCHECK. In case of pointers, the code will need to insert some DCHECKs to ensure safety. https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; in_progress_pages_ = all pages that are not finished, this might include pages waiting in the queue + pages currently not finished. A lot of places inside the distiller we query about any unfinished pages, so it feels cleaner to just have in_progress pages. Although I do not feel strongly about this and can change it and replace the query with a function: bool AnyUnfinishedPages(). On 2014/02/13 02:48:48, cjhopman wrote: > This isn't really more clear, particularly because pending and waiting mean > almost the same thing. > > My intention was that a particular page_no should only be in one of three > states: pending, started, or finished. These three containers should correspond > to those three states. Then the flow of a page is more obvious. > > On 2014/02/13 01:03:11, shashi wrote: > > Done, hopefully more clear now. > > On 2014/02/12 20:39:09, cjhopman wrote: > > > It's unclear how a page distillation works through these steps. Could we > > rename > > > these to something like: > > > distilled_pages_ > > > in_progress_pages_ > > > waiting_pages_ > > > > > > other options > > > distilled/finished > > > in_progress/working/distilling/started > > > waiting/pending/queued > > > > > > Maybe waiting/started/finished is best? > > > https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:147: scoped_ptr<DistilledArticleProto> article_proto_; Duh! Done, that file is task_tracker :). On 2014/02/13 02:48:48, cjhopman wrote: > Ownership of it is passed to the DistillerCallback. > > On 2014/02/13 01:03:11, shashi wrote: > > Because I was concerned about the change in lifetime of article_proto_ > > Note the api documentation for ViewRequestDelegate: > > > https://code.google.com/p/chromium/codesearch#chromium/src/components/dom_dis... > > > > On 2014/02/12 20:39:09, cjhopman wrote: > > > This is only actually created and used in RunDistillerCallbackIfDone, so why > > not > > > just make it local in there? > > >
https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.h (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; On 2014/02/13 20:09:51, shashi wrote: > in_progress_pages_ = all pages that are not finished, this might include pages > waiting in the queue + pages currently not finished. A lot of places inside the > distiller we query about any unfinished pages, so it feels cleaner to just have > in_progress pages. Although I do not feel strongly about this and can change it > and replace the query with a function: bool AnyUnfinishedPages(). I would much prefer that. It was difficult reading this code to figure out what these all meant and what container each thing was expected to be in at each point. I really think a page should only be in one of these containers at a time. > > On 2014/02/13 02:48:48, cjhopman wrote: > > This isn't really more clear, particularly because pending and waiting mean > > almost the same thing. > > > > My intention was that a particular page_no should only be in one of three > > states: pending, started, or finished. These three containers should > correspond > > to those three states. Then the flow of a page is more obvious. > > > > On 2014/02/13 01:03:11, shashi wrote: > > > Done, hopefully more clear now. > > > On 2014/02/12 20:39:09, cjhopman wrote: > > > > It's unclear how a page distillation works through these steps. Could we > > > rename > > > > these to something like: > > > > distilled_pages_ > > > > in_progress_pages_ > > > > waiting_pages_ > > > > > > > > other options > > > > distilled/finished > > > > in_progress/working/distilling/started > > > > waiting/pending/queued > > > > > > > > Maybe waiting/started/finished is best? > > > > > >
PTAL https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... File components/dom_distiller/core/distiller.h (right): https://chromiumcodereview.appspot.com/130543003/diff/70001/components/dom_di... components/dom_distiller/core/distiller.h:142: base::hash_set<int> in_progress_pages_; On 2014/02/13 20:43:19, cjhopman wrote: > On 2014/02/13 20:09:51, shashi wrote: > > in_progress_pages_ = all pages that are not finished, this might include pages > > waiting in the queue + pages currently not finished. A lot of places inside > the > > distiller we query about any unfinished pages, so it feels cleaner to just > have > > in_progress pages. Although I do not feel strongly about this and can change > it > > and replace the query with a function: bool AnyUnfinishedPages(). > > I would much prefer that. It was difficult reading this code to figure out what > these all meant and what container each thing was expected to be in at each > point. I really think a page should only be in one of these containers at a > time. > > > > > On 2014/02/13 02:48:48, cjhopman wrote: > > > This isn't really more clear, particularly because pending and waiting mean > > > almost the same thing. > > > > > > My intention was that a particular page_no should only be in one of three > > > states: pending, started, or finished. These three containers should > > correspond > > > to those three states. Then the flow of a page is more obvious. > > > > > > On 2014/02/13 01:03:11, shashi wrote: > > > > Done, hopefully more clear now. > > > > On 2014/02/12 20:39:09, cjhopman wrote: > > > > > It's unclear how a page distillation works through these steps. Could we > > > > rename > > > > > these to something like: > > > > > distilled_pages_ > > > > > in_progress_pages_ > > > > > waiting_pages_ > > > > > > > > > > other options > > > > > distilled/finished > > > > > in_progress/working/distilling/started > > > > > waiting/pending/queued > > > > > > > > > > Maybe waiting/started/finished is best? > > > > > > > > > > Done.
https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... File components/dom_distiller/core/distiller.cc (right): https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.cc:122: DCHECK(IsPageNumberInUse(page_no)); This could be more specific and check that page_no is in started_pages_ https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.cc:139: seen_urls_.insert(page_url.spec()); I think this should be done in ::AddToDistillationQueue https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... File components/dom_distiller/core/distiller.h (right): https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:74: virtual size_t GetMaxNumPagesInArticle() const; Overriding in tests for something like this is bad. Just provide a (possibly static) SetMaxNumPagesInArticle. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:86: int page_no; Consider doing `s/page_no/page_number` throughout. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:95: void OnFetchImageDone(DistilledPageData* distilled_page_data, So a bunch of functions are called to distill a page: AddToDistillationQueue OnPageDistillationFinished FetchImage OnFetchImageDone CheckAndAddPageIfDone It's awkward to me that some these take a page_no and some take a distilled_page_data. It looks like while a page is "started" we pass the DPD* around to these functions. One issue with that is that ownership of that DPD* is unclear. I would propose that once a page is "started" we add the DPD* to a ScopedVector and then just pass around the page_no and maintains maps to index like in finished_pages_index_. In fact, finished pages and started pages could be stored in the same ScopedVector since we always would access them indirectly through the index maps. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:121: bool NoPendingPages() const; This name doesn't really imply a question... how about IsFinished() or AreAllPagesFinished() or just ArePagesFinished() or invert the return and call it AreAnyPagesPending(). https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:133: void CheckAndAddPageIfDone(DistilledPageData* distilled_page_data); Does this mean `(check and add) if done` or `check and (add if done)`. And what does "Check" mean? Is that any different than the "IfDone" part? I'd just drop the "CheckAnd". This name would then mirror "RunDistillerCallbackIfDone". https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:154: // for an action (distillation or image fetch) to finish. These sentences seem redundant. Remove one of them.
PTAL https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... File components/dom_distiller/core/distiller.cc (right): https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.cc:122: DCHECK(IsPageNumberInUse(page_no)); On 2014/02/14 20:53:52, cjhopman wrote: > This could be more specific and check that page_no is in started_pages_ Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.cc:139: seen_urls_.insert(page_url.spec()); On 2014/02/14 20:53:52, cjhopman wrote: > I think this should be done in ::AddToDistillationQueue Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... File components/dom_distiller/core/distiller.h (right): https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:74: virtual size_t GetMaxNumPagesInArticle() const; On 2014/02/14 20:53:52, cjhopman wrote: > Overriding in tests for something like this is bad. Just provide a (possibly > static) SetMaxNumPagesInArticle. Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:86: int page_no; On 2014/02/14 20:53:52, cjhopman wrote: > Consider doing `s/page_no/page_number` throughout. Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:95: void OnFetchImageDone(DistilledPageData* distilled_page_data, Change to use page_num through out. On 2014/02/14 20:53:52, cjhopman wrote: > So a bunch of functions are called to distill a page: > AddToDistillationQueue > OnPageDistillationFinished > FetchImage > OnFetchImageDone > CheckAndAddPageIfDone > > It's awkward to me that some these take a page_no and some take a > distilled_page_data. It looks like while a page is "started" we pass the DPD* > around to these functions. One issue with that is that ownership of that DPD* is > unclear. > > I would propose that once a page is "started" we add the DPD* to a ScopedVector > and then just pass around the page_no and maintains maps to index like in > finished_pages_index_. In fact, finished pages and started pages could be stored > in the same ScopedVector since we always would access them indirectly through > the index maps. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:121: bool NoPendingPages() const; On 2014/02/14 20:53:52, cjhopman wrote: > This name doesn't really imply a question... how about IsFinished() or > AreAllPagesFinished() or just ArePagesFinished() or invert the return and call > it AreAnyPagesPending(). Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:133: void CheckAndAddPageIfDone(DistilledPageData* distilled_page_data); On 2014/02/14 20:53:52, cjhopman wrote: > Does this mean `(check and add) if done` or `check and (add if done)`. And what > does "Check" mean? Is that any different than the "IfDone" part? I'd just drop > the "CheckAnd". This name would then mirror "RunDistillerCallbackIfDone". Done. https://codereview.chromium.org/130543003/diff/700001/components/dom_distille... components/dom_distiller/core/distiller.h:154: // for an action (distillation or image fetch) to finish. On 2014/02/14 20:53:52, cjhopman wrote: > These sentences seem redundant. Remove one of them. Done.
Thanks, this is much easier for me to follow. lgtm with some nits https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller.cc (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.cc:160: AddToDistillationQueue(page_num + 1, next_page_url); Nit: why not do this in the `if (next_page_url.is_valid())` block above? https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller.h (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.h:75: // as relative page numbers. E.g. if distillation starts at page 2 for a 3 nit: relative to what? Might not be needed since it can be determined from the example, but maybe just say "as page numbers relative to the page where distillation started". https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.h:146: // The list of pages that are still waiting for distillation to start. Nit: The order of declaration here should be finished/started/waiting (or the reverse of that). https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller_unittest.cc (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller_unittest.cc:31: const size_t kMaxPagesInArticle = 10; nit: this can just be moved into the CheckMaxPageLimit test https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller_unittest.cc:294: TEST_F(DistillerTest, CheckMaxPageLimit) { Nit: maybe test that an article that is exactly the page limit actually works.
https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller.cc (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.cc:160: AddToDistillationQueue(page_num + 1, next_page_url); On 2014/02/15 02:44:15, cjhopman wrote: > Nit: why not do this in the `if (next_page_url.is_valid())` block above? Done. https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller.h (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.h:75: // as relative page numbers. E.g. if distillation starts at page 2 for a 3 On 2014/02/15 02:44:15, cjhopman wrote: > nit: relative to what? Might not be needed since it can be determined from the > example, but maybe just say "as page numbers relative to the page where > distillation started". Done. https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.h:146: // The list of pages that are still waiting for distillation to start. On 2014/02/15 02:44:15, cjhopman wrote: > Nit: The order of declaration here should be finished/started/waiting (or the > reverse of that). Done. https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller.h:146: // The list of pages that are still waiting for distillation to start. On 2014/02/15 02:44:15, cjhopman wrote: > Nit: The order of declaration here should be finished/started/waiting (or the > reverse of that). Done. https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... File components/dom_distiller/core/distiller_unittest.cc (right): https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller_unittest.cc:31: const size_t kMaxPagesInArticle = 10; On 2014/02/15 02:44:15, cjhopman wrote: > nit: this can just be moved into the CheckMaxPageLimit test Done. https://codereview.chromium.org/130543003/diff/910001/components/dom_distille... components/dom_distiller/core/distiller_unittest.cc:294: TEST_F(DistillerTest, CheckMaxPageLimit) { On 2014/02/15 02:44:15, cjhopman wrote: > Nit: maybe test that an article that is exactly the page limit actually works. Done.
The CQ bit was checked by shashishekhar@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shashishekhar@chromium.org/130543003/1...
Message was sent while issue was closed.
Change committed as 251546 |