Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(637)

Unified Diff: components/dom_distiller/core/distiller.h

Issue 130543003: Store page no for distilled pages undergoing distillation. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src
Patch Set: Created 6 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
Index: components/dom_distiller/core/distiller.h
diff --git a/components/dom_distiller/core/distiller.h b/components/dom_distiller/core/distiller.h
index 54907500557dc2f6886a27976e7bec086e59bbe9..de73630646e1f1f55f2309ca0948deb4ae8a3c0a 100644
--- a/components/dom_distiller/core/distiller.h
+++ b/components/dom_distiller/core/distiller.h
@@ -69,36 +69,83 @@ class DistillerImpl : public Distiller {
const DistillerCallback& callback) OVERRIDE;
private:
- void OnFetchImageDone(DistilledPageProto* distilled_page_proto,
+ // In case of multiple pages, the Distiller maintains state of multiple pages
+ // as relative page numbers. E.g. if distillation starts at page 2 for a 3
+ // page article. The relative page numbers assigned to pages will be [-1,0,1].
+
+ // Class representing the state of a page under distillation.
+ struct DistilledPageData {
+ DistilledPageData();
+ virtual ~DistilledPageData();
+ // Relative page number of the page.
+ int page_no;
+ std::string title;
+ ScopedVector<DistillerURLFetcher> image_fetchers_;
+ scoped_ptr<DistilledPageProto> proto;
+
+ private:
+ DISALLOW_COPY_AND_ASSIGN(DistilledPageData);
+ };
+
+ void OnFetchImageDone(DistilledPageData* distilled_page_data,
DistillerURLFetcher* url_fetcher,
const std::string& id,
const std::string& response);
- void OnPageDistillationFinished(const GURL& page_url,
+ void OnPageDistillationFinished(int page_no,
scoped_ptr<DistilledPageInfo> distilled_page,
bool distillation_successful);
- virtual void FetchImage(DistilledPageProto* distilled_page_proto,
+ virtual void FetchImage(DistilledPageData* distilled_page_data,
const std::string& image_id,
const std::string& item);
- // Distills the page and adds the new page to |article_proto|.
- void DistillPage(const GURL& url);
+ // Distills the next page.
+ void DistillNextPage();
+
+ // Adds the |url| to |pages_to_be_distilled| if |page_no| is a valid relative
+ // page number and |url| is valid. Ignores duplicate pages and urls.
+ void AddToDistillationQueue(int page_no, const GURL& url);
+
+ // Check if |page_no| is a valid relative page number, i.e. page with
+ // |page_no| is either under distillation or has already completed
+ // distillation.
+ bool IsValidPageNo(int page_no) const;
// Runs |distillation_cb_| if all distillation callbacks and image fetches are
// complete.
void RunDistillerCallbackIfDone();
+ // Checks if page |distilled_page_data| has finished distillation, including
+ // all image fetches.
+ void CheckIfPageDone(const DistilledPageData* distilled_page_data);
+
const DistillerURLFetcherFactory& distiller_url_fetcher_factory_;
scoped_ptr<PageDistiller> page_distiller_;
DistillerCallback distillation_cb_;
- ScopedVector<DistillerURLFetcher> image_fetchers_;
- scoped_ptr<DistilledArticleProto> article_proto_;
- bool distillation_in_progress_;
+ // Set of pages which have finished distillation. Note: some pages may be
+ // waiting for image fetches to be complete.
+ // |distilled_pages_index_| maintains the mapping from page number to the
+ // index in |distilled_pages_|.
+ ScopedVector<DistilledPageData> distilled_pages_;
+
+ // Maps page number to the index in |distilled_pages_|.
+ std::map<int, size_t> distilled_pages_index_;
cjhopman 2014/02/12 20:39:09 Couldn't this just point right to the DPD in disti
shashi 2014/02/13 01:03:11 It could but then I have to manually manage the li
cjhopman 2014/02/13 02:48:48 I mean it would point into distilled_pages. I.e. t
shashi 2014/02/13 20:09:51 When I will add incremental updates in my later pa
+
+ // The list of pages that are still waiting for distillation to finish.
+ std::map<int, GURL> pages_to_be_distilled_;
+
+ // The page number of pages that are either waiting for distillation or image
cjhopman 2014/02/12 20:39:09 This includes pages waiting in pages_to_be_distill
shashi 2014/02/13 01:03:11 Yes, it does, any unfinished pages.
+ // fetches. If a page is |in_progress_pages_| that means it is still waiting
+ // for an action (distillation or image fetch) to finish.
+ base::hash_set<int> in_progress_pages_;
cjhopman 2014/02/12 20:39:09 It's unclear how a page distillation works through
shashi 2014/02/13 01:03:11 Done, hopefully more clear now. On 2014/02/12 20:3
cjhopman 2014/02/13 02:48:48 This isn't really more clear, particularly because
shashi 2014/02/13 20:09:51 in_progress_pages_ = all pages that are not finish
cjhopman 2014/02/13 20:43:19 I would much prefer that. It was difficult reading
shashi 2014/02/13 21:57:25 Done.
+
// Set to keep track of which urls are already seen by the distiller.
cjhopman 2014/02/12 20:39:09 This comment should say what this is used for. Als
shashi 2014/02/13 01:03:11 Done.
base::hash_set<std::string> processed_urls_;
+ scoped_ptr<DistilledArticleProto> article_proto_;
cjhopman 2014/02/12 20:39:09 This is only actually created and used in RunDisti
shashi 2014/02/13 01:03:11 Because I was concerned about the change in lifeti
cjhopman 2014/02/13 02:48:48 Ownership of it is passed to the DistillerCallback
shashi 2014/02/13 20:09:51 Duh! Done, that file is task_tracker :). On 2014/
+
DISALLOW_COPY_AND_ASSIGN(DistillerImpl);
};
« no previous file with comments | « no previous file | components/dom_distiller/core/distiller.cc » ('j') | components/dom_distiller/core/distiller.cc » ('J')

Powered by Google App Engine
This is Rietveld 408576698