|
|
DescriptionTraverse PDF page tree only once in CPDF_Document
Try 2: main fix was recursively popping elements from the stack. Since
the Traverse method can be called on non-root nodes from GetPage(), we
have to make sure to properly update the parents.
Try 1 at https://codereview.chromium.org/2414423002/
In our current implementation of CPDF_Document::GetPage, we traverse
the PDF page tree until we find the index we are looking for. This is
slow when we do calls GetPage(0), GetPage(1), ... since in this case
the page tree will be traversed n times if there are n pages. This CL
makes sure the page tree is only traversed once.
Time to load the PDF from the bug below in chrome official build:
Before this CL: around 1 minute 25 seconds
After this CL: around 4 seconds
BUG=chromium:638513
Committed: https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b35cf
Patch Set 1 #
Total comments: 16
Patch Set 2 : Comments #
Total comments: 2
Patch Set 3 : Nit #
Messages
Total messages: 24 (15 generated)
The CQ bit was checked by npm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: BUG=chromium:638513 ========== to ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: BUG=chromium:638513 ==========
npm@chromium.org changed reviewers: + thestig@chromium.org, tsepez@chromium.org, weili@chromium.org
Description was changed from ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: BUG=chromium:638513 ========== to ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 ==========
PTAL
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 ========== to ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. Try 1 at https://codereview.chromium.org/2414423002/ In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 ==========
Anyone wants to take this? :) Added some comments for clarity https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.cpp (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:405: void CPDF_Document::PopAndPropagate() { Pop current node. Update index for the parent. Pop parent if it is done. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:420: if (!pKidList) { This only happens when Pages dict is actually a page. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:455: m_pTreeTraversal.push(std::make_pair(pKid, -1)); Traverse kid, and break if the number of pages to go is <= number of pages in the kid, since this means that the answer is whatever the kid found. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:499: TraversePDFPages(iPage, iPage - m_iLastPageTraversed); Traverse iPage - m_iLastPageTraversed pages, where the last page processed is iPage https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.h (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:137: std::stack<std::pair<CPDF_Dictionary*, int>> m_pTreeTraversal; Stack of page nodes to know current position in page tree. Int is the index of last processed child. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:138: int m_iLastPageTraversed; Represents the index of the last page (leaf) processed from the page tree
https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.cpp (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:406: m_pTreeTraversal.pop(); Do we know that it's not empty to begin with? https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:407: if (!m_pTreeTraversal.empty()) { nit: maybe early return. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:411: static_cast<int>(top->first->GetArrayFor("Kids")->GetCount() - 1)) can count change inbetween subsequent calls so that we might skip over the end and iterate forever? https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.h (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:137: std::stack<std::pair<CPDF_Dictionary*, int>> m_pTreeTraversal; On 2016/10/26 15:04:06, npm wrote: > Stack of page nodes to know current position in page tree. Int is the index > of last processed child. nit: you might add this as a comment. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:138: int m_iLastPageTraversed; On 2016/10/26 15:04:06, npm wrote: > Represents the index of the last page (leaf) processed from the page tree nit: // Index of last page (leaf) processed from page tree.
The CQ bit was checked by npm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.cpp (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:406: m_pTreeTraversal.pop(); On 2016/10/26 15:52:23, Tom Sepez wrote: > Do we know that it's not empty to begin with? Yes, it is only be called when it is nonempty. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:407: if (!m_pTreeTraversal.empty()) { On 2016/10/26 15:52:23, Tom Sepez wrote: > nit: maybe early return. Done. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.cpp:411: static_cast<int>(top->first->GetArrayFor("Kids")->GetCount() - 1)) On 2016/10/26 15:52:23, Tom Sepez wrote: > can count change inbetween subsequent calls so that we might skip over the end > and iterate forever? When it is changed in InsertDeletePDFPage, we reset the pdf traversing. Added reset to place in InsertNewPage where it can change. CreateNewDoc sets Count to 1, but always called right after constructor. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... File core/fpdfapi/parser/cpdf_document.h (right): https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:137: std::stack<std::pair<CPDF_Dictionary*, int>> m_pTreeTraversal; On 2016/10/26 15:52:23, Tom Sepez wrote: > On 2016/10/26 15:04:06, npm wrote: > > Stack of page nodes to know current position in page tree. Int is the index > > of last processed child. > > nit: you might add this as a comment. Done. https://codereview.chromium.org/2442403002/diff/1/core/fpdfapi/parser/cpdf_do... core/fpdfapi/parser/cpdf_document.h:138: int m_iLastPageTraversed; On 2016/10/26 15:52:23, Tom Sepez wrote: > On 2016/10/26 15:04:06, npm wrote: > > Represents the index of the last page (leaf) processed from the page tree > > nit: // Index of last page (leaf) processed from page tree. Done.
LGTM. I think this works. Let's give it another shot. https://codereview.chromium.org/2442403002/diff/20001/core/fpdfapi/parser/cpd... File core/fpdfapi/parser/cpdf_document.cpp (right): https://codereview.chromium.org/2442403002/diff/20001/core/fpdfapi/parser/cpd... core/fpdfapi/parser/cpdf_document.cpp:412: static_cast<int>(top->first->GetArrayFor("Kids")->GetCount() - 1)) nit: {}
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2442403002/diff/20001/core/fpdfapi/parser/cpd... File core/fpdfapi/parser/cpdf_document.cpp (right): https://codereview.chromium.org/2442403002/diff/20001/core/fpdfapi/parser/cpd... core/fpdfapi/parser/cpdf_document.cpp:412: static_cast<int>(top->first->GetArrayFor("Kids")->GetCount() - 1)) On 2016/10/26 17:24:42, Tom Sepez wrote: > nit: {} Done.
The CQ bit was checked by npm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from tsepez@chromium.org Link to the patchset: https://codereview.chromium.org/2442403002/#ps40001 (title: "Nit")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. Try 1 at https://codereview.chromium.org/2414423002/ In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 ========== to ========== Traverse PDF page tree only once in CPDF_Document Try 2: main fix was recursively popping elements from the stack. Since the Traverse method can be called on non-root nodes from GetPage(), we have to make sure to properly update the parents. Try 1 at https://codereview.chromium.org/2414423002/ In our current implementation of CPDF_Document::GetPage, we traverse the PDF page tree until we find the index we are looking for. This is slow when we do calls GetPage(0), GetPage(1), ... since in this case the page tree will be traversed n times if there are n pages. This CL makes sure the page tree is only traversed once. Time to load the PDF from the bug below in chrome official build: Before this CL: around 1 minute 25 seconds After this CL: around 4 seconds BUG=chromium:638513 Committed: https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://pdfium.googlesource.com/pdfium/+/d3a2009d75eac3cda442f545ef0865afae7b...
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2461063003/ by npm@chromium.org. The reason for reverting is: Not quite right yet.. |