|
|
DescriptionExtend pdfium_test capability so that more Javascript can be executed
In [1], the lack of support of pdfium_test to some application
level hooks was felt.
More specifically, the lack of implementation of the hook FFI_GetPage,
called when 'this.getAnnot()' is executed in an Acrobar JS context,
makes it non-trivial to JS texts that manipulate PDF annotations.
[1] https://codereview.chromium.org/2265313002/
Here is the failing call stack in pdfium_test:
0 ::RenderPdf (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp)
<----v8---->
4 Document::getAnnot (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h)
(frame 6 returns nullptr, and getAnnot call in frame 4 bails)
CL extends pdfium_test app with a FFI_GetPage hook implementation.
Basically what FFI_GetPage does is returning a FPDF_PAGE instance.
In case of pdfium_test, FPDF_PAGE instances were only created on demand
when the page was going to get rendered, and then discarded.
Since FFI_GetPage can be called by JS before pages are rendered,
CL moved the page creation code into a helper function, and cached
the FPDF_PAGE instances created in a map, so it does not recreate
them needlessly.
BUG=pdfium:492
Committed: https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a
Patch Set 1 #
Total comments: 5
Messages
Total messages: 16 (4 generated)
Description was changed from ========== Extend pdfium_test capability so that more Javascript can be executed In [1], the lack of support of pdfium_test to some application level hooks was felt. More specifically, the lack of implementation of the hook FFI_GetPage, called when 'this.getAnnot()' is executed in an Acrobar JS context, makes it non-trivial to JS texts that manipulate PDF annotations. Here is the failing call stack in pdfium_test: 0 ::RenderPdf (samples/pdfium_test.cc) 1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp) 2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp) 3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp) <----v8----> 4 Document::getAnnot (fpdfsdk/javascript/Document.cpp) 5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp) 6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h) (frame 6 returns nullptr, and getAnnot call in frame 4 bails) CL extends pdfium_test app with a FFI_GetPage hook implementation. Basically what FFI_GetPage does is returning a FPDF_PAGE instance. In case of pdfium_test, FPDF_PAGE instances were only created on demand when the page was going to get rendered, and then discarded. Since FFI_GetPage can be called by JS before pages are rendered, CL moved the page creation code into a helper function, and cached the FPDF_PAGE instances created in a map, so it does not recreate them needlessly. BUG=pdfium:492 ========== to ========== Extend pdfium_test capability so that more Javascript can be executed In [1], the lack of support of pdfium_test to some application level hooks was felt. More specifically, the lack of implementation of the hook FFI_GetPage, called when 'this.getAnnot()' is executed in an Acrobar JS context, makes it non-trivial to JS texts that manipulate PDF annotations. [1] https://codereview.chromium.org/2265313002/ Here is the failing call stack in pdfium_test: 0 ::RenderPdf (samples/pdfium_test.cc) 1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp) 2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp) 3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp) <----v8----> 4 Document::getAnnot (fpdfsdk/javascript/Document.cpp) 5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp) 6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h) (frame 6 returns nullptr, and getAnnot call in frame 4 bails) CL extends pdfium_test app with a FFI_GetPage hook implementation. Basically what FFI_GetPage does is returning a FPDF_PAGE instance. In case of pdfium_test, FPDF_PAGE instances were only created on demand when the page was going to get rendered, and then discarded. Since FFI_GetPage can be called by JS before pages are rendered, CL moved the page creation code into a helper function, and cached the FPDF_PAGE instances created in a map, so it does not recreate them needlessly. BUG=pdfium:492 ==========
tonikitoo@igalia.com changed reviewers: + dsinclair@chromium.org, jaepark@google.com, thestig@chromium.org, tsepez@chromium.org
I did not get happy with us not being able to add a test in https://codereview.chromium.org/2265313002/ . CL extends pdfium_test to allow annotations (manipulated by JS) to get tested.
https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc File samples/pdfium_test.cc (right): https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; It may actually be closer to what Chromium is doing by not keeping the map and creating the pages on the fly. Chromium will discard pages when they aren't in view in order to save memory. This can at times trigger bugs in PDFium as we execute JS that access pages that are no longer loaded. (Correct me if I'm wrong Lei)
https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc File samples/pdfium_test.cc (right): https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; On 2016/08/25 19:36:49, dsinclair wrote: > It may actually be closer to what Chromium is doing by not keeping the map and > creating the pages on the fly. Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h 44 class PDFiumEngine (..) 621 std::vector<PDFiumPage*> pages_; Then Chromium's FFI_GetPage hook looks like this: 3474 FPDF_PAGE PDFiumEngine::Form_GetPage(FPDF_FORMFILLINFO* param, 3475 FPDF_DOCUMENT document, 3476 int page_index) { 3477 PDFiumEngine* engine = static_cast<PDFiumEngine*>(param); 3478 if (page_index < 0 || page_index >= static_cast<int>(engine->pages_.size())) 3479 return nullptr; 3480 return engine->pages_[page_index]->GetPage(); 3481 } PDfiumPage::GetPage loads the page similarly to what GetPageForIndex is doing. > Chromium will discard pages when they aren't in view in order to save memory. > This can at times trigger bugs in PDFium as we execute JS that access pages that > are no longer loaded. Please keep in mind that when the page is rendered (into PNG or whatever other output format), it gets discarded afterwards. That part of the logic did not change. In summary, we'd load a page 'i' into memory if: 1) JS, for instance, needs it. 2) when RenderPage() is called After (2) runs, page is discarded. > (Correct me if I'm wrong Lei) Thanks for looking, Dan. It'd be indeed great to have Lei's eyes too.
On 2016/08/25 19:48:59, tonikitoo1 wrote: > Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h > > 44 class PDFiumEngine > (..) > 621 std::vector<PDFiumPage*> pages_; > > Then Chromium's FFI_GetPage hook looks like this: > > 3474 FPDF_PAGE PDFiumEngine::Form_GetPage(FPDF_FORMFILLINFO* param, > 3475 FPDF_DOCUMENT document, > 3476 int page_index) { > 3477 PDFiumEngine* engine = static_cast<PDFiumEngine*>(param); > 3478 if (page_index < 0 || page_index >= > static_cast<int>(engine->pages_.size())) > 3479 return nullptr; > 3480 return engine->pages_[page_index]->GetPage(); > 3481 } > > PDfiumPage::GetPage loads the page similarly to what GetPageForIndex is doing. > Right, but if you look at pdfium_engine.cc:2645 when it does a CalculateVisiblePages() it will unload pages from that cache if they are not visible anymore.
On 2016/08/25 20:05:18, dsinclair wrote: > On 2016/08/25 19:48:59, tonikitoo1 wrote: > > Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h > > > > 44 class PDFiumEngine > > (..) > > 621 std::vector<PDFiumPage*> pages_; > > > > Then Chromium's FFI_GetPage hook looks like this: > > > > 3474 FPDF_PAGE PDFiumEngine::Form_GetPage(FPDF_FORMFILLINFO* param, > > 3475 FPDF_DOCUMENT document, > > 3476 int page_index) { > > 3477 PDFiumEngine* engine = static_cast<PDFiumEngine*>(param); > > 3478 if (page_index < 0 || page_index >= > > static_cast<int>(engine->pages_.size())) > > 3479 return nullptr; > > 3480 return engine->pages_[page_index]->GetPage(); > > 3481 } > > > > PDfiumPage::GetPage loads the page similarly to what GetPageForIndex is doing. > > > > Right, but if you look at pdfium_engine.cc:2645 when it does a > CalculateVisiblePages() it will unload pages from that cache if they are not > visible anymore. Good point. I tried to not change this part of the pdfium_test logic. Basically all the existing PDFs already part of PDFium test suite, should not have any changes in the way they are tested: 1) RenderPdf calls 2) Call JS is any 3) RenderPage - which is called in a loop to 3.1) load page x ('x' gets added to the cache, if not there due to JS execution yet) 3.2) render page x 3.3) unloaded page x ('x' is removed from the cache) The JS execution happens before RenderPage starts to get called. If we load page 'x' as part of the JS execution, it will be unloaded when it is rendered just like before. I believe it does diverge from Chromium's behavior. If it is agreed that this** is not something Project would benefit much, it should be fine too. ** by "this" I meant to be able to execute some Acrobat JS in pdfium_test that were not capable before.
On 2016/08/25 20:05:18, dsinclair wrote: > On 2016/08/25 19:48:59, tonikitoo1 wrote: > > Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h > > > > 44 class PDFiumEngine > > (..) > > 621 std::vector<PDFiumPage*> pages_; > > > > Then Chromium's FFI_GetPage hook looks like this: > > > > 3474 FPDF_PAGE PDFiumEngine::Form_GetPage(FPDF_FORMFILLINFO* param, > > 3475 FPDF_DOCUMENT document, > > 3476 int page_index) { > > 3477 PDFiumEngine* engine = static_cast<PDFiumEngine*>(param); > > 3478 if (page_index < 0 || page_index >= > > static_cast<int>(engine->pages_.size())) > > 3479 return nullptr; > > 3480 return engine->pages_[page_index]->GetPage(); > > 3481 } > > > > PDfiumPage::GetPage loads the page similarly to what GetPageForIndex is doing. > > > > > Right, but if you look at pdfium_engine.cc:2645 when it does a > CalculateVisiblePages() it will unload pages from that cache if they are not > visible anymore. (I wrote this comment while on mobile, so I felt like re-posting it in a more understandable way). Good point. It is fair to say that this behavior diverges from Chromium's. Idea here was to open room for a new class of unit/regression testing in pdfium_test. If it is agreed that it is not something Project would benefit much from, it is fine with me too. In summary, it tries to not change the pdfium_test logic for PDFs already part of PDFium test suite today, and so, they should not have any changes in the way they are tested. This is how things are now (without the patch): 1) RenderPdf function is called 2) RenderPdf triggers JS execution, if any, by calling FORM_DoDocumentOpenAction. 2.1) If JS requires a specific page 'x' loaded to run, it simple fails and bail out. 3) RenderPdf calls the RenderPage function in a loop to 3.1) load page x 3.2) render page x 3.3) unload page x This is how it'd be with the patch: 1) RenderPdf function is called 2) RenderPdf triggers JS execution, if any, by calling FORM_DoDocumentOpenAction. 2.1) If, as part of the JS execution, a page 'x' needs to be loaded, it gets loaded and added to the cache so that it is not loaded twice in step 3.1 below. 2.2) JS runs 3) RenderPdf calls the RenderPage function in a loop to 3.1) ask the cache if page 'x' is loaded (by being added during 2.1). If not, load page 'x' and add it *momentarily* to the cache. 3.2) render page 'x' 3.3) unload page 'x' and removed 'x' from the cache. I believe the word "cache", might scare out at first, and give the impression that pages are going to live on it longer than before in the common case. This is not what happens. In the common case, (2.1) does not run (because most PDFs do not have JS) and the step (3) really works equals to how it used to without the patch: each page is loaded, rendered and discarded. What would change with the patch is: if a page is loaded as part of the JS execution, it is added to the cache so that it is not loaded again when it is going to be rendered in (3).
The CQ bit was checked by dsinclair@chromium.org
lgtm Thanks for the explanation, it sounds like what you've got here makes the most sense.
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Extend pdfium_test capability so that more Javascript can be executed In [1], the lack of support of pdfium_test to some application level hooks was felt. More specifically, the lack of implementation of the hook FFI_GetPage, called when 'this.getAnnot()' is executed in an Acrobar JS context, makes it non-trivial to JS texts that manipulate PDF annotations. [1] https://codereview.chromium.org/2265313002/ Here is the failing call stack in pdfium_test: 0 ::RenderPdf (samples/pdfium_test.cc) 1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp) 2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp) 3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp) <----v8----> 4 Document::getAnnot (fpdfsdk/javascript/Document.cpp) 5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp) 6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h) (frame 6 returns nullptr, and getAnnot call in frame 4 bails) CL extends pdfium_test app with a FFI_GetPage hook implementation. Basically what FFI_GetPage does is returning a FPDF_PAGE instance. In case of pdfium_test, FPDF_PAGE instances were only created on demand when the page was going to get rendered, and then discarded. Since FFI_GetPage can be called by JS before pages are rendered, CL moved the page creation code into a helper function, and cached the FPDF_PAGE instances created in a map, so it does not recreate them needlessly. BUG=pdfium:492 ========== to ========== Extend pdfium_test capability so that more Javascript can be executed In [1], the lack of support of pdfium_test to some application level hooks was felt. More specifically, the lack of implementation of the hook FFI_GetPage, called when 'this.getAnnot()' is executed in an Acrobar JS context, makes it non-trivial to JS texts that manipulate PDF annotations. [1] https://codereview.chromium.org/2265313002/ Here is the failing call stack in pdfium_test: 0 ::RenderPdf (samples/pdfium_test.cc) 1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp) 2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp) 3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp) <----v8----> 4 Document::getAnnot (fpdfsdk/javascript/Document.cpp) 5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp) 6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h) (frame 6 returns nullptr, and getAnnot call in frame 4 bails) CL extends pdfium_test app with a FFI_GetPage hook implementation. Basically what FFI_GetPage does is returning a FPDF_PAGE instance. In case of pdfium_test, FPDF_PAGE instances were only created on demand when the page was going to get rendered, and then discarded. Since FFI_GetPage can be called by JS before pages are rendered, CL moved the page creation code into a helper function, and cached the FPDF_PAGE instances created in a map, so it does not recreate them needlessly. BUG=pdfium:492 Committed: https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aa... ==========
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aa...
Message was sent while issue was closed.
It would be nice if the sample program can set a good example for those that are learning PDFium code. https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc File samples/pdfium_test.cc (right): https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; On 2016/08/25 19:36:49, dsinclair wrote: > It may actually be closer to what Chromium is doing by not keeping the map and > creating the pages on the fly. > > Chromium will discard pages when they aren't in view in order to save memory. > This can at times trigger bugs in PDFium as we execute JS that access pages that > are no longer loaded. > > (Correct me if I'm wrong Lei) Form_GetPage() does reload the page as shown. The Chrome PDF Viewer does have the choice to unload a page if it like to. The actual implementation is a bit more complex because we need to avoid bonehead maneuvers like unloading while unloading. https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; I believe this actually adds a static initializer, which is not ideal. https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... samples/pdfium_test.cc:63: FPDF_FORMHANDLE g_formHandle; It would have been nice to use GetPageForIndex()'s |param| argument to pass this around, instead of having a global. Maybe make the param a struct to pass in |g_loadedPages| as well.
Message was sent while issue was closed.
On 2016/08/27 01:03:23, Lei Zhang wrote: > It would be nice if the sample program can set a good example for those that are > learning PDFium code. > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc > File samples/pdfium_test.cc (right): > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; > On 2016/08/25 19:36:49, dsinclair wrote: > > It may actually be closer to what Chromium is doing by not keeping the map and > > creating the pages on the fly. > > > > Chromium will discard pages when they aren't in view in order to save memory. > > This can at times trigger bugs in PDFium as we execute JS that access pages > that > > are no longer loaded. > > > > (Correct me if I'm wrong Lei) > > Form_GetPage() does reload the page as shown. The Chrome PDF Viewer does have > the choice to unload a page if it like to. The actual implementation is a bit > more complex because we need to avoid bonehead maneuvers like unloading while > unloading. > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; > I believe this actually adds a static initializer, which is not ideal. > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > samples/pdfium_test.cc:63: FPDF_FORMHANDLE g_formHandle; > It would have been nice to use GetPageForIndex()'s |param| argument to pass this > around, instead of having a global. Maybe make the param a struct to pass in > |g_loadedPages| as well. Thanks Lei. I will get your comments addressed in a follow up CL.
Message was sent while issue was closed.
On 2016/08/27 04:57:33, tonikitoo wrote: > On 2016/08/27 01:03:23, Lei Zhang wrote: > > It would be nice if the sample program can set a good example for those that > are > > learning PDFium code. > > > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc > > File samples/pdfium_test.cc (right): > > > > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; > > On 2016/08/25 19:36:49, dsinclair wrote: > > > It may actually be closer to what Chromium is doing by not keeping the map > and > > > creating the pages on the fly. > > > > > > Chromium will discard pages when they aren't in view in order to save > memory. > > > This can at times trigger bugs in PDFium as we execute JS that access pages > > that > > > are no longer loaded. > > > > > > (Correct me if I'm wrong Lei) > > > > Form_GetPage() does reload the page as shown. The Chrome PDF Viewer does have > > the choice to unload a page if it like to. The actual implementation is a bit > > more complex because we need to avoid bonehead maneuvers like unloading while > > unloading. > > > > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; > > I believe this actually adds a static initializer, which is not ideal. > > > > > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc... > > samples/pdfium_test.cc:63: FPDF_FORMHANDLE g_formHandle; > > It would have been nice to use GetPageForIndex()'s |param| argument to pass > this > > around, instead of having a global. Maybe make the param a struct to pass in > > |g_loadedPages| as well. > > Thanks Lei. I will get your comments addressed in a follow up CL. For the sake of completeness/history: https://codereview.chromium.org/2330043002 |