Issue 2277063003: Extend pdfium_test capability so that more Javascript can be executed

Issue 2277063003: Extend pdfium_test capability so that more Javascript can be executed (Closed)

Created:
4 years, 3 months ago by tonikitoo

Modified:
4 years, 3 months ago

Reviewers:
Tom Sepez, dsinclair, jaepark, Lei Zhang

CC:
pdfium-reviews_googlegroups.com

Base URL:
https://pdfium.googlesource.com/pdfium.git@master

Target Ref:
refs/heads/master

Project:
pdfium

Visibility:
Public.

More Reviews

Description

Extend pdfium_test capability so that more Javascript can be executed In [1], the lack of support of pdfium_test to some application level hooks was felt. More specifically, the lack of implementation of the hook FFI_GetPage, called when 'this.getAnnot()' is executed in an Acrobar JS context, makes it non-trivial to JS texts that manipulate PDF annotations. [1] https://codereview.chromium.org/2265313002/ Here is the failing call stack in pdfium_test: 0 ::RenderPdf (samples/pdfium_test.cc) 1 ::FORM_DoDocumentOpenAction (fpdfsdk/fpdfformfill.cpp) 2 CPDFSDK_Document::ProcOpenAction (fpdfsdk/fsdk_mgr.cpp) 3 CPDFSDK_ActionHandler::DoAction_DocOpen (fpdfsdk/fsdk_actionhandler.cpp) <----v8----> 4 Document::getAnnot (fpdfsdk/javascript/Document.cpp) 5 CPDFSDK_Document::GetPageView (fpdfsdk/fsdk_mgr.cpp) 6 CPDFDoc_Environment::FFI_GetPage (fpdfsdk/include/fsdk_mgr.h) (frame 6 returns nullptr, and getAnnot call in frame 4 bails) CL extends pdfium_test app with a FFI_GetPage hook implementation. Basically what FFI_GetPage does is returning a FPDF_PAGE instance. In case of pdfium_test, FPDF_PAGE instances were only created on demand when the page was going to get rendered, and then discarded. Since FFI_GetPage can be called by JS before pages are rendered, CL moved the page creation code into a helper function, and cached the FPDF_PAGE instances created in a map, so it does not recreate them needlessly. BUG=pdfium:492 Committed: https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a

Patch Set 1 #

Total comments: 5

Created: 4 years, 3 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Stats (+35 lines, -6 lines)			Patch
M	samples/pdfium_test.cc	View	6 chunks	+35 lines, -6 lines	5 comments	Download
A	testing/resources/pixel/bug_492.pdf	View	Binary file		0 comments	Download
A	testing/resources/pixel/bug_492.pdf.0.png	View	Binary file		0 comments	Download

Messages

Total messages: 16 (4 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

tonikitoo

Description was changed from ========== Extend pdfium_test capability so that more Javascript can be executed ...

4 years, 3 months ago (2016-08-25 19:28:24 UTC) #1

Description was changed from

==========
Extend pdfium_test capability so that more Javascript can be executed

In [1], the lack of support of pdfium_test to some application
level hooks was felt.
More specifically, the lack of implementation of the hook FFI_GetPage,
called  when 'this.getAnnot()' is executed in an Acrobar JS context,
makes it non-trivial to JS texts that manipulate PDF annotations.

Here is the failing call stack in pdfium_test:

0 ::RenderPdf                              (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction              (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction         (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen  (fpdfsdk/fsdk_actionhandler.cpp)
  <----v8---->
4 Document::getAnnot                       (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView            (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage         (fpdfsdk/include/fsdk_mgr.h)

(frame 6 returns nullptr, and getAnnot call in frame 4 bails)

CL extends pdfium_test app with a FFI_GetPage hook implementation.

Basically what FFI_GetPage does is returning a FPDF_PAGE instance.
In case of pdfium_test, FPDF_PAGE instances were only created on demand
when the page was going to get rendered, and then discarded.

Since FFI_GetPage can be called by JS before pages are rendered,
CL moved the page creation code into a helper function, and cached
the FPDF_PAGE instances created in a map, so it does not recreate
them needlessly.

BUG=pdfium:492
==========

to

==========
Extend pdfium_test capability so that more Javascript can be executed

In [1], the lack of support of pdfium_test to some application
level hooks was felt.
More specifically, the lack of implementation of the hook FFI_GetPage,
called  when 'this.getAnnot()' is executed in an Acrobar JS context,
makes it non-trivial to JS texts that manipulate PDF annotations.

[1] https://codereview.chromium.org/2265313002/

Here is the failing call stack in pdfium_test:

0 ::RenderPdf                              (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction              (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction         (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen  (fpdfsdk/fsdk_actionhandler.cpp)
  <----v8---->
4 Document::getAnnot                       (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView            (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage         (fpdfsdk/include/fsdk_mgr.h)

(frame 6 returns nullptr, and getAnnot call in frame 4 bails)

CL extends pdfium_test app with a FFI_GetPage hook implementation.

Basically what FFI_GetPage does is returning a FPDF_PAGE instance.
In case of pdfium_test, FPDF_PAGE instances were only created on demand
when the page was going to get rendered, and then discarded.

Since FFI_GetPage can be called by JS before pages are rendered,
CL moved the page creation code into a helper function, and cached
the FPDF_PAGE instances created in a map, so it does not recreate
them needlessly.

BUG=pdfium:492
==========

tonikitoo

I did not get happy with us not being able to add a test in ...

4 years, 3 months ago (2016-08-25 19:29:16 UTC) #3

dsinclair

https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc File samples/pdfium_test.cc (right): https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newcode59 samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages; It may actually be closer to ...

4 years, 3 months ago (2016-08-25 19:36:49 UTC) #4

tonikitoo

4 years, 3 months ago (2016-08-25 19:48:59 UTC) #5

dsinclair

On 2016/08/25 19:48:59, tonikitoo1 wrote: > Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h > ...

4 years, 3 months ago (2016-08-25 20:05:18 UTC) #6

tonikitoo

On 2016/08/25 20:05:18, dsinclair wrote: > On 2016/08/25 19:48:59, tonikitoo1 wrote: > > Chromium keeps ...

4 years, 3 months ago (2016-08-25 21:44:39 UTC) #7

tonikitoo

On 2016/08/25 20:05:18, dsinclair wrote: > On 2016/08/25 19:48:59, tonikitoo1 wrote: > > Chromium keeps ...

4 years, 3 months ago (2016-08-26 13:37:36 UTC) #8

On 2016/08/25 20:05:18, dsinclair wrote:
> On 2016/08/25 19:48:59, tonikitoo1 wrote:
> > Chromium keeps track of the pages in src/pdf/pdfium/pdfium_engine.h
> > 
> > 44 class PDFiumEngine 
> > (..)
> > 621   std::vector<PDFiumPage*> pages_; 
> > 
> > Then Chromium's FFI_GetPage hook looks like this:
> > 
> > 3474 FPDF_PAGE PDFiumEngine::Form_GetPage(FPDF_FORMFILLINFO* param,
> > 3475                                      FPDF_DOCUMENT document,
> > 3476                                      int page_index) {
> > 3477   PDFiumEngine* engine = static_cast<PDFiumEngine*>(param);
> > 3478   if (page_index < 0 || page_index >=
> > static_cast<int>(engine->pages_.size()))
> > 3479     return nullptr;
> > 3480   return engine->pages_[page_index]->GetPage();
> > 3481 }   
> > 
> > PDfiumPage::GetPage loads the page similarly to what GetPageForIndex is
doing.
> >  
> 
> 
> Right, but if you look at pdfium_engine.cc:2645 when it does a
> CalculateVisiblePages() it will unload pages from that cache if they are not
> visible anymore.

(I wrote this comment while on mobile, so I felt like re-posting it in a more
understandable way).

Good point. It is fair to say that this behavior diverges from Chromium's.

Idea here was to open room for a new class of unit/regression testing in
pdfium_test.
If it is agreed that it is not something Project would benefit much from, it is
fine with me too.

In summary, it tries to not change the pdfium_test logic for PDFs already part
of PDFium test suite today,
and so, they should not have any changes in the way they are tested.

This is how things are now (without the patch):

1) RenderPdf function is called
2) RenderPdf triggers JS execution, if any, by calling
FORM_DoDocumentOpenAction.
2.1) If JS requires a specific page 'x' loaded to run, it simple fails and bail
out.
3) RenderPdf calls the RenderPage function in a loop to
3.1) load page x
3.2) render page x 
3.3) unload page x

This is how it'd be with the patch:

1) RenderPdf function is called
2) RenderPdf triggers JS execution, if any, by calling
FORM_DoDocumentOpenAction.
2.1) If, as part of the JS execution, a page 'x' needs to be loaded, it gets
loaded and added to
     the cache so that it is not loaded twice in step 3.1 below.
2.2) JS runs
3) RenderPdf calls the RenderPage function in a loop to
3.1) ask the cache if page 'x' is loaded (by being added during 2.1). If not,
load page 'x' and add it *momentarily* to the cache.
3.2) render page 'x' 
3.3) unload page 'x' and removed 'x' from the cache.

I believe the word "cache", might scare out at first, and give the impression
that pages are going to live on it longer than before in the common case. This
is not what happens. 

In the common case, (2.1) does not run (because most PDFs do not have JS) and
the step (3) really works equals to how it used to without the patch: each page
is loaded, rendered and discarded.

What would change with the patch is: if a page is loaded as part of the JS
execution, it is added to the cache so that it is not loaded again when it is
going to be rendered in (3).

dsinclair

lgtm Thanks for the explanation, it sounds like what you've got here makes the most ...

4 years, 3 months ago (2016-08-26 15:16:03 UTC) #10

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2277063003/1

4 years, 3 months ago (2016-08-26 15:16:15 UTC) #11

commit-bot: I haz the power

Description was changed from ========== Extend pdfium_test capability so that more Javascript can be executed ...

4 years, 3 months ago (2016-08-26 15:37:15 UTC) #12

Message was sent while issue was closed.

Description was changed from

==========
Extend pdfium_test capability so that more Javascript can be executed

In [1], the lack of support of pdfium_test to some application
level hooks was felt.
More specifically, the lack of implementation of the hook FFI_GetPage,
called  when 'this.getAnnot()' is executed in an Acrobar JS context,
makes it non-trivial to JS texts that manipulate PDF annotations.

[1] https://codereview.chromium.org/2265313002/

Here is the failing call stack in pdfium_test:

0 ::RenderPdf                              (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction              (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction         (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen  (fpdfsdk/fsdk_actionhandler.cpp)
  <----v8---->
4 Document::getAnnot                       (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView            (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage         (fpdfsdk/include/fsdk_mgr.h)

(frame 6 returns nullptr, and getAnnot call in frame 4 bails)

CL extends pdfium_test app with a FFI_GetPage hook implementation.

Basically what FFI_GetPage does is returning a FPDF_PAGE instance.
In case of pdfium_test, FPDF_PAGE instances were only created on demand
when the page was going to get rendered, and then discarded.

Since FFI_GetPage can be called by JS before pages are rendered,
CL moved the page creation code into a helper function, and cached
the FPDF_PAGE instances created in a map, so it does not recreate
them needlessly.

BUG=pdfium:492
==========

to

==========
Extend pdfium_test capability so that more Javascript can be executed

In [1], the lack of support of pdfium_test to some application
level hooks was felt.
More specifically, the lack of implementation of the hook FFI_GetPage,
called  when 'this.getAnnot()' is executed in an Acrobar JS context,
makes it non-trivial to JS texts that manipulate PDF annotations.

[1] https://codereview.chromium.org/2265313002/

Here is the failing call stack in pdfium_test:

0 ::RenderPdf                              (samples/pdfium_test.cc)
1 ::FORM_DoDocumentOpenAction              (fpdfsdk/fpdfformfill.cpp)
2 CPDFSDK_Document::ProcOpenAction         (fpdfsdk/fsdk_mgr.cpp)
3 CPDFSDK_ActionHandler::DoAction_DocOpen  (fpdfsdk/fsdk_actionhandler.cpp)
  <----v8---->
4 Document::getAnnot                       (fpdfsdk/javascript/Document.cpp)
5 CPDFSDK_Document::GetPageView            (fpdfsdk/fsdk_mgr.cpp)
6 CPDFDoc_Environment::FFI_GetPage         (fpdfsdk/include/fsdk_mgr.h)

(frame 6 returns nullptr, and getAnnot call in frame 4 bails)

CL extends pdfium_test app with a FFI_GetPage hook implementation.

Basically what FFI_GetPage does is returning a FPDF_PAGE instance.
In case of pdfium_test, FPDF_PAGE instances were only created on demand
when the page was going to get rendered, and then discarded.

Since FFI_GetPage can be called by JS before pages are rendered,
CL moved the page creation code into a helper function, and cached
the FPDF_PAGE instances created in a map, so it does not recreate
them needlessly.

BUG=pdfium:492

Committed:
https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aa...
==========

commit-bot: I haz the power

Committed patchset #1 (id:1) as https://pdfium.googlesource.com/pdfium/+/3e98158a6c47361ca7d6c2c18d47c9f8f3aabb8a

4 years, 3 months ago (2016-08-26 15:37:16 UTC) #13

Lei Zhang

It would be nice if the sample program can set a good example for those ...

4 years, 3 months ago (2016-08-27 01:03:23 UTC) #14

tonikitoo

On 2016/08/27 01:03:23, Lei Zhang wrote: > It would be nice if the sample program ...

4 years, 3 months ago (2016-08-27 04:57:33 UTC) #15

tonikitoo

4 years, 3 months ago (2016-09-12 15:04:40 UTC) #16

Message was sent while issue was closed.

On 2016/08/27 04:57:33, tonikitoo wrote:
> On 2016/08/27 01:03:23, Lei Zhang wrote:
> > It would be nice if the sample program can set a good example for those that
> are
> > learning PDFium code.
> > 
> > https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc
> > File samples/pdfium_test.cc (right):
> > 
> >
>
https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc...
> > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages;
> > On 2016/08/25 19:36:49, dsinclair wrote:
> > > It may actually be closer to what Chromium is doing by not keeping the map
> and
> > > creating the pages on the fly.
> > > 
> > > Chromium will discard pages when they aren't in view in order to save
> memory.
> > > This can at times trigger bugs in PDFium as we execute JS that access
pages
> > that
> > > are no longer loaded.
> > > 
> > > (Correct me if I'm wrong Lei)
> > 
> > Form_GetPage() does reload the page as shown. The Chrome PDF Viewer does
have
> > the choice to unload a page if it like to. The actual implementation is a
bit
> > more complex because we need to avoid bonehead maneuvers like unloading
while
> > unloading.
> > 
> >
>
https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc...
> > samples/pdfium_test.cc:59: std::map<int, FPDF_PAGE> g_loadedPages;
> > I believe this actually adds a static initializer, which is not ideal.
> > 
> >
>
https://codereview.chromium.org/2277063003/diff/1/samples/pdfium_test.cc#newc...
> > samples/pdfium_test.cc:63: FPDF_FORMHANDLE g_formHandle;
> > It would have been nice to use GetPageForIndex()'s |param| argument to pass
> this
> > around, instead of having a global. Maybe make the param a struct to pass in
> > |g_loadedPages| as well.
> 
> Thanks Lei. I will get your comments addressed in a follow up CL.

For the sake of completeness/history: https://codereview.chromium.org/2330043002

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages