|
|
Chromium Code Reviews|
Created:
6 years, 4 months ago by jun_fang Modified:
5 years, 9 months ago Base URL:
https://pdfium.googlesource.com/pdfium.git@master Visibility:
Public. |
DescriptionFix the issue 'SEGV on unknown address in CPDF_DataAvail::GetObjectSize'
BUG=387983
R=tsepez@chromium.org
Committed: https://pdfium.googlesource.com/pdfium/+/c655167
Patch Set 1 #
Total comments: 8
Patch Set 2 : #
Messages
Total messages: 12 (0 generated)
Please start to review this fix. Thanks!
As an outside observer to this code, I'm a little confused by the logic and I expect there may be a clearer way of expressing what it is we want to do here. Your fix will solve the current problem, but if you think about the following questions, the right structure may emerge. Thanks for all you efforts! https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... File core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp (right): https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4110: if (!m_pFileAvail->IsDataAvail(m_dwLastXRefOffset, (FX_DWORD)(m_dwFileLen - m_dwLastXRefOffset))) { Is this going to change on subsequent calls? In other words, if we try this once and it fails, is there ever a possibility it might succeed? It doesn't seem that way (unless the file is being created underneath us) yet we end up repeating it. Should this move under the if (!m_bmainXrefLoad) branch below? https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4115: if (!m_bMainXRefLoad) { does m_bMainXRefLoad mean "main xref load succeeded" or "main xref load was tried once already"? I think its the later, and if so, the name should be m_bMainXrefLoadTried. https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4123: m_bMainXRefLoadedOK = TRUE; What's the difference between m_bMainXrefLoadedOK and m_bLinearizedDataOK? Is it ever the case that one is true without the other?
https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... File core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp (right): https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4110: if (!m_pFileAvail->IsDataAvail(m_dwLastXRefOffset, (FX_DWORD)(m_dwFileLen - m_dwLastXRefOffset))) { On 2014/08/09 17:55:36, Tom Sepez wrote: > Is this going to change on subsequent calls? In other words, if we try this > once and it fails, is there ever a possibility it might succeed? It doesn't seem > that way (unless the file is being created underneath us) yet we end up > repeating it. Should this move under the if (!m_bmainXrefLoad) branch below? If IsDataAvail returns FALSE, that means that data are not available completely for parsing them into XRefTable which is very important data structure in PDF. So it just needs to return FALSE rather than loading XRefTable because data for parsing are not available. https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4115: if (!m_bMainXRefLoad) { On 2014/08/09 17:55:36, Tom Sepez wrote: > does m_bMainXRefLoad mean "main xref load succeeded" or "main xref load was > tried once already"? I think its the later, and if so, the name should be > m_bMainXrefLoadTried. if |m_bMainXRefLoad| is true, it means that xref is loaded successfully not the second meaning you mentioned. https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4123: m_bMainXRefLoadedOK = TRUE; On 2014/08/09 17:55:36, Tom Sepez wrote: > What's the difference between m_bMainXrefLoadedOK and m_bLinearizedDataOK? Is > it ever the case that one is true without the other? They are different. |m_bMainXrefLoadedOK is true| means that XRef table is loaded successfully. |m_bLinearizedDataOK is true| means that data are downloaded and available for being parsed.
On 2014/08/18 06:58:00, jun_fang wrote: > https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... > File core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp (right): > > https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... > core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4110: if > (!m_pFileAvail->IsDataAvail(m_dwLastXRefOffset, (FX_DWORD)(m_dwFileLen - > m_dwLastXRefOffset))) { > On 2014/08/09 17:55:36, Tom Sepez wrote: > > Is this going to change on subsequent calls? In other words, if we try this > > once and it fails, is there ever a possibility it might succeed? It doesn't > seem > > that way (unless the file is being created underneath us) yet we end up > > repeating it. Should this move under the if (!m_bmainXrefLoad) branch below? > > If IsDataAvail returns FALSE, that means that data are not available completely > for parsing them into XRefTable which is very important data structure in PDF. > So it just needs to return FALSE rather than loading XRefTable because data for > parsing are not available. > > Right. So why retry the operation rather than remembering that it failed and return FALSE? https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... > core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4115: if (!m_bMainXRefLoad) > { > On 2014/08/09 17:55:36, Tom Sepez wrote: > > does m_bMainXRefLoad mean "main xref load succeeded" or "main xref load was > > tried once already"? I think its the later, and if so, the name should be > > m_bMainXrefLoadTried. > > if |m_bMainXRefLoad| is true, it means that xref is loaded successfully not the > second meaning you mentioned. > Right. But wouldn't the first one be more useful? Since we could avoid a retry and return FALSE? We'd have to remember the status the last time around ... > https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... > core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4123: m_bMainXRefLoadedOK = > TRUE; > On 2014/08/09 17:55:36, Tom Sepez wrote: > > What's the difference between m_bMainXrefLoadedOK and m_bLinearizedDataOK? Is > > it ever the case that one is true without the other? > > They are different. |m_bMainXrefLoadedOK is true| means that XRef table is > loaded successfully. |m_bLinearizedDataOK is true| means that data are > downloaded and available for being parsed. Ok.
> > So it just needs to return FALSE rather than loading XRefTable because data > for > > parsing are not available. > > > > I guess what I'm asking is whether you mean data for parsing are not available (ever) vs. data for parsing are not available (yet).
I guess we just need some clarifying documentation for these fields. https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... File core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp (right): https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4115: if (!m_bMainXRefLoad) { > > does m_bMainXRefLoad mean "main xref load succeeded" or "main xref load was > > tried once already"? I think its the later, and if so, the name should be > > m_bMainXrefLoadTried. > > if |m_bMainXRefLoad| is true, it means that xref is loaded successfully not the > second meaning you mentioned. In that case, I would call it |m_bMainXRefLoadSucceeded|. https://codereview.chromium.org/454283002/diff/1/core/src/fpdfapi/fpdf_parser... core/src/fpdfapi/fpdf_parser/fpdf_parser_parser.cpp:4123: m_bMainXRefLoadedOK = TRUE; > They are different. |m_bMainXrefLoadedOK is true| means that XRef table is > loaded successfully. Wait, then what does |m_bMainXRefLoad| mean that is distinct from |m_bMainXRefLoadedOK|?
> Wait, then what does |m_bMainXRefLoad| mean that is distinct from
> |m_bMainXRefLoadedOK|?
Good question. Basically, I'd like to cram this into the following pattern, if
it will fit. So initially,
m_tried = false;
m_result = false
then the code looks like
if (!m_tried) {
m_tried = true;
if (some other stuff works)
m_result = true;
}
return m_result;
Yet there seem to be 3 bools in play here, not 2, and I cant quite grasp.
On 2014/08/19 21:56:09, Tom Sepez wrote:
> > Wait, then what does |m_bMainXRefLoad| mean that is distinct from
> > |m_bMainXRefLoadedOK|?
>
> Good question. Basically, I'd like to cram this into the following pattern,
if
> it will fit. So initially,
> m_tried = false;
> m_result = false
>
> then the code looks like
> if (!m_tried) {
> m_tried = true;
> if (some other stuff works)
> m_result = true;
> }
> return m_result;
>
> Yet there seem to be 3 bools in play here, not 2, and I cant quite grasp.
Let me explain these three parameters |m_bLinearedDataOK|, |m_bMainXRefLoadedOK|
and |m_bMainXRefLoad|. My last answer about the meaning of |m_bMainXRefLoad| is
not correct.
|m_bLinearedDataOK|: TRUE means the lineared data are ready.
For the lineared PDF file, pfium will load pages or load the whole file only
when |m_bLinearedDataOK| is true.
|m_bMainXRefLoadedOK|: TRUE means loading XRef table successfully.
Pdfium will load pages rather than loading the whole file when
|m_bMainXRefLoadedOK| is true. It's because
pdfium knows all the position of objects in pdf files after XRef table is
loaded. If |m_bMainXRefLoadedOK|
is FALSE, pdfium has to read the whole file.
|m_bMainXRefLoad|: TRUE means that XRef has been loaded. It can be
successful or not.
OK, this is starting to make some sense. When we're done here, we'll need to
put the information you've provided above into the .h file as comments for those
members.
I think it would make us happy if m_bMainXrefLoad => m_bMainXrefLoadTried
everywhere.
My take on the code would be something like this:
FX_BOOL CPDF_DataAvail::CheckLinearizedData(IFX_DownloadHints* pHints)
{
if (!m_bMainXrefLoadTried) {
if (!m_pFileAvail->IsDataAvail(m_dwLastXRefOffset,
(FX_DWORD)(m_dwFileLen - m_dwLastXRefOffset))) {
pHints->AddSegment(m_dwLastXRefOffset, (FX_DWORD)(m_dwFileLen -
m_dwLastXRefOffset));
return FALSE; // Data not available now, but may be later on.
Continue trying.
}
// Data is avaliable, so it doesn't work, its never going to work later
on. No more retrying.
m_bMainXrefLoadTried = TRUE;
FX_DWORD dwRet = ((CPDF_Parser
*)m_pDocument->GetParser())->LoadLinearizedMainXRefTable();
if (dwRet != PDFPARSE_ERROR_SUCCESS) {
return FALSE;
}
if (!PreparePageItem()) {
return FALSE;
}
m_bMainXRefLoadedOK = TRUE;
// If there were additional steps to validate the lineared data beyond
the loading of the x table,
// they'd go here. But there aren't
m_bLinearedDataOK = TRUE;
}
return m_bLinearedDataOK;
}
Whether this is correct or not depends on my assertion that once the step where
data is available has completed, the parse will always succeed or always fail,
no matter what else changes. Ditto for the PerparePageItem. True?
Hi Tom, please review this fix again. Thanks!
lgtm
Message was sent while issue was closed.
Committed patchset #2 manually as c655167 (presubmit successful). |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
