|
|
DescriptionSimplify CPDF_TextPage::FindTextlineFlowOrientation().
Committed: https://pdfium.googlesource.com/pdfium/+/2fad11a8d9d2704cd9ee28b02373ad7ce19c65e3
Patch Set 1 #Patch Set 2 : GetPageObjectList does not contain nullptrs #Patch Set 3 : Fix tests #Patch Set 4 : Fix bad refactoring some more #Patch Set 5 : Slightly easier to read? #Patch Set 6 : Better variable name #
Total comments: 14
Patch Set 7 : less loops #Messages
Total messages: 13 (3 generated)
thestig@chromium.org changed reviewers: + dsinclair@chromium.org
Nope, sent this too early as well. Got red bots.
Ok, PTAL.
Could you add a description of how this method works? As you can see by my comments, it isn't quite making sense to me at the moment. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... File core/fpdftext/fpdf_text_int.cpp (right): https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:715: fLineHeight = pPageObj->m_Top - pPageObj->m_Bottom; What happens if we have different line heights on the page? This just happens to pick the first one? If we have an extra large header, and then really small following text won't this value be wrong? https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:720: break; Am I misreading this or, isn't index after this for() always going to be minH since that's the first bit we set and after the next for() always nPageWidth - maxH as that's the max bit we set? What do we need these for loops for? https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:727: const int32_t nEndH = index; Shoudn't this be index - 1? Or, should the mask vectors be one element bigger? Do we go from 0 - nPageWidth or from 0 - (nPageWidth - 1)? https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:744: const FX_FLOAT nSumH = MaskPercentFilled(nHorizontalMask, nStartH, nEndH); Doesn't this always come out to 1? Didn't we set every bit between startH and endH to true in the mask?
Kind of lazy about writing comments because a) this is getting more readable as is, at least in my head and b) we may end up changing it anyway to fix bug 521. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... File core/fpdftext/fpdf_text_int.cpp (right): https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:715: fLineHeight = pPageObj->m_Top - pPageObj->m_Bottom; On 2016/06/15 13:37:50, dsinclair wrote: > What happens if we have different line heights on the page? This just happens to > pick the first one? > > If we have an extra large header, and then really small following text won't > this value be wrong? Everything here is heuristics based, so this could plausibly work, or one can argue we need to take more samples and average them, or take the max. My goal with this CL is just to refactor. Not fix up the heuristics. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:720: break; On 2016/06/15 13:37:50, dsinclair wrote: > Am I misreading this or, isn't index after this for() always going to be minH > since that's the first bit we set and after the next for() always nPageWidth - > maxH as that's the max bit we set? > > What do we need these for loops for? Which |minH|? It's the smallest |minH| value that reaches line 711 in that for-loop. I suppose we could have "minminH" to track that in the for loop above and remove this one? https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:727: const int32_t nEndH = index; On 2016/06/15 13:37:50, dsinclair wrote: > Shoudn't this be index - 1? Or, should the mask vectors be one element bigger? > Do we go from 0 - nPageWidth or from 0 - (nPageWidth - 1)? Hmm, this probably helps prevent a divide by 0? It is off by one, but maybe that does not matter that much for the heuristics. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:744: const FX_FLOAT nSumH = MaskPercentFilled(nHorizontalMask, nStartH, nEndH); On 2016/06/15 13:37:50, dsinclair wrote: > Doesn't this always come out to 1? Didn't we set every bit between startH and > endH to true in the mask? No, we didn't. There's gaps in between because there are multiple text objects.
https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... File core/fpdftext/fpdf_text_int.cpp (right): https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:715: fLineHeight = pPageObj->m_Top - pPageObj->m_Bottom; On 2016/06/15 23:46:33, Lei Zhang wrote: > On 2016/06/15 13:37:50, dsinclair wrote: > > What happens if we have different line heights on the page? This just happens > to > > pick the first one? > > > > If we have an extra large header, and then really small following text won't > > this value be wrong? > > Everything here is heuristics based, so this could plausibly work, or one can > argue we need to take more samples and average them, or take the max. My goal > with this CL is just to refactor. Not fix up the heuristics. Acknowledged. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:720: break; On 2016/06/15 23:46:33, Lei Zhang wrote: > On 2016/06/15 13:37:50, dsinclair wrote: > > Am I misreading this or, isn't index after this for() always going to be minH > > since that's the first bit we set and after the next for() always nPageWidth - > > maxH as that's the max bit we set? > > > > What do we need these for loops for? > > Which |minH|? It's the smallest |minH| value that reaches line 711 in that > for-loop. I suppose we could have "minminH" to track that in the for loop above > and remove this one? Crap, there is another level of for() loop on 698 that I missed. So, why not move nStartH and nEndH into that for and just set it to the min/max seen instead of doing these for() loops? https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:744: const FX_FLOAT nSumH = MaskPercentFilled(nHorizontalMask, nStartH, nEndH); On 2016/06/15 23:46:33, Lei Zhang wrote: > On 2016/06/15 13:37:50, dsinclair wrote: > > Doesn't this always come out to 1? Didn't we set every bit between startH and > > endH to true in the mask? > > No, we didn't. There's gaps in between because there are multiple text objects. Right, missed the outer for() above, though we always set the same bits in the vector.
https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... File core/fpdftext/fpdf_text_int.cpp (right): https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:720: break; On 2016/06/16 15:06:10, dsinclair wrote: > On 2016/06/15 23:46:33, Lei Zhang wrote: > > On 2016/06/15 13:37:50, dsinclair wrote: > > > Am I misreading this or, isn't index after this for() always going to be > minH > > > since that's the first bit we set and after the next for() always nPageWidth > - > > > maxH as that's the max bit we set? > > > > > > What do we need these for loops for? > > > > Which |minH|? It's the smallest |minH| value that reaches line 711 in that > > for-loop. I suppose we could have "minminH" to track that in the for loop > above > > and remove this one? > > > Crap, there is another level of for() loop on 698 that I missed. So, why not > move nStartH and nEndH into that for and just set it to the min/max seen instead > of doing these for() loops? Done. https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:744: const FX_FLOAT nSumH = MaskPercentFilled(nHorizontalMask, nStartH, nEndH); On 2016/06/16 15:06:10, dsinclair wrote: > On 2016/06/15 23:46:33, Lei Zhang wrote: > > On 2016/06/15 13:37:50, dsinclair wrote: > > > Doesn't this always come out to 1? Didn't we set every bit between startH > and > > > endH to true in the mask? > > > > No, we didn't. There's gaps in between because there are multiple text > objects. > > > Right, missed the outer for() above, though we always set the same bits in the > vector. Not sure what you mean exactly. Are you still trying to say you think this always comes out as 1? If so, I assure you that's not the case.
The CQ bit was checked by dsinclair@chromium.org
lgtm https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... File core/fpdftext/fpdf_text_int.cpp (right): https://codereview.chromium.org/2066043002/diff/40002/core/fpdftext/fpdf_text... core/fpdftext/fpdf_text_int.cpp:744: const FX_FLOAT nSumH = MaskPercentFilled(nHorizontalMask, nStartH, nEndH); On 2016/06/16 22:47:03, Lei Zhang wrote: > On 2016/06/16 15:06:10, dsinclair wrote: > > On 2016/06/15 23:46:33, Lei Zhang wrote: > > > On 2016/06/15 13:37:50, dsinclair wrote: > > > > Doesn't this always come out to 1? Didn't we set every bit between startH > > and > > > > endH to true in the mask? > > > > > > No, we didn't. There's gaps in between because there are multiple text > > objects. > > > > > > Right, missed the outer for() above, though we always set the same bits in the > > vector. > > Not sure what you mean exactly. Are you still trying to say you think this > always comes out as 1? If so, I assure you that's not the case. Sorry, was trying to say my comment was wrong, and you were right. I missed the outer loop above which is what confused me.
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2066043002/100001
Message was sent while issue was closed.
Description was changed from ========== Simplify CPDF_TextPage::FindTextlineFlowOrientation(). ========== to ========== Simplify CPDF_TextPage::FindTextlineFlowOrientation(). Committed: https://pdfium.googlesource.com/pdfium/+/2fad11a8d9d2704cd9ee28b02373ad7ce19c... ==========
Message was sent while issue was closed.
Committed patchset #7 (id:100001) as https://pdfium.googlesource.com/pdfium/+/2fad11a8d9d2704cd9ee28b02373ad7ce19c... |