Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(1199)

Issue 1568723002: Improve extraction of accessible text from PDF. (Closed)

Created:
4 years, 11 months ago by dmazzoni
Modified:
4 years, 10 months ago
Reviewers:
jbreiden, Lei Zhang, raymes
CC:
chromium-reviews
Base URL:
https://chromium.googlesource.com/chromium/src.git@fix_pdf
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Improve extraction of accessible text from PDF. This improves the data that ChromeVox accesses when making PDF files accessible. BUG=434175 Committed: https://crrev.com/ee8c002360097305a3b058c0bcb5befdd843ab16 Cr-Commit-Position: refs/heads/master@{#373667}

Patch Set 1 #

Total comments: 16

Patch Set 2 : Address feedback #

Total comments: 4

Patch Set 3 : Fix last-line issue #

Total comments: 19

Patch Set 4 : Address more feedback #

Total comments: 2

Patch Set 5 : Address feedback, rebase #

Patch Set 6 : Get rid of unneccessary assertion in test #

Unified diffs Side-by-side diffs Delta from patch set Stats (+153 lines, -119 lines) Patch
M chrome/test/data/pdf/basic_plugin_test.js View 1 2 3 4 5 1 chunk +0 lines, -1 line 0 comments Download
M pdf/pdfium/pdfium_page.h View 1 1 chunk +0 lines, -6 lines 0 comments Download
M pdf/pdfium/pdfium_page.cc View 1 2 3 4 3 chunks +153 lines, -112 lines 0 comments Download

Depends on Patchset:

Messages

Total messages: 45 (8 generated)
dmazzoni
Ready for an initial look. This loses support for links, but I'm okay with that ...
4 years, 11 months ago (2016-01-06 22:33:33 UTC) #2
dmazzoni
+thestig as an alternate reviewer since he's also cc'd on this.
4 years, 11 months ago (2016-01-06 22:36:05 UTC) #4
raymes
I defer to thestig@
4 years, 11 months ago (2016-01-06 23:08:18 UTC) #5
jbreiden
There is a fundamental question here. The original bug report was garbled ChromeVox output on ...
4 years, 11 months ago (2016-01-07 00:58:11 UTC) #6
jbreiden
I patched in this code (manually) and took a look at a small suite of ...
4 years, 11 months ago (2016-01-07 18:10:09 UTC) #7
jbreiden
I patched in this code (manually) and took a look at a small suite of ...
4 years, 11 months ago (2016-01-07 18:10:10 UTC) #8
jbreiden
I took a few screenshots. Here you can see English looks pretty good (many people ...
4 years, 11 months ago (2016-01-07 18:43:11 UTC) #9
dmazzoni
On 2016/01/07 00:58:11, jbreiden wrote: > not appear to overlap. So in some sense the ...
4 years, 11 months ago (2016-01-07 20:46:46 UTC) #10
dmazzoni
On 2016/01/07 18:10:10, jbreiden wrote: > How can I enable text-to-speech on a Linux > ...
4 years, 11 months ago (2016-01-07 20:53:26 UTC) #11
dmazzoni
On 2016/01/07 18:43:11, jbreiden wrote: > I took a few screenshots. Here you can see ...
4 years, 11 months ago (2016-01-07 21:00:03 UTC) #12
jbreiden
Does it make sense to refactor or share code with text selection? See PDFiumEngine::GetSelectedText() and ...
4 years, 11 months ago (2016-01-07 21:59:17 UTC) #13
Lei Zhang
I'll let jbreiden verify the functionality is good. So is FPDFText_GetBoundedText() fundamentally broken, or can ...
4 years, 11 months ago (2016-01-08 04:01:36 UTC) #14
jbreiden
* On Linux, --enable-speech-dispatcher gave a very crude rendition of Hebrew, spelling the words out ...
4 years, 11 months ago (2016-01-09 01:48:52 UTC) #15
dmazzoni
https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#newcode37 pdf/pdfium/pdfium_page.cc:37: pp::Rect PageRectToGViewRect(const pp::Rect &input, FPDF_PAGE page) { On 2016/01/08 ...
4 years, 11 months ago (2016-01-11 19:58:02 UTC) #16
dmazzoni
On 2016/01/09 01:48:52, jbreiden wrote: > * As discussed, this changelist makes a ChromeVox visual ...
4 years, 11 months ago (2016-01-11 20:17:08 UTC) #17
dmazzoni
The Hebrew example is working better now. Lei caught a bug where I wasn't using ...
4 years, 11 months ago (2016-01-11 23:07:39 UTC) #18
jbreiden
Much better on most Hebrew and Arabic. However, I'm seeing a regression on single line ...
4 years, 11 months ago (2016-01-12 01:25:51 UTC) #19
jbreiden
This changelist eats the final line of every PDF. It's just much more obvious with ...
4 years, 11 months ago (2016-01-12 01:29:17 UTC) #20
jbreiden
https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc#newcode56 pdf/pdfium/pdfium_page.cc:56: VLOG(9) << "xml-invalid-rectangle"; There is no XML here, so ...
4 years, 11 months ago (2016-01-13 20:58:32 UTC) #21
dmazzoni
Fixed the last-line issue. Thanks. https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc#newcode56 pdf/pdfium/pdfium_page.cc:56: VLOG(9) << "xml-invalid-rectangle"; On ...
4 years, 11 months ago (2016-01-13 23:05:05 UTC) #22
jbreiden
We depended on null terminate strings? Yikes. Thanks for figuring this out. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc ...
4 years, 11 months ago (2016-01-14 01:07:56 UTC) #23
jbreiden
or something like that. Hard to write correct code here without trial and error.
4 years, 11 months ago (2016-01-14 02:18:13 UTC) #24
Lei Zhang
https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc#newcode41 pdf/pdfium/pdfium_page.cc:41: int min_x, min_y; One var per line please. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc#newcode70 ...
4 years, 11 months ago (2016-01-14 02:45:06 UTC) #25
Lei Zhang
https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc#newcode211 pdf/pdfium/pdfium_page.cc:211: for (int i = 0; i <= chars_count; i++) ...
4 years, 11 months ago (2016-01-14 02:47:58 UTC) #26
jbreiden
I am happy with functionality and do not need to see further work on Hebrew/Arabic ...
4 years, 11 months ago (2016-01-14 03:31:18 UTC) #27
dmazzoni
Thanks for your patience, I got sidetracked. Ready for another look. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): ...
4 years, 11 months ago (2016-01-21 23:10:43 UTC) #28
chromium-reviews
Not related to this review, but hit me up if you need to check any ...
4 years, 11 months ago (2016-01-22 09:03:25 UTC) #29
dmazzoni
> > Not related to this review, but hit me up if you need to ...
4 years, 11 months ago (2016-01-22 17:50:50 UTC) #30
chromium-reviews
Sounds great. Please make sure to read the code supporting copy-paste if you haven't already. ...
4 years, 10 months ago (2016-01-30 03:40:13 UTC) #31
dmazzoni
Friendly ping - Lei, could you take another look?
4 years, 10 months ago (2016-02-02 17:55:17 UTC) #32
Lei Zhang
lgtm https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc#newcode294 pdf/pdfium/pdfium_page.cc:294: node->Set(kPageTextBox, text); // Takes ownership of |text| Given ...
4 years, 10 months ago (2016-02-04 02:32:50 UTC) #33
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1568723002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1568723002/80001
4 years, 10 months ago (2016-02-04 17:15:47 UTC) #36
dmazzoni
https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc#newcode294 pdf/pdfium/pdfium_page.cc:294: node->Set(kPageTextBox, text); // Takes ownership of |text| On 2016/02/04 ...
4 years, 10 months ago (2016-02-04 17:18:44 UTC) #37
commit-bot: I haz the power
Try jobs failed on following builders: win_chromium_x64_rel_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/163749)
4 years, 10 months ago (2016-02-04 17:58:56 UTC) #39
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1568723002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1568723002/100001
4 years, 10 months ago (2016-02-04 22:07:02 UTC) #42
commit-bot: I haz the power
Committed patchset #6 (id:100001)
4 years, 10 months ago (2016-02-04 23:53:13 UTC) #43
commit-bot: I haz the power
4 years, 10 months ago (2016-02-04 23:54:35 UTC) #45
Message was sent while issue was closed.
Patchset 6 (id:??) landed as
https://crrev.com/ee8c002360097305a3b058c0bcb5befdd843ab16
Cr-Commit-Position: refs/heads/master@{#373667}

Powered by Google App Engine
This is Rietveld 408576698