|
|
DescriptionImprove extraction of accessible text from PDF.
This improves the data that ChromeVox accesses when making PDF files
accessible.
BUG=434175
Committed: https://crrev.com/ee8c002360097305a3b058c0bcb5befdd843ab16
Cr-Commit-Position: refs/heads/master@{#373667}
Patch Set 1 #
Total comments: 16
Patch Set 2 : Address feedback #
Total comments: 4
Patch Set 3 : Fix last-line issue #
Total comments: 19
Patch Set 4 : Address more feedback #
Total comments: 2
Patch Set 5 : Address feedback, rebase #Patch Set 6 : Get rid of unneccessary assertion in test #
Depends on Patchset: Messages
Total messages: 45 (8 generated)
dmazzoni@chromium.org changed reviewers: + jbreiden@google.com, raymes@chromium.org
Ready for an initial look. This loses support for links, but I'm okay with that if the quality of the text is improved. We could add back support for links later.
dmazzoni@chromium.org changed reviewers: + thestig@chromium.org
+thestig as an alternate reviewer since he's also cc'd on this.
I defer to thestig@
There is a fundamental question here. The original bug report was garbled ChromeVox output on millions of digital books. We now understand that it was due to FPDFText_GetRect getting confused, because there was overlapping symbolic text in the PDF. So in some sense it was a garbage in, garbage out situation. It turns out that other PDFium API calls (used by copy-paste) are robust to overlapping text, and this patch essentially switches to those calls. This is also the rare case where we control the PDF generation side as well. On or near March 31, 2015 the chief producers of this "garbage" (Google Books, Tesseract OCR, and one other program) were changed and now the bounding boxes do not appear to overlap. So in some sense the problem has evaporated. So the question is, does this changelist stand on its own merits in terms of simplicity, straightforwardness, elegance, maintainability, etc? Or is it better to leave things as they are? I checked with an OCR expert just now. It is possible for some digital books to still have overlapping words, because it is possible to print dead tree books that way. But this was speculative, not expected to be common, and I'm reluctant consider it without examples in hand. Especially given that the concept of word is not particularly well defined in PDF. Sorry that I did not report all this earlier; it only became clear very recently.
I patched in this code (manually) and took a look at a small suite of challenging PDFs - Arabic, Hebrew, vertical Japanese, tilted base lines, etc. This changelist produces very different results visually, but I'm not sure what is going to be better aurally. How can I enable text-to-speech on a Linux Chromium build to find out? http://jbreiden3.mtv.corp.google.com/results/
I patched in this code (manually) and took a look at a small suite of challenging PDFs - Arabic, Hebrew, vertical Japanese, tilted base lines, etc. This changelist produces very different results visually, but I'm not sure what is going to be better aurally. How can I enable text-to-speech on a Linux Chromium build to find out? http://jbreiden3.mtv.corp.google.com/results/
I took a few screenshots. Here you can see English looks pretty good (many people will consider it an improvement) but Hebrew going very wrong. https://code.google.com/p/chromium/issues/detail?id=434175#c34
On 2016/01/07 00:58:11, jbreiden wrote: > not appear to overlap. So in some sense the problem has evaporated. So the > question is, does this changelist stand on its own merits in terms of > simplicity, straightforwardness, elegance, maintainability, etc? Or is it better > to leave things as they are? Yes, now that I understand it better too I think the previous approach was fundamentally broken, because it iterated over text blocks, but then tried to transform those into start and end character indexes in the linearized text. That's prone to problems not only with overlapping blocks but also off-by-one errors and ordering issues. This approach seems fundamentally more sound, it's based on exposing the text from the FPDFTEXT interface, which already linearizes everything. The code as written essentially outputs every character exactly once, the line breaking is heuristic-based, but the sequence of characters is not, whereas before it was.
On 2016/01/07 18:10:10, jbreiden wrote: > How can I enable text-to-speech on a Linux > Chromium build to find out? Install speech-dispatcher on Linux, then run Chrome with --enable-speech-dispatcher It should install espeak by default, which is poor quality synthesis but supports a lot of languages. If you just want English at better quality, you can install this extension instead: https://chrome.google.com/webstore/detail/us-english-female-text-to/pkidpnnap...
On 2016/01/07 18:43:11, jbreiden wrote: > I took a few screenshots. Here you can see English looks pretty > good (many people will consider it an improvement) but Hebrew > going very wrong. > > https://code.google.com/p/chromium/issues/detail?id=434175#c34 I'm not convinced the Hebrew is that wrong. Previously every word was positioned on the page independently, so the overall structure of the page looked okay, but the boxes could appear in any order, so there was no guarantee the Hebrew would be spoken in the correct right-to-left order. With the new code, the characters are in the output HTML in the correct reading order. The problem just seems to be with the bounding boxes. It's not immediately obvious to me why, since I swap left/right after doing the coordinate transformation.
Does it make sense to refactor or share code with text selection? See PDFiumEngine::GetSelectedText() and related helper functions such as FormatStringWithHyphens() Regarding Hebrew and other right-to-left languages, it is really tough to do a good job with PDF. For example, it's very common to write these in LTR order and reverse the order of the letters. This test set doesn't do that; we go strictly in reading order. Users have been generally surprised and ecstatic how well copy-paste works. Especially for Indic languages.
I'll let jbreiden verify the functionality is good. So is FPDFText_GetBoundedText() fundamentally broken, or can the old code work if FPDFText_GetBoundedText() behaved better? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:37: pp::Rect PageRectToGViewRect(const pp::Rect &input, FPDF_PAGE page) { "pp::Rect& input", put |page| as the first param. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:86: 0xA, 0xB, 0xC, 0xD, 0X85, 0x2028, 0x2029, 0 Don't need the 0 sentinel value at the end? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:208: pp::Rect block_rect; Is this needed? It's being written to, but not read from. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:214: for (int i = 0; i <= chars_count; i++) { Isn't the last iteration going out of bounds? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:218: // Due to b/9598615 there are spurious STX chars appearing in place Has that bug been fixed? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:231: IsEol(character) || is_intraword_linebreak) { Check |is_intraword_linebreak| first since that's the cheapest check? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:241: // Add a 0-width hyphen. Projector will treat the first word of the Reference to internal project name? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:279: line_rect.Union(char_rect); Union() calls like this have no effect.
* On Linux, --enable-speech-dispatcher gave a very crude rendition of Hebrew, spelling the words out letter by letter. About all that I could tell that we were starting with the correct character. With and without this changelist. Good. * As discussed, this changelist makes a ChromeVox visual regression for Arabic and Hebrew. These languages work fine in Projector and Chrome text selection. I don't think the changelist should land with this regression. * Vertical Japanese is a visual disaster in ChromeVox both with and without this changelist. Vertical Japanese almost perfectly in Projector and Chrome. Just mentioning as an FYI. I apologize again for causing all this effort and trouble in the first place.
https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:37: pp::Rect PageRectToGViewRect(const pp::Rect &input, FPDF_PAGE page) { On 2016/01/08 04:01:36, Lei Zhang wrote: > "pp::Rect& input", put |page| as the first param. Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:86: 0xA, 0xB, 0xC, 0xD, 0X85, 0x2028, 0x2029, 0 On 2016/01/08 04:01:36, Lei Zhang wrote: > Don't need the 0 sentinel value at the end? Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:208: pp::Rect block_rect; On 2016/01/08 04:01:36, Lei Zhang wrote: > Is this needed? It's being written to, but not read from. Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:214: for (int i = 0; i <= chars_count; i++) { On 2016/01/08 04:01:36, Lei Zhang wrote: > Isn't the last iteration going out of bounds? Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:218: // Due to b/9598615 there are spurious STX chars appearing in place On 2016/01/08 04:01:36, Lei Zhang wrote: > Has that bug been fixed? @jbreiden: Not sure. Honestly it looks reasonable to include this code without the IsSoftHyphen check - if the next character doesn't overlap on the Y axis shouldn't we be starting a new line? https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:231: IsEol(character) || is_intraword_linebreak) { On 2016/01/08 04:01:36, Lei Zhang wrote: > Check |is_intraword_linebreak| first since that's the cheapest check? Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:241: // Add a 0-width hyphen. Projector will treat the first word of the On 2016/01/08 04:01:36, Lei Zhang wrote: > Reference to internal project name? Done. https://codereview.chromium.org/1568723002/diff/1/pdf/pdfium/pdfium_page.cc#n... pdf/pdfium/pdfium_page.cc:279: line_rect.Union(char_rect); On 2016/01/08 04:01:36, Lei Zhang wrote: > Union() calls like this have no effect. Thanks! I meant to assign it.
On 2016/01/09 01:48:52, jbreiden wrote: > * As discussed, this changelist makes a ChromeVox visual regression for Arabic > and > Hebrew. These languages work fine in Projector and Chrome text selection. I > don't > think the changelist should land with this regression. Let's fix this. Can you send me a good test document? > * Vertical Japanese is a visual disaster in ChromeVox both with and without this > > changelist. Vertical Japanese almost perfectly in Projector and Chrome. Just > mentioning > as an FYI. Let's split this into a new bug. I think a bigger impact than fixing these bugs will be to make Pdfium accessible on all platforms rather than just via ChromeVox. I'm looking into that now. If we do that we'll probably get a lot more user bug reports, too.
The Hebrew example is working better now. Lei caught a bug where I wasn't using pp::Rect's Union correctly, so our rects were reflecting only the first character of each line rather than the whole line. PTAL.
Much better on most Hebrew and Arabic. However, I'm seeing a regression on single line PDF files, which are coming up totally blank. Question: Is this code making use of the PDFium reported font size for space characters? Will it be mad if such fontsize is automatically set to "1" ? http://jbreiden3.mtv/results/simple.pdf
This changelist eats the final line of every PDF. It's just much more obvious with single line PDFs.
https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:56: VLOG(9) << "xml-invalid-rectangle"; There is no XML here, so the debug message is misleading. Maybe remove "xml" or kill the debug message entirely. https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:280: The above logic assumes an EOL character terminates the page. If not, the last line of text will never be written. Evidence suggests that this assumption does not hold, although I have not confirmed directly. (PDFium generates all whitespace characters including linebreaks. None of that is embedded into the PDF itself.)
Fixed the last-line issue. Thanks. https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:56: VLOG(9) << "xml-invalid-rectangle"; On 2016/01/13 20:58:31, jbreiden wrote: > There is no XML here, so the debug message is misleading. Maybe remove "xml" or > kill the debug message entirely. Done. https://codereview.chromium.org/1568723002/diff/20001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:280: On 2016/01/13 20:58:32, jbreiden wrote: > The above logic assumes an EOL character terminates the page. > If not, the last line of text will never be written. Evidence > suggests that this assumption does not hold, although I have > not confirmed directly. (PDFium generates all whitespace > characters including linebreaks. None of that is embedded > into the PDF itself.) OK, looks like the reason it was working before is that the for() loop iterated over <= chars_count, so it went past the end of the array, and character was 0 at the end, and IsEol returned true for a 0 character. I changed it to be safer and more explicit: now we explicitly pretend the last character is '\n'.
We depended on null terminate strings? Yikes. Thanks for figuring this out. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:48: if (max_x < min_x) Hmmm.... this works best for left-to-right languages. See how jbreiden3.mtv/results/ara.pdf has bad justification? Something like this should fix it. int w = abs(max_x - min_x); // width int h = abs(max_y - min_y); // height if (max_y < min_y) std::swap(min_y, max_y); if (max_x < min_x) pp::Rect output_rect(max_x - w, max_y - h, w, h); else pp::Rect output_rect(min_x, min_y, w, h); But honestly, the world of Arabic and Hebrew is a mess in PDF there is a lot of variation. So getting this working beautifully for Google Books does not mean we will do well with arbitrary Hebrew and Arabic PDF. E.g. I think the suggested code change above will work nicely and not harm anything, but it is hard to be sure. Maybe it is best just to leave the code alone add a comment mentioning our weakness. Something like: // This code left justifies text in the bounding box, which // makes sense for left-to-right languages such as English. // This is not correct for a right to left language such as // Arabic. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:62: if (right < left) Same as above. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:231: is_intraword_linebreak = !OverlapsOnYAxis(char_rect, next_char_rect); Maybe a comment somewhere in this function mentioning that we are assuming horizontal text and will do badly with vertical text such as Japanese or Chinese. This is not a regression (old ChromeVox was also a disaster) but we might want to get this this working some day in the future and a comment may be useful.
or something like that. Hard to write correct code here without trial and error.
https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:41: int min_x, min_y; One var per line please. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:70: // This is the character Pfdium inserts where a word is broken across lines. PDFium https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:211: for (int i = 0; i <= chars_count; i++) { This went it i < chars_count in patch set 2 and back to <= in patch set 3. Do we need to go past the last char or not? https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:234: base::IsUnicodeWhitespace(character) || funny indentation https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:242: if (IsEol(character) || is_intraword_linebreak) { You can also check |is_intraword_linebreak| first here. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:248: base::DictionaryValue* line_node = new base::DictionaryValue(); Can you create/initialize the Values in order? |text_node| -> |text_nodes| -> |line_node|.
https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:211: for (int i = 0; i <= chars_count; i++) { On 2016/01/14 02:45:06, Lei Zhang wrote: > This went it i < chars_count in patch set 2 and back to <= in patch set 3. Do we > need to go past the last char or not? "This went to"
I am happy with functionality and do not need to see further work on Hebrew/Arabic before submit.
Thanks for your patience, I got sidetracked. Ready for another look. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:41: int min_x, min_y; On 2016/01/14 02:45:06, Lei Zhang (OOO) wrote: > One var per line please. Done. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:48: if (max_x < min_x) On 2016/01/14 01:07:56, jbreiden wrote: > Hmmm.... this works best for left-to-right languages. See how No, the issue here is that a pp::Rect isn't allowed to be "backwards", i.e. with a left coordinate that's larger than its right coordinate. This code is correct no matter what the text direction. However, in order to correctly *display* the resulting text properly, we need to extract the text direction and pass that as part of the data structure. FWIW, I think this will be moot once I switch to implementing accessibility natively. The way it will work then is that we'll just export the bounds of every character, and the user will see the full rendered PDF visually with an accessible bounding box for whatever range they have selected. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:62: if (right < left) On 2016/01/14 01:07:56, jbreiden wrote: > Same as above. Same answer. We have to swap or it doesn't work. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:70: // This is the character Pfdium inserts where a word is broken across lines. On 2016/01/14 02:45:06, Lei Zhang (OOO) wrote: > PDFium Done. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:211: for (int i = 0; i <= chars_count; i++) { On 2016/01/14 02:47:58, Lei Zhang (OOO) wrote: > On 2016/01/14 02:45:06, Lei Zhang wrote: > > This went it i < chars_count in patch set 2 and back to <= in patch set 3. Do > we > > need to go past the last char or not? > > "This went to" This is on purpose. When i == chars_count, we pretend the character is a newline in order to flush it. I added a comment so it's clear this isn't a bug. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:231: is_intraword_linebreak = !OverlapsOnYAxis(char_rect, next_char_rect); On 2016/01/14 01:07:56, jbreiden wrote: > Maybe a comment somewhere in this function mentioning that we are assuming > horizontal text and will do badly with vertical text such as Japanese or > Chinese. > > This is not a regression (old ChromeVox was also a disaster) but we might want > to get this this working some day in the future and a comment may be useful. Added a link to a bug. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:234: base::IsUnicodeWhitespace(character) || On 2016/01/14 02:45:06, Lei Zhang (OOO) wrote: > funny indentation Done. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:242: if (IsEol(character) || is_intraword_linebreak) { On 2016/01/14 02:45:06, Lei Zhang (OOO) wrote: > You can also check |is_intraword_linebreak| first here. Done. https://codereview.chromium.org/1568723002/diff/40001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:248: base::DictionaryValue* line_node = new base::DictionaryValue(); On 2016/01/14 02:45:06, Lei Zhang (OOO) wrote: > Can you create/initialize the Values in order? |text_node| -> |text_nodes| -> > |line_node|. Done.
Not related to this review, but hit me up if you need to check any assumptions about PDF character bounding boxes. Depending on language they can be pretty weird including total overlap. Ligatures are also extremely common. Also it can be a ton of data for a book length document. So much that Google Books avoids it. We only embed word bounding boxes when generating PDF. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
> > Not related to this review, but hit me up if you need to check any > assumptions about PDF character bounding boxes. Depending on language they > can be pretty weird including total overlap. Ligatures are also extremely > common. > So my plan is to implement native accessibility support. This solves the problem of needing to actually reproduce the text - visually, the user will just see the real PDF file, but the assistive tech will be accessing an accessibility tree consisting of the accessible text. As the user is reading the text, like with a screen reader, the assistive tech will ask for the bounds of a range of text - which could be a whole line, one word, or even one character. As long as we return something reasonably sane we should be fine. There are no APIs to ask for the direction of the text at this level - just the accessible text, and the bounding box of any arbitrary range. So basically we just iterate over the characters in the intended range and union the bounding rects, not worrying about direction. > Also it can be a ton of data for a book length document. So much that > Google Books avoids it. We only embed word bounding boxes when generating > PDF. > Good to know. I don't see any reason that wouldn't fail gracefully, returning the correct result for any range that includes words and returning the bounds of a whole word when asked about one character. In terms of speed, implementing this natively will mean we can only process visible pages at a time rather than the whole document, just like how rendering is done. Should be pretty fast. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Sounds great. Please make sure to read the code supporting copy-paste if you haven't already. It is nearby, super simple, works great, sounds similar. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Friendly ping - Lei, could you take another look?
lgtm https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:294: node->Set(kPageTextBox, text); // Takes ownership of |text| Given how far away |text| is initialized, I would recomment making it a scoped_ptr, and use text.release() here.
The CQ bit was checked by dmazzoni@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thestig@chromium.org Link to the patchset: https://codereview.chromium.org/1568723002/#ps80001 (title: "Address feedback, rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1568723002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1568723002/80001
https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.cc File pdf/pdfium/pdfium_page.cc (right): https://codereview.chromium.org/1568723002/diff/60001/pdf/pdfium/pdfium_page.... pdf/pdfium/pdfium_page.cc:294: node->Set(kPageTextBox, text); // Takes ownership of |text| On 2016/02/04 02:32:49, Lei Zhang wrote: > Given how far away |text| is initialized, I would recomment making it a > scoped_ptr, and use text.release() here. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_x64_rel_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by dmazzoni@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thestig@chromium.org Link to the patchset: https://codereview.chromium.org/1568723002/#ps100001 (title: "Get rid of unneccessary assertion in test")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1568723002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1568723002/100001
Message was sent while issue was closed.
Committed patchset #6 (id:100001)
Message was sent while issue was closed.
Description was changed from ========== Improve extraction of accessible text from PDF. This improves the data that ChromeVox accesses when making PDF files accessible. BUG=434175 ========== to ========== Improve extraction of accessible text from PDF. This improves the data that ChromeVox accesses when making PDF files accessible. BUG=434175 Committed: https://crrev.com/ee8c002360097305a3b058c0bcb5befdd843ab16 Cr-Commit-Position: refs/heads/master@{#373667} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/ee8c002360097305a3b058c0bcb5befdd843ab16 Cr-Commit-Position: refs/heads/master@{#373667} |