Issue 2293403003: Check whether the annotation content is empty using CFX_WideString.

jaepark

jaepark@google.com changed reviewers: + dsinclair@chromium.org, thestig@chromium.org

2016-09-01 01:07:33 UTC #1

jaepark

The test case for this is in https://codereview.chromium.org/2301793002 .

2016-09-01 01:07:33 UTC #2

Lei Zhang

Can you be more specific about why CFX_ByteString cannot properly check for emptiness?

2016-09-01 01:17:52 UTC #3

jaepark

Description was changed from ========== Check whether the annotation content is empty using CFX_WideString. CFX_ByteString ...

2016-09-01 01:47:21 UTC #4

jaepark

Description was changed from ========== Check whether the annotation content is empty using CFX_WideString. CFX_ByteString ...

2016-09-01 01:48:02 UTC #5

jaepark

Description was changed from ========== Check whether the annotation content is empty using CFX_WideString. CFX_ByteString ...

2016-09-01 01:50:29 UTC #6

jaepark

On 2016/09/01 01:17:52, Lei Zhang wrote: > Can you be more specific about why CFX_ByteString ...

2016-09-01 01:50:35 UTC #7

dsinclair

On 2016/09/01 01:50:35, jaepark wrote: > On 2016/09/01 01:17:52, Lei Zhang wrote: > > Can ...

2016-09-01 14:21:29 UTC #8

jaepark

On 2016/09/01 14:21:29, dsinclair wrote: > > We have a unicode BOM at the beginning ...

2016-09-01 17:20:27 UTC #9

dsinclair

On 2016/09/01 17:20:27, jaepark wrote: > On 2016/09/01 14:21:29, dsinclair wrote: > > > > ...

2016-09-01 17:22:16 UTC #10

Lei Zhang

On 2016/09/01 17:22:16, dsinclair wrote: > It seems strange that we'd store the BOM for ...

2016-09-01 21:18:39 UTC #11

dsinclair

On 2016/09/01 21:18:39, Lei Zhang wrote: > On 2016/09/01 17:22:16, dsinclair wrote: > > It ...

2016-09-06 12:53:14 UTC #12

Lei Zhang

On 2016/09/06 12:53:14, dsinclair wrote: > "We" is PDFium. If the PDF has BOMs we ...

2016-09-07 01:47:14 UTC #13

dsinclair

On 2016/09/07 01:47:14, Lei Zhang wrote: > On 2016/09/06 12:53:14, dsinclair wrote: > > We ...

2016-09-07 02:20:36 UTC #14

dsinclair

On 2016/09/07 02:20:36, dsinclair wrote: > On 2016/09/07 01:47:14, Lei Zhang wrote: > > On ...

2016-09-07 12:59:34 UTC #15

Lei Zhang

On 2016/09/07 12:59:34, dsinclair wrote: > On 2016/09/07 02:20:36, dsinclair wrote: > > On 2016/09/07 ...

2016-09-07 22:13:50 UTC #16

dsinclair

On 2016/09/07 22:13:50, Lei Zhang wrote: > On 2016/09/07 12:59:34, dsinclair wrote: > > On ...

2016-09-08 12:52:49 UTC #17

On 2016/09/07 22:13:50, Lei Zhang wrote:
> On 2016/09/07 12:59:34, dsinclair wrote:
> > On 2016/09/07 02:20:36, dsinclair wrote:
> > > On 2016/09/07 01:47:14, Lei Zhang wrote:
> > > > On 2016/09/06 12:53:14, dsinclair wrote:
> > > > > "We" is PDFium. If the PDF has BOMs, we should be removing them internally when we stick the string into a WideString. We should be using the BOM to make sure we're doing the right conversion internally (I think UTF-16BE is what we use?).
> > > > >
> > > > > There is no reason to keep the BOM on the string after we've converted it into a wide string.
> > > >
> > > > Are you suggesting we change CPDF_String? It's hard to predict what effects that would have...
> > >
> > > No, CPDF_String shouldn't know where the string contents came from. I think we should be putting the right value in there to begin with. Otherwise, how will we ever know which strings need the BOM skipped and which don't? We should fix the parser to do the same thing in all cases.
> >
> > My question is:
> >   * We have strings that do not have BOMs
> >   * We have strings that do have BOMs
> >   * On a case-by-case basis in the code, we'll have to skip the first 2 characters
> >   * I'm guessing these strings are read straight from the PDF files, is that correct?
> >   * When we read the string from the file, we can strip the BOM and convert to the format that CPDF_String expects
> >   * Then, we always have strings without BOMs internally
> >   * Then, we never have to do the magic skip-2-characters dance, and everything's consistent.
> >
> > This just feels like a bit of inconsistency in the code that is going to cause problems in the future.
>
> Makes sense, shall we file a bug? I think it's outside the scope of Jae's annotation work, so we may end up dropping this CL.

Sounds good. Jae, can you file a bug for this and list the strings you know of that have a BOM at the beginning, to give us a starting point?

jaepark

On 2016/09/08 12:52:49, dsinclair wrote: > > Sounds good, Jae, can you file a bug ...

2016-09-08 17:20:29 UTC #18

jaepark

Rebased and added a TODO comment. The test case is in https://codereview.chromium.org/2301793002 .

2016-09-08 18:11:16 UTC #19

jaepark

Description was changed from ========== Check whether the annotation content is empty using CFX_WideString. CFX_ByteString ...

2016-09-08 18:55:26 UTC #22

jaepark

The CQ bit was checked by jaepark@google.com to run a CQ dry run

2016-09-08 18:56:15 UTC #23