|
|
Created:
7 years, 3 months ago by Alpha Left Google Modified:
7 years, 2 months ago CC:
blink-reviews, eae+blinkwatch, dglazkov+blink, jeez Base URL:
svn://svn.chromium.org/blink/trunk Visibility:
Public. |
DescriptionImprove GIF decoding performance
This change improves LZW decoding performance for GIF decoding.
BACKGROUND
LZW decoding uses a dictionary to store LZW codes. The data structure
stores the last character for each code and the index of last code.
This allows back tracing to construct the entire code. Current
implementation builds the code in reverse order in a temporary buffer
and then copies it into the row buffer.
ALGORITHM
This change eliminates the temporary buffer and saves a memory copy.
If the code length is known in advance then reconstruction can write
directly to the row buffer. The index output buffer is advanced by the
code length, and then reconstruct as usual.
The code length is saved such that it can be used next time.
Testing is done locally with a corpus of 70k images. There's no change
in output and roughly result in a 20% performance gain.
BUG=233076
R=pkasting@chromium.org
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=158842
Patch Set 1 #
Total comments: 48
Patch Set 2 : fixed comments #
Total comments: 13
Patch Set 3 : fixed bug #Patch Set 4 : bad code initial dictionary #
Total comments: 10
Patch Set 5 : test with bad code #
Total comments: 2
Patch Set 6 : comments #Patch Set 7 : comments #Patch Set 8 : test data moved to Source/core/platform/image-decoders/testing #
Total comments: 2
Messages
Total messages: 24 (0 generated)
This looks better than the number of comments would indicate -- most are either about better comment clarity, or attempt to address pre-existing issues with the code. This change is a nice improvement overall. We're going from "write to stack > copy to rowbuffer > read back rowbuffer to convert to colors and write to framebuffer" to just "write to rowbuffer > read back ...". It seems like a subsequent change could buy more performance by skipping the rowbuffer entirely, and as we decode in doLZW(), converting the decoded values directly to colors and writing straight to the framebuffer, no rowbuffer necessary. Do you agree? https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:205: int code; Nit: Declare these first three where they are first used. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:243: // The previous code is not available either because this is the first Tiny nit: It seems like your comments in this section are "almost less than 80 chars, but not quite" and the code actually managed to fit in 80 chars. While Blink code has no line length restriction, if you're already close to fitting in 80 chars anyway, I might go ahead and wrap the comments so that they do fit inside that. Makes Rietveld reviews a little easier to read, at least... https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:245: // current code to stream. The code will have only one character. Nit: Maybe better second sentence: This code must therefore just be a single character, which we can write directly to the stream. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:254: // This is a complete code. Code length and code content are known. Nit: How about just: This is a pre-existing code, so we already know what it encodes. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:257: if (rowIter >= rowBuffer.end()) Shouldn't this be '>', not ">=", since the row iterator currently points to one past where we're going to write? (Also in the block below) https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:262: // by the first character of previous code. Nit: How about: This is a new code just being added to the dictionary. It must encode the contents of the previous code, plus the first character of the previous code again. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:269: // previous code. Write it to output and then write the previous code. Nit: This comment seems unnecessary given the above comment. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:274: // to |avail|. Nit: How about just: This is an invalid code. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:279: if (code >= MAX_BYTES || code == prefix[code]) |code| must be less than MAX_BYTES, given how it was initialized (i.e. by masking against codemask). |code| also cannot be equal to prefix[code]. For this to occur we would have had to have oldcode == avail at the beginning of this loop iteration, which can never happen since that would mean that during the previous loop iteration, code == avail + 1, which would result in bailing out of the function in the conditionals above. Therefore, this conditional should be removed entirely, and the comment below adjusted to note that "Since |code| is guaranteed to be smaller than sizeof(suffix), ..." https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:293: if (avail < 4096) { Nit: Use MAX_DICTIONARY_ENTRIES here in place of 4096 (see comments in header file) https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:297: avail++; Nit: I suggest using preincrement form wherever postincrement is not needed (looks like 3 more spots in this patch?). It doesn't really matter with integral types but it's a good habit to be in for consistency with iterators and such. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:301: // exceed the specified maximum codeword size of 12 bits. Nit: Remove "of 12 bits". Here and elsewhere, I'd rather avoid recopying the constant value into comments. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:302: if ((!(avail & codemask)) && (avail < 4096)) { Nit: Instead of "(avail < 4096)", the equivalent "(codesize < MAX_DICTIONARY_ENTRY_BITS)" (see comments in header file) would be clearer. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:313: continue; Nit: It seems slightly clearer to me to skip this, and then below the subsequent loop, insert a conditional: if (rowBegin != rowBuffer.begin()) { ... memcpy ... But I suppose in theory this could hurt perf. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:317: for (;rowBegin + width <= rowIter; rowBegin += width) { Nit: Space after first semicolon. If you don't take the previous nit about nuking the conditional above, then this loop could be converted to "do { ... } while" format and save a comparison. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:326: // of buffer. Nit: How about just: Move the remaining bytes to the beginning of the buffer. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:783: // This sets the largest possible codemask correctly at 4095. Nit: Remove last sentence, and remove "(12)" from previous sentence. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:787: if (clearCode >= MAX_BYTES) This conditional cannot fail given the previous conditional; remove it. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:799: rowBuffer.resize(m_frameContext->width() + MAX_BYTES); (See my comment atop GIFImageReader.h as to why I want to nuke MAX_BYTES and introduce MAX_DICTIONARY_ENTRIES.) Use the correct value here, and add comments: // We want to know the longest sequence encodable by a dictionary with // MAX_DICTIONARY_ENTRIES entries. If we ignore the need to encode the base // values themselves at the beginning of the dictionary, as well as the need // for a clear code or a termination code, we could use every entry to // encode a series of multiple values. If the input value stream looked // like "AAAAA..." (a long string of just one value), the first dictionary // entry would encode AA, the next AAA, the next AAAA, and so forth. Thus // the longest sequence would be MAX_DICTIONARY_ENTRIES + 1 values. // // However, we have to account for reserved entries. The first |datasize| // bits are reserved for the base values, and the next two entries are // reserved for the clear code and termination code. In theory a GIF can // set the datasize to 0, meaning we have just two reserved entries, making // the longest sequence (MAX_DICTIONARY_ENTIRES + 1) - 2 values long. Since // each value is a byte, this is also the number of bytes in the longest // encodable sequence. const size_t maxBytes = MAX_DICTIONARY_ENTRIES - 1; // Now allocate the output buffer. We decode directly into this buffer // until we have at least one row worth of data, then call outputRow(). // This means worst case we may have (row width - 1) bytes in the buffer and // then decode a sequence |maxBytes| long to append. rowBuffer.resize(m_frameContext->width() - 1 + maxBytes); https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:804: memset(suffix, 0, sizeof(suffix)); Nit: Seems like we could save a little time on the memset and be clearer to boot by doing: for (int i = 0; i < clearCode; ++i) suffix[i] = i; // Clearing the remainder of the suffix table lets us be more tolerant of bad data. memset(suffix + clearCode, 0, sizeof(suffix) - clearCode); https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... File Source/core/platform/image-decoders/gif/GIFImageReader.h (right): https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:50: #define MAX_BYTES 4097 /* 2^MAX_LZW_BITS+1 */ Of the uses of this value in this header, all but |stack| -- which you're removing -- really should have been using 2^MAX_LZW_BITS anyway, since they want to have an entry for each possible dictionary code, and don't care about the length of the sequence encodable by the dictionary. This is also true of all but one spot in the .cpp file. Therefore, I suggest replacing this with: #define MAX_DICTIONARY_ENTRY_BITS 12 #define MAX_DICTIONARY_ENTRIES 4096 // 2^MAX_DICTIONARY_ENTRY_BITS ...and then using that everywhere in this header, and almost everywhere in the .cpp file. The exception is in prepareToDecode(), which I comment separately about there. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:52: #define GIF_COLORS 3 Nit: While here, can you name this BYTES_PER_COLORMAP_ENTRY? https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:81: typedef Vector<unsigned char> GIFRow; Nit: Maybe instead of declaring this here we can declare it in GIFImageDecoder.h, so we can use it to replace the explicit Vector<unsigned char>s that appear in that file (and the .cpp file). https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:123: unsigned short prefix[MAX_BYTES]; These three arrays don't actually need to be MAX_BYTES long; they can simply be (2 ^ MAX_LZW_BITS) long.
Getting rid of the rowBuffer is possible but a bit tricky. We'll have to write directly to the RGBA bitmap. The tricky part is that LZW decoding can exceed a row buffer and it goes reverse, which makes it hard to get right. Also this direct mode doesn't work with progressive GIFs, and to handle both cases will add more complexity. It's worth trying though. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:205: int code; On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Declare these first three where they are first used. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:243: // The previous code is not available either because this is the first On 2013/08/29 23:46:25, Peter Kasting wrote: > Tiny nit: It seems like your comments in this section are "almost less than 80 > chars, but not quite" and the code actually managed to fit in 80 chars. > > While Blink code has no line length restriction, if you're already close to > fitting in 80 chars anyway, I might go ahead and wrap the comments so that they > do fit inside that. Makes Rietveld reviews a little easier to read, at least... Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:245: // current code to stream. The code will have only one character. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Maybe better second sentence: > > This code must therefore just be a single character, which we can write directly > to the stream. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:254: // This is a complete code. Code length and code content are known. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: How about just: > > This is a pre-existing code, so we already know what it encodes. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:257: if (rowIter >= rowBuffer.end()) On 2013/08/29 23:46:25, Peter Kasting wrote: > Shouldn't this be '>', not ">=", since the row iterator currently points to one > past where we're going to write? (Also in the block below) Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:262: // by the first character of previous code. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: How about: > > This is a new code just being added to the dictionary. It must encode the > contents of the previous code, plus the first character of the previous code > again. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:269: // previous code. Write it to output and then write the previous code. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: This comment seems unnecessary given the above comment. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:274: // to |avail|. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: How about just: > > This is an invalid code. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:279: if (code >= MAX_BYTES || code == prefix[code]) I removed this block of comments entirely. Having code >= avail actually has unintended behavior which is prevented by the added check above. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:293: if (avail < 4096) { On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Use MAX_DICTIONARY_ENTRIES here in place of 4096 (see comments in header > file) Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:297: avail++; On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: I suggest using preincrement form wherever postincrement is not needed > (looks like 3 more spots in this patch?). It doesn't really matter with > integral types but it's a good habit to be in for consistency with iterators and > such. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:301: // exceed the specified maximum codeword size of 12 bits. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Remove "of 12 bits". Here and elsewhere, I'd rather avoid recopying the > constant value into comments. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:302: if ((!(avail & codemask)) && (avail < 4096)) { On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Instead of "(avail < 4096)", the equivalent "(codesize < > MAX_DICTIONARY_ENTRY_BITS)" (see comments in header file) would be clearer. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:313: continue; On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: It seems slightly clearer to me to skip this, and then below the subsequent > loop, insert a conditional: > > if (rowBegin != rowBuffer.begin()) { > ... memcpy ... > > But I suppose in theory this could hurt perf. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:317: for (;rowBegin + width <= rowIter; rowBegin += width) { On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Space after first semicolon. > > If you don't take the previous nit about nuking the conditional above, then this > loop could be converted to "do { ... } while" format and save a comparison. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:326: // of buffer. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: How about just: > > Move the remaining bytes to the beginning of the buffer. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:783: // This sets the largest possible codemask correctly at 4095. On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Remove last sentence, and remove "(12)" from previous sentence. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:787: if (clearCode >= MAX_BYTES) On 2013/08/29 23:46:25, Peter Kasting wrote: > This conditional cannot fail given the previous conditional; remove it. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:799: rowBuffer.resize(m_frameContext->width() + MAX_BYTES); On 2013/08/29 23:46:25, Peter Kasting wrote: > (See my comment atop GIFImageReader.h as to why I want to nuke MAX_BYTES and > introduce MAX_DICTIONARY_ENTRIES.) > > Use the correct value here, and add comments: > > // We want to know the longest sequence encodable by a dictionary with > // MAX_DICTIONARY_ENTRIES entries. If we ignore the need to encode the base > // values themselves at the beginning of the dictionary, as well as the need > // for a clear code or a termination code, we could use every entry to > // encode a series of multiple values. If the input value stream looked > // like "AAAAA..." (a long string of just one value), the first dictionary > // entry would encode AA, the next AAA, the next AAAA, and so forth. Thus > // the longest sequence would be MAX_DICTIONARY_ENTRIES + 1 values. > // > // However, we have to account for reserved entries. The first |datasize| > // bits are reserved for the base values, and the next two entries are > // reserved for the clear code and termination code. In theory a GIF can > // set the datasize to 0, meaning we have just two reserved entries, making > // the longest sequence (MAX_DICTIONARY_ENTIRES + 1) - 2 values long. Since > // each value is a byte, this is also the number of bytes in the longest > // encodable sequence. > const size_t maxBytes = MAX_DICTIONARY_ENTRIES - 1; > > // Now allocate the output buffer. We decode directly into this buffer > // until we have at least one row worth of data, then call outputRow(). > // This means worst case we may have (row width - 1) bytes in the buffer and > // then decode a sequence |maxBytes| long to append. > rowBuffer.resize(m_frameContext->width() - 1 + maxBytes); Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:804: memset(suffix, 0, sizeof(suffix)); On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Seems like we could save a little time on the memset and be clearer to boot > by doing: > > for (int i = 0; i < clearCode; ++i) > suffix[i] = i; > // Clearing the remainder of the suffix table lets us be more tolerant of bad > data. > memset(suffix + clearCode, 0, sizeof(suffix) - clearCode); Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... File Source/core/platform/image-decoders/gif/GIFImageReader.h (right): https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:50: #define MAX_BYTES 4097 /* 2^MAX_LZW_BITS+1 */ On 2013/08/29 23:46:25, Peter Kasting wrote: > Of the uses of this value in this header, all but |stack| -- which you're > removing -- really should have been using 2^MAX_LZW_BITS anyway, since they want > to have an entry for each possible dictionary code, and don't care about the > length of the sequence encodable by the dictionary. This is also true of all > but one spot in the .cpp file. > > Therefore, I suggest replacing this with: > > #define MAX_DICTIONARY_ENTRY_BITS 12 > #define MAX_DICTIONARY_ENTRIES 4096 // 2^MAX_DICTIONARY_ENTRY_BITS > > ...and then using that everywhere in this header, and almost everywhere in the > .cpp file. The exception is in prepareToDecode(), which I comment separately > about there. Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:52: #define GIF_COLORS 3 On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: While here, can you name this BYTES_PER_COLORMAP_ENTRY? Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:81: typedef Vector<unsigned char> GIFRow; On 2013/08/29 23:46:25, Peter Kasting wrote: > Nit: Maybe instead of declaring this here we can declare it in > GIFImageDecoder.h, so we can use it to replace the explicit Vector<unsigned > char>s that appear in that file (and the .cpp file). Done. https://codereview.chromium.org/23646005/diff/1/Source/core/platform/image-de... Source/core/platform/image-decoders/gif/GIFImageReader.h:123: unsigned short prefix[MAX_BYTES]; On 2013/08/29 23:46:25, Peter Kasting wrote: > These three arrays don't actually need to be MAX_BYTES long; they can simply be > (2 ^ MAX_LZW_BITS) long. Done.
On 2013/08/30 23:25:36, Alpha wrote: > Getting rid of the rowBuffer is possible but a bit tricky. We'll have to write > directly to the RGBA bitmap. The tricky part is that LZW decoding can exceed a > row buffer and it goes reverse, which makes it hard to get right. Sure, but the existing patch has to worry about the same issues. Seems like the biggest issue is if the data for each row is not stored in memory immediately after the previous row. If it is, then the difficulties are similar to what we have here, we just check for running off the end of the framebuffer instead of the rowbuffer. If rows have gaps in memory between them, though, things get nastier. I also worry that we might decrease the processor dcache efficiency here and thus not get as much of a perf win as we'd hope. Only way to know is to try, though, probably. > Also this > direct mode doesn't work with progressive GIFs, and to handle both cases will > add more complexity. I didn't think of that one at all. Frankly I don't think I even know anything about progressive GIFs. OK, now to re-review.
https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:210: for (const unsigned char *ch = block; bytesInBlock-- > 0; ch++) { Nit: Put * on typename https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:239: if (oldcode == -1) { I just realized that because we don't check that (code < avail) here, an attacker can sneak a bad code into the beginning of the dictionary and cause bad behavior later. This was a problem with the old code as well. (For example, by carefully picking the first couple of codes, I think an attacker can cause the decoder to infinite loop. I plan to create a test GIF and check to see who's vulnerable.) I think the right solution is to eliminate this block; instead unconditionally go to what's currently in the other conditional arm. There, modify the "} else if (code == avail) {" to be "} else if ((code == avail) && (oldcode != -1)) {". Added bonus: shorter code. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:310: // Move the remaining bytes to the beginning of buffer. Nit: buffer -> the buffer https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:311: size_t bytesToCopy = rowIter - rowBegin; Nit: Can be const (since you seem to be willing to use local const elsewhere around here) https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:805: // Clearing the whole suffix table lets us be more tolerant of bad data. I think this comment and the memset can just be nuked. Assuming you follow my bugfix comment above, I don't think we ever read from the unset portion of the suffix table even in the case of bad data.
https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:239: if (oldcode == -1) { On 2013/08/31 00:08:49, Peter Kasting wrote: > (For > example, by carefully picking the first couple of codes, I think an attacker can > cause the decoder to infinite loop. I plan to create a test GIF and check to > see who's vulnerable.) Update: The old code defended against this with the "if (stackp == MAX_BYTES)" check inside the while loop, so it would eventually catch the loop and bail. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:245: if (rowIter >= rowBuffer.end()) How can this condition or either of the ones below fail, once the above issue is addressed? The comment in prepareToDecode already explained why rowBuffer should have as many bytes as we can conceivably need. It seems like we can just nuke all three of these and buy a little more speed.
I've updated the code to fix the bug for intentional bad initial code. Good catch! I found that since we have guaranteed that a new code strictly references a defined code. There's no need to clear entire suffix, suffixLength and prefix. It is true that we're now more strict about the code. The previous version did allow code to be greater than the defined dictionary, this means giving an invalid value to the pixel. Even worse it means accessing an invalid |prefix| and causes unexpected behavior. I fixed this a while back by clearing the entire |prefix| and the problem I just mentioned wouldn't happen in Blink's version of gif decoder. But before that the bug certainly was there. I doubt the code was really tolerant as the comment would suggest. Even today Mozilla's version of GIF decoder has the same bug: http://mxr.mozilla.org/mozilla-central/source/image/decoders/nsGIFDecoder2.cp... The code doesn't clear suffix nor prefix. An undefined code would cause trouble. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:210: for (const unsigned char *ch = block; bytesInBlock-- > 0; ch++) { On 2013/08/31 00:08:49, Peter Kasting wrote: > Nit: Put * on typename Done. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:239: if (oldcode == -1) { I removed this branch and let the other branch handle the initial code. The only change is we need to have: if (avail < MAX_DICTIONARY_ENTRIES && oldcode != -1) Such that he dictionary doesn't reference the invalid code -1. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:245: if (rowIter >= rowBuffer.end()) That is true. Since we can be sure there's no cycles in the dictionary we won't be writing outside of the row buffer. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:310: // Move the remaining bytes to the beginning of buffer. On 2013/08/31 00:08:49, Peter Kasting wrote: > Nit: buffer -> the buffer Done. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:311: size_t bytesToCopy = rowIter - rowBegin; On 2013/08/31 00:08:49, Peter Kasting wrote: > Nit: Can be const (since you seem to be willing to use local const elsewhere > around here) Done. https://codereview.chromium.org/23646005/diff/7001/Source/core/platform/image... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:805: // Clearing the whole suffix table lets us be more tolerant of bad data. On 2013/08/31 00:08:49, Peter Kasting wrote: > I think this comment and the memset can just be nuked. Assuming you follow my > bugfix comment above, I don't think we ever read from the unset portion of the > suffix table even in the case of bad data. Thinking about this a bit more since we're now more strict about the code and have established that a new code always references a defined and smaller code it is not necessary to clear the entire suffixLength as well. Clearing prefix is also not necessary.
So at the end I think the new behavior is better. The corrupted image will just fail at decoding stage, it will still display but animation would be paused. This is better than writing undefined values as the older version code would.
https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp (right): https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp:458: RefPtr<SharedBuffer> testData = readFile("/Source/web/tests/data/bad-initial-code.gif"); Nit: Might want a comment here about what this test actually verifies. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:246: } else if (code >= avail && oldcode != -1) { Why use >= here instead of ==? This allows an attacker to introduce error to the code table (because tempCode, and thus eventually oldcode down below, will be set to the invalid code). The new testcase I gave you won't catch this, but you could create one with the invalid code after the beginning of the file that would. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:252: *--rowIter = firstchar; Nit: This subtraction isn't necessary... maybe codeLength = suffixLength[oldcode]; rowIter += codeLength++; *rowIter = firstchar; or const int length = suffixLength[oldcode]; rowIter += length; *rowIter = firstchar; codeLength += length + 1; Probably the compiler optimizes all these to the same thing, though, so it doesn't end up mattering. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:269: // dictionary is not in initial state. Nit: This comment isn't actually correct (checking for oldcode != -1 is not a check for "the dictionary being in its initial state"). How about: Define a new codeword in the dictionary as long as we've read more than one value from the stream. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:279: if ((!(avail & codemask)) && (avail < MAX_DICTIONARY_ENTRIES)) { Nit: No parens necessary around unary operator. I wouldn't put parens around the binary op either since you're not doing it elsewhere in this function.
https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:246: } else if (code >= avail && oldcode != -1) { On 2013/09/03 18:49:45, Peter Kasting wrote: > Why use >= here instead of ==? This allows an attacker to introduce error to > the code table (because tempCode, and thus eventually oldcode down below, will > be set to the invalid code). > > The new testcase I gave you won't catch this, but you could create one with the > invalid code after the beginning of the file that would. Nice catch. I didn't realize tempCode would mess up things. I was planning to handle bad code like the original but missed the tempCode. Let's make this correct and if needed be more tolerant of bad data. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:252: *--rowIter = firstchar; I think my original code was the second one but I decided that doesn't matter performance. I'd like this to better reflect the comments. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:269: // dictionary is not in initial state. On 2013/09/03 18:49:45, Peter Kasting wrote: > Nit: This comment isn't actually correct (checking for oldcode != -1 is not a > check for "the dictionary being in its initial state"). How about: > > Define a new codeword in the dictionary as long as we've read more than one > value from the stream. Done. https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:279: if ((!(avail & codemask)) && (avail < MAX_DICTIONARY_ENTRIES)) { On 2013/09/03 18:49:45, Peter Kasting wrote: > Nit: No parens necessary around unary operator. I wouldn't put parens around > the binary op either since you're not doing it elsewhere in this function. Done.
LGTM https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageReader.cpp (right): https://codereview.chromium.org/23646005/diff/33001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageReader.cpp:252: *--rowIter = firstchar; On 2013/09/04 23:47:28, Alpha wrote: > I think my original code was the second one but I decided that doesn't matter > performance. I'd like this to better reflect the comments. OK. I'm not sure how any of these reflect the comments better than any other, but since they're probably all equivalent from a performance perspective, I don't mind using whatever you like best. https://codereview.chromium.org/23646005/diff/38001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp (right): https://codereview.chromium.org/23646005/diff/38001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp:455: // The first LZW code in the image is invalid. Decoding should fail. Nit: Perhaps: The first LZW codes in the image are invalid values that try to create a loop in the dictionary. Decoding should fail, but not infinitely loop or corrupt memory. (And something similar below)
https://codereview.chromium.org/23646005/diff/38001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp (right): https://codereview.chromium.org/23646005/diff/38001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp:455: // The first LZW code in the image is invalid. Decoding should fail. On 2013/09/05 06:13:01, Peter Kasting wrote: > Nit: Perhaps: > > The first LZW codes in the image are invalid values that try to create a loop in > the dictionary. Decoding should fail, but not infinitely loop or corrupt > memory. > > (And something similar below) Done.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/hclam@chromium.org/23646005/50001
Can't process patch for file Source/web/tests/data/bad-code.gif. Binary file support is temporarilly disabled due to a bug. Please commit blindly the binary files first then commit the source change as a separate CL. Sorry for the annoyance.
+jamesr: OWNERS approval for the added data files in Source/web
Could you put these data files somewhere else? It doesn't make sense to add files to Source/web/ purely for the purpose of testing code in Source/core/.
On 2013/09/05 23:27:18, jamesr wrote: > Could you put these data files somewhere else? It doesn't make sense to add > files to Source/web/ purely for the purpose of testing code in Source/core/. Do you have a suggestion where? I guess I can put them under LayoutTests/fast/images/resources/.
If they aren't for LayoutTests then why would you put them in LayoutTests/? Your test is in Source/core/platform/image-decoders/.. so put the test data somewhere at least close to there.
Picking this up after vacation. It appears that other test files for images are also located in Source/web/tests/data. For example: https://codereview.chromium.org/25422003. I can't find a suitable place to put data files under Source/core, do you have a suggestion?
Source/core/platform/image-decoders/testing/ ? maybe /test/ ? I don't think it makes a huge difference so long as it's clear that it's test data and it's reasonably close to the code it's meant for.
Okay. I'll create that dir and gradually move files there.
-jamesr since this is only in Source/core/platform/image-decoders. pkasting: please review again as I created Source/core/platform/image-decoders/testing to hold test images.
LGTM https://codereview.chromium.org/23646005/diff/64001/Source/core/platform/imag... File Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp (right): https://codereview.chromium.org/23646005/diff/64001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp:456: // dictionary. Decoding should fail, but not infinitely loop or corrupt memory. Nit: This first sentence got quite mangled and is now very hard to understand. Let me re-suggest: "The first LZW codes in the image are invalid values that try to create a loop in the dictionary." https://codereview.chromium.org/23646005/diff/64001/Source/core/platform/imag... Source/core/platform/image-decoders/gif/GIFImageDecoderTest.cpp:469: // Image has an invalid LZW code that exceeds dictionary size. Decoding should fail. Nit: Add missing articles: "The image has an invalid LZW code that exceeds the dictionary size."
Message was sent while issue was closed.
Committed patchset #8 manually as r158842 (presubmit successful). |