|
|
Chromium Code Reviews
DescriptionConsidering HTMLTokenizer state in HTMLSourceTracker
HTMLSourceTracker takes account of HTMLTokenizer's temporary buffor when
tracking source location of token. But in some of ScriptDataXXX states,
characters are already pushed as token's characters so in that cases we
don't need to consider temporary buffer of tokenizer.
BUG=692910
TEST=fast/frames/viewsource/*
Review-Url: https://codereview.chromium.org/2717303002
Cr-Commit-Position: refs/heads/master@{#455427}
Committed: https://chromium.googlesource.com/chromium/src/+/e39d4d6f3fb1b4d558bd2d2eca1d603bec1c588f
Patch Set 1 #Patch Set 2 : Considering HTMLTokenizer state in HTMLSourceTracker #
Total comments: 1
Patch Set 3 : Considering HTMLTokenizer state in HTMLSourceTracker #Patch Set 4 : Considering HTMLTokenizer state in HTMLSourceTracker #
Total comments: 6
Patch Set 5 : Considering HTMLTokenizer state in HTMLSourceTracker #
Messages
Total messages: 40 (24 generated)
Description was changed from ========== Considering HTMLTokenizer state in HTMLSourceTracker HTMLSourceTracker takes account of HTMLTokenizer's temporary buffor when tracking source location of token. But in some of ScriptDataXXX states, characters are already pushed as token's characters so in that cases we don't need to consider temporary buffer of tokenizer. BUG=692910 TEST=fast/frames/viewsource/* ========== to ========== Considering HTMLTokenizer state in HTMLSourceTracker HTMLSourceTracker takes account of HTMLTokenizer's temporary buffor when tracking source location of token. But in some of ScriptDataXXX states, characters are already pushed as token's characters so in that cases we don't need to consider temporary buffer of tokenizer. BUG=692910 TEST=fast/frames/viewsource/* ==========
wanchang.ryu@lge.com changed reviewers: + csharrison@chromium.org
I uploaded the fix for the bug https://bugs.chromium.org/p/chromium/issues/detail?id=692910. After debugging the problem, I found that HTMLSourceTracker takes account of temporary buffer of HTMLTokenizer to find source location of token. But in ScriptDataDoubleEscapeStartState or ScriptDataDoubleEscapeStartState, tokenizer does both of bufferCharater() and m_temporaryBuffer.push_back(). After that, temporary buffer remained so it cause miss match when calculating source location by utilizing temporary buffer size before it cleared. In this patch under some case of ScriptDataXXX states, ignore temporary buffer size to calculate source location of token. Some states (ScriptDataEndTagOpenState,ScriptDataEndTagNameState, ScriptDataEscapedEndTagOpenState,ScriptDataEscapedEndTagNameState) are still required to consider temporary buffer because in this state, temporary buffer can have characters not called bufferCharater(). Please review the patch and any comments will be appreciated. Thanks
csharrison@chromium.org changed reviewers: + kouhei@chromium.org
This seems ok but it feels very fragile to have this static list of states disjoint from the actual parser. The two could get out of sync quickly if something changes. Also, your expected output is very hard to follow. Is it possible to simplify? +kouhei, do you know this code? I can't think of a good solution here.
On 2017/02/27 14:42:03, Charlie Harrison wrote: > This seems ok but it feels very fragile to have this static list of states > disjoint from the actual parser. The two could get out of sync quickly if > something changes. > > Also, your expected output is very hard to follow. Is it possible to simplify? > > +kouhei, do you know this code? I can't think of a good solution here. Thanks for review, How about export isEndTagBufferingState from HTMLTokenizer.cpp then use this function ? Actually, temporary buffer involves in endtagbufferingstate, so we can use this function to decide whether take account of temporary_buffer size to locate the token position. Regarding the test case, I uses the file that attached in the issue https://bugs.chromium.org/p/chromium/issues/detail?id=692910. The issue occurs when tokenizer input becomes empty in ScriptDataDoubleEscapeStartState. Within this state, if the input of tokenizer is empty it returns the Character token with non empty temporary buffer. That's why test case has so big strings. During I debug it, I found the same situation is happened when input meet EOF in ScriptDataDoubleEscapeStartState. I think I can simplify this test case by cutting the last long string.
On 2017/02/28 00:28:24, wanchang wrote: > On 2017/02/27 14:42:03, Charlie Harrison wrote: > > This seems ok but it feels very fragile to have this static list of states > > disjoint from the actual parser. The two could get out of sync quickly if > > something changes. > > > > Also, your expected output is very hard to follow. Is it possible to simplify? > > > > +kouhei, do you know this code? I can't think of a good solution here. > > Thanks for review, How about export isEndTagBufferingState from > HTMLTokenizer.cpp then use this function ? > Actually, temporary buffer involves in endtagbufferingstate, so we can use this > function to decide whether take account of temporary_buffer size to locate the > token position. This sounds ok but it looks like isEndTagBufferingState doesn't cover all the cases that needToCheckTokenizerBuffer. Feel free to upload a PS in case I am misunderstanding. > Regarding the test case, I uses the file that attached in the issue > https://bugs.chromium.org/p/chromium/issues/detail?id=692910. The issue occurs > when tokenizer input becomes empty in ScriptDataDoubleEscapeStartState. Within > this state, if the input of tokenizer is empty it returns the Character token > with non empty temporary buffer. That's why test case has so big strings. > > During I debug it, I found the same situation is happened when input meet EOF in > ScriptDataDoubleEscapeStartState. I think I can simplify this test case by > cutting the last long string.
On 2017/02/28 13:55:43, Charlie Harrison wrote: > On 2017/02/28 00:28:24, wanchang wrote: > > On 2017/02/27 14:42:03, Charlie Harrison wrote: > > > This seems ok but it feels very fragile to have this static list of states > > > disjoint from the actual parser. The two could get out of sync quickly if > > > something changes. > > > > > > Also, your expected output is very hard to follow. Is it possible to > simplify? > > > > > > +kouhei, do you know this code? I can't think of a good solution here. > > > > Thanks for review, How about export isEndTagBufferingState from > > HTMLTokenizer.cpp then use this function ? > > Actually, temporary buffer involves in endtagbufferingstate, so we can use > this > > function to decide whether take account of temporary_buffer size to locate the > > token position. > This sounds ok but it looks like isEndTagBufferingState doesn't cover all the > cases that needToCheckTokenizerBuffer. Feel free to upload a PS in case I am > misunderstanding. > > > Regarding the test case, I uses the file that attached in the issue > > https://bugs.chromium.org/p/chromium/issues/detail?id=692910. The issue occurs > > when tokenizer input becomes empty in ScriptDataDoubleEscapeStartState. Within > > this state, if the input of tokenizer is empty it returns the Character token > > with non empty temporary buffer. That's why test case has so big strings. > > > > During I debug it, I found the same situation is happened when input meet EOF > in > > ScriptDataDoubleEscapeStartState. I think I can simplify this test case by > > cutting the last long string. Now I'm on vacation,I will upload PS next monday. Thanks.
On 2017/02/28 22:05:52, wanchang wrote: > On 2017/02/28 13:55:43, Charlie Harrison wrote: > > On 2017/02/28 00:28:24, wanchang wrote: > > > On 2017/02/27 14:42:03, Charlie Harrison wrote: > > > > This seems ok but it feels very fragile to have this static list of states > > > > disjoint from the actual parser. The two could get out of sync quickly if > > > > something changes. > > > > > > > > Also, your expected output is very hard to follow. Is it possible to > > simplify? > > > > > > > > +kouhei, do you know this code? I can't think of a good solution here. > > > > > > Thanks for review, How about export isEndTagBufferingState from > > > HTMLTokenizer.cpp then use this function ? > > > Actually, temporary buffer involves in endtagbufferingstate, so we can use > > this > > > function to decide whether take account of temporary_buffer size to locate > the > > > token position. > > This sounds ok but it looks like isEndTagBufferingState doesn't cover all the > > cases that needToCheckTokenizerBuffer. Feel free to upload a PS in case I am > > misunderstanding. > > I uploaded new PS, isEndTagBufferingState checks the state the opposite way. It checks only true states but needToCheckTokenizerBuffer checks false states. I think it makes more sense that considering temporary buffer takes into account only in endtagbufferingstate. > > > Regarding the test case, I uses the file that attached in the issue > > > https://bugs.chromium.org/p/chromium/issues/detail?id=692910. The issue > occurs > > > when tokenizer input becomes empty in ScriptDataDoubleEscapeStartState. > Within > > > this state, if the input of tokenizer is empty it returns the Character > token > > > with non empty temporary buffer. That's why test case has so big strings. > > > > > > During I debug it, I found the same situation is happened when input meet > EOF > > in > > > ScriptDataDoubleEscapeStartState. I think I can simplify this test case by > > > cutting the last long string. > > Now I'm on vacation,I will upload PS next monday. Thanks. Also simpler test case is uploaded. Please take a look. Thanks.
This LGTM (with a small nit). I'm going to run the try bots on the change just to make sure all tests pass. https://codereview.chromium.org/2717303002/diff/20001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp (right): https://codereview.chromium.org/2717303002/diff/20001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp:60: size_t numberOfBufferedCharacters = 0; nit: s/0/0u
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
Looks like existing tests are failing with this change. I'm going to Not LGTM until those are resolved. Sorry for the run-around.
On 2017/03/06 17:55:01, Charlie Harrison wrote: > Looks like existing tests are failing with this change. I'm going to Not LGTM > until those are resolved. Sorry for the run-around. I upload new PS. The fail case was when tokenizer completed parsing scriptendtag and emit character token and scriptendtag token. At this time, tokenizer set state to HTMLTokenizer::DataState then return character token. This case was not checked with isEndTagBufferingState() call. So it is required to check tokenizer temporary buffer in two cases. 1. in the middle of endtagbufferingstate ( if input buffer is empty while processing endtagbufferingstate ) 2. after completing scrpt end tag ( tokenizer emit character token first ) PS 3 covers above two cases. I checked below failing cases with my local build. virtual/mojo-loading/http/tests/security/xssAuditor/ http/tests/security/xssAuditor/ fast/frames/viewsource/ Thanks
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Looks like tests are happy, just few last minute nits. https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp (right): https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp:44: m_previousSource.append(m_currentSource); while you are here could you add braces to the else statement to fit our style guide? https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp:102: // considering tokenizer temporarybuffer only if in the middle of Phrasing suggestion: "Consider checking the tokenizer's temporary buffer if it is in an end tag buffering state, or the token has finished parsing. The temporary buffer must not be used unconditionally, because in some states (e.g. ScriptDataDoubleEscapeStartState), data is appended to both the temporary buffer and the token itself. " https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLTokenizer.h (right): https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLTokenizer.h:269: inline bool isEndTagBufferingState(HTMLTokenizer::State state) { Can you just make this an inline static member function, so it is accessed like HTMLTokenizer::isEndTagBufferingState(state)?
Please take a look. https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp (right): https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp:44: m_previousSource.append(m_currentSource); On 2017/03/07 18:37:48, Charlie Harrison wrote: > while you are here could you add braces to the else statement to fit our style > guide? Done. https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLSourceTracker.cpp:102: // considering tokenizer temporarybuffer only if in the middle of On 2017/03/07 18:37:47, Charlie Harrison wrote: > Phrasing suggestion: > "Consider checking the tokenizer's temporary buffer if it is in an end tag > buffering state, or the token has finished parsing. > > The temporary buffer must not be used unconditionally, because in some states > (e.g. ScriptDataDoubleEscapeStartState), data is appended to both the temporary > buffer and the token itself. > " Done. https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/html/parser/HTMLTokenizer.h (right): https://codereview.chromium.org/2717303002/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/core/html/parser/HTMLTokenizer.h:269: inline bool isEndTagBufferingState(HTMLTokenizer::State state) { On 2017/03/07 18:37:49, Charlie Harrison wrote: > Can you just make this an inline static member function, so it is accessed like > HTMLTokenizer::isEndTagBufferingState(state)? Done.
Thanks, this LGTM!
The CQ bit was checked by wanchang.ryu@lge.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by wanchang.ryu@lge.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 80001, "attempt_start_ts": 1488966570919270,
"parent_rev": "dec2b22c83c792439a84157ee7bc6c900ea89c09", "commit_rev":
"e39d4d6f3fb1b4d558bd2d2eca1d603bec1c588f"}
Message was sent while issue was closed.
Description was changed from ========== Considering HTMLTokenizer state in HTMLSourceTracker HTMLSourceTracker takes account of HTMLTokenizer's temporary buffor when tracking source location of token. But in some of ScriptDataXXX states, characters are already pushed as token's characters so in that cases we don't need to consider temporary buffer of tokenizer. BUG=692910 TEST=fast/frames/viewsource/* ========== to ========== Considering HTMLTokenizer state in HTMLSourceTracker HTMLSourceTracker takes account of HTMLTokenizer's temporary buffor when tracking source location of token. But in some of ScriptDataXXX states, characters are already pushed as token's characters so in that cases we don't need to consider temporary buffer of tokenizer. BUG=692910 TEST=fast/frames/viewsource/* Review-Url: https://codereview.chromium.org/2717303002 Cr-Commit-Position: refs/heads/master@{#455427} Committed: https://chromium.googlesource.com/chromium/src/+/e39d4d6f3fb1b4d558bd2d2eca1d... ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://chromium.googlesource.com/chromium/src/+/e39d4d6f3fb1b4d558bd2d2eca1d... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
