Issue 2827653003: HTMLTokenizer: Fold isASCIIUpper() / isASCIILower() cases (Closed)

Created:
3 years, 8 months ago by hans

Modified:
3 years, 8 months ago

Reviewers:
Nico, kouhei (in TOK), Charlie Harrison

CC:
chromium-reviews, blink-reviews, dglazkov+blink, kinuko+watch, blink-reviews-html_chromium.org, loading-reviews+parser_chromium.org

Target Ref:
refs/heads/master

Project:
chromium

Visibility:
Public.

Description

HTMLTokenizer: Fold isASCIIUpper() / isASCIILower() cases

I noticed that NextToken() had a bunch of code doing:

  if (IsASCIIUpper(c)) {
    doStuff(ToLowerCase(c));
  } else if (IsASCIILower(c)) {
    doStuff(c);
  }

Since lowercasing is done by just setting the 0x20 bit, we can fold it into the
range check and always lowercase:

  if (IsASCIIAlpha(c)) {
    doStuff(ToLowerCase(c));
  }

This actually brings the code closer to the tokenization spec which for these
states does not separate between upper- and lower-case inputs.

I tried to measure the performance of this by modifying
PerformanceTests/resources/runner.js to run 1000 iterations and running the
html-parser.html test:

$ tools/perf/run_benchmark run blink_perf.parser \
  --story-filter=html-parser.html \
  --browser-executable=/work/chromium/src/out/release/content_shell

However I could see no change in performance.

Without my change:

avg 379.1670899999993 ms
median 381.46249999999054 ms
stdev 45.56871307699382 ms
min 253.66499999997905 ms
max 1199.0950000000007 ms

With my change:

avg 376.9821549999998 ms
median 381.8125 ms
stdev 42.23547601324007 ms
min 254.27000000000407 ms
max 1127.37 ms

This does remove ~100 lines of code and shave ~1 KB off the object file size
which is always something. And it should be faster, really.

BUG=none

Review-Url: https://codereview.chromium.org/2827653003
Cr-Commit-Position: refs/heads/master@{#465690}
Committed: https://chromium.googlesource.com/chromium/src/+/d2e7b9cf0ef86f473dfce65d0317ead63e7984a8
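The folding described in the description works because, in ASCII, each upper-case letter differs from its lower-case counterpart only in the 0x20 bit. The following is a minimal standalone sketch of that idea, not the Blink code: the `...Sketch` helpers are hypothetical stand-ins for IsASCIIUpper(), IsASCIILower(), IsASCIIAlpha() and ToLowerCase(), whose real implementations may differ, and the `std::string` output stands in for whatever doStuff() does in the tokenizer.

```cpp
#include <string>

// Hypothetical stand-ins for the helpers named in the description.
constexpr bool IsUpperSketch(char c) { return c >= 'A' && c <= 'Z'; }
constexpr bool IsLowerSketch(char c) { return c >= 'a' && c <= 'z'; }

// Setting the 0x20 bit maps 'A'..'Z' onto 'a'..'z' and leaves 'a'..'z'
// unchanged, so a single range check covers both cases.
constexpr bool IsAlphaSketch(char c) {
  return (c | 0x20) >= 'a' && (c | 0x20) <= 'z';
}

// Only meaningful for ASCII letters; other inputs are not preserved.
constexpr char ToLowerSketch(char c) { return static_cast<char>(c | 0x20); }

// Original shape: two range checks, lowercasing only in the upper-case branch.
inline void AppendTwoBranch(std::string& out, char c) {
  if (IsUpperSketch(c)) {
    out.push_back(ToLowerSketch(c));
  } else if (IsLowerSketch(c)) {
    out.push_back(c);
  }
}

// Folded shape: one range check, always lowercase.
inline void AppendFolded(std::string& out, char c) {
  if (IsAlphaSketch(c)) {
    out.push_back(ToLowerSketch(c));
  }
}
```

In this sketch both functions append the same character (or nothing) for every possible input byte, which is why the two branches can be merged without changing behavior.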

Patch Set 1

Patch Set 2 : Fix presubmit checks about braces

Total comments: 1

Patch Set 3 : Rebase

Created: 3 years, 8 months ago

Stats (+43 lines, -115 lines)

M third_party/WebKit/Source/core/html/parser/HTMLTokenizer.cpp: 22 chunks, +43 lines, -115 lines, 0 comments

Messages

Total messages: 18 (7 generated)

hans

Please take a look. And let me know if you have good ideas for a ...

3 years, 8 months ago (2017-04-18 22:37:30 UTC) #2

Charlie Harrison

Code wise this LGTM, and is a great cleanup. Have you tried analyzing this with ...

3 years, 8 months ago (2017-04-19 00:03:38 UTC) #3

kouhei (in TOK)

lgtm Thanks for the clean up! > Have you tried analyzing this with a profiler? ...

3 years, 8 months ago (2017-04-19 00:52:27 UTC) #4

hans

I can see the BackgroundHTMLParser::pumpTokenizer event in about:tracing, but it's not clear what to do ...

3 years, 8 months ago (2017-04-19 16:44:19 UTC) #5

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2827653003/20001

3 years, 8 months ago (2017-04-19 16:45:08 UTC) #7

commit-bot: I haz the power

Try jobs failed on following builders: android_arm64_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_arm64_dbg_recipe/builds/251293) android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, ...

3 years, 8 months ago (2017-04-19 16:49:46 UTC) #9

Charlie Harrison

You can use the tracing UI to measure the duration of the tokenize step. If ...

3 years, 8 months ago (2017-04-19 16:51:12 UTC) #10

Charlie Harrison

Just curious, how did you notice this inefficiency? Just by scanning the code?

3 years, 8 months ago (2017-04-19 16:53:52 UTC) #11

hans

On 2017/04/19 16:53:52, Charlie Harrison wrote: > Just curious, how did you notice this inefficiency? ...

3 years, 8 months ago (2017-04-19 17:11:14 UTC) #12

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2827653003/40001

3 years, 8 months ago (2017-04-19 17:11:40 UTC) #15

commit-bot: I haz the power

3 years, 8 months ago (2017-04-19 19:06:01 UTC) #17

Message was sent while issue was closed.

Description was changed: the following trailer was appended after BUG=none; the rest of the description is unchanged.

Review-Url: https://codereview.chromium.org/2827653003
Cr-Commit-Position: refs/heads/master@{#465690}
Committed:
https://chromium.googlesource.com/chromium/src/+/d2e7b9cf0ef86f473dfce65d0317...

commit-bot: I haz the power

3 years, 8 months ago (2017-04-19 19:06:02 UTC) #18

Message was sent while issue was closed.

Committed patchset #3 (id:40001) as
https://chromium.googlesource.com/chromium/src/+/d2e7b9cf0ef86f473dfce65d0317...