Created: 6 years, 3 months ago by brianderson
Modified: 6 years, 2 months ago
CC: chromium-reviews, erikwright+watch_chromium.org, jar (doing other things)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Project: chromium
Visibility: Public
Description

Avoid 64-bit divide when sampling timestamp
On some lower-end platforms, 64-bit integer division is
emulated or is significantly slower than 32-bit integer
division even if not emulated.
When converting ns to us in TimeTicks::Now(), we don't
need the range of a 64-bit integer because we are starting
with a 32-bit integer and making it smaller. There is no
need to convert it to 64 bits before the division.
BUG=none
Committed: https://crrev.com/738fcb80812fefc1811d7a0954d853a9cf294310
Cr-Commit-Position: refs/heads/master@{#297287}
Patch Set 1 #
Patch Set 2 : formatting #
Total comments: 2

Messages
Total messages: 20 (4 generated)
brianderson@chromium.org changed reviewers: + ajwong@chromium.org, willchan@chromium.org
On 2014/09/29 19:06:04, brianderson wrote:

Can you give a rationale for this?
I stumbled upon this when investigating something @jar was looking at where TimeTicks::Now() was potentially adding significant overhead. It didn't end up being the root issue, but we might as well optimize it anyway given how much more we are relying on timestamps throughout the system for scheduling or debugging purposes.

On some lower-end platforms, 64-bit integer division is emulated or is significantly slower than 32-bit integer division even if not emulated.

In this case, we don't need the range of a 64-bit integer because we are starting with a 32-bit integer and making it smaller. There is no need to convert it to 64 bits before the division.
On 2014/09/29 19:37:12, brianderson wrote:
> I stumbled upon this when investigating something @jar was looking at where
> TimeTicks::Now() was potentially adding significant overhead. It didn't end up
> being the root issue, but we might as well optimize it anyway given how much
> more we are relying on timestamps throughout the system for scheduling or
> debugging purposes.
>
> On some lower-end platforms, 64-bit integer division is emulated or is
> significantly slower than 32-bit integer division even if not emulated.
>
> In this case, we don't need the range of a 64-bit integer because we are
> starting with a 32-bit integer and making it smaller. There is no need to
> convert it to 64 bits before the division.

Sounds good. :) Can you dump that into the commit message? It's a good habit to have the motivation in the commit log. I'll LG after that.
Thanks, ptal.
LGTM
The CQ bit was checked by brianderson@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/600013003/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as 4d8eb2ac774e64e77eaf793a4f3cd1cc7781b039
Message was sent while issue was closed.
Patchset 2 (id:??) landed as https://crrev.com/738fcb80812fefc1811d7a0954d853a9cf294310 Cr-Commit-Position: refs/heads/master@{#297287}
Message was sent while issue was closed.
jamesr@chromium.org changed reviewers: + jamesr@chromium.org
Message was sent while issue was closed.
https://codereview.chromium.org/600013003/diff/20001/base/time/time_posix.cc
File base/time/time_posix.cc (right):

https://codereview.chromium.org/600013003/diff/20001/base/time/time_posix.cc#...
base/time/time_posix.cc:101: (static_cast<int64>(ts.tv_nsec / base::Time::kNanosecondsPerMicrosecond));
i'm pretty sure this doesn't actually help, since kNanosecondsPerMicrosecond is:

static const int64 kNanosecondsPerMicrosecond = 1000;

so thanks to type promotion this is still a 64 bit divide.
Message was sent while issue was closed.
Yeah, on my box this makes no difference in the way ClockNow is compiled. Both with and without this patch, using gcc, the disassembly looks like:

00000000 <_ZN12_GLOBAL__N_18ClockNowEi>:
   0:	b530      	push	{r4, r5, lr}
   2:	b083      	sub	sp, #12
   4:	4604      	mov	r4, r0
   6:	4608      	mov	r0, r1
   8:	4669      	mov	r1, sp
   a:	f7ff fffe 	bl	0 <clock_gettime>
   e:	2200      	movs	r2, #0
  10:	2300      	movs	r3, #0
  12:	b988      	cbnz	r0, 38 <_ZN12_GLOBAL__N_18ClockNowEi+0x38>
  14:	f644 50d3 	movw	r0, #19923	; 0x4dd3
  18:	f244 2540 	movw	r5, #16960	; 0x4240
  1c:	f2c1 0062 	movt	r0, #4194	; 0x1062
  20:	f2c0 050f 	movt	r5, #15
  24:	9a01      	ldr	r2, [sp, #4]
  26:	fb82 0100 	smull	r0, r1, r2, r0
  2a:	17d2      	asrs	r2, r2, #31
  2c:	ebc2 12a1 	rsb	r2, r2, r1, asr #6
  30:	9900      	ldr	r1, [sp, #0]
  32:	17d3      	asrs	r3, r2, #31
  34:	fbc5 2301 	smlal	r2, r3, r5, r1
  38:	4620      	mov	r0, r4
  3a:	e9c4 2300 	strd	r2, r3, [r4]
  3e:	b003      	add	sp, #12
  40:	bd30      	pop	{r4, r5, pc}
  42:	bf00      	nop

clang compiles this slightly differently but i think it's doing the same thing. With a bit of tweaking:

diff --git a/base/time/time_posix.cc b/base/time/time_posix.cc
index ad00c51..237bdb5 100644
--- a/base/time/time_posix.cc
+++ b/base/time/time_posix.cc
@@ -97,8 +97,8 @@ base::TimeTicks ClockNow(clockid_t clk_id) {
   }

   absolute_micro =
-      (static_cast<int64>(ts.tv_sec) * base::Time::kMicrosecondsPerSecond) +
-      (static_cast<int64>(ts.tv_nsec / base::Time::kNanosecondsPerMicrosecond));
+      (static_cast<uint64_t>(ts.tv_sec) * base::Time::kMicrosecondsPerSecond) +
+      (static_cast<uint32_t>(ts.tv_nsec) / 1000u);

   return base::TimeTicks::FromInternalValue(absolute_micro);

I can get GCC to produce this instead:

00000000 <_ZN12_GLOBAL__N_18ClockNowEi>:
   0:	b510      	push	{r4, lr}
   2:	b082      	sub	sp, #8
   4:	4604      	mov	r4, r0
   6:	4608      	mov	r0, r1
   8:	4669      	mov	r1, sp
   a:	f7ff fffe 	bl	0 <clock_gettime>
   e:	2200      	movs	r2, #0
  10:	2300      	movs	r3, #0
  12:	b978      	cbnz	r0, 34 <_ZN12_GLOBAL__N_18ClockNowEi+0x34>
  14:	f644 52d3 	movw	r2, #19923	; 0x4dd3
  18:	f2c1 0262 	movt	r2, #4194	; 0x1062
  1c:	f244 2140 	movw	r1, #16960	; 0x4240
  20:	9b01      	ldr	r3, [sp, #4]
  22:	f2c0 010f 	movt	r1, #15
  26:	fba3 2302 	umull	r2, r3, r3, r2
  2a:	099a      	lsrs	r2, r3, #6
  2c:	2300      	movs	r3, #0
  2e:	9800      	ldr	r0, [sp, #0]
  30:	fbc1 2300 	smlal	r2, r3, r1, r0
  34:	4620      	mov	r0, r4
  36:	e9c4 2300 	strd	r2, r3, [r4]
  3a:	b002      	add	sp, #8
  3c:	bd10      	pop	{r4, pc}
  3e:	bf00      	nop

which seems a little better but doesn't benchmark any faster (or maybe I just suck at benchmarking, my perf seems busted so I'm not sure exactly what I'm hitting). :/
Message was sent while issue was closed.
pkasting@chromium.org changed reviewers: + pkasting@chromium.org
Message was sent while issue was closed.
https://codereview.chromium.org/600013003/diff/20001/base/time/time_posix.cc
File base/time/time_posix.cc (right):

https://codereview.chromium.org/600013003/diff/20001/base/time/time_posix.cc#...
base/time/time_posix.cc:101: (static_cast<int64>(ts.tv_nsec / base::Time::kNanosecondsPerMicrosecond));
On 2014/09/29 23:37:31, jamesr wrote:
> i'm pretty sure this doesn't actually help, since kNanosecondsPerMicrosecond is:
>
> static const int64 kNanosecondsPerMicrosecond = 1000;
>
> so thanks to type promotion this is still a 64 bit divide.

This. I'm in the process of cleaning up our 64-bit math around time constants in various places in the codebase, and frankly the right change here is probably to simply remove both casts entirely. Neither one will have any effect on the generated code, no matter how you parenthesize things.
Message was sent while issue was closed.
This should be a 32-bit operation where that's faster, however, since tv_nsec is always less than 1,000,000,000 and so the result of the division will definitely fit (easily) in 32 bits.
Message was sent while issue was closed.
Thanks for catching that, James.

Sad that actually making it a 32-bit divide doesn't really make a measurable difference on your machine. Were you testing on your desktop? If so, I can try on mobile tomorrow.

I've seen enough opcode latency charts to know that 64-bit divides have higher latencies than 32-bit divides in theory, so I'd still like to avoid 64-bit division where it makes sense. I wonder if other operations in a directed perf test of this function are somehow hiding the division latency.

Only a few of the Time constants actually need 64 bits to avoid overflow, but I'd be wary of just making the others 32 bits in case some code relies on implicit conversions. Would it make sense to add separate 32-bit versions of the constants?
Message was sent while issue was closed.
On 2014/09/30 01:25:00, brianderson wrote:
> Thanks for catching that, James.
>
> Sad that actually making it a 32-bit divide doesn't really make a measurable
> difference on your machine. Were you testing on your desktop? If so, I can try
> on mobile tomorrow.

Phone (Nexus 4)

> I've seen enough opcode latency charts to know that 64-bit divides have higher
> latencies than 32-bit divides in theory, so I'd still like to avoid 64-bit
> division where it makes sense. I wonder if other operations in a directed perf
> test of this function are somehow hiding the division latency.

I'd agree we should avoid 64-bit divides where possible - they have shown up as crazy slow - I'm just not sure manipulating this function with its constants and whatnot is actually producing better code.

> Only a few of the Time constants actually need 64 bits to avoid overflow, but
> I'd be wary of just making the others 32 bits in case some code relies on
> implicit conversions. Would it make sense to add separate 32-bit versions of
> the constants?

I don't think we really need a separate typed constant for the number of microseconds in a nanosecond.
Message was sent while issue was closed.
On 2014/09/30 01:25:00, brianderson wrote:
> I've seen enough opcode latency charts to know that 64-bit divides have higher
> latencies than 32-bit divides in theory, so I'd still like to avoid 64-bit
> division where it makes sense.

Please don't try to fix any other cases of this unless you have clear evidence that the fix you're landing actually improves perf measurably. We currently have a ridiculous number of places in our code that do a mix of 32- and 64-bit calculations with time constants. These produce tons of warnings in MSVC, which is part of why the warning in question has been disabled. In trying to fix these, it's often unclear whether doing the operation in 32 bits is safe. Every place you add more 32-bit operations, it adds more places where either it's not necessarily obvious to the reader that the operation is safe, or it IS safe now but could become unsafe someday. The right answer is to make sure we do most of these calculations in 64 bits and only use 32 bits for the truly performance-critical exceptions, which should be very rare, and measurable.

> Only a few of the Time constants actually need 64 bits to avoid overflow, but
> I'd be wary of just making the others 32 bits in case some code relies on
> implicit conversions.

You are right to be worried. I don't think such a change would necessarily be safe. You *might* be OK as long as you leave most of the other Time functions that return 64-bit quantities as 64 bits, but it'd be almost impossible to verify the safety of such a change.

> Would it make sense to add separate 32-bit versions of the constants?

See comments above. Don't do this unless you need to do it in order to fix specific perf issues. In general, we should not be attempting to purposefully lower the math we do with Time-related data to 32 bits.