|
|
Created:
5 years, 11 months ago by miu Modified:
4 years, 5 months ago Reviewers:
jamesr, Alexander Potapenko, brucedawson, cpu_(ooo_6.6-7.5), brianderson, fmeawad, darin (slow to review), DaleCurtis CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[Windows] One TimeTicks clock: efficient/reliable high-res, with low-res fallback.
This change resolves a critical issue where two different clocks' values
were being mixed in computations involving TimeTicks for ~32% of users
on Windows. The solution is to force the use of a single clock for
TimeTicks::Now() and HighResNow(). In other words, we're moving towards
a world where the TimeTicks type should guarantee all time values are on
the same timeline.
New behavior: On systems where QPC is reliable AND efficient, Now() will
simply use the QPC clock. This is consistent with how Now() behaves on
the Mac and POSIX platforms. However, if QPC is working but is
inefficient, Now() will use the low-res clock. Finally, like before, if
the QPC is not working on the current system, Now() will use the low-res
clock.
NowFromSystemTraceTime() will always use the QPC clock if it is
working, just as before, and regardless of whether it is expensive to
use. This is to ensure tracing data remains useful. When QPC uses either
HPET or ACPI PM as a clock source, online literature suggests clock
sampling will be ~20 to ~40 times as costly (about 1.2 to 2.4us per
call).
Note: From discussion with stakeholders, there is a concern this change
will increase latency and jitter in the compositor pipeline on the
affected Windows machines. Future changes will take alternate approaches
to resolve these problems.
Note 2: This change minimizes TimeTicks API changes, in favor of future
clean-up changes.
BUG=396384, 435931, 439656
Committed: https://crrev.com/5990719bcd1403b5a5b45874ce808727c033e7ec
Cr-Commit-Position: refs/heads/master@{#311414}
Patch Set 1 : #
Total comments: 19
Patch Set 2 : Resolve lack of std::abs(int64) overload on some toolchains. #
Total comments: 10
Patch Set 3 : Addressed comments from PS1 and PS2. Threading fixes in InitializeNowFunctionPointers(). #
Total comments: 8
Patch Set 4 : One more memory barrier, and update technical discussion. #
Total comments: 2
Patch Set 5 : Use std::atomic, and everyone is happy. #
Total comments: 2
Patch Set 6 : Use std::atomic_thread_fence() instead, to avoid subtle sync error. #Patch Set 7 : Workaround because use of <atomic> is not allowed yet. :-( #
Total comments: 8
Patch Set 8 : Fix comment about atomic_thread_fence. #Patch Set 9 : Fix macro naming style. #
Total comments: 2
Patch Set 10 : Better testing of TimeTicks::FromQPCValue(). #
Total comments: 5
Patch Set 11 : Rewrite FromQPCValue() test again. #Patch Set 12 : rebase #
Total comments: 1
Messages
Total messages: 54 (10 generated)
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
dalecurtis@chromium.org changed reviewers: + dalecurtis@chromium.org
Does this respect the API contract for TimeTicks when the low resolution clock is used? I.e. time won't go backwards?
On 2015/01/06 01:15:03, DaleCurtis wrote: > Does this respect the API contract for TimeTicks when the low resolution clock > is used? I.e. time won't go backwards? Yes. For Chrome, there has long-been a population of Windows systems that exclusively use the low-res clock (i.e., timeGetTime()) for TimeTicks, and a separate population that exclusively uses the high-res clock. I would expect we would have seen bugs by now, or unit test failures.
miu@chromium.org changed reviewers: + brianderson@chromium.org, cpu@chromium.org, darin@chromium.org, fmeawad@chromium.org
Patchset #1 (id:40001) has been deleted
all reviewers: PTAL.
Do we have data which shows exactly what a 20x-40x cost in QPC calls will actually mean for Chrome performance? I.e., does this result in say a 30% pagecycler hit, etc? It seems we could simply always use QPC (except when it's deemed unreliable) even if it's more expensive if the cost doesn't actually mean anything. As with pinning the resolution at 5ms for XP users, it would seemingly eliminate most issues for most of that 32%. /handwavy. Otherwise, as a non-owner of any to these classes, this lgtm :)
Please add a more significant test section as well. I.e. what does this look like for animations, playback on older XP systems.
https://codereview.chromium.org/797893003/diff/60001/base/time/time.h File base/time/time.h (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time.h#newcode105 base/time/time.h:105: bool is_negative() const { This is not used and appears unrelated. If the intention is to make calling code more readable, maybe make a separate patch? https://codereview.chromium.org/797893003/diff/60001/base/time/time.h#newcode110 base/time/time.h:110: TimeDelta magnitude() const { This is only used for testing. Do you plan to use this in production code too? https://codereview.chromium.org/797893003/diff/60001/base/time/time_unittest.cc File base/time/time_unittest.cc (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time_unittest.... base/time/time_unittest.cc:798: TEST(TimeDelta, ParanoidAboutAbsFunction) { Do you plan to remove this test before landing? https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc File base/time/time_win.cc (left): https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ol... base/time/time_win.cc:447: static base::LazyInstance<HighResNowSingleton>::Leaky You are no longer using a LazyInstance for QPC. We'll need to be extra careful that there are no races in your new code. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:410: g_qpc_ticks_per_second = ticks_per_sec.QuadPart; g_qpc_ticks_per_second is 64-bits. According to http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx simple 64-bit reads/writes are not guaranteed to be atomic on 32-bit Windows. Would you need to use InterlockedExchange64 to set it and something like InterlockedAdd64(*,0) to read it? QueryPerformanceFrequency likely returns the same value no matter how often it is called, but is that an assumption we want to make? If it doesn't (for example by returning 0 then non-0), then it would not work. If it does, we're probably okay, but it's hard to reason about. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:431: (cpu.vendor_name() == "AuthenticAMD" && cpu.family() == 15)) { I liked the IsBuggyAthlon function, it would make this if easier to understand. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:432: now_function = system_trace_now_function = &RolloverProtectedNow; I also think system trace should always use HighResNow, even if it is slow or unreliable.
On 2015/01/07 00:14:12, DaleCurtis wrote: > Do we have data which shows exactly what a 20x-40x cost in QPC calls will > actually mean for Chrome performance? I.e., does this result in say a 30% > pagecycler hit, etc? No, but FWICT people who did prior work on base/time_win.cc gathered the data and made the call. > It seems we could simply always use QPC (except when it's deemed unreliable) > even if it's more expensive if the cost doesn't actually mean anything. As with > pinning the resolution at 5ms for XP users, it would seemingly eliminate most > issues for most of that 32%. /handwavy. Agreed. I think you've identified the point from the discussion: If compositor scheduling/management can use a known-inexpensive clock with ~5ms granularity, that might be all that is neeeded to do a good job.
On 2015/01/07 00:14:55, DaleCurtis wrote: > Please add a more significant test section as well. I.e. what does this look > like for animations, playback on older XP systems. I don't understand what additional testing in src/base/time should be added for those areas. I'd expect UI and compositor unit tests and perf tests to cover these areas.
I just meant you should add some commit notes (TEST=?) about exactly what you've done to verify this change doesn't blow up the world. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:410: g_qpc_ticks_per_second = ticks_per_sec.QuadPart; On 2015/01/07 00:53:44, brianderson wrote: > g_qpc_ticks_per_second is 64-bits. > > According to > http://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx > simple > 64-bit reads/writes are not guaranteed to be atomic on 32-bit Windows. > > Would you need to use InterlockedExchange64 to set it and something like > InterlockedAdd64(*,0) to read it? > > QueryPerformanceFrequency likely returns the same value no matter how often it > is called, but is that an assumption we want to make? If it doesn't (for example > by returning 0 then non-0), then it would not work. If it does, we're probably > okay, but it's hard to reason about. Yeah this seems fishy, for the same reason that the InterlockedExchange calls are required for the pointers below. You may have multiple InitializeNowFunctionPointers() calls in flight. You should probably use a local value for most of the function and an InterlockedExchange64() below with the pointer setup functions. Presumably that's safe for the same reason the pointer setup is safe: these values aren't going to change between calls. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:478: if (g_now_function == &InitialNowFunction) This needs an interlocked read. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:496: return TimeTicks() + QPCValueToTimeDelta(qpc_value); Any chance someone might call FromQPCValue() below any of the above calls?
brucedawson@chromium.org changed reviewers: + brucedawson@chromium.org
https://codereview.chromium.org/797893003/diff/80001/base/time/time.h File base/time/time.h (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time.h#newcode643 base/time/time.h:643: // perform math with TimeTicks values returned by Now() and expect meaningful Improve comment clarity? < perform math with TimeTicks values returned by Now() and expect meaningful > perform math between TimeTicks values returned by Now() and ThreadNow() and expect meaningful https://codereview.chromium.org/797893003/diff/80001/base/time/time.h#newcode657 base/time/time.h:657: // perform math with TimeTicks values returned by Now() and expect meaningful Same comment clarification applies. https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( What protection is InterlockedExchangePointer supposed to give us over a normal write, given that all of the reads are normal reads? If there is a risk that the writes are not atomic then presumably there is an identical risk that the reads are not atomic, so atomic writes alone don't fix things.
https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( On 2015/01/07 01:49:26, brucedawson wrote: > What protection is InterlockedExchangePointer supposed to give us over a normal > write, given that all of the reads are normal reads? If there is a risk that the > writes are not atomic then presumably there is an identical risk that the reads > are not atomic, so atomic writes alone don't fix things. Pointers will be 32-bits on 32-bit systems, so simple reads and writes should be atomic. Does that mean we don't need these InterlockedExchangePointer's?
https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( On 2015/01/07 01:53:07, brianderson wrote: > On 2015/01/07 01:49:26, brucedawson wrote: > > What protection is InterlockedExchangePointer supposed to give us over a > normal > > write, given that all of the reads are normal reads? If there is a risk that > the > > writes are not atomic then presumably there is an identical risk that the > reads > > are not atomic, so atomic writes alone don't fix things. > > Pointers will be 32-bits on 32-bit systems, so simple reads and writes should be > atomic. Does that mean we don't need these InterlockedExchangePointer's? If we do get rid of the InterlockedExchanges, we'd probably need to add barriers to make sure the g_qpc_ticks_per_second is really initialized first.
> If we do get rid of the InterlockedExchanges, we'd probably need to add barriers > to make sure the g_qpc_ticks_per_second is really initialized first. If the Interlocks are intended to be barriers then remember that barriers always need to occur in pairs -- in addition to a barrier between the writes the reader would need a barrier between the reads. Luckily barriers on Windows (x86/x64) are super cheap. Alternately, can we just make sure the code is initialized while we are still single threaded?
Thanks, everyone. All comments addressed. PTAL. https://codereview.chromium.org/797893003/diff/60001/base/time/time.h File base/time/time.h (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time.h#newcode105 base/time/time.h:105: bool is_negative() const { On 2015/01/07 00:53:43, brianderson wrote: > This is not used and appears unrelated. If the intention is to make calling code > more readable, maybe make a separate patch? Ah. Yes, it was used in an earlier iteration of this change (before magnitude() used std::abs()). I'll remove for this change, and follow-up with a clean-up change since this functionality is being used all over the place (especially in media code): $ git grep ' < base::TimeDelta()' | wc -l 29 https://codereview.chromium.org/797893003/diff/60001/base/time/time.h#newcode110 base/time/time.h:110: TimeDelta magnitude() const { On 2015/01/07 00:53:43, brianderson wrote: > This is only used for testing. Do you plan to use this in production code too? Yes. I've personally written a lot of code in src/media that would really benefit from having this. Like is_negative(), I'll follow up with a clean-up change. https://codereview.chromium.org/797893003/diff/60001/base/time/time_unittest.cc File base/time/time_unittest.cc (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time_unittest.... base/time/time_unittest.cc:798: TEST(TimeDelta, ParanoidAboutAbsFunction) { On 2015/01/07 00:53:44, brianderson wrote: > Do you plan to remove this test before landing? The trybots on PS1 forced me to change tactics. So, this test was somewhat rewritten to confirm my inlined abs() implementation. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc File base/time/time_win.cc (left): https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ol... base/time/time_win.cc:447: static base::LazyInstance<HighResNowSingleton>::Leaky On 2015/01/07 00:53:44, brianderson wrote: > You are no longer using a LazyInstance for QPC. We'll need to be extra careful > that there are no races in your new code. Acknowledged. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:410: g_qpc_ticks_per_second = ticks_per_sec.QuadPart; On 2015/01/07 01:38:36, DaleCurtis wrote: > On 2015/01/07 00:53:44, brianderson wrote: > > 64-bit reads/writes are not guaranteed to be atomic on 32-bit Windows. > ... > You may have multiple > InitializeNowFunctionPointers() calls in flight. You should probably use a local > value for most of the function > ... > Presumably that's safe for the same reason the pointer setup is safe: these > values aren't going to change between calls. Fixed. My solution is to get rid of all the InterlockedExchangeXXX() mess and simply put a single memory barrier in-place to guarantee visibility ordering. This resolves both the atomicity problem Brian pointed out as well as the parallel InitializeNowFunctionPointers() calls. See comments I added to the code for further explanation. > QueryPerformanceFrequency likely returns the same value no matter how often it > is called, but is that an assumption we want to make? The MSDN documentation for this function states, "The frequency of the performance counter is fixed at system boot and is consistent across all processors." Also, this is what the old code assumed. So, I think we're good here. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:431: (cpu.vendor_name() == "AuthenticAMD" && cpu.family() == 15)) { On 2015/01/07 00:53:44, brianderson wrote: > I liked the IsBuggyAthlon function, it would make this if easier to understand. Done. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:432: now_function = system_trace_now_function = &RolloverProtectedNow; On 2015/01/07 00:53:44, brianderson wrote: > I also think system trace should always use HighResNow, even if it is slow or > unreliable. Done. https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:478: if (g_now_function == &InitialNowFunction) On 2015/01/07 01:38:36, DaleCurtis wrote: > This needs an interlocked read. With the new, simpler memory-ordering model (PS3), it doesn't anymore. The new value for |g_now_function| is either visible to the thread, or it isn't and a parallel call to InitializeNowFunctionPointers() could result (which is okay). https://codereview.chromium.org/797893003/diff/60001/base/time/time_win.cc#ne... base/time/time_win.cc:496: return TimeTicks() + QPCValueToTimeDelta(qpc_value); On 2015/01/07 01:38:36, DaleCurtis wrote: > Any chance someone might call FromQPCValue() below any of the above calls? No. I changed the header comment in time.h to explicitly read "Do NOT attempt to use this if IsHighResolution() returns false." This means IsHighResolution() must be called before this function, which will guarantee all the globals are initialized first. I've confirmed all uses of this function make the IsHighResolution() call first, and there's a DCHECK in QPCValueToTimeDelta() to ensure it stays that way for the future. https://codereview.chromium.org/797893003/diff/80001/base/time/time.h File base/time/time.h (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time.h#newcode643 base/time/time.h:643: // perform math with TimeTicks values returned by Now() and expect meaningful On 2015/01/07 01:49:26, brucedawson wrote: > Improve comment clarity? > > < perform math with TimeTicks values returned by Now() and expect meaningful > > perform math between TimeTicks values returned by Now() and ThreadNow() and > expect meaningful Done. https://codereview.chromium.org/797893003/diff/80001/base/time/time.h#newcode657 base/time/time.h:657: // perform math with TimeTicks values returned by Now() and expect meaningful On 2015/01/07 01:49:26, brucedawson wrote: > Same comment clarification applies. Done. https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( On 2015/01/07 01:49:26, brucedawson wrote: > What protection is InterlockedExchangePointer supposed to give us over a normal > write, given that all of the reads are normal reads? If there is a risk that the > writes are not atomic then presumably there is an identical risk that the reads > are not atomic, so atomic writes alone don't fix things. There was a subtlety in the old code. Use of the volatile keyword with MSVC forces it to do acquire barriers before loads and release barriers after stores. I've resolved all of this in PS3 without this voodoo magic. ;-) https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( On 2015/01/07 01:53:07, brianderson wrote: > On 2015/01/07 01:49:26, brucedawson wrote: > > What protection is InterlockedExchangePointer supposed to give us over a > normal > > write, given that all of the reads are normal reads? If there is a risk that > the > > writes are not atomic then presumably there is an identical risk that the > reads > > are not atomic, so atomic writes alone don't fix things. > > Pointers will be 32-bits on 32-bit systems, so simple reads and writes should be > atomic. Does that mean we don't need these InterlockedExchangePointer's? Yep. Eliminated in PS3. https://codereview.chromium.org/797893003/diff/80001/base/time/time_win.cc#ne... base/time/time_win.cc:440: InterlockedExchangePointer( On 2015/01/07 01:54:53, brianderson wrote: > On 2015/01/07 01:53:07, brianderson wrote: > > On 2015/01/07 01:49:26, brucedawson wrote: > > > What protection is InterlockedExchangePointer supposed to give us over a > > normal > > > write, given that all of the reads are normal reads? If there is a risk that > > the > > > writes are not atomic then presumably there is an identical risk that the > > reads > > > are not atomic, so atomic writes alone don't fix things. > > > > Pointers will be 32-bits on 32-bit systems, so simple reads and writes should > be > > atomic. Does that mean we don't need these InterlockedExchangePointer's? > > If we do get rid of the InterlockedExchanges, we'd probably need to add barriers > to make sure the g_qpc_ticks_per_second is really initialized first. Done.
https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:362: // RDTSC on each processor are consistent with each other, and apply a handful comment section 361 - 367 needs update? https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:370: // to 55 milliseconds) time stamp but is comparatively less expensive to 55 ? https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:447: // remind me again why we don't simply initialize this once, cleanly when we are single threaded?
ah one more important question. We are branching in ~3 days. Do we want to land this after the branch?
On 2015/01/07 17:42:33, cpu wrote: > ah one more important question. We are branching in ~3 days. Do we want to land > this after the branch? Arguments in favor of landing before the branch: 1. Most of this change is clean-up/refactoring of the existing code. The only logic change is the decision to use the low-res clock on more systems. Granted, that's a pretty invasive change, but what we have now is already broken for the same set of users. 2. This fixes a serious problem that is causing a very bad user experience w.r.t. A/V sync for video playback, tab mirroring (e.g., to Chromecast), etc. See bugs referenced in change description. 3. Throughout this process, I've had access to an affected machine in our testing lab. So, I'm pretty confident it works without breaking anything obvious. 4. We still have 6 weeks of "bake time" and can always revert if problems arise.
https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:362: // RDTSC on each processor are consistent with each other, and apply a handful On 2015/01/07 17:41:31, cpu wrote: > comment section 361 - 367 needs update? Good idea. I'll update this discussion with all my recent findings in the next patch set. https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:370: // to 55 milliseconds) time stamp but is comparatively less expensive to On 2015/01/07 17:41:31, cpu wrote: > 55 ? Not sure where these numbers come from, but I'll update them. I've been doing enough debugging on several different machines to believe the well-known 1ms - 15.6ms is the correct range. https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:447: // On 2015/01/07 17:41:31, cpu wrote: > remind me again why we don't simply initialize this once, cleanly when we are > single threaded? We could. I was just sticking with what we had. Where would the "startup one-time init" call be made? Is there a base::Init() or something similar already being used to initialize the library? Also, how do we account for startup in all other non-chrome projects/libraries? I think we're all focused on this one-time-init threading issue where, in reality, I doubt there ever is a race. Nevertheless, the MemoryBarrier is a dirt-simple solution to resolve that, and I'm sure MSAN will hound me if I'm wrong. ;-)
https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:384: TimeDelta QPCValueToTimeDelta(LONGLONG qpc_value) { @brucedawson makes a good point about needing to protect the reads. If we stick with the current approach, do we also need a barrier at the start of this function to make sure the value of g_pqc_ticks_per_second isn't fetched before g_now_function is set? It's *extremely* unlikely since a CPU wouldn't know to predictively branch here until after g_now_function is set, but theoretically possible given sufficiently crazy optimizations+reordering. Would definitely prefer a single threaded init if possible, but is there a way to do that just from code in base without requiring all base users to call some init function?
we do this kind of stuff in several places for example content_main_runner.cc Init() is one of my favorites For the test binaries there is something equivalent. Check TestSuite::PreInitialize Usually there are other binaries that might not get it but you can find them by looking at all of the ones that construct base::AtExitManager A second option is to use base::Singleton which already does all the tricks for memory coherency. The code you are re-factoring was written before that.
PTAL. The one-time-init threading issues should be all resolved now. There's been some additional discussion about whether the init should happen in a single-threaded environment, rather than on-demand in a multi-threaded environment. Regardless of which is the better path to take, I feel now is not a good time to chase down all the applications (including non-chrome ones) that use base::TimeTicks and modify their startup routines. My solution to this problem--just add two memory barriers to otherwise single-threaded code to make it thread-safe--is a very simple solution that should work correctly and efficiently. https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/100001/base/time/time_win.cc#n... base/time/time_win.cc:384: TimeDelta QPCValueToTimeDelta(LONGLONG qpc_value) { On 2015/01/07 19:53:11, brianderson wrote: > @brucedawson makes a good point about needing to protect the reads. > > If we stick with the current approach, do we also need a barrier at the start of > this function to make sure the value of g_pqc_ticks_per_second isn't fetched > before g_now_function is set? It's *extremely* unlikely since a CPU wouldn't > know to predictively branch here until after g_now_function is set, but > theoretically possible given sufficiently crazy optimizations+reordering. Done. I had considered this, but deemed it impossible because how would a compiler/CPU know how to predictively branch w/o knowing the function pointer value? However, a teammate pointed out to me that the compiler could decide not to use function pointers at all (e.g., if it noticed g_now_function could only take 2 possible values, it might choose to replace the function pointer with a branch instruction + inlined code at each call point instead). So, to be absolutely explicit (and since it is cheap), I've added the second memory barrier here. Now the code is correct from the theoretical standpoint too. :)
Consider not inserting a memory barrier *instruction* on the read. https://codereview.chromium.org/797893003/diff/120001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/120001/base/time/time_win.cc#n... base/time/time_win.cc:392: base::subtle::MemoryBarrier(); Well darn. I got what I requested and now I'm sad. I hadn't realized that the only explicit memory barrier available in Chrome is a full memory barrier -- an actual instruction executed into the stream that prevents all reordering. All we really need is a read-acquire barrier to stop the compiler from rearranging code, but Acquire_Load is the only way that atomicops.h provides that, and that only allows loading a 32-bit value. Since this is Windows specific code we can use _ReadBarrier() to get the intended effect. It's marked deprecated but that's okay. http://msdn.microsoft.com/en-us/library/z055s48f.aspx We could add a 64-bit Acquire_Load but that would further increase the scope. BTW, since a barrier goes *between* two accesses I always like to say what accesses it is going between. i.e.; "Barrier between read of g_now_function and g_qpc_ticks_per_second to prevent read-reordering." But that just be what best fits my mental model.
Bruce: PTAL. Addressed your comment: https://codereview.chromium.org/797893003/diff/120001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/120001/base/time/time_win.cc#n... base/time/time_win.cc:392: base::subtle::MemoryBarrier(); On 2015/01/07 22:29:13, brucedawson wrote: > Well darn. I got what I requested and now I'm sad. > > I hadn't realized that the only explicit memory barrier available in Chrome is a > full memory barrier -- an actual instruction executed into the stream that > prevents all reordering. All we really need is a read-acquire barrier to stop > the compiler from rearranging code, but Acquire_Load is the only way that > atomicops.h provides that, and that only allows loading a 32-bit value. Done. Since our Windows toolchain was recently updated to MSVC 2013, we now have std::atomic support. Therefore, I was able to do an atomic store with barrier-release ordering; as well as an atomic load with barrier-acquire ordering.
https://codereview.chromium.org/797893003/diff/140001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/140001/base/time/time_win.cc#n... base/time/time_win.cc:463: std::memory_order_release); I *think* this is wrong. I believe that it is the write to g_now_function that needs std::memory_order_release. g_qpc_ticks_per_second is the payload, and g_now_function is the flag that makes it available, so g_now_function is what is releasing access to g_qpc_ticks_per_second. See the example from the top of one of my favorite lockless resources: http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-you... This would mean that reads from g_now_function would need the std::memory_order_acquire. See also: http://preshing.com/20130823/the-synchronizes-with-relation/ Note that the alternative is to use std::atomic_thread_fence(std::memory_order_acquire); and std::atomic_thread_fence(std::memory_order_release);, placing these between the reads and between the writes. The acquire fence could either go right before the read of g_qpc_ticks_per_second or right after the read of g_now_function -- whichever is most convenient. All this debate over a race that in practice might never happen. Such fun.
https://codereview.chromium.org/797893003/diff/140001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/140001/base/time/time_win.cc#n... base/time/time_win.cc:463: std::memory_order_release); On 2015/01/08 01:36:57, brucedawson wrote: > I *think* this is wrong. I believe that it is the write to g_now_function that > needs std::memory_order_release. g_qpc_ticks_per_second is the payload, and > g_now_function is the flag that makes it available, so g_now_function is what is > releasing access to g_qpc_ticks_per_second. Ah, yes. I understand now. I fixed this by using the std::atomic_thread_fence() approach, as it turned out to be much simpler.
wait, can we use <atomic> ? last I heard most library features were disallowed.
If <atomic> is disallowed (and it looks like it is -- no sign of it on http://chromium-cpp.appspot.com/) then a simple fix is required. Just replace the two existing fences with _ReadWriteBarrier(). I just proved that this is equivalent by stepping in to the debug version of atomic_thread_fence and confirmed that for both memory_order types we are using it resolves down to _ReadWriteBarrier(). It is sufficient (on x86/x64 only) because of the pretty strict memory ordering of x86/x64 processors. On ARM we would need a memory barrier instruction, but I assume we don't build Chrome for ARM for Windows. With that change, lgtm
On 2015/01/08 07:49:55, brucedawson wrote: > If <atomic> is disallowed (and it looks like it is -- no sign of it on > http://chromium-cpp.appspot.com/) then a simple fix is required. Just replace > the two existing fences with _ReadWriteBarrier(). Done. > I just proved that this is > equivalent by stepping in to the debug version of atomic_thread_fence and > confirmed that for both memory_order types we are using it resolves down to > _ReadWriteBarrier(). I also confirmed this by examining the MSVC 2013 header files, to make sure it was the same for all Intel architectures.
https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc#n... base/time/time_win.cc:389: // what std::atomic_thread_fence does on Windows on all Intel architectures: To be clear, this is what atomic_thread_fence does if memory_order is not memory_order_seq_cst. If memory_order is memory_order_seq_cst then a full fence instruction is required. We don't need that so this is just a comment error.
https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc#n... base/time/time_win.cc:389: // what std::atomic_thread_fence does on Windows on all Intel architectures: On 2015/01/08 19:37:48, brucedawson wrote: > To be clear, this is what atomic_thread_fence does if memory_order is not > memory_order_seq_cst. If memory_order is memory_order_seq_cst then a full fence > instruction is required. We don't need that so this is just a comment error. Done.
jamesr@chromium.org changed reviewers: + jamesr@chromium.org
https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc#n... base/time/time_win.cc:390: #define atomic_thread_fence(memory_order) _ReadWriteBarrier(); this doesn't follow macro naming rules: http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Macro_Names naming it as if it were a standard feature and not having it actually do what the standard says is a pretty bad idea. If you want a read write barrier then just say so. we don't have any intention of supporting windows on non-intel platforms. https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h File ui/gfx/frame_time.h (right): https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h#new... ui/gfx/frame_time.h:30: return base::TimeTicks::IsHighResolution(); imo you should just delete this, it's not producing anything useful
https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h File ui/gfx/frame_time.h (right): https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h#new... ui/gfx/frame_time.h:30: return base::TimeTicks::IsHighResolution(); On 2015/01/08 20:31:54, jamesr wrote: > imo you should just delete this, it's not producing anything useful Opened a bug to remove the whole gfx::FrameTime class: https://code.google.com/p/chromium/issues/detail?id=447329 https://codereview.chromium.org/797893003/diff/220001/base/time/time_win_unit... File base/time/time_win_unittest.cc (left): https://codereview.chromium.org/797893003/diff/220001/base/time/time_win_unit... base/time/time_win_unittest.cc:252: TEST(TimeTicks, FromQPCValue) { Please keep this test. It verifies the (qpc_value < Time::kQPCOverflowThreshold) optimization in QPCValueToTimeDelta is correct.
https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/180001/base/time/time_win.cc#n... base/time/time_win.cc:390: #define atomic_thread_fence(memory_order) _ReadWriteBarrier(); On 2015/01/08 20:31:54, jamesr wrote: > this doesn't follow macro naming rules: > http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Macro_Names Done. Fixed capitalization. > naming it as if it were a standard feature and not having it actually do what > the standard says is a pretty bad idea. If you want a read write barrier then > just say so. That's not the case here. The standard only specifies the behavior requirements, but it's up to the implementation as to how that behavior is achieved. The point of using the macro is to communicate the specific memory ordering behavior needed at the two codepoints. If I put _ReadWriteBarrier() at the two codepoints instead, this forces people to reverse-engineer/guess my original intentions. It feels really silly/stupid not to use <atomic> here. The rationale for the ban (http://chromium-cpp.appspot.com) doesn't apply to what I'm doing here. This is a platform-specific module, where the only toolchains that would touch this code DO support <atomic>. https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h File ui/gfx/frame_time.h (right): https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h#new... ui/gfx/frame_time.h:30: return base::TimeTicks::IsHighResolution(); On 2015/01/08 20:31:54, jamesr wrote: > imo you should just delete this, it's not producing anything useful What do you mean by "delete this?" Are you referring to this method, or the whole FrameTime class? Either way, I'd prefer to follow-up with this in a later clean-up change. (I've already been maintaining a TODO list to track several other clean-up items resulting from this change.) https://codereview.chromium.org/797893003/diff/180001/ui/gfx/frame_time.h#new... ui/gfx/frame_time.h:30: return base::TimeTicks::IsHighResolution(); On 2015/01/08 21:10:36, brianderson wrote: > On 2015/01/08 20:31:54, jamesr wrote: > > imo you should just delete this, it's not producing anything useful > > Opened a bug to remove the whole gfx::FrameTime class: > https://code.google.com/p/chromium/issues/detail?id=447329 Acknowledged. https://codereview.chromium.org/797893003/diff/220001/base/time/time_win_unit... File base/time/time_win_unittest.cc (left): https://codereview.chromium.org/797893003/diff/220001/base/time/time_win_unit... base/time/time_win_unittest.cc:252: TEST(TimeTicks, FromQPCValue) { On 2015/01/08 21:10:36, brianderson wrote: > Please keep this test. It verifies the (qpc_value < Time::kQPCOverflowThreshold) > optimization in QPCValueToTimeDelta is correct. It's still there (the TimeTicks.Drift test is what I deleted). I rewrote the test because the test code doing the math was copy-paste from the impl (i.e., nothing was really being tested). Actually, now that I've looked at the code I wrote again, I felt I could do better. PTAL.
I mentioned <atomic> also because in another thread we were looking if a binary generated with VS2015 would run on XP. There was talk that using certain libraries could cause the CRT to try to link with APIs that only exist in Vista+ see bug 440500 maybe its all FUD. I feel all the 1 /5 energy spent on this concurrency stuff woud have sufficed to init this stuff one time at the start.
https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... File base/time/time_win_unittest.cc (right): https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... base/time/time_win_unittest.cc:241: for (int i = 0; i < kIterations; ++i) { I don't think this for loop tests the 3 corner cases of QPCOverflowThreshold+(-1,0,+1) that the old test had. Maybe have a vector of test cases and push those on at the end? https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... base/time/time_win_unittest.cc:246: const TimeTicks expected_value = start_time + This test rounds to microseconds twice (here and in the calculation of start_time), whereas the real implementation only rounds once. Is that why the tolerance_us is needed? If so, can you add a comment because it's not clear why a tolerance is needed. Alternatively, find a way to get rid of the tolerance. Two options: 1) It would be nice if base had a BigInt of some sort, but I could only find ones outside of base. @hendrikw mentions there's a __int128 on Windows, can we use that? 2) Can we rely on the fact that subtracting to "de-overflow" an overflowed multiplication of an unsigned value still results in the correct value? As a sanity check: TEST(TimeTicks, SubtractToUnoverflowAfterMultiplyOverflowIsOkay) { uint8 a8 = 3; uint32 a32 = 3; for (uint8 b8 = 255u / a8; b8 !=0; b8++) { uint32 b32 = b8; for (uint8 c8 = (a32 * b32) - 255; c8 != 0; c8++) { uint32 c32 = c8; // m8 will overflow, but m32 will not. uint8 m8 = a8 * b8; uint32 m32 = a32 * b32; // Verifies that there was overflow. EXPECT_NE(uint32(m8), m32); EXPECT_EQ(m8, uint8(m32)); // Only checks cases where subtracting will deoverflow. if (m32 - c32 <= 255u) { // Verify that "de overflowing" works. EXPECT_EQ(uint32(uint8(m8 - c8)), m32 - c32); // For some reason this fails. My understanding of order of operations when it comes to casting is clearly lacking. //EXPECT_EQ(uint32((m8 - c8)), m32 - c32); } } } }
lgtm on my side, only looked at the windows parts in detail.
brianderson: Addressed your comments on the FromQPCValue() test. LGTY now? https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... File base/time/time_win_unittest.cc (right): https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... base/time/time_win_unittest.cc:241: for (int i = 0; i < kIterations; ++i) { On 2015/01/09 00:02:02, brianderson wrote: > I don't think this for loop tests the 3 corner cases of > QPCOverflowThreshold+(-1,0,+1) that the old test had. Maybe have a vector of > test cases and push those on at the end? Done. https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... base/time/time_win_unittest.cc:246: const TimeTicks expected_value = start_time + On 2015/01/09 00:02:02, brianderson wrote: > Alternatively, find a way to get rid of the tolerance. I looked into 128-bit integers, but didn't like what I found. One show-stopper: There don't seem to be any intrinsics to do 128-bit divide, and I didn't want to write my own function. However, I came up with a simple solution to get the job done: Just use straightforward floating-point math. Double-precision is "precise enough" to test the logic down to the microsecond. Note: I purposely DO NOT test 64-bit values that cannot be represented.
https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... File base/time/time_win_unittest.cc (right): https://codereview.chromium.org/797893003/diff/240001/base/time/time_win_unit... base/time/time_win_unittest.cc:246: const TimeTicks expected_value = start_time + On 2015/01/14 02:12:24, miu wrote: > On 2015/01/09 00:02:02, brianderson wrote: > > Alternatively, find a way to get rid of the tolerance. > > I looked into 128-bit integers, but didn't like what I found. One show-stopper: > There don't seem to be any intrinsics to do 128-bit divide, and I didn't want to > write my own function. > > However, I came up with a simple solution to get the job done: Just use > straightforward floating-point math. Double-precision is "precise enough" to > test the logic down to the microsecond. Note: I purposely DO NOT test 64-bit > values that cannot be represented. Thanks for looking into that. It would have been nice to test without a tolerance, but within a microsecond is more than good enough and this approach is better than what we had before, which wasn't testing values over the threshold well. lgtm!
LGTM, deferring to the other reviewers
The CQ bit was checked by miu@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/797893003/280001
Message was sent while issue was closed.
Committed patchset #12 (id:280001)
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/5990719bcd1403b5a5b45874ce808727c033e7ec Cr-Commit-Position: refs/heads/master@{#311414}
Message was sent while issue was closed.
glider@chromium.org changed reviewers: + glider@chromium.org
Message was sent while issue was closed.
glider@chromium.org changed reviewers: + glider@chromium.org
Message was sent while issue was closed.
https://codereview.chromium.org/797893003/diff/280001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/280001/base/time/time_win.cc#n... base/time/time_win.cc:396: ATOMIC_THREAD_FENCE(memory_order_acquire); There's base/atomicops.h with Acquire_Load() and Release_Store() for this.
Message was sent while issue was closed.
https://codereview.chromium.org/797893003/diff/280001/base/time/time_win.cc File base/time/time_win.cc (right): https://codereview.chromium.org/797893003/diff/280001/base/time/time_win.cc#n... base/time/time_win.cc:396: ATOMIC_THREAD_FENCE(memory_order_acquire); There's base/atomicops.h with Acquire_Load() and Release_Store() for this. |