Created: 11 years, 6 months ago by Craig Schlenter
Modified: 9 years, 7 months ago
CC: chromium-reviews@googlegroups.com
Visibility: Public.
Description:
Use res_ninit for thread safety and a timer for reloading resolv.conf on Linux.
We will only reload resolv.conf once per second per thread when DNS lookups
fail. This should match the behaviour of Mozilla.
[retry of r17530, passes Valgrind now]
BUG=12740
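The once-per-second-per-thread throttle described above can be sketched roughly as follows. This is an illustrative stand-in, not the patch itself: the real code uses base::TimeTicks and base/thread_local_storage.h, and the DnsReloadTimer/Expired() names here simply mirror the dns_timer->Expired() usage discussed later in the thread.

```cpp
#include <cassert>
#include <chrono>

// Illustrative sketch of the per-thread rate limiter: Expired() returns true
// at most once per second, so callers reload resolv.conf (via res_ninit)
// only when a lookup has failed AND at least a second has passed.
class DnsReloadTimer {
 public:
  bool Expired() {
    auto now = std::chrono::steady_clock::now();
    if (!initialized_ || now - last_reload_ >= std::chrono::seconds(1)) {
      initialized_ = true;
      last_reload_ = now;
      return true;
    }
    return false;
  }

 private:
  bool initialized_ = false;
  std::chrono::steady_clock::time_point last_reload_;
};

// One timer per thread, so threads never contend on a shared lock.
thread_local DnsReloadTimer dns_timer;
```

The first call always reports expiry (forcing an initial reload attempt); an immediate second call within the same second does not.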
Patch Set 1
Patch Set 2: restart timer when expired
Patch Set 3: use singleton and thread_local_storage
  Total comments: 4
Patch Set 4: style fixes
  Total comments: 1
Patch Set 5: add some prose
Patch Set 6: call res_nclose as per Lei Zhang

Messages
Total messages: 19 (0 generated)
+dns expert
Since this data is stored per-thread, I think it would make more sense to use thread-local storage instead of a lock+map. See base/thread_local_storage.h.
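For readers unfamiliar with base/thread_local_storage.h: it wraps the platform TLS primitive, which on Linux is pthread keys. A rough, hypothetical stand-in for the suggestion above (everything here is illustrative; only the Chromium header name is real) shows why a TLS slot beats a lock+map -- the destructor registered with the key frees each thread's object automatically on thread exit:

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for the per-thread object the patch stores
// (the real code stores a heap-allocated base::TimeTicks).
struct ThreadTimer {
  long last_reload_ticks = 0;
};

static pthread_key_t tls_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

// Runs automatically when each thread exits, so the slot never leaks;
// a lock+map alternative would need explicit per-thread cleanup.
static void DestroyTimer(void* ptr) {
  delete static_cast<ThreadTimer*>(ptr);
}

static void MakeKey() {
  pthread_key_create(&tls_key, DestroyTimer);
}

// Lazily creates this thread's timer on first use, analogous to
// tls_index_.Get() followed by Set() in the patch under review.
ThreadTimer* GetThreadTimer() {
  pthread_once(&key_once, MakeKey);
  void* slot = pthread_getspecific(tls_key);
  if (!slot) {
    slot = new ThreadTimer;
    pthread_setspecific(tls_key, slot);
  }
  return static_cast<ThreadTimer*>(slot);
}
```

Within a single thread, repeated calls return the same slot, and no locking is needed on the lookup path.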
On 2009/05/31 17:25:57, Evan Martin wrote:
> Since this data is stored per-thread, I think it would make more sense to use
> thread-local storage instead of a lock+map. See base/thread_local_storage.h.

Ah, useful, thanks ... I didn't know about that. I had tried some stuff with
glibc's __thread initially, but that is good only for POD types. Will poke at
thread_local_storage tomorrow...
The res_ninit piece looks OK to me (I had to read more about it). A few nits:
- don't forget about DISALLOW_COPY_AND_ASSIGN
- consider using Singleton instead of a static (see base/singleton.h)

Also, I would consider a simple unit test for the timer on Linux, even just to
ensure it doesn't leak memory. Unit tests are very frequently leak-tested,
while this added logic may otherwise not be.
Thanks for the review comments! I have not forgotten about this and will try to get to sorting this out over the next few days ... sorry for the delay.
I have uploaded a new patch that uses singleton and thread_local_storage. I haven't created a unit test for it yet (still need to read up on gtest a bit) but if anyone is bored, I'd appreciate comments on how it looks so far please. Should I split the class off into a separate file? Naming suggestions? Thank you!
The structure of this looks good to me, I have only style nits.

http://codereview.chromium.org/118061/diff/7/1005
File net/base/host_resolver.cc (right):

http://codereview.chromium.org/118061/diff/7/1005#newcode66
Line 66: base::TimeTicks* timer_ptr = static_cast<base::TimeTicks*>(tls_index_.Get());
Over the 80-column limit in a few places like here (try looking at the diff in
Rietveld).

http://codereview.chromium.org/118061/diff/7/1005#newcode69
Line 69: base::TimeTicks *t = new base::TimeTicks();
Prefer descriptive variable names; you could probably just use timer_ptr here,
since that is what you're filling in in this code.

http://codereview.chromium.org/118061/diff/7/1005#newcode85
Line 85: delete(tls_data);
No parens around the delete argument.

http://codereview.chromium.org/118061/diff/7/1005#newcode89
Line 89: // We use thread local store to identify which base::TimeTicks to interact with.
s/store/storage/
On 2009/06/03 18:00:43, Evan Martin wrote:
> The structure of this looks good to me, I have only style nits.

Should hopefully all be fixed now. Thanks!

> http://codereview.chromium.org/118061/diff/7/1005#newcode66
> Line 66: base::TimeTicks* timer_ptr =
>     static_cast<base::TimeTicks*>(tls_index_.Get());
> over the 80 column limit in a few places like here
> (try looking at the diff in rietveld)

Done.

> http://codereview.chromium.org/118061/diff/7/1005#newcode69
> Line 69: base::TimeTicks *t = new base::TimeTicks();
> prefer descriptive variable names
> you could probably just use timer_ptr here, since that is what you're
> filling in in this code.

Done.

> http://codereview.chromium.org/118061/diff/7/1005#newcode85
> Line 85: delete(tls_data);
> no parens around delete argument

Done.

> http://codereview.chromium.org/118061/diff/7/1005#newcode89
> Line 89: // We use thread local store to identify which base::TimeTicks to
> interact with.
> s/store/storage/

Done.
This LGTM, just one more comment request.

http://codereview.chromium.org/118061/diff/1007/11
File net/base/host_resolver.cc (right):

http://codereview.chromium.org/118061/diff/1007/11#newcode49
Line 49: #if defined(OS_LINUX)
Would you mind expanding this comment into a description of what the problem
is and our fix? Then your comment above the class can just describe what the
class does -- that is, it's a per-thread timer used to rate-limit calling
res_ninit(). And more exposition can go in the above descriptive comment.
> http://codereview.chromium.org/118061/diff/1007/11#newcode49
> Line 49: #if defined(OS_LINUX)
> Would you mind expanding this comment to a description of what the problem is
> and our fix?

I've added some comments. Some of it is arguably speculation, since l.colitti
still has issues with the normal res_init call, but hopefully the thread-aware
res_ninit will sort that out. Thank you.

> Then your comment above the class can just be describing what the class does
> -- that is, it's a per-thread timer used to rate-limit calling res_ninit().
> And more exposition can go in the above descriptive comment.
LGTM, will commit
Hey Craig, the valgrind bot says we're leaking memory now. Can you take a look? http://build.chromium.org/buildbot/waterfall.fyi/builders/Chromium%20Linux%20...
On 2009/06/03 23:53:01, Lei Zhang wrote:
> Hey Craig, the valgrind bot says we're leaking memory now. Can you take a look?
> http://build.chromium.org/buildbot/waterfall.fyi/builders/Chromium%20Linux%20...

Sorry! Will poke at it in an hour or so when I'm near a computer again... I had
some printf's in it originally and they seemed to match ... Hmmm.
On 2009/06/04 03:41:19, Craig Schlenter wrote:
> Sorry! Will poke at it in an hour or so when I'm near a computer again... I
> had some printf's in it originally and they seemed to match ... Hmmm.

Argghhh! It's the call to res_ninit that's creating the problem. When I change
back to res_init, it's fine. The command I'm running is:

  valgrind --smc-check=all --leak-check=full \
      --suppressions=tools/valgrind/suppressions.txt \
      sconsbuild/Release/unit_tests --gtest_filter='DnsMasterTest.*'

I see there is an undocumented (well, it's not in the man page) res_close call
that might be useful in this situation, but I'll have to skill up on how the
worker pool stuff works, because I assume that res_close should only be called
when the worker thread terminates. I'll also have to sift through some glibc
code, etc.

Do you want to revert in the meantime? I'm running on ZA time (== GMT+2) and
will probably only get to poking at this stuff properly in about 10 hours from
now.

Thank you!
On 2009/06/04 05:26:04, Craig Schlenter wrote:
> Argghhh! It's the call to res_ninit that's creating the problem. When I
> change back to res_init, it's fine.
> [snip]
> Do you want to revert in the meantime?

I'm reverting this then. Thanks for looking at it.
> I see there is an undocumented (well it's not in the man page) res_close call
> that might be useful in this situation but I'll have to skill up on how the
> worker pool stuff works because I assume that res_close should only be called
> when the worker thread terminates.

Had a quick scan through the glibc code ... it's res_nclose that will do the
trick, I think. Clever suggestions are welcome as to where to call it :)
On 2009/06/04 05:37:10, Craig Schlenter wrote:
> Had a quick scan through the glibc code ... it's res_nclose that will do the
> trick I think. Clever suggestions are welcome as to where to call it :)

Does this work? (I'm not a dns expert and I could be wrong)

  if (err && dns_timer->Expired()) {
    res_nclose(&_res);
    if (!res_ninit(&_res))
      err = getaddrinfo(host.c_str(), port.c_str(), &hints, out);
  }

Sleep is good. Oh look at the time! Zzzzz.
On 2009/06/04 06:51:13, Lei Zhang wrote:
> Does this work? (I'm not a dns expert and I could be wrong)
>
>   if (err && dns_timer->Expired()) {
>     res_nclose(&_res);
>     if (!res_ninit(&_res))
>       err = getaddrinfo(host.c_str(), port.c_str(), &hints, out);
>   }
>
> Sleep is good. Oh look at the time! Zzzzz.

Thanks! I had thought of doing this (and I tried it now and it appears to
work), but calling close before init is potentially bad if init hasn't been
called first, I think. In glibc, res_close looks like this:

  void
  res_close(void) {
  #ifdef _LIBC
          /*
           * Some stupid programs out there call res_close() before res_init().
           * Since _res._vcsock isn't explicitly initialized, these means that
           * we could do a close(0), which might lead to some security problems.
           * Therefore we check if res_init() was called before by looking at
           * the RES_INIT bit in _res.options. If it hasn't been set we bail out
           * early.
           */
          if ((_res.options & RES_INIT) == 0)
                  return;
  #endif
          /* We don't free the name server addresses because we never did it
             and it would be done implicitly on shutdown. */
          __res_iclose(&_res, false);
  }

but res_nclose looks like this:

  void
  res_nclose(res_state statp) {
          __res_iclose(statp, true);
  }

i.e. it doesn't implement a similar check, and perhaps it should. An easy fix
might be poking at the res structure and doing the check ourselves before
calling res_nclose ... Alternately, if we call res_ninit when the thread is
started and res_nclose when the thread terminates, that should also work.
On 2009/06/04 07:35:17, Craig Schlenter wrote:
> Thanks! I had thought of doing this (and I tried it now and it appears to
> work) but calling close before init is potentially bad if init hasn't been
> called first I think. [snip]

I think the reason this is working is because the first call to any of the
resolver functions will end up initialising things through an internal glibc
function called __res_maybe_init ... since we have just done a failed DNS
lookup before we call res_nclose, it should be safe to call res_nclose
unconditionally.

Patch updated. Passes valgrind on my machine. Thank you!
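The control flow the final patch set converged on -- do the lookup, and only on failure with an expired per-thread timer call res_nclose, then res_ninit, then retry -- can be sketched with the resolver calls stubbed out so the logic is visible and testable. LookupWithReload and the callback parameters are illustrative names, not the patch's API; the real code calls getaddrinfo, res_nclose(&_res), and res_ninit(&_res) directly, relying on the preceding failed lookup to have already initialised _res so the unconditional res_nclose is safe.

```cpp
#include <cassert>
#include <functional>

// Stubbed control flow of the final patch set. In the real code:
//   resolve -> getaddrinfo(...)                       (0 on success)
//   reload  -> res_nclose(&_res); res_ninit(&_res)    (0 on success)
//   expired -> the per-thread timer's Expired()
int LookupWithReload(const std::function<int()>& resolve,
                     const std::function<int()>& reload,
                     const std::function<bool()>& expired) {
  int err = resolve();
  // Reload resolv.conf only after a failure, and at most once per second
  // per thread. The failed call above guarantees the resolver state was
  // initialised, so closing it unconditionally inside reload is safe.
  if (err && expired()) {
    if (reload() == 0)
      err = resolve();
  }
  return err;
}
```

With a resolve stub that fails once and then succeeds, the function retries exactly once and returns success; when the timer has not expired, the original failure is returned unchanged and no reload happens.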