|
|
Created:
5 years, 2 months ago by ssid Modified:
5 years, 2 months ago CC:
chromium-reviews, darin-cc_chromium.org, jam, tracing+reviews_chromium.org, wfh+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@resident Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionHandle EAGAIN error for the mincore syscall used
EAGAIN error case was not handled for mincore sys call used. This CL
fixes this issue and changes the CountResidentBytes method to return
false on failures.
BUG=542503
Committed: https://crrev.com/999ca54e61026ee8e68bdc4f69cc0f47eb21cf38
Cr-Commit-Position: refs/heads/master@{#354275}
Patch Set 1 #Patch Set 2 : Fixes #Patch Set 3 : Rebase. #
Total comments: 14
Patch Set 4 : Add comment. #Patch Set 5 : Fixes. #Patch Set 6 : Ignore error in release builds. #
Total comments: 6
Patch Set 7 : Fixes. #
Total comments: 6
Patch Set 8 : Return 0 on error. #Messages
Total messages: 40 (8 generated)
ssid@chromium.org changed reviewers: + primiano@chromium.org
PTAL, Thanks.
lgtm https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == EAGAIN && error_counter++ < 100); i'd cap this at 3. Typically either this is a temporary issue, or it will last longer.
ssid@chromium.org changed reviewers: + avi@chromium.org
+avi : Please take a look. This is change to reflect the base api change. +reveman, thestig FYI.
reveman@chromium.org changed reviewers: + reveman@chromium.org
https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:65: do { why do we allow retries here but not above? https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... File content/common/host_discardable_shared_memory_manager.cc (right): https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... content/common/host_discardable_shared_memory_manager.cc:225: NOTREACHED(); please bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... DCHECK(rv); instead. but if we keep having a DCHECK/NOTREACHED here, how is allowing CountResidentBytes to fail fix anything?
https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:65: do { On 2015/10/14 12:31:28, reveman wrote: > why do we allow retries here but not above? I guess because mac's manpage for mincore says it can only return EINVAL or EFAULT
https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:65: do { On 2015/10/14 12:41:33, Primiano Tucci wrote: > On 2015/10/14 12:31:28, reveman wrote: > > why do we allow retries here but not above? > > I guess because mac's manpage for mincore says it can only return EINVAL or > EFAULT yes. https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == EAGAIN && error_counter++ < 100); On 2015/10/14 11:27:08, Primiano Tucci wrote: > i'd cap this at 3. Typically either this is a temporary issue, or it will last > longer. Added comment why it is 100. https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... File content/common/host_discardable_shared_memory_manager.cc (right): https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... content/common/host_discardable_shared_memory_manager.cc:225: NOTREACHED(); On 2015/10/14 12:31:29, reveman wrote: > please > > bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... > DCHECK(rv); > > instead. > > but if we keep having a DCHECK/NOTREACHED here, how is allowing > CountResidentBytes to fail fix anything? In general it should not fail. What I want is in release builds it should show right values. But in debug build it is good to see if this is failing and not hide it in the tests running.
https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:65: do { On 2015/10/14 at 12:41:33, Primiano Tucci wrote: > On 2015/10/14 12:31:28, reveman wrote: > > why do we allow retries here but not above? > > I guess because mac's manpage for mincore says it can only return EINVAL or EFAULT Got it. Maybe add a comment here or above to explain the difference. https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == EAGAIN && error_counter++ < 100); On 2015/10/14 at 11:27:08, Primiano Tucci wrote: > i'd cap this at 3. Typically either this is a temporary issue, or it will last longer. is it worth retrying at all? https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... File content/common/host_discardable_shared_memory_manager.cc (right): https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... content/common/host_discardable_shared_memory_manager.cc:225: NOTREACHED(); On 2015/10/14 at 12:44:29, ssid wrote: > On 2015/10/14 12:31:29, reveman wrote: > > please > > > > bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... > > DCHECK(rv); > > > > instead. > > > > but if we keep having a DCHECK/NOTREACHED here, how is allowing > > CountResidentBytes to fail fix anything? > > In general it should not fail. What I want is in release builds it should show right values. But in debug build it is good to see if this is failing and not hide it in the tests running. ok, sounds good. I don't like when NOTREACHED/DCHECKs are part of the structure of the code. It should be possible to removed the DCHECK without making any other changes and the code should still work and not look weird. Can we do the following instead and dump resident_size=0 when it fails? size_t resident_size = 0; bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... DCHECK(rv); pmd->GetSharedGlobalAllocatorDump(shared_segment_guid) ->AddScalar(...
made changes. https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:65: do { On 2015/10/14 13:03:53, reveman wrote: > On 2015/10/14 at 12:41:33, Primiano Tucci wrote: > > On 2015/10/14 12:31:28, reveman wrote: > > > why do we allow retries here but not above? > > > > I guess because mac's manpage for mincore says it can only return EINVAL or > EFAULT > > Got it. Maybe add a comment here or above to explain the difference. Done. https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == EAGAIN && error_counter++ < 100); On 2015/10/14 13:03:53, reveman wrote: > On 2015/10/14 at 11:27:08, Primiano Tucci wrote: > > i'd cap this at 3. Typically either this is a temporary issue, or it will last > longer. > > is it worth retrying at all? That is how eagain is handled in code. Like in case of EINTR. https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... File content/common/host_discardable_shared_memory_manager.cc (right): https://codereview.chromium.org/1398163003/diff/40001/content/common/host_dis... content/common/host_discardable_shared_memory_manager.cc:225: NOTREACHED(); On 2015/10/14 13:03:53, reveman wrote: > On 2015/10/14 at 12:44:29, ssid wrote: > > On 2015/10/14 12:31:29, reveman wrote: > > > please > > > > > > bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... > > > DCHECK(rv); > > > > > > instead. > > > > > > but if we keep having a DCHECK/NOTREACHED here, how is allowing > > > CountResidentBytes to fail fix anything? > > > > In general it should not fail. What I want is in release builds it should show > right values. But in debug build it is good to see if this is failing and not > hide it in the tests running. > > ok, sounds good. I don't like when NOTREACHED/DCHECKs are part of the structure > of the code. It should be possible to removed the DCHECK without making any > other changes and the code should still work and not look weird. > > Can we do the following instead and dump resident_size=0 when it fails? > > size_t resident_size = 0; > bool rv = base::trace_event::ProcessMemoryDump::CountResidentBytes(... > DCHECK(rv); > pmd->GetSharedGlobalAllocatorDump(shared_segment_guid) > ->AddScalar(... Done.
https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == EAGAIN && error_counter++ < 100); On 2015/10/14 at 13:35:55, ssid wrote: > On 2015/10/14 13:03:53, reveman wrote: > > On 2015/10/14 at 11:27:08, Primiano Tucci wrote: > > > i'd cap this at 3. Typically either this is a temporary issue, or it will last > > longer. > > > > is it worth retrying at all? > > That is how eagain is handled in code. Like in case of EINTR. Most of the EAGAIN code I see in the chromium code base doesn't retry. Debug builds retry forever in the case of EINTR and release builds will ignore the failure and pretend it worked fine after a number of attempts. Can we do the same here? That would mean that this function doesn't need a bool return value after all. I think one of these two options should be used: 1. CountResidentBytes can fail and we don't bother retrying inside this function when receiving EAGAIN. The caller can always retry if that makes sense. 2. CountResidentBytes always succeeds and we retry inside this function when receiving EAGAIN. I prefer option 2) as that makes it easier for users of this function but I'm not familiar with the reason EAGAIN would be returned to tell if this would be OK.
On 2015/10/14 14:17:11, reveman wrote: > https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... > File base/trace_event/process_memory_dump.cc (right): > > https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... > base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == > EAGAIN && error_counter++ < 100); > On 2015/10/14 at 13:35:55, ssid wrote: > > On 2015/10/14 13:03:53, reveman wrote: > > > On 2015/10/14 at 11:27:08, Primiano Tucci wrote: > > > > i'd cap this at 3. Typically either this is a temporary issue, or it will > last > > > longer. > > > > > > is it worth retrying at all? > > > > That is how eagain is handled in code. Like in case of EINTR. > > Most of the EAGAIN code I see in the chromium code base doesn't retry. Debug > builds retry forever in the case of EINTR and release builds will ignore the > failure and pretend it worked fine after a number of attempts. Can we do the > same here? That would mean that this function doesn't need a bool return value > after all. > > I think one of these two options should be used: > > 1. CountResidentBytes can fail and we don't bother retrying inside this function > when receiving EAGAIN. The caller can always retry if that makes sense. > 2. CountResidentBytes always succeeds and we retry inside this function when > receiving EAGAIN. > > I prefer option 2) as that makes it easier for users of this function but I'm > not familiar with the reason EAGAIN would be returned to tell if this would be > OK. The mincore call can also fail because of other error. So for sure this function needs a return value. whether to handle eagain or not I leave it to the base owners. I was told by thestig and primiano to handle errors properly. That is the reason for this change. primiano@ WDYT?
On 2015/10/14 14:17:11, reveman wrote: > https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... > File base/trace_event/process_memory_dump.cc (right): > > https://codereview.chromium.org/1398163003/diff/40001/base/trace_event/proces... > base/trace_event/process_memory_dump.cc:67: } while (res == -1 && errno == > EAGAIN && error_counter++ < 100); > On 2015/10/14 at 13:35:55, ssid wrote: > > On 2015/10/14 13:03:53, reveman wrote: > > > On 2015/10/14 at 11:27:08, Primiano Tucci wrote: > > > > i'd cap this at 3. Typically either this is a temporary issue, or it will > last > > > longer. > > > > > > is it worth retrying at all? > > > > That is how eagain is handled in code. Like in case of EINTR. > > Most of the EAGAIN code I see in the chromium code base doesn't retry. Debug > builds retry forever in the case of EINTR and release builds will ignore the > failure and pretend it worked fine after a number of attempts. Can we do the > same here? That would mean that this function doesn't need a bool return value > after all. > > I think one of these two options should be used: > > 1. CountResidentBytes can fail and we don't bother retrying inside this function > when receiving EAGAIN. The caller can always retry if that makes sense. > 2. CountResidentBytes always succeeds and we retry inside this function when > receiving EAGAIN. > > I prefer option 2) as that makes it easier for users of this function but I'm > not familiar with the reason EAGAIN would be returned to tell if this would be > OK. The mincore call can also fail because of other error. So for sure this function needs a return value. whether to handle eagain or not I leave it to the base owners. I was told by thestig and primiano to handle errors properly. That is the reason for this change. primiano@ WDYT?
On 2015/10/14 14:37:20, ssid wrote: > The mincore call can also fail because of other error. So for sure this function > needs a return value. > whether to handle eagain or not I leave it to the base owners. I was told by > thestig and primiano to handle errors properly. That is the reason for this > change. > primiano@ WDYT? I'm in the handle EAGAIN camp. Though I imagine getting EAGAIN will be rare. Also: typo in the commit msg: "flase"
On 2015/10/14 at 16:35:47, thestig wrote: > On 2015/10/14 14:37:20, ssid wrote: > > The mincore call can also fail because of other error. So for sure this function > > needs a return value. > > whether to handle eagain or not I leave it to the base owners. I was told by > > thestig and primiano to handle errors properly. That is the reason for this > > change. > > primiano@ WDYT? > > I'm in the handle EAGAIN camp. Though I imagine getting EAGAIN will be rare. Ok, great! Handling this like EINTR and have the function always succeed sgtm.
On 2015/10/14 16:42:31, reveman wrote: > On 2015/10/14 at 16:35:47, thestig wrote: > > On 2015/10/14 14:37:20, ssid wrote: > > > The mincore call can also fail because of other error. So for sure this > function > > > needs a return value. > > > whether to handle eagain or not I leave it to the base owners. I was told by > > > thestig and primiano to handle errors properly. That is the reason for this > > > change. > > > primiano@ WDYT? > > > > I'm in the handle EAGAIN camp. Though I imagine getting EAGAIN will be rare. > > Ok, great! Handling this like EINTR and have the function always succeed sgtm. The function still can fail with different errors like ENOMEM. Should I DCHECK for these? which would mean in release builds we return a wrong value. Or if any of mincore call fails, we return 0, which still seems wrong since the client can assume that 0 bytes are resident and we are doing measuring it right.
On 2015/10/14 at 16:47:41, ssid wrote: > On 2015/10/14 16:42:31, reveman wrote: > > On 2015/10/14 at 16:35:47, thestig wrote: > > > On 2015/10/14 14:37:20, ssid wrote: > > > > The mincore call can also fail because of other error. So for sure this > > function > > > > needs a return value. > > > > whether to handle eagain or not I leave it to the base owners. I was told by > > > > thestig and primiano to handle errors properly. That is the reason for this > > > > change. > > > > primiano@ WDYT? > > > > > > I'm in the handle EAGAIN camp. Though I imagine getting EAGAIN will be rare. > > > > Ok, great! Handling this like EINTR and have the function always succeed sgtm. > > The function still can fail with different errors like ENOMEM. Should I DCHECK for these? which would mean in release builds we return a wrong value. Or if any of mincore call fails, we return 0, which still seems wrong since the client can assume that 0 bytes are resident and we are doing measuring it right. DHCECK and return 0 sgtm.
On 2015/10/14 17:12:30, reveman wrote: > DHCECK and return 0 sgtm. Hmm how do we distinguish that from the (rare) case in which mincore should happen? What happens if in some future version of android somebody accidentally (or not) selinux-es mincore? Are we going to just silently ignore that? Not too excited by the fact that we cannot tell the difference between "something wrong happened" or "there is nothing resident here".
On 2015/10/14 at 17:25:26, primiano wrote: > On 2015/10/14 17:12:30, reveman wrote: > > DHCECK and return 0 sgtm. > Hmm how do we distinguish that from the (rare) case in which mincore should happen? What happens if in some future version of android somebody accidentally (or not) selinux-es mincore? Are we going to just silently ignore that? > Not too excited by the fact that we cannot tell the difference between "something wrong happened" or "there is nothing resident here". We'd crash in debug builds and ignore the error in release. Some log output in release builds might be good.
On 2015/10/14 17:52:10, reveman wrote: > On 2015/10/14 at 17:25:26, primiano wrote: > > On 2015/10/14 17:12:30, reveman wrote: > > > DHCECK and return 0 sgtm. > > Hmm how do we distinguish that from the (rare) case in which mincore should > happen? What happens if in some future version of android somebody accidentally > (or not) selinux-es mincore? Are we going to just silently ignore that? > > Not too excited by the fact that we cannot tell the difference between > "something wrong happened" or "there is nothing resident here". > > We'd crash in debug builds and ignore the error in release. Some log output in > release builds might be good. hmm ok, sgtm with a LOG(ERROR) or similar.
ssid@chromium.org changed reviewers: - avi@chromium.org
Made changes PTAL thanks. -avi
https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:59: << "Mincore call failed. The resident size may not be accurate"; nit: s/Mincore/mincore()/ also, if you continue here it will not be non-accurate, it will be 0. https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:76: << "Mincore call failed. The resident size may not be accurate"; can we avoid duplicating this and above?
https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:59: << "Mincore call failed. The resident size may not be accurate"; On 2015/10/15 11:20:18, Primiano Tucci wrote: > nit: s/Mincore/mincore()/ > also, if you continue here it will not be non-accurate, it will be 0. The call to mincore failed only once. The next set of pages could pass and give some value. So, mentioned not accurate. https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:76: << "Mincore call failed. The resident size may not be accurate"; On 2015/10/15 11:20:18, Primiano Tucci wrote: > can we avoid duplicating this and above? I would have to add 3 extra #if defined. that is why duplicating.
https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:76: << "Mincore call failed. The resident size may not be accurate"; On 2015/10/15 11:23:43, ssid wrote: > On 2015/10/15 11:20:18, Primiano Tucci wrote: > > can we avoid duplicating this and above? > > I would have to add 3 extra #if defined. that is why duplicating. ?? how about you just add this to before line 87: DCHECK_EQ(0, res); if (res) { LOG(ERROR)<< "mincore() call failed. The resident size may not be accurate"; total_resident_size = 0; }
Patchset #7 (id:120001) has been deleted
made changes. https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/100001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:76: << "Mincore call failed. The resident size may not be accurate"; On 2015/10/15 11:27:13, Primiano Tucci wrote: > On 2015/10/15 11:23:43, ssid wrote: > > On 2015/10/15 11:20:18, Primiano Tucci wrote: > > > can we avoid duplicating this and above? > > > > I would have to add 3 extra #if defined. that is why duplicating. > > ?? > how about you just add this to before line 87: > > DCHECK_EQ(0, res); > if (res) { > LOG(ERROR)<< "mincore() call failed. The resident size may not be accurate"; > total_resident_size = 0; > } Yes, sounds good.
https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:58: continue; I wouldn't bother giving partials results really. If mincore fails at that point is better to just return a 0, which means that you can just assign result = mincore, and break; instead of continuing here. https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:68: res = mincore(chunk_start, chunk_size, vector_as_array(&vec)); similarly, just use result here, and break below https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:85: << "mincore() call failed. The resident size may not be accurate"; and here total_resident_size = 0 if result
done. https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... File base/trace_event/process_memory_dump.cc (right): https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:58: continue; On 2015/10/15 11:44:07, Primiano Tucci wrote: > I wouldn't bother giving partials results really. If mincore fails at that point > is better to just return a 0, which means that you can just assign result = > mincore, and break; instead of continuing here. Done. https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:68: res = mincore(chunk_start, chunk_size, vector_as_array(&vec)); On 2015/10/15 11:44:06, Primiano Tucci wrote: > similarly, just use result here, and break below Done. https://codereview.chromium.org/1398163003/diff/140001/base/trace_event/proce... base/trace_event/process_memory_dump.cc:85: << "mincore() call failed. The resident size may not be accurate"; On 2015/10/15 11:44:07, Primiano Tucci wrote: > and here total_resident_size = 0 if result Done.
lgtm
lgtm
The CQ bit was checked by ssid@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1398163003/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1398163003/160001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by ssid@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1398163003/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1398163003/160001
Message was sent while issue was closed.
Committed patchset #8 (id:160001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/999ca54e61026ee8e68bdc4f69cc0f47eb21cf38 Cr-Commit-Position: refs/heads/master@{#354275} |