|
|
DescriptionTabManager: Take min_free_kbytes into consideration when killing tabs.
On ChromeOS, kernel notifies Chrome when system gets into low memory
condition. However the low memory calculation misses a parameter
|min_free_kbytes| so the notification comes too late. A kernel fix is
coming to fix the issue.
On Chrome side, TabManager tries to recover memory to normal level. It
mirrors the low memory calculation in kernel to know how many tabs to
kill. So the missing parameter |min_free_kbytes| should be added back.
In the future, try to consolidate the logic in kernel so TabManager
doesn't need to do its own calculation.
BUG=chromium:645512, chrome-os-partner:60498
TEST=cave,peppy,etc.
Review-Url: https://codereview.chromium.org/2618483002
Cr-Commit-Position: refs/heads/master@{#442018}
Committed: https://chromium.googlesource.com/chromium/src/+/38d3bfaf0e8fc3a763b2053b1fdb15a30ec28b15
Patch Set 1 #Patch Set 2 : TabManager: Take min_free_kbytes into consideration when killing tabs. #
Total comments: 2
Messages
Total messages: 22 (7 generated)
cylee@chromium.org changed reviewers: + georgesak@chromium.org, semenzato@chromium.org
Description was changed from ========== TabManager: Take min_free_kbytes into consideration when killing tabs. On ChromeOS, kernel notifies Chrome when system gets into low memory condition. However the low memory calculation misses a parameter |min_free_kbytes| so the notification comes too late. A kernel fix is coming to fix the issue. On Chrome side, TabManager tries to recover memory to normal level. It mirrors the low memory calculation in kernel to know how many tabs to kill. So the missing parameter |min_free_kbytes| should be added back. In the future, try to consolidate the logic in kernel so TabManager doesn't need to do its own calculation. BUG=chromium:645512,chromium:60498 TEST=cave,peppy,etc. ========== to ========== TabManager: Take min_free_kbytes into consideration when killing tabs. On ChromeOS, kernel notifies Chrome when system gets into low memory condition. However the low memory calculation misses a parameter |min_free_kbytes| so the notification comes too late. A kernel fix is coming to fix the issue. On Chrome side, TabManager tries to recover memory to normal level. It mirrors the low memory calculation in kernel to know how many tabs to kill. So the missing parameter |min_free_kbytes| should be added back. In the future, try to consolidate the logic in kernel so TabManager doesn't need to do its own calculation. BUG=chromium:645512,chrome-os-partner:60498 TEST=cave,peppy,etc. ==========
semenzato@google.com changed reviewers: + semenzato@google.com
It's fine if you wish to do it this way. But, as I may have mentioned, the low-memory thresholds are based on heuristics. Adding min_free_kbytes seems logical, but if you add a logical number to a non-logical number, the result is still non-logical. I would just change the threshold in /etc/init/swap.conf, instead of complicating this code. That threshold is already dependent on RAM size. You could make it more board-dependent if your experiments show a benefit from that.
On 2017/01/04 22:57:36, semenzato wrote: > It's fine if you wish to do it this way. But, as I may have mentioned, the > low-memory thresholds are based on heuristics. Adding min_free_kbytes seems > logical, but if you add a logical number to a non-logical number, the result is > still non-logical. > > I would just change the threshold in /etc/init/swap.conf, instead of > complicating this code. That threshold is already dependent on RAM size. You > could make it more board-dependent if your experiments show a benefit from that. Take a look at https://wmatrix.googleplex.com/failures/unfiltered?suites=bvt-perbuild&tests=... There're lots of failures on lots of board everyday . I believe most of them is caused by this problem - low memory signal is raised too late, or the killing speed so too slow. It means low memory handling is actually failing on lots of board for a long time. Tab discarder doesn't work well. OK the result is not logical as well - I accept. But I really don't think adding this value would cause anything disaster. On the other hand, it has a chance to save "tab discarder not work" problem on lots of devices. I see more and more reports complaining seeing sad tabs when memory is low - and they're caused by OOM happens too early. That exactly how the 2 CLs (this one and the kernel patch) handles. Because we need at least a kernel change and a chrome side CL, I don't know how to trybot it before submitting. Please let me try submit the patches and see if it helps the autotest , OK? We can always revert the changes short after if it doesn't help or causes some other problem. It's not a problem on one or two board only, it's a problem on at least a dozen of boards. So even this a heuristic but not logical anyway, I need to apply them to many board at once. Or it's my misunderstanding. Do you mean I should add min_free_kbytes to the low memory margin in swap.conf to achieve the same effect?
On 2017/01/04 23:50:22, cylee1 wrote: > Or it's my misunderstanding. Do you mean I should add min_free_kbytes to the low > memory margin in swap.conf to achieve the same effect? Yes, that's what I meant. You can achieve the same effect by tweaking the multiplier applied to total RAM. There is also a crude form of override of the calculation (works by installing an additional file in the overlay), but that's only for a fixed amount of RAM. If certain devices have worse behavior than others, you may want to tweak the multiplier differently. (Assuming you even want to keep the multiplier---as I said, that worked fine when the number of boards was very small.) Basically I don't think that one size fits all---not even one multiplier. If some devices work well, you may want to leave them alone. Otherwise you'll be unnecessarily reducing the number of tabs they can load. The UMA stats may be enough to decide which devices need tweaks and which ones can be left alone. (Or perhaps even reduce the margin---for instance on those 8 or 16 GB chromebooks.)
sonnyrao@google.com changed reviewers: + sonnyrao@google.com
+1 Seems okay to me -- though I thought the kernel was still sending a signal to chrome when It needed to free memory, I guess that only happens when things are really bad? I kind of dislike having to maintain the same logic in two places but I guess that's where we are at right now https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... File chrome/browser/memory/tab_manager_delegate_chromeos.cc (right): https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... chrome/browser/memory/tab_manager_delegate_chromeos.cc:267: // eliminate the duplication here. that's an interesting idea -- does Chrome have the ability to free up arbitrary amounts of memory or is it just going to discard tabs until it reaches that number?
On 2017/01/05 00:24:26, sonnyrao wrote: > +1 Seems okay to me -- though I thought the kernel was still sending a signal to > chrome when It needed to free memory, I guess that only happens when things are > really bad? I kind of dislike having to maintain the same logic in two places > but I guess that's where we are at right now > > https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... > File chrome/browser/memory/tab_manager_delegate_chromeos.cc (right): > > https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... > chrome/browser/memory/tab_manager_delegate_chromeos.cc:267: // eliminate the > duplication here. > that's an interesting idea -- does Chrome have the ability to free up arbitrary > amounts of memory or is it just going to discard tabs until it reaches that > number? I think the duplicated logic is relatively simple. Since it has been there and Cheng-Yu's CL is just to make it more correct, maintaining the coherence between kernel and chrome won't be that hard. And if possible, gradually moving from heuristic-based to an actual formula will improve the robustness of the low-memory killer, isn't it?
On 2017/01/05 05:12:36, bccheng wrote: > On 2017/01/05 00:24:26, sonnyrao wrote: > > +1 Seems okay to me -- though I thought the kernel was still sending a signal > to > > chrome when It needed to free memory, I guess that only happens when things > are > > really bad? I kind of dislike having to maintain the same logic in two places > > but I guess that's where we are at right now > > > > > https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... > > File chrome/browser/memory/tab_manager_delegate_chromeos.cc (right): > > > > > https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... > > chrome/browser/memory/tab_manager_delegate_chromeos.cc:267: // eliminate the > > duplication here. > > that's an interesting idea -- does Chrome have the ability to free up > arbitrary > > amounts of memory or is it just going to discard tabs until it reaches that > > number? > > I think the duplicated logic is relatively simple. Since it has been there and > Cheng-Yu's CL is just to make it more correct, maintaining the coherence between > kernel and chrome won't be that hard. And if possible, gradually moving from > heuristic-based to an actual formula will improve the robustness of the > low-memory killer, isn't it? As I said, it's fine to do it this way. I am sure it will improve things, and I thank you for working on this. My comment stands though. There's no way that you can "gradually move from heuristics to an actual formula". The entire page allocator is based on heuristics. A possible long-term approach would be a system that "learns" (NOT in the ML sense) what the best threshold would be, starting from UMA-collected data, and adapting to a specific use pattern. Basically see how many OOM vs. tab discards occur during usage, and adjust the threshold to reduce OOMs. Such system should also take into account speed of allocation. When the speed is high, more slack is needed to give the browser time to release enough RAM. However, this is very tentative. If allocation speed is highly variable, this idea won't help.
I respect to your comment, but I tend to look it in a different way: Formula does its best to capture variables explicitly written in the code. Heuristic handles the rest which could not be captured easily. Some parameters are real things, but not heuristic. For example, in low-mem-notify.h there are still some formulas, right ? Like unsigned long available_file_mem = file_mem - dirty_mem - min_file_mem; unsigned long available_mem = free_mem + available_file_mem; You still use free_mem, file_mem, dirty_mem, and min_file_mem to calculate a rough number. Because you know they really affects page allocators. For example, dirty pages just couldn't be swapped easily. It's just a fact we know by code. I don't get the reason why min_file_mem should be listed in the formula, but min_free_kbytes should not ? From the code we just know both of them couldn't be used in normal page allocation path. The low memory margin, however, is still needed because there're some other factors we can not capture in the formula easily. For example, the speed of memory consumption, zone selection, page order, etc. After all we must raise the notification earlier before OOM really happens. How early? There's no way we can get a reasonable fixed number for all devices. So we'd better use a per-board override to control it. All the advanced way you mentioned above could be used to fine-tune the margin here. In another words, I think the formula reflects what applies to all boards (because kernel code says so !). While heuristic applies to what might vary across boards (so we adjust them separately). If you think we shouldn't leave as much memory as possible, I think the right solution should be: Add min_free_kbytes in kernel, then lower the margin for some (or all) boards. In yet another words, the variable "available_mem" in low-mem-notify.h is just not real available mem. It's misleading. It works on some boards now because the low mem margin covers the error. We should let things go back to where it should be. Since we have an autotest now, tuning the margin is easier if there's only one CL involved (the multiplier). Moreover, min_free_kbytes is just ~100M on all machine I've tested. I don't think pulling out 100M of memory would have any perceivable impact to an end user. (roughly just one less tab). On the other hand, an OOM would result in a very visible negative impact - for example, sad tab or system hang or crash ... ... And we gets lots of bug report because of that :-)
https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... File chrome/browser/memory/tab_manager_delegate_chromeos.cc (right): https://codereview.chromium.org/2618483002/diff/20001/chrome/browser/memory/t... chrome/browser/memory/tab_manager_delegate_chromeos.cc:267: // eliminate the duplication here. On 2017/01/05 00:24:26, sonnyrao wrote: > that's an interesting idea -- does Chrome have the ability to free up arbitrary > amounts of memory or is it just going to discard tabs until it reaches that > number? Of course it just tries to discard tabs until the estimated RSS to be released reaches the number :) And yes, I think I should consolidate the logic in kernel.
On 2017/01/05 20:26:42, cylee1 wrote: > I respect to your comment, but I tend to look it in a different way: > > Formula does its best to capture variables explicitly written in the code. > Heuristic handles the rest which could not be captured easily. > > Some parameters are real things, but not heuristic. > For example, in low-mem-notify.h there are still some formulas, right ? Like > unsigned long available_file_mem = file_mem - dirty_mem - min_file_mem; > unsigned long available_mem = free_mem + available_file_mem; > You still use free_mem, file_mem, dirty_mem, and min_file_mem to calculate a > rough number. Because you know they really > affects page allocators. For example, dirty pages just couldn't be swapped > easily. It's just a fact we know by code. > I don't get the reason why min_file_mem should be listed in the formula, but > min_free_kbytes should not ? OK, I see your point. I could argue that most of those variables change continuously except min_file_mem, which however might be changed by us. Min_free_kbytes is less likely to change, but it could too. It may be better to indicate that the formula in the kernel is different. > From the code we just know both of them couldn't be used in normal page > allocation path. > > The low memory margin, however, is still needed because there're some other > factors we can not capture in the formula easily. > For example, the speed of memory consumption, zone selection, page order, etc. > After all we must raise the notification earlier before OOM really happens. > How early? There's no way we can get a reasonable fixed number for all devices. > So we'd better use a per-board override to control it. > All the advanced way you mentioned above could be used to fine-tune the margin > here. > > In another words, I think the formula reflects what applies to all boards > (because kernel code says so !). > While heuristic applies to what might vary across boards (so we adjust them > separately). > > If you think we shouldn't leave as much memory as possible, I think the right > solution should be: > Add min_free_kbytes in kernel, then lower the margin for some (or all) boards. > In yet another words, the variable "available_mem" in low-mem-notify.h is just > not real available mem. It's misleading. > It works on some boards now because the low mem margin covers the error. We > should let things go back to where it should be. > Since we have an autotest now, tuning the margin is easier if there's only one > CL involved (the multiplier). > > Moreover, min_free_kbytes is just ~100M on all machine I've tested. > I don't think pulling out 100M of memory would have any perceivable impact to an > end user. (roughly just one less tab). You're right, only a few users will notice it: those who have a workflow that used to work, and will fail after this change. > On the other hand, an OOM would result in a very visible negative impact - for > example, sad tab or system hang or crash ... > ... And we gets lots of bug report because of that :-) You don't have to convince me that this is useful :)
+1
lgtm
Hi Luigi, Thanks ! I kind of got your point, but I think my implementation makes sense too. > > OK, I see your point. I could argue that most of those variables change > continuously except min_file_mem, which however might be changed by us. > Min_free_kbytes is less likely to change, but it could too. > > It may be better to indicate that the formula in the kernel is different. > > > From the code we just know both of them couldn't be used in normal page > > allocation path. > > > > The low memory margin, however, is still needed because there're some other > > factors we can not capture in the formula easily. > > For example, the speed of memory consumption, zone selection, page order, etc. > > After all we must raise the notification earlier before OOM really happens. > > How early? There's no way we can get a reasonable fixed number for all > devices. > > So we'd better use a per-board override to control it. > > All the advanced way you mentioned above could be used to fine-tune the margin > > here. > > > > In another words, I think the formula reflects what applies to all boards > > (because kernel code says so !). > > While heuristic applies to what might vary across boards (so we adjust them > > separately). > > > > If you think we shouldn't leave as much memory as possible, I think the right > > solution should be: > > Add min_free_kbytes in kernel, then lower the margin for some (or all) > boards. > > In yet another words, the variable "available_mem" in low-mem-notify.h is just > > not real available mem. It's misleading. > > It works on some boards now because the low mem margin covers the error. We > > should let things go back to where it should be. > > Since we have an autotest now, tuning the margin is easier if there's only one > > CL involved (the multiplier). > > > > Moreover, min_free_kbytes is just ~100M on all machine I've tested. > > I don't think pulling out 100M of memory would have any perceivable impact to > an > > end user. (roughly just one less tab). > > You're right, only a few users will notice it: those who have a workflow that > used to work, and will fail after this change. > > > On the other hand, an OOM would result in a very visible negative impact - for > > example, sad tab or system hang or crash ... > > ... And we gets lots of bug report because of that :-) > > You don't have to convince me that this is useful :)
The CQ bit was checked by cylee@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 20001, "attempt_start_ts": 1483728702293860, "parent_rev": "318c5ae5cb37a664c2f068fa2095d293bfc50bcd", "commit_rev": "38d3bfaf0e8fc3a763b2053b1fdb15a30ec28b15"}
Message was sent while issue was closed.
Description was changed from ========== TabManager: Take min_free_kbytes into consideration when killing tabs. On ChromeOS, kernel notifies Chrome when system gets into low memory condition. However the low memory calculation misses a parameter |min_free_kbytes| so the notification comes too late. A kernel fix is coming to fix the issue. On Chrome side, TabManager tries to recover memory to normal level. It mirrors the low memory calculation in kernel to know how many tabs to kill. So the missing parameter |min_free_kbytes| should be added back. In the future, try to consolidate the logic in kernel so TabManager doesn't need to do its own calculation. BUG=chromium:645512,chrome-os-partner:60498 TEST=cave,peppy,etc. ========== to ========== TabManager: Take min_free_kbytes into consideration when killing tabs. On ChromeOS, kernel notifies Chrome when system gets into low memory condition. However the low memory calculation misses a parameter |min_free_kbytes| so the notification comes too late. A kernel fix is coming to fix the issue. On Chrome side, TabManager tries to recover memory to normal level. It mirrors the low memory calculation in kernel to know how many tabs to kill. So the missing parameter |min_free_kbytes| should be added back. In the future, try to consolidate the logic in kernel so TabManager doesn't need to do its own calculation. BUG=chromium:645512,chrome-os-partner:60498 TEST=cave,peppy,etc. Review-Url: https://codereview.chromium.org/2618483002 Cr-Commit-Position: refs/heads/master@{#442018} Committed: https://chromium.googlesource.com/chromium/src/+/38d3bfaf0e8fc3a763b2053b1fdb... ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://chromium.googlesource.com/chromium/src/+/38d3bfaf0e8fc3a763b2053b1fdb... |