|
|
Created:
4 years, 1 month ago by scheglov Modified:
4 years, 1 month ago CC:
reviews_dartlang.org Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionEvict from FileByteStore by the total cache size in bytes.
Limit the size in Analysis Server to 1 GiB.
Ideally we would need to in constructor all files by access time.
But requesting stats is much more expensive than just length :-(
R=brianwilkerson@google.com, paulberry@google.com
BUG=
Committed: https://github.com/dart-lang/sdk/commit/89eabc80836b25ce79c1c085d8280225d667cb89
Patch Set 1 #
Total comments: 2
Patch Set 2 : Clean up cache in a separate isolate. #
Total comments: 7
Messages
Total messages: 15 (2 generated)
Description was changed from ========== Evict from FileByteStore by the total cache size in bytes. Limit the size in Analysis Server to 1 GiB. R=brianwilkerson@google.com, paulberry@google.com BUG= ========== to ========== Evict from FileByteStore by the total cache size in bytes. Limit the size in Analysis Server to 1 GiB. Ideally we would need to in constructor all files by access time. But requesting stats is much more expensive than just length :-( R=brianwilkerson@google.com, paulberry@google.com BUG= ==========
lgtm https://codereview.chromium.org/2492933002/diff/1/pkg/analyzer/lib/src/dart/a... File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): https://codereview.chromium.org/2492933002/diff/1/pkg/analyzer/lib/src/dart/a... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:71: file.delete(); Do we want to decrease the size if we can't actually delete the file? https://codereview.chromium.org/2492933002/diff/1/pkg/analyzer/test/src/dart/... File pkg/analyzer/test/src/dart/analysis/file_byte_store_test.dart (right): https://codereview.chromium.org/2492933002/diff/1/pkg/analyzer/test/src/dart/... pkg/analyzer/test/src/dart/analysis/file_byte_store_test.dart:107: static List<int> _b(int length) { Really ?!? If we can't just write "new List<int>", could we at least have a name with some meaning, such as "_bytes"?
On 2016/11/10 22:00:37, scheglov wrote: IMHO this approach has several problems: 1. At startup time we read the entire contents of the directory. If the directory grows to 1GB in size, that would be a large startup penalty. 2. We can't rely on the order of files returned by the filesystem to determine which files to delete. What if the filesystem returns the newest files first? Then old files will clog up the cache. What if the filesystem returns files in alphabetical order by name? Then the files with hashes that sort toward the end will clog up the cache. 3. We are assuming that we can figure out the current size of the cache by taking the size at startup and adding and subtracting the sizes of files we've added and removed. This is not a safe assumption--the user might have multiple copies of the analyzer running at once; if another process is also adding to the cache, our estimate of the size might be very far off. How about this alternative proposal: - At startup time we kick off an isolate whose job is to compute the size of the cache directory and remove old files from it if it is bigger than 1 GB. Now it doesn't matter that requesting stats is slow, since it won't slow down the main isolate. - Each time we write a file to the cache, we accumulate that file's size into a counter. Once that counter reaches 0.1 GB, we reset it to zero and kick off another isolate to remove old files. This would keep the cache size below 1.1 GB most of the time without any penalty to the running analyzer, and even in the worst case where there are several analyzers running, the worst case cache size is bounded by (1 + 0.1*N) GB, where N is the number of analyzer processes running. What do you think?
On 2016/11/10 22:24:43, Paul Berry wrote: > On 2016/11/10 22:00:37, scheglov wrote: > > IMHO this approach has several problems: > > 1. At startup time we read the entire contents of the directory. If the > directory grows to 1GB in size, that would be a large startup penalty. > > 2. We can't rely on the order of files returned by the filesystem to determine > which files to delete. What if the filesystem returns the newest files first? > Then old files will clog up the cache. What if the filesystem returns files in > alphabetical order by name? Then the files with hashes that sort toward the end > will clog up the cache. > > 3. We are assuming that we can figure out the current size of the cache by > taking the size at startup and adding and subtracting the sizes of files we've > added and removed. This is not a safe assumption--the user might have multiple > copies of the analyzer running at once; if another process is also adding to the > cache, our estimate of the size might be very far off. > > How about this alternative proposal: > > - At startup time we kick off an isolate whose job is to compute the size of the > cache directory and remove old files from it if it is bigger than 1 GB. Now it > doesn't matter that requesting stats is slow, since it won't slow down the main > isolate. > > - Each time we write a file to the cache, we accumulate that file's size into a > counter. Once that counter reaches 0.1 GB, we reset it to zero and kick off > another isolate to remove old files. > > This would keep the cache size below 1.1 GB most of the time without any penalty > to the running analyzer, and even in the worst case where there are several > analyzers running, the worst case cache size is bounded by (1 + 0.1*N) GB, where > N is the number of analyzer processes running. > > What do you think? Unfortunately after further reading about access time, I found that it does not work well on Windows. http://superuser.com/questions/251263/the-last-access-date-is-not-changed-eve... So, I will need an alternative approach. I'm going to use an explicitly managed list of last read files in a separate file. While there is a chance of race condition, it should not cause permanent or consistent problems.
On 2016/11/11 17:18:40, scheglov wrote: > On 2016/11/10 22:24:43, Paul Berry wrote: > > On 2016/11/10 22:00:37, scheglov wrote: > > > > IMHO this approach has several problems: > > > > 1. At startup time we read the entire contents of the directory. If the > > directory grows to 1GB in size, that would be a large startup penalty. > > > > 2. We can't rely on the order of files returned by the filesystem to determine > > which files to delete. What if the filesystem returns the newest files first? > > > Then old files will clog up the cache. What if the filesystem returns files > in > > alphabetical order by name? Then the files with hashes that sort toward the > end > > will clog up the cache. > > > > 3. We are assuming that we can figure out the current size of the cache by > > taking the size at startup and adding and subtracting the sizes of files we've > > added and removed. This is not a safe assumption--the user might have > multiple > > copies of the analyzer running at once; if another process is also adding to > the > > cache, our estimate of the size might be very far off. > > > > How about this alternative proposal: > > > > - At startup time we kick off an isolate whose job is to compute the size of > the > > cache directory and remove old files from it if it is bigger than 1 GB. Now > it > > doesn't matter that requesting stats is slow, since it won't slow down the > main > > isolate. > > > > - Each time we write a file to the cache, we accumulate that file's size into > a > > counter. Once that counter reaches 0.1 GB, we reset it to zero and kick off > > another isolate to remove old files. > > > > This would keep the cache size below 1.1 GB most of the time without any > penalty > > to the running analyzer, and even in the worst case where there are several > > analyzers running, the worst case cache size is bounded by (1 + 0.1*N) GB, > where > > N is the number of analyzer processes running. > > > > What do you think? > > Unfortunately after further reading about access time, I found that it does not > work well on Windows. > http://superuser.com/questions/251263/the-last-access-date-is-not-changed-eve... > > So, I will need an alternative approach. > I'm going to use an explicitly managed list of last read files in a separate > file. > While there is a chance of race condition, it should not cause permanent or > consistent problems. How bad would it be if we just ignored this problem on Windows? Occasionally a file would get flushed out of the cache that was actually being frequently used, and then soon afterwards it would be re-analyzed and re-added to the cache. At that point it would be brand new again, so it wouldn't get flushed out again until a full 1 GB of data got cycled through the cache. Based on my experience using ccache in my C++ development days, I'm guessing that the amount of time it would take for 1 GB of data to get cycled through the cache is probably measured in weeks if not months. At steady state, the ages of these problematic files would be uniformly distributed, so the penalty would be paid in tiny, almost imperceptible increments, usually at startup time. It's not an ideal situation, obviously. But I think we should consider waiting to see if it becomes a serious problem before expending much engineering effort on a more complex solution.
PTAL
lgtm
lgtm https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:23: int _currentSizeBytes = 0; Rename this to something like "_bytesWrittenSinceCleanup". It doesn't represent the current size anymore. https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:87: * This function is stated in a new isolate, receives cache clean up requests s/stated/started/ https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:98: args[2] is SendPort) { It seems weird to send a list of objects to the isolate. Why not put cachePath, maxSizeBytes, and replyTo into a class and then transfer an object of that type across the isolate?
https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:23: int _currentSizeBytes = 0; On 2016/11/13 15:18:17, Paul Berry wrote: > Rename this to something like "_bytesWrittenSinceCleanup". It doesn't represent > the current size anymore. Done. https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:87: * This function is stated in a new isolate, receives cache clean up requests On 2016/11/13 15:18:17, Paul Berry wrote: > s/stated/started/ Done. https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:98: args[2] is SendPort) { On 2016/11/13 15:18:17, Paul Berry wrote: > It seems weird to send a list of objects to the isolate. Why not put cachePath, > maxSizeBytes, and replyTo into a class and then transfer an object of that type > across the isolate? Unfortunately this does not work. SendPort.send says that: * The content of [message] can be: primitive values (null, num, bool, double, * String), instances of [SendPort], and lists and maps whose elements are any * of these. List and maps are also allowed to be cyclic. It also says that it should be possible to send objects. * In the special circumstances when two isolates share the same code and are * running in the same process (e.g. isolates created via [Isolate.spawn]), it * is also possible to send object instances (which would be copied in the * process). This is currently only supported by the dartvm. For now, the * dart2js compiler only supports the restricted messages described above. But it does not work. I still get an exception when I try to send an object. Invalid argument(s): Invalid object found in message.
Description was changed from ========== Evict from FileByteStore by the total cache size in bytes. Limit the size in Analysis Server to 1 GiB. Ideally we would need to in constructor all files by access time. But requesting stats is much more expensive than just length :-( R=brianwilkerson@google.com, paulberry@google.com BUG= ========== to ========== Evict from FileByteStore by the total cache size in bytes. Limit the size in Analysis Server to 1 GiB. Ideally we would need to in constructor all files by access time. But requesting stats is much more expensive than just length :-( R=brianwilkerson@google.com, paulberry@google.com BUG= Committed: https://github.com/dart-lang/sdk/commit/89eabc80836b25ce79c1c085d8280225d667cb89 ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) manually as 89eabc80836b25ce79c1c085d8280225d667cb89 (presubmit successful).
Message was sent while issue was closed.
https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:98: args[2] is SendPort) { On 2016/11/14 16:51:47, scheglov wrote: > On 2016/11/13 15:18:17, Paul Berry wrote: > > It seems weird to send a list of objects to the isolate. Why not put > cachePath, > > maxSizeBytes, and replyTo into a class and then transfer an object of that > type > > across the isolate? > > Unfortunately this does not work. > > SendPort.send says that: > > * The content of [message] can be: primitive values (null, num, bool, double, > * String), instances of [SendPort], and lists and maps whose elements are any > * of these. List and maps are also allowed to be cyclic. > > It also says that it should be possible to send objects. > > * In the special circumstances when two isolates share the same code and are > * running in the same process (e.g. isolates created via [Isolate.spawn]), it > * is also possible to send object instances (which would be copied in the > * process). This is currently only supported by the dartvm. For now, the > * dart2js compiler only supports the restricted messages described above. > > > But it does not work. > I still get an exception when I try to send an object. > Invalid argument(s): Invalid object found in message. +rmacnak, is this due to a bug in the VM?
Message was sent while issue was closed.
On 2016/11/14 21:19:29, Paul Berry wrote: > https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... > File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): > > https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... > pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:98: args[2] is SendPort) > { > On 2016/11/14 16:51:47, scheglov wrote: > > On 2016/11/13 15:18:17, Paul Berry wrote: > > > It seems weird to send a list of objects to the isolate. Why not put > > cachePath, > > > maxSizeBytes, and replyTo into a class and then transfer an object of that > > type > > > across the isolate? > > > > Unfortunately this does not work. > > > > SendPort.send says that: > > > > * The content of [message] can be: primitive values (null, num, bool, > double, > > * String), instances of [SendPort], and lists and maps whose elements are > any > > * of these. List and maps are also allowed to be cyclic. > > > > It also says that it should be possible to send objects. > > > > * In the special circumstances when two isolates share the same code and > are > > * running in the same process (e.g. isolates created via [Isolate.spawn]), > it > > * is also possible to send object instances (which would be copied in the > > * process). This is currently only supported by the dartvm. For now, the > > * dart2js compiler only supports the restricted messages described above. > > > > > > But it does not work. > > I still get an exception when I try to send an object. > > Invalid argument(s): Invalid object found in message. > > +rmacnak, is this due to a bug in the VM? How was the second isolate created? You can only send general objects to an isolate created with spawnFunction, not spawnUri.
Message was sent while issue was closed.
On 2016/11/14 23:29:28, rmacnak wrote: > On 2016/11/14 21:19:29, Paul Berry wrote: > > > https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... > > File pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart (right): > > > > > https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... > > pkg/analyzer/lib/src/dart/analysis/file_byte_store.dart:98: args[2] is > SendPort) > > { > > On 2016/11/14 16:51:47, scheglov wrote: > > > On 2016/11/13 15:18:17, Paul Berry wrote: > > > > It seems weird to send a list of objects to the isolate. Why not put > > > cachePath, > > > > maxSizeBytes, and replyTo into a class and then transfer an object of that > > > type > > > > across the isolate? > > > > > > Unfortunately this does not work. > > > > > > SendPort.send says that: > > > > > > * The content of [message] can be: primitive values (null, num, bool, > > double, > > > * String), instances of [SendPort], and lists and maps whose elements are > > any > > > * of these. List and maps are also allowed to be cyclic. > > > > > > It also says that it should be possible to send objects. > > > > > > * In the special circumstances when two isolates share the same code and > > are > > > * running in the same process (e.g. isolates created via > [Isolate.spawn]), > > it > > > * is also possible to send object instances (which would be copied in the > > > * process). This is currently only supported by the dartvm. For now, the > > > * dart2js compiler only supports the restricted messages described above. > > > > > > > > > But it does not work. > > > I still get an exception when I try to send an object. > > > Invalid argument(s): Invalid object found in message. > > > > +rmacnak, is this due to a bug in the VM? > > How was the second isolate created? You can only send general objects to an > isolate created with spawnFunction, not spawnUri. I assume by spawnFunction, you mean Isolate.spawn()? (There is no Isolate.spawnFunction()). It was created with Isolate.spawn(). See https://codereview.chromium.org/2492933002/diff/20001/pkg/analyzer/lib/src/da... line 65. |