|
|
Created:
6 years, 8 months ago by rileya (GONE FROM CHROMIUM) Modified:
6 years, 3 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org, gavinp+disk_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionIgnore disk cache for concurrent byte range requests to a single resource, to prevent stalling.
Currently the single writer/multiple reader lock implemented by the http cache
results in stalling when multiple byte range requests are made for the same
resource. This is most notable in the case of video streaming, where attempting
to play back a single video resource in multiple places will result in all but
one stalling until the full download completes.
This is a hacky fix, wherein we simply ignore the cache if we see that a given
HttpCache::Transaction is operating on a sparse (byte range request) cache entry
which is already active and currently being written to.
BUG=31014
Patch Set 1 #Patch Set 2 : Check for range requests rather than CouldBeSparse() #
Total comments: 2
Patch Set 3 : Add link to new issue #Patch Set 4 : Alternate fix to avoid race condition. #Patch Set 5 : Fix comment #Patch Set 6 : Slightly less intrusive version of PS4 #
Total comments: 7
Patch Set 7 : address comments #Patch Set 8 : Add unit test, make conditions for activating hack a bit stricter. #
Total comments: 15
Patch Set 9 : Address comments on test #Messages
Total messages: 18 (0 generated)
Hacky fix for good 'ol crbug.com/31014 This ignores the cache for byte range requests if we see that another transaction has already taken the writer lock. This means potentially wasting some bandwidth, but given that this bug hasn't gotten enough attention to have been fixed by now (it's >4 years old), I'd say it's enough of an edge case that this isn't a big concern. I'd argue that having a working video in these rare cases at the cost of a bit more bandwidth use is a good tradeoff. I've been looking into a better solution which would allow concurrent transactions on sparse cache entries, and I think it's feasible, but a fairly large undertaking. A temporary fix like this would go a long way in the meantime.
lgtm. We have to see what happens in practice because we may get bug reports of not caching requests.
On 2014/04/09 23:56:24, rvargas wrote: > lgtm. > > We have to see what happens in practice because we may get bug reports of not > caching requests. Actually, I changed my mind. The check seems too superficial as it includes a large percentage of ActiveEntries and all new requests. Maybe if we check that both requests are range_requests?
Adjusted to check if this Transaction is a range request and the writer on the active entry (if it exists) is also a range request, rather than checking the entry's CouldBeSparse(). PTAL.
What's our testing story here? It seems like we might be able to write a content_browsertest that verifies the fix at a very high level. I'm less sure about unit tests (rvargas?) For example ... spin up the test HTTP server, have <video> #1 start loading asset A, verify we get some events, then have <video> #2 load the same asset A, and verify we get the same events. I believe the tricky part is that we need a sufficiently large asset for the <video> #1 to defer loading the asset. https://codereview.chromium.org/232003002/diff/20001/net/http/http_cache_tran... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/20001/net/http/http_cache_tran... net/http/http_cache_transaction.cc:858: // and remove this hack. if the plan is to close the original bug, then I'd recommending finding an existing bug / filing a new bug we can link to
On 2014/04/10 18:26:06, scherkus wrote: > What's our testing story here? I was wondering the same thing, I wanted to make sure this seemed like a reasonable fix before diving too far into figuring the tests out though. > It seems like we might be able to write a content_browsertest that verifies the > fix at a very high level. I'm less sure about unit tests (rvargas?) > > For example ... spin up the test HTTP server, have <video> #1 start loading > asset A, verify we get some events, then have <video> #2 load the same asset A, > and verify we get the same events. I believe the tricky part is that we need a > sufficiently large asset for the <video> #1 to defer loading the asset. Sounds good, I'll give that a try. https://codereview.chromium.org/232003002/diff/20001/net/http/http_cache_tran... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/20001/net/http/http_cache_tran... net/http/http_cache_transaction.cc:858: // and remove this hack. On 2014/04/10 18:26:06, scherkus wrote: > if the plan is to close the original bug, then I'd recommending finding an > existing bug / filing a new bug we can link to Sounds good, filed crbug.com/362215.
That looks better. As for a test, you should look at TEST(HttpCache, SimpleGET_ManyReaders) for inspiration. There's a bunch of tests that issue range requests in that file, and even a mock range server (you may or may not need that server)
The original patch actually had a potential race, since it checked for an active cache entry too soon (two transactions could proceed past the check and one would still end up getting blocked). Patchset 4 skips the cache only if a range request is actually unable to acquire the lock. PTAL. Interestingly this fix doesn't seem to handle redirects (if you have two videos with the same src, which both redirect to the same place, behavior is very inconsistent (both, one, or none will potentially play back)). I'm not familiar with how the cache system handles redirects, any thoughts?
On 2014/04/14 19:38:28, rileya wrote: > The original patch actually had a potential race, since it checked for an active > cache entry too soon (two transactions could proceed past the check and one > would still end up getting blocked). Patchset 4 skips the cache only if a range > request is actually unable to acquire the lock. PTAL. > > Interestingly this fix doesn't seem to handle redirects (if you have two videos > with the same src, which both redirect to the same place, behavior is very > inconsistent (both, one, or none will potentially play back)). I'm not familiar > with how the cache system handles redirects, any thoughts? I actually prefer the version from PS3. PS4 is more intrusive and spreads around knowledge about the hack. The race is about two transactions being started simultaneously, and that seems more of an issue with tests than in the real world. If we were talking about a clean, definitive solution I would agree that it would be sub-optimal. Maybe we can deploy the optimistic fix and only worry about messing directly with the lock if we see evidence that it matters in the field. Redirects are handled above the cache. When a redirect is detected, a new UrlRequestHttpJob is created (and with it another HttpCache::Transaction)... and the previous transaction is destroyed. In other words, the cache cares about a single URL either pre or post redirection.
Unfortunately it looks like the fix from PS3 fails ~20% of the time in a realistic use-case on my machine (two video elements with the same src on one page), and I'd lean towards saying that always failing is probably better than being non-deterministic like that... PS4's solution works all the time (but is inconsistent for redirects), but I do agree it is more intrusive. I'll see if I can come up with something more consistent but not as intrusive... I'll also look into what's causing redirects to behave differently.
PS6 still does the same thing as PS4 but confines the hack to one spot.
https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1218: if (partial_ && result == net::ERR_IO_PENDING) { Why giving up on trying to see if the lock owner is also a byte range request? I guess it's not that different in practice given that this one is partial_... are you worried that there is no writer? https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1226: new_entry_->pending_queue.pop_front(); cache_->RemovePendingTransaction(this); https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1227: new_entry_ = entry_ = NULL; entry_ should be NULL already.
I'm still having some issues with redirects, but it appears to be an unrelated issue, since I can repro without this patch (I'll file a bug and look into it a bit more). https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1218: if (partial_ && result == net::ERR_IO_PENDING) { On 2014/04/22 00:44:31, rvargas wrote: > Why giving up on trying to see if the lock owner is also a byte range request? I > guess it's not that different in practice given that this one is partial_... are > you worried that there is no writer? I was thinking that we could be blocked by either a writer or reader(s), so I wasn't sure what to check here. I was also thinking it doesn't really matter whether the holder of the lock was itself a range request, as long as this transaction is and was blocked. https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1226: new_entry_->pending_queue.pop_front(); On 2014/04/22 00:44:31, rvargas wrote: > cache_->RemovePendingTransaction(this); Done. https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1227: new_entry_ = entry_ = NULL; On 2014/04/22 00:44:31, rvargas wrote: > entry_ should be NULL already. Ahh, alright, removed setting it.
I guess it's time for net unit tests. https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/110001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1218: if (partial_ && result == net::ERR_IO_PENDING) { On 2014/04/23 17:45:05, rileya wrote: > On 2014/04/22 00:44:31, rvargas wrote: > > Why giving up on trying to see if the lock owner is also a byte range request? > I > > guess it's not that different in practice given that this one is partial_... > are > > you worried that there is no writer? > > I was thinking that we could be blocked by either a writer or reader(s), so I > wasn't sure what to check here. I was also thinking it doesn't really matter > whether the holder of the lock was itself a range request, as long as this > transaction is and was blocked. yeah, I've been going back and forth about this. It would be nice to limit the scope as much as we can, but it probably doesn't matter much in practice because mixed access (normal and byte range) should be uncommon.
Looking into this again. I made the check a little stricter for triggering the hack, which fixed a couple small issues. I also added a simple unit test which I think should still work when this is fixed in a more correct way (since a test that just checks for this hack existing wouldn't make much sense).
https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1219: (mode() & Transaction::WRITE && !new_entry_->readers.empty()))) { You mentioned that this fixed some small issues. My reaction would be to remove the extra check because it seems to be just duplicating the logic that returns io_pending from AddTransactionToEntry... except in the case of a race where the cache is in the process of moving from one transaction to the next. If that is causing issues, it seems too subtle to handle it by duplicating some logic here and it would be better to ask the cache about it. What was the issue? https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... File net/http/http_cache_unittest.cc (right): https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4426: // comment in Transaction::DoAddToEntry in http_cache_transaction.cc), but a If we expect this test to pass now and with a future (better) implementation without change, we'll forget about this comment and it will become stale. It's better to avoid the comment now. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4433: request.load_flags |= net::LOAD_VALIDATE_CACHE; no need for this flag https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4436: transaction.data = "rg: 00-09"; Should need an extra space at the end (to make it 10 bytes) https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4437: transaction.handler = NULL; why remove the server? https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4438: AddMockTransaction(&transaction); Should be paired with a RemoveMockTransaction() https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4452: AddMockTransaction(&kRangeGET_TransactionOK); This is not needed... transactions are keyed by the URL and that is not changing.
https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1219: (mode() & Transaction::WRITE && !new_entry_->readers.empty()))) { On 2014/05/29 19:58:52, rvargas wrote: > You mentioned that this fixed some small issues. My reaction would be to remove > the extra check because it seems to be just duplicating the logic that returns > io_pending from AddTransactionToEntry... except in the case of a race where the > cache is in the process of moving from one transaction to the next. If that is > causing issues, it seems too subtle to handle it by duplicating some logic here > and it would be better to ask the cache about it. What was the issue? It was indeed a problem of racing when moving between transactions in the pending queue, when there weren't actually any readers/writers to block the transaction (this caused several net_unittests to fail). How would you recommend handling this without duplicating this logic? https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... File net/http/http_cache_unittest.cc (right): https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4426: // comment in Transaction::DoAddToEntry in http_cache_transaction.cc), but a On 2014/05/29 19:58:52, rvargas wrote: > If we expect this test to pass now and with a future (better) implementation > without change, we'll forget about this comment and it will become stale. It's > better to avoid the comment now. Good point, removed. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4433: request.load_flags |= net::LOAD_VALIDATE_CACHE; On 2014/05/29 19:58:52, rvargas wrote: > no need for this flag Removed. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4436: transaction.data = "rg: 00-09"; On 2014/05/29 19:58:52, rvargas wrote: > Should need an extra space at the end (to make it 10 bytes) Added. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4437: transaction.handler = NULL; On 2014/05/29 19:58:52, rvargas wrote: > why remove the server? Another test elsewhere did this so I assumed it had some importance. Removed. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4438: AddMockTransaction(&transaction); On 2014/05/29 19:58:52, rvargas wrote: > Should be paired with a RemoveMockTransaction() Added. https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_uni... net/http/http_cache_unittest.cc:4452: AddMockTransaction(&kRangeGET_TransactionOK); On 2014/05/29 19:58:52, rvargas wrote: > This is not needed... transactions are keyed by the URL and that is not > changing. Removed.
https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... File net/http/http_cache_transaction.cc (right): https://codereview.chromium.org/232003002/diff/150001/net/http/http_cache_tra... net/http/http_cache_transaction.cc:1219: (mode() & Transaction::WRITE && !new_entry_->readers.empty()))) { On 2014/05/29 20:15:03, rileya wrote: > On 2014/05/29 19:58:52, rvargas wrote: > > You mentioned that this fixed some small issues. My reaction would be to > remove > > the extra check because it seems to be just duplicating the logic that returns > > io_pending from AddTransactionToEntry... except in the case of a race where > the > > cache is in the process of moving from one transaction to the next. If that is > > causing issues, it seems too subtle to handle it by duplicating some logic > here > > and it would be better to ask the cache about it. What was the issue? > > It was indeed a problem of racing when moving between transactions in the > pending queue, when there weren't actually any readers/writers to block the > transaction (this caused several net_unittests to fail). How would you recommend > handling this without duplicating this logic? It doesn't seem reasonable (possible?) to ask transactions to be aware of the state of the cache (ok, ActiveEntry but still) before calling RemovePendingTransaction. That says to me that the cache must be prepared to receive that call at any time, and it should handle the case where the transaction that is going away is the only transaction tied to the entry. Reading HttpCache::OnProcessPendingQueue it looks like it does everything as intended... it just deletes the entry if there's no transaction to deal with. So I guess I would have to see exactly what is failing and why to answer you. Is it just that unit tests that expect requests to be queued are seeing requests escaping the lock? If that's the case then the issue would not be will_process_pending_queue but the fact that the workaround lets regular (not byte range related) requests trigger the escape. |