Created: 8 years ago by agayev
Modified: 8 years ago
CC: chromium-reviews, sadrul, cbentzel+watch_chromium.org, darin-cc_chromium.org, gavinp+disk_chromium.org, ben+watch_chromium.org
Base URL: http://git.chromium.org/chromium/src.git@master
Visibility: Public.
Description: Added CacheEntry implementation.
BUG=157187
TEST=net_unittests --gtest_filter="FlashCacheTest.*" --gtest_repeat=10 --shuffle
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=171444
Patch Set 1
Patch Set 2: Fixed error message.
Patch Set 3: Removed redundant function declarations.
Patch Set 4: Moved insync initialization to constructor.
Total comments: 33
Patch Set 5: Fixed nits.
Patch Set 6
Patch Set 7: Got rid of unnecessary declarations.
Patch Set 8: Misc.
Total comments: 14
Patch Set 9: Fixed nits.
Patch Set 10: Added a TODO.
Patch Set 11: Added destructor for Stream struct.
Messages
Total messages: 15 (0 generated)
PTAL. I will add comments once the API stabilizes.
https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:51: !store_->ReadData(id_, stream_sizes, kFlashCacheEntryHeaderSize, 0))
needs braces

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:88: DCHECK(init_&& ValidStream(index));
you should be careful here... it is not clear to me where index comes from, so it may require run time verification.

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:94: DCHECK(init_ && ValidStream(index));
same here

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:95: const int stream_size = static_cast<int>(streams_[index].data.size());
drop the const and call GetDataSize

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:105: memcpy(buf->data(), &streams_[index].data[offset], buf_len);
And extra copy here doesn't look right. What's the purpose of the extra buffer at this layer?

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:112: DCHECK(offset >= 0 && buf_len >= 0);
same caveat about who's the caller

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:123: if (!LazyRead(index))
what for?

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:136: DCHECK(init_);
I'd remove the dchecks for init that you have when calling this method.

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:144: size += streams_[i].data.size();
GetDataSize

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:156: if (!LazyRead(i))
why?

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.h (right):

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.h:48: struct Stream {
types go before methods

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.h:49: Stream() : offset(0), insync(false) {}
cannot be inline

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.h:52: bool insync;
insync?
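For orientation, a minimal sketch of a ReadData that would address these comments, validating the caller-supplied index at run time and using GetDataSize. The member names (streams_, ValidStream, GetDataSize) are taken from the snippets quoted above; everything else is an assumption, not the code under review.

// Hypothetical sketch (needs net/base/io_buffer.h and net/base/net_errors.h).
// |index| comes from the caller, so it is checked at run time rather than
// only DCHECKed.
int CacheEntry::ReadData(int index, int offset, net::IOBuffer* buf,
                         int buf_len) {
  if (!ValidStream(index) || offset < 0 || buf_len < 0)
    return net::ERR_INVALID_ARGUMENT;

  int stream_size = GetDataSize(index);
  if (offset >= stream_size)
    return 0;  // nothing to read past the end of the stream
  if (offset + buf_len > stream_size)
    buf_len = stream_size - offset;

  // This in-memory copy is the buffer the reviewer questions below; a version
  // without a read buffer would forward the read to the store instead.
  memcpy(buf->data(), &streams_[index].data[offset], buf_len);
  return buf_len;
}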
https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... File net/disk_cache/flash/cache_entry.cc (right): https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:51: !store_->ReadData(id_, stream_sizes, kFlashCacheEntryHeaderSize, 0)) On 2012/11/28 03:12:56, rvargas wrote: > needs braces Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:88: DCHECK(init_&& ValidStream(index)); On 2012/11/28 03:12:56, rvargas wrote: > you should be careful here... it is not clear to me where index comes from, so > it may require run time verification. Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:94: DCHECK(init_ && ValidStream(index)); On 2012/11/28 03:12:56, rvargas wrote: > same here Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:95: const int stream_size = static_cast<int>(streams_[index].data.size()); On 2012/11/28 03:12:56, rvargas wrote: > drop the const and call GetDataSize Done. Why drop const? https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:105: memcpy(buf->data(), &streams_[index].data[offset], buf_len); On 2012/11/28 03:12:56, rvargas wrote: > And extra copy here doesn't look right. What's the purpose of the extra buffer > at this layer? Right now we are not buffering in the Segment layer, so it's the only buffering we do, not extra. Ideally we should buffer at the segment layer, but we need to experiment to find out which is the best. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:112: DCHECK(offset >= 0 && buf_len >= 0); On 2012/11/28 03:12:56, rvargas wrote: > same caveat about who's the caller Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:123: if (!LazyRead(index)) On 2012/11/28 03:12:56, rvargas wrote: > what for? The idea is to not to read a stream unless it is needed. So, if we are doing truncate, we do not read it, if we are doing append, we read what we have into memory and then append to it. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:136: DCHECK(init_); On 2012/11/28 03:12:56, rvargas wrote: > I'd remove the dchecks for init that you have when calling this method. Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:144: size += streams_[i].data.size(); On 2012/11/28 03:12:56, rvargas wrote: > GetDataSize Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:156: if (!LazyRead(i)) On 2012/11/28 03:12:56, rvargas wrote: > why? Say we open an existing entry and update stream 1 and then close it. The above makes sure that when we save the new copy, stream 0 and stream 2 are copied as well, since we do not read a stream unless it is needed. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... File net/disk_cache/flash/cache_entry.h (right): https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... 
net/disk_cache/flash/cache_entry.h:48: struct Stream { On 2012/11/28 03:12:56, rvargas wrote: > types go before methods Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.h:49: Stream() : offset(0), insync(false) {} On 2012/11/28 03:12:56, rvargas wrote: > cannot be inline Done. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.h:52: bool insync; On 2012/11/28 03:12:56, rvargas wrote: > insync? Means in sync with the store.
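For reference, a sketch of how the Stream struct might look after this feedback: declared before the methods of the enclosing class, constructor defined out of line, and insync renamed to in_sync as the reviewer suggests. Only offset, data, the flag, and (from a later patch set) size appear in the snippets above; the rest is assumed.

// cache_entry.h (sketch, nested inside CacheEntry)
struct Stream {
  Stream();   // defined in the .cc file, per "cannot be inline"
  ~Stream();  // Patch Set 11 adds a destructor for this struct

  int offset;            // location of this stream's data in the store
  int size;              // current size of the stream
  bool in_sync;          // true when the in-memory copy matches the store
  std::vector<char> data;
};

// cache_entry.cc (sketch)
CacheEntry::Stream::Stream() : offset(0), size(0), in_sync(false) {}
CacheEntry::Stream::~Stream() {}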
https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... File net/disk_cache/flash/cache_entry.cc (right): https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:95: const int stream_size = static_cast<int>(streams_[index].data.size()); On 2012/11/29 16:14:43, agayev wrote: > On 2012/11/28 03:12:56, rvargas wrote: > > drop the const and call GetDataSize > > Done. Why drop const? Because it is a local variable that just stores the result of an operation. Sure, it doesn't change during its lifetime, but there's very little benefit by making it const, so it is mostly noise while reading the code... it belongs to the "don't go crazy with const" on the style guide. It would be basically the same as making |index| const just because it doesn't change during this method. Note that having "const int kMyConstant = 125" is a different story because we are now saying that we don't even need a variable. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:105: memcpy(buf->data(), &streams_[index].data[offset], buf_len); On 2012/11/29 16:14:43, agayev wrote: > On 2012/11/28 03:12:56, rvargas wrote: > > And extra copy here doesn't look right. What's the purpose of the extra buffer > > at this layer? > > Right now we are not buffering in the Segment layer, so it's the only buffering > we do, not extra. Ideally we should buffer at the segment layer, but we need to > experiment to find out which is the best. Well... it is an extra buffer because what we really need is buffering at the Segment layer (even if it has not been implemented yet). Another way to look at it: having a buffer here is an optimization that would be fine if it helps the common use pattern. But the common pattern is Create -> Write -> Close, or Open -> Read -> Close, with a much more uncommon Open -> Read -> Overwrite(*) -> Close. None of these benefit by a read buffer here, and in fact would just make things slower. (*) Total or partial overwrite depending on 302 or 200 after revalidation, but we don't really read again the same data. Sure, the API allows multiple random reads before closing the entry, and we get there with multiple readers, or with a different user (other than the HTTP cache), but that's not the main use case. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:123: if (!LazyRead(index)) On 2012/11/29 16:14:43, agayev wrote: > On 2012/11/28 03:12:56, rvargas wrote: > > what for? > > The idea is to not to read a stream unless it is needed. So, if we are doing > truncate, we do not read it, if we are doing append, we read what we have into > memory and then append to it. But we don't have to read the first part in order to append... we just have to remember that we are appending, no? Does that even make sense here? I mean, it is possible to start writing with an offset, but I doubt that's what you have in mind with this code :) There's no code to handle updating an entry yet, and we know that doing that means creating a new entry somewhere, so someone has to read the old entry, receive writes from the user, and at the end create the new entry (deleting the old one at some point). It's not clear to me if you are thinking about that use case here, but if that's the case it seems better to wait until that code appears. 
I mean, I really don't know if this class will handle directly things coming from the user or not... as in, if |index| is the value passed by the HTTP cache and this class will grow code to handle the key and additional metadata, or if that code will be handled by a layer above this one (so that we may have two CacheEntry objects to deal with updates, for example). https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.cc:156: if (!LazyRead(i)) On 2012/11/29 16:14:43, agayev wrote: > On 2012/11/28 03:12:56, rvargas wrote: > > why? > > Say we open an existing entry and update stream 1 and then close it. The above > makes sure that when we save the new copy, stream 0 and stream 2 are copied as > well, since we do not read a stream unless it is needed. but that use case doesn't work with the rest of the code as is, so it feels out of place. https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... File net/disk_cache/flash/cache_entry.h (right): https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache... net/disk_cache/flash/cache_entry.h:52: bool insync; On 2012/11/29 16:14:43, agayev wrote: > On 2012/11/28 03:12:56, rvargas wrote: > > insync? > > Means in sync with the store. Then, at the very least it should be in_sync (but I would prefer a more descriptive name). Note that I'm still not convinced about how this works... see below
Instead of replying to the comments I reply here in one coherent piece. What I've implemented here is an entry that supports all use cases: (1) create a fresh entry, write and/or read to/from it (2) open an existing entry read and/or write from/to it. For case (1) everything happens via in-memory buffers, i.e. we only touch the storage when an entry is closed/destructed. For case (2) we attempt to do minimal amount of work via lazy reading, we only bring into memory stream N, when it is needed, which is needed when we update it (i.e. overwrite or append), or when we read specific stream. I think what you are proposing is to separate update from the rest. In this case, we will have two modes of operation of an entry: (a) a fresh entry that implements write buffering and everything happens in memory until entry is closed/destructed (b) a read-only existing entry that doesn't do any read-buffering and redirects everything to the lower layer, and naturally, it does not support writes. And update is done at the upper layer by instantiating two entries, one is fresh and the other one read-only existing one, and we copy/append from the existing one to the fresh one. Am I understanding this correctly? I do not have any objection to it and it is probably easier to implement and reason about, just want to make sure that we are on the same page.
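A rough illustration of the lazy-read scheme described above, assuming a LazyRead(index) helper and the store_->ReadData(id, buffer, size, offset) call shape quoted earlier in the review; this is a sketch of the idea, not the patch.

// Hypothetical sketch: a stream is pulled from the store only the first time
// it is actually read or appended to, never for a truncating write.
bool CacheEntry::LazyRead(int index) {
  Stream& stream = streams_[index];
  if (stream.in_sync)
    return true;  // already brought into memory

  stream.data.resize(stream.size);
  if (stream.size > 0 &&
      !store_->ReadData(id_, &stream.data[0], stream.size, stream.offset)) {
    return false;
  }
  stream.in_sync = true;
  return true;
}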
My point is that (2) is not really implemented (or I am missing that). There is partial support to it by the fact that this CacheEntry supports calling WriteData to stream x at offset (y != 0) without having to write or read the first y bytes of that stream, but that is far from being able to update an entry (I think). I don't mind supporting update directly in this class, but I don't like the idea of reading though a buffer here. Even if we want to allow this object to be initialized with a given offset and then receive subsequent writes, I think we should operate under the assumption that reads from the lower layer are cheap enough (and most likely buffered) so that it is better just to read again if/when needed. I also don't mind punting update to the caller of this class. Or even punting update to be dealt with later. In fact, I would prefer having a whole system able to read and write complete entries from disk (with substantial metadata) and lacking the support for updates, to a system that implements updates, but not yet the basic functionality of read and write. On 2012/12/03 17:33:07, agayev wrote: > Instead of replying to the comments I reply here in one coherent piece. > > What I've implemented here is an entry that supports all use cases: (1) create a > fresh entry, write and/or read to/from it (2) open an existing entry read and/or > write from/to it. > > For case (1) everything happens via in-memory buffers, i.e. we only touch the > storage when an entry is closed/destructed. > > For case (2) we attempt to do minimal amount of work via lazy reading, we only > bring into memory stream N, when it is needed, which is needed when we update it > (i.e. overwrite or append), or when we read specific stream. > > I think what you are proposing is to separate update from the rest. In this > case, we will have two modes of operation of an entry: (a) a fresh entry that > implements write buffering and everything happens in memory until entry is > closed/destructed (b) a read-only existing entry that doesn't do any > read-buffering and redirects everything to the lower layer, and naturally, it > does not support writes. And update is done at the upper layer by instantiating > two entries, one is fresh and the other one read-only existing one, and we > copy/append from the existing one to the fresh one. > > Am I understanding this correctly? I do not have any objection to it and it is > probably easier to implement and reason about, just want to make sure that we > are on the same page.
On 2012/12/03 21:13:31, rvargas wrote: > My point is that (2) is not really implemented (or I am missing that). There is > partial support to it by the fact that this CacheEntry supports calling > WriteData to stream x at offset (y != 0) without having to write or read the > first y bytes of that stream, but that is far from being able to update an entry > (I think). It does support updating at offset 0 as well, but I suppose by update you mean writing some metadata as well? > I don't mind supporting update directly in this class, but I don't like the idea > of reading though a buffer here. Even if we want to allow this object to be > initialized with a given offset and then receive subsequent writes, I think we > should operate under the assumption that reads from the lower layer are cheap > enough (and most likely buffered) so that it is better just to read again > if/when needed. We do not keep entry sizes at the LogStructuredStore layer, which would lead us to bringing complete segments into memory. Depending on how random browsing habits are, this may bring a lot of segments into memory. Do you have some rough idea as to how we are going to implement read buffering at the lower layer? > I also don't mind punting update to the caller of this class. Or even punting > update to be dealt with later. In fact, I would prefer having a whole system > able to read and write complete entries from disk (with substantial metadata) > and lacking the support for updates, to a system that implements updates, but > not yet the basic functionality of read and write. Repeating the above question, what do you consider basic functionality of read and write? I thought it was just reading and writing streams of the CacheEntry object?! > > On 2012/12/03 17:33:07, agayev wrote: > > Instead of replying to the comments I reply here in one coherent piece. > > > > What I've implemented here is an entry that supports all use cases: (1) create > a > > fresh entry, write and/or read to/from it (2) open an existing entry read > and/or > > write from/to it. > > > > For case (1) everything happens via in-memory buffers, i.e. we only touch the > > storage when an entry is closed/destructed. > > > > For case (2) we attempt to do minimal amount of work via lazy reading, we only > > bring into memory stream N, when it is needed, which is needed when we update > it > > (i.e. overwrite or append), or when we read specific stream. > > > > I think what you are proposing is to separate update from the rest. In this > > case, we will have two modes of operation of an entry: (a) a fresh entry that > > implements write buffering and everything happens in memory until entry is > > closed/destructed (b) a read-only existing entry that doesn't do any > > read-buffering and redirects everything to the lower layer, and naturally, it > > does not support writes. And update is done at the upper layer by > instantiating > > two entries, one is fresh and the other one read-only existing one, and we > > copy/append from the existing one to the fresh one. > > > > Am I understanding this correctly? I do not have any objection to it and it > is > > probably easier to implement and reason about, just want to make sure that we > > are on the same page.
On 2012/12/03 22:53:58, agayev wrote: > On 2012/12/03 21:13:31, rvargas wrote: > > My point is that (2) is not really implemented (or I am missing that). There > is > > partial support to it by the fact that this CacheEntry supports calling > > WriteData to stream x at offset (y != 0) without having to write or read the > > first y bytes of that stream, but that is far from being able to update an > entry > > (I think). > > It does support updating at offset 0 as well, but I suppose by update you mean > writing some metadata as well? Yes. I understand that entry lifetime is not the job of any of the code we have so far, but update also means changing the location of the entry from where it was to the new one. As I said, maybe I'm just missing that, in which case I'm happy that there's nothing else to do :) > > > I don't mind supporting update directly in this class, but I don't like the > idea > > of reading though a buffer here. Even if we want to allow this object to be > > initialized with a given offset and then receive subsequent writes, I think we > > should operate under the assumption that reads from the lower layer are cheap > > enough (and most likely buffered) so that it is better just to read again > > if/when needed. > > We do not keep entry sizes at the LogStructuredStore layer, which would lead us > to bringing complete segments into memory. Depending on how random browsing > habits are, this may bring a lot of segments into memory. Do you have some > rough idea as to how we are going to implement read buffering at the lower > layer? Why bringing complete segments into memory? If I read x bytes from offset y, what prevents us from keeping that buffer at the lower layer? We need a buffer for writes down there in any case. And if what I read is not in the buffer, reading from disk should be fast, right?. Why does this scenario changes depending on the read patterns (number of open entries)? For a given set of open entries, we have to read the data from the segment, so we have to create that segment object. > > > I also don't mind punting update to the caller of this class. Or even punting > > update to be dealt with later. In fact, I would prefer having a whole system > > able to read and write complete entries from disk (with substantial metadata) > > and lacking the support for updates, to a system that implements updates, but > > not yet the basic functionality of read and write. > > Repeating the above question, what do you consider basic functionality of read > and write? I thought it was just reading and writing streams of the CacheEntry > object?! Sorry for not being clear. I mean having a system that can be used for some testing... that implements the upper interface of the disk cache, and it is able to read cached data from a previous _session_. Again, if what we have so far is all that is needed for updates then so be it. If it is not, I think updates has a lower priority than other stuff that we need. If this CL handles updates as is (implying that this layer has to read data from the store only to be able to write it to another place) then fine, lets read. But I still see a read buffer here (as is) as the wrong thing to do given that it is an extra copy for more than 90% of the requests. If we need to keep a read buffer for updates, first let's make sure that we are handling an update, and then read. > > > > > On 2012/12/03 17:33:07, agayev wrote: > > > Instead of replying to the comments I reply here in one coherent piece. 
> > > > > > What I've implemented here is an entry that supports all use cases: (1) > create > > a > > > fresh entry, write and/or read to/from it (2) open an existing entry read > > and/or > > > write from/to it. > > > > > > For case (1) everything happens via in-memory buffers, i.e. we only touch > the > > > storage when an entry is closed/destructed. > > > > > > For case (2) we attempt to do minimal amount of work via lazy reading, we > only > > > bring into memory stream N, when it is needed, which is needed when we > update > > it > > > (i.e. overwrite or append), or when we read specific stream. > > > > > > I think what you are proposing is to separate update from the rest. In this > > > case, we will have two modes of operation of an entry: (a) a fresh entry > that > > > implements write buffering and everything happens in memory until entry is > > > closed/destructed (b) a read-only existing entry that doesn't do any > > > read-buffering and redirects everything to the lower layer, and naturally, > it > > > does not support writes. And update is done at the upper layer by > > instantiating > > > two entries, one is fresh and the other one read-only existing one, and we > > > copy/append from the existing one to the fresh one. > > > > > > Am I understanding this correctly? I do not have any objection to it and it > > is > > > probably easier to implement and reason about, just want to make sure that > we > > > are on the same page.
PTAL. I've made the changes we talked about. On 2012/12/03 23:26:16, rvargas wrote: > On 2012/12/03 22:53:58, agayev wrote: > > On 2012/12/03 21:13:31, rvargas wrote: > > > My point is that (2) is not really implemented (or I am missing that). There > > is > > > partial support to it by the fact that this CacheEntry supports calling > > > WriteData to stream x at offset (y != 0) without having to write or read the > > > first y bytes of that stream, but that is far from being able to update an > > entry > > > (I think). > > > > It does support updating at offset 0 as well, but I suppose by update you mean > > writing some metadata as well? > > Yes. I understand that entry lifetime is not the job of any of the code we have > so far, but update also means changing the location of the entry from where it > was to the new one. As I said, maybe I'm just missing that, in which case I'm > happy that there's nothing else to do :) > > > > > > I don't mind supporting update directly in this class, but I don't like the > > idea > > > of reading though a buffer here. Even if we want to allow this object to be > > > initialized with a given offset and then receive subsequent writes, I think > we > > > should operate under the assumption that reads from the lower layer are > cheap > > > enough (and most likely buffered) so that it is better just to read again > > > if/when needed. > > > > We do not keep entry sizes at the LogStructuredStore layer, which would lead > us > > to bringing complete segments into memory. Depending on how random browsing > > habits are, this may bring a lot of segments into memory. Do you have some > > rough idea as to how we are going to implement read buffering at the lower > > layer? > > Why bringing complete segments into memory? If I read x bytes from offset y, > what prevents us from keeping that buffer at the lower layer? We need a buffer > for writes down there in any case. And if what I read is not in the buffer, > reading from disk should be fast, right?. Why does this scenario changes > depending on the read patterns (number of open entries)? For a given set of open > entries, we have to read the data from the segment, so we have to create that > segment object. > > > > > > I also don't mind punting update to the caller of this class. Or even > punting > > > update to be dealt with later. In fact, I would prefer having a whole system > > > able to read and write complete entries from disk (with substantial > metadata) > > > and lacking the support for updates, to a system that implements updates, > but > > > not yet the basic functionality of read and write. > > > > Repeating the above question, what do you consider basic functionality of read > > and write? I thought it was just reading and writing streams of the > CacheEntry > > object?! > > Sorry for not being clear. I mean having a system that can be used for some > testing... that implements the upper interface of the disk cache, and it is able > to read cached data from a previous _session_. > > Again, if what we have so far is all that is needed for updates then so be it. > If it is not, I think updates has a lower priority than other stuff that we > need. > > If this CL handles updates as is (implying that this layer has to read data from > the store only to be able to write it to another place) then fine, lets read. > But I still see a read buffer here (as is) as the wrong thing to do given that > it is an extra copy for more than 90% of the requests. 
If we need to keep a read > buffer for updates, first let's make sure that we are handling an update, and > then read. > > > > > > > > > On 2012/12/03 17:33:07, agayev wrote: > > > > Instead of replying to the comments I reply here in one coherent piece. > > > > > > > > What I've implemented here is an entry that supports all use cases: (1) > > create > > > a > > > > fresh entry, write and/or read to/from it (2) open an existing entry read > > > and/or > > > > write from/to it. > > > > > > > > For case (1) everything happens via in-memory buffers, i.e. we only touch > > the > > > > storage when an entry is closed/destructed. > > > > > > > > For case (2) we attempt to do minimal amount of work via lazy reading, we > > only > > > > bring into memory stream N, when it is needed, which is needed when we > > update > > > it > > > > (i.e. overwrite or append), or when we read specific stream. > > > > > > > > I think what you are proposing is to separate update from the rest. In > this > > > > case, we will have two modes of operation of an entry: (a) a fresh entry > > that > > > > implements write buffering and everything happens in memory until entry is > > > > closed/destructed (b) a read-only existing entry that doesn't do any > > > > read-buffering and redirects everything to the lower layer, and naturally, > > it > > > > does not support writes. And update is done at the upper layer by > > > instantiating > > > > two entries, one is fresh and the other one read-only existing one, and we > > > > copy/append from the existing one to the fresh one. > > > > > > > > Am I understanding this correctly? I do not have any objection to it and > it > > > is > > > > probably easier to implement and reason about, just want to make sure that > > we > > > > are on the same page.
mostly nits

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:112: DCHECK(offset >= 0 && buf_len >= 0);
On 2012/11/29 16:14:43, agayev wrote:
> On 2012/11/28 03:12:56, rvargas wrote:
> > same caveat about who's the caller
>
> Done.

done? does this come from the "user" ?

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:105: DCHECK(offset == stream.size);
why?

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:119: return stream_index < 0 && stream_index >= kFlashCacheEntryNumStreams;
return false? :)

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:144: for (int i = 0; i < kFlashCacheEntryNumStreams; ++i)
needs braces

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry_unittest.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry_unittest.cc:15: TEST_F(FlashCacheTest, CacheEntryEmpty) {
nit: could you add a one liner stating what this is testing?

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry_unittest.cc:25: EXPECT_EQ(0, entry->ReadData(i, 0, NULL, 1024));
There could be multiple reasons for this to return 0 so I'm not sure which one you want to test... passing no buffer and 1024 as size looks like a bug.

https://codereview.chromium.org/11316178/diff/8017/net/net.gyp
File net/net.gyp (right):

https://codereview.chromium.org/11316178/diff/8017/net/net.gyp#newcode392
net/net.gyp:392: 'disk_cache/flash/cache_entry.cc',
alpha order
https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/4006/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:112: DCHECK(offset >= 0 && buf_len >= 0);
On 2012/12/05 00:21:01, rvargas wrote:
> On 2012/11/29 16:14:43, agayev wrote:
> > On 2012/11/28 03:12:56, rvargas wrote:
> > > same caveat about who's the caller
> >
> > Done.
>
> done? does this come from the "user" ?

Yes it does. HttpCache may call WriteData with an index on an entry and it would travel down here. So, I've added a runtime verification for that.

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:105: DCHECK(offset == stream.size);
On 2012/12/05 00:21:01, rvargas wrote:
> why?

Fixed. It must be either overwrite (and/or truncate), in which case offset is 0, otherwise, must be an append.

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:119: return stream_index < 0 && stream_index >= kFlashCacheEntryNumStreams;
On 2012/12/05 00:21:01, rvargas wrote:
> return false? :)

Ugh. Done.

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:144: for (int i = 0; i < kFlashCacheEntryNumStreams; ++i)
On 2012/12/05 00:21:01, rvargas wrote:
> needs braces

Done.

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry_unittest.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry_unittest.cc:15: TEST_F(FlashCacheTest, CacheEntryEmpty) {
On 2012/12/05 00:21:01, rvargas wrote:
> nit: could you add a one liner stating what this is testing?

Done.

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry_unittest.cc:25: EXPECT_EQ(0, entry->ReadData(i, 0, NULL, 1024));
On 2012/12/05 00:21:01, rvargas wrote:
> There could be multiple reasons for this to return 0 so I'm not sure which one
> you want to test... passing no buffer and 1024 as size looks like a bug.

I want to test that reading from an entry with empty streams returns 0, but didn't want to create artificial buffers for that. Added the buffer.

https://codereview.chromium.org/11316178/diff/8017/net/net.gyp
File net/net.gyp (right):

https://codereview.chromium.org/11316178/diff/8017/net/net.gyp#newcode392
net/net.gyp:392: 'disk_cache/flash/cache_entry.cc',
On 2012/12/05 00:21:01, rvargas wrote:
> alpha order

Sigh. I should automate this, sorry. Done.
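The condition quoted above can never be true (an index cannot be both negative and at least kFlashCacheEntryNumStreams), so that version of ValidStream always returned false. One plausible corrected check, as a sketch rather than the committed code:

bool CacheEntry::ValidStream(int stream_index) {
  // A stream index is valid when it falls inside [0, kFlashCacheEntryNumStreams).
  return stream_index >= 0 && stream_index < kFlashCacheEntryNumStreams;
}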
lgtm

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:105: DCHECK(offset == stream.size);
On 2012/12/05 16:53:19, agayev wrote:
> On 2012/12/05 00:21:01, rvargas wrote:
> > why?
>
> Fixed. It must be either overwrite (and/or truncate), in which case offset is
> 0, otherwise, must be an append.

Adding a DCHECK here for that behavior means that there is a layer above this that will guarantee it. I mean, that's not part of the contract, even though it is the common case. It is also not required by the current code... although that doesn't mean that there's nothing else to do to really support arbitrary writes.

In other words, if you leave the DCHECK, we'd need a todo or comment stating that it is just to reflect unimplemented functionality.
https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
File net/disk_cache/flash/cache_entry.cc (right):

https://codereview.chromium.org/11316178/diff/8017/net/disk_cache/flash/cache...
net/disk_cache/flash/cache_entry.cc:105: DCHECK(offset == stream.size);
On 2012/12/05 18:39:37, rvargas wrote:
> On 2012/12/05 16:53:19, agayev wrote:
> > On 2012/12/05 00:21:01, rvargas wrote:
> > > why?
> >
> > Fixed. It must be either overwrite (and/or truncate), in which case offset is
> > 0, otherwise, must be an append.
>
> Adding a DCHECK here for that behavior means that there is a layer above this
> that will guarantee it. I mean, that's not part of the contract, even though it
> is the common case. It is also not required by the current code... although that
> doesn't mean that there's nothing else to do to really support arbitrary writes.
>
> In other words, if you leave the DCHECK, we'd need a todo or comment stating
> that it is just to reflect unimplemented functionality.

Added a TODO. I do not want to remove the DCHECK without implementing arbitrary writes, since I'm sure I will forget, will start doing arbitrary writes, and it will fail silently...
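A hypothetical sketch of the write path this DCHECK and TODO refer to; the signature and the in-memory buffering around it are assumptions, and only the overwrite-or-append constraint comes from the discussion above.

// Sketch only, not the committed code.
int CacheEntry::WriteData(int index, int offset, net::IOBuffer* buf,
                          int buf_len) {
  if (!ValidStream(index) || offset < 0 || buf_len < 0)
    return net::ERR_INVALID_ARGUMENT;

  Stream& stream = streams_[index];
  // TODO: support writes at arbitrary offsets. Until then, anything other
  // than a full overwrite (offset == 0) must be an append, and the DCHECK
  // makes an unsupported write fail loudly instead of silently.
  if (offset != 0)
    DCHECK_EQ(offset, stream.size);

  stream.data.resize(offset + buf_len);
  if (buf_len > 0)
    memcpy(&stream.data[offset], buf->data(), buf_len);
  stream.size = offset + buf_len;
  return buf_len;
}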
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/agayev@chromium.org/11316178/21002
Message was sent while issue was closed.
Change committed as 171444