Chromium Code Reviews

Unified Diff: storage/browser/blob/README.md

Issue 2637023003: [BlobStorage] Adding explainer for blob storage system. (Closed)
Patch Set: added more information Created 3 years, 11 months ago
Index: storage/browser/blob/README.md
diff --git a/storage/browser/blob/README.md b/storage/browser/blob/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3fb706e66c0f82f9fd4f4a9b8f76c08a4eddcce6
--- /dev/null
+++ b/storage/browser/blob/README.md
@@ -0,0 +1,283 @@
+# Chrome's Blob Storage System Design
+
+Elaboration of the blob storage system in Chrome.
+
+# What are blobs?
+
+Please see the [FileAPI Spec](https://www.w3.org/TR/FileAPI/) for the full
+specification for Blobs, or [Mozilla's Blob documentation](
+https://developer.mozilla.org/en-US/docs/Web/API/Blob) for a description of how
+Blobs are used in the Web Platform in general. For the purposes of this
+document, the important aspects of blobs are:
+
+1. Blobs are immutable.
+2. Blobs can be made from one or more of: bytes, files, or other blobs.
+3. Blobs can be 'sliced', which creates a blob that is a subsection of another
pwnall 2017/01/20 02:10:54 How about having the word "sliced" link to https:/
dmurph 2017/01/20 20:23:16 Done.
+blob.
+4. Reading blobs is asynchronous.
pwnall 2017/01/20 02:10:55 Is it worth noting that obtaining blob metadata (e
dmurph 2017/01/20 20:23:15 Done.
+5. Blobs can be passed to other browsing contexts, such as JavaScript workers
+or other tabs.
+
+In Chrome, after blob creation the actual blob 'data' gets transported to and
+lives in the browser process. The renderer just holds a reference -
+specifically a string UUID - to the blob, which it can use to read the blob or
+pass it to other processes.
+
+# Summary & Terminology
+
+Blobs are created in the renderer process, where their data is temporarily held
pwnall 2017/01/20 02:10:55 in _a_ renderer process?
dmurph 2017/01/20 20:23:16 Done.
+for the browser (while JavaScript execution can continue). When the browser has
+enough memory quota for the blob, it requests the data from the renderer. Once
+all data is transported and construction is complete, any pending reads for the
pwnall 2017/01/20 02:10:54 I'd emphasize the word "transported" in some way,
dmurph 2017/01/20 20:23:16 Done.
+blob are allowed to complete. Blobs can be small (bytes) or huge (GBs), so
pwnall 2017/01/20 02:10:54 "small (bytes)" does not seem to add value here
dmurph 2017/01/20 20:23:15 Done.
+quota is necessary.
+
+If the in-memory space for blobs is getting full, or a new blob is too large to
+be in-memory, then the blob system uses the disk. This can either be paging old
+blobs to disk, or saving the new too-large blob straight to disk.
+
+Blob reading goes through the network layer, where the renderer dispatches a
+network request for the blob and the browser responds with the
+`BlobURLRequestJob`.
+
+General Chrome terminology:
pwnall 2017/01/20 02:10:55 I'd like to https://www.chromium.org/developers/de
dmurph 2017/01/20 20:23:16 Done.
+
+* **Renderer (Process)**: Process where the web contents and JavaScript live.
+This is basically a tab. There are multiple renderers, and they all have
+security restrictions.
+* **Browser (Process)**: There is only one browser process, and it doesn't have
+security restrictions.
+* **Shared Memory**: Memory that both the browser and renderer process can read
+& write. Created only between 2 processes.
+* **IPC**: A message sent between processes. To avoid crashes and memory issues
+the blob system tries to limit the maximum size of an IPC message.
+
+Blob system terminology:
+
+* **Blob**: This is a blob object, which can consist of bytes or files, as
+described above.
+* **BlobItem** or **[DataElement](
+https://cs.chromium.org/chromium/src/storage/common/data_element.h)**:
+This is a primitive element that can basically be a File, Bytes, or another
+Blob. It also stores an offset and size, so this can be a part of a file. (This
+can also represent "future" file and "future" bytes, which is used to signify a
pwnall 2017/01/20 02:10:54 "future" files?
dmurph 2017/01/20 20:23:17 Done.
+bytes or file item that has not been transported yet).
+* **dependent blobs**: These are blobs that our blob depends on to be
pwnall 2017/01/20 02:10:54 "blobs that a blob has data dependencies on"? The
dmurph 2017/01/20 20:23:17 mmmmmmm I think I saw it as I'm 'dependent' on the
pwnall 2017/01/21 02:56:22 Precisely -- this blob is dependent on the other b
+constructed. As in, we were constructed with a dependency on another blob
pwnall 2017/01/20 02:10:55 I'm not a big fan of "we" and "our" usage here. I'
dmurph 2017/01/20 20:23:15 Done.
+(maybe we're a slice or just a blob was in our constructor), and we might need
+to wait for these to complete construction before we can declare ourselves
+constructed as well.
+* **transportation strategy**: We can have one of 3 transportation strategies
pwnall 2017/01/20 02:10:55 : a method for sending the data in a BlobItem from
dmurph 2017/01/20 20:23:16 Done.
+for Blobs: send data over IPC, Shared Memory, or Files.
+
+# How to use Blobs (Browser-side)
+
+### Building
pwnall 2017/01/20 02:10:54 ## instead of ###?
dmurph 2017/01/20 20:23:16 Done.
+All blob interaction should go through the `BlobStorageContext`. Blobs are
+built using a `BlobDataBuilder`, and as long as you don't use any
pwnall 2017/01/20 02:10:55 any chance you could move the caveat after the mai
dmurph 2017/01/20 20:23:16 Done.
+`BlobDataBuilder::AppendFuture*` methods then calling
+`BlobStorageContext::AddFinishedBlob` or `::BuildBlob` is all you need to do to
+create a `BlobDataHandle` that is eventually readable.
+
+If you have known data that is not available yet, you can use the
+`AppendFuture*` methods no the builder, but you must use
pwnall 2017/01/20 02:10:55 no -> on?
dmurph 2017/01/20 20:23:16 Done.
+`BlobStorageContext::BuildBlob`, and provide a callback that will notify you
+when the blob system has enough quota to store the data. At that point you can
+use the appropriate `BlobDataBuilder::Populate*` methods, and notify the
+context by calling `BlobStorageContext::NotifyTransportComplete` when done.
+
pwnall 2017/01/20 02:10:55 In general, this sections seems to assume that I'v
dmurph 2017/01/20 20:23:15 Done.
+## Accessing / Reading
+
+All blob information should come from the `BlobDataHandle` returned on
+construction. This handle is cheap to copy. Once all instances of handles for
+a blob are destructed, the blob is destroyed.
+
+`BlobDataHandle::RunOnConstructionComplete` will notify you when the blob is
+done or broken (due to not enough space, filesystem error, etc).
pwnall 2017/01/20 02:10:54 done -> constructed? broken (construction failed
dmurph 2017/01/20 20:23:17 Done.
+
+The `BlobReader` class is for reading blobs, and is accessible off of the
+`BlobDataHandle` at any time.
+
+# Blob Creation & Transportation (Renderer)
+
+**This process is outlined with diagrams and illustrations [here](
+https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75c319281_0_681).**
+
+This outlines the renderer-side responsibilities of the blob system. The
+renderer needs to:
+
+ 1. Consolidate small bytes items into larger chunks (avoiding a huge array of
+ 1 byte items).
+ 2. Communicate the blob composition to the browser immediately on
+ construction.
+ 3. Populate shared memory or files sent from the browser with the consolidated
+ blob data items.
+ 4. Hold the blob data until the browser is finished requesting it.
+
+The meat of blob construction starts in the [WebBlobRegistryImpl](
+https://cs.chromium.org/chromium/src/content/child/blob_storage/webblobregistry_impl.h)'s
+`createBuilder(uuid, content_type)`.
+
+## Blob Data Consolidation
+
+Since blobs are often constructed with arrays with single bytes, we try to
+consolidate all **adjacent** memory blob items into one. This is done in
+[BlobConsolidation](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_consolidation.h).
+The implementation doesn't actually do any copying or allocating of new memory
+buffers; instead, it facilitates the transformation between the 'consolidated'
+blob items and the underlying bytes items. This way we don't waste any memory.
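Illustrative sketch (not part of the patch): one way to picture the consolidation described above is to group runs of adjacent bytes items into index ranges over the original item list, so that no data is copied. The `Item`, `ConsolidatedItem`, and `Consolidate` names are hypothetical and are not the `BlobConsolidation` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical item description: only the kind and byte length matter here.
enum class ItemKind { kBytes, kFile };

struct Item {
  ItemKind kind;
  std::size_t size;
};

// A consolidated item refers to the half-open range [first, last) of the
// original items instead of owning a copy of their data.
struct ConsolidatedItem {
  ItemKind kind;
  std::size_t first;
  std::size_t last;
  std::size_t total_size;
};

// Groups runs of adjacent bytes items; every file item stays on its own.
std::vector<ConsolidatedItem> Consolidate(const std::vector<Item>& items) {
  std::vector<ConsolidatedItem> out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    const Item& item = items[i];
    if (item.kind == ItemKind::kBytes && !out.empty() &&
        out.back().kind == ItemKind::kBytes) {
      // The previous group always ends right before item i.
      assert(out.back().last == i);
      out.back().last = i + 1;
      out.back().total_size += item.size;
    } else {
      out.push_back({item.kind, i, i + 1, item.size});
    }
  }
  return out;
}
```

Because each consolidated item only records a range of indices, the underlying bytes items stay where they are until the browser asks for them.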
+
+## Blob Transportation, Renderer
pwnall 2017/01/20 02:10:54 I think it'd be more consistent to end the heading
dmurph 2017/01/20 20:23:16 Done.
+
+After the blob has been 'consolidated', it is given to the
+[BlobTransportController](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.h).
+This class:
+
+1. Immediately communicates the contents of the blob to the Browser. We also
+[optimistically send](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=325)
+the blob data if the total memory is less than our IPC threshold.
+2. Stores the blob consolidation for data requests from the browser.
+3. Answers requests from the browser to populate or send the blob data. The
+browser can request the renderer:
+ 1. Send items and populate the data in IPC ([code](
pwnall 2017/01/20 02:10:54 I think line-level links like this one are quite b
dmurph 2017/01/20 20:23:16 Done.
+https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=238)).
+ 2. Populate items in shared memory and notify the browser when population is
+complete ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=249)).
+ 3. Populate items in files and notify the browser when population is complete
+([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=292)).
+4. Destroys the blob consolidation when the browser says it's done.
+
+The transport controller also tries to keep the renderer alive while we are
+sending blobs, as if the renderer is closed then we would lose any pending blob
+data. It does this by using the [incrementing and decrementing the process ref
pwnall 2017/01/20 02:10:54 remove "using the"? also, ref -> reference?
dmurph 2017/01/20 20:23:16 Done.
+count](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=62),
+which should prevent fast shutdown.
+
+# Blob Transportation & Storage (Browser).
pwnall 2017/01/20 02:10:54 the period looks inappropriate here
dmurph 2017/01/20 20:23:16 Done.
+
+The browser side is a little more complicated. We are thinking about:
+
+1. Do we have enough space for this blob?
+2. If so, how do we want to transport it? IPC? Shared Memory? IPC?
pwnall 2017/01/20 02:10:54 Do you mean "File" instead of the last "IPC"? Alt
dmurph 2017/01/20 20:23:16 Done.
+3. Can I save this in memory right now? Or do I need to wait for older blob
pwnall 2017/01/20 02:10:55 Does this mean "Is there enough free memory to tra
dmurph 2017/01/20 20:23:16 Done.
+data to be paged to disk?
+4. Do I need to wait for files to be created?
+5. Do I need to wait for dependent blobs?
+
+## Summary
+
+We follow this general flow for constructing a blob on the browser side:
+
+1. Decide whether the blob fits and which transportation strategy should be used.
+2. Create our browser-side representation of the blob data, including any data
pwnall 2017/01/20 02:10:54 our -> the
dmurph 2017/01/20 20:23:15 Done.
+items from dependent blobs. We try to share data items as much as possible, and
pwnall 2017/01/20 02:10:54 Does the 2nd sentence here mean that data items ar
dmurph 2017/01/20 20:23:16 Done.
+allow for the dependent blob items to be not populated yet.
+3. Request memory and/or file quota from the BlobMemoryController, which
+manages our blob storage limits. Quota can be requested for both transportation
pwnall 2017/01/20 02:10:54 can be requested -> is necessary?
dmurph 2017/01/20 20:23:16 Done.
+and any copies we have to do from dependent blobs.
+4. If transportation quota is needed and when it is granted:
+ 1. Tell the BlobTransportHost to start asking for blob data given the earlier
+ decision of strategy.
+ * The BlobTransportHost populates the browser-side blob data item.
+ 2. When transportation is done we notify the BlobStorageContext.
+5. When transportation is done, copy quota is granted, and dependent blobs are
+complete, we finish the blob.
+ 1. We perform any pending copies from dependent blobs.
+ 2. We notify any listeners that the blob has been completed.
+
+Note: The transportation sections (steps 1, 2, 3) of this process are described
+(without thinking about blob dependencies) with diagrams and details in [this
pwnall 2017/01/20 02:10:54 thinking about -> accounting for
dmurph 2017/01/20 20:23:17 Done.
+presentation](https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_105).
+
+## BlobTransportHost
+
+The `BlobTransportHost` is in charge of the actual transportation of the data
+from the renderer to the browser. When the initial description of the blob is
pwnall 2017/01/20 02:10:55 I like "description of the blob" / "initial descri
dmurph 2017/01/20 20:23:16 Done.
+sent to the browser, the BlobTransportHost asks the BlobMemoryController which
+'strategy' (IPC, Shared Memory, or File) it should use to transport the blob.
pwnall 2017/01/20 02:10:54 I don't think you need quotes here, you introduced
dmurph 2017/01/20 20:23:16 Done.
+Based on this strategy it can transform the memory items sent from the renderer
pwnall 2017/01/20 02:10:54 transform -> translate?
dmurph 2017/01/20 20:23:17 Done.
+into a browser representation to facilitate the transportation. See [this](
+https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_145)
+slide, which illustrates how the browser might segment or split up the
+renderer's memory into transportable chunks.
+
+Once the transport host decides it's strategy, it will create it's own
pwnall 2017/01/20 02:10:55 it's -> its (twice)
dmurph 2017/01/20 20:23:15 Done.
+transport state for the blob, including a `BlobDataBuilder` using the
+transport's data segment representation. Then it will tell the
+`BlobStorageContext` that it is ready to build the blob.
+
+When the `BlobStorageContext` tells the transport host that it is ready to
+transport the blob data, this class's responsibility is to populate the
pwnall 2017/01/20 02:10:54 class' ? Or, better yet, "the transport host popu
dmurph 2017/01/20 20:23:15 Done.
+`BlobDataBuilder` with all the data from the renderer, then signal the storage
+context that it is done.
+
+## BlobStorageContext
+
+The `BlobStorageContext` is the hub of the blob storage system. It is
+responsible for creating & managing all the state of constructing blobs, as
+well as all blob handle generation and general blob status access.
+
+When a `BlobDataBuilder` is given to the context, whether from the
+`BlobTransportHost` or from elsewhere, the context will do the following:
+
+1. Find all dependent blobs in the new blob (any blob reference in the blob
+item list), and create a 'slice' of their items for the new blob.
+2. Create the final blob item list representation: a new blob item list with
+these 'slice' items inserted into the blob reference spots. This is
+'flattening' the blob.
+3. Ask the `BlobMemoryController` for file or memory quota for the
+transportation if necessary.
+ * When this is approved, it notifies the `BlobTransportHost` that it can
pwnall 2017/01/20 02:10:55 it notifies -> notify (for consistency with the ot
dmurph 2017/01/20 20:23:16 Done.
+ begin transporting the data.
+4. Ask the `BlobMemoryController` for memory quota for any copies necessary from
pwnall 2017/01/20 02:10:55 necessary for blob slicing?
dmurph 2017/01/20 20:23:17 Done.
+the blob slicing.
+5. Adds completion callbacks to any dependent blobs that our blob depends on.
pwnall 2017/01/20 02:10:54 the word "dependent" here seems redundant
dmurph 2017/01/20 20:23:16 Done.
+
+When all of the following conditions are met:
+
+1. The `BlobTransportHost` tells us it has transported all the data (or we
+don't need to transport data),
+2. The `BlobMemoryController` approves our memory quota for slice copies (or we
+don't need slice copies), and
+3. All dependent blobs are completed (or we don't have dependent blobs),
+
+The blob can finish constructing, where any pending blob slice copies are
+performed, and we set the status of the blob.
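Illustrative sketch (not part of the patch): the three completion conditions above can be read as a single predicate over per-blob bookkeeping. The `PendingBlob` struct and its field names are hypothetical and do not mirror the real `BlobStorageContext` state.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one blob that is still under construction.
struct PendingBlob {
  bool transport_needed = false;
  bool transport_done = false;
  bool copy_quota_needed = false;
  bool copy_quota_granted = false;
  std::size_t pending_dependent_blobs = 0;

  // True once every condition holds or does not apply to this blob.
  bool CanFinish() const {
    const bool transport_ok = !transport_needed || transport_done;
    const bool copy_quota_ok = !copy_quota_needed || copy_quota_granted;
    return transport_ok && copy_quota_ok && pending_dependent_blobs == 0;
  }

  // Called from the completion callback of a blob this blob depends on.
  void OnDependentBlobComplete() {
    assert(pending_dependent_blobs > 0);
    --pending_dependent_blobs;
  }
};
```

A blob that needs no transport, no slice copies, and has no dependencies satisfies the predicate immediately, which matches the parenthetical "or we don't need" cases in the list.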
+
+### BlobStatus lifecycle
+
+The BlobStatus outlines this procedure (specifically the transport process),
pwnall 2017/01/20 02:10:54 As a reader, I am unsure what "this procedure" ref
dmurph 2017/01/20 20:23:15 Done.
+and the copy memory quota and dependent blob process is encompassed in
+`PENDING_INTERNALS`.
+
+Once a blob is finished constructing, the status is set to `DONE`, or any of
pwnall 2017/01/20 02:10:54 I think you can say "to `DONE`, or to one of the `
dmurph 2017/01/20 20:23:16 Done.
+the `ERR_*` values if there was an error.
+
+### BlobSlice
+
+During construction, 'slices' are created for dependent blobs using the given
pwnall 2017/01/20 02:10:54 I don't think slices needs quotes here. It's a con
dmurph 2017/01/20 20:23:16 Done.
+offset and size of the reference. This slice consists of the relevant blob
+items, and metadata about possible copies from either end. If blob items can
+entirely be used by the new blob, then we just share the item between them. But
+if there is a 'slice' of the first or last item, then our resulting BlobSlice
+representation will create a new bytes item for the new blob, and store the
+necessary copy data for later.
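Illustrative sketch (not part of the patch): the slicing rule above can be shown by mapping an offset and size onto the item sizes of the source blob. Items fully inside the range are marked as shared, while a partially covered first or last item is marked as needing a copy. `SlicePart` and `SliceItems` are hypothetical names, and the sketch assumes every item is a bytes item.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a slice, described relative to an item of the source blob.
struct SlicePart {
  std::size_t item_index;   // index into the source blob's item list
  std::size_t item_offset;  // where the piece starts inside that item
  std::size_t length;       // number of bytes taken from that item
  bool shared;              // true: whole item reused; false: needs a copy
};

std::vector<SlicePart> SliceItems(const std::vector<std::size_t>& item_sizes,
                                  std::size_t offset, std::size_t size) {
  std::vector<SlicePart> parts;
  const std::size_t slice_end = offset + size;
  std::size_t item_start = 0;
  std::size_t covered = 0;
  for (std::size_t i = 0; i < item_sizes.size() && item_start < slice_end;
       ++i) {
    const std::size_t item_end = item_start + item_sizes[i];
    // Intersect the item's byte range with the requested slice range.
    const std::size_t begin = std::max(offset, item_start);
    const std::size_t end = std::min(slice_end, item_end);
    if (begin < end) {
      const bool whole = (begin == item_start && end == item_end);
      parts.push_back({i, begin - item_start, end - begin, whole});
      covered += end - begin;
    }
    item_start = item_end;
  }
  // The requested range must lie inside the source blob.
  assert(covered == size);
  return parts;
}
```

Only the first and last parts can ever be partial, so at most two copies are recorded per referenced blob in this sketch.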
+
+### BlobFlattener
+
+The `BlobFlattener` takes the new blob description (including blob references),
+creates blob slices for all the referenced blobs, and constructs a 'flat'
+representation of the new blob, where all blob references are replaced with the
pwnall 2017/01/20 02:10:54 remove "the"?
dmurph 2017/01/20 20:23:15 Done.
+'BlobSlice' items. It also stores any copy data from the slices.
pwnall 2017/01/20 02:10:55 I think you want backticks instead of single quote
dmurph 2017/01/20 20:23:16 Done.
+
+## BlobMemoryController
+
+The `BlobMemoryController` is responsible for:
+
+1. Determining storage quota limits for files and memory, including restricting
+file quota when disk space is low.
+2. Determining whether a blob can fit and the transportation strategy to use.
+3. Allocating memory quota.
pwnall 2017/01/20 02:10:55 It seems to me that "tracking" is a slightly bette
dmurph 2017/01/20 20:23:16 Done.
+4. Allocating file quota and creating files.
+5. Accumulating old blob data and evicting it to files on disk.
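Illustrative sketch (not part of the patch): the "does it fit, and which strategy" decision can be pictured as a comparison of the blob size against a few limits. The `Limits` fields, the `Strategy` enum, and the ordering of the checks are assumptions made for illustration; the real `BlobMemoryController` weighs more factors than this.

```cpp
#include <cassert>
#include <cstdint>

enum class Strategy { kTooLarge, kIpc, kSharedMemory, kFile };

// Hypothetical limits; the names and values are placeholders, not Chrome's.
struct Limits {
  std::uint64_t max_ipc_bytes;      // largest payload sent in one IPC message
  std::uint64_t free_memory_quota;  // remaining in-memory blob quota
  std::uint64_t free_disk_quota;    // remaining on-disk blob quota
};

Strategy ChooseStrategy(std::uint64_t blob_bytes, const Limits& limits) {
  assert(limits.max_ipc_bytes <= limits.free_memory_quota ||
         limits.free_memory_quota == 0);
  if (blob_bytes <= limits.free_memory_quota) {
    // Fits in memory: small blobs travel over IPC, larger ones over
    // shared memory.
    return blob_bytes <= limits.max_ipc_bytes ? Strategy::kIpc
                                              : Strategy::kSharedMemory;
  }
  // Too large for memory: fall back to files if the disk quota allows it.
  if (blob_bytes <= limits.free_disk_quota) {
    return Strategy::kFile;
  }
  return Strategy::kTooLarge;
}
```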
+