Chromium Code Reviews| Index: storage/browser/blob/README.md |
| diff --git a/storage/browser/blob/README.md b/storage/browser/blob/README.md |
| new file mode 100644 |
| index 0000000000000000000000000000000000000000..3fb706e66c0f82f9fd4f4a9b8f76c08a4eddcce6 |
| --- /dev/null |
| +++ b/storage/browser/blob/README.md |
| @@ -0,0 +1,283 @@ |
| +# Chrome's Blob Storage System Design |
| + |
| +Elaboration of the blob storage system in Chrome. |
| + |
| +# What are blobs? |
| + |
| +Please see the [FileAPI Spec](https://www.w3.org/TR/FileAPI/) for the full |
| +specification for Blobs, or [Mozilla's Blob documentation]( |
| +https://developer.mozilla.org/en-US/docs/Web/API/Blob) for a description of how |
| +Blobs are used in the Web Platform in general. For the purposes of this |
| +document, the important aspects of blobs are: |
| + |
| +1. Blobs are immutable. |
| +2. Blobs can be made using one or more of: bytes, files, or other blobs. |
| +3. Blobs can be 'sliced', which creates a blob that is a subsection of another |
|
pwnall
2017/01/20 02:10:54
How about having the word "sliced" link to https:/
dmurph
2017/01/20 20:23:16
Done.
|
| +blob. |
| +4. Reading blobs is asynchronous. |
|
pwnall
2017/01/20 02:10:55
Is it worth noting that obtaining blob metadata (e
dmurph
2017/01/20 20:23:15
Done.
|
| +5. Blobs can be passed to other browsing contexts, such as JavaScript workers |
| +or other tabs. |
| + |
| +In Chrome, after blob creation the actual blob 'data' gets transported to and |
| +lives in the browser process. The renderer just holds a reference - |
| +specifically a string UUID - to the blob, which it can use to read the blob or |
| +pass it to other processes. |
| + |
| +# Summary & Terminology |
| + |
| +Blobs are created in the renderer process, where their data is temporarily held |
|
pwnall
2017/01/20 02:10:55
in _a_ renderer process?
dmurph
2017/01/20 20:23:16
Done.
|
| +for the browser (while JavaScript execution can continue). When the browser has |
| +enough memory quota for the blob, it requests the data from the renderer. Once |
| +all data is transported and construction is complete, any pending reads for the |
|
pwnall
2017/01/20 02:10:54
I'd emphasize the word "transported" in some way,
dmurph
2017/01/20 20:23:16
Done.
|
| +blob are allowed to complete. Blobs can be small (bytes) or huge (GBs), so |
|
pwnall
2017/01/20 02:10:54
"small (bytes)" does not seem to add value here
dmurph
2017/01/20 20:23:15
Done.
|
| +quota is necessary. |
| + |
| +If the in-memory space for blobs is getting full, or a new blob is too large to |
| +be in-memory, then the blob system uses the disk. This can either be paging old |
| +blobs to disk, or saving the new too-large blob straight to disk. |
| + |
| +Blob reading goes through the network layer, where the renderer dispatches a |
| +network request for the blob and the browser responds with the |
| +`BlobURLRequestJob`. |
| + |
| +General Chrome terminology: |
|
pwnall
2017/01/20 02:10:55
I'd like to https://www.chromium.org/developers/de
dmurph
2017/01/20 20:23:16
Done.
|
| + |
| +* **Renderer (Process)**: Process where the web contents and JavaScript live. |
| +This is basically a tab. There are multiple renderers, and they all have |
| +security restrictions. |
| +* **Browser (Process)**: There is only one browser process, and it doesn't have |
| +security restrictions. |
| +* **Shared Memory**: Memory that both the browser and renderer process can read |
| +& write. Created only between 2 processes. |
| +* **IPC**: A message sent between processes. To avoid crashes and memory issues |
| +the blob system tries to limit the maximum size of an IPC message. |
| + |
| +Blob system terminology: |
| + |
| +* **Blob**: This is a blob object, which can consist of bytes or files, as |
| +described above. |
| +* **BlobItem** or **[DataElement]( |
| +https://cs.chromium.org/chromium/src/storage/common/data_element.h)**: |
| +This is a primitive element that can basically be a File, Bytes, or another |
| +Blob. It also stores an offset and size, so this can be a part of a file. (This |
| +can also represent "future" file and "future" bytes, which is used to signify a |
|
pwnall
2017/01/20 02:10:54
"future" files?
dmurph
2017/01/20 20:23:17
Done.
|
| +bytes or file item that has not been transported yet). |
| +* **dependent blobs**: These are blobs that our blob depends on to be |
|
pwnall
2017/01/20 02:10:54
"blobs that a blob has data dependencies on"?
The
dmurph
2017/01/20 20:23:17
mmmmmmm I think I saw it as I'm 'dependent' on the
pwnall
2017/01/21 02:56:22
Precisely -- this blob is dependent on the other b
|
| +constructed. As in, we were constructed with a dependency on another blob |
|
pwnall
2017/01/20 02:10:55
I'm not a big fan of "we" and "our" usage here. I'
dmurph
2017/01/20 20:23:15
Done.
|
| +(maybe we're a slice or just a blob was in our constructor), and we might need |
| +to wait for these to complete construction before we can declare ourselves |
| +constructed as well. |
| +* **transportation strategy**: We can have one of 3 transportation strategies |
|
pwnall
2017/01/20 02:10:55
: a method for sending the data in a BlobItem from
dmurph
2017/01/20 20:23:16
Done.
|
| +for Blobs: send data over IPC, Shared Memory, or Files. |
| + |
| +# How to use Blobs (Browser-side) |
| + |
| +### Building |
|
pwnall
2017/01/20 02:10:54
## instead of ###?
dmurph
2017/01/20 20:23:16
Done.
|
| +All blob interaction should go through the `BlobStorageContext`. Blobs are |
| +built using a `BlobDataBuilder`, and as long as you don't use any |
|
pwnall
2017/01/20 02:10:55
any chance you could move the caveat after the mai
dmurph
2017/01/20 20:23:16
Done.
|
| +`BlobDataBuilder::AppendFuture*` methods then calling |
| +`BlobStorageContext::AddFinishedBlob` or `::BuildBlob` is all you need to do to |
| +create a `BlobDataHandle` that is eventually readable. |
| + |
| +If you have known data that is not available yet, you can use the |
| +`AppendFuture*` methods no the builder, but you must use |
|
pwnall
2017/01/20 02:10:55
no -> on?
dmurph
2017/01/20 20:23:16
Done.
|
| +`BlobStorageContext::BuildBlob`, and provide a callback that will notify you |
| +when the blob system has enough quota to store the data. At that point you can |
| +use the appropriate `BlobDataBuilder::Populate*` methods, and notify the |
| +context by calling `BlobStorageContext::NotifyTransportComplete` when done. |
| + |
|
pwnall
2017/01/20 02:10:55
In general, this sections seems to assume that I'v
dmurph
2017/01/20 20:23:15
Done.
|
| +## Accessing / Reading |
| + |
| +All blob information should come from the `BlobDataHandle` returned on |
| +construction. This handle is cheap to copy. Once all instances of handles for |
| +a blob are destructed, the blob is destroyed. |
| + |
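The handle lifetime rule above can be illustrated with a minimal sketch. It is not the real `BlobDataHandle`; the `Handle` and `BlobEntry` names are hypothetical, and `std::shared_ptr` is assumed here only as a stand-in for whatever reference counting the blob system actually uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the browser-side blob state.
struct BlobEntry {
  std::string uuid;
};

// Hypothetical handle: copying it only bumps a reference count, and the
// entry is destroyed when the last copy goes away.
class Handle {
 public:
  explicit Handle(std::string uuid)
      : entry_(std::make_shared<BlobEntry>(BlobEntry{std::move(uuid)})) {}

  const std::string& uuid() const { return entry_->uuid; }
  long use_count() const { return entry_.use_count(); }

  // Lets a caller watch the entry without keeping it alive.
  std::weak_ptr<BlobEntry> Observe() const { return entry_; }

 private:
  std::shared_ptr<BlobEntry> entry_;
};
```

Copying a `Handle` does not copy any blob data, which is the sense in which the handle is cheap to copy.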
| +`BlobDataHandle::RunOnConstructionComplete` will notify you when the blob is |
| +done or broken (due to not enough space, filesystem error, etc). |
|
pwnall
2017/01/20 02:10:54
done -> constructed?
broken (construction failed
dmurph
2017/01/20 20:23:17
Done.
|
| + |
| +The `BlobReader` class is for reading blobs, and is accessible off of the |
| +`BlobDataHandle` at any time. |
| + |
| +# Blob Creation & Transportation (Renderer) |
| + |
| +**This process is outlined with diagrams and illustrations [here]( |
| +https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75c319281_0_681).** |
| + |
| +This outlines the renderer-side responsibilities of the blob system. The |
| +renderer needs to: |
| + |
| + 1. Consolidate small bytes items into larger chunks (avoiding a huge array of |
| + 1 byte items). |
| + 2. Communicate the blob composition to the browser immediately on |
| + construction. |
| + 3. Populate shared memory or files sent from the browser with the consolidated |
| + blob data items. |
| + 4. Hold the blob data until the browser is finished requesting it. |
| + |
| +The meat of blob construction starts in the [WebBlobRegistryImpl]( |
| +https://cs.chromium.org/chromium/src/content/child/blob_storage/webblobregistry_impl.h)'s |
| +`createBuilder(uuid, content_type)`. |
| + |
| +## Blob Data Consolidation |
| + |
| +Since blobs are often constructed with arrays with single bytes, we try to |
| +consolidate all **adjacent** memory blob items into one. This is done in |
| +[BlobConsolidation](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_consolidation.h). |
| +The implementation doesn't actually do any copying or allocating of new memory |
| +buffers, instead it facilitates the transformation between the 'consolidated' |
| +blob items and the underlying bytes items. This way we don't waste any memory. |
| + |
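A minimal sketch of the consolidation idea follows. It is not the actual `BlobConsolidation` class: the `Item`, `ConsolidatedItem`, and `Consolidate` names are hypothetical. The point it shows is that a consolidated item only records which underlying items it covers, so no bytes are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a renderer-side blob item.
struct Item {
  enum class Type { kBytes, kFile };
  Type type;
  std::string data;  // Payload for kBytes items; path for kFile items.
};

// A consolidated item refers to a run of source items by index and owns no
// bytes of its own.
struct ConsolidatedItem {
  Item::Type type;
  std::size_t first_index;  // Index of the first covered source item.
  std::size_t count;        // Number of covered source items.
  std::size_t total_size;   // Sum of payload sizes (kBytes runs only).
};

std::vector<ConsolidatedItem> Consolidate(const std::vector<Item>& items) {
  std::vector<ConsolidatedItem> out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    const Item& item = items[i];
    const bool is_bytes = item.type == Item::Type::kBytes;
    if (is_bytes && !out.empty() && out.back().type == Item::Type::kBytes) {
      // The previous source item was also bytes: extend that run.
      out.back().count += 1;
      out.back().total_size += item.data.size();
    } else {
      // Start a new run; file items are never merged.
      out.push_back({item.type, i, 1, is_bytes ? item.data.size() : 0});
    }
  }
  return out;
}
```

Reading a consolidated item back then means walking `count` source items starting at `first_index`, which is the transformation between consolidated items and underlying bytes items that the text describes.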
| +## Blob Transportation, Renderer |
|
pwnall
2017/01/20 02:10:54
I think it'd be more consistent to end the heading
dmurph
2017/01/20 20:23:16
Done.
|
| + |
| +After the blob has been 'consolidated', it is given to the |
| +[BlobTransportController](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.h). |
| +This class: |
| + |
| +1. Immediately communicates the contents of the blob to the Browser. We also |
| +[optimistically send](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=325) |
| +the blob data if the total memory is less than our IPC threshold. |
| +2. Stores the blob consolidation for data requests from the browser. |
| +3. Answers requests from the browser to populate or send the blob data. The |
| +browser can request the renderer: |
| + 1. Send items and populate the data in IPC ([code]( |
|
pwnall
2017/01/20 02:10:54
I think line-level links like this one are quite b
dmurph
2017/01/20 20:23:16
Done.
|
| +https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=238)). |
| + 2. Populate items in shared memory and notify the browser when population is |
| +complete ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=249)). |
| + 3. Populate items in files and notify the browser when population is complete |
| +([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=292)). |
| +4. Destroys the blob consolidation when the browser says it's done. |
| + |
| +The transport controller also tries to keep the renderer alive while we are |
| +sending blobs, because if the renderer is closed we would lose any pending blob |
| +data. It does this by using the [incrementing and decrementing the process ref |
|
pwnall
2017/01/20 02:10:54
remove "using the"?
also, ref -> reference?
dmurph
2017/01/20 20:23:16
Done.
|
| +count](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=62), |
| +which should prevent fast shutdown. |
| + |
| +# Blob Transportation & Storage (Browser). |
|
pwnall
2017/01/20 02:10:54
the period looks inappropriate here
dmurph
2017/01/20 20:23:16
Done.
|
| + |
| +The browser side is a little more complicated. We are thinking about: |
| + |
| +1. Do we have enough space for this blob? |
| +2. If so, how do we want to transport it? IPC? Shared Memory? IPC? |
|
pwnall
2017/01/20 02:10:54
Do you mean "File" instead of the last "IPC"?
Alt
dmurph
2017/01/20 20:23:16
Done.
|
| +3. Can I save this in memory right now? Or do I need to wait for older blob |
|
pwnall
2017/01/20 02:10:55
Does this mean "Is there enough free memory to tra
dmurph
2017/01/20 20:23:16
Done.
|
| +data to be paged to disk? |
| +4. Do I need to wait for files to be created? |
| +5. Do I need to wait for dependent blobs? |
| + |
| +## Summary |
| + |
| +We follow this general flow for constructing a blob on the browser side: |
| + |
| +1. Determine whether the blob fits and which transportation strategy to use. |
| +2. Create our browser-side representation of the blob data, including any data |
|
pwnall
2017/01/20 02:10:54
our -> the
dmurph
2017/01/20 20:23:15
Done.
|
| +items from dependent blobs. We try to share data items as much as possible, and |
|
pwnall
2017/01/20 02:10:54
Does the 2nd sentence here mean that data items ar
dmurph
2017/01/20 20:23:16
Done.
|
| +allow for the dependent blob items to be not populated yet. |
| +3. Request memory and/or file quota from the BlobMemoryController, which |
| +manages our blob storage limits. Quota can be requested for both transportation |
|
pwnall
2017/01/20 02:10:54
can be requested -> is necessary?
dmurph
2017/01/20 20:23:16
Done.
|
| +and any copies we have to do from dependent blobs. |
| +4. If transportation quota is needed and when it is granted: |
| + 1. Tell the BlobTransportHost to start asking for blob data given the earlier |
| + decision of strategy. |
| + * The BlobTransportHost populates the browser-side blob data item. |
| + 2. When transportation is done we notify the BlobStorageContext. |
| +5. When transportation is done, copy quota is granted, and dependent blobs are |
| +complete, we finish the blob. |
| + 1. We perform any pending copies from dependent blobs. |
| + 2. We notify any listeners that the blob has been completed. |
| + |
| +Note: The transportation sections (steps 1, 2, 3) of this process are described |
| +(without thinking about blob dependencies) with diagrams and details in [this |
|
pwnall
2017/01/20 02:10:54
thinking about -> accounting for
dmurph
2017/01/20 20:23:17
Done.
|
| +presentation](https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_105). |
| + |
| +## BlobTransportHost |
| + |
| +The `BlobTransportHost` is in charge of the actual transportation of the data |
| +from the renderer to the browser. When the initial description of the blob is |
|
pwnall
2017/01/20 02:10:55
I like "description of the blob" / "initial descri
dmurph
2017/01/20 20:23:16
Done.
|
| +sent to the browser, the BlobTransportHost asks the BlobMemoryController which |
| +'strategy' (IPC, Shared Memory, or File) it should use to transport the blob. |
|
pwnall
2017/01/20 02:10:54
I don't think you need quotes here, you introduced
dmurph
2017/01/20 20:23:16
Done.
|
| +Based on this strategy it can transform the memory items sent from the renderer |
|
pwnall
2017/01/20 02:10:54
transform -> translate?
dmurph
2017/01/20 20:23:17
Done.
|
| +into a browser representation to facilitate the transportation. See [this]( |
| +https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_145) |
| +slide, which illustrates how the browser might segment or split up the |
| +renderer's memory into transportable chunks. |
| + |
| +Once the transport host decides it's strategy, it will create it's own |
|
pwnall
2017/01/20 02:10:55
it's -> its (twice)
dmurph
2017/01/20 20:23:15
Done.
|
| +transport state for the blob, including a `BlobDataBuilder` using the |
| +transport's data segment representation. Then it will tell the |
| +`BlobStorageContext` that it is ready to build the blob. |
| + |
| +When the `BlobStorageContext` tells the transport host that it is ready to |
| +transport the blob data, this class's responsibility is to populate the |
|
pwnall
2017/01/20 02:10:54
class' ?
Or, better yet, "the transport host popu
dmurph
2017/01/20 20:23:15
Done.
|
| +`BlobDataBuilder` with all the data from the renderer, then signal the storage |
| +context that it is done. |
| + |
| +## BlobStorageContext |
| + |
| +The `BlobStorageContext` is the hub of the blob storage system. It is |
| +responsible for creating & managing all the state of constructing blobs, as |
| +well as all blob handle generation and general blob status access. |
| + |
| +When a `BlobDataBuilder` is given to the context, whether from the |
| +`BlobTransportHost` or from elsewhere, the context will do the following: |
| + |
| +1. Find all dependent blobs in the new blob (any blob reference in the blob |
| +item list), and create a 'slice' of their items for the new blob. |
| +2. Create the final blob item list representation: a new blob item list with |
| +these 'slice' items inserted into the blob reference spots. This is |
| +'flattening' the blob. |
| +3. Ask the `BlobMemoryController` for file or memory quota for the |
| +transportation if necessary. |
| + * When this is approved, it notifies the `BlobTransportHost` that it can |
|
pwnall
2017/01/20 02:10:55
it notifies -> notify (for consistency with the ot
dmurph
2017/01/20 20:23:16
Done.
|
| + begin transporting the data. |
| +4. Ask the `BlobMemoryController` for memory quota for any copies necessary from |
|
pwnall
2017/01/20 02:10:55
necessary for blob slicing?
dmurph
2017/01/20 20:23:17
Done.
|
| +the blob slicing. |
| +5. Adds completion callbacks to any dependent blobs that our blob depends on. |
|
pwnall
2017/01/20 02:10:54
the word "dependent" here seems redundant
dmurph
2017/01/20 20:23:16
Done.
|
| + |
| +When all of the following conditions are met: |
| + |
| +1. The `BlobTransportHost` tells us it has transported all the data (or we |
| +don't need to transport data), |
| +2. The `BlobMemoryController` approves our memory quota for slice copies (or we |
| +don't need slice copies), and |
| +3. All dependent blobs are completed (or we don't have dependent blobs), |
| + |
| +The blob can finish constructing, where any pending blob slice copies are |
| +performed, and we set the status of the blob. |
| + |
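The three conditions above can be sketched as a small state check. This is an illustration only, not code from `BlobStorageContext`; `BuildingState`, `CanFinish`, and `OnDependencyComplete` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-blob construction state tracked while a blob is built.
struct BuildingState {
  bool transport_needed = false;    // Data must come from the renderer.
  bool transport_done = false;      // Transport host reported completion.
  bool copy_quota_needed = false;   // Slice copies require memory quota.
  bool copy_quota_granted = false;  // That quota has been approved.
  std::size_t num_pending_dependencies = 0;  // Dependent blobs not yet done.
};

// True once every condition holds or is not applicable to this blob.
bool CanFinish(const BuildingState& s) {
  const bool transport_ok = !s.transport_needed || s.transport_done;
  const bool copies_ok = !s.copy_quota_needed || s.copy_quota_granted;
  const bool dependencies_ok = s.num_pending_dependencies == 0;
  return transport_ok && copies_ok && dependencies_ok;
}

// Called when one dependent blob completes; returns whether the blob can
// now finish constructing.
bool OnDependencyComplete(BuildingState& s) {
  assert(s.num_pending_dependencies > 0);
  --s.num_pending_dependencies;
  return CanFinish(s);
}
```

Each event (transport done, quota granted, dependency complete) would re-run the same check, so the blob finishes on whichever event arrives last.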
| +### BlobStatus lifecycle |
| + |
| +The BlobStatus outlines this procedure (specifically the transport process), |
|
pwnall
2017/01/20 02:10:54
As a reader, I am unsure what "this procedure" ref
dmurph
2017/01/20 20:23:15
Done.
|
| +and the copy memory quota and dependent blob process is encompassed in |
| +`PENDING_INTERNALS`. |
| + |
| +Once a blob is finished constructing, the status is set to `DONE`, or any of |
|
pwnall
2017/01/20 02:10:54
I think you can say "to `DONE`, or to one of the `
dmurph
2017/01/20 20:23:16
Done.
|
| +the `ERR_*` values if there was an error. |
| + |
| +### BlobSlice |
| + |
| +During construction, 'slices' are created for dependent blobs using the given |
|
pwnall
2017/01/20 02:10:54
I don't think slices needs quotes here. It's a con
dmurph
2017/01/20 20:23:16
Done.
|
| +offset and size of the reference. This slice consists of the relevant blob |
| +items, and metadata about possible copies from either end. If blob items can |
| +entirely be used by the new blob, then we just share the item between the blobs. But |
| +if there is a 'slice' of the first or last item, then our resulting BlobSlice |
| +representation will create a new bytes item for the new blob, and store the |
| +necessary copy data for later. |
| + |
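A minimal sketch of the slicing rule described above, assuming for simplicity that every source item is an in-memory bytes item. `SliceEntry` and `MakeSlice` are hypothetical names, not the real `BlobSlice` interface. Items covered entirely by the slice are marked as shared; a partially covered first or last item is marked as needing a new bytes item and a later copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of how one source item contributes to a slice.
struct SliceEntry {
  std::size_t source_index;    // Index of the item in the source blob.
  std::size_t offset_in_item;  // Where the slice starts inside that item.
  std::size_t length;          // Number of bytes taken from that item.
  bool shared;  // true: whole item reused; false: new item, copy later.
};

// Computes the entries for the byte range [offset, offset + size) of a blob
// whose items have the given sizes.
std::vector<SliceEntry> MakeSlice(const std::vector<std::size_t>& item_sizes,
                                  std::size_t offset, std::size_t size) {
  std::vector<SliceEntry> slice;
  std::size_t item_start = 0;
  for (std::size_t i = 0; i < item_sizes.size() && size > 0; ++i) {
    const std::size_t item_end = item_start + item_sizes[i];
    if (offset < item_end) {
      const std::size_t begin = offset - item_start;
      const std::size_t length = std::min(size, item_sizes[i] - begin);
      const bool whole = begin == 0 && length == item_sizes[i];
      slice.push_back({i, begin, length, whole});
      offset += length;
      size -= length;
    }
    item_start = item_end;
  }
  assert(size == 0 && "slice range must lie within the source blob");
  return slice;
}
```

For a source blob with items of 10, 20, and 30 bytes, slicing 40 bytes from offset 5 yields a 5-byte copy from the first item, the whole second item shared, and a 15-byte copy from the third.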
| +### BlobFlattener |
| + |
| +The `BlobFlattener` takes the new blob description (including blob references), |
| +creates blob slices for all the referenced blobs, and constructs a 'flat' |
| +representation of the new blob, where all blob references are replaced with the |
|
pwnall
2017/01/20 02:10:54
remove "the"?
dmurph
2017/01/20 20:23:15
Done.
|
| +'BlobSlice' items. It also stores any copy data from the slices. |
|
pwnall
2017/01/20 02:10:55
I think you want backticks instead of single quote
dmurph
2017/01/20 20:23:16
Done.
|
| + |
| +## BlobMemoryController |
| + |
| +The `BlobMemoryController` is responsible for: |
| + |
| +1. Determining storage quota limits for files and memory, including restricting |
| +file quota when disk space is low. |
| +2. Determining whether a blob can fit and the transportation strategy to use. |
| +3. Allocating memory quota. |
|
pwnall
2017/01/20 02:10:55
It seems to me that "tracking" is a slightly bette
dmurph
2017/01/20 20:23:16
Done.
|
| +4. Allocating file quota and creating files. |
| +5. Accumulating old blob data and evicting it to files on disk. |
| + |
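Item 2 in the list above (fit check plus strategy choice) might look roughly like the sketch below. The `Limits` fields, the ordering of the checks, and the names `Strategy` and `ChooseStrategy` are all assumptions made for illustration; the README does not state the real thresholds or decision order used by `BlobMemoryController`.

```cpp
#include <cassert>
#include <cstdint>

enum class Strategy { kTooLarge, kIpc, kSharedMemory, kFile };

// Hypothetical limits; the real values and their sources are not given here.
struct Limits {
  std::uint64_t max_ipc_size;      // Largest payload sent directly over IPC.
  std::uint64_t available_memory;  // Remaining in-memory blob quota.
  std::uint64_t available_disk;    // Remaining file quota.
};

// Assumed policy: small blobs go over IPC, blobs that fit in memory use
// shared memory, blobs too large for memory go to files, anything else fails.
Strategy ChooseStrategy(std::uint64_t blob_size, const Limits& limits) {
  if (blob_size <= limits.available_memory) {
    return blob_size <= limits.max_ipc_size ? Strategy::kIpc
                                            : Strategy::kSharedMemory;
  }
  if (blob_size <= limits.available_disk) {
    return Strategy::kFile;
  }
  return Strategy::kTooLarge;
}
```

Returning a distinct "too large" value keeps the fit decision and the strategy decision in one call, matching how the list item pairs them.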