Chromium Code Reviews

| OLD | NEW |
|---|---|
| (Empty) | |
| 1 # Chrome's Blob Storage System Design | |
| 2 | |
| 3 Elaboration of the blob storage system in Chrome. | |
| 4 | |
| 5 # What are blobs? | |
| 6 | |
| 7 Please see the [FileAPI Spec](https://www.w3.org/TR/FileAPI/) for the full | |
| 8 specification for Blobs, or [Mozilla's Blob documentation]( | |
| 9 https://developer.mozilla.org/en-US/docs/Web/API/Blob) for a description of how | |
| 10 Blobs are used in the Web Platform in general. For the purposes of this | |
| 11 document, the important aspects of blobs are: | |
| 12 | |
| 13 1. Blobs are immutable. | |
| 14 2. Blobs can be made using one or more of: bytes, files, or other blobs. | |
| 15 3. Blobs can be 'sliced', which creates a blob that is a subsection of another | |
|
pwnall
2017/01/20 02:10:54
How about having the word "sliced" link to https:/
dmurph
2017/01/20 20:23:16
Done.
| |
| 16 blob. | |
| 17 4. Reading blobs is asynchronous. | |
|
pwnall
2017/01/20 02:10:55
Is it worth noting that obtaining blob metadata (e
dmurph
2017/01/20 20:23:15
Done.
| |
| 18 5. Blobs can be passed to other browsing contexts, such as Javascript workers | |
| 19 or other tabs. | |
| 20 | |
| 21 In Chrome, after blob creation the actual blob 'data' gets transported to and | |
| 22 lives in the browser process. The renderer just holds a reference - | |
| 23 specifically a string UUID - to the blob, which it can use to read the blob or | |
| 24 pass it to other processes. | |
| 25 | |
| 26 # Summary & Terminology | |
| 27 | |
| 28 Blobs are created in the renderer process, where their data is temporarily held | |
|
pwnall
2017/01/20 02:10:55
in _a_ renderer process?
dmurph
2017/01/20 20:23:16
Done.
| |
| 29 for the browser (while Javascript execution can continue). When the browser has | |
| 30 enough memory quota for the blob, it requests the data from the renderer. Once | |
| 31 all data is transported and construction is complete, any pending reads for the | |
|
pwnall
2017/01/20 02:10:54
I'd emphasize the word "transported" in some way,
dmurph
2017/01/20 20:23:16
Done.
| |
| 32 blob are allowed to complete. Blobs can be small (bytes) or huge (GBs), so | |
|
pwnall
2017/01/20 02:10:54
"small (bytes)" does not seem to add value here
dmurph
2017/01/20 20:23:15
Done.
| |
| 33 quota is necessary. | |
| 34 | |
| 35 If the in-memory space for blobs is getting full, or a new blob is too large to | |
| 36 be in-memory, then the blob system uses the disk. This can either be paging old | |
| 37 blobs to disk, or saving the new too-large blob straight to disk. | |
| 38 | |
| 39 Blob reading goes through the network layer, where the renderer dispatches a | |
| 40 network request for the blob and the browser responds with the | |
| 41 `BlobURLRequestJob`. | |
| 42 | |
| 43 General Chrome terminology: | |
|
pwnall
2017/01/20 02:10:55
I'd like to https://www.chromium.org/developers/de
dmurph
2017/01/20 20:23:16
Done.
| |
| 44 | |
| 45 * **Renderer (Process)**: Process where the web contents and JavaScript live. | |
| 46 This is basically a tab. There are multiple renderers, and they all have | |
| 47 security restrictions. | |
| 48 * **Browser (Process)**: There is only one browser process, and it doesn't have | |
| 49 security restrictions. | |
| 50 * **Shared Memory**: Memory that both the browser and renderer process can read | |
| 51 & write. Created only between 2 processes. | |
| 52 * **IPC**: A message sent between processes. To avoid crashes and memory issues | |
| 53 the blob system tries to limit the maximum size of an IPC message. | |
| 54 | |
| 55 Blob system terminology: | |
| 56 | |
| 57 * **Blob**: This is a blob object, which can consist of bytes or files, as | |
| 58 described above. | |
| 59 * **BlobItem** or **[DataElement]( | |
| 60 https://cs.chromium.org/chromium/src/storage/common/data_element.h)**: | |
| 61 This is a primitive element that can basically be a File, Bytes, or another | |
| 62 Blob. It also stores an offset and size, so this can be a part of a file. (This | |
| 63 can also represent "future" file and "future" bytes, which is used to signify a | |
|
pwnall
2017/01/20 02:10:54
"future" files?
dmurph
2017/01/20 20:23:17
Done.
| |
| 64 bytes or file item that has not been transported yet). | |
| 65 * **dependent blobs**: These are blobs that our blob depends on to be | |
|
pwnall
2017/01/20 02:10:54
"blobs that a blob has data dependencies on"?
The
dmurph
2017/01/20 20:23:17
mmmmmmm I think I saw it as I'm 'dependent' on the
pwnall
2017/01/21 02:56:22
Precisely -- this blob is dependent on the other b
| |
| 66 constructed. As in, we were constructed with a dependency on another blob | |
|
pwnall
2017/01/20 02:10:55
I'm not a big fan of "we" and "our" usage here. I'
dmurph
2017/01/20 20:23:15
Done.
| |
| 67 (maybe we're a slice or just a blob was in our constructor), and we might need | |
| 68 to wait for these to complete construction before we can declare ourselves | |
| 69 constructed as well. | |
| 70 * **transportation strategy**: We can have one of 3 transportation strategies | |
|
pwnall
2017/01/20 02:10:55
: a method for sending the data in a BlobItem from
dmurph
2017/01/20 20:23:16
Done.
| |
| 71 for Blobs: send data over IPC, Shared Memory, or Files. | |
| 72 | |
| 73 # How to use Blobs (Browser-side) | |
| 74 | |
| 75 ### Building | |
|
pwnall
2017/01/20 02:10:54
## instead of ###?
dmurph
2017/01/20 20:23:16
Done.
| |
| 76 All blob interaction should go through the `BlobStorageContext`. Blobs are | |
| 77 built using a `BlobDataBuilder`, and as long as you don't use any | |
|
pwnall
2017/01/20 02:10:55
any chance you could move the caveat after the mai
dmurph
2017/01/20 20:23:16
Done.
| |
| 78 `BlobDataBuilder::AppendFuture*` methods then calling | |
| 79 `BlobStorageContext::AddFinishedBlob` or `::BuildBlob` is all you need to do to | |
| 80 create a `BlobDataHandle` that is eventually readable. | |
| 81 | |
| 82 If you have known data that is not available yet, you can use the | |
| 83 `AppendFuture*` methods no the builder, but you must use | |
|
pwnall
2017/01/20 02:10:55
no -> on?
dmurph
2017/01/20 20:23:16
Done.
| |
| 84 `BlobStorageContext::BuildBlob`, and provide a callback that will notify you | |
| 85 when the blob system has enough quota to store the data. At that point you can | |
| 86 use the appropriate `BlobDataBuilder::Populate*` methods, and notify the | |
| 87 context by calling `BlobStorageContext::NotifyTransportComplete` when done. | |
| 88 | |
|
pwnall
2017/01/20 02:10:55
In general, this sections seems to assume that I'v
dmurph
2017/01/20 20:23:15
Done.
| |
| 89 ## Accessing / Reading | |
| 90 | |
| 91 All blob information should come from the `BlobDataHandle` returned on | |
| 92 construction. This handle is cheap to copy. Once all instances of handles for | |
| 93 a blob are destructed, the blob is destroyed. | |
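The handle lifetime rule above can be modeled with reference counting. This is a minimal sketch and not Chromium's implementation: `FakeBlobRegistry` and `FakeBlobHandle` are hypothetical names, and `std::shared_ptr` stands in for whatever bookkeeping the real `BlobDataHandle` uses.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical registry standing in for the blob storage. It only tracks
// which blob UUIDs are currently alive and must outlive all handles.
class FakeBlobRegistry {
 public:
  void Add(const std::string& uuid) { live_.insert(uuid); }
  void Remove(const std::string& uuid) { live_.erase(uuid); }
  bool IsAlive(const std::string& uuid) const { return live_.count(uuid) > 0; }

 private:
  std::set<std::string> live_;
};

// Hypothetical handle. Copying it only bumps a reference count, and the
// blob is removed from the registry when the last copy is destroyed.
class FakeBlobHandle {
 public:
  FakeBlobHandle(FakeBlobRegistry* registry, std::string uuid)
      : state_(new State{registry, std::move(uuid)}) {
    registry->Add(state_->uuid);
  }

  const std::string& uuid() const { return state_->uuid; }

 private:
  struct State {
    FakeBlobRegistry* registry;
    std::string uuid;
    // Runs once, when the last handle sharing this state goes away.
    ~State() { registry->Remove(uuid); }
  };

  std::shared_ptr<State> state_;
};
```

In this model, copying a handle costs one reference count increment, which is the sense in which a handle is "cheap to copy".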
| 94 | |
| 95 `BlobDataHandle::RunOnConstructionComplete` will notify you when the blob is | |
| 96 done or broken (due to not enough space, filesystem error, etc). | |
|
pwnall
2017/01/20 02:10:54
done -> constructed?
broken (construction failed
dmurph
2017/01/20 20:23:17
Done.
| |
| 97 | |
| 98 The `BlobReader` class is for reading blobs, and is accessible off of the | |
| 99 `BlobDataHandle` at any time. | |
| 100 | |
| 101 # Blob Creation & Transportation (Renderer) | |
| 102 | |
| 103 **This process is outlined with diagrams and illustrations [here]( | |
| 104 https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75c319281_0_681).** | |
| 105 | |
| 106 This outlines the renderer-side responsibilities of the blob system. The | |
| 107 renderer needs to: | |
| 108 | |
| 109 1. Consolidate small bytes items into larger chunks (avoiding a huge array of | |
| 110 1 byte items). | |
| 111 2. Communicate the blob composition to the browser immediately on | |
| 112 construction. | |
| 113 3. Populate shared memory or files sent from the browser with the consolidated | |
| 114 blob data items. | |
| 115 4. Hold the blob data until the browser is finished requesting it. | |
| 116 | |
| 117 The meat of blob construction starts in the [WebBlobRegistryImpl]( | |
| 118 https://cs.chromium.org/chromium/src/content/child/blob_storage/webblobregistry_impl.h)'s | |
| 119 `createBuilder(uuid, content_type)`. | |
| 120 | |
| 121 ## Blob Data Consolidation | |
| 122 | |
| 123 Since blobs are often constructed with arrays of single bytes, we try to | |
| 124 consolidate all **adjacent** memory blob items into one. This is done in | |
| 125 [BlobConsolidation](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_consolidation.h). | |
| 126 The implementation doesn't actually do any copying or allocating of new memory | |
| 127 buffers; instead, it facilitates the transformation between the 'consolidated' | |
| 128 blob items and the underlying bytes items. This way we don't waste any memory. | |
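The consolidation idea can be sketched as index ranges over the original items, so that no bytes move until a reader asks for them. This is an illustration under assumptions rather than the actual `BlobConsolidation` interface: `SourceItem`, `ConsolidatedItem`, `Consolidate`, and `ReadBytes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical source item: either an in-memory byte range or a file.
struct SourceItem {
  bool is_memory;
  const char* data;  // Borrowed pointer, only meaningful for memory items.
  size_t length;
};

// A consolidated item covers source items [first, last) without owning or
// copying their bytes.
struct ConsolidatedItem {
  bool is_memory;
  size_t first;
  size_t last;
  size_t length;
};

// Merges each run of adjacent memory items into one consolidated item.
std::vector<ConsolidatedItem> Consolidate(
    const std::vector<SourceItem>& items) {
  std::vector<ConsolidatedItem> out;
  for (size_t i = 0; i < items.size(); ++i) {
    const SourceItem& item = items[i];
    if (item.is_memory && !out.empty() && out.back().is_memory) {
      // Extend the previous memory run instead of starting a new item.
      out.back().last = i + 1;
      out.back().length += item.length;
    } else {
      out.push_back({item.is_memory, i, i + 1, item.length});
    }
  }
  return out;
}

// Reads `size` bytes starting at `offset` within a consolidated memory
// item by walking the underlying source items.
std::string ReadBytes(const std::vector<SourceItem>& items,
                      const ConsolidatedItem& consolidated,
                      size_t offset,
                      size_t size) {
  std::string result;
  for (size_t i = consolidated.first; i < consolidated.last && size > 0; ++i) {
    const SourceItem& item = items[i];
    if (offset >= item.length) {
      offset -= item.length;  // The requested range starts in a later item.
      continue;
    }
    const size_t n = std::min(size, item.length - offset);
    result.append(item.data + offset, n);
    size -= n;
    offset = 0;
  }
  return result;
}
```

The only copy in this sketch happens in `ReadBytes`, at the point where data is actually handed to a destination such as an IPC message or a shared memory region.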
| 129 | |
| 130 ## Blob Transportation, Renderer | |
|
pwnall
2017/01/20 02:10:54
I think it'd be more consistent to end the heading
dmurph
2017/01/20 20:23:16
Done.
| |
| 131 | |
| 132 After the blob has been 'consolidated', it is given to the | |
| 133 [BlobTransportController](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.h). | |
| 134 This class: | |
| 135 | |
| 136 1. Immediately communicates the contents of the blob to the Browser. We also | |
| 137 [optimistically send](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=325) | |
| 138 the blob data if the total memory is less than our IPC threshold. | |
| 139 2. Stores the blob consolidation for data requests from the browser. | |
| 140 3. Answers requests from the browser to populate or send the blob data. The | |
| 141 browser can request the renderer: | |
| 142 1. Send items and populate the data in IPC ([code]( | |
|
pwnall
2017/01/20 02:10:54
I think line-level links like this one are quite b
dmurph
2017/01/20 20:23:16
Done.
| |
| 143 https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=238)). | |
| 144 2. Populate items in shared memory and notify the browser when population is | |
| 145 complete ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=249)). | |
| 146 3. Populate items in files and notify the browser when population is complete | |
| 147 ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=292)). | |
| 148 4. Destroys the blob consolidation when the browser says it's done. | |
| 149 | |
| 150 The transport controller also tries to keep the renderer alive while we are | |
| 151 sending blobs, because if the renderer is closed we would lose any pending blob | |
| 152 data. It does this by using the [incrementing and decrementing the process ref | |
|
pwnall
2017/01/20 02:10:54
remove "using the"?
also, ref -> reference?
dmurph
2017/01/20 20:23:16
Done.
| |
| 153 count](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=62), | |
| 154 which should prevent fast shutdown. | |
| 155 | |
| 156 # Blob Transportation & Storage (Browser). | |
|
pwnall
2017/01/20 02:10:54
the period looks inappropriate here
dmurph
2017/01/20 20:23:16
Done.
| |
| 157 | |
| 158 The browser side is a little more complicated. We are thinking about: | |
| 159 | |
| 160 1. Do we have enough space for this blob? | |
| 161 2. If so, how do we want to transport it? IPC? Shared Memory? IPC? | |
|
pwnall
2017/01/20 02:10:54
Do you mean "File" instead of the last "IPC"?
Alt
dmurph
2017/01/20 20:23:16
Done.
| |
| 162 3. Can I save this in memory right now? Or do I need to wait for older blob | |
|
pwnall
2017/01/20 02:10:55
Does this mean "Is there enough free memory to tra
dmurph
2017/01/20 20:23:16
Done.
| |
| 163 data to be paged to disk? | |
| 164 4. Do I need to wait for files to be created? | |
| 165 5. Do I need to wait for dependent blobs? | |
| 166 | |
| 167 ## Summary | |
| 168 | |
| 169 We follow this general flow for constructing a blob on the browser side: | |
| 170 | |
| 171 1. Determine whether the blob fits and which transportation strategy to use. | |
| 172 2. Create our browser-side representation of the blob data, including any data | |
|
pwnall
2017/01/20 02:10:54
our -> the
dmurph
2017/01/20 20:23:15
Done.
| |
| 173 items from dependent blobs. We try to share data items as much as possible, and | |
|
pwnall
2017/01/20 02:10:54
Does the 2nd sentence here mean that data items ar
dmurph
2017/01/20 20:23:16
Done.
| |
| 174 allow for the dependent blob items to be not populated yet. | |
| 175 3. Request memory and/or file quota from the BlobMemoryController, which | |
| 176 manages our blob storage limits. Quota can be requested for both transportation | |
|
pwnall
2017/01/20 02:10:54
can be requested -> is necessary?
dmurph
2017/01/20 20:23:16
Done.
| |
| 177 and any copies we have to do from dependent blobs. | |
| 178 4. If transportation quota is needed and when it is granted: | |
| 179 1. Tell the BlobTransportHost to start asking for blob data given the earlier | |
| 180 decision of strategy. | |
| 181 * The BlobTransportHost populates the browser-side blob data item. | |
| 182 2. When transportation is done we notify the BlobStorageContext. | |
| 183 5. When transportation is done, copy quota is granted, and dependent blobs are | |
| 184 complete, we finish the blob. | |
| 185 1. We perform any pending copies from dependent blobs. | |
| 186 2. We notify any listeners that the blob has been completed. | |
| 187 | |
| 188 Note: The transportation sections (steps 1, 2, 3) of this process are described | |
| 189 (without thinking about blob dependencies) with diagrams and details in [this | |
|
pwnall
2017/01/20 02:10:54
thinking about -> accounting for
dmurph
2017/01/20 20:23:17
Done.
| |
| 190 presentation](https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_105). | |
| 191 | |
| 192 ## BlobTransportHost | |
| 193 | |
| 194 The `BlobTransportHost` is in charge of the actual transportation of the data | |
| 195 from the renderer to the browser. When the initial description of the blob is | |
|
pwnall
2017/01/20 02:10:55
I like "description of the blob" / "initial descri
dmurph
2017/01/20 20:23:16
Done.
| |
| 196 sent to the browser, the BlobTransportHost asks the BlobMemoryController which | |
| 197 'strategy' (IPC, Shared Memory, or File) it should use to transport the blob. | |
|
pwnall
2017/01/20 02:10:54
I don't think you need quotes here, you introduced
dmurph
2017/01/20 20:23:16
Done.
| |
| 198 Based on this strategy it can transform the memory items sent from the renderer | |
|
pwnall
2017/01/20 02:10:54
transform -> translate?
dmurph
2017/01/20 20:23:17
Done.
| |
| 199 into a browser representation to facilitate the transportation. See [this]( | |
| 200 https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_145) | |
| 201 slide, which illustrates how the browser might segment or split up the | |
| 202 renderer's memory into transportable chunks. | |
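As a rough illustration of splitting the renderer's memory into transportable chunks, the sketch below cuts a byte count into requests no larger than a given limit. The names `ChunkRequest` and `SegmentIntoChunks` are hypothetical, and the limit is a parameter because this document does not state the real chunk sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One hypothetical request from the browser to the renderer: "send `size`
// bytes starting at `offset` of the blob's memory".
struct ChunkRequest {
  uint64_t offset;
  uint64_t size;
};

// Splits `total_bytes` into consecutive requests of at most
// `max_chunk_bytes` each. Only the final request may be smaller.
std::vector<ChunkRequest> SegmentIntoChunks(uint64_t total_bytes,
                                            uint64_t max_chunk_bytes) {
  std::vector<ChunkRequest> requests;
  if (max_chunk_bytes == 0)
    return requests;  // No valid chunk size, so nothing can be requested.
  for (uint64_t offset = 0; offset < total_bytes; offset += max_chunk_bytes) {
    requests.push_back(
        {offset, std::min(max_chunk_bytes, total_bytes - offset)});
  }
  return requests;
}
```

For example, a 10-byte blob with a 4-byte limit yields three requests covering bytes 0-3, 4-7, and 8-9.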
| 203 | |
| 204 Once the transport host decides it's strategy, it will create it's own | |
|
pwnall
2017/01/20 02:10:55
it's -> its (twice)
dmurph
2017/01/20 20:23:15
Done.
| |
| 205 transport state for the blob, including a `BlobDataBuilder` using the | |
| 206 transport's data segment representation. Then it will tell the | |
| 207 `BlobStorageContext` that it is ready to build the blob. | |
| 208 | |
| 209 When the `BlobStorageContext` tells the transport host that it is ready to | |
| 210 transport the blob data, this class's responsibility is to populate the | |
|
pwnall
2017/01/20 02:10:54
class' ?
Or, better yet, "the transport host popu
dmurph
2017/01/20 20:23:15
Done.
| |
| 211 `BlobDataBuilder` with all the data from the renderer, then signal the storage | |
| 212 context that it is done. | |
| 213 | |
| 214 ## BlobStorageContext | |
| 215 | |
| 216 The `BlobStorageContext` is the hub of the blob storage system. It is | |
| 217 responsible for creating & managing all the state of constructing blobs, as | |
| 218 well as all blob handle generation and general blob status access. | |
| 219 | |
| 220 When a `BlobDataBuilder` is given to the context, whether from the | |
| 221 `BlobTransportHost` or from elsewhere, the context will do the following: | |
| 222 | |
| 223 1. Find all dependent blobs in the new blob (any blob reference in the blob | |
| 224 item list), and create a 'slice' of their items for the new blob. | |
| 225 2. Create the final blob item list representation, which is a new blob | |
| 226 item list with these 'slice' items inserted into the blob reference spots. This | |
| 227 is 'flattening' the blob. | |
| 228 3. Ask the `BlobMemoryController` for file or memory quota for the transportation | |
| 229 if necessary. | |
| 230 * When this is approved, it notifies the `BlobTransportHost` that it can | |
|
pwnall
2017/01/20 02:10:55
it notifies -> notify (for consistency with the ot
dmurph
2017/01/20 20:23:16
Done.
| |
| 231 begin transporting the data. | |
| 232 4. Ask the `BlobMemoryController` for memory quota for any copies necessary from | |
|
pwnall
2017/01/20 02:10:55
necessary for blob slicing?
dmurph
2017/01/20 20:23:17
Done.
| |
| 233 the blob slicing. | |
| 234 5. Adds completion callbacks to any dependent blobs that our blob depends on. | |
|
pwnall
2017/01/20 02:10:54
the word "dependent" here seems redundant
dmurph
2017/01/20 20:23:16
Done.
| |
| 235 | |
| 236 When all of the following conditions are met: | |
| 237 | |
| 238 1. The `BlobTransportHost` tells us it has transported all the data (or we | |
| 239 don't need to transport data), | |
| 240 2. The `BlobMemoryController` approves our memory quota for slice copies (or we | |
| 241 don't need slice copies), and | |
| 242 3. All dependent blobs are completed (or we don't have dependent blobs), | |
| 243 | |
| 244 The blob can finish constructing, where any pending blob slice copies are | |
| 245 performed, and we set the status of the blob. | |
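The three completion conditions can be modeled as a small state tracker. This is a sketch under assumptions: `PendingBlob` and `FakeBlobStatus` are hypothetical stand-ins, and the real `BlobStorageContext` tracks considerably more state, including error statuses.

```cpp
#include <cassert>

// Hypothetical two-state status. The real BlobStatus has more values.
enum class FakeBlobStatus { kPending, kDone };

// Tracks the three conditions that must all hold before a blob finishes.
class PendingBlob {
 public:
  PendingBlob(bool needs_transport, bool needs_copy_quota, int dependent_blobs)
      : transport_done_(!needs_transport),
        copy_quota_granted_(!needs_copy_quota),
        pending_dependencies_(dependent_blobs) {
    MaybeFinish();  // A blob with nothing to wait for is done immediately.
  }

  void OnTransportComplete() {
    transport_done_ = true;
    MaybeFinish();
  }

  void OnCopyQuotaGranted() {
    copy_quota_granted_ = true;
    MaybeFinish();
  }

  void OnDependencyComplete() {
    if (pending_dependencies_ > 0)
      --pending_dependencies_;
    MaybeFinish();
  }

  FakeBlobStatus status() const { return status_; }

 private:
  void MaybeFinish() {
    if (transport_done_ && copy_quota_granted_ && pending_dependencies_ == 0) {
      // The real system would perform pending slice copies here.
      status_ = FakeBlobStatus::kDone;
    }
  }

  bool transport_done_;
  bool copy_quota_granted_;
  int pending_dependencies_;
  FakeBlobStatus status_ = FakeBlobStatus::kPending;
};
```

The events may arrive in any order; the blob finishes on whichever one satisfies the last outstanding condition.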
| 246 | |
| 247 ### BlobStatus lifecycle | |
| 248 | |
| 249 The BlobStatus outlines this procedure (specifically the transport process), | |
|
pwnall
2017/01/20 02:10:54
As a reader, I am unsure what "this procedure" ref
dmurph
2017/01/20 20:23:15
Done.
| |
| 250 and the copy memory quota and dependent blob process is encompassed in | |
| 251 `PENDING_INTERNALS`. | |
| 252 | |
| 253 Once a blob is finished constructing, the status is set to `DONE`, or any of | |
|
pwnall
2017/01/20 02:10:54
I think you can say "to `DONE`, or to one of the `
dmurph
2017/01/20 20:23:16
Done.
| |
| 254 the `ERR_*` values if there was an error. | |
| 255 | |
| 256 ### BlobSlice | |
| 257 | |
| 258 During construction, 'slices' are created for dependent blobs using the given | |
|
pwnall
2017/01/20 02:10:54
I don't think slices needs quotes here. It's a con
dmurph
2017/01/20 20:23:16
Done.
| |
| 259 offset and size of the reference. This slice consists of the relevant blob | |
| 260 items, and metadata about possible copies from either end. If blob items can | |
| 261 entirely be used by the new blob, then we just share the item between them. But | |
| 262 if there is a 'slice' of the first or last item, then our resulting BlobSlice | |
| 263 representation will create a new bytes item for the new blob, and store the | |
| 264 necessary copy data for later. | |
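The share-or-copy decision described above can be sketched as interval arithmetic over the source blob's item sizes. The names `SliceEntry` and `SliceItems` are hypothetical, and the sketch records only what would need to be copied later without performing any copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of a hypothetical slice description.
struct SliceEntry {
  size_t source_index;   // Index of the item in the source blob.
  bool shared;           // True if the whole item is reused as-is.
  uint64_t copy_offset;  // Offset within the source item (partial items).
  uint64_t size;         // Number of bytes this entry contributes.
};

// Computes which source items a slice [offset, offset + size) touches.
// Items fully inside the slice are shared. Items that are cut at either
// end become entries that need a copy later.
std::vector<SliceEntry> SliceItems(const std::vector<uint64_t>& item_sizes,
                                   uint64_t offset,
                                   uint64_t size) {
  std::vector<SliceEntry> entries;
  const uint64_t slice_end = offset + size;
  uint64_t item_start = 0;
  for (size_t i = 0; i < item_sizes.size() && item_start < slice_end; ++i) {
    const uint64_t item_end = item_start + item_sizes[i];
    const uint64_t begin = std::max(item_start, offset);
    const uint64_t end = std::min(item_end, slice_end);
    if (begin < end) {
      const bool whole = (begin == item_start && end == item_end);
      entries.push_back({i, whole, begin - item_start, end - begin});
    }
    item_start = item_end;
  }
  return entries;
}
```

With item sizes of 10, 20, and 30 bytes and a slice at offset 5 of size 30, the middle item is shared whole while the first and last items each contribute a 5-byte partial entry.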
| 265 | |
| 266 ### BlobFlattener | |
| 267 | |
| 268 The `BlobFlattener` takes the new blob description (including blob references), | |
| 269 creates blob slices for all the referenced blobs, and constructs a 'flat' | |
| 270 representation of the new blob, where all blob references are replaced with the | |
|
pwnall
2017/01/20 02:10:54
remove "the"?
dmurph
2017/01/20 20:23:15
Done.
| |
| 271 'BlobSlice' items. It also stores any copy data from the slices. | |
|
pwnall
2017/01/20 02:10:55
I think you want backticks instead of single quote
dmurph
2017/01/20 20:23:16
Done.
| |
| 272 | |
| 273 ## BlobMemoryController | |
| 274 | |
| 275 The `BlobMemoryController` is responsible for: | |
| 276 | |
| 277 1. Determining storage quota limits for files and memory, including restricting | |
| 278 file quota when disk space is low. | |
| 279 2. Determining whether a blob can fit and the transportation strategy to use. | |
| 280 3. Allocating memory quota. | |
|
pwnall
2017/01/20 02:10:55
It seems to me that "tracking" is a slightly bette
dmurph
2017/01/20 20:23:16
Done.
| |
| 281 4. Allocating file quota and creating files. | |
| 282 5. Accumulating old blob data and evicting it to files on disk. | |
| 283 | |
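One way to picture the fit and strategy decision is as a comparison of the blob size against a few limits. The sketch below is an assumption-laden illustration: `FakeLimits`, `ChooseStrategy`, and the specific comparisons are hypothetical, and the real `BlobMemoryController` may weigh other inputs that this document does not describe.

```cpp
#include <cassert>
#include <cstdint>

enum class TransportStrategy { kTooLarge, kIpc, kSharedMemory, kFile };

// Hypothetical limits. The real thresholds are not given in this document.
struct FakeLimits {
  uint64_t max_ipc_bytes;           // Largest payload sent directly over IPC.
  uint64_t available_memory_bytes;  // Remaining in-memory blob quota.
  uint64_t available_disk_bytes;    // Remaining file quota.
};

// Picks a strategy: small blobs go over IPC, blobs that fit in memory use
// shared memory, larger blobs go to files, and anything else is rejected.
TransportStrategy ChooseStrategy(uint64_t blob_bytes,
                                 const FakeLimits& limits) {
  if (blob_bytes <= limits.available_memory_bytes) {
    return blob_bytes <= limits.max_ipc_bytes
               ? TransportStrategy::kIpc
               : TransportStrategy::kSharedMemory;
  }
  if (blob_bytes <= limits.available_disk_bytes)
    return TransportStrategy::kFile;
  return TransportStrategy::kTooLarge;
}
```

Returning a "too large" value alongside the strategies lets a single call answer both whether the blob fits and how to move it.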