Side by Side Diff: storage/browser/blob/README.md

Issue 2637023003: [BlobStorage] Adding explainer for blob storage system. (Closed)
Patch Set: comments Created 3 years, 10 months ago
1 # Chrome's Blob Storage System Design
2
3 Elaboration of the blob storage system in Chrome.
4
5 # What are blobs?
6
7 Please see the [FileAPI Spec](https://www.w3.org/TR/FileAPI/) for the full
8 specification for Blobs, or [Mozilla's Blob documentation](
9 https://developer.mozilla.org/en-US/docs/Web/API/Blob) for a description of how
10 Blobs are used in the Web Platform in general. For the purposes of this
11 document, the important aspects of blobs are:
12
13 1. Blobs are immutable.
14 2. Blobs can be made from one or more of: bytes, files, or other blobs.
15 3. Blobs can be ['sliced'](
16 https://developer.mozilla.org/en-US/docs/Web/API/Blob/slice), which creates a
17 blob that is a subsection of another blob.
18 4. Reading blobs is asynchronous.
19 5. Reading blob metadata (like size) is synchronous.
20 6. Blobs can be passed to other browsing contexts, such as JavaScript workers
21 or other tabs.
22
23 In Chrome, after blob creation the actual blob 'data' gets transported to and
24 lives in the browser process. The renderer just holds a reference -
25 specifically a string UUID - to the blob, which it can use to read the blob or
26 pass it to other processes.
27
28 # Summary & Terminology
29
30 Blobs are created in a renderer process, where their data is temporarily held
31 for the browser (while JavaScript execution can continue). When the browser has
32 enough memory quota for the blob, it requests the data from the renderer. All
33 blob data is transported from the renderer to the browser. Once complete, any
34 pending reads for the blob are allowed to complete. Blobs can be huge (GBs), so
35 quota is necessary.
36
37 If the in-memory space for blobs is getting full, or a new blob is too large to
38 be in-memory, then the blob system uses the disk. This can either be paging old
39 blobs to disk, or saving the new too-large blob straight to disk.
40
41 Blob reading goes through the network layer, where the renderer dispatches a
42 network request for the blob and the browser responds with the
43 `BlobURLRequestJob`.
44
45 General Chrome terminology:
46
47 * **Renderer, Browser, and IPCs**: See the [Multi-Process Architecture](
48 https://www.chromium.org/developers/design-documents/multi-process-architecture)
49 document to learn about these concepts.
50 * **Shared Memory**: Memory that both the browser and renderer process can read
51 & write. Created only between 2 processes.
52
53 Blob system terminology:
54
55 * **Blob**: This is a blob object, which can consist of bytes or files, as
56 described above.
57 * **BlobItem** or **[DataElement](
58 https://cs.chromium.org/chromium/src/storage/common/data_element.h)**:
59 This is a primitive element that can basically be a File, Bytes, or another
60 Blob. It also stores an offset and size, so this can be a part of a file. (This
61 can also represent a "future" file and "future" bytes, which is used to signify
62 a bytes or file item that has not been transported yet).
63 * **dependent blobs**: These are blobs that our blob is dependent on to be
64 constructed. As in, a blob is constructed with a dependency on another blob
65 (maybe it is a slice or just a blob in our constructor), and before the new
66 blob can be constructed it might need to wait for the "dependent" blobs to
67 complete. (This can sound backwards, but it's how it's referenced in the code,
68 so think "I am dependent on these other blobs".)
69 * **transportation strategy**: a method for sending the data in a BlobItem from
70 a renderer to the browser. The system currently implements three strategies:
71 IPC, Shared Memory, and Files.
72 * **blob description**: the initial data synchronously sent to the browser that
73 describes the items (content and sizes) of the new blob. This can
74 optimistically include the blob data if the size is less than the maximum IPC
75 size.
76
77 # Blob Storage Limits
78
79 We calculate the storage limits [here](
80 https://cs.chromium.org/chromium/src/storage/browser/blob/blob_memory_controller.cc?q=CalculateBlobStorageLimitsImpl&sq=package:chromium).
81
82 **In-Memory Storage Limit**
83
84 * If the architecture is x64 and NOT ChromeOS or Android: `2GB`
85 * Otherwise: `total_physical_memory / 5`
86
87 **Disk Storage Limit**
88
89 * If ChromeOS: `disk_size / 2`
90 * If Android: `disk_size / 20`
91 * Else: `disk_size / 10`
92
93 Note: ChromeOS's disk is part of the user partition, which is separate from the
94 system partition.
95
96 **Minimum Disk Availability**
97
98 We restrict our disk limit to accommodate a minimum disk availability. The equation
99 we use is:
100
101 `min_disk_availability = in_memory_limit * 2`
102
103 ## Example Limits
104
105 (All sizes in GB)
106
107 | Device | RAM | In-Memory Limit | Disk | Disk Limit | Min Disk Availability |
108 | --- | --- | --- | --- | --- | --- |
109 | Cast | 0.5 | 0.1 | 0 | 0 | 0 |
110 | Android Minimal | 0.5 | 0.1 | 8 | 0.4 | 0.2 |
111 | Android Fat | 2 | 0.4 | 32 | 1.5 | 0.8 |
112 | CrOS | 2 | 0.4 | 8 | 4 | 0.8 |
113 | Desktop 32 | 3 | 0.6 | 500 | 50 | 1.2 |
114 | Desktop 64 | 4 | 2 | 500 | 50 | 4 |
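
The rules above can be restated as a small calculation. This is a minimal sketch in Python, not the code in `blob_memory_controller.cc`: the function name, the `platform` strings, and the `BlobLimits` type are hypothetical. It does not model how the disk limit is later restricted when free disk space is low, nor devices without a disk (the Cast row).

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass(frozen=True)
class BlobLimits:
    in_memory_limit: int
    disk_limit: int
    min_disk_availability: int


def calculate_blob_limits(total_ram, disk_size, *, is_x64, platform="desktop"):
    """Apply the documented rules. Sizes are in bytes.

    `platform` is a hypothetical label: "desktop", "chromeos", or "android".
    """
    # In-memory limit: 2GB on x64 desktop, otherwise one fifth of physical RAM.
    if is_x64 and platform not in ("chromeos", "android"):
        in_memory = 2 * GB
    else:
        in_memory = total_ram // 5

    # Disk limit: a platform-dependent fraction of the disk size.
    if platform == "chromeos":
        disk = disk_size // 2
    elif platform == "android":
        disk = disk_size // 20
    else:
        disk = disk_size // 10

    # Minimum disk availability is twice the in-memory limit.
    return BlobLimits(in_memory, disk, in_memory * 2)
```

For example, `calculate_blob_limits(4 * GB, 500 * GB, is_x64=True)` corresponds to the "Desktop 64" row of the table.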
115
116 # Common Pitfalls
117
118 ## Creating Large Blobs Too Fast
119
120 Creating a lot of blobs, especially if they are very large blobs, can cause
121 the renderer memory to grow too fast and result in an OOM on the renderer side.
122 This is because the renderer temporarily stores the blob data while it waits
123 for the browser to request it. Meanwhile, JavaScript can continue executing.
124 Transferring the data can take a lot of time if the blob is large enough to save
125 it directly to a file, as this means we need to wait for disk operations before
126 the renderer can get rid of the data.
127
128 ## Leaking Blob References
129
130 If the blob object in JavaScript is kept around, then the data will never be
131 cleaned up in the backend. This will unnecessarily use memory, so make sure to
132 dereference blob objects if they are no longer needed.
133
134 Similarly, if a URL is created for a blob, this will keep the blob data around
135 until the URL is revoked (and the blob object is dereferenced). However, the
136 URL is automatically revoked when the browser context is destroyed.
137
138 # How to use Blobs (Browser-side)
139
140 ## Building
141 All blob interaction should go through the `BlobStorageContext`. Blobs are
142 built using a `BlobDataBuilder` to populate the data and then calling
143 `BlobStorageContext::AddFinishedBlob` or `::BuildBlob`. This returns a
144 `BlobDataHandle`, which manages reading, lifetime, and metadata access for the
145 new blob.
146
147 If you have known data that is not available yet, you can still create the blob
148 reference. See the documentation on the `BlobDataBuilder::AppendFuture*` and
149 `::Populate*` methods on the builder, the callback usage on
150 `BlobStorageContext::BuildBlob`, and
151 `BlobStorageContext::NotifyTransportComplete` to facilitate this construction.
152
153 ## Accessing / Reading
154
155 All blob information should come from the `BlobDataHandle` returned on
156 construction. This handle is cheap to copy. Once all instances of handles for
157 a blob are destructed, the blob is destroyed.
158
159 `BlobDataHandle::RunOnConstructionComplete` will notify you when the blob is
160 constructed or broken (construction failed due to not enough space, filesystem
161 error, etc).
162
163 The `BlobReader` class is for reading blobs, and is accessible off of the
164 `BlobDataHandle` at any time.
165
166 # Blob Creation & Transportation (Renderer)
167
168 **This process is outlined with diagrams and illustrations [here](
169 https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75c319281_0_681).**
170
171 This outlines the renderer-side responsibilities of the blob system. The
172 renderer needs to:
173
174 1. Consolidate small bytes items into larger chunks (avoiding a huge array of
175 1 byte items).
176 2. Communicate the blob description to the browser immediately on
177 construction.
178 3. Populate shared memory or files sent from the browser with the consolidated
179 blob data items.
180 4. Hold the blob data until the browser is finished requesting it.
181
182 The meat of blob construction starts in the [WebBlobRegistryImpl](
183 https://cs.chromium.org/chromium/src/content/child/blob_storage/webblobregistry_impl.h)'s
184 `createBuilder(uuid, content_type)`.
185
186 ## Blob Data Consolidation
187
188 Since blobs are often constructed from arrays of single bytes, we try to
189 consolidate all **adjacent** memory blob items into one. This is done in
190 [BlobConsolidation](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_consolidation.h).
191 The implementation doesn't actually do any copying or allocating of new memory
192 buffers; instead, it facilitates the transformation between the 'consolidated'
193 blob items and the underlying bytes items. This way we don't waste any memory.
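
The idea of consolidating adjacent memory items without copying can be illustrated with a short sketch. This is a simplified Python model, not the `BlobConsolidation` class: the `(kind, payload)` item tuples and the `ConsolidatedItem` type are hypothetical. Each consolidated bytes item only keeps views onto the original buffers.

```python
from dataclasses import dataclass, field


@dataclass
class ConsolidatedItem:
    kind: str                                   # "bytes", "file", or "blob"
    chunks: list = field(default_factory=list)  # views onto the original buffers
    ref: object = None                          # file path or blob UUID (non-bytes items)

    @property
    def size(self):
        # Total size of the underlying bytes chunks.
        return sum(len(chunk) for chunk in self.chunks)


def consolidate(items):
    """Merge runs of adjacent bytes items; other items are passed through."""
    consolidated = []
    for kind, payload in items:
        if kind == "bytes":
            view = memoryview(payload)  # references the buffer, does not copy it
            if consolidated and consolidated[-1].kind == "bytes":
                consolidated[-1].chunks.append(view)
            else:
                consolidated.append(ConsolidatedItem("bytes", [view]))
        else:
            consolidated.append(ConsolidatedItem(kind, ref=payload))
    return consolidated
```

Only adjacent bytes items are merged, so a file or blob item in between starts a new consolidated item.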
194
195 ## Blob Transportation (Renderer)
196
197 After the blob has been 'consolidated', it is given to the
198 [BlobTransportController](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.h).
199 This class:
200
201 1. Immediately communicates the blob description to the Browser. We also
202 [optimistically send](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=325)
203 the blob data if the total memory is less than our IPC threshold.
204 2. Stores the blob consolidation for data requests from the browser.
205 3. Answers requests from the browser to populate or send the blob data. The
206 browser can request the renderer:
207 1. Send items and populate the data in IPC ([code](
208 https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::IPC")).
209 2. Populate items in shared memory and notify the browser when population is
210 complete ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::SHARED_MEMORY")).
211 3. Populate items in files and notify the browser when population is complete
212 ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::FILE")).
213 4. Destroys the blob consolidation when the browser says it's done.
214
215 The transport controller also tries to keep the renderer alive while we are
216 sending blobs, because if the renderer is closed we would lose any pending blob
217 data. It does this by [incrementing and decrementing the process reference
218 count](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=62),
219 which should prevent fast shutdown.
220
221 # Blob Transportation & Storage (Browser)
222
223 The browser side is a little more complicated. It has to answer:
224
225 1. Do we have enough space for this blob?
226 2. Which transportation strategy should be used for the blob's components?
227 3. Is there enough free memory to transport the blob right now? Or does older
228 blob data need to be paged to disk first?
229 4. Do we need to wait for files to be created?
230 5. Do we need to wait for dependent blobs?
231
232 ## Summary
233
234 We follow this general flow for constructing a blob on the browser side:
235
236 1. Determine whether the blob fits and which transportation strategy should be used.
237 2. Create our browser-side representation of the blob data, including the data
238 items from dependent blobs. We try to share items as much as possible to save
239 memory, and allow for the dependent blob items to be not populated yet.
240 3. Request memory and/or file quota from the BlobMemoryController, which
241 manages our blob storage limits. Quota is necessary for both transportation and
242 any copies we have to do from dependent blobs.
243 4. If transportation quota is needed, then once it is granted:
244 1. Tell the BlobTransportHost to start asking for blob data given the earlier
245 decision of strategy.
246 * The BlobTransportHost populates the browser-side blob data item.
247 2. When transportation is done, we notify the BlobStorageContext.
248 5. When transportation is done, copy quota is granted, and dependent blobs are
249 complete, we finish the blob.
250 1. We perform any pending copies from dependent blobs.
251 2. We notify any listeners that the blob has been completed.
252
253 Note: The transportation sections (steps 1, 2, 3) of this process are described
254 (without accounting for blob dependencies) with diagrams and details in [this
255 presentation](https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_105).
256
257 ## BlobTransportHost
258
259 The `BlobTransportHost` is in charge of the actual transportation of the data
260 from the renderer to the browser. When the initial description of the blob is
261 sent to the browser, the BlobTransportHost asks the BlobMemoryController which
262 strategy (IPC, Shared Memory, or File) it should use to transport the blob.
263 Based on this strategy it can translate the memory items sent from the renderer
264 into a browser representation to facilitate the transportation. See [this](
265 https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_145)
266 slide, which illustrates how the browser might segment or split up the
267 renderer's memory into transportable chunks.
268
269 Once the transport host decides its strategy, it will create its own transport
270 state for the blob, including a `BlobDataBuilder` using the transport's data
271 segment representation. Then it will tell the `BlobStorageContext` that it is
272 ready to build the blob.
273
274 When the `BlobStorageContext` tells the transport host that it is ready to
275 transport the blob data, the transport host requests all of the data from the
276 renderer, populates the data in the `BlobDataBuilder`, and then signals the
277 storage context that it is done.
278
279 ## BlobStorageContext
280
281 The `BlobStorageContext` is the hub of the blob storage system. It is
282 responsible for creating & managing all the state of constructing blobs, as
283 well as all blob handle generation and general blob status access.
284
285 When a `BlobDataBuilder` is given to the context, whether from the
286 `BlobTransportHost` or from elsewhere, the context will do the following:
287
288 1. Find all dependent blobs in the new blob (any blob reference in the blob
289 item list), and create a 'slice' of their items for the new blob.
290 2. Create the final blob item list representation, which creates a new blob
291 item list which inserts these 'slice' items into the blob reference spots. This
292 is 'flattening' the blob.
293 3. Ask the `BlobMemoryController` for file or memory quota for the transportation
294 if necessary.
295 * When the quota request is granted, notify the `BlobTransportHost` to
296 begin transporting the data.
297 4. Ask the `BlobMemoryController` for memory quota for any copies necessary for
298 blob slicing.
299 5. Add completion callbacks to any blobs our blob depends on.
300
301 When all of the following conditions are met:
302
303 1. The `BlobTransportHost` tells us it has transported all the data (or we
304 don't need to transport data),
305 2. The `BlobMemoryController` approves our memory quota for slice copies (or we
306 don't need slice copies), and
307 3. All dependent blobs are completed (or we don't have dependent blobs),
308
309 the blob can finish constructing: any pending blob slice copies are
310 performed, and we set the status of the blob.
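
The three completion conditions can be read as a simple gate. The following is a minimal Python sketch of that gate, not the bookkeeping in `BlobStorageContext`: the `PendingBlob` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingBlob:
    needs_transport: bool         # does data still have to come from the renderer?
    needs_copy_quota: bool        # are slice copies required?
    pending_dependencies: int     # dependent blobs that have not completed yet
    transport_done: bool = False
    copy_quota_granted: bool = False

    def can_finish(self):
        """True once all three conditions hold (or do not apply)."""
        transport_ok = self.transport_done or not self.needs_transport
        copy_ok = self.copy_quota_granted or not self.needs_copy_quota
        dependencies_ok = self.pending_dependencies == 0
        return transport_ok and copy_ok and dependencies_ok
```

A blob with no transport, no slice copies, and no dependencies satisfies the gate immediately.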
311
312 ### BlobStatus lifecycle
313
314 The `BlobStatus` tracks the construction procedure (specifically the transport
315 process); the copy memory quota and dependent blob steps are encompassed
316 in `PENDING_INTERNALS`.
317
318 Once a blob is finished constructing, the status is set to `DONE` or any of
319 the `ERR_*` values.
320
321 ### BlobSlice
322
323 During construction, slices are created for dependent blobs using the given
324 offset and size of the reference. This slice consists of the relevant blob
325 items, and metadata about possible copies from either end. If blob items can
326 entirely be used by the new blob, then we just share the item between the two. But
327 if there is a 'slice' of the first or last item, then our resulting BlobSlice
328 representation will create a new bytes item for the new blob, and store
329 necessary copy data for later.
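
To make the share-versus-copy decision concrete, here is a minimal Python sketch. It is not the `BlobSlice` implementation: the function name and the tuple layout are hypothetical, and it assumes every source item is an in-memory bytes item described only by its size.

```python
def slice_items(item_sizes, offset, size):
    """Return (item_index, start_in_item, length, action) for each item touched.

    `action` is "share" when the whole item falls inside the slice and "copy"
    when only part of it does (which can only happen at either end).
    """
    if offset < 0 or size < 0 or offset + size > sum(item_sizes):
        raise ValueError("slice is out of range")

    result = []
    slice_end = offset + size
    position = 0
    for index, item_size in enumerate(item_sizes):
        item_start, item_end = position, position + item_size
        position = item_end
        # Overlap between this item and the requested range.
        lo = max(item_start, offset)
        hi = min(item_end, slice_end)
        if lo >= hi:
            continue
        whole_item = lo == item_start and hi == item_end
        action = "share" if whole_item else "copy"
        result.append((index, lo - item_start, hi - lo, action))
    return result
```

For a source blob with items of sizes 10, 20, and 30, slicing 40 bytes from offset 5 shares the middle item and needs copies for part of the first and last items.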
330
331 ### BlobFlattener
332
333 The `BlobFlattener` takes the new blob description (including blob references),
334 creates blob slices for all the referenced blobs, and constructs a 'flat'
335 representation of the new blob, where all blob references are replaced with the
336 `BlobSlice` items. It also stores any copy data from the slices.
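
Flattening can be sketched in the same spirit. This is a simplified Python model rather than the `BlobFlattener` class: the tuple formats, the `registry` mapping, and the function name are hypothetical, and items are described by size only.

```python
def flatten(description, registry):
    """Replace blob references with items taken from the referenced blobs.

    description: list of ("bytes", size) or ("blob", uuid, offset, size).
    registry: maps a blob uuid to the item sizes of that (already flat) blob.
    Returns (flat_items, pending_copies).
    """
    flat = []
    copies = []
    for item in description:
        if item[0] != "blob":
            flat.append(item)
            continue
        _, uuid, offset, size = item
        end = offset + size
        position = 0
        for index, item_size in enumerate(registry[uuid]):
            start, stop = position, position + item_size
            position = stop
            lo, hi = max(start, offset), min(stop, end)
            if lo >= hi:
                continue
            if lo == start and hi == stop:
                # The whole source item is used, so it can be shared.
                flat.append(("shared", uuid, index, item_size))
            else:
                # Partial item: new bytes item now, copy recorded for later.
                flat.append(("bytes", hi - lo))
                copies.append((len(flat) - 1, uuid, index, lo - start, hi - lo))
    return flat, copies
```

The returned `pending_copies` list stands in for the copy data that is performed once the dependent blobs are complete.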
337
338 ## BlobMemoryController
339
340 The `BlobMemoryController` is responsible for:
341
342 1. Determining storage quota limits for files and memory, including restricting
343 file quota when disk space is low.
344 2. Determining whether a blob can fit and the transportation strategy to use.
345 3. Tracking memory quota.
346 4. Tracking file quota and creating files.
347 5. Accumulating and evicting old blob data to files on disk.
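
As an illustration of responsibility 2, the sketch below picks one of the three strategies named in this document. It is an assumption-laden Python model, not the logic in `BlobMemoryController`: the function name, the parameters, and in particular the rule that shared memory is used for blobs that fit in memory but exceed the IPC size are hypothetical.

```python
def pick_transport_strategy(blob_size, *, max_ipc_size, memory_available,
                            disk_available):
    """Return "ipc", "shared_memory", or "file" for a blob of `blob_size` bytes.

    All thresholds are caller-supplied, hypothetical values.
    """
    fits_in_memory = blob_size <= memory_available
    if fits_in_memory and blob_size < max_ipc_size:
        return "ipc"            # small enough to send with the description
    if fits_in_memory:
        return "shared_memory"  # assumed choice for larger in-memory blobs
    if blob_size <= disk_available:
        return "file"           # too large for memory, saved straight to disk
    raise ValueError("blob does not fit in the memory or disk quota")
```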
348