Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(186)

Side by Side Diff: content/browser/indexed_db/leveldb_coding_scheme.md

Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Review feedback, more details on obsolete/compat issues Created 4 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « content/browser/indexed_db/indexed_db_leveldb_coding.cc ('k') | no next file » | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
(Empty)
1 # LevelDB Coding Scheme
2
3 LevelDB stores key/value pairs. Keys and values are strings of bytes,
4 normally of type `std::string`.
5
6 The keys in the backing store are variable-length tuples with
7 different types of fields, described here using the notation «a, b, c,
8 ...». Each key in the backing store starts with a ternary prefix:
9 «database id, object store id, index id». For each, the id `0` is
10 always reserved for metadata; other ids may be reserved as well.
11
12 *** aside
13 The prefix makes sure that data for a specific database, object store,
14 and index are grouped together. The locality is important for
15 performance: common operations should only need a minimal number of
16 seek operations. For example, all the metadata for a database is
17 grouped together so that reading that metadata only requires one seek.
18 ***
19
20 Each key type has a class - in [`square brackets`] below - which knows
21 how to encode, decode, and compare that key type.
22
23 The term "user key" refers to an Indexed DB key value specified by
24 user code as opposed to the internal keys as described here.
25
26 Integer literals used below (e.g. «0, 0, 0, 201, ...») are defined as
27 constants in `indexed_db_leveldb_coding.cc`
28
29 *** note
30 **Warning:** In order to be compatible with LevelDB's Bloom filter each
31 bit of the encoded key needs to used and "not ignored" by the
32 comparator.
33 ***
34
35 *** note
36 **Warning:** As a custom comparator is used, some code to handle obsolete
37 data is still needed as the sort order must be maintained.
38 ***
39
40 - - - -
41
42 ## Types
43
44 ### Primitive Types
45
46 Generic types which may appear as parts of keys or values are:
47
48 * **Byte** - what it says on the tin
49 * **Bool** - single byte, 0 = false, otherwise true
50 * **Int** - int64_t >= 0; 8 bytes in little-endian order
51 * **VarInt** - int64_t >= 0; variable-width, little-endian, 7 bits per
52 byte with high bit set until last
53 * **String** - UTF-16BE (must be byte-swapped on x86/x64/ARM); length
54 must be implicit
55 * **StringWithLength** - VarInt prefix with length in code units (i.e.
56 bytes/2), followed by String
57 * **Binary** - VarInt prefix with length in bytes, followed by data
58 bytes
59 * **Double** - IEEE754 64-bit (double), in *host endianness*
60
61 *** aside
62 There is a mix of network-endian, little-endian, and host-endian. In
63 particular, the String encoding requires byte-swapping. Sorry about
64 that.
65 ***
66
67 ### IDBKey (keys and values)
68
69 First byte is the type, followed by type-specific serialization:
70
71 * Null (no key): `0` (Byte) _Should not appear in data._
72 * Number: `3` (Byte), Double
73 * Date: `2` (Byte), Double
74 * String: `1` (Byte), StringWithLength
75 * Binary: `6` (Byte), Binary
76 * Array: `4` (Byte), count (VarInt), IDBKey...
77
78 ### IDBKeyPath (values)
79
80 * Null: `0` (Byte), `0` (Byte), `0` (Byte)
81 * String: `0` (Byte), `0` (Byte), `1` (Byte), StringWithLength
82 * Array: `0` (Byte), `0` (Byte), `2` (Byte), count (VarInt), StringWithLength...
83
84 *** note
85 **Compatibility:**
86 If length is < 3 or the first two bytes are not `0`, `0` whole value
87 is decoded as a String.
88 ***
89
90 ### Blob Journal (value)
91
92 Blob journals are zero-or-more instances of the structure:
93
94 ```
95 {
96 database_id (VarInt),
97 blobKey (VarInt)
98 }
99 ```
100
101 There is no length prefix; just read until you run out of data.
102
103 If the blobKey is `DatabaseMetaDataKey::kAllBlobsKey`, the whole
104 database should be deleted.
105
106 ### BlobData (value)
107
108 Blob data is zero-or-more instances of the structure:
109
110 ```
111 {
112 is_file (Bool),
113 key (VarInt),
114 type (StringWithLength), // may be empty
115 /*for Blobs only*/ size (VarInt),
116 /*for Files only*/ filename (StringWithLength)
117 }
118 ```
119
120 There is no length prefix; just read until you run out of data.
121
122 - - - -
123
124 ## Key Prefix
125 [`KeyPrefix`]
126
127 Each key is prefixed with «database id, object store id, index id»
128 with `0` reserved for metadata.
129
130 To save space, the prefix is not encoded with the usual types. The
131 first byte defines the lengths of the other fields:
132
133 * The top 3 bits are the length of the database id - 1 (in bytes)
134 * The next 3 bits are the length of the object store id - 1 (in bytes)
135 * The bottom 2 bits are the length of the index id - 1 (in bytes)
136
137 This is followed by:
138
139 * The database id in little-endian order (1 - 8 bytes)
140 * The object store id in little-endian order (1 - 8 bytes)
141 * The index id in little-endian order (1 - 4 bytes)
142
143 - - - -
144
145 ## Global metadata
146
147 The prefix is «0, 0, 0», followed by a metadata type byte:
148
149 key | value
150 ------------------------------------|------
151 «0, 0, 0, 0» | backing store schema version (Int) [`Schem aVersionKey`]
152 «0, 0, 0, 1» | maximum allocated database (Int) [`MaxData baseIdKey`]
153 «0, 0, 0, 2» | SerializedScriptValue version (Int) [`Data VersionKey`]
154 «0, 0, 0, 3» | primary BlobJournal [`BlobJournalKey`]
155 «0, 0, 0, 4» | live BlobJournal [`LiveBlobJournalKey`]
156 «0, 0, 0, 100, database id (VarInt)» | Existence implies the database id is in t he free list [`DatabaseFreeListKey`] - _obsolete_
157 «0, 0, 0, 201, origin (StringWithLength), database name (StringWithLength)» | Da tabase id (Int) [`DatabaseNameKey`]
158
159 *** aside
160 Free lists (#100) are no longer used. The ID space is assumed to be
161 sufficient.
162 ***
163
164
165 ## Database metadata
166 [`DatabaseMetaDataKey`]
167
168 The prefix is «database id, 0, 0» followed by a metadata type Byte:
169
170 key | value
171 -----------------------|-------
172 «database id, 0, 0, 0» | origin name (String)
173 «database id, 0, 0, 1» | database name (String)
174 «database id, 0, 0, 2» | IDB string version data (String) - _obsolete_
175 «database id, 0, 0, 3» | maximum allocated object store id (Int)
176 «database id, 0, 0, 4» | IDB integer version (VarInt)
177 «database id, 0, 0, 5» | blob key generator current number (VarInt)
178
179 *** aside
180 Early versions of the Indexed DB spec used strings for versions
181 (#2) instead of monotonically increasing integers.
182 ***
183
184
185 ### More database metadata
186
187 The prefix is «database id, 0, 0» followed by a type Byte. The object
188 store and index id are VarInt, the names are StringWithLength.
189
190 key | value
191 ------------------------------------------------------|-------
192 «database id, 0, 0, 150, object store id» | existence implies the ob ject store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_
193 «database id, 0, 0, 151, object store id, index id» | existence implies the in dex id is in the free list [`IndexFreeListKey`] - _obsolete_
194 «database id, 0, 0, 200, object store name» | object store id (Int) [` ObjectStoreNamesKey`] - _obsolete_
195 «database id, 0, 0, 201, object store id, index name» | index id (Int) [`IndexNa mesKey`] - _obsolete_
196
197 *** aside
198 Free lists (#150, #151) are no longer used. The ID space is assumed to
199 be sufficient.
200
201 The name-to-id mappings (#200, #201) are written but no longer read;
202 instead the mapping is inferred from the object store and index
203 metadata. _These should probably be removed._
204 ***
205
206
207 ## Object store metadata
208 [`ObjectStoreMetaDataKey`]
209
210 The prefix is «database id, 0, 0», followed by a type Byte (`50`),
211 then the object store id (VarInt), then a metadata type Byte.
212
213 key | value
214 --------------------------------------------|-------
215 «database id, 0, 0, 50, object store id, 0» | object store name (String)
216 «database id, 0, 0, 50, object store id, 1» | key path (IDBKeyPath)
217 «database id, 0, 0, 50, object store id, 2» | auto increment flag (Bool)
218 «database id, 0, 0, 50, object store id, 3» | is evictable (Bool) - _obsolete_
219 «database id, 0, 0, 50, object store id, 4» | last version number (Int)
220 «database id, 0, 0, 50, object store id, 5» | maximum allocated index id (Int)
221 «database id, 0, 0, 50, object store id, 6» | "has key path" flag (Bool) - _obso lete_
222 «database id, 0, 0, 50, object store id, 7» | key generator current number (Int)
223
224 The version field is used to weed out stale index data. Whenever new
225 object store data is inserted, it gets a new version number, and new
226 index data is written with this number. When the index is used for
227 look-ups, entries are validated against the "exists" entries, and
228 records with old version numbers are deleted when they are encountered
229 in `GetPrimaryKeyViaIndex`, `IndexCursorImpl::LoadCurrentRow` and
230 `IndexKeyCursorImpl::LoadCurrentRow`.
231
232 *** aside
233 Evictable stores (#3) were present in early iterations of the Indexed
234 DB specification.
235
236 The key path was originally just a string (#1) or null (identified by
237 flag, #6). To support null, string, or array the coding is now
238 identified by the leading bytes in #1 - see **IDBKeyPath**.
239 ***
240
241 *** note
242 **Compatibility:**
243 If #6 is not present then a String key path can be assumed.
244 If #7 is not present, the key generator state is lazily initialized
245 using the maximum numeric key present in existing data.
246 ***
247
248
249 ## Index metadata
250 [`IndexMetaDataKey`]
251
252 The prefix is «database id, 0, 0», followed by a type Byte (100), then
253 the object store id (VarInt), then the index id (VarInt), then a
254 metadata type Byte.
255
256 key | value
257 -------------------------------------------------------|-------
258 «database id, 0, 0, 100, object store id, index id, 0» | index name (String)
259 «database id, 0, 0, 100, object store id, index id, 1» | unique flag (Bool)
260 «database id, 0, 0, 100, object store id, index id, 2» | key path (IDBKeyPath)
261 «database id, 0, 0, 100, object store id, index id, 3» | multi-entry flag (Bool)
262
263 *** note
264 **Compatibility:**
265 If #3 is not present, the multi-entry flag is unset.
266 ***
267
268
269 ## Object store data
270 [`ObjectStoreDataKey`]
271
272 The reserved index id `1` is used in the prefix. The prefix is
273 followed the encoded IDB primary key (IDBKey). The data has a
274 version prefix followed by the serialized script value.
275
276 key | value
277 -----------------------------------------------------|-------
278 «database id, object store id, 1, user key (IDBKey)» | version (VarInt), seriali zed script value
279
280
281 ## "Exists" entry
282 [`ExistsEntryKey`]
283
284 The reserved index id `2` is used in the prefix. The prefix is
285 followed the encoded IDB primary key (IDBKey).
286
287 key | value
288 -----------------------------------------------------|-------
289 «database id, object store id, 2, user key (IDBKey)» | version (VarInt)
290
291
292 ## Blob entry table
293 [`BlobEntryKey`]
294
295 The reserved index id `3` is used in the prefix. The prefix is
296 followed the encoded IDB primary key.
297
298 key | value
299 -----------------------------------------------------|-------
300 «database id, object store id, 3, user key (IDBKey)» | BlobData
301
302
303 ## Index data
304 [`IndexDataKey`]
305
306 The prefix is followed by a type byte, the encoded index key (IDBKey),
307 a sequence number (VarInt), and the encoded primary key (IDBKey).
308
309 key | value
310 ----|-------
311 «database id, object store id, index id, index key (IDBKey), sequence number (Va rInt), primary key (IDBKey)» | version (VarInt), primary key (IDBKey)
312
313 The sequence number is obsolete; it was used to allow two entries with
314 the same user (index) key in non-unique indexes prior to the inclusion
315 of the primary key in the key itself. `0` is always written now
316 (which, as a VarInt, takes a single byte)
317
318 *** note
319 **Compatibility:**
320 The sequence number and primary key, or just the primary key may not
321 be present. In that case, enumerators that need the primary key must
322 access the value.
323 ***
OLDNEW
« no previous file with comments | « content/browser/indexed_db/indexed_db_leveldb_coding.cc ('k') | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698