| OLD | NEW |
| (Empty) | |
| 1 # LevelDB Coding Scheme |
| 2 |
| 3 LevelDB stores key/value pairs. Keys and values are strings of bytes, |
| 4 normally of type `std::string`. |
| 5 |
| 6 The keys in the backing store are variable-length tuples with |
| 7 different types of fields, described here using the notation «a, b, c, |
| 8 ...». Each key in the backing store starts with a ternary prefix: |
| 9 «database id, object store id, index id». For each, the id `0` is |
| 10 always reserved for metadata; other ids may be reserved as well. |
| 11 |
| 12 *** aside |
| 13 The prefix makes sure that data for a specific database, object store, |
| 14 and index are grouped together. The locality is important for |
| 15 performance: common operations should only need a minimal number of |
| 16 seek operations. For example, all the metadata for a database is |
| 17 grouped together so that reading that metadata only requires one seek. |
| 18 *** |
| 19 |
| 20 Each key type has a class - in [`square brackets`] below - which knows |
| 21 how to encode, decode, and compare that key type. |
| 22 |
| 23 The term "user key" refers to an Indexed DB key value specified by |
| 24 user code as opposed to the internal keys as described here. |
| 25 |
| 26 Integer literals used below (e.g. «0, 0, 0, 201, ...») are defined as |
| 27 constants in `indexed_db_leveldb_coding.cc` |
| 28 |
| 29 *** note |
| 30 **Warning:** In order to be compatible with LevelDB's Bloom filter each |
| 31 bit of the encoded key needs to used and "not ignored" by the |
| 32 comparator. |
| 33 *** |
| 34 |
| 35 *** note |
| 36 **Warning:** As a custom comparator is used, some code to handle obsolete |
| 37 data is still needed as the sort order must be maintained. |
| 38 *** |
| 39 |
| 40 - - - - |
| 41 |
| 42 ## Types |
| 43 |
| 44 ### Primitive Types |
| 45 |
| 46 Generic types which may appear as parts of keys or values are: |
| 47 |
| 48 * **Byte** - what it says on the tin |
| 49 * **Bool** - single byte, 0 = false, otherwise true |
| 50 * **Int** - int64_t >= 0; 8 bytes in little-endian order |
| 51 * **VarInt** - int64_t >= 0; variable-width, little-endian, 7 bits per |
| 52 byte with high bit set until last |
| 53 * **String** - UTF-16BE (must be byte-swapped on x86/x64/ARM); length |
| 54 must be implicit |
| 55 * **StringWithLength** - VarInt prefix with length in code units (i.e. |
| 56 bytes/2), followed by String |
| 57 * **Binary** - VarInt prefix with length in bytes, followed by data |
| 58 bytes |
| 59 * **Double** - IEEE754 64-bit (double), in *host endianness* |
| 60 |
| 61 *** aside |
| 62 There is a mix of network-endian, little-endian, and host-endian. In |
| 63 particular, the String encoding requires byte-swapping. Sorry about |
| 64 that. |
| 65 *** |
| 66 |
| 67 ### IDBKey (keys and values) |
| 68 |
| 69 First byte is the type, followed by type-specific serialization: |
| 70 |
| 71 * Null (no key): `0` (Byte) _Should not appear in data._ |
| 72 * Number: `3` (Byte), Double |
| 73 * Date: `2` (Byte), Double |
| 74 * String: `1` (Byte), StringWithLength |
| 75 * Binary: `6` (Byte), Binary |
| 76 * Array: `4` (Byte), count (VarInt), IDBKey... |
| 77 |
| 78 ### IDBKeyPath (values) |
| 79 |
| 80 * Null: `0` (Byte), `0` (Byte), `0` (Byte) |
| 81 * String: `0` (Byte), `0` (Byte), `1` (Byte), StringWithLength |
| 82 * Array: `0` (Byte), `0` (Byte), `2` (Byte), count (VarInt), StringWithLength... |
| 83 |
| 84 *** note |
| 85 **Compatibility:** |
| 86 If length is < 3 or the first two bytes are not `0`, `0` whole value |
| 87 is decoded as a String. |
| 88 *** |
| 89 |
| 90 ### Blob Journal (value) |
| 91 |
| 92 Blob journals are zero-or-more instances of the structure: |
| 93 |
| 94 ``` |
| 95 { |
| 96 database_id (VarInt), |
| 97 blobKey (VarInt) |
| 98 } |
| 99 ``` |
| 100 |
| 101 There is no length prefix; just read until you run out of data. |
| 102 |
| 103 If the blobKey is `DatabaseMetaDataKey::kAllBlobsKey`, the whole |
| 104 database should be deleted. |
| 105 |
| 106 ### BlobData (value) |
| 107 |
| 108 Blob data is zero-or-more instances of the structure: |
| 109 |
| 110 ``` |
| 111 { |
| 112 is_file (Bool), |
| 113 key (VarInt), |
| 114 type (StringWithLength), // may be empty |
| 115 /*for Blobs only*/ size (VarInt), |
| 116 /*for Files only*/ filename (StringWithLength) |
| 117 } |
| 118 ``` |
| 119 |
| 120 There is no length prefix; just read until you run out of data. |
| 121 |
| 122 - - - - |
| 123 |
| 124 ## Key Prefix |
| 125 [`KeyPrefix`] |
| 126 |
| 127 Each key is prefixed with «database id, object store id, index id» |
| 128 with `0` reserved for metadata. |
| 129 |
| 130 To save space, the prefix is not encoded with the usual types. The |
| 131 first byte defines the lengths of the other fields: |
| 132 |
| 133 * The top 3 bits are the length of the database id - 1 (in bytes) |
| 134 * The next 3 bits are the length of the object store id - 1 (in bytes) |
| 135 * The bottom 2 bits are the length of the index id - 1 (in bytes) |
| 136 |
| 137 This is followed by: |
| 138 |
| 139 * The database id in little-endian order (1 - 8 bytes) |
| 140 * The object store id in little-endian order (1 - 8 bytes) |
| 141 * The index id in little-endian order (1 - 4 bytes) |
| 142 |
| 143 - - - - |
| 144 |
| 145 ## Global metadata |
| 146 |
| 147 The prefix is «0, 0, 0», followed by a metadata type byte: |
| 148 |
| 149 key | value |
| 150 ------------------------------------|------ |
| 151 «0, 0, 0, 0» | backing store schema version (Int) [`Schem
aVersionKey`] |
| 152 «0, 0, 0, 1» | maximum allocated database (Int) [`MaxData
baseIdKey`] |
| 153 «0, 0, 0, 2» | SerializedScriptValue version (Int) [`Data
VersionKey`] |
| 154 «0, 0, 0, 3» | primary BlobJournal [`BlobJournalKey`] |
| 155 «0, 0, 0, 4» | live BlobJournal [`LiveBlobJournalKey`] |
| 156 «0, 0, 0, 100, database id (VarInt)» | Existence implies the database id is in t
he free list [`DatabaseFreeListKey`] - _obsolete_ |
| 157 «0, 0, 0, 201, origin (StringWithLength), database name (StringWithLength)» | Da
tabase id (Int) [`DatabaseNameKey`] |
| 158 |
| 159 *** aside |
| 160 Free lists (#100) are no longer used. The ID space is assumed to be |
| 161 sufficient. |
| 162 *** |
| 163 |
| 164 |
| 165 ## Database metadata |
| 166 [`DatabaseMetaDataKey`] |
| 167 |
| 168 The prefix is «database id, 0, 0» followed by a metadata type Byte: |
| 169 |
| 170 key | value |
| 171 -----------------------|------- |
| 172 «database id, 0, 0, 0» | origin name (String) |
| 173 «database id, 0, 0, 1» | database name (String) |
| 174 «database id, 0, 0, 2» | IDB string version data (String) - _obsolete_ |
| 175 «database id, 0, 0, 3» | maximum allocated object store id (Int) |
| 176 «database id, 0, 0, 4» | IDB integer version (VarInt) |
| 177 «database id, 0, 0, 5» | blob key generator current number (VarInt) |
| 178 |
| 179 *** aside |
| 180 Early versions of the Indexed DB spec used strings for versions |
| 181 (#2) instead of monotonically increasing integers. |
| 182 *** |
| 183 |
| 184 |
| 185 ### More database metadata |
| 186 |
| 187 The prefix is «database id, 0, 0» followed by a type Byte. The object |
| 188 store and index id are VarInt, the names are StringWithLength. |
| 189 |
| 190 key | value |
| 191 ------------------------------------------------------|------- |
| 192 «database id, 0, 0, 150, object store id» | existence implies the ob
ject store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_ |
| 193 «database id, 0, 0, 151, object store id, index id» | existence implies the in
dex id is in the free list [`IndexFreeListKey`] - _obsolete_ |
| 194 «database id, 0, 0, 200, object store name» | object store id (Int) [`
ObjectStoreNamesKey`] - _obsolete_ |
| 195 «database id, 0, 0, 201, object store id, index name» | index id (Int) [`IndexNa
mesKey`] - _obsolete_ |
| 196 |
| 197 *** aside |
| 198 Free lists (#150, #151) are no longer used. The ID space is assumed to |
| 199 be sufficient. |
| 200 |
| 201 The name-to-id mappings (#200, #201) are written but no longer read; |
| 202 instead the mapping is inferred from the object store and index |
| 203 metadata. _These should probably be removed._ |
| 204 *** |
| 205 |
| 206 |
| 207 ## Object store metadata |
| 208 [`ObjectStoreMetaDataKey`] |
| 209 |
| 210 The prefix is «database id, 0, 0», followed by a type Byte (`50`), |
| 211 then the object store id (VarInt), then a metadata type Byte. |
| 212 |
| 213 key | value |
| 214 --------------------------------------------|------- |
| 215 «database id, 0, 0, 50, object store id, 0» | object store name (String) |
| 216 «database id, 0, 0, 50, object store id, 1» | key path (IDBKeyPath) |
| 217 «database id, 0, 0, 50, object store id, 2» | auto increment flag (Bool) |
| 218 «database id, 0, 0, 50, object store id, 3» | is evictable (Bool) - _obsolete_ |
| 219 «database id, 0, 0, 50, object store id, 4» | last version number (Int) |
| 220 «database id, 0, 0, 50, object store id, 5» | maximum allocated index id (Int) |
| 221 «database id, 0, 0, 50, object store id, 6» | "has key path" flag (Bool) - _obso
lete_ |
| 222 «database id, 0, 0, 50, object store id, 7» | key generator current number (Int) |
| 223 |
| 224 The version field is used to weed out stale index data. Whenever new |
| 225 object store data is inserted, it gets a new version number, and new |
| 226 index data is written with this number. When the index is used for |
| 227 look-ups, entries are validated against the "exists" entries, and |
| 228 records with old version numbers are deleted when they are encountered |
| 229 in `GetPrimaryKeyViaIndex`, `IndexCursorImpl::LoadCurrentRow` and |
| 230 `IndexKeyCursorImpl::LoadCurrentRow`. |
| 231 |
| 232 *** aside |
| 233 Evictable stores (#3) were present in early iterations of the Indexed |
| 234 DB specification. |
| 235 |
| 236 The key path was originally just a string (#1) or null (identified by |
| 237 flag, #6). To support null, string, or array the coding is now |
| 238 identified by the leading bytes in #1 - see **IDBKeyPath**. |
| 239 *** |
| 240 |
| 241 *** note |
| 242 **Compatibility:** |
| 243 If #6 is not present then a String key path can be assumed. |
| 244 If #7 is not present, the key generator state is lazily initialized |
| 245 using the maximum numeric key present in existing data. |
| 246 *** |
| 247 |
| 248 |
| 249 ## Index metadata |
| 250 [`IndexMetaDataKey`] |
| 251 |
| 252 The prefix is «database id, 0, 0», followed by a type Byte (100), then |
| 253 the object store id (VarInt), then the index id (VarInt), then a |
| 254 metadata type Byte. |
| 255 |
| 256 key | value |
| 257 -------------------------------------------------------|------- |
| 258 «database id, 0, 0, 100, object store id, index id, 0» | index name (String) |
| 259 «database id, 0, 0, 100, object store id, index id, 1» | unique flag (Bool) |
| 260 «database id, 0, 0, 100, object store id, index id, 2» | key path (IDBKeyPath) |
| 261 «database id, 0, 0, 100, object store id, index id, 3» | multi-entry flag (Bool) |
| 262 |
| 263 *** note |
| 264 **Compatibility:** |
| 265 If #3 is not present, the multi-entry flag is unset. |
| 266 *** |
| 267 |
| 268 |
| 269 ## Object store data |
| 270 [`ObjectStoreDataKey`] |
| 271 |
| 272 The reserved index id `1` is used in the prefix. The prefix is |
| 273 followed the encoded IDB primary key (IDBKey). The data has a |
| 274 version prefix followed by the serialized script value. |
| 275 |
| 276 key | value |
| 277 -----------------------------------------------------|------- |
| 278 «database id, object store id, 1, user key (IDBKey)» | version (VarInt), seriali
zed script value |
| 279 |
| 280 |
| 281 ## "Exists" entry |
| 282 [`ExistsEntryKey`] |
| 283 |
| 284 The reserved index id `2` is used in the prefix. The prefix is |
| 285 followed the encoded IDB primary key (IDBKey). |
| 286 |
| 287 key | value |
| 288 -----------------------------------------------------|------- |
| 289 «database id, object store id, 2, user key (IDBKey)» | version (VarInt) |
| 290 |
| 291 |
| 292 ## Blob entry table |
| 293 [`BlobEntryKey`] |
| 294 |
| 295 The reserved index id `3` is used in the prefix. The prefix is |
| 296 followed the encoded IDB primary key. |
| 297 |
| 298 key | value |
| 299 -----------------------------------------------------|------- |
| 300 «database id, object store id, 3, user key (IDBKey)» | BlobData |
| 301 |
| 302 |
| 303 ## Index data |
| 304 [`IndexDataKey`] |
| 305 |
| 306 The prefix is followed by a type byte, the encoded index key (IDBKey), |
| 307 a sequence number (VarInt), and the encoded primary key (IDBKey). |
| 308 |
| 309 key | value |
| 310 ----|------- |
| 311 «database id, object store id, index id, index key (IDBKey), sequence number (Va
rInt), primary key (IDBKey)» | version (VarInt), primary key (IDBKey) |
| 312 |
| 313 The sequence number is obsolete; it was used to allow two entries with |
| 314 the same user (index) key in non-unique indexes prior to the inclusion |
| 315 of the primary key in the key itself. `0` is always written now |
| 316 (which, as a VarInt, takes a single byte) |
| 317 |
| 318 *** note |
| 319 **Compatibility:** |
| 320 The sequence number and primary key, or just the primary key may not |
| 321 be present. In that case, enumerators that need the primary key must |
| 322 access the value. |
| 323 *** |
| OLD | NEW |