 Chromium Code Reviews
 Chromium Code Reviews Issue 1682253004:
  Indexed DB: Pull leveldb coding scheme docs into a markdown file  (Closed) 
  Base URL: https://chromium.googlesource.com/chromium/src.git@master
    
  
    Issue 1682253004:
  Indexed DB: Pull leveldb coding scheme docs into a markdown file  (Closed) 
  Base URL: https://chromium.googlesource.com/chromium/src.git@master| Index: content/browser/indexed_db/leveldb_coding_scheme.md | 
| diff --git a/content/browser/indexed_db/leveldb_coding_scheme.md b/content/browser/indexed_db/leveldb_coding_scheme.md | 
| new file mode 100644 | 
| index 0000000000000000000000000000000000000000..48fbaf2ad07456afb8065867be0f3c5152d9079d | 
| --- /dev/null | 
| +++ b/content/browser/indexed_db/leveldb_coding_scheme.md | 
| @@ -0,0 +1,239 @@ | 
| +# LevelDB Coding Scheme | 
| + | 
| +LevelDB stores key/value pairs. Keys and values are strings of bytes, | 
| +normally of type std::string. | 
| + | 
| +The keys in the backing store are variable-length tuples with | 
| +different types of fields. Each key in the backing store starts with a | 
| +ternary prefix: <database id, object store id, index id>. For each, 0 | 
| +is reserved for metadata. | 
| + | 
| +> The prefix makes sure that data for a specific database, object store, and | 
| 
cmumford
2016/02/11 00:48:32
Why did you decide to blockquote this paragraph?
 
jsbell
2016/02/11 23:26:55
The markdown formatter I was using rendered it mor
 | 
| +> index are grouped together. The locality is important for performance: | 
| +> common operations should only need a minimal number of seek operations. For | 
| +> example, all the metadata for a database is grouped together so that | 
| +> reading that metadata only requires one seek. | 
| + | 
| +Each key type has a class (in square brackets below) which knows how | 
| 
jsbell
2016/02/11 23:26:55
Note definition of "square brackets" here. Made it
 | 
| +to encode, decode, and compare that key type. | 
| + | 
| +The term "user key" refers to an Indexed DB key value specified by | 
| +user code as opposed to the internal keys as described here. | 
| + | 
| +Integer literals used below (e.g. <0, 0, 0, 201, ...>) are defined as | 
| +constants in `indexed_db_leveldb_coding.cc` | 
| + | 
| +> Note: In order to be compatible with LevelDB's Bloom filter each bit of the | 
| 
cmumford
2016/02/11 00:48:31
Consider using this markdown:
*** note
**Warning:
 
jsbell
2016/02/11 23:26:55
Done.
 | 
| +> encoded key needs to used and "not ignored" by the comparator. | 
| + | 
| +## Types | 
| + | 
| +### Primitive Types | 
| + | 
| +Generic types which may appear as parts of keys or values are: | 
| + | 
| +* **Byte** - what it says on the tin | 
| +* **Bool** - single byte, 0 = false, otherwise true | 
| +* **Int** - int64_t; 8 bytes in little-endian order | 
| +* **VarInt** - int64_t >= 0; variable-width, little endian, 7 bits per | 
| + byte with high bit set until last | 
| +* **String** - UTF-16BE (must be byte-swapped on x86/x64/ARM); length | 
| + must be implicit | 
| +* **StringWithLength** - VarInt prefix with length in code units (i.e. | 
| + bytes/2), followed by String | 
| +* **Binary** - VarInt prefix with length in bytes, followed by data | 
| + bytes | 
| +* **Double** - IEEE754 64-bit (double), in *host endianness* | 
| + | 
| +### IDBKey | 
| + | 
| +First byte is the type, followed by type-specific serialization: | 
| + | 
| +* Number = Byte(3), Double | 
| +* Date = Byte(2), Double | 
| +* String = Byte(1), StringWithLength | 
| +* Binary = Byte(6), Binary | 
| +* Array = Byte(4), VarInt(count), IDBKey... | 
| + | 
| +### IDBKeyPath | 
| + | 
| +* Null: Byte(0), Byte(0), Byte(0) | 
| +* String: Byte(0), Byte(0), Byte(1), StringWithLength | 
| +* Array: Byte(0), Byte(0), Byte(2), VarInt(count), StringWithLength... | 
| + | 
| +If length is < 3 or the first two bytes are not 0, 0 it is treated as | 
| +a String (backwards compatibility). | 
| + | 
| +## Key Prefix | 
| +[`KeyPrefix`] | 
| 
cmumford
2016/02/11 00:48:31
I'm not sure what this means - to have it in brace
 
jsbell
2016/02/11 23:26:55
Defined above, but made more obvious.
 | 
| + | 
| +Each key is prefixed with <database id, object store id, index id> | 
| 
cmumford
2016/02/11 00:48:31
Hour <> is interpreted as a tag showing as:
"Each
 
jsbell
2016/02/11 23:26:55
*sigh* worked in the markdown editor I was using.
 | 
| +with 0 reserved for metadata. | 
| + | 
| +To save space, the prefix is not encoded with the usual types. The | 
| +first byte defines the lengths of the other fields: | 
| + | 
| +* The top 3 bits are the length of the database id - 1 (in bytes) | 
| +* The next 3 bits are the length of the object store id - 1(in bytes) | 
| +* The bottom 2 bits are the length of the index id - 1 (in bytes) | 
| + | 
| +This is followed by: | 
| + | 
| +* the database id in little-endian order (1 - 8 bytes) | 
| +* the object store id in little-endian order (1 - 8 bytes) | 
| +* the index id in little-endian order (1 - 4 bytes) | 
| + | 
| +## Global metadata | 
| + | 
| +The prefix is <0, 0, 0>, followed by a metadata type byte: | 
| 
cmumford
2016/02/11 00:48:31
These braces also interpreted as html.
 
jsbell
2016/02/11 23:26:55
Done.
 | 
| + | 
| +key | value | 
| +------------------------------------|------ | 
| +<0, 0, 0, 0> | backing store schema version [`SchemaVersionKey`] | 
| +<0, 0, 0, 1> | maximum allocated database [`MaxDatabaseIdKey`] | 
| 
cmumford
2016/02/11 00:48:31
Keys in this table, and those below also interpret
 
jsbell
2016/02/11 23:26:55
Done.
 | 
| +<0, 0, 0, 2> | SerializedScriptValue version [`DataVersionKey`] | 
| +<0, 0, 0, 3> | Blob journal [`BlobJournalKey`] | 
| +<0, 0, 0, 4> | Live blob journal; same format. [`LiveBlobJournalKey`] | 
| +<0, 0, 0, 100, VarInt(database id)> | Existence implies the database id is in the free list [`DatabaseFreeListKey`] - _obsolete_ | 
| +<0, 0, 0, 201, StringWithLength(origin), StringWithLength(database name)> | Database id (Int) [`DatabaseNameKey`] | 
| + | 
| + | 
| +### Blob Journal | 
| +The format of the blob journal value is: | 
| + | 
| +`{database_id (var int), blobKey (var int)}*` | 
| + | 
| +There is no length field; just read until you run out of data. | 
| + | 
| +If the blobKey is `kAllBlobsKey`, the whole database should be | 
| +deleted. | 
| + | 
| +## Database metadata | 
| +[`DatabaseMetaDataKey`] | 
| + | 
| +The prefix is <database id, 0, 0> followed by a metadata type Byte: | 
| + | 
| +key | value | 
| +-----------------------|------- | 
| +<database id, 0, 0, 0> | origin name (String) | 
| +<database id, 0, 0, 1> | database name (String) | 
| +<database id, 0, 0, 2> | IDB string version data (String) - _obsolete_ | 
| +<database id, 0, 0, 3> | maximum allocated object store id (Int) | 
| +<database id, 0, 0, 4> | IDB integer version (VarInt) | 
| +<database id, 0, 0, 5> | blob key generator current number (VarInt) | 
| + | 
| + | 
| +## More database metadata | 
| + | 
| +The prefix is <database id, 0, 0> followed by a type Byte. The object | 
| +store and index id are VarInt, the names are StringWithLength. | 
| + | 
| +key | value | 
| +------------------------------------------------------|------- | 
| +<database id, 0, 0, 150, object store id> | existence implies the object store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_ | 
| +<database id, 0, 0, 151, object store id, index id> | existence implies the index id is in the free list [`IndexFreeListKey`] - _obsolete_ | 
| +<database id, 0, 0, 200, object store name> | object store id (Int) [`ObjectStoreNamesKey`] | 
| +<database id, 0, 0, 201, object store id, index name> | index id (Int) [`IndexNamesKey`] | 
| + | 
| +## Object store metadata | 
| +[`ObjectStoreMetaDataKey`] | 
| + | 
| +The prefix is <database id, 0, 0>, followed by a type Byte (50), then | 
| +the object store id (VarInt), then a metadata type Byte. | 
| + | 
| +key | value | 
| +--------------------------------------------|------- | 
| +<database id, 0, 0, 50, object store id, 0> | object store name (String) | 
| +<database id, 0, 0, 50, object store id, 1> | key path (IDBKeyPath) | 
| +<database id, 0, 0, 50, object store id, 2> | auto increment flag (Bool) | 
| +<database id, 0, 0, 50, object store id, 3> | is evictable (Bool) - _obsolete_ | 
| +<database id, 0, 0, 50, object store id, 4> | last _version_ number (Int) | 
| 
cmumford
2016/02/11 00:48:32
I see this makes "version" italic, but no others.
 
jsbell
2016/02/11 23:26:55
It was originally quoted, which was weird. Agreed
 | 
| +<database id, 0, 0, 50, object store id, 5> | maximum allocated index id (Int) | 
| +<database id, 0, 0, 50, object store id, 6> | has key path flag (Bool) - _obsolete_ | 
| +<database id, 0, 0, 50, object store id, 7> | key generator current number (Int) | 
| + | 
| +The key path was originally just a string (#1) or null (identified by | 
| +flag, #6). To support null, string, or array the coding is now | 
| +identified by the leading bytes in #1 - see IDBKeyPath. | 
| + | 
| +The _version_ field is used to weed out stale index data. Whenever new | 
| +object store data is inserted, it gets a new _version_ number, and new | 
| +index data is written with this number. When the index is used for | 
| +look-ups, entries are validated against the "exists" entries, and | 
| +records with old _version_ numbers are deleted when they are | 
| +encountered in `GetPrimaryKeyViaIndex`, | 
| +`IndexCursorImpl::LoadCurrentRow` and | 
| +`IndexKeyCursorImpl::LoadCurrentRow`. | 
| + | 
| + | 
| +## Index metadata | 
| +[`IndexMetaDataKey`] | 
| + | 
| +The prefix is <database id, 0, 0>, followed by a type Byte (100), then | 
| +the object store id (VarInt), then the index id (VarInt), then a | 
| +metadata type Byte. | 
| + | 
| +key | value | 
| +-------------------------------------------------------|------- | 
| +<database id, 0, 0, 100, object store id, index id, 0> | index name (String) | 
| +<database id, 0, 0, 100, object store id, index id, 1> | unique flag (Bool) | 
| +<database id, 0, 0, 100, object store id, index id, 2> | key path (IDBKeyPath) | 
| +<database id, 0, 0, 100, object store id, index id, 3> | multi-entry flag (Bool) | 
| + | 
| +## Object store data | 
| +[`ObjectStoreDataKey`] | 
| + | 
| +The reserved index id `1` is used in the prefix. The prefix is | 
| +followed the encoded IDB primary key (IDBKey). The data has a | 
| +_version_ prefix followed by the serialized script value. | 
| + | 
| +key | value | 
| +----------------------------------------------------|------- | 
| +<database id, object store id, 1, IDBKey(user key)> | _version_ (VarInt), serialized script value | 
| + | 
| + | 
| +## "Exists" entry | 
| +[`ExistsEntryKey`] | 
| + | 
| +The reserved index id `2` is used in the prefix. The prefix is | 
| +followed the encoded IDB primary key (IDBKey). | 
| + | 
| +key | value | 
| +----------------------------------------------------|------- | 
| +<database id, object store id, 2, IDBKey(user key)> | _version_ (VarInt) | 
| + | 
| + | 
| +## Blob entry table | 
| +[`BlobEntryKey`] | 
| + | 
| +The reserved index id `3` is used in the prefix. The prefix is | 
| +followed the encoded IDB primary key. | 
| + | 
| +key | value | 
| +----------------------------------------------------|------- | 
| +<database id, object store id, 3, IDBKey(user key)> | BlobData | 
| + | 
| +### BlobData | 
| + | 
| + { Bool(is_file), | 
| + VarInt(key), | 
| + StringWithLength(type), // may be empty | 
| + /*for Blobs only*/ VarInt(size), | 
| + /*for Files only*/ StringWithLength(filename) | 
| + } * | 
| 
cmumford
2016/02/11 00:48:31
errant asterisk?
 
jsbell
2016/02/11 23:26:55
Means "zero or more of the preceding" as in BNF/re
 | 
| + | 
| +There is no length field; just read until you run out of data. | 
| + | 
| +## Index data | 
| + | 
| +The prefix is followed by a type byte, the encoded index key (IDBKey), | 
| +a _sequence number_ (VarInt), and the encoded primary key (IDBKey). | 
| + | 
| +key | value | 
| +----|------- | 
| +<database id, object store id, index id, IDBKey(index key), VarInt(sequence number), IDBKey(primary key)> | _version_ (VarInt), primary key (IDBKey) [`IndexDataKey`] | 
| + | 
| +The _sequence number_ is obsolete; it was used to allow two entries | 
| +with the same user (index) key in non-unique indexes prior to the | 
| +inclusion of the primary key in the data. `0` is always written now | 
| +(which, as a VarInt, takes a single byte) |