content/browser/indexed_db/leveldb_coding_scheme.md - Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file

Unified Diff: content/browser/indexed_db/leveldb_coding_scheme.md

Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Add compile asserts, reformat docs Created 4 years, 10 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: content/browser/indexed_db/leveldb_coding_scheme.md

diff --git a/content/browser/indexed_db/leveldb_coding_scheme.md b/content/browser/indexed_db/leveldb_coding_scheme.md

new file mode 100644

index 0000000000000000000000000000000000000000..48fbaf2ad07456afb8065867be0f3c5152d9079d

--- /dev/null

+++ b/content/browser/indexed_db/leveldb_coding_scheme.md

@@ -0,0 +1,239 @@

+# LevelDB Coding Scheme

+LevelDB stores key/value pairs. Keys and values are strings of bytes,

+normally of type std::string.

+The keys in the backing store are variable-length tuples with

+different types of fields. Each key in the backing store starts with a

+ternary prefix: <database id, object store id, index id>. For each, 0

+is reserved for metadata.

+> The prefix makes sure that data for a specific database, object store, and

cmumford 2016/02/11 00:48:32 Why did you decide to blockquote this paragraph?

jsbell 2016/02/11 23:26:55 The markdown formatter I was using rendered it mor

+> index are grouped together. The locality is important for performance:

+> common operations should only need a minimal number of seek operations. For

+> example, all the metadata for a database is grouped together so that

+> reading that metadata only requires one seek.

+Each key type has a class (in square brackets below) which knows how

jsbell 2016/02/11 23:26:55 Note definition of "square brackets" here. Made it

+to encode, decode, and compare that key type.

+The term "user key" refers to an Indexed DB key value specified by

+user code as opposed to the internal keys as described here.

+Integer literals used below (e.g. <0, 0, 0, 201, ...>) are defined as

+constants in `indexed_db_leveldb_coding.cc`

+> Note: In order to be compatible with LevelDB's Bloom filter each bit of the

cmumford 2016/02/11 00:48:31 Consider using this markdown: *** note **Warning:

jsbell 2016/02/11 23:26:55 Done.

+> encoded key needs to used and "not ignored" by the comparator.

+## Types

+### Primitive Types

+Generic types which may appear as parts of keys or values are:

+* **Byte** - what it says on the tin

+* **Bool** - single byte, 0 = false, otherwise true

+* **Int** - int64_t; 8 bytes in little-endian order

+* **VarInt** - int64_t >= 0; variable-width, little endian, 7 bits per

+ byte with high bit set until last

+* **String** - UTF-16BE (must be byte-swapped on x86/x64/ARM); length

+ must be implicit

+* **StringWithLength** - VarInt prefix with length in code units (i.e.

+ bytes/2), followed by String

+* **Binary** - VarInt prefix with length in bytes, followed by data

+ bytes

+* **Double** - IEEE754 64-bit (double), in *host endianness*

+### IDBKey

+First byte is the type, followed by type-specific serialization:

+* Number = Byte(3), Double

+* Date = Byte(2), Double

+* String = Byte(1), StringWithLength

+* Binary = Byte(6), Binary

+* Array = Byte(4), VarInt(count), IDBKey...

+### IDBKeyPath

+* Null: Byte(0), Byte(0), Byte(0)

+* String: Byte(0), Byte(0), Byte(1), StringWithLength

+* Array: Byte(0), Byte(0), Byte(2), VarInt(count), StringWithLength...

+If length is < 3 or the first two bytes are not 0, 0 it is treated as

+a String (backwards compatibility).

+## Key Prefix

+[`KeyPrefix`]

cmumford 2016/02/11 00:48:31 I'm not sure what this means - to have it in brace

jsbell 2016/02/11 23:26:55 Defined above, but made more obvious.

+Each key is prefixed with <database id, object store id, index id>

cmumford 2016/02/11 00:48:31 Hour <> is interpreted as a tag showing as: "Each

jsbell 2016/02/11 23:26:55 *sigh* worked in the markdown editor I was using.

+with 0 reserved for metadata.

+To save space, the prefix is not encoded with the usual types. The

+first byte defines the lengths of the other fields:

+* The top 3 bits are the length of the database id - 1 (in bytes)

+* The next 3 bits are the length of the object store id - 1(in bytes)

+* The bottom 2 bits are the length of the index id - 1 (in bytes)

+This is followed by:

+* the database id in little-endian order (1 - 8 bytes)

+* the object store id in little-endian order (1 - 8 bytes)

+* the index id in little-endian order (1 - 4 bytes)

+## Global metadata

+The prefix is <0, 0, 0>, followed by a metadata type byte:

cmumford 2016/02/11 00:48:31 These braces also interpreted as html.

jsbell 2016/02/11 23:26:55 Done.

+key | value

+------------------------------------|------

+<0, 0, 0, 0> | backing store schema version [`SchemaVersionKey`]

+<0, 0, 0, 1> | maximum allocated database [`MaxDatabaseIdKey`]

cmumford 2016/02/11 00:48:31 Keys in this table, and those below also interpret

jsbell 2016/02/11 23:26:55 Done.

+<0, 0, 0, 2> | SerializedScriptValue version [`DataVersionKey`]

+<0, 0, 0, 3> | Blob journal [`BlobJournalKey`]

+<0, 0, 0, 4> | Live blob journal; same format. [`LiveBlobJournalKey`]

+<0, 0, 0, 100, VarInt(database id)> | Existence implies the database id is in the free list [`DatabaseFreeListKey`] - _obsolete_

+<0, 0, 0, 201, StringWithLength(origin), StringWithLength(database name)> | Database id (Int) [`DatabaseNameKey`]

+### Blob Journal

+The format of the blob journal value is:

+`{database_id (var int), blobKey (var int)}*`

+There is no length field; just read until you run out of data.

+If the blobKey is `kAllBlobsKey`, the whole database should be

+deleted.

+## Database metadata

+[`DatabaseMetaDataKey`]

+The prefix is <database id, 0, 0> followed by a metadata type Byte:

+key | value

+-----------------------|-------

+<database id, 0, 0, 0> | origin name (String)

+<database id, 0, 0, 1> | database name (String)

+<database id, 0, 0, 2> | IDB string version data (String) - _obsolete_

+<database id, 0, 0, 3> | maximum allocated object store id (Int)

+<database id, 0, 0, 4> | IDB integer version (VarInt)

+<database id, 0, 0, 5> | blob key generator current number (VarInt)

+## More database metadata

+The prefix is <database id, 0, 0> followed by a type Byte. The object

+store and index id are VarInt, the names are StringWithLength.

+key | value

+------------------------------------------------------|-------

+<database id, 0, 0, 150, object store id> | existence implies the object store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_

+<database id, 0, 0, 151, object store id, index id> | existence implies the index id is in the free list [`IndexFreeListKey`] - _obsolete_

+<database id, 0, 0, 200, object store name> | object store id (Int) [`ObjectStoreNamesKey`]

+<database id, 0, 0, 201, object store id, index name> | index id (Int) [`IndexNamesKey`]

+## Object store metadata

+[`ObjectStoreMetaDataKey`]

+The prefix is <database id, 0, 0>, followed by a type Byte (50), then

+the object store id (VarInt), then a metadata type Byte.

+key | value

+--------------------------------------------|-------

+<database id, 0, 0, 50, object store id, 0> | object store name (String)

+<database id, 0, 0, 50, object store id, 1> | key path (IDBKeyPath)

+<database id, 0, 0, 50, object store id, 2> | auto increment flag (Bool)

+<database id, 0, 0, 50, object store id, 3> | is evictable (Bool) - _obsolete_

+<database id, 0, 0, 50, object store id, 4> | last _version_ number (Int)

cmumford 2016/02/11 00:48:32 I see this makes "version" italic, but no others.

jsbell 2016/02/11 23:26:55 It was originally quoted, which was weird. Agreed

+<database id, 0, 0, 50, object store id, 5> | maximum allocated index id (Int)

+<database id, 0, 0, 50, object store id, 6> | has key path flag (Bool) - _obsolete_

+<database id, 0, 0, 50, object store id, 7> | key generator current number (Int)

+The key path was originally just a string (#1) or null (identified by

+flag, #6). To support null, string, or array the coding is now

+identified by the leading bytes in #1 - see IDBKeyPath.

+The _version_ field is used to weed out stale index data. Whenever new

+object store data is inserted, it gets a new _version_ number, and new

+index data is written with this number. When the index is used for

+look-ups, entries are validated against the "exists" entries, and

+records with old _version_ numbers are deleted when they are

+encountered in `GetPrimaryKeyViaIndex`,

+`IndexCursorImpl::LoadCurrentRow` and

+`IndexKeyCursorImpl::LoadCurrentRow`.

+## Index metadata

+[`IndexMetaDataKey`]

+The prefix is <database id, 0, 0>, followed by a type Byte (100), then

+the object store id (VarInt), then the index id (VarInt), then a

+metadata type Byte.

+key | value

+-------------------------------------------------------|-------

+<database id, 0, 0, 100, object store id, index id, 0> | index name (String)

+<database id, 0, 0, 100, object store id, index id, 1> | unique flag (Bool)

+<database id, 0, 0, 100, object store id, index id, 2> | key path (IDBKeyPath)

+<database id, 0, 0, 100, object store id, index id, 3> | multi-entry flag (Bool)

+## Object store data

+[`ObjectStoreDataKey`]

+The reserved index id `1` is used in the prefix. The prefix is

+followed the encoded IDB primary key (IDBKey). The data has a

+_version_ prefix followed by the serialized script value.

+key | value

+----------------------------------------------------|-------

+<database id, object store id, 1, IDBKey(user key)> | _version_ (VarInt), serialized script value

+## "Exists" entry

+[`ExistsEntryKey`]

+The reserved index id `2` is used in the prefix. The prefix is

+followed the encoded IDB primary key (IDBKey).

+key | value

+----------------------------------------------------|-------

+<database id, object store id, 2, IDBKey(user key)> | _version_ (VarInt)

+## Blob entry table

+[`BlobEntryKey`]

+The reserved index id `3` is used in the prefix. The prefix is

+followed the encoded IDB primary key.

+key | value

+----------------------------------------------------|-------

+<database id, object store id, 3, IDBKey(user key)> | BlobData

+### BlobData

+ { Bool(is_file),

+ VarInt(key),

+ StringWithLength(type), // may be empty

+ /*for Blobs only*/ VarInt(size),

+ /*for Files only*/ StringWithLength(filename)

+ } *

cmumford 2016/02/11 00:48:31 errant asterisk?

jsbell 2016/02/11 23:26:55 Means "zero or more of the preceding" as in BNF/re

+There is no length field; just read until you run out of data.

+## Index data

+The prefix is followed by a type byte, the encoded index key (IDBKey),

+a _sequence number_ (VarInt), and the encoded primary key (IDBKey).

+key | value

+----|-------

+<database id, object store id, index id, IDBKey(index key), VarInt(sequence number), IDBKey(primary key)> | _version_ (VarInt), primary key (IDBKey) [`IndexDataKey`]

+The _sequence number_ is obsolete; it was used to allow two entries

+with the same user (index) key in non-unique indexes prior to the

+inclusion of the primary key in the data. `0` is always written now

+(which, as a VarInt, takes a single byte)

« content/browser/indexed_db/indexed_db_leveldb_coding.cc ('K') | « content/browser/indexed_db/indexed_db_leveldb_coding.cc ('k') | no next file » | no next file with comments »