Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(712)

Unified Diff: content/browser/indexed_db/leveldb_coding_scheme.md

Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Add compile asserts, reformat docs Created 4 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
Index: content/browser/indexed_db/leveldb_coding_scheme.md
diff --git a/content/browser/indexed_db/leveldb_coding_scheme.md b/content/browser/indexed_db/leveldb_coding_scheme.md
new file mode 100644
index 0000000000000000000000000000000000000000..48fbaf2ad07456afb8065867be0f3c5152d9079d
--- /dev/null
+++ b/content/browser/indexed_db/leveldb_coding_scheme.md
@@ -0,0 +1,239 @@
+# LevelDB Coding Scheme
+
+LevelDB stores key/value pairs. Keys and values are strings of bytes,
+normally of type std::string.
+
+The keys in the backing store are variable-length tuples with
+different types of fields. Each key in the backing store starts with a
+ternary prefix: <database id, object store id, index id>. For each, 0
+is reserved for metadata.
+
+> The prefix makes sure that data for a specific database, object store, and
cmumford 2016/02/11 00:48:32 Why did you decide to blockquote this paragraph?
jsbell 2016/02/11 23:26:55 The markdown formatter I was using rendered it mor
+> index are grouped together. The locality is important for performance:
+> common operations should only need a minimal number of seek operations. For
+> example, all the metadata for a database is grouped together so that
+> reading that metadata only requires one seek.
+
+Each key type has a class (in square brackets below) which knows how
jsbell 2016/02/11 23:26:55 Note definition of "square brackets" here. Made it
+to encode, decode, and compare that key type.
+
+The term "user key" refers to an Indexed DB key value specified by
+user code as opposed to the internal keys as described here.
+
+Integer literals used below (e.g. <0, 0, 0, 201, ...>) are defined as
+constants in `indexed_db_leveldb_coding.cc`
+
+> Note: In order to be compatible with LevelDB's Bloom filter each bit of the
cmumford 2016/02/11 00:48:31 Consider using this markdown: *** note **Warning:
jsbell 2016/02/11 23:26:55 Done.
+> encoded key needs to used and "not ignored" by the comparator.
+
+## Types
+
+### Primitive Types
+
+Generic types which may appear as parts of keys or values are:
+
+* **Byte** - what it says on the tin
+* **Bool** - single byte, 0 = false, otherwise true
+* **Int** - int64_t; 8 bytes in little-endian order
+* **VarInt** - int64_t >= 0; variable-width, little endian, 7 bits per
+ byte with high bit set until last
+* **String** - UTF-16BE (must be byte-swapped on x86/x64/ARM); length
+ must be implicit
+* **StringWithLength** - VarInt prefix with length in code units (i.e.
+ bytes/2), followed by String
+* **Binary** - VarInt prefix with length in bytes, followed by data
+ bytes
+* **Double** - IEEE754 64-bit (double), in *host endianness*
+
+### IDBKey
+
+First byte is the type, followed by type-specific serialization:
+
+* Number = Byte(3), Double
+* Date = Byte(2), Double
+* String = Byte(1), StringWithLength
+* Binary = Byte(6), Binary
+* Array = Byte(4), VarInt(count), IDBKey...
+
+### IDBKeyPath
+
+* Null: Byte(0), Byte(0), Byte(0)
+* String: Byte(0), Byte(0), Byte(1), StringWithLength
+* Array: Byte(0), Byte(0), Byte(2), VarInt(count), StringWithLength...
+
+If length is < 3 or the first two bytes are not 0, 0 it is treated as
+a String (backwards compatibility).
+
+## Key Prefix
+[`KeyPrefix`]
cmumford 2016/02/11 00:48:31 I'm not sure what this means - to have it in brace
jsbell 2016/02/11 23:26:55 Defined above, but made more obvious.
+
+Each key is prefixed with <database id, object store id, index id>
cmumford 2016/02/11 00:48:31 Hour <> is interpreted as a tag showing as: "Each
jsbell 2016/02/11 23:26:55 *sigh* worked in the markdown editor I was using.
+with 0 reserved for metadata.
+
+To save space, the prefix is not encoded with the usual types. The
+first byte defines the lengths of the other fields:
+
+* The top 3 bits are the length of the database id - 1 (in bytes)
+* The next 3 bits are the length of the object store id - 1(in bytes)
+* The bottom 2 bits are the length of the index id - 1 (in bytes)
+
+This is followed by:
+
+* the database id in little-endian order (1 - 8 bytes)
+* the object store id in little-endian order (1 - 8 bytes)
+* the index id in little-endian order (1 - 4 bytes)
+
+## Global metadata
+
+The prefix is <0, 0, 0>, followed by a metadata type byte:
cmumford 2016/02/11 00:48:31 These braces also interpreted as html.
jsbell 2016/02/11 23:26:55 Done.
+
+key | value
+------------------------------------|------
+<0, 0, 0, 0> | backing store schema version [`SchemaVersionKey`]
+<0, 0, 0, 1> | maximum allocated database [`MaxDatabaseIdKey`]
cmumford 2016/02/11 00:48:31 Keys in this table, and those below also interpret
jsbell 2016/02/11 23:26:55 Done.
+<0, 0, 0, 2> | SerializedScriptValue version [`DataVersionKey`]
+<0, 0, 0, 3> | Blob journal [`BlobJournalKey`]
+<0, 0, 0, 4> | Live blob journal; same format. [`LiveBlobJournalKey`]
+<0, 0, 0, 100, VarInt(database id)> | Existence implies the database id is in the free list [`DatabaseFreeListKey`] - _obsolete_
+<0, 0, 0, 201, StringWithLength(origin), StringWithLength(database name)> | Database id (Int) [`DatabaseNameKey`]
+
+
+### Blob Journal
+The format of the blob journal value is:
+
+`{database_id (var int), blobKey (var int)}*`
+
+There is no length field; just read until you run out of data.
+
+If the blobKey is `kAllBlobsKey`, the whole database should be
+deleted.
+
+## Database metadata
+[`DatabaseMetaDataKey`]
+
+The prefix is <database id, 0, 0> followed by a metadata type Byte:
+
+key | value
+-----------------------|-------
+<database id, 0, 0, 0> | origin name (String)
+<database id, 0, 0, 1> | database name (String)
+<database id, 0, 0, 2> | IDB string version data (String) - _obsolete_
+<database id, 0, 0, 3> | maximum allocated object store id (Int)
+<database id, 0, 0, 4> | IDB integer version (VarInt)
+<database id, 0, 0, 5> | blob key generator current number (VarInt)
+
+
+## More database metadata
+
+The prefix is <database id, 0, 0> followed by a type Byte. The object
+store and index id are VarInt, the names are StringWithLength.
+
+key | value
+------------------------------------------------------|-------
+<database id, 0, 0, 150, object store id> | existence implies the object store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_
+<database id, 0, 0, 151, object store id, index id> | existence implies the index id is in the free list [`IndexFreeListKey`] - _obsolete_
+<database id, 0, 0, 200, object store name> | object store id (Int) [`ObjectStoreNamesKey`]
+<database id, 0, 0, 201, object store id, index name> | index id (Int) [`IndexNamesKey`]
+
+## Object store metadata
+[`ObjectStoreMetaDataKey`]
+
+The prefix is <database id, 0, 0>, followed by a type Byte (50), then
+the object store id (VarInt), then a metadata type Byte.
+
+key | value
+--------------------------------------------|-------
+<database id, 0, 0, 50, object store id, 0> | object store name (String)
+<database id, 0, 0, 50, object store id, 1> | key path (IDBKeyPath)
+<database id, 0, 0, 50, object store id, 2> | auto increment flag (Bool)
+<database id, 0, 0, 50, object store id, 3> | is evictable (Bool) - _obsolete_
+<database id, 0, 0, 50, object store id, 4> | last _version_ number (Int)
cmumford 2016/02/11 00:48:32 I see this makes "version" italic, but no others.
jsbell 2016/02/11 23:26:55 It was originally quoted, which was weird. Agreed
+<database id, 0, 0, 50, object store id, 5> | maximum allocated index id (Int)
+<database id, 0, 0, 50, object store id, 6> | has key path flag (Bool) - _obsolete_
+<database id, 0, 0, 50, object store id, 7> | key generator current number (Int)
+
+The key path was originally just a string (#1) or null (identified by
+flag, #6). To support null, string, or array the coding is now
+identified by the leading bytes in #1 - see IDBKeyPath.
+
+The _version_ field is used to weed out stale index data. Whenever new
+object store data is inserted, it gets a new _version_ number, and new
+index data is written with this number. When the index is used for
+look-ups, entries are validated against the "exists" entries, and
+records with old _version_ numbers are deleted when they are
+encountered in `GetPrimaryKeyViaIndex`,
+`IndexCursorImpl::LoadCurrentRow` and
+`IndexKeyCursorImpl::LoadCurrentRow`.
+
+
+## Index metadata
+[`IndexMetaDataKey`]
+
+The prefix is <database id, 0, 0>, followed by a type Byte (100), then
+the object store id (VarInt), then the index id (VarInt), then a
+metadata type Byte.
+
+key | value
+-------------------------------------------------------|-------
+<database id, 0, 0, 100, object store id, index id, 0> | index name (String)
+<database id, 0, 0, 100, object store id, index id, 1> | unique flag (Bool)
+<database id, 0, 0, 100, object store id, index id, 2> | key path (IDBKeyPath)
+<database id, 0, 0, 100, object store id, index id, 3> | multi-entry flag (Bool)
+
+## Object store data
+[`ObjectStoreDataKey`]
+
+The reserved index id `1` is used in the prefix. The prefix is
+followed the encoded IDB primary key (IDBKey). The data has a
+_version_ prefix followed by the serialized script value.
+
+key | value
+----------------------------------------------------|-------
+<database id, object store id, 1, IDBKey(user key)> | _version_ (VarInt), serialized script value
+
+
+## "Exists" entry
+[`ExistsEntryKey`]
+
+The reserved index id `2` is used in the prefix. The prefix is
+followed the encoded IDB primary key (IDBKey).
+
+key | value
+----------------------------------------------------|-------
+<database id, object store id, 2, IDBKey(user key)> | _version_ (VarInt)
+
+
+## Blob entry table
+[`BlobEntryKey`]
+
+The reserved index id `3` is used in the prefix. The prefix is
+followed the encoded IDB primary key.
+
+key | value
+----------------------------------------------------|-------
+<database id, object store id, 3, IDBKey(user key)> | BlobData
+
+### BlobData
+
+ { Bool(is_file),
+ VarInt(key),
+ StringWithLength(type), // may be empty
+ /*for Blobs only*/ VarInt(size),
+ /*for Files only*/ StringWithLength(filename)
+ } *
cmumford 2016/02/11 00:48:31 errant asterisk?
jsbell 2016/02/11 23:26:55 Means "zero or more of the preceding" as in BNF/re
+
+There is no length field; just read until you run out of data.
+
+## Index data
+
+The prefix is followed by a type byte, the encoded index key (IDBKey),
+a _sequence number_ (VarInt), and the encoded primary key (IDBKey).
+
+key | value
+----|-------
+<database id, object store id, index id, IDBKey(index key), VarInt(sequence number), IDBKey(primary key)> | _version_ (VarInt), primary key (IDBKey) [`IndexDataKey`]
+
+The _sequence number_ is obsolete; it was used to allow two entries
+with the same user (index) key in non-unique indexes prior to the
+inclusion of the primary key in the data. `0` is always written now
+(which, as a VarInt, takes a single byte)

Powered by Google App Engine
This is Rietveld 408576698