content/browser/indexed_db/leveldb_coding_scheme.md - Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file

Side by Side Diff: content/browser/indexed_db/leveldb_coding_scheme.md

Issue 1682253004: Indexed DB: Pull leveldb coding scheme docs into a markdown file (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Review feedback, more details on obsolete/compat issues Created 4 years, 10 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch

OLD	NEW
(Empty)
	1 # LevelDB Coding Scheme

	2

	3 LevelDB stores key/value pairs. Keys and values are strings of bytes,

	4 normally of type `std::string`.

	5

	6 The keys in the backing store are variable-length tuples with

	7 different types of fields, described here using the notation «a, b, c,

	8 ...». Each key in the backing store starts with a ternary prefix:

	9 «database id, object store id, index id». For each, the id `0` is

	10 always reserved for metadata; other ids may be reserved as well.

	11

	12 *** aside

	13 The prefix makes sure that data for a specific database, object store,

	14 and index are grouped together. The locality is important for

	15 performance: common operations should only need a minimal number of

	16 seek operations. For example, all the metadata for a database is

	17 grouped together so that reading that metadata only requires one seek.

	18 ***

	19

	20 Each key type has a class - in [`square brackets`] below - which knows

	21 how to encode, decode, and compare that key type.

	22

	23 The term "user key" refers to an Indexed DB key value specified by

	24 user code as opposed to the internal keys as described here.

	25

	26 Integer literals used below (e.g. «0, 0, 0, 201, ...») are defined as

	27 constants in `indexed_db_leveldb_coding.cc`

	28

	29 *** note

	30 Warning: In order to be compatible with LevelDB's Bloom filter each

	31 bit of the encoded key needs to used and "not ignored" by the

	32 comparator.

	33 ***

	34

	35 *** note

	36 Warning: As a custom comparator is used, some code to handle obsolete

	37 data is still needed as the sort order must be maintained.

	38 ***

	39

	40 - - - -

	41

	42 ## Types

	43

	44 ### Primitive Types

	45

	46 Generic types which may appear as parts of keys or values are:

	47

	48 * Byte - what it says on the tin

	49 * Bool - single byte, 0 = false, otherwise true

	50 * Int - int64_t >= 0; 8 bytes in little-endian order

	51 * VarInt - int64_t >= 0; variable-width, little-endian, 7 bits per

	52 byte with high bit set until last

	53 * String - UTF-16BE (must be byte-swapped on x86/x64/ARM); length

	54 must be implicit

	55 * StringWithLength - VarInt prefix with length in code units (i.e.

	56 bytes/2), followed by String

	57 * Binary - VarInt prefix with length in bytes, followed by data

	58 bytes

	59 * Double - IEEE754 64-bit (double), in host endianness

	60

	61 *** aside

	62 There is a mix of network-endian, little-endian, and host-endian. In

	63 particular, the String encoding requires byte-swapping. Sorry about

	64 that.

	65 ***

	66

	67 ### IDBKey (keys and values)

	68

	69 First byte is the type, followed by type-specific serialization:

	70

	71 * Null (no key): `0` (Byte) _Should not appear in data._

	72 * Number: `3` (Byte), Double

	73 * Date: `2` (Byte), Double

	74 * String: `1` (Byte), StringWithLength

	75 * Binary: `6` (Byte), Binary

	76 * Array: `4` (Byte), count (VarInt), IDBKey...

	77

	78 ### IDBKeyPath (values)

	79

	80 * Null: `0` (Byte), `0` (Byte), `0` (Byte)

	81 * String: `0` (Byte), `0` (Byte), `1` (Byte), StringWithLength

	82 * Array: `0` (Byte), `0` (Byte), `2` (Byte), count (VarInt), StringWithLength...

	83

	84 *** note

	85 Compatibility:

	86 If length is < 3 or the first two bytes are not `0`, `0` whole value

	87 is decoded as a String.

	88 ***

	89

	90 ### Blob Journal (value)

	91

	92 Blob journals are zero-or-more instances of the structure:

	93

	94 ```

	95 {

	96 database_id (VarInt),

	97 blobKey (VarInt)

	98 }

	99 ```

	100

	101 There is no length prefix; just read until you run out of data.

	102

	103 If the blobKey is `DatabaseMetaDataKey::kAllBlobsKey`, the whole

	104 database should be deleted.

	105

	106 ### BlobData (value)

	107

	108 Blob data is zero-or-more instances of the structure:

	109

	110 ```

	111 {

	112 is_file (Bool),

	113 key (VarInt),

	114 type (StringWithLength), // may be empty

	115 /for Blobs only/ size (VarInt),

	116 /for Files only/ filename (StringWithLength)

	117 }

	118 ```

	119

	120 There is no length prefix; just read until you run out of data.

	121

	122 - - - -

	123

	124 ## Key Prefix

	125 [`KeyPrefix`]

	126

	127 Each key is prefixed with «database id, object store id, index id»

	128 with `0` reserved for metadata.

	129

	130 To save space, the prefix is not encoded with the usual types. The

	131 first byte defines the lengths of the other fields:

	132

	133 * The top 3 bits are the length of the database id - 1 (in bytes)

	134 * The next 3 bits are the length of the object store id - 1 (in bytes)

	135 * The bottom 2 bits are the length of the index id - 1 (in bytes)

	136

	137 This is followed by:

	138

	139 * The database id in little-endian order (1 - 8 bytes)

	140 * The object store id in little-endian order (1 - 8 bytes)

	141 * The index id in little-endian order (1 - 4 bytes)

	142

	143 - - - -

	144

	145 ## Global metadata

	146

	147 The prefix is «0, 0, 0», followed by a metadata type byte:

	148

	149 key \| value

	150 ------------------------------------\|------

	151 «0, 0, 0, 0» \| backing store schema version (Int) [`Schem aVersionKey`]

	152 «0, 0, 0, 1» \| maximum allocated database (Int) [`MaxData baseIdKey`]

	153 «0, 0, 0, 2» \| SerializedScriptValue version (Int) [`Data VersionKey`]

	154 «0, 0, 0, 3» \| primary BlobJournal [`BlobJournalKey`]

	155 «0, 0, 0, 4» \| live BlobJournal [`LiveBlobJournalKey`]

	156 «0, 0, 0, 100, database id (VarInt)» \| Existence implies the database id is in t he free list [`DatabaseFreeListKey`] - _obsolete_

	157 «0, 0, 0, 201, origin (StringWithLength), database name (StringWithLength)» \| Da tabase id (Int) [`DatabaseNameKey`]

	158

	159 *** aside

	160 Free lists (#100) are no longer used. The ID space is assumed to be

	161 sufficient.

	162 ***

	163

	164

	165 ## Database metadata

	166 [`DatabaseMetaDataKey`]

	167

	168 The prefix is «database id, 0, 0» followed by a metadata type Byte:

	169

	170 key \| value

	171 -----------------------\|-------

	172 «database id, 0, 0, 0» \| origin name (String)

	173 «database id, 0, 0, 1» \| database name (String)

	174 «database id, 0, 0, 2» \| IDB string version data (String) - _obsolete_

	175 «database id, 0, 0, 3» \| maximum allocated object store id (Int)

	176 «database id, 0, 0, 4» \| IDB integer version (VarInt)

	177 «database id, 0, 0, 5» \| blob key generator current number (VarInt)

	178

	179 *** aside

	180 Early versions of the Indexed DB spec used strings for versions

	181 (#2) instead of monotonically increasing integers.

	182 ***

	183

	184

	185 ### More database metadata

	186

	187 The prefix is «database id, 0, 0» followed by a type Byte. The object

	188 store and index id are VarInt, the names are StringWithLength.

	189

	190 key \| value

	191 ------------------------------------------------------\|-------

	192 «database id, 0, 0, 150, object store id» \| existence implies the ob ject store id is in the free list [`ObjectStoreFreeListKey`] - _obsolete_

	193 «database id, 0, 0, 151, object store id, index id» \| existence implies the in dex id is in the free list [`IndexFreeListKey`] - _obsolete_

	194 «database id, 0, 0, 200, object store name» \| object store id (Int) [` ObjectStoreNamesKey`] - _obsolete_

	195 «database id, 0, 0, 201, object store id, index name» \| index id (Int) [`IndexNa mesKey`] - _obsolete_

	196

	197 *** aside

	198 Free lists (#150, #151) are no longer used. The ID space is assumed to

	199 be sufficient.

	200

	201 The name-to-id mappings (#200, #201) are written but no longer read;

	202 instead the mapping is inferred from the object store and index

	203 metadata. _These should probably be removed._

	204 ***

	205

	206

	207 ## Object store metadata

	208 [`ObjectStoreMetaDataKey`]

	209

	210 The prefix is «database id, 0, 0», followed by a type Byte (`50`),

	211 then the object store id (VarInt), then a metadata type Byte.

	212

	213 key \| value

	214 --------------------------------------------\|-------

	215 «database id, 0, 0, 50, object store id, 0» \| object store name (String)

	216 «database id, 0, 0, 50, object store id, 1» \| key path (IDBKeyPath)

	217 «database id, 0, 0, 50, object store id, 2» \| auto increment flag (Bool)

	218 «database id, 0, 0, 50, object store id, 3» \| is evictable (Bool) - _obsolete_

	219 «database id, 0, 0, 50, object store id, 4» \| last version number (Int)

	220 «database id, 0, 0, 50, object store id, 5» \| maximum allocated index id (Int)

	221 «database id, 0, 0, 50, object store id, 6» \| "has key path" flag (Bool) - _obso lete_

	222 «database id, 0, 0, 50, object store id, 7» \| key generator current number (Int)

	223

	224 The version field is used to weed out stale index data. Whenever new

	225 object store data is inserted, it gets a new version number, and new

	226 index data is written with this number. When the index is used for

	227 look-ups, entries are validated against the "exists" entries, and

	228 records with old version numbers are deleted when they are encountered

	229 in `GetPrimaryKeyViaIndex`, `IndexCursorImpl::LoadCurrentRow` and

	230 `IndexKeyCursorImpl::LoadCurrentRow`.

	231

	232 *** aside

	233 Evictable stores (#3) were present in early iterations of the Indexed

	234 DB specification.

	235

	236 The key path was originally just a string (#1) or null (identified by

	237 flag, #6). To support null, string, or array the coding is now

	238 identified by the leading bytes in #1 - see IDBKeyPath.

	239 ***

	240

	241 *** note

	242 Compatibility:

	243 If #6 is not present then a String key path can be assumed.

	244 If #7 is not present, the key generator state is lazily initialized

	245 using the maximum numeric key present in existing data.

	246 ***

	247

	248

	249 ## Index metadata

	250 [`IndexMetaDataKey`]

	251

	252 The prefix is «database id, 0, 0», followed by a type Byte (100), then

	253 the object store id (VarInt), then the index id (VarInt), then a

	254 metadata type Byte.

	255

	256 key \| value

	257 -------------------------------------------------------\|-------

	258 «database id, 0, 0, 100, object store id, index id, 0» \| index name (String)

	259 «database id, 0, 0, 100, object store id, index id, 1» \| unique flag (Bool)

	260 «database id, 0, 0, 100, object store id, index id, 2» \| key path (IDBKeyPath)

	261 «database id, 0, 0, 100, object store id, index id, 3» \| multi-entry flag (Bool)

	262

	263 *** note

	264 Compatibility:

	265 If #3 is not present, the multi-entry flag is unset.

	266 ***

	267

	268

	269 ## Object store data

	270 [`ObjectStoreDataKey`]

	271

	272 The reserved index id `1` is used in the prefix. The prefix is

	273 followed the encoded IDB primary key (IDBKey). The data has a

	274 version prefix followed by the serialized script value.

	275

	276 key \| value

	277 -----------------------------------------------------\|-------

	278 «database id, object store id, 1, user key (IDBKey)» \| version (VarInt), seriali zed script value

	279

	280

	281 ## "Exists" entry

	282 [`ExistsEntryKey`]

	283

	284 The reserved index id `2` is used in the prefix. The prefix is

	285 followed the encoded IDB primary key (IDBKey).

	286

	287 key \| value

	288 -----------------------------------------------------\|-------

	289 «database id, object store id, 2, user key (IDBKey)» \| version (VarInt)

	290

	291

	292 ## Blob entry table

	293 [`BlobEntryKey`]

	294

	295 The reserved index id `3` is used in the prefix. The prefix is

	296 followed the encoded IDB primary key.

	297

	298 key \| value

	299 -----------------------------------------------------\|-------

	300 «database id, object store id, 3, user key (IDBKey)» \| BlobData

	301

	302

	303 ## Index data

	304 [`IndexDataKey`]

	305

	306 The prefix is followed by a type byte, the encoded index key (IDBKey),

	307 a sequence number (VarInt), and the encoded primary key (IDBKey).

	308

	309 key \| value

	310 ----\|-------

	311 «database id, object store id, index id, index key (IDBKey), sequence number (Va rInt), primary key (IDBKey)» \| version (VarInt), primary key (IDBKey)

	312

	313 The sequence number is obsolete; it was used to allow two entries with

	314 the same user (index) key in non-unique indexes prior to the inclusion

	315 of the primary key in the key itself. `0` is always written now

	316 (which, as a VarInt, takes a single byte)

	317

	318 *** note

	319 Compatibility:

	320 The sequence number and primary key, or just the primary key may not

	321 be present. In that case, enumerators that need the primary key must

	322 access the value.

	323 ***

OLD	NEW

« no previous file with comments | « content/browser/indexed_db/indexed_db_leveldb_coding.cc ('k') | no next file » | no next file with comments »