Unified Diff: go/tracedb/DESIGN.md

Issue 1411663004: Create gRPC client and server, traceservice, that stores trace data in a BoltDB backend. (Closed) Base URL: https://skia.googlesource.com/buildbot@master
Patch Set: clean Created 5 years, 2 months ago
Index: go/tracedb/DESIGN.md
diff --git a/go/tracedb/DESIGN.md b/go/tracedb/DESIGN.md
index e754adeb3590c0c3f8be39d43d8514a575c47bba..6ac847c060ce9c182cf3578f689f01b03172943f 100644
--- a/go/tracedb/DESIGN.md
+++ b/go/tracedb/DESIGN.md
@@ -2,7 +2,7 @@ tracedb
=======
The tracedb package is designed to replace the current storage system for
-traces, tiles, with a new BoltDB backend that allows for much more flexibility
+traces, tiles, with a new backend that allows for much more flexibility
and an increase in the size of data that can be stored. The new system needs
to support both branches and trybots (note that in the future there may be no
difference between the two), while still supporting the current capabilities
@@ -59,77 +59,68 @@ In the following list you may substitute 'branch' for 'trybot'.
Assumptions
===========
-1. We will use queries to the BoltDB to build in-memory Tiles.
+1. We will use queries to the interface to build in-memory Tiles.
2. We can extract a timestamp from Reitveld for each patch.
Design
======
-To actually handle this in BoltDB we will need to create two buckets, one
-for the per-commit values in each trace, and another for the trace-level
-information, such as the params for each trace.
+The design has two layers: tracedb.DB, the Go interface for talking to the
+data store, and two concrete implementations of that interface. The first
+implementation is the gRPC-based server backed by BoltDB, and the second is
+Cloud BigTable.
-commit bucket
--------------
-
-The keys for the commit bucket are structured as:
-
- [timestamp]:[git hash]:[branch name]:[trace_key]
-
-and the keys map to a single value []byte, that is either the Gold digest or
-the Perf float64 measurement value.
-
-Note that to search through a time range for a specific branch name we'll need
-to do the filtering inside the closure we pass to BoltDB.
-
-trace bucket
-------------
-The keys for the trace bucket are just the trace keys.
+            +-------------+
+            | tracedb.DB  |
+            | interface   |
+            +-------------+
+                   |
+       +-----------+-----------+
+       |                       |
+       |                       |
+  +------v------+        +-------v------+
+  | gRPC Server |        |              |
+  | BoltDB      |        | GCE BigTable |
+  +-------------+        +--------------+
- [trace_key]
-The values are structs serialized as JSON that contain the params for each
-trace. We are using JSON over GOB since these are relatively small structs.
+tracedb.DB Interface
+--------------------
-Interface
----------
-
-The interface to tracedb looks like:
+This is the Go interface to the storage for traces:
// DB represents the interface to any datastore for perf and gold results.
//
// Notes:
- // 1. If 'sources' is an empty slice it will match all sources.
- // 2. The Commits in the Tile will only contain the commit id and
+ // 1. The Commits in the Tile will only contain the commit id and
// the timestamp, the Author will not be populated.
- // 3. The Tile's Scale and TileIndex will be set to 0.
+ // 2. The Tile's Scale and TileIndex will be set to 0.
//
type DB interface {
- // Add new information to the datastore.
- //
- // source - Either a branch name or a Rietveld issue id.
- // values - maps the trace id to a DBEntry.
- //
- // Note that only allowing adding data for a single commit at a time
- // should work well with ingestion while still breaking up writes into
- // shorter actions.
- Add(commitID *CommitID, source string, values map[string]*DBEntry) error
-
- // Create a Tile based on the given query parameters.
- //
- // If 'sources' is an empty slice it will match all sources.
- //
- // Note that the Commits in the Tile will only contain the commit id and
- // the timestamp, the Author will not be populated.
- TileFromRangeAndSources(begin, end time.Time, sources []string) (*tiling.Tile, error)
-
- // Create a Tile for the given commit ids. Commits should be provided in
- // time order.
- //
- // Note that the Commits in the Tile will only contain the commit id and
- // the timestamp, the Author will not be populated.
- TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error)
+ // Add new information to the datastore.
+ //
+ // values maps a trace id to an Entry.
+ //
+ // Note that only allowing adding data for a single commit at a time
+ // should work well with ingestion while still breaking up writes into
+ // shorter actions.
+ Add(commitID *CommitID, values map[string]*Entry) error
+
+ // Remove the given commit from the datastore.
+ Remove(commitID *CommitID) error
+
+ // List returns all the CommitIDs between begin and end.
+ List(begin, end time.Time) ([]*CommitID, error)
+
+ // Create a Tile for the given commit ids. Will build the Tile using the
+ // commits in the order they are provided.
+ //
+ // Note that the Commits in the Tile will only contain the commit id and
+ // the timestamp, the Author will not be populated.
+ TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error)
+
+ // Close the datastore.
+ Close() error
stephana 2015/10/19 15:03:49 There is no way to enumerate the CommitIDs current
jcgregorio 2015/10/19 15:11:44 To do that simply call: List(time.Time{}, time.
}
The above interface depends on the CommitID struct, which is:
@@ -138,17 +129,14 @@ The above interface depends on the CommitID struct, which is:
// a real commit into the repo, or an event like running a trybot.
type CommitID struct {
Timestamp time.Time
- ID string // Normally a git hash, but could also be Rietveld issue id + patch id.
- }
-
- func (c *CommitID) String() string {
- return fmt.Sprintf("%s%s", c.Timestamp.Format(time.RFC3339), c.ID)
+ ID string // Normally a git hash, but could also be a Rietveld patch id.
+ Source string // The branch name, e.g. "master", or the Rietveld issue id.
}
stephana 2015/10/19 15:03:49 typo: Rietveld I don't see a simple way to enume
jcgregorio 2015/10/19 15:11:44 Use List() with a beginning and ending time that y
stephana 2015/10/19 15:22:40 That means I have to load the equivalent of a curr
jcgregorio 2015/10/19 20:30:11 Fixed Typo.
-And DBEntry, which is:
+And Entry, which is:
- // DBEntry holds the params and a value for single measurement.
- type DBEntry struct {
+ // Entry holds the params and a value for single measurement.
+ type Entry struct {
Params map[string]string
// Value is the value of the measurement.
@@ -166,15 +154,151 @@ Note that this will require adding a new method to the Trace interface:
// Each specialization will convert []byte to the correct type.
SetAt(index int, value []byte) error
+
+BoltDB Implementation
+=====================
+
+For local testing the Go interface above will be implemented in terms of the
+gRPC interface defined below with a BoltDB store. I.e. there will be a
+standalone server that implements the following gRPC interface.
+
+The gRPC interface is similar to the Go interface, with Add and List operating
+exactly the same. The only difference is in retrieving data: TileFromCommits
+is broken down into two different calls, GetValues and GetParams, from which
+the caller can build a Tile.
+
+ // TraceDB stores trace information for both Gold and Perf.
+ service TraceDB {
+ // Returns a list of traceids that don't have Params stored in the datastore.
+ rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {}
+
+ // Adds Params for a set of traceids.
+ rpc AddParams(AddParamsRequest) returns (EmptyResponse) {}
+
+ // Adds data for a set of traces for a particular commitid.
+ rpc Add(AddRequest) returns (AddResponse) {}
+
+ // Removes data for a particular commitid.
+ rpc Remove(RemoveRequest) returns (EmptyResponse) {}
+
+ // List returns all the CommitIDs that exist in the given time range.
+ rpc List(ListRequest) returns (ListResponse) {}
+
+ // GetValues returns all the trace values stored for the given CommitID.
+ rpc GetValues(GetValuesRequest) returns (GetValuesResponse) {}
+
+ // GetParams returns the Params for all of the given traces.
+ rpc GetParams(GetParamsRequest) returns (GetParamsResponse) {}
+ }
+
+See `go/tracedb/proto/tracestore.proto` for more details.
+
+
+To actually handle this in BoltDB we will need to create three buckets: one
+for the per-commit values in each trace, another for the trace-level
+information, such as the params for each trace, and a third for mapping
+traceids to much shorter int64 values.
+
+traceid bucket
+--------------
+
+To reduce the amount of data stored, we'll map traceids to 64 bit ints
+and use the 64 bit ints as the keys to the maps stored in the commit
+bucket. The traceid bucket maps traceids to trace64id, and vice versa.
+
+A special key, "the largest trace64id", which is not a valid traceid, holds
+the largest trace64id seen so far. It defaults to 0 if not set.
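The mapping described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the patch's implementation: `traceIDMap`, `newTraceIDMap` and `idFor` are hypothetical names, the counter field stands in for the special "largest trace64id" key, and handing out ids starting at 1 is an assumption that follows from the counter defaulting to 0. A BoltDB-backed version would presumably do the lookup and the increment inside a single write transaction.

```go
package main

import "fmt"

// traceIDMap is a hypothetical in-memory stand-in for the traceid bucket.
type traceIDMap struct {
	toID64   map[string]uint64 // traceid -> trace64id
	fromID64 map[uint64]string // trace64id -> traceid
	largest  uint64            // stands in for the "largest trace64id" key; 0 if unset
}

func newTraceIDMap() *traceIDMap {
	return &traceIDMap{
		toID64:   map[string]uint64{},
		fromID64: map[uint64]string{},
	}
}

// idFor returns the trace64id for traceid, allocating the next id if the
// traceid has not been seen before.
func (m *traceIDMap) idFor(traceid string) uint64 {
	if id, ok := m.toID64[traceid]; ok {
		return id
	}
	m.largest++
	m.toID64[traceid] = m.largest
	m.fromID64[m.largest] = traceid
	return m.largest
}

func main() {
	m := newTraceIDMap()
	// The trace ids below are made-up examples.
	fmt.Println(m.idFor("x86:gpu:test_a"), m.idFor("x86:cpu:test_b"), m.idFor("x86:gpu:test_a"))
}
```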
+
+commit bucket
+-------------
+
+The keys for the commit bucket are structured as:
+
+ [timestamp]:[git hash]:[branch name]
+
+Each key maps to the serialized values and their trace64ids, i.e. a serialized
+map[uint64][]byte, where the uint64 is the trace64id.
stephana 2015/10/19 20:00:19 Shouldn't this be the '!' delimited concatenation
jcgregorio 2015/10/19 20:30:11 Fixed. On 2015/10/19 at 20:00:19, stephana wrote:
+
+trace bucket
+------------
+
+The keys for the trace bucket are traceids.
+
+ [traceid]
+
+The values are structs serialized as Protocol Buffers that contain the params
+for each trace and the original traceid.
+
+constructor
+-----------
+
+ func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error)
+
+Cloud BigTable Implementation
+=============================
+
+For production use the Go interface will also have a BigTable implementation.
+This will be designed to hold information for multiple types of applications,
+such as perf and gold, in the same tables. It will also be able to handle
+storing data from multiple instances of the same application, such as for
+gold-prod, gold-android, and gold-blink.
+
+Cluster ID: skia-infra
+
+ Table Name | Column Families
+ -------------|----------------
+ commits | key values
+ traces | key params
+
+commits
+-------
+The commits table contains all the data stored in the traces, either the
+float64s or the digests.
+
+The key for the commits table is:
+
+ md5('id':'branch':'app')
+
+The 'key' column family contains the following columns:
+ id - The git hash or trybot patch id.
+ branch - The git branch name or the code review id.
+ app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'.
+ ts - The timestamp of the commit.
+
+The 'values' column family contains the following columns:
+ "[traceid]" - One column for each traceid, the cell value is either a float64 or a digest.
+
+
+traces
+------
+The Traces table will contain information about each trace.
+
+The key for the traces table is:
+
+ md5('traceid':'app')
+
+The 'key' column family contains the following columns:
+ traceid - The trace id.
+ app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'.
+
+The 'params' column family contains the following columns:
+ params - A serialized map[string]string of the trace params.
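The row keys for the commits and traces tables could be derived along the following lines. This is a sketch under assumptions: the design only writes `md5('id':'branch':'app')` and `md5('traceid':'app')`, so joining the fields with ':' before hashing and hex-encoding the digest are both guesses, and `rowKey` is a hypothetical helper name.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// rowKey joins the parts with ':' and returns the hex-encoded MD5 digest.
// Both the separator and the hex encoding are assumptions.
func rowKey(parts ...string) string {
	sum := md5.Sum([]byte(strings.Join(parts, ":")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// commits table: id, branch, app. The values are made-up examples.
	fmt.Println(rowKey("6ac847c", "master", "gold-prod"))
	// traces table: traceid, app.
	fmt.Println(rowKey("x86:gpu:test_a", "gold-prod"))
}
```

Hashing the key spreads rows evenly across the key space, at the cost of losing any ability to scan by time or by branch on the row key alone. That is presumably why the 'key' column family repeats the individual fields.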
+
+
+constructor
+-----------
+
+ func NewBigTableTraceStoreDB(app string, tb tiling.TraceBuilder, client *bigtable.Client) (DB, error)
+
Usage
=====
-Here is how the single TileFromRangeAndSources can be used to satisfy all the above requirements:
+Here is how TileFromCommits, together with List, can be used to satisfy all the above requirements:
1. Build a tile of the last N commits from master.
- * Find the ~Nth commit via gitinfo, along with its timestamp. Then call
+ * Find the last N commits via gitinfo, construct CommitIDs for each one, then call:
- TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master"})
+ TileFromCommits(commits)
2. Build a Tile for a trybot.
* Find the Reitveld issue id and created time of each patchset. Use the
@@ -183,10 +307,6 @@ Here is how the single TileFromRangeAndSources can be used to satisfy all the ab
TileFromCommits(commits)
- or if you know the timestamp when the issue was created:
-
- TileFromRangeAndSources(created.Timestamp, time.Now(), []string{"[codereview id]"})
-
3. Build a Tile for a single trybot result vs a specific commit.
* Find the Reitveld issue id and created time of the patchset. Find the
commitid of the target commit:
@@ -194,18 +314,19 @@ Here is how the single TileFromRangeAndSources can be used to satisfy all the ab
TileFromCommits([]*CommitID{trybot, commit})
4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf).
- * Given the time range:
+ * Given the time range, build CommitIDs from gitinfo, then call:
- TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{"master"})
+ TileFromCommits(commits)
5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main).
- * Given the time range, the empty slice for source means include all sources:
+ * Given the time range, call List, then TileFromCommits:
- TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{})
+ commits, err := List(beginTimestamp, endTimestamp)
+ TileFromCommits(commits)
6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main).
- * Find the ~Nth commit via gitinfo. Then call:
+ * Find the ~Nth commit via gitinfo. Then call List, filter the results, then call TileFromCommits.
- TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master", "[codereview id]"})
-
- Note that this might return multiple tries, i.e. one for each patchset.
+ commits, err := List(beginTimestamp, endTimestamp)
+ // Filter commits to only include values from the desired branches.
+ TileFromCommits(commits)