Chromium Code Reviews

Unified Diff: trace/DESIGN.md

Issue 1411663004: Create gRPC client and server, traceservice, that stores trace data in a BoltDB backend. (Closed) Base URL: https://skia.googlesource.com/buildbot@master
Patch Set: fix vet Created 5 years, 2 months ago
Index: trace/DESIGN.md
diff --git a/trace/DESIGN.md b/trace/DESIGN.md
new file mode 100644
index 0000000000000000000000000000000000000000..1db7162c34390b5e8a868d6c1b817b57c98e2fa3
--- /dev/null
+++ b/trace/DESIGN.md
@@ -0,0 +1,275 @@
+tracedb
+=======
+
+The tracedb package is designed to replace tiles, the current storage system
+for traces, with a new backend that is more flexible and can store much more
+data. The new system needs to support both branches and trybots (note that in
+the future there may be no difference between the two), while still supporting
+the current capability of looking at master.
+
+The current structure for a Tile looks like:
+
+    type GoldenTrace struct {
+        Params_ map[string]string
+        Values  []string
+    }
+
+    type PerfTrace struct {
+        Values  []float64         `json:"values"`
+        Params_ map[string]string `json:"params"`
+    }
+
+    // Commit is information about each Git commit.
+    type Commit struct {
+        CommitTime int64  `json:"commit_time" bq:"timestamp" db:"ts"`
+        Hash       string `json:"hash"        bq:"gitHash"   db:"githash"`
+        Author     string `json:"author"                     db:"author"`
+    }
+
+    // Tile is a config.TILE_SIZE commit slice of data.
+    //
+    // The length of the Commits array is the same length as all of the Values
+    // arrays in all of the Traces.
+    type Tile struct {
+        Traces   map[string]Trace    `json:"traces"`
+        ParamSet map[string][]string `json:"param_set"`
+        Commits  []*Commit           `json:"commits"`
+
+        // What is the scale of this Tile, i.e. it contains every Nth point, where
+        // N=const.TILE_SCALE^Scale.
+        Scale     int `json:"scale"`
+        TileIndex int `json:"tileIndex"`
+    }
+
+Where `PerfTrace` and `GoldenTrace` implement the `Trace` interface.
+
+Requirements
+============
+
+In the following list you may substitute 'branch' for 'trybot'.
+
+1. Build a Tile of the last N commits from master. (Our only usage today.)
+2. Build a Tile for a trybot.
+3. Build a Tile for a single trybot result vs a specific commit.
+4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.)
+5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.)
+6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.)
+
+Assumptions
+===========
+
+1. We will use queries to the interface to build in-memory Tiles.
+2. We can extract a timestamp from Rietveld for each patch.
+
+Design
+======
+
+The design has two layers: tracedb.DB, which is the Go interface for talking
+to the data store, and a separate service that implements a gRPC interface and
+stores the data in BoltDB.
+
+
+            +-------------+
+            | tracedb.DB  |
+            | interface   |
+            +-------------+
+                   |
+ |
+       |
+  +------v------+
+  | gRPC Server |
+  | BoltDB      |
+  +-------------+
+
+
+tracedb.DB Interface
+--------------------
+
+This is the Go interface to the storage for traces. The interface to tracedb looks like:
+
+    // DB represents the interface to any datastore for perf and gold results.
+    //
+    // Notes:
+    //   1. The Commits in the Tile will only contain the commit id and
+    //      the timestamp, the Author will not be populated.
+    //   2. The Tile's Scale and TileIndex will be set to 0.
+    //
+    type DB interface {
+        // Add new information to the datastore.
+        //
+        // values maps a trace id to an Entry.
+        //
+        // Note that allowing data to be added for only a single commit at a
+        // time should work well with ingestion while still breaking up writes
+        // into shorter actions.
+        Add(commitID *CommitID, values map[string]*Entry) error
+
+        // Remove the given commit from the datastore.
+        Remove(commitID *CommitID) error
+
+        // List returns all the CommitIDs between begin and end.
+        List(begin, end time.Time) ([]*CommitID, error)
+
+        // Create a Tile for the given commit ids. Will build the Tile using the
+        // commits in the order they are provided.
+        //
+        // Note that the Commits in the Tile will only contain the commit id and
+        // the timestamp, the Author will not be populated.
+        TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error)
+
+        // Close the datastore.
+        Close() error
+    }
+
+The above interface depends on the CommitID struct, which is:
+
+    // CommitID represents the time of a particular commit, where a commit could either be
+    // a real commit into the repo, or an event like running a trybot.
+    type CommitID struct {
+        Timestamp time.Time
+        ID        string // Normally a git hash, but could also be a Rietveld patch id.
+        Source    string // The branch name, e.g. "master", or the Rietveld issue id.
+    }
+
+And Entry, which is:
+
+    // Entry holds the params and a value for a single measurement.
+    type Entry struct {
+        Params map[string]string
+
+        // Value is the value of the measurement.
+        //
+        // It should be the digest string converted to a []byte, or a float64
+        // converted to a little endian []byte. I.e. tiling.Trace.SetAt
+        // should be able to consume this value.
+        Value []byte
+    }
+
+Note that this will require adding a new method to the Trace interface:
+
+    // Sets the value of the measurement at index.
+    //
+    // Each specialization will convert []byte to the correct type.
+    SetAt(index int, value []byte) error
+
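To make the `SetAt` contract concrete, here is a minimal sketch of how the two specializations might decode `Entry.Value`. The `perfTrace` and `goldenTrace` types and the `float64ToBytes` helper are hypothetical, simplified stand-ins that model only the `Values` slices; they are not the real `PerfTrace` and `GoldenTrace` types, and the bounds and length checks are assumptions about reasonable error handling.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// perfTrace and goldenTrace are hypothetical, simplified stand-ins for
// PerfTrace and GoldenTrace; only the Values slices are modeled.
type perfTrace struct{ Values []float64 }
type goldenTrace struct{ Values []string }

// float64ToBytes encodes v as 8 little-endian bytes, the form the design
// describes for Perf values stored in Entry.Value.
func float64ToBytes(v float64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, math.Float64bits(v))
	return b
}

// SetAt decodes an 8-byte little-endian float64 and stores it at index.
func (p *perfTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(p.Values) {
		return fmt.Errorf("index %d out of range [0, %d)", index, len(p.Values))
	}
	if len(value) != 8 {
		return fmt.Errorf("expected 8 bytes for a float64, got %d", len(value))
	}
	p.Values[index] = math.Float64frombits(binary.LittleEndian.Uint64(value))
	return nil
}

// SetAt stores the digest, which arrives as the raw bytes of the string.
func (g *goldenTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(g.Values) {
		return fmt.Errorf("index %d out of range [0, %d)", index, len(g.Values))
	}
	g.Values[index] = string(value)
	return nil
}

func main() {
	p := &perfTrace{Values: make([]float64, 2)}
	if err := p.SetAt(1, float64ToBytes(1.5)); err != nil {
		fmt.Println("perf:", err)
		return
	}
	g := &goldenTrace{Values: make([]string, 2)}
	if err := g.SetAt(0, []byte("abc123")); err != nil {
		fmt.Println("gold:", err)
		return
	}
	fmt.Println(p.Values, g.Values)
}
```

Keeping the value as an opaque `[]byte` lets the storage layer stay ignorant of whether it is holding Gold digests or Perf measurements; only the `Trace` specialization needs to know the encoding.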
+
+BoltDB Implementation
+=====================
+
+For local testing the Go interface above will be implemented in terms of the
+gRPC interface defined below with a BoltDB store. I.e. there will be a
+standalone server that implements the following gRPC interface.
+
+The gRPC interface is similar to the Go interface, with Add and List operating
+exactly the same. The only difference is in retrieving data: TileFromCommits
+is broken down into two different calls, GetValues and GetParams, from which
+the caller can build a Tile.
+
+    // TraceDB stores trace information for both Gold and Perf.
+    service TraceDB {
+        // Returns a list of traceids that don't have Params stored in the datastore.
+        rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {}
+
+        // Adds Params for a set of traceids.
+        rpc AddParams(AddParamsRequest) returns (EmptyResponse) {}
+
+        // Adds data for a set of traces for a particular commitid.
+        rpc Add(AddRequest) returns (AddResponse) {}
+
+        // Removes data for a particular commitid.
+        rpc Remove(RemoveRequest) returns (EmptyResponse) {}
+
+        // List returns all the CommitIDs that exist in the given time range.
+        rpc List(ListRequest) returns (ListResponse) {}
+
+        // GetValues returns all the trace values stored for the given CommitID.
+        rpc GetValues(GetValuesRequest) returns (GetValuesResponse) {}
+
+        // GetParams returns the Params for all of the given traces.
+        rpc GetParams(GetParamsRequest) returns (GetParamsResponse) {}
+    }
+
+See `go/tracedb/proto/tracestore.proto` for more details.
+
+
+To handle this in BoltDB we will need to create three buckets: one for the
+per-commit values in each trace, another for the trace-level information,
+such as the params for each trace, and a third for mapping traceids to much
+shorter 64 bit int values.
+
+traceid bucket
+--------------
+
+To reduce the amount of data stored, we'll map traceids to 64 bit ints
+and use the 64 bit ints as the keys to the maps stored in the commit
+bucket. The traceid bucket maps traceids to trace64id, and vice versa.
+
+There is a special key, "the largest trace64id", which isn't a valid traceid.
+It holds the largest trace64id seen so far and defaults to 0 if not set.
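The allocation scheme described above can be sketched with plain Go maps standing in for the BoltDB bucket. Everything here (`traceIDMap`, `get64`, `traceID`, and the example trace ids) is hypothetical and only illustrates the two-way mapping and the "largest trace64id" counter; a real implementation would do the lookup and the increment inside a single BoltDB write transaction.

```go
package main

import "fmt"

// traceIDMap is a hypothetical in-memory stand-in for the traceid bucket.
type traceIDMap struct {
	to64    map[string]uint64 // traceid -> trace64id
	from64  map[uint64]string // trace64id -> traceid
	largest uint64            // largest trace64id handed out; 0 if none yet
}

func newTraceIDMap() *traceIDMap {
	return &traceIDMap{
		to64:   map[string]uint64{},
		from64: map[uint64]string{},
	}
}

// get64 returns the trace64id for traceid, allocating largest+1 the first
// time a traceid is seen and recording the mapping in both directions.
func (m *traceIDMap) get64(traceid string) uint64 {
	if id, ok := m.to64[traceid]; ok {
		return id
	}
	m.largest++
	m.to64[traceid] = m.largest
	m.from64[m.largest] = traceid
	return m.largest
}

// traceID performs the reverse lookup from trace64id to traceid.
func (m *traceIDMap) traceID(id uint64) (string, bool) {
	s, ok := m.from64[id]
	return s, ok
}

func main() {
	m := newTraceIDMap()
	// The trace ids below are made up for illustration.
	a := m.get64("config=8888:test=blur")
	b := m.get64("config=565:test=blur")
	again := m.get64("config=8888:test=blur")
	name, _ := m.traceID(b)
	fmt.Println(a, b, again, name, m.largest)
}
```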
+
+commit bucket
+-------------
+
+The keys for the commit bucket are structured as:
+
+    [timestamp]![git hash]![branch name]
+
+Each key maps to the serialized values for that commit, keyed by trace64id.
+I.e. a serialized map[uint64][]byte, where the uint64 is the trace64id.
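As a sketch of the key layout above, the helpers below build and parse a commit bucket key. The design does not say how the timestamp is encoded, so the RFC 3339 UTC form used here is an assumption, chosen because such strings sort bytewise in time order and BoltDB keeps keys in byte-sorted order. The `commitKey` and `parseCommitKey` names are hypothetical, and the code assumes the ID field never contains `!`.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// CommitID mirrors the struct from the design above.
type CommitID struct {
	Timestamp time.Time
	ID        string
	Source    string
}

// commitKey builds "[timestamp]![git hash]![branch name]".
// Assumption: the timestamp is written as RFC 3339 in UTC.
func commitKey(c *CommitID) []byte {
	ts := c.Timestamp.UTC().Format(time.RFC3339)
	return []byte(fmt.Sprintf("%s!%s!%s", ts, c.ID, c.Source))
}

// parseCommitKey reverses commitKey. It splits into at most three parts, so
// a "!" inside the final (source) field is preserved.
func parseCommitKey(key []byte) (*CommitID, error) {
	parts := strings.SplitN(string(key), "!", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("malformed commit key %q", key)
	}
	ts, err := time.Parse(time.RFC3339, parts[0])
	if err != nil {
		return nil, fmt.Errorf("bad timestamp in key %q: %v", key, err)
	}
	return &CommitID{Timestamp: ts, ID: parts[1], Source: parts[2]}, nil
}

func main() {
	c := &CommitID{
		Timestamp: time.Date(2015, time.November, 5, 12, 0, 0, 0, time.UTC),
		ID:        "abc123", // made-up, shortened hash
		Source:    "master",
	}
	key := commitKey(c)
	parsed, err := parseCommitKey(key)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(key), parsed.ID, parsed.Source)
}
```

With time-ordered keys, a `List(begin, end)` call could plausibly be served by a range scan over the bucket rather than a full walk.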
+
+trace bucket
+------------
+
+The keys for the trace bucket are traceids.
+
+    [traceid]
+
+The values are structs, serialized as Protocol Buffers, that contain the
+params for each trace and the original traceid.
+
+constructor
+-----------
+
+    func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error)
+
+Usage
+=====
+
+Here is how the single TileFromCommits method can be used to satisfy all the above requirements:
+
+1. Build a Tile of the last N commits from master.
+    * Find the last N commits via gitinfo, construct CommitIDs for each one, then call:
+
+            TileFromCommits(commits)
+
+2. Build a Tile for a trybot.
+    * Find the Rietveld issue id and created time of each patchset. Use the
+      patchset ids and created timestamps to create a slice of CommitIDs to use
+      in:
+
+            TileFromCommits(commits)
+
+3. Build a Tile for a single trybot result vs a specific commit.
+    * Find the Rietveld issue id and created time of the patchset. Find the
+      commitid of the target commit:
+
+            TileFromCommits([]*CommitID{trybot, commit})
+
+4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.)
+    * Given the time range, build CommitIDs from gitinfo, then call:
+
+            TileFromCommits(commits)
+
+5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.)
+    * Given the time range, call List, then TileFromCommits:
+
+            commits, err := List(beginTimestamp, endTimestamp)
+            TileFromCommits(commits)
+
+6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.)
+    * Find the ~Nth commit via gitinfo. Then call List, filter the results, then call TileFromCommits.
+
+            commits, err := List(beginTimestamp, endTimestamp)
+            // Filter commits to only include values from the desired branches.
+            TileFromCommits(commits)
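
The filtering step in item 6 is left as a comment in the design. One possible shape for it, assuming the branch is identified by `CommitID.Source`, is sketched below; `filterBySource` is a hypothetical helper, not part of the proposed interface, and the ids and branch names in `main` are made up.

```go
package main

import (
	"fmt"
	"time"
)

// CommitID mirrors the struct from the design above.
type CommitID struct {
	Timestamp time.Time
	ID        string
	Source    string
}

// filterBySource keeps only the commits whose Source is one of sources,
// preserving the order in which List returned them.
func filterBySource(commits []*CommitID, sources ...string) []*CommitID {
	keep := make(map[string]bool, len(sources))
	for _, s := range sources {
		keep[s] = true
	}
	filtered := []*CommitID{}
	for _, c := range commits {
		if keep[c.Source] {
			filtered = append(filtered, c)
		}
	}
	return filtered
}

func main() {
	commits := []*CommitID{
		{ID: "a1", Source: "master"},
		{ID: "b2", Source: "other-branch"},
		{ID: "c3", Source: "my-branch"},
	}
	for _, c := range filterBySource(commits, "master", "my-branch") {
		fmt.Println(c.ID, c.Source)
	}
}
```

Because TileFromCommits builds the Tile in the order the commits are provided, a filter that preserves order keeps the resulting Tile in time order.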