Chromium Code Reviews| Index: go/tracedb/DESIGN.md |
| diff --git a/go/tracedb/DESIGN.md b/go/tracedb/DESIGN.md |
| index e754adeb3590c0c3f8be39d43d8514a575c47bba..6ac847c060ce9c182cf3578f689f01b03172943f 100644 |
| --- a/go/tracedb/DESIGN.md |
| +++ b/go/tracedb/DESIGN.md |
| @@ -2,7 +2,7 @@ tracedb |
| ======= |
| The tracedb package is designed to replace the current storage system for |
| -traces, tiles, with a new BoltDB backend that allows for much more flexibility |
| +traces, tiles, with a new backend that allows for much more flexibility |
| and an increase in the size of data that can be stored. The new system needs |
| to support both branches and trybots (note that in the future there may be no |
| difference between the two), while still supporting the current capabilities |
| @@ -59,77 +59,68 @@ In the following list you may substitute 'branch' for 'trybot'. |
| Assumptions |
| =========== |
| -1. We will use queries to the BoltDB to build in-memory Tiles. |
| +1. We will use queries to the interface to build in-memory Tiles. |
| 2. We can extract a timestamp from Rietveld for each patch. |
| Design |
| ====== |
| -To actually handle this in BoltDB we will need to create two buckets, one |
| -for the per-commit values in each trace, and another for the trace-level |
| -information, such as the params for each trace. |
| +The design has two layers: tracedb.DB, the Go interface for talking to the |
| +data store, and two concrete implementations of that interface. The first |
| +implementation is a gRPC-based server backed by BoltDB, and the second is Cloud BigTable. |
| -commit bucket |
| -------------- |
| - |
| -The keys for the commit bucket are structured as: |
| - |
| - [timestamp]:[git hash]:[branch name]:[trace_key] |
| - |
| -and the keys map to a single value []byte, that is either the Gold digest or |
| -the Perf float64 measurement value. |
| - |
| -Note that to search through a time range for a specific branch name we'll need |
| -to do the filtering inside the closure we pass to BoltDB. |
| - |
| -trace bucket |
| ------------- |
| -The keys for the trace bucket are just the trace keys. |
| + +-------------+ |
| + | tracedb.DB | |
| + | interface | |
| + +-------------+ |
| + | |
| + +-----------+-----------+ |
| + | | |
| + | | |
| + +------v------+ +-------v------+ |
| + | gRPC Server | | | |
| + | BoltDB | | GCE BigTable | |
| + +-------------+ +--------------+ |
| - [trace_key] |
| -The values are structs serialized as JSON that contain the params for each |
| -trace. We are using JSON over GOB since these are relatively small structs. |
| +tracedb.DB Interface |
| +-------------------- |
| -Interface |
| ---------- |
| - |
| -The interface to tracedb looks like: |
| +This is the Go interface to the storage for traces: |
| // DB represents the interface to any datastore for perf and gold results. |
| // |
| // Notes: |
| - // 1. If 'sources' is an empty slice it will match all sources. |
| - // 2. The Commits in the Tile will only contain the commit id and |
| + // 1. The Commits in the Tile will only contain the commit id and |
| // the timestamp, the Author will not be populated. |
| - // 3. The Tile's Scale and TileIndex will be set to 0. |
| + // 2. The Tile's Scale and TileIndex will be set to 0. |
| // |
| type DB interface { |
| - // Add new information to the datastore. |
| - // |
| - // source - Either a branch name or a Rietveld issue id. |
| - // values - maps the trace id to a DBEntry. |
| - // |
| - // Note that only allowing adding data for a single commit at a time |
| - // should work well with ingestion while still breaking up writes into |
| - // shorter actions. |
| - Add(commitID *CommitID, source string, values map[string]*DBEntry) error |
| - |
| - // Create a Tile based on the given query parameters. |
| - // |
| - // If 'sources' is an empty slice it will match all sources. |
| - // |
| - // Note that the Commits in the Tile will only contain the commit id and |
| - // the timestamp, the Author will not be populated. |
| - TileFromRangeAndSources(begin, end time.Time, sources []string) (*tiling.Tile, error) |
| - |
| - // Create a Tile for the given commit ids. Commits should be provided in |
| - // time order. |
| - // |
| - // Note that the Commits in the Tile will only contain the commit id and |
| - // the timestamp, the Author will not be populated. |
| - TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error) |
| + // Add new information to the datastore. |
| + // |
| + // values maps a trace id to an Entry. |
| + // |
| + // Note that only allowing adding data for a single commit at a time |
| + // should work well with ingestion while still breaking up writes into |
| + // shorter actions. |
| + Add(commitID *CommitID, values map[string]*Entry) error |
| + |
| + // Remove the given commit from the datastore. |
| + Remove(commitID *CommitID) error |
| + |
| + // List returns all the CommitIDs between begin and end. |
| + List(begin, end time.Time) ([]*CommitID, error) |
| + |
| + // Create a Tile for the given commit ids. Will build the Tile using the |
| + // commits in the order they are provided. |
| + // |
| + // Note that the Commits in the Tile will only contain the commit id and |
| + // the timestamp, the Author will not be populated. |
| + TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error) |
| + |
| + // Close the datastore. |
| + Close() error |
|
stephana
2015/10/19 15:03:49
There is no way to enumerate the CommitIDs current
jcgregorio
2015/10/19 15:11:44
To do that simply call:
List(time.Time{}, time.
|
| } |
| The above interface depends on the CommitID struct, which is: |
| @@ -138,17 +129,14 @@ The above interface depends on the CommitID struct, which is: |
| // a real commit into the repo, or an event like running a trybot. |
| type CommitID struct { |
| Timestamp time.Time |
| - ID string // Normally a git hash, but could also be Rietveld issue id + patch id. |
| - } |
| - |
| - func (c *CommitID) String() string { |
| - return fmt.Sprintf("%s%s", c.Timestamp.Format(time.RFC3339), c.ID) |
| + ID string // Normally a git hash, but could also be Rietveld patch id. |
| + Source string // The branch name, e.g. "master", or the Rietveld issue id. |
| } |
|
stephana
2015/10/19 15:03:49
typo: Rietveld
I don't see a simple way to enume
jcgregorio
2015/10/19 15:11:44
Use List() with a beginning and ending time that y
stephana
2015/10/19 15:22:40
That means I have to load the equivalent of a curr
jcgregorio
2015/10/19 20:30:11
Fixed Typo.
|
| -And DBEntry, which is: |
| +And Entry, which is: |
| - // DBEntry holds the params and a value for single measurement. |
| - type DBEntry struct { |
| + // Entry holds the params and a value for single measurement. |
| + type Entry struct { |
| Params map[string]string |
| // Value is the value of the measurement. |
| @@ -166,15 +154,151 @@ Note that this will require adding a new method to the Trace interface: |
| // Each specialization will convert []byte to the correct type. |
| SetAt(index int, value []byte) error |
| + |
| +BoltDB Implementation |
| +===================== |
| + |
| +For local testing the Go interface above will be implemented in terms of the |
| +gRPC interface defined below with a BoltDB store. I.e. there will be a |
| +standalone server that implements the following gRPC interface. |
| + |
| +The gRPC interface is similar to the Go interface, with Add and List operating |
| +exactly the same. The only difference is in retrieving data: TileFromCommits |
| +is broken down into two calls, GetValues and GetParams, which the caller can |
| +use to build a Tile. |
| + |
| + // TraceDB stores trace information for both Gold and Perf. |
| + service TraceDB { |
| + // Returns a list of traceids that don't have Params stored in the datastore. |
| + rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {} |
| + |
| + // Adds Params for a set of traceids. |
| + rpc AddParams(AddParamsRequest) returns (EmptyResponse) {} |
| + |
| + // Adds data for a set of traces for a particular commitid. |
| + rpc Add(AddRequest) returns (AddResponse) {} |
| + |
| + // Removes data for a particular commitid. |
| + rpc Remove(RemoveRequest) returns (EmptyResponse) {} |
| + |
| + // List returns all the CommitIDs that exist in the given time range. |
| + rpc List(ListRequest) returns (ListResponse) {} |
| + |
| + // GetValues returns all the trace values stored for the given CommitID. |
| + rpc GetValues(GetValuesRequest) returns (GetValuesResponse) {} |
| + |
| + // GetParams returns the Params for all of the given traces. |
| + rpc GetParams(GetParamsRequest) returns (GetParamsResponse) {} |
| + } |
| + |
| +See `go/tracedb/proto/tracestore.proto` for more details. |
| + |
| + |
| +To actually handle this in BoltDB we will need to create three buckets: one for |
| +the per-commit values in each trace, another for the trace-level |
| +information, such as the params for each trace, and a third for mapping |
| +traceids to much shorter 64 bit int values. |
| + |
| +traceid bucket |
| +-------------- |
| + |
| +To reduce the amount of data stored, we'll map traceids to 64 bit ints |
| +and use the 64 bit ints as the keys to the maps stored in the commit |
| +bucket. The traceid bucket maps traceids to trace64id, and vice versa. |
| + |
| +There is a special key, "the largest trace64id", which isn't a valid traceid. It |
| +holds the largest trace64id seen, and defaults to 0 if not set. |
| + |
| +commit bucket |
| +------------- |
| + |
| +The keys for the commit bucket are structured as: |
| + |
| + [timestamp]:[git hash]:[branch name] |
| + |
| +Each key maps to the serialized values and their trace64ids, i.e. a serialized |
| +map[uint64][]byte, where the uint64 is the trace64id. |
|
stephana
2015/10/19 20:00:19
Shouldn't this be the '!' delimited concatenation
jcgregorio
2015/10/19 20:30:11
Fixed.
On 2015/10/19 at 20:00:19, stephana wrote:
|
| + |
| +trace bucket |
| +------------ |
| + |
| +The keys for the trace bucket are traceids. |
| + |
| + [traceid] |
| + |
| +The values are structs serialized as Protocol Buffers that contain the params for |
| +each trace and the original traceid. |
| + |
| +constructor |
| +----------- |
| + |
| + func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error) |
| + |
| +Cloud BigTable Implementation |
| +============================= |
| + |
| +For production use the Go interface will also have a BigTable implementation. |
| +This will be designed to hold information for multiple types of applications, |
| +such as perf and gold, in the same tables. It will also be able to handle |
| +storing data from multiple instances of the same application, such as for |
| +gold-prod, gold-android, and gold-blink. |
| + |
| +Cluster ID: skia-infra |
| + |
| + Table Name | Column Families |
| + -------------|---------------- |
| + commits | key values |
| + traces | key params |
| + |
| +commits |
| +------- |
| +The commits table contains all the data stored in the traces, either the |
| +float64s or the digests. |
| + |
| +The key for the commits table is: |
| + |
| + md5('id':'branch':'app') |
| + |
| +The 'key' column family contains the following columns: |
| + id - The git hash or trybot patch id. |
| + branch - The git branch name or the code review id. |
| + app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'. |
| + ts - The timestamp of the commit. |
| + |
| +The 'values' column family contains the following columns: |
| + "[traceid]" - One column for each traceid, the cell value is either a float64 or a digest. |
| + |
| + |
| +traces |
| +------ |
| +The Traces table will contain information about each trace. |
| + |
| +The key for the traces table is: |
| + |
| + md5('traceid':'app') |
| + |
| +The 'key' column family contains the following columns: |
| + traceid - The trace id. |
| + app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'. |
| + |
| +The 'params' column family contains the following columns: |
| + params - A serialized map[string]string of the trace params. |
| + |
| + |
| +constructor |
| +----------- |
| + |
| + func NewBigTableTraceStoreDB(app string, tb tiling.TraceBuilder, client *bigtable.Client) (DB, error) |
| + |
| Usage |
| ===== |
| -Here is how the single TileFromRangeAndSources can be used to satisfy all the above requirements: |
| +Here is how TileFromCommits, together with List, can be used to satisfy all the above requirements: |
| 1. Build a tile of the last N commits from master. |
| - * Find the ~Nth commit via gitinfo, along with its timestamp. Then call |
| + * Find the last N commits via gitinfo, construct CommitIDs for each one, then call: |
| - TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master"}) |
| + TileFromCommits(commits) |
| 2. Build a Tile for a trybot. |
| * Find the Rietveld issue id and created time of each patchset. Use the |
| @@ -183,10 +307,6 @@ Here is how the single TileFromRangeAndSources can be used to satisfy all the ab |
| TileFromCommits(commits) |
| - or if you know the timestamp when the issue was created: |
| - |
| - TileFromRangeAndSources(created.Timestamp, time.Now(), []string{"[codereview id]"}) |
| - |
| 3. Build a Tile for a single trybot result vs a specific commit. |
| * Find the Rietveld issue id and created time of the patchset. Find the |
| commitid of the target commit: |
| @@ -194,18 +314,19 @@ Here is how the single TileFromRangeAndSources can be used to satisfy all the ab |
| TileFromCommits([]*CommitID{trybot, commit}) |
| 4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf). |
| - * Given the time range: |
| + * Given the time range, build CommitIDs from gitinfo, then call: |
| - TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{"master"}) |
| + TileFromCommits(commits) |
| 5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main). |
| - * Given the time range, the empty slice for source means include all sources: |
| + * Given the time range, call List, then TileFromCommits: |
| - TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{}) |
| + commits, err := List(beginTimestamp, endTimestamp) |
| + TileFromCommits(commits) |
| 6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main). |
| - * Find the ~Nth commit via gitinfo. Then call: |
| + * Find the ~Nth commit via gitinfo. Then call List, filter the results, then call TileFromCommits. |
| - TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master", "[codereview id]"}) |
| - |
| - Note that this might return multiple tries, i.e. one for each patchset. |
| + commits, err := List(beginTimestamp, endTimestamp) |
| + // Filter commits to only include values from the desired branches. |
| + TileFromCommits(commits) |