tracedb
=======

The tracedb package is designed to replace tiles, the current storage system
for traces, with a new backend that allows much more flexibility and can store
much larger amounts of data. The new system needs to support both branches and
trybots (note that in the future there may be no difference between the two),
while still supporting the current capability of looking at master.

The current structure for a Tile looks like:

    type GoldenTrace struct {
        Params_ map[string]string
        Values  []string
    }

    type PerfTrace struct {
        Values  []float64         `json:"values"`
        Params_ map[string]string `json:"params"`
    }

    // Commit is information about each Git commit.
    type Commit struct {
        CommitTime int64  `json:"commit_time" bq:"timestamp" db:"ts"`
        Hash       string `json:"hash" bq:"gitHash" db:"githash"`
        Author     string `json:"author" db:"author"`
    }

    // Tile is a config.TILE_SIZE commit slice of data.
    //
    // The length of the Commits array is the same length as all of the Values
    // arrays in all of the Traces.
    type Tile struct {
        Traces   map[string]Trace    `json:"traces"`
        ParamSet map[string][]string `json:"param_set"`
        Commits  []*Commit           `json:"commits"`

        // What is the scale of this Tile, i.e. it contains every Nth point, where
        // N=const.TILE_SCALE^Scale.
        Scale     int `json:"scale"`
        TileIndex int `json:"tileIndex"`
    }

`PerfTrace` and `GoldenTrace` both implement the `Trace` interface.

Requirements
============

In the following list you may substitute 'branch' for 'trybot'.

1. Build a Tile of the last N commits from master. (Our only usage today.)
2. Build a Tile for a trybot.
3. Build a Tile for a single trybot result vs a specific commit.
4. Build a Tile for all commits to master in a given time range. (Be able to go
   back in time for either Gold or Perf.)
5. Build a Tile for all commits to all branches in a given time range. (Show
   how all branches compare against main.)
6. Build a Tile for all commits to main and a given branch for a given time
   range. (See how a single branch compares to main.)

Assumptions
===========

1. We will use queries to the interface to build in-memory Tiles.
2. We can extract a timestamp from Rietveld for each patch.

Design
======

The design is split into two layers: tracedb.DB, the Go interface for talking
to the data store, and a separate service that implements a gRPC interface and
stores the data in BoltDB.


    +-------------+
    | tracedb.DB  |
    |  interface  |
    +-------------+
           |
           |
           |
    +------v------+
    | gRPC Server |
    |   BoltDB    |
    +-------------+


tracedb.DB Interface
--------------------

This is the Go interface to the storage for traces. The interface to tracedb
looks like:

    // DB represents the interface to any datastore for perf and gold results.
    //
    // Notes:
    //   1. The Commits in the Tile will only contain the commit id and
    //      the timestamp, the Author will not be populated.
    //   2. The Tile's Scale and TileIndex will be set to 0.
    //
    type DB interface {
        // Add new information to the datastore.
        //
        // The values map maps a trace id to an Entry.
        //
        // Note that only allowing adding data for a single commit at a time
        // should work well with ingestion while still breaking up writes into
        // shorter actions.
        Add(commitID *CommitID, values map[string]*Entry) error

        // Remove the given commit from the datastore.
        Remove(commitID *CommitID) error

        // List returns all the CommitIDs between begin and end.
        List(begin, end time.Time) ([]*CommitID, error)

        // Create a Tile for the given commit ids. Will build the Tile using the
        // commits in the order they are provided.
        //
        // Note that the Commits in the Tile will only contain the commit id and
        // the timestamp, the Author will not be populated.
        TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error)

        // Close the datastore.
        Close() error
    }

The above interface depends on the CommitID struct, which is:

    // CommitID represents the time of a particular commit, where a commit
    // could either be a real commit into the repo, or an event like running
    // a trybot.
    type CommitID struct {
        Timestamp time.Time
        ID        string // Normally a git hash, but could also be Rietveld patch id.
        Source    string // The branch name, e.g. "master", or the Rietveld issue id.
    }

And Entry, which is:

    // Entry holds the params and a value for a single measurement.
    type Entry struct {
        Params map[string]string

        // Value is the value of the measurement.
        //
        // It should be the digest string converted to a []byte, or a float64
        // converted to a little endian []byte. I.e. tiling.Trace.SetAt
        // should be able to consume this value.
        Value []byte
    }

Note that this will require adding a new method to the Trace interface:

    // Sets the value of the measurement at index.
    //
    // Each specialization will convert []byte to the correct type.
    SetAt(index int, value []byte) error
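
Under the Entry contract above, producers encode values into bytes and each Trace specialization decodes them in SetAt. The sketch below shows one way that round trip could look. `EncodeFloat`, `EncodeDigest`, `perfTrace` and `goldenTrace` are hypothetical, simplified stand-ins (not the real tiling types), and the index and length checks are assumptions about how SetAt should report bad input.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// EncodeFloat converts a Perf measurement into the 8-byte little-endian form
// described for Entry.Value. Hypothetical helper.
func EncodeFloat(f float64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, math.Float64bits(f))
	return b
}

// EncodeDigest converts a Gold digest string into its Entry.Value form.
// Hypothetical helper.
func EncodeDigest(digest string) []byte { return []byte(digest) }

// perfTrace and goldenTrace are hypothetical stand-ins for PerfTrace and
// GoldenTrace that keep only the Values slice.
type perfTrace struct{ Values []float64 }
type goldenTrace struct{ Values []string }

var errIndex = errors.New("index out of range")

// SetAt decodes an 8-byte little-endian float64 into Values[index].
func (t *perfTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(t.Values) {
		return errIndex
	}
	if len(value) != 8 {
		return fmt.Errorf("perf value must be 8 bytes, got %d", len(value))
	}
	t.Values[index] = math.Float64frombits(binary.LittleEndian.Uint64(value))
	return nil
}

// SetAt stores the bytes as the digest string at Values[index].
func (t *goldenTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(t.Values) {
		return errIndex
	}
	t.Values[index] = string(value)
	return nil
}

func main() {
	p := &perfTrace{Values: make([]float64, 2)}
	g := &goldenTrace{Values: make([]string, 2)}
	if err := p.SetAt(1, EncodeFloat(3.5)); err != nil {
		panic(err)
	}
	if err := g.SetAt(0, EncodeDigest("abc123")); err != nil {
		panic(err)
	}
	fmt.Println(p.Values, g.Values)
}
```

Keeping the decoding inside each Trace specialization lets the store treat every value as opaque bytes, whether it came from Gold or Perf.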


BoltDB Implementation
=====================

For local testing the Go interface above will be implemented in terms of the
gRPC interface defined below, backed by a BoltDB store. I.e. there will be a
standalone server that implements the following gRPC interface.

The gRPC interface is similar to the Go interface, with Add, Remove and List
operating the same way. Params are stored separately from values, via
MissingParams and AddParams, and retrieval is split up: TileFromCommits is
broken down into two calls, GetValues and GetParams, whose results the caller
uses to build a Tile.

    // TraceDB stores trace information for both Gold and Perf.
    service TraceDB {
      // Returns a list of traceids that don't have Params stored in the datastore.
      rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {}

      // Adds Params for a set of traceids.
      rpc AddParams(AddParamsRequest) returns (EmptyResponse) {}

      // Adds data for a set of traces for a particular commitid.
      rpc Add(AddRequest) returns (AddResponse) {}

      // Removes data for a particular commitid.
      rpc Remove(RemoveRequest) returns (EmptyResponse) {}

      // List returns all the CommitIDs that exist in the given time range.
      rpc List(ListRequest) returns (ListResponse) {}

      // GetValues returns all the trace values stored for the given CommitID.
      rpc GetValues(GetValuesRequest) returns (GetValuesResponse) {}

      // GetParams returns the Params for all of the given traces.
      rpc GetParams(GetParamsRequest) returns (GetParamsResponse) {}
    }

See `go/tracedb/proto/tracestore.proto` for more details.
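
Because retrieval is split into GetValues and GetParams, the client has to reassemble the Tile itself. Below is a minimal sketch of that step, assuming GetValues yields a map from traceid to value bytes for one commit and GetParams yields each trace's params. `commitValues`, `trace` and `buildTile` are hypothetical simplifications of the real messages and tiling types; a real implementation would fill tiling.Trace values through SetAt rather than keeping raw bytes.

```go
package main

import (
	"fmt"
	"sort"
)

// commitValues is a hypothetical stand-in for one GetValues result:
// traceid -> value bytes for a single commit.
type commitValues map[string][]byte

// trace is a simplified, hypothetical trace that keeps raw bytes.
type trace struct {
	Params map[string]string
	Values [][]byte // A nil slot means "no data at this commit".
}

// buildTile lays out values so that column i corresponds to commits[i],
// preserving the order in which the commits were requested, and builds the
// ParamSet from the params returned by GetParams.
func buildTile(commits []commitValues, params map[string]map[string]string) (map[string]*trace, map[string][]string) {
	traces := map[string]*trace{}
	for i, values := range commits {
		for traceID, v := range values {
			t, ok := traces[traceID]
			if !ok {
				t = &trace{Params: params[traceID], Values: make([][]byte, len(commits))}
				traces[traceID] = t
			}
			t.Values[i] = v
		}
	}
	// Collect every distinct value seen for each param key.
	seen := map[string]map[string]bool{}
	for _, t := range traces {
		for k, v := range t.Params {
			if seen[k] == nil {
				seen[k] = map[string]bool{}
			}
			seen[k][v] = true
		}
	}
	paramSet := map[string][]string{}
	for k, vs := range seen {
		for v := range vs {
			paramSet[k] = append(paramSet[k], v)
		}
		sort.Strings(paramSet[k]) // Deterministic order for callers.
	}
	return traces, paramSet
}

func main() {
	commits := []commitValues{
		{",config=8888,": []byte("d1")},
		{",config=8888,": []byte("d2"), ",config=gpu,": []byte("d3")},
	}
	params := map[string]map[string]string{
		",config=8888,": {"config": "8888"},
		",config=gpu,":  {"config": "gpu"},
	}
	traces, paramSet := buildTile(commits, params)
	fmt.Println(len(traces), paramSet)
}
```

A trace that has no value at some commit keeps a nil slot here; a real Trace would use its own missing-data sentinel instead.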


To store this in BoltDB we will need three buckets: one for the per-commit
values of each trace, one for trace-level information such as the params for
each trace, and one for mapping traceids to much shorter 64-bit integers.

traceid bucket
--------------

To reduce the amount of data stored, we'll map traceids to 64-bit ints and use
those ints as the keys of the maps stored in the commit bucket. The traceid
bucket maps traceids to trace64ids, and vice versa.

There is a special key, "the largest trace64id", which isn't a valid traceid,
that holds the largest trace64id seen so far; it defaults to 0 if not set.
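
The traceid bucket logic can be sketched as follows. A plain Go map stands in for the BoltDB bucket, and encoding trace64ids as 8-byte big-endian keys is an assumption, as is the assumption that no traceid is byte-for-byte equal to an encoded id. In BoltDB the lookup, the allocation and the update of the special key would all need to happen inside one read-write transaction so that two writers cannot hand out the same id.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// largestKey is the special key holding the largest trace64id seen.
const largestKey = "the largest trace64id"

// bucket is a hypothetical in-memory stand-in for the traceid bucket.
type bucket map[string][]byte

// encodeID writes a trace64id as 8 big-endian bytes (an assumed encoding).
func encodeID(id uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, id)
	return b
}

// trace64ID returns the id for traceID, allocating the next one if needed.
func (b bucket) trace64ID(traceID string) uint64 {
	if v, ok := b[traceID]; ok {
		return binary.BigEndian.Uint64(v)
	}
	var largest uint64 // Defaults to 0 if the special key is not set.
	if v, ok := b[largestKey]; ok {
		largest = binary.BigEndian.Uint64(v)
	}
	id := largest + 1
	enc := encodeID(id)
	b[traceID] = enc                 // traceid -> trace64id
	b[string(enc)] = []byte(traceID) // trace64id -> traceid
	b[largestKey] = enc
	return id
}

// traceID looks up the traceid for a trace64id.
func (b bucket) traceID(id uint64) (string, bool) {
	v, ok := b[string(encodeID(id))]
	return string(v), ok
}

func main() {
	b := bucket{}
	fmt.Println(b.trace64ID(",config=8888,"), b.trace64ID(",config=gpu,"), b.trace64ID(",config=8888,"))
}
```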

commit bucket
-------------

The keys for the commit bucket are structured as:

    [timestamp]![git hash]![branch name]

Each key maps to the serialized values and their trace64ids, i.e. a serialized
map[uint64][]byte, where the uint64 is the trace64id.

trace bucket
------------

The keys for the trace bucket are traceids:

    [traceid]

The values are serialized Protocol Buffers that contain the params for each
trace and the original traceid.

constructor
-----------

NewTraceStoreDB creates a DB that talks to the gRPC server over conn:

    func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error)

Usage
=====

Here is how the single TileFromCommits method can be used to satisfy all the
above requirements:

1. Build a Tile of the last N commits from master.
    * Find the last N commits via gitinfo, construct a CommitID for each one,
      then call:

          TileFromCommits(commits)

2. Build a Tile for a trybot.
    * Find the Rietveld issue id and the created time of each patchset. Use the
      patchset ids and created timestamps to create a slice of CommitIDs to use
      in:

          TileFromCommits(commits)

3. Build a Tile for a single trybot result vs a specific commit.
    * Find the Rietveld issue id and created time of the patchset, and the
      CommitID of the target commit, then call:

          TileFromCommits([]*CommitID{trybot, commit})

4. Build a Tile for all commits to master in a given time range. (Be able to go
   back in time for either Gold or Perf.)
    * Given the time range, build CommitIDs from gitinfo, then call:

          TileFromCommits(commits)

5. Build a Tile for all commits to all branches in a given time range. (Show
   how all branches compare against main.)
    * Given the time range, call List, then TileFromCommits:

          commits, err := List(beginTimestamp, endTimestamp)
          TileFromCommits(commits)

6. Build a Tile for all commits to main and a given branch for a given time
   range. (See how a single branch compares to main.)
    * Find the ~Nth commit via gitinfo. Then call List, filter the results, and
      call TileFromCommits:

          commits, err := List(beginTimestamp, endTimestamp)
          // Filter commits to only include values from the desired branches.
          TileFromCommits(commits)