Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 tracedb | 1 tracedb |
|
stephana
2015/10/20 14:42:32
Could go/traceservice/ be merged into go/tracedb/
| |
| 2 ======= | 2 ======= |
| 3 | 3 |
| 4 The tracedb package is designed to replace the current storage system for | 4 The tracedb package is designed to replace the current storage system for |
| 5 traces, tiles, with a new BoltDB backend that allows for much more flexibility | 5 traces, tiles, with a new backend that allows for much more flexibility |
| 6 and an increase in the size of data that can be stored. The new system needs | 6 and an increase in the size of data that can be stored. The new system needs |
| 7 to support both branches and trybots (note that in the future there may be no | 7 to support both branches and trybots (note that in the future there may be no |
| 8 difference between the two), while still supporting the current capabilities | 8 difference between the two), while still supporting the current capabilities |
| 9 of looking at master. | 9 of looking at master. |
| 10 | 10 |
| 11 The current structure for a Tile looks like: | 11 The current structure for a Tile looks like: |
| 12 | 12 |
| 13 type GoldenTrace struct { | 13 type GoldenTrace struct { |
| 14 Params_ map[string]string | 14 Params_ map[string]string |
| 15 Values []string | 15 Values []string |
| (...skipping 36 matching lines...) | |
| 52 1. Build a tile of the last N commits from master. (Our only usage today.) | 52 1. Build a tile of the last N commits from master. (Our only usage today.) |
| 53 2. Build a Tile for a trybot. | 53 2. Build a Tile for a trybot. |
| 54 3. Build a Tile for a single trybot result vs a specific commit. | 54 3. Build a Tile for a single trybot result vs a specific commit. |
| 55 4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.) | 55 4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.) |
| 56 5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.) | 56 5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.) |
| 57 6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.) | 57 6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.) |
| 58 | 58 |
| 59 Assumptions | 59 Assumptions |
| 60 =========== | 60 =========== |
| 61 | 61 |
| 62 1. We will use queries to the BoltDB to build in-memory Tiles. | 62 1. We will use queries to the interface to build in-memory Tiles. |
| 63 2. We can extract a timestamp from Reitveld for each patch. | 63 2. We can extract a timestamp from Reitveld for each patch. |
| 64 | 64 |
| 65 Design | 65 Design |
| 66 ====== | 66 ====== |
| 67 | 67 |
| 68 To actually handle this in BoltDB we will need to create two buckets, one | 68 The design will actually be done in two layers, tracedb.DB, which is the Go interface |
| 69 for the per-commit values in each trace, and another for the trace-level | 69 for talking to the data store, and then there will be two concrete implementations. |
| 70 information, such as the params for each trace. | 70 The first implementation will be the gRPC based server, and the second will be Cloud BigTable. |
| 71 | 71 |
| 72 commit bucket | |
| 73 ------------- | |
| 74 | 72 |
| 75 The keys for the commit bucket are structured as: | 73 +-------------+ |
| 74 | tracedb.DB | | |
| 75 | interface | | |
| 76 +-------------+ | |
| 77 | | |
| 78 +-----------+-----------+ | |
| 79 | | | |
| 80 | | | |
| 81 +------v------+ +-------v------+ | |
| 82 | gRPC Server | | | | |
| 83 | BoltDB | | GCE BigTable | | |
| 84 +-------------+ +--------------+ | |
| 76 | 85 |
| 77 [timestamp]:[git hash]:[branch name]:[trace_key] | |
| 78 | 86 |
| 79 and the keys map to a single value []byte, that is either the Gold digest or | 87 tracedb.DB Interface |
| 80 the Perf float64 measurement value. | 88 -------------------- |
| 81 | 89 |
| 82 Note that to search through a time range for a specific branch name we'll need | 90 This is the Go interface to the storage for traces. The interface to tracedb looks like: |
| 83 to do the filtering inside the closure we pass to BoltDB. | |
| 84 | |
| 85 trace bucket | |
| 86 ------------ | |
| 87 | |
| 88 The keys for the trace bucket are just the trace keys. | |
| 89 | |
| 90 [trace_key] | |
| 91 | |
| 92 The values are structs serialized as JSON that contain the params for each | |
| 93 trace. We are using JSON over GOB since these are relatively small structs. | |
| 94 | |
| 95 Interface | |
| 96 --------- | |
| 97 | |
| 98 The interface to tracedb looks like: | |
| 99 | 91 |
| 100 // DB represents the interface to any datastore for perf and gold results. | 92 // DB represents the interface to any datastore for perf and gold results. |
| 101 // | 93 // |
| 102 // Notes: | 94 // Notes: |
| 103 // 1. If 'sources' is an empty slice it will match all sources. | 95 // 1. The Commits in the Tile will only contain the commit id and |
| 104 // 2. The Commits in the Tile will only contain the commit id and | |
| 105 // the timestamp, the Author will not be populated. | 96 // the timestamp, the Author will not be populated. |
| 106 // 3. The Tile's Scale and TileIndex will be set to 0. | 97 // 2. The Tile's Scale and TileIndex will be set to 0. |
| 107 // | 98 // |
| 108 type DB interface { | 99 type DB interface { |
| 109 // Add new information to the datastore. | 100 // Add new information to the datastore. |
| 110 // | 101 // |
| 111 // source - Either a branch name or a Rietveld issue id. | 102 // The values maps a trace id to a Entry. |
| 112 // values - maps the trace id to a DBEntry. | 103 // |
| 113 // | 104 // Note that only allowing adding data for a single commit at a time |
| 114 // Note that only allowing adding data for a single commit at a time | 105 // should work well with ingestion while still breaking up writes into |
| 115 // should work well with ingestion while still breaking up writes into | 106 // shorter actions. |
| 116 // shorter actions. | 107 Add(commitID *CommitID, values map[string]*Entry) error |
| 117 Add(commitID *CommitID, source string, values map[string]*DBEntry) error | |
| 118 | 108 |
| 119 // Create a Tile based on the given query parameters. | 109 // Remove the given commit from the datastore. |
| 120 // | 110 Remove(commitID *CommitID) error |
| 121 // If 'sources' is an empty slice it will match all sources. | |
| 122 // | |
| 123 // Note that the Commits in the Tile will only contain the commit id and | |
| 124 // the timestamp, the Author will not be populated. | |
| 125 TileFromRangeAndSources(begin, end time.Time, sources []string) (*tiling.Tile, error) | |
| 126 | 111 |
| 127 // Create a Tile for the given commit ids. Commits should be provided in | 112 // List returns all the CommitID's between begin and end. |
| 128 // time order. | 113 List(begin, end time.Time) ([]*CommitID, error) |
| 129 // | 114 |
| 130 // Note that the Commits in the Tile will only contain the commit id and | 115 // Create a Tile for the given commit ids. Will build the Tile using the |
| 131 // the timestamp, the Author will not be populated. | 116 // commits in the order they are provided. |
| 132 TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error) | 117 // |
| 118 // Note that the Commits in the Tile will only contain the commit id and | |
| 119 // the timestamp, the Author will not be populated. | |
| 120 TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error) | |
| 121 | |
| 122 // Close the datastore. | |
| 123 Close() error | |
|
stephana
2015/10/19 15:03:49
There is no way to enumerate the CommitIDs current
jcgregorio
2015/10/19 15:11:44
To do that simply call:
List(time.Time{}, time.
| |
| 133 } | 124 } |
| 134 | 125 |
| 135 The above interface depends on the CommitID struct, which is: | 126 The above interface depends on the CommitID struct, which is: |
| 136 | 127 |
| 137 // CommitID represents the time of a particular commit, where a commit could either be | 128 // CommitID represents the time of a particular commit, where a commit could either be |
| 138 // a real commit into the repo, or an event like running a trybot. | 129 // a real commit into the repo, or an event like running a trybot. |
| 139 type CommitID struct { | 130 type CommitID struct { |
| 140 Timestamp time.Time | 131 Timestamp time.Time |
| 141 ID string // Normally a git hash, but could also be Rietveld issue id + patch id. | 132 ID string // Normally a git hash, but could also be Rietveld patch id. |
| 133 Source string // The branch name, e.g. "master", or the Reitveld issue id. | |
| 142 } | 134 } |
|
stephana
2015/10/19 15:03:49
typo: Rietveld
I don't see a simple way to enume
jcgregorio
2015/10/19 15:11:44
Use List() with a beginning and ending time that y
stephana
2015/10/19 15:22:40
That means I have to load the equivalent of a curr
jcgregorio
2015/10/19 20:30:11
Fixed Typo.
| |
| 143 | 135 |
| 144 func (c *CommitID) String() string { | 136 And Entry, which is: |
| 145 return fmt.Sprintf("%s%s", c.Timestamp.Format(time.RFC3339), c.ID) | |
| 146 } | |
| 147 | 137 |
| 148 And DBEntry, which is: | 138 // Entry holds the params and a value for single measurement. |
| 149 | 139 type Entry struct { |
| 150 // DBEntry holds the params and a value for single measurement. | |
| 151 type DBEntry struct { | |
| 152 Params map[string]string | 140 Params map[string]string |
| 153 | 141 |
| 154 // Value is the value of the measurement. | 142 // Value is the value of the measurement. |
| 155 // | 143 // |
| 156 // It should be the digest string converted to a []byte, or a float64 | 144 // It should be the digest string converted to a []byte, or a float64 |
| 157 // converted to a little endian []byte. I.e. tiling.Trace.SetAt | 145 // converted to a little endian []byte. I.e. tiling.Trace.SetAt |
| 158 // should be able to consume this value. | 146 // should be able to consume this value. |
| 159 Value []byte | 147 Value []byte |
| 160 } | 148 } |
| 161 | 149 |
| 162 Note that this will require adding a new method to the Trace interface: | 150 Note that this will require adding a new method to the Trace interface: |
| 163 | 151 |
| 164 // Sets the value of the measurement at index. | 152 // Sets the value of the measurement at index. |
| 165 // | 153 // |
| 166 // Each specialization will convert []byte to the correct type. | 154 // Each specialization will convert []byte to the correct type. |
| 167 SetAt(index int, value []byte) error | 155 SetAt(index int, value []byte) error |
| 168 | 156 |
| 157 | |
| 158 BoltDB Implementation | |
| 159 ===================== | |
| 160 | |
| 161 For local testing the Go interface above will be implemented in terms of the | |
| 162 gRPC interface defined below with a BoltDB store. I.e. there will be a | |
| 163 standalone server that implements the following gRPC interface. | |
| 164 | |
| 165 The gRPC interface is similar to the Go interface, with Add and List operating | |
| 166 exactly the same. The only difference is in retrieving data, which means that | |
| 167 TileForCommits is broken down into two different calls, GetValues, and | |
| 168 GetParams, which the caller can use to build a Tile from. | |
| 169 | |
| 170 // TraceDB stores trace information for both Gold and Perf. | |
| 171 service TraceDB { | |
| 172 // Returns a list of traceids that don't have Params stored in the datastore. | |
| 173 rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {} | |
| 174 | |
| 175 // Adds Params for a set of traceids. | |
| 176 rpc AddParams(AddParamsRequest) returns (EmptyResponse) {} | |
| 177 | |
| 178 // Adds data for a set of traces for a particular commitid. | |
| 179 rpc Add(AddRequest) returns (AddResponse) {} | |
| 180 | |
| 181 // Removes data for a particular commitid. | |
| 182 rpc Remove(RemoveRequest) returns (EmptyResponse) {} | |
| 183 | |
| 184 // List returns all the CommitIDs that exist in the given time range. | |
| 185 rpc List(ListRequest) return (ListResponse) {} | |
| 186 | |
| 187 // GetValues returns all the trace values stored for the given CommitID. | |
| 188 rpc GetValues(GetValuesRequest) (GetValuesResponse) | |
| 189 | |
| 190 // GetParams returns the Params for all of the given traces. | |
| 191 rpc GetParams(GetParamsRequest) (GetParamsResponse) | |
| 192 } | |
| 193 | |
| 194 See `go/tracedb/proto/tracestore.proto` for more details. | |
| 195 | |
| 196 | |
| 197 To actually handle this in BoltDB we will need to create three buckets, one for | |
| 198 the per-commit values in each trace, and another for the trace-level | |
| 199 information, such as the params for each trace, and a third for mapping | |
| 200 traceids to much shorter int64 values. | |
| 201 | |
| 202 traceid bucket | |
| 203 -------------- | |
| 204 | |
| 205 To reduce the amount of data stored, we'll map traceids to 64 bit ints | |
| 206 and use the 64 bit ints as the keys to the maps stored in the commit | |
| 207 bucket. The traceid bucket maps traceids to trace64id, and vice versa. | |
| 208 | |
| 209 There is a special key, "the largest trace64id", which isn't a valid traceid, which | |
| 210 contains the largest trace64id seen, and defaults to 0 if not set. | |
| 211 | |
| 212 commit bucket | |
| 213 ------------- | |
| 214 | |
| 215 The keys for the commit bucket are structured as: | |
| 216 | |
| 217 [timestamp]:[git hash]:[branch name] | |
| 218 | |
| 219 The key maps to a serialized values and their trace64ids. I.e. a serialized | |
| 220 map[uint64][]byte, where the uint64 is the trace64id. | |
|
stephana
2015/10/19 20:00:19
Shouldn't this be the '!' delimited concatenation
jcgregorio
2015/10/19 20:30:11
Fixed.
On 2015/10/19 at 20:00:19, stephana wrote:
| |
| 221 | |
| 222 trace bucket | |
| 223 ------------ | |
| 224 | |
| 225 The keys for the trace bucket are traceids. | |
| 226 | |
| 227 [traceid] | |
| 228 | |
| 229 The values are structs serialized Protocol Buffers that contain the params for | |
| 230 each trace and the original traceid. | |
| 231 | |
| 232 constructor | |
| 233 ----------- | |
| 234 | |
| 235 func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error) { | |
| 236 | |
| 237 Cloud BigTable Implementation | |
| 238 ============================= | |
| 239 | |
| 240 For production use the Go interface will also have a BigTable implementation. | |
| 241 This will be designed to hold information for multiple types of applications, | |
| 242 such as perf and gold, in the same tables. It will also be able to handle | |
| 243 storing data from multiple instances of the same application, such as for | |
| 244 gold-prod, gold-android, and gold-blink. | |
| 245 | |
| 246 Cluster ID: skia-infra | |
| 247 | |
| 248 Table Name | Column Families | |
| 249 -------------|---------------- | |
| 250 commits | key values | |
| 251 traces | key params | |
| 252 | |
| 253 commits | |
| 254 ------- | |
| 255 The commits table contains all the data stored in the traces, either the | |
| 256 float64s or the digests | |
| 257 | |
| 258 The key for the commits table is: | |
| 259 | |
| 260 md5('id':'branch':'app') | |
| 261 | |
| 262 The 'key' column family contains the following columns: | |
| 263 id - The git hash or trybot patch id. | |
| 264 branch - The git branch name or the code review id. | |
| 265 app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'. | |
| 266 ts - The timestamp of the commit. | |
| 267 | |
| 268 The 'values' column family contains the following columns: | |
| 269 "[traceid]" - One column for each traceid, the cell value is either a float64 or a digest. | |
| 270 | |
| 271 | |
| 272 traces | |
| 273 ------ | |
| 274 The Traces table will contain information about each trace. | |
| 275 | |
| 276 The key for the traces table is: | |
| 277 | |
| 278 md5('traceid':'app') | |
| 279 | |
| 280 The 'key' column family contains the following columns: | |
| 281 traceid - The trace id. | |
| 282 app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'. | |
| 283 | |
| 284 The 'params' column family contains the following columns: | |
| 285 params - A serialized map[string]string of the trace params. | |
| 286 | |
| 287 | |
| 288 constructor | |
| 289 ----------- | |
| 290 | |
| 291 func NewBigTableTraceStoreDB(app string, tb tiling.TraceBuilder, client *bigtable.Client) (DB, error) | |
| 292 | |
| 169 Usage | 293 Usage |
| 170 ===== | 294 ===== |
| 171 | 295 |
| 172 Here is how the single TileFromRangeAndSources can be used to satisfy all the above requirements: | 296 Here is how the single TileFromCommits can be used to satisfy all the above requirements: |
| 173 | 297 |
| 174 1. Build a tile of the last N commits from master. | 298 1. Build a tile of the last N commits from master. |
| 175 * Find the ~Nth commit via gitinfo, along with its timestamp. Then call | 299 * Find the last N commits via gitinfo, construct CommitIDs for each one, then call: |
| 176 | 300 |
| 177 TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master"}) | 301 TileFromCommits(commits) |
| 178 | 302 |
| 179 2. Build a Tile for a trybot. | 303 2. Build a Tile for a trybot. |
| 180 * Find the Reitveld issue id and created time of each patchset. Use the | 304 * Find the Reitveld issue id and created time of each patchset. Use the |
| 181 patchset ids and created timestamps to create a slice of CommitID's to use | 305 patchset ids and created timestamps to create a slice of CommitID's to use |
| 182 in: | 306 in: |
| 183 | 307 |
| 184 TileFromCommits(commits) | 308 TileFromCommits(commits) |
| 185 | 309 |
| 186 or if you know the timestamp when the issue was created: | |
| 187 | |
| 188 TileFromRangeAndSources(created.Timestamp, time.Now(), []string{"[codereview id]"}) | |
| 189 | |
| 190 3. Build a Tile for a single trybot result vs a specific commit. | 310 3. Build a Tile for a single trybot result vs a specific commit. |
| 191 * Find the Reitveld issue id and created time of the patchset. Find the | 311 * Find the Reitveld issue id and created time of the patchset. Find the |
| 192 commitid of the target commit: | 312 commitid of the target commit: |
| 193 | 313 |
| 194 TileFromCommits([]*CommitID{trybot, commit}) | 314 TileFromCommits([]*CommitID{trybot, commit}) |
| 195 | 315 |
| 196 4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf). | 316 4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf). |
| 197 * Given the time range: | 317 * Given the time range, build CommitIDs from gitinfo, then call: |
| 198 | 318 |
| 199 TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{"master"}) | 319 TileFromCommits(commits) |
| 200 | 320 |
| 201 5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main). | 321 5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main). |
| 202 * Given the time range, the empty slice for source means include all sources: | 322 * Given the time range, call List, then TileFromCommits: |
| 203 | 323 |
| 204 TileFromRangeAndSources(beginTimestamp, endTimestamp, []string{}) | 324 commits, err := List(beginTimestamp, endTimestamp) |
| 325 TileFromCommits(commits) | |
| 205 | 326 |
| 206 6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main). | 327 6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main). |
| 207 * Find the ~Nth commit via gitinfo. Then call: | 328 * Find the ~Nth commit via gitinfo. Then call List, filter the results, then call TileFromCommits. |
| 208 | 329 |
| 209 TileFromRangeAndSources(nth.Timestamp, head.Timestamp, []string{"master", "[codereview id]"}) | 330 commits, err := List(beginTimestamp, endTimestamp) |
| 210 | 331 // Filter commits to only include values from the desired branches. |
| 211 Note that this might return multiple tries, i.e. one for each patchset. | 332 TileFromCommits(commits) |