Chromium Code Reviews

Side by Side Diff: go/tracedb/DESIGN.md

Issue 1411663004: Create gRPC client and server, traceservice, that stores trace data in a BoltDB backend. (Closed) Base URL: https://skia.googlesource.com/buildbot@master
Patch Set: clean Created 5 years, 2 months ago
tracedb
=======

> stephana 2015/10/20 14:42:32 Could go/traceservice/ be merged into go/tracedb/

The tracedb package is designed to replace the current storage system for
traces (tiles) with a new backend that allows for much more flexibility and
can store more data. The new system needs to support both branches and
trybots (note that in the future there may be no difference between the two),
while still supporting the current capability of looking at master.

The current structure for a Tile looks like:

    type GoldenTrace struct {
      Params_ map[string]string
      Values  []string

(...skipping 36 matching lines...)

1. Build a Tile of the last N commits from master. (Our only usage today.)
2. Build a Tile for a trybot.
3. Build a Tile for a single trybot result vs a specific commit.
4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.)
5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.)
6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.)

Assumptions
===========

1. We will build in-memory Tiles by querying the tracedb.DB interface.
2. We can extract a timestamp from Rietveld for each patch.

Design
======

The design has two layers: tracedb.DB, the Go interface for talking to the
data store, and two concrete implementations of that interface. The first
implementation is a gRPC-based server backed by BoltDB, and the second is
Cloud BigTable.

                  +-------------+
                  | tracedb.DB  |
                  | interface   |
                  +-------------+
                         |
             +-----------+-----------+
             |                       |
             |                       |
      +------v------+        +-------v------+
      | gRPC Server |        |              |
      | BoltDB      |        | GCE BigTable |
      +-------------+        +--------------+

tracedb.DB Interface
--------------------

This is the Go interface to the storage for traces:

    // DB represents the interface to any datastore for perf and gold results.
    //
    // Notes:
    //   1. The Commits in the Tile will only contain the commit id and
    //      the timestamp, the Author will not be populated.
    //   2. The Tile's Scale and TileIndex will be set to 0.
    //
    type DB interface {
      // Add new information to the datastore.
      //
      // values maps a trace id to an Entry.
      //
      // Note that only allowing adding data for a single commit at a time
      // should work well with ingestion while still breaking up writes into
      // shorter actions.
      Add(commitID *CommitID, values map[string]*Entry) error

      // Remove the given commit from the datastore.
      Remove(commitID *CommitID) error

      // List returns all the CommitIDs between begin and end.
      List(begin, end time.Time) ([]*CommitID, error)

      // Create a Tile for the given commit ids. Will build the Tile using the
      // commits in the order they are provided.
      //
      // Note that the Commits in the Tile will only contain the commit id and
      // the timestamp, the Author will not be populated.
      TileFromCommits(commitIDs []*CommitID) (*tiling.Tile, error)

      // Close the datastore.
      Close() error
    }

> stephana 2015/10/19 15:03:49 There is no way to enumerate the CommitIDs current

> jcgregorio 2015/10/19 15:11:44 To do that simply call: List(time.Time{}, time.

The DB interface depends on the CommitID struct, which is:

    // CommitID represents the time of a particular commit, where a commit could either be
    // a real commit into the repo, or an event like running a trybot.
    type CommitID struct {
      Timestamp time.Time
      ID        string // Normally a git hash, but could also be a Rietveld patch id.
      Source    string // The branch name, e.g. "master", or the Rietveld issue id.
    }

> stephana 2015/10/19 15:03:49 typo: Rietveld I don't see a simple way to enume

> jcgregorio 2015/10/19 15:11:44 Use List() with a beginning and ending time that y

> stephana 2015/10/19 15:22:40 That means I have to load the equivalent of a curr

> jcgregorio 2015/10/19 20:30:11 Fixed Typo.

And Entry, which is:

    // Entry holds the params and a value for a single measurement.
    type Entry struct {
      Params map[string]string

      // Value is the value of the measurement.
      //
      // It should be the digest string converted to a []byte, or a float64
      // converted to a little endian []byte. I.e. tiling.Trace.SetAt
      // should be able to consume this value.
      Value []byte
    }

Note that this will require adding a new method to the Trace interface:

    // Sets the value of the measurement at index.
    //
    // Each specialization will convert []byte to the correct type.
    SetAt(index int, value []byte) error
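
The sketch below shows the Entry.Value encoding and SetAt for two kinds of trace. It assumes Perf traces hold float64 values and Gold traces hold digest strings. `perfTrace`, `goldTrace`, `encodeFloat` and `encodeDigest` are hypothetical stand-ins, not the actual tiling types.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat converts a Perf measurement into the little endian []byte
// form described for Entry.Value.
func encodeFloat(f float64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, math.Float64bits(f))
	return b
}

// encodeDigest converts a Gold digest into its []byte form.
func encodeDigest(digest string) []byte {
	return []byte(digest)
}

// perfTrace is a hypothetical float64 trace used to illustrate SetAt.
type perfTrace struct {
	Values []float64
}

// SetAt decodes a little endian float64 and stores it at index.
func (t *perfTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(t.Values) {
		return fmt.Errorf("index %d out of range [0, %d)", index, len(t.Values))
	}
	if len(value) != 8 {
		return fmt.Errorf("expected 8 bytes, got %d", len(value))
	}
	t.Values[index] = math.Float64frombits(binary.LittleEndian.Uint64(value))
	return nil
}

// goldTrace is a hypothetical digest trace used to illustrate SetAt.
type goldTrace struct {
	Values []string
}

// SetAt stores the digest bytes as a string at index.
func (t *goldTrace) SetAt(index int, value []byte) error {
	if index < 0 || index >= len(t.Values) {
		return fmt.Errorf("index %d out of range [0, %d)", index, len(t.Values))
	}
	t.Values[index] = string(value)
	return nil
}

func main() {
	p := &perfTrace{Values: make([]float64, 3)}
	if err := p.SetAt(1, encodeFloat(2.5)); err != nil {
		fmt.Println(err)
	}
	g := &goldTrace{Values: make([]string, 3)}
	g.SetAt(0, encodeDigest("abc123"))
	fmt.Println(p.Values, g.Values)
}
```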

BoltDB Implementation
=====================

For local testing, the Go interface above will be implemented in terms of the
gRPC interface defined below, backed by a BoltDB store. That is, there will be
a standalone server that implements the following gRPC interface.

The gRPC interface is similar to the Go interface, with Add and List operating
exactly the same. The only difference is in retrieving data: TileFromCommits
is broken down into two different calls, GetValues and GetParams, whose
results the caller can use to build a Tile.

    // TraceDB stores trace information for both Gold and Perf.
    service TraceDB {
      // Returns a list of traceids that don't have Params stored in the datastore.
      rpc MissingParams(MissingParamsRequest) returns (MissingParamsResponse) {}

      // Adds Params for a set of traceids.
      rpc AddParams(AddParamsRequest) returns (EmptyResponse) {}

      // Adds data for a set of traces for a particular commitid.
      rpc Add(AddRequest) returns (AddResponse) {}

      // Removes data for a particular commitid.
      rpc Remove(RemoveRequest) returns (EmptyResponse) {}

      // List returns all the CommitIDs that exist in the given time range.
      rpc List(ListRequest) returns (ListResponse) {}

      // GetValues returns all the trace values stored for the given CommitID.
      rpc GetValues(GetValuesRequest) returns (GetValuesResponse) {}

      // GetParams returns the Params for all of the given traces.
      rpc GetParams(GetParamsRequest) returns (GetParamsResponse) {}
    }

See `go/tracedb/proto/tracestore.proto` for more details.

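
This sketch shows one way a client could turn GetValues results into per-trace value slices. It assumes each GetValues response has already been decoded into a map from traceid to value, with trace64ids resolved back to traceids. `buildTraces` is a hypothetical helper. The traceids it returns are the ones a caller would then pass to GetParams.

```go
package main

import "fmt"

// buildTraces turns per-commit value maps, ordered the same way as the
// commits passed to TileFromCommits, into one value slice per trace.
// Index i of every slice corresponds to commit i; a nil entry means the
// trace had no value at that commit.
func buildTraces(perCommit []map[string][]byte) map[string][][]byte {
	traces := map[string][][]byte{}
	for i, values := range perCommit {
		for traceID, v := range values {
			t, ok := traces[traceID]
			if !ok {
				t = make([][]byte, len(perCommit))
				traces[traceID] = t
			}
			t[i] = v
		}
	}
	return traces
}

func main() {
	perCommit := []map[string][]byte{
		{"x": []byte("d1"), "y": []byte("d2")},
		{"x": []byte("d3")},
	}
	traces := buildTraces(perCommit)
	fmt.Println(len(traces), string(traces["x"][1]), traces["y"][1] == nil) // Prints: 2 d3 true
}
```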
To implement the gRPC interface on top of BoltDB, we will need three buckets:
one for the per-commit values of each trace, one for trace-level information
such as the params of each trace, and a third that maps traceids to much
shorter 64-bit integer ids.

traceid bucket
--------------

To reduce the amount of data stored, we'll map traceids to 64-bit ints and
use the 64-bit ints as the keys of the maps stored in the commit bucket. The
traceid bucket maps traceids to trace64ids, and vice versa.

There is a special key, "the largest trace64id", which isn't a valid traceid.
It holds the largest trace64id seen so far, and defaults to 0 if not set.
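
This in-memory sketch models how the traceid bucket and its special key could work together. `traceIDBucket` and `trace64ID` are hypothetical names. The sketch also assumes that new ids are assigned as "largest seen + 1", so the first id handed out is 1.

```go
package main

import "fmt"

// largestKey is the special key from the design; it is not a valid traceid.
const largestKey = "the largest trace64id"

// traceIDBucket is a hypothetical in-memory stand-in for the traceid bucket.
type traceIDBucket struct {
	forward map[string]uint64 // traceid -> trace64id, plus largestKey -> counter.
	reverse map[uint64]string // trace64id -> traceid.
}

func newTraceIDBucket() *traceIDBucket {
	return &traceIDBucket{forward: map[string]uint64{}, reverse: map[uint64]string{}}
}

// trace64ID returns the trace64id for traceID, assigning the next unused one
// if needed. A missing largestKey reads as 0, so the first id assigned is 1.
func (b *traceIDBucket) trace64ID(traceID string) (uint64, error) {
	if traceID == largestKey {
		return 0, fmt.Errorf("%q is reserved and is not a valid traceid", largestKey)
	}
	if id, ok := b.forward[traceID]; ok {
		return id, nil
	}
	next := b.forward[largestKey] + 1
	b.forward[largestKey] = next
	b.forward[traceID] = next
	b.reverse[next] = traceID
	return next, nil
}

func main() {
	b := newTraceIDBucket()
	first, _ := b.trace64ID("trace-a")
	second, _ := b.trace64ID("trace-b")
	again, _ := b.trace64ID("trace-a")
	fmt.Println(first, second, again, b.reverse[second]) // Prints: 1 2 1 trace-b
}
```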

commit bucket
-------------

The keys for the commit bucket are structured as:

    [timestamp]:[git hash]:[branch name]

Each key maps to the serialized values for that commit, keyed by trace64id,
i.e. a serialized map[uint64][]byte where the uint64 is the trace64id.

> stephana 2015/10/19 20:00:19 Shouldn't this be the '!' delimited concatenation

> jcgregorio 2015/10/19 20:30:11 Fixed. On 2015/10/19 at 20:00:19, stephana wrote:

trace bucket
------------

The keys for the trace bucket are traceids:

    [traceid]

The values are serialized Protocol Buffers that contain the params for each
trace and the original traceid.

constructor
-----------

    func NewTraceStoreDB(conn *grpc.ClientConn, tb tiling.TraceBuilder) (DB, error)

Cloud BigTable Implementation
=============================

For production use the Go interface will also have a BigTable implementation.
It will be designed to hold information for multiple types of applications,
such as perf and gold, in the same tables. It will also be able to store data
from multiple instances of the same application, such as gold-prod,
gold-android, and gold-blink.

Cluster ID: skia-infra

Table Name | Column Families
-----------|----------------
commits    | key, values
traces     | key, params

commits
-------

The commits table contains all the data stored in the traces, either the
float64s or the digests.

The key for the commits table is:

    md5('id':'branch':'app')

The 'key' column family contains the following columns:

* id - The git hash or trybot patch id.
* branch - The git branch name or the code review id.
* app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'.
* ts - The timestamp of the commit.

The 'values' column family contains the following columns:

* "[traceid]" - One column for each traceid; the cell value is either a float64 or a digest.

traces
------

The traces table will contain information about each trace.

The key for the traces table is:

    md5('traceid':'app')

The 'key' column family contains the following columns:

* traceid - The trace id.
* app - The name of the app, such as 'gold-prod', 'gold-blink', or 'perf'.

The 'params' column family contains the following columns:

* params - A serialized map[string]string of the trace params.
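
This sketch computes the two row keys. It assumes that `md5('a':'b')` means the MD5 of the ':'-joined fields, hex encoded. `rowKey` is a hypothetical helper and the field values are made up.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// rowKey hashes the ':'-joined parts with md5 and hex encodes the digest.
// Both the ':' join and the hex encoding are assumptions.
func rowKey(parts ...string) string {
	sum := md5.Sum([]byte(strings.Join(parts, ":")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical values for md5('id':'branch':'app') and md5('traceid':'app').
	commitsRow := rowKey("abc123", "master", "gold-prod")
	tracesRow := rowKey("some-trace-id", "perf")
	fmt.Println(commitsRow, tracesRow)
}
```

One plausible reason for hashing is that it spreads writes across the key space instead of concentrating them on neighboring keys. The hash cannot be reversed, which is why the original fields are also stored in the 'key' column family. A plain ':' join is ambiguous if a field itself contains ':'.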

constructor
-----------

    func NewBigTableTraceStoreDB(app string, tb tiling.TraceBuilder, client *bigtable.Client) (DB, error)

Usage
=====

Here is how the single TileFromCommits method can be used to satisfy all the
above requirements:

1. Build a Tile of the last N commits from master.
   * Find the last N commits via gitinfo, construct a CommitID for each one,
     then call:

         TileFromCommits(commits)

2. Build a Tile for a trybot.
   * Find the Rietveld issue id and created time of each patchset. Use the
     patchset ids and created timestamps to create a slice of CommitIDs to use
     in:

         TileFromCommits(commits)

3. Build a Tile for a single trybot result vs a specific commit.
   * Find the Rietveld issue id and created time of the patchset, and find the
     CommitID of the target commit, then call:

         TileFromCommits([]*CommitID{trybot, commit})

4. Build a Tile for all commits to master in a given time range. (Be able to go back in time for either Gold or Perf.)
   * Given the time range, build CommitIDs from gitinfo, then call:

         TileFromCommits(commits)

5. Build a Tile for all commits to all branches in a given time range. (Show how all branches compare against main.)
   * Given the time range, call List, then TileFromCommits:

         commits, err := List(beginTimestamp, endTimestamp)
         TileFromCommits(commits)

6. Build a Tile for all commits to main and a given branch for a given time range. (See how a single branch compares to main.)
   * Find the ~Nth commit via gitinfo and use its timestamp as beginTimestamp.
     Then call List, filter the results, and call TileFromCommits:

         commits, err := List(beginTimestamp, endTimestamp)
         // Filter commits to only include values from the desired branches.
         TileFromCommits(commits)
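
Step 6 leaves the filtering as a comment. Here is a minimal sketch of it, assuming the desired branches are identified by CommitID.Source, which holds the branch name or Rietveld issue id. `filterBySource` is a hypothetical helper and the source values are made up. The helper keeps List's order, so its result can go straight to TileFromCommits.

```go
package main

import (
	"fmt"
	"time"
)

type CommitID struct {
	Timestamp time.Time
	ID        string
	Source    string
}

// filterBySource keeps, in their original order, only the commits whose
// Source is one of sources.
func filterBySource(commits []*CommitID, sources ...string) []*CommitID {
	want := make(map[string]bool, len(sources))
	for _, s := range sources {
		want[s] = true
	}
	ret := []*CommitID{}
	for _, c := range commits {
		if want[c.Source] {
			ret = append(ret, c)
		}
	}
	return ret
}

func main() {
	t := time.Date(2015, 10, 19, 0, 0, 0, 0, time.UTC)
	commits := []*CommitID{
		{Timestamp: t, ID: "aaa", Source: "master"},
		{Timestamp: t.Add(time.Minute), ID: "1", Source: "12345"},
		{Timestamp: t.Add(2 * time.Minute), ID: "bbb", Source: "some-branch"},
	}
	// The filtered slice would then be passed to TileFromCommits.
	for _, c := range filterBySource(commits, "master", "some-branch") {
		fmt.Println(c.ID)
	}
}
```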