appengine/swarming/server/task_queues.py - Issue 2832203002: task_queues: Add more scaffolding

Side by Side Diff: appengine/swarming/server/task_queues.py

Issue 2832203002: task_queues: Add more scaffolding (Closed)

Patch Set: . Created 3 years, 8 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

OLD	NEW
1 # Copyright 2017 The LUCI Authors. All rights reserved.	1 # Copyright 2017 The LUCI Authors. All rights reserved.

2 # Use of this source code is governed under the Apache License, Version 2.0	2 # Use of this source code is governed under the Apache License, Version 2.0

3 # that can be found in the LICENSE file.	3 # that can be found in the LICENSE file.

4	4

5 """Ambient task queues generated from the actual load.	5 """Ambient task queues generated from the actual load.

6	6

7 This means that the task queues are deduced by the actual load, they are never	7 This means that the task queues are deduced by the actual load, they are never

8 explicitly defined. They 'disapear' once the load stops, that is, no task with	8 explicitly defined. They are eventually deleted by a cron job once no incoming

9 the exact set of dimensions is triggered anymore.	9 task with the exact set of dimensions is triggered anymore.

10	10

11 Used to optimize scheduling.	11 Used to optimize scheduling.

	12

	13 +---------+

	14 \|BotRoot \| <bot_management.py>

	15 \|id=bot_id\|

	16 +---------+

	17 \|

	18 \|

	19 +--------------------+

	20 \| \|

	21 v v

	22 +-------------------+ +-------------------+

	23 \|BotTaskDimensions \| ... \|BotTaskDimensions \|

	24 \|id=<dimension_hash>\| ... \|id=<dimension_hash>\|

	25 +-------------------+ +-------------------+

	26

	27 +-------Root------------+

	28 \|TaskDimensionsRoot \| (not stored)

	29 \|id=<pool:foo or id:foo>\|

	30 +-----------------------+

	31 \|

	32 v

	33 +-------------------+

	34 \|TaskDimensions \|

	35 \|id=<dimension_hash>\|

	36 +-------------------+

12 """	37 """

13	38

	39 import datetime

	40

	41 from google.appengine.ext import ndb

	42

	43

	44 # Frequency at which these entities must be refreshed. This value is a trade off

	45 # between constantly updating BotTaskDimensions and TaskDimensions vs keeping

	46 # them alive for longer than necessary, causing unnecessary queries in

	47 # get_queues() users.

	48 _ADVANCE = datetime.timedelta(hours=1)

	49

	50

14 ### Models.	51 ### Models.

15	52

16	53

	54 class BotTaskDimensions(ndb.Model):

	55 """Stores the precalculated hashes for this bot.

	56

	57 Parent is BotRoot.

	58 key id is <dimensions_hash>.

	59

	60 This hash could be conflicting different properties set, but this doesn't

	61 matter at this level because disambiguation is done in TaskDimensions

	62 entities.

	63

	64 The number of stored entities is:

	65 <number of bots> x <TaskDimensions each bot support>

	66

	67 The actual number if a direct function of the variety of the TaskDimensions.

	68 """

	69 # Validity time, at which this entity should be considered irrelevant.

	70 valid_until_ts = ndb.DateTimeProperty()

	71

	72

	73 class TaskDimensionsRoot(ndb.Model):

	74 """Ghost root entity to group kinds of tasks to a common root.

	75

	76 This root entity is not stored in the DB.

	77

	78 id is either 'id:<value>' or 'pool:<value>'. For a request dimensions set that

	79 specifies both keys, one TaskDimensions is listed in each root.

	80 """

	81 pass

	82

	83

	84 class TaskDimensionsSet(ndb.Model):

	85 """Embedded struct to store a set of dimensions.

	86

	87 This entity is not stored, it is contained inside TaskDimensions.sets.

	88 """

	89 # 'key:value' strings. This is stored to enable match(). This is important as

	90 # the dimensions_hash can be colliding! In this case, a set of

	91 # dimensions_flat can exist.

	92 dimensions_flat = ndb.StringProperty(repeated=True, indexed=False)

	93

	94 def match(self, bot_dimensions):

	95 """Returns True if this bot can run this request dimensions set."""

	96 for d in self.dimensions_flat:

	97 key, value = d.split(':', 1)

	98 if value not in bot_dimensions.get(key, []):

	99 return False

	100 return True

	101

	102

	103 class TaskDimensions(ndb.Model):

	104 """List dimensions for each kind of task.

	105

	106 Parent is TaskDimensionsRoot
	Vadim Sh. 2017/04/26 18:26:24 Any reason we use parents instead of storing each Any reason we use parents instead of storing each entity separately? I can imagine situations where 1 QPS here may be limiting. M-A Ruel 2017/04/26 21:07:07 It's for coherency. There's a risk if inside a poo Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > Any reason we use parents instead of storing each entity separately? I can > imagine situations where 1 QPS here may be limiting. It's for coherency. There's a risk if inside a pool there's a lot of variations (thousands) but the key point is to have no transaction. The only transaction is in the single threaded cron job to remove stale entities.
	107 key id is <dimensions_hash>

	108

	109 A single dimensions_hash may represent multiple independent queues in a single

	110 root. This is because the hash is very compressed (32 bits). This is handled

	111 specifically here by having one set of TaskDimensionsSet per 'set'.

	112

	113 The worst case of having hash collision is unneeded scanning for unrelated

	114 tasks in get_queues(). This is bad but not the end of the world.

	115

	116 It is only a function of the number of different tasks, so it is not expected

	117 to be very large, only in the few hundreds. The exception is when one task per

	118 bot is triggered, which leads to have at <number of bots> TaskDimensions

	119 entities.

	120 """

	121 # Validity time, at which this entity should be considered irrelevant.

	122 # Entities with valid_until_ts in the past are considered inactive and are not

	123 # used. valid_until_ts is set in assert_task() to "TaskRequest.expiration_ts +

	124 # _ADVANCE". It is updated when an assert_task() call sees that valid_until_ts

	125 # becomes lower than TaskRequest.expiration_ts for a later task. This enables

	126 # not updating the entity too frequently, at the cost of keeping a dead queue

	127 # "alive" for a bit longer than strictly necessary.

	128 valid_until_ts = ndb.DateTimeProperty()

	129

	130 # One or multiple sets of request dimensions this dimensions_hash represents.

	131 sets = ndb.LocalStructuredProperty(TaskDimensionsSet, repeated=True)

	132

	133 def confirm(self, dimensions):

	134 """Confirms that this instance actually stores this set."""

	135 return self.confirm_flat(

	136 frozenset('%s:%s' % (k, v) for k, v in dimensions.iteritems()))
	Vadim Sh. 2017/04/26 18:26:24 remove frozenset, since confirm_flat already conve remove frozenset, since confirm_flat already converts iterable to a set M-A Ruel 2017/04/26 21:07:07 Done. Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > remove frozenset, since confirm_flat already converts iterable to a set Done.
	137

	138 def confirm_flat(self, dimensions_flat):

	139 """Confirms that this instance actually stores this set."""

	140 x = frozenset(dimensions_flat)
	Vadim Sh. 2017/04/26 18:26:24 'fronzen' here is not needed, 'x' is never returne 'fronzen' here is not needed, 'x' is never returned or stored M-A Ruel 2017/04/26 21:07:07 It's meant as a "reader expectation" signal. I thi Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > 'fronzen' here is not needed, 'x' is never returned or stored It's meant as a "reader expectation" signal. I think there is perf implication but I'm not sure. Vadim Sh. 2017/04/26 21:24:55 This is confusing reader expectation signal :-/ fr Show quoted text On 2017/04/26 21:07:07, M-A Ruel wrote: > On 2017/04/26 18:26:24, Vadim Sh. wrote: > > 'fronzen' here is not needed, 'x' is never returned or stored > > It's meant as a "reader expectation" signal. This is confusing reader expectation signal :-/ fronzenset is used when returning someing (or storing something) to indicate it is constant. Here it's neither returned, nor stored.
	141 return any(not x.difference(s.dimensions_flat) for s in self.sets)

	142

	143 def match(self, bot_dimensions):

	144 """Returns True if this bot can run one of the request dimensions sets

	145 represented by this dimensions_hash.

	146 """

	147 return any(s.match(bot_dimensions) for s in self.sets)
	Vadim Sh. 2017/04/26 18:26:24 strictly speaking, this can be optimized by using strictly speaking, this can be optimized by using some tree-like structure, but it's probably not worth it M-A Ruel 2017/04/26 21:07:07 Acknowledged. Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > strictly speaking, this can be optimized by using some tree-like structure, but > it's probably not worth it Acknowledged.
	148

	149

17 ### Private APIs.	150 ### Private APIs.

18	151

19	152

20 ### Public APIs.	153 ### Public APIs.

21	154

22	155

	156 def hash_dimensions(dimensions_json):

	157 """Returns a 32 bits int that is a hash of the dimensions specified.

	158

	159 dimensions_json must already be encoded as json.
	Vadim Sh. 2017/04/26 18:26:24 I hope you changed it to accept dimensions as is a I hope you changed it to accept dimensions as is and serialize it here instead. M-A Ruel 2017/04/26 21:07:07 Oh yes, I had forgot to backport to this CL. Done Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > I hope you changed it to accept dimensions as is and serialize it here instead. Oh yes, I had forgot to backport to this CL. Done for completeness.
	160

	161 The return value is guaranteed to be non-zero so it can be used as a key id in

	162 a ndb.Key.

	163 """

	164 del dimensions_json

	165

	166

23 def assert_bot(bot_dimensions):	167 def assert_bot(bot_dimensions):

24 """Prepares the dimensions for the queues."""	168 """Prepares the dimensions for the queues."""

25 assert len(bot_dimensions[u'id']) == 1, bot_dimensions	169 assert len(bot_dimensions[u'id']) == 1, bot_dimensions

26	170

27	171

28 def assert_task(request):	172 def assert_task(request):

29 """Prepares the dimensions for the queues.	173 """Makes sure the TaskRequest dimensions are listed as a known queue.

30	174

31 The generated entities are root entities.	175 This function must be called before storing the TaskRequest in the DB.

32 """	176

33 del request	177 When a cache miss occurs, a task queue is triggered.

	178

	179 Warning: the task will not be run until the task queue ran, which causes a

	180 user visible delay. This only occurs on new kind of requests, which is not

	181 that often in practice.

	182 """

	183 assert not request.key, request.key

	184

	185

	186 def get_queues(bot_id):

	187 """Queries all the known task queues in parallel and yields the task in order

	188 of priority.

	189

	190 The ordering is opportunistic, not strict.

	191 """

	192 assert isinstance(bot_id, unicode), repr(bot_id)
	Vadim Sh. 2017/04/26 18:26:24 unicode, really? Do we have bots with non-ascii na unicode, really? Do we have bots with non-ascii names? It's not possible. M-A Ruel 2017/04/26 21:07:07 everything is unicode, I just wanted to make sure Show quoted text On 2017/04/26 18:26:24, Vadim Sh. wrote: > unicode, really? Do we have bots with non-ascii names? It's not possible. everything is unicode, I just wanted to make sure I didn't mess the type.
	193 return []

	194

	195

	196 def rebuild_task_cache(dimensions_hash, dimensions_flat):

	197 """Rebuilds the TaskDimensions cache.

	198

	199 This code implicitly depends on bot_management.bot_event() being called for

	200 the bots.

	201

	202 This function is called in two cases:

	203 - A new kind of dimensions never seen before

	204 - The TaskDimensions.valid_until_ts expired

	205

	206 It is a cache miss, query all the bots and check for the ones which can run

	207 the task.

	208

	209 Warning: There's a race condition, where the TaskDimensions query could be

	210 missing some instances due to eventually coherent consistency in the BotInfo

	211 query. This only happens when there's new request dimensions set AND a bot

	212 that can run this task recently showed up.

	213

	214 Runtime expectation: the scale on the number of bots that can run the task,

	215 via BotInfo.dimensions_flat filtering. As there can be tens of thousands of

	216 bots that can run the task, this can take a long time to store all the

	217 entities on a new kind of request. As such, it must be called in the backend.

	218 """

	219 del dimensions_hash

	220 del dimensions_flat

	221

	222

	223 def tidy_stale():

	224 """Searches for all stale BotTaskDimensions and TaskDimensions and delete

	225 them.

	226

	227 Their .valid_until_ts is compared to the current time and the entity is

	228 deleted if it's older.

	229

	230 The number of entities processed is expected to be relatively low, in the few

	231 tens at most.

	232 """

	233 pass

OLD	NEW

« appengine/swarming/server/bot_management.py ('K') | « appengine/swarming/server/bot_management.py ('k') | appengine/swarming/server/task_to_run_test.py » ('j') | no next file with comments »