|
|
Chromium Code Reviews
DescriptionAdding memoized functions
Memoization is necessary when training loglinear models to avoid recomputing features over and over. Recomputing features affects not just the constant factors but also the asymptotic complexity (in a way that's already observable for small ``N``). The memoization is factored out into a class because doing so greatly reduces the code complexity over doing it manually in all the places where we need it.
BUG=
TBR=stgao@chromium.org
Committed: https://chromium.googlesource.com/infra/infra/+/d1e13b1f320c86e949a5e7310dc52b37cd1d0163
Patch Set 1 : breaking CL apart #Patch Set 2 : Filled out the tests #Patch Set 3 : rebase #
Total comments: 19
Patch Set 4 : Addressing nits #
Total comments: 5
Patch Set 5 : addressing nits #Patch Set 6 : forgot if I pushed the latest before CQ #
Dependent Patchsets: Messages
Total messages: 33 (18 generated)
Description was changed from ========== Implementing some helpful mathematics libraries BUG= ========== to ========== Adding memoized function N.B., this is still WIP and not ready for review. BUG= ==========
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
Patchset #1 (id:60001) has been deleted
Description was changed from ========== Adding memoized function N.B., this is still WIP and not ready for review. BUG= ========== to ========== Adding memoized function N.B., this is still WIP and not ready for review. BUG= TBR=stgao@chromium.org ==========
wrengr@chromium.org changed reviewers: + katesonia@chromium.org, mbarbella@chromium.org
Description was changed from ========== Adding memoized function N.B., this is still WIP and not ready for review. BUG= TBR=stgao@chromium.org ========== to ========== Adding memoized functions BUG= TBR=stgao@chromium.org ==========
PTAL
https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:4: nit: 2 empty lines. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:20: def __call__(self, x): So we assume all the Funtions take only one parameter x? How about *args, **kwargs? https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/test/functions_test.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:13: nit: 2 lines https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:41: f._f = G # Change the function, so it's different than the memos. Is changing _f only for testing or it can be actually used, if so, we should add a setter for it.
https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:52: # TODO(wrengr): can we remove the extra indirection somehow? I'd be surprised, especially for this case. Probably best to omit these TODOs. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/test/functions_test.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:14: class FunctionsTest(unittest.TestCase): Same nits for whitespace and docstrings.
inferno@chromium.org changed reviewers: + inferno@chromium.org
I don't see the value of adding this complexity. Can you explain the reasoning for this in CL description. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:52: # TODO(wrengr): can we remove the extra indirection somehow? On 2016/12/02 23:50:21, Martin Barbella wrote: > I'd be surprised, especially for this case. Probably best to omit these TODOs. Are you planning to work on these TODOs in next few weeks. I don't want these to linger in code. Also, rather than phrasing these as a question to yourself, put like # TODO(wrengr): Find a way to remove the extra indirection step. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:54: try: Docstring https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/test/functions_test.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:35: del f._f # Discard the function (to be sure the next call comes form memos) Put comment in previous line ending with period., same for next line as well. typo: from. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:40: f(5) # Call it once, to make a memo Put comment in previous line ending with period., same for next line as well.
Patchset #4 (id:140001) has been deleted
Description was changed from ========== Adding memoized functions BUG= TBR=stgao@chromium.org ========== to ========== Adding memoized functions Memoization is necessary when training loglinear models to avoid recomputing features over and over. Recomputing features affects not just the constant factors but also the asymptotic complexity (in a way that's already observable for small ``N``). The memoization is factored out into a class because doing so greatly reduces the code complexity over doing it manually in all the places where we need it. BUG= TBR=stgao@chromium.org ==========
https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:4: On 2016/12/02 23:47:18, Sharu Jiang wrote: > nit: 2 empty lines. Done. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:20: def __call__(self, x): On 2016/12/02 23:47:18, Sharu Jiang wrote: > So we assume all the Funtions take only one parameter x? How about *args, > **kwargs? Yes, we assume a single argument for the sake of simplicity. I don't think the added complexity is worth it to handle the more general case. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:52: # TODO(wrengr): can we remove the extra indirection somehow? On 2016/12/05 00:02:10, inferno wrote: > On 2016/12/02 23:50:21, Martin Barbella wrote: > > I'd be surprised, especially for this case. Probably best to omit these TODOs. > > Are you planning to work on these TODOs in next few weeks. I don't want these > to linger in code. > Also, rather than phrasing these as a question to yourself, put like > # TODO(wrengr): Find a way to remove the extra indirection step. Yes. Though I can file a bug instead if desired https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:54: try: On 2016/12/05 00:02:10, inferno wrote: > Docstring saying what? __call__ is a standard part of Python and has the standard meaning here. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... File appengine/findit/libs/math/test/functions_test.py (right): https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:13: On 2016/12/02 23:47:18, Sharu Jiang wrote: > nit: 2 lines Done. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:14: class FunctionsTest(unittest.TestCase): On 2016/12/02 23:50:21, Martin Barbella wrote: > Same nits for whitespace and docstrings. Done. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:35: del f._f # Discard the function (to be sure the next call comes form memos) On 2016/12/05 00:02:10, inferno wrote: > Put comment in previous line ending with period., same for next line as well. > typo: from. Done. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:40: f(5) # Call it once, to make a memo On 2016/12/05 00:02:10, inferno wrote: > Put comment in previous line ending with period., same for next line as well. Done. https://codereview.chromium.org/2548603003/diff/120001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:41: f._f = G # Change the function, so it's different than the memos. On 2016/12/02 23:47:18, Sharu Jiang wrote: > Is changing _f only for testing or it can be actually used, if so, we should add > a setter for it. Changing _f is only for testing. Ideally there should be no _f at all, just self (cf., https://crbug.com/671275)
lgtm
lgtm
stgao@chromium.org changed reviewers: + stgao@chromium.org
What's the most important difference of this from https://chromium.googlesource.com/infra/infra/+/cae4d6478d25b9be3607bb2098971... ? Could cache_decorator be reused here especially as http://crbug.com/671275 mentioned that decorator is better? https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:42: class MemoizedFunction(Function): Do we want the memoization work across different instances of this class? In this implementation, it seems that the cached computation results is available in the same instance of this class, but not in anther instance. It depends on the real usage to decide whether this is what we expect. https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:52: def __call__(self, x): Here, we support one parameter only. Do the usecases have only one input paramter? https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... File appengine/findit/libs/math/test/functions_test.py (right): https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... appengine/findit/libs/math/test/functions_test.py:12: F = lambda x: x + 1 style nit: for top-level definition, if it is not to be used by other modules, we usually prefix the name with underscore. For this case, we usually do: _F = ... -G = ...
On 2016/12/05 19:30:56, stgao (slow on Monday) wrote: > What's the most important difference of this from > https://chromium.googlesource.com/infra/infra/+/cae4d6478d25b9be3607bb2098971... > ? The biggest difference is that these memos live entirely within Python, whereas the cache_decorator goes over the wire to store things. IMO, computing the features is expensive enough that it makes sense to memoize, but not so expensive that it'd be worth the overhead of going over the wire or storing them for longer than the life of a process. Of course, we could always switch things to use the cache_decorator if that proves to be false. > Could cache_decorator be reused here especially as http://crbug.com/671275 > mentioned that decorator is better? That problem shouldn't be too difficult to fix; it'd just take the time to do so. The extra indirection is surely cheaper than the cost of using cache_decorator
https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... File appengine/findit/libs/math/functions.py (right): https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:42: class MemoizedFunction(Function): On 2016/12/05 19:30:56, stgao (slow on Monday) wrote: > Do we want the memoization work across different instances of this class? > > In this implementation, it seems that the cached computation results is > available in the same instance of this class, but not in anther instance. > It depends on the real usage to decide whether this is what we expect. Storing the memos with the instance is fine for our purposes. The ultimate design (see https://codereview.chromium.org/2517383005/) is that we have an object for the loglinear model itself with some methods mapping X to functions from Y to whatever. We want to memoize both the X to function map, and the Y to whatever maps. With one exception, we don't really pass these functions around, we just call them. (The one exception is when computing expectations, since we want to compute the expectation of one of these methods. https://codereview.chromium.org/2544493004/) The benefit of storing the memos with the instance is that it makes it easy for us to clear the memos, which is helpful when training the loglinear model since (a) we want to memoize things, but (b) the memos are invalidated whenever the weights change. https://codereview.chromium.org/2548603003/diff/160001/appengine/findit/libs/... appengine/findit/libs/math/functions.py:52: def __call__(self, x): On 2016/12/05 19:30:56, stgao (slow on Monday) wrote: > Here, we support one parameter only. Do the usecases have only one input > paramter? Yep. The use cases all take a single argument, and it doesn't seem worth the complexity to handle the more general case.
The CQ bit was checked by wrengr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from katesonia@chromium.org, inferno@chromium.org Link to the patchset: https://codereview.chromium.org/2548603003/#ps180001 (title: "addressing nits")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/12/05 21:10:22, wrengr wrote: > On 2016/12/05 19:30:56, stgao (slow on Monday) wrote: > > What's the most important difference of this from > > > https://chromium.googlesource.com/infra/infra/+/cae4d6478d25b9be3607bb2098971... > > ? > > The biggest difference is that these memos live entirely within Python, whereas > the cache_decorator goes over the wire to store things. IMO, computing the > features is expensive enough that it makes sense to memoize, but not so > expensive that it'd be worth the overhead of going over the wire or storing them > for longer than the life of a process. Of course, we could always switch things > to use the cache_decorator if that proves to be false. It seems that we could achieve the same goal with a in-memory cache implementation that is similar to the implementation in this CL. Using memcache over the wire is mainly for deployment in production on App Engine, although we could set up local memcache stub too. BTW, I have a comment in the code for the life of the memos. They seem to live for as long as the instance of MemoizedFunction, but not the entire life of the python process. Regarding this, we could discuss in the code for better context. Also, not sure if this needs support for multi-threading. It depends on the usage too. IIRC, for the delta-test running, it has multi-thread mode. > > > Could cache_decorator be reused here especially as http://crbug.com/671275 > > mentioned that decorator is better? > > That problem shouldn't be too difficult to fix; it'd just take the time to do > so. The extra indirection is surely cheaper than the cost of using > cache_decorator It is indeed cheaper than cache_decorator currently, but not much if we have an in-momery cache implementation for the cache_decorator. One pitfall though is that cache_decorator won't cache the result if it is None. But this could be changed if needed.
On 2016/12/05 21:29:12, stgao (slow on Monday) wrote: > BTW, I have a comment in the code for the life of the memos. > They seem to live for as long as the instance of MemoizedFunction, but not the > entire life of the python process. > Regarding this, we could discuss in the code for better context. Restricting it to the life of the instance is fine (i.e., intended). We can discuss more at crbug.com/669639 in the context of https://codereview.chromium.org/2517383005/ and https://codereview.chromium.org/2544493004/ > Also, not sure if this needs support for multi-threading. It depends on the > usage too. > IIRC, for the delta-test running, it has multi-thread mode. Should be fine, but good point to bear in mind
The CQ bit was checked by wrengr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from katesonia@chromium.org, inferno@chromium.org Link to the patchset: https://codereview.chromium.org/2548603003/#ps200001 (title: "forgot if I pushed the latest before CQ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 200001, "attempt_start_ts": 1481062306424090,
"parent_rev": "1557ef798f3c5a0b00eb1e9a698adaa5f478b4a8", "commit_rev":
"d1e13b1f320c86e949a5e7310dc52b37cd1d0163"}
Message was sent while issue was closed.
Description was changed from ========== Adding memoized functions Memoization is necessary when training loglinear models to avoid recomputing features over and over. Recomputing features affects not just the constant factors but also the asymptotic complexity (in a way that's already observable for small ``N``). The memoization is factored out into a class because doing so greatly reduces the code complexity over doing it manually in all the places where we need it. BUG= TBR=stgao@chromium.org ========== to ========== Adding memoized functions Memoization is necessary when training loglinear models to avoid recomputing features over and over. Recomputing features affects not just the constant factors but also the asymptotic complexity (in a way that's already observable for small ``N``). The memoization is factored out into a class because doing so greatly reduces the code complexity over doing it manually in all the places where we need it. BUG= TBR=stgao@chromium.org Committed: https://chromium.googlesource.com/infra/infra/+/d1e13b1f320c86e949a5e7310dc52... ==========
Message was sent while issue was closed.
Committed patchset #6 (id:200001) as https://chromium.googlesource.com/infra/infra/+/d1e13b1f320c86e949a5e7310dc52... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
