Chromium Code Reviews

Created: 3 years, 11 months ago by Sharu Jiang
Modified: 3 years, 11 months ago
CC: chromium-reviews, infra-reviews+infra_chromium.org, stgao, inferno
Target Ref: refs/heads/master
Project: infra
Visibility: Public.
Description

[Predator] Move ``SingleFeatureScore`` to LLM.
BUG=674262
TBR=stgao@chromium.org
Review-Url: https://codereview.chromium.org/2617273002
Committed: https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945755a49e2bcf
Patch Set 1 : . #
Total comments: 4
Patch Set 2 : Fix a bug and nits. #
Patch Set 3 : For discussion not for review #
Patch Set 4 : Fix nits. #
Patch Set 5 : Update doc strs. #
Total comments: 37
Patch Set 6 : Address comments. #
Total comments: 38
Patch Set 7 : Address comments. #
Total comments: 2
Patch Set 8 : Fix nit. #
Total comments: 2
Patch Set 9 : Address comment. #
Total comments: 3
Messages
Total messages: 30 (13 generated)
katesonia@chromium.org changed reviewers: + wrengr@chromium.org
PTAL.
Patchset #1 (id:1) has been deleted
Description was changed from ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 ========== to ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org ==========
https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/changelist_classifier.py:53: weight_list, weights)
we shouldn't have to pass both the weight dict and list; the dict should be sufficient.

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:64: epsilon=None):
given the dict of weights the list is redundant. For clarity I'd rather keep the "weights" name for describing the dict

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:87: self._weights = np.array([
Once we no longer take the list of weights, this will need to be autovivified somehow. The order of feature names is defined by the final result of feature_function. Of course, we can't get at the order of that list without providing the X and Y arguments; so it'd make the most sense to change the feature_function to be an instance of a class/interface which specifies both the X -> Y -> list(FV) function as well as the list(str) of feature names. Notably, this introduces the constraint that all features must have unique names, which LLM didn't require previously (though SingleFeatureScore did/does assume it).

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:261: to blame.
documentation wasn't updated; though see my previous comments.
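The class/interface suggested above (a feature function that also carries its own feature names) can be sketched as follows. This is an illustrative sketch only, not the Predator code: ``FeatureValue`` here is a stand-in namedtuple, and the class and method names are hypothetical.

```python
# Hypothetical sketch: instead of passing a bare X -> Y -> list(FV) function
# plus a separate weight list, wrap the features in a small class that also
# exposes the ordered feature names, so weights can be aligned by name.
from collections import namedtuple

# Stand-in for the real FeatureValue type.
FeatureValue = namedtuple('FeatureValue', ['name', 'value'])


class FeatureFunction(object):
  """Glues curried features together and records their names."""

  def __init__(self, features):
    # ``features`` maps feature name -> curried function X -> Y -> float.
    self._features = features

  @property
  def feature_names(self):
    # A stable order for aligning a weight covector with feature values.
    return sorted(self._features)

  def __call__(self, x):
    def at_y(y):
      # Unique names are required, since they key the result dict.
      return {name: FeatureValue(name, f(x)(y))
              for name, f in self._features.items()}
    return at_y


fn = FeatureFunction({
    'touches_file': lambda x: lambda y: 1.0 if y in x else 0.0,
})
print(fn.feature_names)
print(fn(['a.cc'])('a.cc')['touches_file'].value)
```

Note the constraint the reviewer mentions falls out naturally: because the result is a dict keyed by name, all features must have unique names.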
Patchset #4 (id:80001) has been deleted
Patchset #6 (id:130001) has been deleted
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:53: def _LogZeroish(self, x):
This should also be moved to UnnormalizedLLM. That way it can be based on the same epsilon used there, rather than only checking that it is exactly equal to -inf.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:136: features (list of FeatureValue): the values whose ``reason``
since now we're passing an iterator rather than a list, you should update this to say either "iterator" or "iterable" (the function doesn't really care which it gets).

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:162: return sorted(formatted_reasons,
unrelated to the CL's goals, but I just noticed: This should be changed to ``formatted_reasons.sort(...); return formatted_reasons`` since there's no point in allocating a new list here.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:169: features (list of FeatureValue): the values whose ``changed_files``
ditto

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:200: return sorted(all_changed_files.values(),
ditto.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:147: fs (list of functions): A collection of curried functions ``X -> Y -> A``.
Type of ``fs`` should be an iterable of ``Feature`` objects. And since we're making this a class rather than a generic function, you might as well say ``FeatureValue`` rather than ``A``.

Also, note that if we start with an iterator(Feature), there's no way to share work among the different features. So this shouldn't be the only way we have of constructing a ``FeatureFunction``. In general, a FeatureFunction can be *any* function of type X -> Y -> dict(FeatureValue) [1]. There are a bunch of different ways to construct such a function, not just by gluing together a list of Features (e.g., we could also glue together a list of FeatureFunctions to get a single FeatureFunction; thus FeatureFunction is also what we've been calling a "metafeature"). This is why I had ToFeatureFunction as a standalone function rather than as a class.

[1]: or per crbug.com/680207, any function of type X -> Y -> FeatureName -> FeatureValue. We might possibly want to amend that with a frozenset(FeatureName) so that we know all possible keys of our "lazy dict", since that may come in handy for creating new FeatureFunction objects from old ones; however it is unnecessary for scoring purposes.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:160: ``ToFeatureFunction(fs)(x)(y)[f.name] == f(x)(y)``.
the name "ToFeatureFunction" is no longer used anywhere, so you should rephrase this to: ``FeatureFunction(fs)(x)(y)[i] == fs[i](x)(y)``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: math.fabs(weight) >= epsilon else 0.
The ``if...`` should be moved to after the ``for...in...``; thus, ``{name: weight for name, weight in weights.iteritems() if ...}``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:107: lambda fxy: sum(self.SingleFeatureScore(feature)
Should use ``math.fsum`` whenever adding floats; never use ``sum``.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
N.B., you're not taking advantage of the sparsity of the weights dictionary here; you're still iterating over all the features. See the first part of crbug.com/680207

We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in self._weights.iteritems())``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an np.ndarray anymore. Instead do ``float(len(self.weights))``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
You should explicitly convert this to a list (or tuple) since those have a specified order, rather than leaving it as a set which doesn't.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:81: self._weights = new_weights
To avoid excessive conversion back and forth, this class should store both the dict and np.ndarray. That is, I suggest:

  def __init__(self, ...):
    ...
    self._weights = initial_weights
    self._np_weights = self._DictToNpArray(initial_weights, 0.)

  @property
  def weights(self):
    return self._weights

  @weights.setter
  def weights(self, weights_dict):
    self.ClearWeightBasedMemos()
    self._weights = weights_dict
    self._np_weights = self._DictToNpArray(weights_dict, 0.)

  @property
  def np_weights(self):
    return self._np_weights

  @np_weights.setter
  def np_weights(self, weights_array):
    self.ClearWeightBasedMemos()
    self._weights = self._NpArrayToDict(weights_array, self._LogZeroish)
    self._np_weights = weights_array

  def _DictToNpArray(self, d, default=None):
    return np.array([d.get(i, default) for i in self._feature_order])

  def _NpArrayToDict(self, arr, predicate=None):
    if predicate is None:
      predicate = lambda _: True
    return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
is clearer to use ``fxys.get(feature_name, 0.)``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:115: for feature_name in self._feature_order])
Again, to avoid excessive conversion, we can do this once in the ``weights`` setter and then this method simply returns the cached np.array

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
I don't think that's any cleaner than the old code
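The dict ↔ np.ndarray conversion the reviewer suggests can be exercised as a self-contained sketch. This is modernized to Python 3 (``items()``/``dict`` iteration instead of ``iteritems()``), drops the memo-clearing machinery, and uses made-up feature names; it is an illustration of the round trip, not the actual training code.

```python
# Sketch: store the weight covector as both a dict (sparse, name-keyed) and
# an np.ndarray (dense, for SciPy), keeping the two in sync via a setter.
import numpy as np


class WeightStore(object):

  def __init__(self, initial_weights):
    # Freeze a feature order so the array representation is stable.
    self._feature_order = list(initial_weights)
    self._weights = dict(initial_weights)
    self._np_weights = self._DictToNpArray(initial_weights, 0.)

  @property
  def weights(self):
    return self._weights

  @property
  def np_weights(self):
    return self._np_weights

  @np_weights.setter
  def np_weights(self, weights_array):
    # Updating the array representation also refreshes the dict.
    self._weights = self._NpArrayToDict(weights_array)
    self._np_weights = weights_array

  def _DictToNpArray(self, d, default=None):
    return np.array([d.get(i, default) for i in self._feature_order])

  def _NpArrayToDict(self, arr, predicate=None):
    if predicate is None:
      predicate = lambda _: True
    return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}


# Illustrative feature names only.
store = WeightStore({'min_distance': 1.0, 'top_frame': 2.0})
store.np_weights = store.np_weights * 2.  # both representations stay in sync
print(store.weights)
```

Passing a predicate (e.g. the epsilon test) to ``_NpArrayToDict`` is what would make the dict representation sparse after a training step.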
Patchset #4 (id:100001) has been deleted
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:53: def _LogZeroish(self, x):
On 2017/01/11 20:38:30, wrengr wrote:
> This should also be moved to UnnormalizedLLM. That way it can be based on the
> same epsilon used there, rather than only checking that it is exactly equal to
> -inf.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:136: features (list of FeatureValue): the values whose ``reason``
On 2017/01/11 20:38:30, wrengr wrote:
> since now we're passing an iterator rather than a list, you should update this
> to say either "iterator" or "iterable" (the function doesn't really care which
> it gets).

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:162: return sorted(formatted_reasons,
On 2017/01/11 20:38:30, wrengr wrote:
> unrelated to the CL's goals, but I just noticed: This should be changed to
> ``formatted_reasons.sort(...); return formatted_reasons`` since there's no point
> in allocating a new list here.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:169: features (list of FeatureValue): the values whose ``changed_files``
On 2017/01/11 20:38:30, wrengr wrote:
> ditto

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:200: return sorted(all_changed_files.values(),
On 2017/01/11 20:38:30, wrengr wrote:
> ditto.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:147: fs (list of functions): A collection of curried functions ``X -> Y -> A``.
On 2017/01/11 20:38:30, wrengr wrote:
> Type of ``fs`` should be an iterable of ``Feature`` objects. And since we're
> making this a class rather than a generic function, you might as well say
> ``FeatureValue`` rather than ``A``.
>
> Also, note that if we start with an iterator(Feature), there's no way to share
> work among the different features. So this shouldn't be the only way we have of
> constructing a ``FeatureFunction``. In general, a FeatureFunction can be *any*
> function of type X -> Y -> dict(FeatureValue) [1]. There are a bunch of
> different ways to construct such a function, not just by gluing together a list
> of Features (e.g., we could also glue together a list of FeatureFunctions to get
> a single FeatureFunction; thus FeatureFunction is also what we've been calling a
> "metafeature"). This is why I had ToFeatureFunction as a standalone function
> rather than as a class.
>
> [1]: or per crbug.com/680207, any function of type X -> Y -> FeatureName ->
> FeatureValue. We might possibly want to amend that with a frozenset(FeatureName)
> so that we know all possible keys of our "lazy dict", since that may come in
> handy for creating new FeatureFunction objects from old ones; however it is
> unnecessary for scoring purposes.

Acknowledged.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:160: ``ToFeatureFunction(fs)(x)(y)[f.name] == f(x)(y)``.
On 2017/01/11 20:38:30, wrengr wrote:
> the name "ToFeatureFunction" is no longer used anywhere, so you should rephrase
> this to: ``FeatureFunction(fs)(x)(y)[i] == fs[i](x)(y)``

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: math.fabs(weight) >= epsilon else 0.
On 2017/01/11 20:38:30, wrengr wrote:
> The ``if...`` should be moved to after the ``for...in...``; thus, ``{name:
> weight for name, weight in weights.iteritems() if ...}``

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:107: lambda fxy: sum(self.SingleFeatureScore(feature)
On 2017/01/11 20:38:30, wrengr wrote:
> Should use ``math.fsum`` whenever adding floats; never use ``sum``.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
On 2017/01/11 20:38:30, wrengr wrote:
> N.B., you're not taking advantage of the sparsity of the weights dictionary
> here; you're still iterating over all the features. See the first part of
> crbug.com/680207
>
> We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in
> self._weights.iteritems())``

That's because I didn't filter those 0 weights in this CL yet; there are some places in training that also need to be changed (while training, it's possible that a zero weight turns into a non-zero one). I'd rather do this optimization in another CL.

Added a TODO.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
On 2017/01/11 20:38:30, wrengr wrote:
> You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an
> np.ndarray anymore. Instead do ``float(len(self.weights))``

Since I haven't filtered 0 weights yet, I just use ``float(len(self.weights) - self.weights.values().count(0.))``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
On 2017/01/11 20:38:30, wrengr wrote:
> You should explicitly convert this to a list (or tuple) since those have a
> specified order, rather than leaving it as a set which doesn't.

I think the ``.keys()`` will return a list, but explicitly converting to a list should be safer. Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:81: self._weights = new_weights
On 2017/01/11 20:38:31, wrengr wrote:
> To avoid excessive conversion back and forth, this class should store both the
> dict and np.ndarray. That is, I suggest:
>
>   def __init__(self, ...):
>     ...
>     self._weights = initial_weights
>     self._np_weights = self._DictToNpArray(initial_weights, 0.)
>
>   @property
>   def weights(self):
>     return self._weights
>
>   @weights.setter
>   def weights(self, weights_dict):
>     self.ClearWeightBasedMemos()
>     self._weights = weights_dict
>     self._np_weights = self._DictToNpArray(weights_dict, 0.)
>
>   @property
>   def np_weights(self):
>     return self._np_weights
>
>   @np_weights.setter
>   def np_weights(self, weights_array):
>     self.ClearWeightBasedMemos()
>     self._weights = self._NpArrayToDict(weights_array, self._LogZeroish)
>     self._np_weights = weights_array
>
>   def _DictToNpArray(self, d, default=None):
>     return np.array([d.get(i, default) for i in self._feature_order])
>
>   def _NpArrayToDict(self, arr, predicate=None):
>     if predicate is None:
>       predicate = lambda _: True
>     return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
On 2017/01/11 20:38:30, wrengr wrote:
> is clearer to use ``fxys.get(feature_name, 0.)``

I have to do this, because I need ``fxys[feature_name].value`` instead of ``fxys[feature_name]``.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:115: for feature_name in self._feature_order])
On 2017/01/11 20:38:31, wrengr wrote:
> Again, to avoid excessive conversion, we can do this once in the ``weights``
> setter and then this method simply returns the cached np.array

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
On 2017/01/11 20:38:31, wrengr wrote:
> I don't think that's any cleaner than the old code

This is because we need np_array here instead of a dict.
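The "use ``math.fsum``, never ``sum``" advice that came up above is easy to demonstrate: ``sum`` accumulates a rounding error on every addition, while ``math.fsum`` tracks exact partial sums.

```python
# Why the review insists on math.fsum for adding floats: plain sum() rounds
# at every step, fsum() does not lose precision.
import math

values = [0.1] * 10
print(sum(values))        # accumulates rounding error, not exactly 1.0
print(math.fsum(values))  # exactly 1.0
```

The same applies to the weighted score ``math.fsum(fxy[i].value * weight_i for i, weight_i in weights.items())`` form suggested in the review.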
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > N.B., you're not taking advantage of the sparsity of the weights dictionary
> > here; you're still iterating over all the features. See the first part of
> > crbug.com/680207
> >
> > We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in
> > self._weights.iteritems())``
>
> That's because I didn't filter those 0 weights in this cl yet, there are some
> places in training also need to be changed (while training, it's possible that a
> zero weight turns into a non-zero one.), I'd rather do this optimization in
> another cl.
>
> Added a todo.

Yeah, the weights will change during training, but once we set the new weight dict (and clear any weight-dependent memos) none of the methods have any way of noticing the change; they'll keep on working just fine. Since this is the CL that makes the weights be stored as a dict, it makes sense to iterate over self.weights in this CL. Once we do, this method will automatically start taking advantage of the sparsity of the weights whenever another CL adds the filtering for that in; no need to go back and change it again at that point. It's one less thing for the person writing that CL to keep in mind, one less point for the git diff to break, and besides it's a one-line change.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an
> > np.ndarray anymore. Instead do ``float(len(self.weights))``
>
> Since I haven't filtered 0 weights yet, I just use
> float(len(self.weights) - self.weights.values().count(0.))

Testing for exact equality with 0.0 isn't reliable. If self.weights isn't already filtered, then you can use the same ``math.fabs(weight) >= epsilon`` test as in __init__

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > You should explicitly convert this to a list (or tuple) since those have a
> > specified order, rather than leaving it as a set which doesn't.
>
> I think the ``.keys()`` will return a list, but explicitly converting to a list
> should be safer. Done.

So it does. Weird.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > is clearer to use ``fxys.get(feature_name, 0.)``
>
> I have to do this, because I need ``fxys[feature_name].value`` instead of
> ``fxys[feature_name]``

hrm... that does make it tricky. This sort of thing is why I had FeatureValue.__float__ previously; with that we could've done ``float(fxys.get(feature_name, 0.))`` without needing to wrap ``0.`` up in something that provides a ``.value`` property.
I really wish Python had let-bindings in its comprehension syntax. One alternative is to build an intermediate generator, thus: ``return np.array([0. if fxyn is None else fxyn.value for fxyn in (fxys.get(n) for n in self._feature_order)])``. Not sure that that's any cleaner though :(

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:31, wrengr wrote:
> > I don't think that's any cleaner than the old code
>
> This is because we need np_array here instead of a dict.

Ah, right <chagrin>
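The two spellings discussed in this thread can be compared side by side. This is a standalone sketch: ``fxys`` is a hypothetical stand-in for the feature-name → FeatureValue dict, and the feature names are made up; the point is extracting ``.value`` with a ``0.`` default for missing features.

```python
# The conditional-expression form vs. the intermediate-generator form from
# the review; both produce the same dense row of feature values.
from collections import namedtuple

FeatureValue = namedtuple('FeatureValue', ['name', 'value'])

feature_order = ['min_distance', 'top_frame', 'touch_crashed_file']
fxys = {'min_distance': FeatureValue('min_distance', 0.5),
        'top_frame': FeatureValue('top_frame', 1.0)}

# Original spelling: conditional expression inside the comprehension.
row = [fxys[name].value if name in fxys else 0. for name in feature_order]

# Alternative from the review: an intermediate generator of .get() results.
row2 = [0. if fv is None else fv.value
        for fv in (fxys.get(name) for name in feature_order)]

print(row)  # [0.5, 1.0, 0.0]
```

Both avoid wrapping ``0.`` in something that has a ``.value`` attribute, which is why a plain ``fxys.get(name, 0.)`` doesn't work here.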
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:75: return self.RankAndFilterSuspects(annotated_report, suspects)
I don't like that name change. IMO, filtering is already part of "ranking". (This is one way "ranking" differs from "sorting", which mustn't drop any elements.)

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property.
The input functions don't return a dict(FV), they just return one FV. With that change, this entire description is exactly the description of what the ``Feature`` class is capturing. For the current interface, the collection of inputs is either:

(a) a dict of X->Y->FV functions. The key gives the name, and we will automatically convert the raw function into a Feature under the covers.

(b) an iterable of Feature objects. Here we use an iterable so we can't get the name from there; thus we require a Feature object since that specifies both the X->Y->FV and the name.

Pick one or the other and run with it.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:156: """Fuction mapping ``X -> Y -> dict(A.name to A).
The ``A`` should be ``FeatureValue`` here as well

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:159: A function ``X -> Y -> dict(A.name to A)`` where for all ``x``, ``y``, and
ditto

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash...
File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: # weight covector.
It is trivial to do that now. Instead of saying ``{k: (v if b else 0) for k, v in d.iteritems()}`` which artificially avoids making things sparse, just say ``{k: v for k, v in d.iteritems() if b}``. The latter is exactly the same as filtering the former to remove all the items where you chose the else-branch.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:117: preprocessing (like filtering or truncating) applied.
I don't understand this sentence. If it was from the old code, I don't think it applies anymore.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:239: features (literable of FeatureValue): the values whose ``reason``
"literable" -> "iterable"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/model_test.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/model_test.py:21: self._weights, 0.1)
Do you really want so large of an epsilon? Why not stick with the default? (We do want to test that the default behaves reasonably, otherwise we should choose a different default)

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/training_test.py (left):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/training_test.py:38: def testWeightsSetterShapeMismatch(self):
We probably still want something similar to this.
SciPy shouldn't ever pass us the wrong shape, but we do want to make sure and raise an error if the setter ever does get the wrong shape. We shouldn't just silently accept the wrong shape, since getting it is indicative of a bug in the calling code.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:41: to converge.
Should add a note that the dictionary should not be sparse, since we will only train on the features whose names are keys in this dict.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:51: self._np_weights = None
I think it'd be clearer to set the _weights and _np_weights explicitly, rather than setting this to None and then invoking the setter. Also, using the setter introduces unnecessary inefficiency. E.g., it means computing ``self.NumPyArrayWeightsToWeights(self.WeightsToNumPyArrayWeights(initial_weights))`` which is unnecessary since that is equal to ``initial_weights``. Also it means calling ClearWeightBasedMemos which is an expensive noop at this point.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:60: """The NumPy Array of weight covector.
"of weight covector" -> "of the weight covector"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:64: the semantics rather than the implementation details.
This paragraph should be deleted, since the whole point of this property is to provide the np.ndarray

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:70: """Mutate the np array weight covector, and clear memos as necessary.
It's fine to leave off the "np array" part, since we're updating both representations of the weight covector.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:89: Note, the features nparray should have the same order as in
"nparray" -> "np array"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:105: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
If feature_name is not in fxys then that should be an error. It's okay to have weights be missing/zero; it's not okay to have features be missing.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:110: def WeightsAsNumPyArray(self):
This is unnecessary. Callers should use the ``np_weights`` property directly. The paragraph below about why we have this should be moved to the docstring for the property

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:119: def NumPyArrayWeightsToWeights(self, np_weights):
This "weights to weights" name is confusing. Also, should be private since this is only needed for internal use and isn't being provided for client code to use.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:141: def WeightsToNumPyArrayWeights(self, weights, default=0.):
again, name is confusing and should also be private. Both these methods can be made to work for any np.ndarray with len(self._feature_order) elements. If they are made generic (in which case "weight" shouldn't be in the name), then it makes sense to have a default argument. But if they're kept weight-specific, then the default value should always be 0.0 and shouldn't be overridable.
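The sparse-weight filtering the reviewers keep circling back to (drop near-zero weights with an epsilon test rather than comparing floats to 0.0 exactly) can be sketched in a few lines. Python 3 sketch with an illustrative default epsilon and made-up feature names.

```python
# Epsilon-based filtering of the weight dict: keep only weights whose
# magnitude is at least epsilon, instead of testing ``w == 0.`` exactly.
import math


def FilterWeights(weights, epsilon=0.00001):
  """Returns a sparse copy of ``weights`` with near-zero entries dropped."""
  return {name: w for name, w in weights.items()
          if math.fabs(w) >= epsilon}


weights = {'min_distance': 1.0, 'top_frame': 1e-9, 'touch_crashed_file': -0.5}
filtered = FilterWeights(weights)
print(filtered)            # top_frame is dropped
print(float(len(filtered)))  # the "l0 norm" then becomes just len()
```

Once the dict is filtered this way, the ``float(len(self.weights))`` form suggested for model.py:150 counts exactly the non-zero weights.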
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:75: return self.RankAndFilterSuspects(annotated_report, suspects)
On 2017/01/12 19:09:09, wrengr wrote:
> I don't like that name change. IMO, filtering is already part of "ranking".
> (This is one way "ranking" differs from "sorting", which mustn't drop any
> elements.)

Makes sense, changed back.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property.
On 2017/01/12 19:09:09, wrengr wrote:
> The input functions don't return a dict(FV), they just return one FV. With that
> change, this entire description is exactly the description of what the
> ``Feature`` class is capturing. For the current interface, the collection of
> inputs is either:
>
> (a) a dict of X->Y->FV functions. The key gives the name, and we will
> automatically convert the raw function into a Feature under the covers.
>
> (b) an iterable of Feature objects. Here we use an iterable so we can't get the
> name from there; thus we require a Feature object since that specifies both the
> X->Y->FV and the name.
>
> Pick one or the other and run with it.

Oops, I will stick to option (b).

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:156: """Fuction mapping ``X -> Y -> dict(A.name to A).
On 2017/01/12 19:09:09, wrengr wrote:
> The ``A`` should be ``FeatureValue`` here as well

Done.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash...
appengine/findit/crash/loglinear/feature.py:159: A function ``X -> Y -> dict(A.name to A)`` where for all ``x``, ``y``, and On 2017/01/12 19:09:09, wrengr wrote: > ditto Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: # weight covector. On 2017/01/12 19:09:09, wrengr wrote: > It is trivial to do that now. Instead of saying ``{k: (v if b else 0) for k, v > in d.iteritems()}` which artificially avoids making things sparse, just say > ``{k: v for k, v in d.iteritems() if b}``. The latter is exactly the same as > filtering the former to remove all the items where you chose the else-branch. This one is trivial, but there are some other places which need to be changed; I can change it if you want. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:117: preprocessing (like filtering or truncating) applied. On 2017/01/12 19:09:09, wrengr wrote: > I don't understand this sentence. If it was from the old code, I don't think it > applies anymore. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:239: features (literable of FeatureValue): the values whose ``reason`` On 2017/01/12 19:09:09, wrengr wrote: > "literable" -> "iterable" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/model_test.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/model_test.py:21: self._weights, 0.1) On 2017/01/12 19:09:09, wrengr wrote: > Do you really want so large of an epsilon? Why not stick with the default?
(We > do want to test that the default behaves reasonably, otherwise we should choose > a different default) no, just picked a random number. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/training_test.py (left): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/training_test.py:38: def testWeightsSetterShapeMismatch(self): On 2017/01/12 19:09:09, wrengr wrote: > We probably still want something similar to this. SciPy shouldn't ever pass us > the wrong shape, but we do want to make sure and raise an error if the setter > ever does get the wrong shape. We shouldn't just silently accept the wrong > shape, since getting it is indicative of a bug in the calling code. Oops, forgot to add this back. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:41: to converge. On 2017/01/12 19:09:10, wrengr wrote: > Should add a note that the dictionary should not be sparse, since we will only > train on the features whose names are keys in this dict. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:51: self._np_weights = None On 2017/01/12 19:09:10, wrengr wrote: > I think it'd be clearer to set the _weights and _np_weights explicitly, rather > than setting this to None and then invoking the setter. > > Also, using the setter introduces unnecessary inefficiency. E.g., it means > computing > ``self.NumPyArrayWeightsToWeights(self.WeightsToNumPyArrayWeights(initial_weights))`` > which is unnecessary since that is equal to ``initial_weights``. Also it means > calling ClearWeightBasedMemos which is an expensive noop at this point. Done. 
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:60: """The NumPy Array of weight covector. On 2017/01/12 19:09:10, wrengr wrote: > "of weight covector" -> "of the weight covector" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:64: the semantics rather than the implementation details. On 2017/01/12 19:09:10, wrengr wrote: > This paragraph should be deleted, since the whole point of this property is to > provide the np.ndarray Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:70: """Mutate the np array weight covector, and clear memos as necessary. On 2017/01/12 19:09:10, wrengr wrote: > It's fine to leave off the "np array" part, since we're updating both > representations of the weight covector. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:89: Note, the features nparray should have the same order as in On 2017/01/12 19:09:10, wrengr wrote: > "nparray" -> "np array" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:105: return np.array([fxys[feature_name].value if feature_name in fxys else 0. On 2017/01/12 19:09:10, wrengr wrote: > If feature_name is not in fxys then that should be an error. It's okay to have > weights be missing/zero; it's not okay to have features be missing. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:110: def WeightsAsNumPyArray(self): On 2017/01/12 19:09:10, wrengr wrote: > This is unnecessary. Callers should use the ``np_weights`` property directly. > The paragraph below about why we have this should be moved to the docstring for > the property Done. 
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:119: def NumPyArrayWeightsToWeights(self, np_weights): On 2017/01/12 19:09:10, wrengr wrote: > This "weights to weights" name is confusing. Also, should be private since this > is only needed for internal use and isn't being provided for client code to use. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:141: def WeightsToNumPyArrayWeights(self, weights, default=0.): On 2017/01/12 19:09:10, wrengr wrote: > again, name is confusing and should also be private. > > Both these methods can be made to work for any np.ndarray with > len(self._feature_order) elements. If they are made generic (in which case > "weight" shouldn't be in the name), then it makes sense to have a default > argument. But if they're kept weight-specific, then the default value should > always be 0.0 and shouldn't be overridable. Done.
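Pulling the setter-related comments together (clear memos on mutation, and raise on a shape mismatch rather than silently accepting it), a sketch of the `np_weights` property might look like the following. The names `WeightedModelSketch` and `_memos` are hypothetical; this is not the actual training.py code, just an illustration of the behavior agreed on in the review.

```python
import numpy as np


class WeightedModelSketch(object):
  """Sketch of an np_weights setter that validates shape and clears memos."""

  def __init__(self, feature_order, initial_np_weights):
    self._feature_order = list(feature_order)
    self._np_weights = np.asarray(initial_np_weights, dtype=float)
    self._memos = {}  # weight-based caches, invalidated on weight changes

  @property
  def np_weights(self):
    """The np.ndarray representation of the weight covector."""
    return self._np_weights

  @np_weights.setter
  def np_weights(self, new_weights):
    new_weights = np.asarray(new_weights, dtype=float)
    # SciPy shouldn't ever pass the wrong shape, but silently accepting
    # one would hide a bug in the caller, so raise instead.
    if new_weights.shape != (len(self._feature_order),):
      raise ValueError('expected shape %r, got %r'
                       % ((len(self._feature_order),), new_weights.shape))
    self._np_weights = new_weights
    self._memos.clear()  # any weight-based memo is now stale
```

Note this also reflects the earlier comment about the constructor: assigning `self._np_weights` directly in `__init__` avoids a round-trip conversion and an expensive no-op memo clear that routing through the setter would incur.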
katesonia@chromium.org changed reviewers: + inferno@chromium.org, mbarbella@chromium.org
PTAL
robertocn@chromium.org changed reviewers: + robertocn@chromium.org
https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:144: """Given a dict of scalar-valued functions, return an dict-valued function. Should it be ``Given an iterable of scalar-valued, named functions...``?
https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:144: """Given a dict of scalar-valued functions, return an dict-valued function. On 2017/01/19 23:45:30, RobertoCN wrote: > Should it be ``Given an iterable of scalar-valued, named functions...``? Right, forgot to update. Done.
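Option (b) from the earlier discussion — an iterable of named `Feature` objects combined into one dict-valued function — could be sketched as below. The `Feature` and `CombineFeatures` names and the exact currying are illustrative assumptions, not the actual feature.py implementation; feature values here are plain numbers rather than real `FeatureValue` objects.

```python
class Feature(object):
  """Sketch of a named feature: a callable X -> Y -> FeatureValue."""

  def __init__(self, name, fn):
    self._name = name
    self._fn = fn  # curried function: fn(x)(y) -> feature value

  @property
  def name(self):
    return self._name

  def __call__(self, x):
    return self._fn(x)


def CombineFeatures(features):
  """Given an iterable of named Features, return a dict-valued function
  ``X -> Y -> dict(name to FeatureValue)``, per option (b) above."""
  features = list(features)

  def combined(x):
    # Apply the X argument once per feature, then close over the results.
    partial = [(f.name, f(x)) for f in features]

    def in_y(y):
      return {name: fy(y) for name, fy in partial}
    return in_y
  return combined
```

Because every dict key comes from a `Feature.name`, this also makes concrete the constraint noted earlier in the review: all features must have unique names.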
chanli@chromium.org changed reviewers: + chanli@chromium.org
https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:115: return isinstance(weight, float) and math.fabs(weight) >= self._epsilon In your CL https://codereview.chromium.org/2625073003/, appengine/findit/crash/loglinear/weight.py, DropZeroWeights you used '<=' for zero value, but here you use '>=' for NonZeroWeight?
lgtm once open comments are addressed
https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:115: return isinstance(weight, float) and math.fabs(weight) >= self._epsilon On 2017/01/23 01:56:39, chanli wrote: > In your CL https://codereview.chromium.org/2625073003/, > appengine/findit/crash/loglinear/weight.py, DropZeroWeights you used '<=' for > zero value, but here you use '>=' for NonZeroWeight? Done.
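The `<=` vs `>=` inconsistency flagged above matters at the boundary: with `<=` for "zero" in one file and `>=` for "nonzero" in another, a weight exactly at epsilon would satisfy both predicates. One way to keep the two call sites consistent is to define one predicate as the exact negation of the other, as in this sketch (the function names and the `EPSILON` value are hypothetical):

```python
import math

EPSILON = 0.00001  # hypothetical default threshold


def IsZeroWeight(weight, epsilon=EPSILON):
  """A weight within epsilon of zero is treated as zero (droppable)."""
  return math.fabs(weight) < epsilon


def IsNonZeroWeight(weight, epsilon=EPSILON):
  """Exact complement of IsZeroWeight, so the two call sites always agree."""
  return not IsZeroWeight(weight, epsilon)
```

With this structure, no weight can be classified as both zero and nonzero, regardless of where the inclusive/exclusive boundary is drawn.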
The CQ bit was checked by katesonia@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mbarbella@chromium.org Link to the patchset: https://codereview.chromium.org/2617273002/#ps230001 (title: "Address comment.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 230001, "attempt_start_ts": 1485208441868010,
"parent_rev": "759bc78c3e24c998aee118524e02f9e2d2ff5479", "commit_rev":
"27163f40f7f9442d53e3a540f7945755a49e2bcf"}
Message was sent while issue was closed.
Description was changed from ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org ========== to ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org Review-Url: https://codereview.chromium.org/2617273002 Committed: https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:230001) as https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945...
Message was sent while issue was closed.
On 2017/01/23 22:09:50, commit-bot: I haz the power wrote: > Committed patchset #9 (id:230001) as > https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945... Committed it since many CLs depend on it; feel free to leave comments if you have any concerns :)
Message was sent while issue was closed.
lgtm, with a few comments https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property. I still say it'd be clearer to refer to the ``Feature`` interface rather than calling it a "function with a name property". https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:65: self._epsilon = epsilon Can simplify this conditional to ``self._epsilon = epsilon or EPSILON`` https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:52: # Use self._weights instead of initialz_weights, since self._weights already "initialz" -> "initial" |
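The suggested `self._epsilon = epsilon or EPSILON` simplification above behaves the same as the explicit `None` check, with one caveat worth noting: `or` also treats an explicit `0.0` as falsy and falls back to the default. That's harmless here since a zero threshold isn't meaningful, but the sketch below (class name and `EPSILON` value are hypothetical) makes the behavior explicit:

```python
EPSILON = 0.00001  # hypothetical module-level default


class ModelSketch(object):
  def __init__(self, epsilon=None):
    # Equivalent to `EPSILON if epsilon is None else epsilon`, except that
    # an explicit epsilon of 0.0 also falls back to the default; acceptable
    # here since a zero threshold isn't a meaningful setting.
    self._epsilon = epsilon or EPSILON
```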
