Chromium Code Reviews

Created: 3 years, 11 months ago by Sharu Jiang
Modified: 3 years, 11 months ago
CC: chromium-reviews, infra-reviews+infra_chromium.org, stgao, inferno
Target Ref: refs/heads/master
Project: infra
Visibility: Public.
Description

[Predator] Move ``SingleFeatureScore`` to LLM.
BUG=674262
TBR=stgao@chromium.org
Review-Url: https://codereview.chromium.org/2617273002
Committed: https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945755a49e2bcf
Patch Set 1 : . #
Total comments: 4
Patch Set 2 : Fix a bug and nits. #
Patch Set 3 : For discussion not for review #
Patch Set 4 : Fix nits. #
Patch Set 5 : Update doc strs. #
Total comments: 37
Patch Set 6 : Address comments. #
Total comments: 38
Patch Set 7 : Address comments. #
Total comments: 2
Patch Set 8 : Fix nit. #
Total comments: 2
Patch Set 9 : Address comment. #
Total comments: 3
Messages
Total messages: 30 (13 generated)
katesonia@chromium.org changed reviewers: + wrengr@chromium.org
PTAL.
Patchset #1 (id:1) has been deleted
Description was changed from ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 ========== to ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org ==========
https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/changelist_classifier.py:53: weight_list, weights)
we shouldn't have to pass both the weight dict and list; the dict should be sufficient.

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:64: epsilon=None):
given the dict of weights the list is redundant. For clarity I'd rather keep the "weights" name for describing the dict

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:87: self._weights = np.array([
Once we no longer take the list of weights, this will need to be autovivified somehow. The order of feature names is defined by the final result of feature_function. Of course, we can't get at the order of that list without providing the X and Y arguments; so it'd make the most sense to change the feature_function to be an instance of a class/interface which specifies both the X -> Y -> list(FV) function as well as the list(str) of feature names. Notably, this introduces the constraint that all features must have unique names, which LLM didn't require previously (though SingleFeatureScore did/does assume it).

https://codereview.chromium.org/2617273002/diff/20001/appengine/findit/crash/... appengine/findit/crash/loglinear/model.py:261: to blame.
documentation wasn't updated; though see my previous comments.
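The class/interface suggested above (a feature function that also carries its own feature names) can be sketched as follows. This is an illustrative sketch only, not the Predator code: ``FeatureValue`` here is a stand-in namedtuple, and the class and method names are hypothetical.

```python
# Hypothetical sketch: instead of passing a bare X -> Y -> list(FV) function
# plus a separate weight list, wrap the features in a small class that also
# exposes the ordered feature names, so weights can be aligned by name.
from collections import namedtuple

# Stand-in for the real FeatureValue type.
FeatureValue = namedtuple('FeatureValue', ['name', 'value'])


class FeatureFunction(object):
  """Glues curried features together and records their names."""

  def __init__(self, features):
    # ``features`` maps feature name -> curried function X -> Y -> float.
    self._features = features

  @property
  def feature_names(self):
    # A stable order for aligning a weight covector with feature values.
    return sorted(self._features)

  def __call__(self, x):
    def at_y(y):
      # Unique names are required, since they key the result dict.
      return {name: FeatureValue(name, f(x)(y))
              for name, f in self._features.items()}
    return at_y


fn = FeatureFunction({
    'touches_file': lambda x: lambda y: 1.0 if y in x else 0.0,
})
print(fn.feature_names)
print(fn(['a.cc'])('a.cc')['touches_file'].value)
```

Note the constraint the reviewer mentions falls out naturally: because the result is a dict keyed by name, all features must have unique names.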
Patchset #4 (id:80001) has been deleted
Patchset #6 (id:130001) has been deleted
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:53: def _LogZeroish(self, x):
This should also be moved to UnnormalizedLLM. That way it can be based on the same epsilon used there, rather than only checking that it is exactly equal to -inf.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:136: features (list of FeatureValue): the values whose ``reason``
since now we're passing an iterator rather than a list, you should update this to say either "iterator" or "iterable" (the function doesn't really care which it gets).

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:162: return sorted(formatted_reasons,
unrelated to the CL's goals, but I just noticed: This should be changed to ``formatted_reasons.sort(...); return formatted_reasons`` since there's no point in allocating a new list here.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:169: features (list of FeatureValue): the values whose ``changed_files``
ditto

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:200: return sorted(all_changed_files.values(),
ditto.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:147: fs (list of functions): A collection of curried functions ``X -> Y -> A``.
Type of ``fs`` should be an iterable of ``Feature`` objects. And since we're making this a class rather than a generic function, you might as well say ``FeatureValue`` rather than ``A``.

Also, note that if we start with an iterator(Feature), there's no way to share work among the different features. So this shouldn't be the only way we have of constructing a ``FeatureFunction``. In general, a FeatureFunction can be *any* function of type X -> Y -> dict(FeatureValue) [1]. There are a bunch of different ways to construct such a function, not just by gluing together a list of Features (e.g., we could also glue together a list of FeatureFunctions to get a single FeatureFunction; thus FeatureFunction is also what we've been calling a "metafeature"). This is why I had ToFeatureFunction as a standalone function rather than as a class.

[1]: or per crbug.com/680207, any function of type X -> Y -> FeatureName -> FeatureValue. We might possibly want to amend that with a frozenset(FeatureName) so that we know all possible keys of our "lazy dict", since that may come in handy for creating new FeatureFunction objects from old ones; however it is unnecessary for scoring purposes.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:160: ``ToFeatureFunction(fs)(x)(y)[f.name] == f(x)(y)``.
the name "ToFeatureFunction" is no longer used anywhere, so you should rephrase this to: ``FeatureFunction(fs)(x)(y)[i] == fs[i](x)(y)``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: math.fabs(weight) >= epsilon else 0.
The ``if...`` should be moved to after the ``for...in...``; thus, ``{name: weight for name, weight in weights.iteritems() if ...}``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:107: lambda fxy: sum(self.SingleFeatureScore(feature)
Should use ``math.fsum`` whenever adding floats; never use ``sum``.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
N.B., you're not taking advantage of the sparsity of the weights dictionary here; you're still iterating over all the features. See the first part of crbug.com/680207

We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in self._weights.iteritems())``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an np.ndarray anymore. Instead do ``float(len(self.weights))``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
You should explicitly convert this to a list (or tuple) since those have a specified order, rather than leaving it as a set which doesn't.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:81: self._weights = new_weights
To avoid excessive conversion back and forth, this class should store both the dict and np.ndarray. That is, I suggest:

  def __init__(self, ...):
    ...
    self._weights = initial_weights
    self._np_weights = self._DictToNpArray(initial_weights, 0.)

  @property
  def weights(self):
    return self._weights

  @weights.setter
  def weights(self, weights_dict):
    self.ClearWeightBasedMemos()
    self._weights = weights_dict
    self._np_weights = self._DictToNpArray(weights_dict, 0.)

  @property
  def np_weights(self):
    return self._np_weights

  @np_weights.setter
  def np_weights(self, weights_array):
    self.ClearWeightBasedMemos()
    self._weights = self._NpArrayToDict(weights_array, self._LogZeroish)
    self._np_weights = weights_array

  def _DictToNpArray(self, d, default=None):
    return np.array([d.get(i, default) for i in self._feature_order])

  def _NpArrayToDict(self, arr, predicate=None):
    if predicate is None:
      predicate = lambda _: True
    return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
is clearer to use ``fxys.get(feature_name, 0.)``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:115: for feature_name in self._feature_order])
Again, to avoid excessive conversion, we can do this once in the ``weights`` setter and then this method simply returns the cached np.array

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
I don't think that's any cleaner than the old code
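The dict ↔ np.ndarray conversion the reviewer suggests can be exercised as a self-contained sketch. This is modernized to Python 3 (``items()``/``dict`` iteration instead of ``iteritems()``), drops the memo-clearing machinery, and uses made-up feature names; it is an illustration of the round trip, not the actual training code.

```python
# Sketch: store the weight covector as both a dict (sparse, name-keyed) and
# an np.ndarray (dense, for SciPy), keeping the two in sync via a setter.
import numpy as np


class WeightStore(object):

  def __init__(self, initial_weights):
    # Freeze a feature order so the array representation is stable.
    self._feature_order = list(initial_weights)
    self._weights = dict(initial_weights)
    self._np_weights = self._DictToNpArray(initial_weights, 0.)

  @property
  def weights(self):
    return self._weights

  @property
  def np_weights(self):
    return self._np_weights

  @np_weights.setter
  def np_weights(self, weights_array):
    # Updating the array representation also refreshes the dict.
    self._weights = self._NpArrayToDict(weights_array)
    self._np_weights = weights_array

  def _DictToNpArray(self, d, default=None):
    return np.array([d.get(i, default) for i in self._feature_order])

  def _NpArrayToDict(self, arr, predicate=None):
    if predicate is None:
      predicate = lambda _: True
    return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}


# Illustrative feature names only.
store = WeightStore({'min_distance': 1.0, 'top_frame': 2.0})
store.np_weights = store.np_weights * 2.  # both representations stay in sync
print(store.weights)
```

Passing a predicate (e.g. the epsilon test) to ``_NpArrayToDict`` is what would make the dict representation sparse after a training step.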
Patchset #4 (id:100001) has been deleted
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:53: def _LogZeroish(self, x):
On 2017/01/11 20:38:30, wrengr wrote:
> This should also be moved to UnnormalizedLLM. That way it can be based on the
> same epsilon used there, rather than only checking that it is exactly equal to
> -inf.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:136: features (list of FeatureValue): the values whose ``reason``
On 2017/01/11 20:38:30, wrengr wrote:
> since now we're passing an iterator rather than a list, you should update this
> to say either "iterator" or "iterable" (the function doesn't really care which
> it gets).

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:162: return sorted(formatted_reasons,
On 2017/01/11 20:38:30, wrengr wrote:
> unrelated to the CL's goals, but I just noticed: This should be changed to
> ``formatted_reasons.sort(...); return formatted_reasons`` since there's no point
> in allocating a new list here.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:169: features (list of FeatureValue): the values whose ``changed_files``
On 2017/01/11 20:38:30, wrengr wrote:
> ditto

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:200: return sorted(all_changed_files.values(),
On 2017/01/11 20:38:30, wrengr wrote:
> ditto.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:147: fs (list of functions): A collection of curried functions ``X -> Y -> A``.
On 2017/01/11 20:38:30, wrengr wrote:
> Type of ``fs`` should be an iterable of ``Feature`` objects. And since we're
> making this a class rather than a generic function, you might as well say
> ``FeatureValue`` rather than ``A``.
>
> Also, note that if we start with an iterator(Feature), there's no way to share
> work among the different features. So this shouldn't be the only way we have of
> constructing a ``FeatureFunction``. In general, a FeatureFunction can be *any*
> function of type X -> Y -> dict(FeatureValue) [1]. There are a bunch of
> different ways to construct such a function, not just by gluing together a list
> of Features (e.g., we could also glue together a list of FeatureFunctions to get
> a single FeatureFunction; thus FeatureFunction is also what we've been calling a
> "metafeature"). This is why I had ToFeatureFunction as a standalone function
> rather than as a class.
>
> [1]: or per crbug.com/680207, any function of type X -> Y -> FeatureName ->
> FeatureValue. We might possibly want to amend that with a frozenset(FeatureName)
> so that we know all possible keys of our "lazy dict", since that may come in
> handy for creating new FeatureFunction objects from old ones; however it is
> unnecessary for scoring purposes.

Acknowledged.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:160: ``ToFeatureFunction(fs)(x)(y)[f.name] == f(x)(y)``.
On 2017/01/11 20:38:30, wrengr wrote:
> the name "ToFeatureFunction" is no longer used anywhere, so you should rephrase
> this to: ``FeatureFunction(fs)(x)(y)[i] == fs[i](x)(y)``

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: math.fabs(weight) >= epsilon else 0.
On 2017/01/11 20:38:30, wrengr wrote:
> The ``if...`` should be moved to after the ``for...in...``; thus, ``{name:
> weight for name, weight in weights.iteritems() if ...}``

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:107: lambda fxy: sum(self.SingleFeatureScore(feature)
On 2017/01/11 20:38:30, wrengr wrote:
> Should use ``math.fsum`` whenever adding floats; never use ``sum``.

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
On 2017/01/11 20:38:30, wrengr wrote:
> N.B., you're not taking advantage of the sparsity of the weights dictionary
> here; you're still iterating over all the features. See the first part of
> crbug.com/680207
>
> We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in
> self._weights.iteritems())``

That's because I didn't filter those 0 weights in this CL yet; there are some places in training that also need to be changed (while training, it's possible that a zero weight turns into a non-zero one). I'd rather do this optimization in another CL.

Added a TODO.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
On 2017/01/11 20:38:30, wrengr wrote:
> You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an
> np.ndarray anymore. Instead do ``float(len(self.weights))``

Since I haven't filtered 0 weights yet, I just use ``float(len(self.weights) - self.weights.values().count(0.))``

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
On 2017/01/11 20:38:30, wrengr wrote:
> You should explicitly convert this to a list (or tuple) since those have a
> specified order, rather than leaving it as a set which doesn't.

I think the ``.keys()`` will return a list, but explicitly converting to a list should be safer. Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:81: self._weights = new_weights
On 2017/01/11 20:38:31, wrengr wrote:
> To avoid excessive conversion back and forth, this class should store both the
> dict and np.ndarray. That is, I suggest:
>
>   def __init__(self, ...):
>     ...
>     self._weights = initial_weights
>     self._np_weights = self._DictToNpArray(initial_weights, 0.)
>
>   @property
>   def weights(self):
>     return self._weights
>
>   @weights.setter
>   def weights(self, weights_dict):
>     self.ClearWeightBasedMemos()
>     self._weights = weights_dict
>     self._np_weights = self._DictToNpArray(weights_dict, 0.)
>
>   @property
>   def np_weights(self):
>     return self._np_weights
>
>   @np_weights.setter
>   def np_weights(self, weights_array):
>     self.ClearWeightBasedMemos()
>     self._weights = self._NpArrayToDict(weights_array, self._LogZeroish)
>     self._np_weights = weights_array
>
>   def _DictToNpArray(self, d, default=None):
>     return np.array([d.get(i, default) for i in self._feature_order])
>
>   def _NpArrayToDict(self, arr, predicate=None):
>     if predicate is None:
>       predicate = lambda _: True
>     return {k: v for k, v in zip(self._feature_order, arr) if predicate(v)}

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
On 2017/01/11 20:38:30, wrengr wrote:
> is clearer to use ``fxys.get(feature_name, 0.)``

I have to do this, because I need ``fxys[feature_name].value`` instead of ``fxys[feature_name]``.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:115: for feature_name in self._feature_order])
On 2017/01/11 20:38:31, wrengr wrote:
> Again, to avoid excessive conversion, we can do this once in the ``weights``
> setter and then this method simply returns the cached np.array

Done.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
On 2017/01/11 20:38:31, wrengr wrote:
> I don't think that's any cleaner than the old code

This is because we need np_array here instead of a dict.
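The "use ``math.fsum``, never ``sum``" advice that came up above is easy to demonstrate: ``sum`` accumulates a rounding error on every addition, while ``math.fsum`` tracks exact partial sums.

```python
# Why the review insists on math.fsum for adding floats: plain sum() rounds
# at every step, fsum() does not lose precision.
import math

values = [0.1] * 10
print(sum(values))        # accumulates rounding error, not exactly 1.0
print(math.fsum(values))  # exactly 1.0
```

The same applies to the weighted score ``math.fsum(fxy[i].value * weight_i for i, weight_i in weights.items())`` form suggested in the review.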
https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:108: for feature in fxy.itervalues())))
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > N.B., you're not taking advantage of the sparsity of the weights dictionary
> > here; you're still iterating over all the features. See the first part of
> > crbug.com/680207
> >
> > We really want to do ``math.fsum(fxy[i].value * weight_i for i, weight_i in
> > self._weights.iteritems())``
>
> That's because I didn't filter those 0 weights in this cl yet, there are some
> places in training also need to be changed (while training, it's possible that a
> zero weight turns into a non-zero one.), I'd rather do this optimization in
> another cl.
>
> Added a todo.

Yeah, the weights will change during training, but once we set the new weight dict (and clear any weight-dependent memos) none of the methods have any way of noticing the change; they'll keep on working just fine. Since this is the CL that makes the weights be stored as a dict, it makes sense to iterate over self.weights in this CL. Once we do, this method will automatically start taking advantage of the sparsity of the weights whenever another CL adds the filtering for that in; no need to go back and change it again at that point. It's one less thing for the person writing that CL to keep in mind, one less point for the git diff to break, and besides it's a one-line change.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash...
appengine/findit/crash/loglinear/model.py:150: return float(np.count_nonzero(self.weights.itervalues()))
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > You can't use ``np.count_nonzero`` anymore since ``self.weights`` isn't an
> > np.ndarray anymore. Instead do ``float(len(self.weights))``
>
> Since I haven't filtered 0 weights yet, I just use
> float(len(self.weights) - self.weights.values().count(0.))

Testing for exact equality with 0.0 isn't reliable. If self.weights isn't already filtered, then you can use the same ``math.fabs(weight) >= epsilon`` test as in __init__

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:50: self._feature_order = initial_weights.keys()
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > You should explicitly convert this to a list (or tuple) since those have a
> > specified order, rather than leaving it as a set which doesn't.
>
> I think the ``.keys()`` will return a list, but explicitly converting to a list
> should be safer. Done.

So it does. Weird.

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:102: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:30, wrengr wrote:
> > is clearer to use ``fxys.get(feature_name, 0.)``
>
> I have to do this, because I need ``fxys[feature_name].value`` instead of
> ``fxys[feature_name]``

hrm... that does make it tricky. This sort of thing is why I had FeatureValue.__float__ previously; with that we could've done ``float(fxys.get(feature_name, 0.))`` without needing to wrap ``0.`` up in something that provides a ``.value`` property.
I really wish Python had let-bindings in its comprehension syntax. One alternative is to build an intermediate generator, thus: ``return np.array([0. if fxyn is None else fxyn.value for fxyn in (fxys.get(n) for n in self._feature_order)])``. Not sure that that's any cleaner though :(

https://codereview.chromium.org/2617273002/diff/150001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:193: return -self.LogLikelihoodGradient() + l2_penalty * new_weights
On 2017/01/12 01:41:38, Sharu Jiang wrote:
> On 2017/01/11 20:38:31, wrengr wrote:
> > I don't think that's any cleaner than the old code
>
> This is because we need np_array here instead of a dict.

Ah, right <chagrin>
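The two spellings discussed in this thread can be compared side by side. This is a standalone sketch: ``fxys`` is a hypothetical stand-in for the feature-name → FeatureValue dict, and the feature names are made up; the point is extracting ``.value`` with a ``0.`` default for missing features.

```python
# The conditional-expression form vs. the intermediate-generator form from
# the review; both produce the same dense row of feature values.
from collections import namedtuple

FeatureValue = namedtuple('FeatureValue', ['name', 'value'])

feature_order = ['min_distance', 'top_frame', 'touch_crashed_file']
fxys = {'min_distance': FeatureValue('min_distance', 0.5),
        'top_frame': FeatureValue('top_frame', 1.0)}

# Original spelling: conditional expression inside the comprehension.
row = [fxys[name].value if name in fxys else 0. for name in feature_order]

# Alternative from the review: an intermediate generator of .get() results.
row2 = [0. if fv is None else fv.value
        for fv in (fxys.get(name) for name in feature_order)]

print(row)  # [0.5, 1.0, 0.0]
```

Both avoid wrapping ``0.`` in something that has a ``.value`` attribute, which is why a plain ``fxys.get(name, 0.)`` doesn't work here.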
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:75: return self.RankAndFilterSuspects(annotated_report, suspects)
I don't like that name change. IMO, filtering is already part of "ranking". (This is one way "ranking" differs from "sorting", which mustn't drop any elements.)

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property.
The input functions don't return a dict(FV), they just return one FV. With that change, this entire description is exactly the description of what the ``Feature`` class is capturing. For the current interface, the collection of inputs is either:

(a) a dict of X->Y->FV functions. The key gives the name, and we will automatically convert the raw function into a Feature under the covers.

(b) an iterable of Feature objects. Here we use an iterable so we can't get the name from there; thus we require a Feature object since that specifies both the X->Y->FV and the name.

Pick one or the other and run with it.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:156: """Fuction mapping ``X -> Y -> dict(A.name to A).
The ``A`` should be ``FeatureValue`` here as well

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:159: A function ``X -> Y -> dict(A.name to A)`` where for all ``x``, ``y``, and
ditto

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash...
File appengine/findit/crash/loglinear/model.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: # weight covector.
It is trivial to do that now. Instead of saying ``{k: (v if b else 0) for k, v in d.iteritems()}`` which artificially avoids making things sparse, just say ``{k: v for k, v in d.iteritems() if b}``. The latter is exactly the same as filtering the former to remove all the items where you chose the else-branch.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:117: preprocessing (like filtering or truncating) applied.
I don't understand this sentence. If it was from the old code, I don't think it applies anymore.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:239: features (literable of FeatureValue): the values whose ``reason``
"literable" -> "iterable"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/model_test.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/model_test.py:21: self._weights, 0.1)
Do you really want so large of an epsilon? Why not stick with the default? (We do want to test that the default behaves reasonably, otherwise we should choose a different default)

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/training_test.py (left):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/training_test.py:38: def testWeightsSetterShapeMismatch(self):
We probably still want something similar to this.
SciPy shouldn't ever pass us the wrong shape, but we do want to make sure and raise an error if the setter ever does get the wrong shape. We shouldn't just silently accept the wrong shape, since getting it is indicative of a bug in the calling code.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:41: to converge.
Should add a note that the dictionary should not be sparse, since we will only train on the features whose names are keys in this dict.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:51: self._np_weights = None
I think it'd be clearer to set the _weights and _np_weights explicitly, rather than setting this to None and then invoking the setter. Also, using the setter introduces unnecessary inefficiency. E.g., it means computing ``self.NumPyArrayWeightsToWeights(self.WeightsToNumPyArrayWeights(initial_weights))`` which is unnecessary since that is equal to ``initial_weights``. Also it means calling ClearWeightBasedMemos which is an expensive noop at this point.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:60: """The NumPy Array of weight covector.
"of weight covector" -> "of the weight covector"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:64: the semantics rather than the implementation details.
This paragraph should be deleted, since the whole point of this property is to provide the np.ndarray

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:70: """Mutate the np array weight covector, and clear memos as necessary.
It's fine to leave off the "np array" part, since we're updating both representations of the weight covector.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:89: Note, the features nparray should have the same order as in
"nparray" -> "np array"

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:105: return np.array([fxys[feature_name].value if feature_name in fxys else 0.
If feature_name is not in fxys then that should be an error. It's okay to have weights be missing/zero; it's not okay to have features be missing.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:110: def WeightsAsNumPyArray(self):
This is unnecessary. Callers should use the ``np_weights`` property directly. The paragraph below about why we have this should be moved to the docstring for the property

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:119: def NumPyArrayWeightsToWeights(self, np_weights):
This "weights to weights" name is confusing. Also, should be private since this is only needed for internal use and isn't being provided for client code to use.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:141: def WeightsToNumPyArrayWeights(self, weights, default=0.):
again, name is confusing and should also be private. Both these methods can be made to work for any np.ndarray with len(self._feature_order) elements. If they are made generic (in which case "weight" shouldn't be in the name), then it makes sense to have a default argument. But if they're kept weight-specific, then the default value should always be 0.0 and shouldn't be overridable.
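The sparse-weight filtering the reviewers keep circling back to (drop near-zero weights with an epsilon test rather than comparing floats to 0.0 exactly) can be sketched in a few lines. Python 3 sketch with an illustrative default epsilon and made-up feature names.

```python
# Epsilon-based filtering of the weight dict: keep only weights whose
# magnitude is at least epsilon, instead of testing ``w == 0.`` exactly.
import math


def FilterWeights(weights, epsilon=0.00001):
  """Returns a sparse copy of ``weights`` with near-zero entries dropped."""
  return {name: w for name, w in weights.items()
          if math.fabs(w) >= epsilon}


weights = {'min_distance': 1.0, 'top_frame': 1e-9, 'touch_crashed_file': -0.5}
filtered = FilterWeights(weights)
print(filtered)            # top_frame is dropped
print(float(len(filtered)))  # the "l0 norm" then becomes just len()
```

Once the dict is filtered this way, the ``float(len(self.weights))`` form suggested for model.py:150 counts exactly the non-zero weights.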
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/changelist_classifier.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/changelist_classifier.py:75: return self.RankAndFilterSuspects(annotated_report, suspects)
On 2017/01/12 19:09:09, wrengr wrote:
> I don't like that name change. IMO, filtering is already part of "ranking".
> (This is one way "ranking" differs from "sorting", which mustn't drop any
> elements.)

Makes sense, changed back.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right):

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property.
On 2017/01/12 19:09:09, wrengr wrote:
> The input functions don't return a dict(FV), they just return one FV. With that
> change, this entire description is exactly the description of what the
> ``Feature`` class is capturing. For the current interface, the collection of
> inputs is either:
>
> (a) a dict of X->Y->FV functions. The key gives the name, and we will
> automatically convert the raw function into a Feature under the covers.
>
> (b) an iterable of Feature objects. Here we use an iterable so we can't get the
> name from there; thus we require a Feature object since that specifies both the
> X->Y->FV and the name.
>
> Pick one or the other and run with it.

Oops, I will stick to option (b).

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:156: """Fuction mapping ``X -> Y -> dict(A.name to A).
On 2017/01/12 19:09:09, wrengr wrote:
> The ``A`` should be ``FeatureValue`` here as well

Done.

https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash...
appengine/findit/crash/loglinear/feature.py:159: A function ``X -> Y -> dict(A.name to A)`` where for all ``x``, ``y``, and On 2017/01/12 19:09:09, wrengr wrote: > ditto Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:66: # weight covector. On 2017/01/12 19:09:09, wrengr wrote: > It is trivial to do that now. Instead of saying ``{k: (v if b else 0) for k, v > in d.iteritems()}` which artificially avoids making things sparse, just say > ``{k: v for k, v in d.iteritems() if b}``. The latter is exactly the same as > filtering the former to remove all the items where you chose the else-branch. This one is trivial, but there are some other places which need to be changed; I can change it if you want. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:117: preprocessing (like filtering or truncating) applied. On 2017/01/12 19:09:09, wrengr wrote: > I don't understand this sentence. If it was from the old code, I don't think it > applies anymore. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:239: features (literable of FeatureValue): the values whose ``reason`` On 2017/01/12 19:09:09, wrengr wrote: > "literable" -> "iterable" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/model_test.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/model_test.py:21: self._weights, 0.1) On 2017/01/12 19:09:09, wrengr wrote: > Do you really want so large of an epsilon? Why not stick with the default?
(We > do want to test that the default behaves reasonably, otherwise we should choose > a different default) no, just picked a random number. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/test/training_test.py (left): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/test/training_test.py:38: def testWeightsSetterShapeMismatch(self): On 2017/01/12 19:09:09, wrengr wrote: > We probably still want something similar to this. SciPy shouldn't ever pass us > the wrong shape, but we do want to make sure and raise an error if the setter > ever does get the wrong shape. We shouldn't just silently accept the wrong > shape, since getting it is indicative of a bug in the calling code. Oops, forgot to add this back. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:41: to converge. On 2017/01/12 19:09:10, wrengr wrote: > Should add a note that the dictionary should not be sparse, since we will only > train on the features whose names are keys in this dict. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:51: self._np_weights = None On 2017/01/12 19:09:10, wrengr wrote: > I think it'd be clearer to set the _weights and _np_weights explicitly, rather > than setting this to None and then invoking the setter. > > Also, using the setter introduces unnecessary inefficiency. E.g., it means > computing > ``self.NumPyArrayWeightsToWeights(self.WeightsToNumPyArrayWeights(initial_weights))`` > which is unnecessary since that is equal to ``initial_weights``. Also it means > calling ClearWeightBasedMemos which is an expensive noop at this point. Done. 
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:60: """The NumPy Array of weight covector. On 2017/01/12 19:09:10, wrengr wrote: > "of weight covector" -> "of the weight covector" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:64: the semantics rather than the implementation details. On 2017/01/12 19:09:10, wrengr wrote: > This paragraph should be deleted, since the whole point of this property is to > provide the np.ndarray Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:70: """Mutate the np array weight covector, and clear memos as necessary. On 2017/01/12 19:09:10, wrengr wrote: > It's fine to leave off the "np array" part, since we're updating both > representations of the weight covector. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:89: Note, the features nparray should have the same order as in On 2017/01/12 19:09:10, wrengr wrote: > "nparray" -> "np array" Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:105: return np.array([fxys[feature_name].value if feature_name in fxys else 0. On 2017/01/12 19:09:10, wrengr wrote: > If feature_name is not in fxys then that should be an error. It's okay to have > weights be missing/zero; it's not okay to have features be missing. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:110: def WeightsAsNumPyArray(self): On 2017/01/12 19:09:10, wrengr wrote: > This is unnecessary. Callers should use the ``np_weights`` property directly. > The paragraph below about why we have this should be moved to the docstring for > the property Done. 
https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:119: def NumPyArrayWeightsToWeights(self, np_weights): On 2017/01/12 19:09:10, wrengr wrote: > This "weights to weights" name is confusing. Also, should be private since this > is only needed for internal use and isn't being provided for client code to use. Done. https://codereview.chromium.org/2617273002/diff/170001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:141: def WeightsToNumPyArrayWeights(self, weights, default=0.): On 2017/01/12 19:09:10, wrengr wrote: > again, name is confusing and should also be private. > > Both these methods can be made to work for any np.ndarray with > len(self._feature_order) elements. If they are made generic (in which case > "weight" shouldn't be in the name), then it makes sense to have a default > argument. But if they're kept weight-specific, then the default value should > always be 0.0 and shouldn't be overridable. Done.
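Pulling the setter-related comments together (clear memos on mutation, and raise on a shape mismatch rather than silently accepting it), a sketch of the `np_weights` property might look like the following. The names `WeightedModelSketch` and `_memos` are hypothetical; this is not the actual training.py code, just an illustration of the behavior agreed on in the review.

```python
import numpy as np


class WeightedModelSketch(object):
  """Sketch of an np_weights setter that validates shape and clears memos."""

  def __init__(self, feature_order, initial_np_weights):
    self._feature_order = list(feature_order)
    self._np_weights = np.asarray(initial_np_weights, dtype=float)
    self._memos = {}  # weight-based caches, invalidated on weight changes

  @property
  def np_weights(self):
    """The np.ndarray representation of the weight covector."""
    return self._np_weights

  @np_weights.setter
  def np_weights(self, new_weights):
    new_weights = np.asarray(new_weights, dtype=float)
    # SciPy shouldn't ever pass the wrong shape, but silently accepting
    # one would hide a bug in the caller, so raise instead.
    if new_weights.shape != (len(self._feature_order),):
      raise ValueError('expected shape %r, got %r'
                       % ((len(self._feature_order),), new_weights.shape))
    self._np_weights = new_weights
    self._memos.clear()  # any weight-based memo is now stale
```

Note this also reflects the earlier comment about the constructor: assigning `self._np_weights` directly in `__init__` avoids a round-trip conversion and an expensive no-op memo clear that routing through the setter would incur.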
katesonia@chromium.org changed reviewers: + inferno@chromium.org, mbarbella@chromium.org
PTAL
robertocn@chromium.org changed reviewers: + robertocn@chromium.org
https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:144: """Given a dict of scalar-valued functions, return an dict-valued function. Should it be ``Given an iterable of scalar-valued, named functions...``?
https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/190001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:144: """Given a dict of scalar-valued functions, return an dict-valued function. On 2017/01/19 23:45:30, RobertoCN wrote: > Should it be ``Given an iterable of scalar-valued, named functions...``? Right, forgot to update. Done.
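Option (b) from the earlier discussion — an iterable of named `Feature` objects combined into one dict-valued function — could be sketched as below. The `Feature` and `CombineFeatures` names and the exact currying are illustrative assumptions, not the actual feature.py implementation; feature values here are plain numbers rather than real `FeatureValue` objects.

```python
class Feature(object):
  """Sketch of a named feature: a callable X -> Y -> FeatureValue."""

  def __init__(self, name, fn):
    self._name = name
    self._fn = fn  # curried function: fn(x)(y) -> feature value

  @property
  def name(self):
    return self._name

  def __call__(self, x):
    return self._fn(x)


def CombineFeatures(features):
  """Given an iterable of named Features, return a dict-valued function
  ``X -> Y -> dict(name to FeatureValue)``, per option (b) above."""
  features = list(features)

  def combined(x):
    # Apply the X argument once per feature, then close over the results.
    partial = [(f.name, f(x)) for f in features]

    def in_y(y):
      return {name: fy(y) for name, fy in partial}
    return in_y
  return combined
```

Because every dict key comes from a `Feature.name`, this also makes concrete the constraint noted earlier in the review: all features must have unique names.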
chanli@chromium.org changed reviewers: + chanli@chromium.org
https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:115: return isinstance(weight, float) and math.fabs(weight) >= self._epsilon In your CL https://codereview.chromium.org/2625073003/, appengine/findit/crash/loglinear/weight.py, DropZeroWeights you used '<=' for zero value, but here you use '>=' for NonZeroWeight?
lgtm once open comments are addressed
https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/210001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:115: return isinstance(weight, float) and math.fabs(weight) >= self._epsilon On 2017/01/23 01:56:39, chanli wrote: > In your CL https://codereview.chromium.org/2625073003/, > appengine/findit/crash/loglinear/weight.py, DropZeroWeights you used '<=' for > zero value, but here you use '>=' for NonZeroWeight? Done.
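The `<=` vs `>=` inconsistency flagged above matters at the boundary: with `<=` for "zero" in one file and `>=` for "nonzero" in another, a weight exactly at epsilon would satisfy both predicates. One way to keep the two call sites consistent is to define one predicate as the exact negation of the other, as in this sketch (the function names and the `EPSILON` value are hypothetical):

```python
import math

EPSILON = 0.00001  # hypothetical default threshold


def IsZeroWeight(weight, epsilon=EPSILON):
  """A weight within epsilon of zero is treated as zero (droppable)."""
  return math.fabs(weight) < epsilon


def IsNonZeroWeight(weight, epsilon=EPSILON):
  """Exact complement of IsZeroWeight, so the two call sites always agree."""
  return not IsZeroWeight(weight, epsilon)
```

With this structure, no weight can be classified as both zero and nonzero, regardless of where the inclusive/exclusive boundary is drawn.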
The CQ bit was checked by katesonia@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mbarbella@chromium.org Link to the patchset: https://codereview.chromium.org/2617273002/#ps230001 (title: "Address comment.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 230001, "attempt_start_ts": 1485208441868010,
"parent_rev": "759bc78c3e24c998aee118524e02f9e2d2ff5479", "commit_rev":
"27163f40f7f9442d53e3a540f7945755a49e2bcf"}
Message was sent while issue was closed.
Description was changed from ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org ========== to ========== [Predator] Move ``SingleFeatureScore`` to LLM. BUG=674262 TBR=stgao@chromium.org Review-Url: https://codereview.chromium.org/2617273002 Committed: https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:230001) as https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945...
Message was sent while issue was closed.
On 2017/01/23 22:09:50, commit-bot: I haz the power wrote: > Committed patchset #9 (id:230001) as > https://chromium.googlesource.com/infra/infra/+/27163f40f7f9442d53e3a540f7945... Committed it since many CLs depend on it; feel free to leave comments if you have any concerns :)
Message was sent while issue was closed.
lgtm, with a few comments https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/feature.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/feature.py:150: have a name property. I still say it'd be clearer to refer to the ``Feature`` interface rather than calling it a "function with a name property". https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/model.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/model.py:65: self._epsilon = epsilon Can simplify this conditional to ``self._epsilon = epsilon or EPSILON`` https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2617273002/diff/230001/appengine/findit/crash... appengine/findit/crash/loglinear/training.py:52: # Use self._weights instead of initialz_weights, since self._weights already "initialz" -> "initial" |
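The suggested `self._epsilon = epsilon or EPSILON` simplification above behaves the same as the explicit `None` check, with one caveat worth noting: `or` also treats an explicit `0.0` as falsy and falls back to the default. That's harmless here since a zero threshold isn't meaningful, but the sketch below (class name and `EPSILON` value are hypothetical) makes the behavior explicit:

```python
EPSILON = 0.00001  # hypothetical module-level default


class ModelSketch(object):
  def __init__(self, epsilon=None):
    # Equivalent to `EPSILON if epsilon is None else epsilon`, except that
    # an explicit epsilon of 0.0 also falls back to the default; acceptable
    # here since a zero threshold isn't a meaningful setting.
    self._epsilon = epsilon or EPSILON
```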
