Chromium Code Reviews

Side by Side Diff: appengine/findit/crash/loglinear/model.py

Issue 2617273002: [Predator] Move ``SingleFeatureScore`` to LLM. (Closed)
Patch Set: . Created 3 years, 11 months ago
# Copyright 2016 The Chromium Authors. All rights reserved.
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.

# TODO(http://crbug.com/669639): there are lots of ways to make the code
# in this file better. We avoid having separate todos per task; instead
# see that meta-issue ticket.

import math
import numpy as np
(...skipping 42 matching lines...)
  where particular values for that random variable called ``y``. The
  partition function is called ``Z``. And (somewhat non-conventionally)
  we will call the log of the partition function ``zeta``.

  This class is distinct from ``LogLinearModel`` in that we do not require
  a specification of ``Y``. This class is sufficient for determining
  which ``y`` is the most likely, though it can only return a "score"
  rather than returning a probability per se.
  """

-  def __init__(self, feature_function, weights, epsilon=None):
+  def __init__(self, feature_function, weights, feature_to_weight,
+               epsilon=None):
wrengr 2017/01/09 19:31:51 given the dict of weights the list is redundant. F
    """Construct a new model with the given weights and feature function.

    Args:
      feature_function: A function ``X -> Y -> list(FeatureValue)``. N.B.,
        for all ``x`` and ``y`` the length of ``feature_function(x)(y)``
        must be the same as the length of ``weights``.
      weights (list of float): coefficients for how important we consider
        each component of the feature vector for deciding which ``y``
        to blame.
+      feature_to_weight (dict of float): the weights for the features. The
+        keys of the dictionary are the names of the features the weights
+        are for. We take this argument as a dict rather than as a list so
+        that callers needn't worry about what order to provide the weights
+        in.
      epsilon (float): The absolute-error threshold for considering a
        weight to be "equal to zero". N.B., this should be a positive
        number, as we will compare it against the absolute value of
        each weight.
    """
    if epsilon is None:
      epsilon = EPSILON
+
+    self._feature_to_weight = feature_to_weight
    self._weights = np.array([
wrengr 2017/01/09 19:31:52 Once we no longer take the list of weights, this w
        w if isinstance(w, float) and math.fabs(w) >= epsilon else 0.
        for w in weights])

    self._quadrance = None

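Reviewer note: a minimal sketch of the weight thresholding the constructor performs, extracted as a standalone function. The ``EPSILON`` default here is an assumption; the real constant is defined elsewhere in this module.

```python
import math

import numpy as np

# Assumed default; the real EPSILON lives elsewhere in this file.
EPSILON = 0.00001


def ThresholdWeights(weights, epsilon=EPSILON):
  """Zeroes out any weight whose magnitude is below ``epsilon``,
  mirroring what the constructor does to ``self._weights``.
  Non-float entries are also coerced to 0., as in the original."""
  return np.array([
      w if isinstance(w, float) and math.fabs(w) >= epsilon else 0.
      for w in weights])
```

For example, ``ThresholdWeights([0.5, 1e-9, -0.3])`` keeps the large-magnitude weights and snaps the negligible one to zero.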
    # TODO(crbug.com/674752): we need better names for ``self._features``.
    def _Features(x):
      """Wrap ``feature_function`` to memoize things and ensure types.

      This outer wrapping takes each ``x`` to a memoized instance of
(...skipping 23 matching lines...)
    # TODO(crbug.com/674752): we need better names for ``self._scores``.
    # N.B., this is just the inner product of ``self.weights``
    # against ``self._features(x)``. If we can compute this in some
    # more efficient way, we should. In particular, we will want to
    # make the weights sparse, in which case we need to use a sparse
    # variant of the dot product.
    self._scores = MemoizedFunction(lambda x:
        self._features(x).map(lambda fxy:
            self.weights.dot(np.array([feature.value for feature in fxy]))))

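Reviewer note: the comment above describes the score as an inner product; a tiny sketch with hypothetical numbers makes the computation concrete.

```python
import numpy as np

# Hypothetical weight covector and feature-value vector for one (x, y) pair.
weights = np.array([0.5, 0.0, 2.0])
feature_values = np.array([1.0, 3.0, 0.25])

# The score is just the dot product; the zero weight silently drops its
# feature, which is why sparse weights would pay off here.
score = weights.dot(feature_values)  # 0.5*1.0 + 0.0*3.0 + 2.0*0.25
```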
+  def SingleFeatureScore(self, feature_value):
+    """Returns the score (aka weighted value) of a ``FeatureValue``.
+
+    This function assumes the report's stacktrace has already had any
+    necessary preprocessing (like filtering or truncating) applied.
+
+    Args:
+      feature_value (FeatureValue): the feature value to check.
+
+    Returns:
+      The score of the feature value.
+    """
+    return feature_value.value * self._feature_to_weight.get(
+        feature_value.name, 0.)
+
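Reviewer note: a self-contained sketch of what the new ``SingleFeatureScore`` does, with a hypothetical ``FeatureValue`` stand-in (the real class lives elsewhere in this package) and a made-up feature name.

```python
from collections import namedtuple

# Hypothetical stand-in for the real FeatureValue type.
FeatureValue = namedtuple('FeatureValue', ['name', 'value'])

# Hypothetical weights keyed by feature name, as the new dict argument is.
feature_to_weight = {'MinDistance': 2.0}


def SingleFeatureScore(feature_value):
  """Weighted value of one feature; a feature with no weight scores zero."""
  return feature_value.value * feature_to_weight.get(feature_value.name, 0.)
```

The dict lookup with a 0. default is what lets callers omit weights for features they don't care about.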
  def ClearWeightBasedMemos(self):
    """Clear all the memos that depend on the weight covector."""
    self._quadrance = None
    self._scores.ClearMemos()

  def ClearAllMemos(self):
    """Clear all memos, even those independent of the weight covector."""
    self.ClearWeightBasedMemos()
    self._features.ClearMemos()

(...skipping 77 matching lines...)
    return self._scores(x)

class LogLinearModel(UnnormalizedLogLinearModel):
  """A loglinear probability model.

  This class is distinct from ``UnnormalizedLogLinearModel`` in that
  we can provide probabilities (not just scores). However, to do so we
  require a specification of the subsets of ``Y`` for each ``x``.
  """
-  def __init__(self, Y_given_X, feature_function, weights, epsilon=None):
+  def __init__(self, Y_given_X, feature_function, weights, feature_to_weight,
+               epsilon=None):
    """Construct a new probabilistic model.

    Args:
      Y_given_X: a function from ``X`` to an iterable object giving the
        subset of ``Y`` which has non-zero probability given the
        ``x``. When in doubt about whether some ``y`` has zero probability
        or not, it is always safe/correct to return a larger subset of
        ``Y`` (it will just take more computation time). This is
        needed for computing the partition function and expectation. N.B.,
        we do not actually need to know/enumerate *all* of ``Y``,
        only the subsets for each ``x``.
      feature_function: A function ``X -> Y -> list(float)``. N.B.,
        for all ``x`` and ``y`` the length of ``feature_function(x)(y)``
        must be the same as the length of ``weights``.
      weights (list of float): coefficients for how important we consider
        each component of the feature vector for deciding which ``y``
        to blame.
wrengr 2017/01/09 19:31:52 documentation wasn't updated; though see my previo
      epsilon (float): The absolute-error threshold for considering a
        weight to be "equal to zero". N.B., this should be a positive
        number, as we will compare it against the absolute value of
        each weight.
    """
-    super(LogLinearModel, self).__init__(feature_function, weights, epsilon)
+    super(LogLinearModel, self).__init__(feature_function, weights,
+                                         feature_to_weight, epsilon)

    self._Y = Y_given_X

    def _LogZ(x):
      score_given_x = self._scores(x)
      return logsumexp([score_given_x(y) for y in self._Y(x)])
    self._zetas = MemoizedFunction(_LogZ)

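Reviewer note: ``_LogZ`` computes ``zeta``, the log of the partition function described in the class docstring. A minimal sketch with a local ``logsumexp`` stand-in (the module imports the real one) shows how subtracting ``zeta`` turns unnormalized scores into probabilities.

```python
import numpy as np


def LogSumExp(xs):
  """Numerically stable log(sum(exp(xs))); stand-in for the imported
  ``logsumexp``."""
  m = max(xs)
  return m + np.log(np.sum(np.exp(np.array(xs) - m)))


# Hypothetical unnormalized scores for the candidate ``y`` of some ``x``.
scores = [1.0, 2.0, 3.0]
zeta = LogSumExp(scores)  # zeta = log Z
# exp(score - zeta) = exp(score) / Z, a proper probability.
probabilities = [np.exp(s - zeta) for s in scores]
```

Subtracting the max inside ``LogSumExp`` avoids overflow when scores are large, which is why the code works in log space at all.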
  def ClearWeightBasedMemos(self):
    """Clear all the memos that depend on the weight covector."""
(...skipping 64 matching lines...)
    function returns; rather, it's a sort of average of all the results
    returned. For more information you can take a look at Wikipedia
    <https://en.wikipedia.org/wiki/Expected_value>.
    """
    prob_given_x = self.Probability(x)
    # N.B., the ``*`` below is vector scaling! If we want to make this
    # method polymorphic in the return type of ``f`` then we'll need an
    # API that provides both scaling and ``vsum``.
    return vsum([prob_given_x(y) * f(y) for y in self._Y(x)])
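Reviewer note: a small sketch of the expectation being computed, with a hypothetical three-point distribution; plain ``sum`` plays the role of ``vsum`` and ``*`` is the scaling the comment warns about.

```python
import numpy as np

# Hypothetical distribution over three candidate ``y`` for one ``x``.
ys = [0, 1, 2]
probability = {0: 0.2, 1: 0.3, 2: 0.5}


def Expectation(f):
  """E[f(y)] = sum over y of P(y|x) * f(y)."""
  return sum(probability[y] * np.asarray(f(y)) for y in ys)
```

For instance, ``Expectation(lambda y: y)`` gives the mean 0*0.2 + 1*0.3 + 2*0.5 = 1.3, and ``f`` may just as well return vectors, which is the polymorphism the comment is after.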
