| OLD | NEW |
| 1 # Copyright 2016 The Chromium Authors. All rights reserved. | 1 # Copyright 2016 The Chromium Authors. All rights reserved. |
| 2 # Use of this source code is governed by a BSD-style license that can be | 2 # Use of this source code is governed by a BSD-style license that can be |
| 3 # found in the LICENSE file. | 3 # found in the LICENSE file. |
| 4 | 4 |
| 5 # TODO(http://crbug.com/669639): there are lots of ways to make the code | 5 # TODO(http://crbug.com/669639): there are lots of ways to make the code |
| 6 # in this file better. We avoid having separate todos per task; instead | 6 # in this file better. We avoid having separate todos per task; instead |
| 7 # see that meta-issue ticket. | 7 # see that meta-issue ticket. |
| 8 | 8 |
| 9 import math | 9 import math |
| 10 import numpy as np | 10 import numpy as np |
| (...skipping 33 matching lines...) |
| 44 give to each feature from the feature itself, and (b) they allow us | 44 give to each feature from the feature itself, and (b) they allow us |
| 45 to keep adding features at will without needing to worry about the | 45 to keep adding features at will without needing to worry about the |
| 46 fiddly details needed for it to be a probability model. | 46 fiddly details needed for it to be a probability model. |
| 47 | 47 |
| 48 Throughout the documentation we use the following conventional | 48 Throughout the documentation we use the following conventional |
| 49 terminology. The independent variable (aka: the given data which | 49 terminology. The independent variable (aka: the given data which |
| 50 sets up the classification problem) is called ``X``, where particular | 50 sets up the classification problem) is called ``X``, where particular |
| 51 values for that variable are called ``x``. The dependent variable (aka: | 51 values for that variable are called ``x``. The dependent variable (aka: |
| 52 the answers/labels returned by classification) is called ``Y``, | 52 the answers/labels returned by classification) is called ``Y``, |
| 53 where particular values for that random variable are called ``y``. The | 53 where particular values for that random variable are called ``y``. The |
| 54 partition function is called ``Z``. And, somewhat non-conventionally, | 54 partition function is called ``Z``. And (somewhat non-conventionally) |
| 55 we will call the log of the partition function ``zeta``. | 55 we will call the log of the partition function ``zeta``. |
| 56 | 56 |
| 57 This class is distinct from ``LogLinearModel`` in that we do not require | 57 This class is distinct from ``LogLinearModel`` in that we do not require |
| 58 a specification of ``Y``. This class is sufficient for determining | 58 a specification of ``Y``. This class is sufficient for determining |
| 59 which ``y`` is the most likely, though it can only return a "score" | 59 which ``y`` is the most likely, though it can only return a "score" |
| 60 rather than returning a probability per se. | 60 rather than returning a probability per se. |
| 61 """ | 61 """ |
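A toy sketch of the scoring this docstring describes (the unnormalized log-linear score is just the dot product of the weight covector with the feature vector; the feature function and weights below are made up for illustration):

```python
# Hypothetical two-feature setup: the first feature depends only on x,
# the second fires only for the label "a".
def feature_function(x):
    return lambda y: [float(len(x)), 1.0 if y == "a" else 0.0]

weights = [0.5, 2.0]

def score(x, y):
    # Log-domain unnormalized score: weights . features(x, y).
    return sum(w * f for w, f in zip(weights, feature_function(x)(y)))

print(score("xx", "a"))  # 0.5*2 + 2.0*1 = 3.0
print(score("xx", "b"))  # 0.5*2 + 2.0*0 = 1.0
```

Because only score differences matter for picking the most likely ``y``, this is enough for classification even without the partition function.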
| 62 | 62 |
| 63 def __init__(self, feature_function, weights, epsilon=None): | 63 def __init__(self, feature_function, weights, epsilon=None): |
| 64 """Construct a new model with the given weights and feature function. | 64 """Construct a new model with the given weights and feature function. |
| (...skipping 144 matching lines...) |
| 209 A ``MemoizedFunction`` of type ``Y -> float``. | 209 A ``MemoizedFunction`` of type ``Y -> float``. |
| 210 """ | 210 """ |
| 211 return self._scores(x) | 211 return self._scores(x) |
| 212 | 212 |
| 213 | 213 |
| 214 class LogLinearModel(UnnormalizedLogLinearModel): | 214 class LogLinearModel(UnnormalizedLogLinearModel): |
| 215 """A loglinear probability model. | 215 """A loglinear probability model. |
| 216 | 216 |
| 217 This class is distinct from ``UnnormalizedLogLinearModel`` in that | 217 This class is distinct from ``UnnormalizedLogLinearModel`` in that |
| 218 we can provide probabilities (not just scores). However, to do so we | 218 we can provide probabilities (not just scores). However, to do so we |
| 219 require a specification of the entire set ``Y``. | 219 require a specification of the subsets of ``Y`` for each ``x``. |
| 220 """ | 220 """ |
| 221 def __init__(self, Y, feature_function, weights, epsilon=None): | 221 def __init__(self, Y_given_X, feature_function, weights, epsilon=None): |
| 222 """Construct a new probabilistic model. | 222 """Construct a new probabilistic model. |
| 223 | 223 |
| 224 Args: | 224 Args: |
| 225 Y (iterable): the entire range of values for the independent | 225 Y_given_X: a function from ``X`` to an iterable object giving the |
| 226 variable. This is needed for computing the partition function. | 226 subset of ``Y`` which has non-zero probability given the |
| 227 ``x``. When in doubt about whether some ``y`` has zero probability |
| 228 or not, it is always safe/correct to return a larger subset of |
| 229 ``Y`` (it'll just take more computation time). This is |
| 230 needed for computing the partition function and expectation. N.B., |
| 231 we do not actually need to know/enumerate *all* of ``Y``, |
| 232 only the subsets for each ``x``. |
| 227 feature_function: A function ``X -> Y -> list(float)``. N.B., | 233 feature_function: A function ``X -> Y -> list(float)``. N.B., |
| 228 for all ``x`` and ``y`` the length of ``feature_function(x)(y)`` | 234 for all ``x`` and ``y`` the length of ``feature_function(x)(y)`` |
| 229 must be the same as the length of ``weights``. | 235 must be the same as the length of ``weights``. |
| 230 weights (list of float): coefficients for how important we consider | 236 weights (list of float): coefficients for how important we consider |
| 231 each component of the feature vector for deciding which ``y`` | 237 each component of the feature vector for deciding which ``y`` |
| 232 to blame. | 238 to blame. |
| 233 epsilon (float): The absolute-error threshold for considering a | 239 epsilon (float): The absolute-error threshold for considering a |
| 234 weight to be "equal to zero". N.B., this should be a positive | 240 weight to be "equal to zero". N.B., this should be a positive |
| 235 number, as we will compare it against the absolute value of | 241 number, as we will compare it against the absolute value of |
| 236 each weight. | 242 each weight. |
| 237 """ | 243 """ |
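A minimal sketch of the ``Y_given_X`` contract described in the Args above (the names and the length-based restriction are hypothetical):

```python
# Only labels no longer than x are deemed possible here; returning a
# superset (even all of Y) would also be correct, just slower.
def Y_given_X(x):
    return [y for y in ("a", "ab", "abc") if len(y) <= len(x)]

print(Y_given_X("xy"))  # ['a', 'ab']
```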
| 238 super(LogLinearModel, self).__init__(feature_function, weights, epsilon) | 244 super(LogLinearModel, self).__init__(feature_function, weights, epsilon) |
| 239 | 245 |
| 240 self._Y = frozenset(Y) | 246 self._Y = Y_given_X |
| 241 | 247 |
| 242 def _LogZ(x): | 248 def _LogZ(x): |
| 243 score_given_x = self._scores(x) | 249 score_given_x = self._scores(x) |
| 244 return logsumexp([score_given_x(y) for y in self._Y]) | 250 return logsumexp([score_given_x(y) for y in self._Y(x)]) |
| 245 self._zetas = MemoizedFunction(_LogZ) | 251 self._zetas = MemoizedFunction(_LogZ) |
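``_LogZ`` leans on ``logsumexp`` (imported in the elided lines above) to compute ``zeta = log Z`` without leaving the log domain; a self-contained version of that standard trick, shown here for reference, shifts by the max score so the exponentials cannot overflow:

```python
import math

def logsumexp(scores):
    # log(sum(exp(s))) computed stably: exp(s - m) <= 1 for every s.
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

# Naive math.exp(1000.0) overflows; the shifted form does not.
print(logsumexp([1000.0, 1000.0]))  # 1000.0 + log(2) ~= 1000.693
```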
| 246 | 252 |
| 247 def ClearWeightBasedMemos(self): | 253 def ClearWeightBasedMemos(self): |
| 248 """Clear all the memos that depend on the weight covector.""" | 254 """Clear all the memos that depend on the weight covector.""" |
| 249 super(LogLinearModel, self).ClearWeightBasedMemos() | 255 super(LogLinearModel, self).ClearWeightBasedMemos() |
| 250 self._zetas.ClearMemos() | 256 self._zetas.ClearMemos() |
| 251 | 257 |
| 252 def Z(self, x): | 258 def Z(self, x): |
| 253 """The normal-domain conditional partition-function given ``x``. | 259 """The normal-domain conditional partition-function given ``x``. |
| 254 | 260 |
| (...skipping 56 matching lines...) |
| 311 An ``np.ndarray`` of ``float`` called the "expected value". N.B., | 317 An ``np.ndarray`` of ``float`` called the "expected value". N.B., |
| 312 the particular array returned may not actually be one that the | 318 the particular array returned may not actually be one that the |
| 313 function returns; rather, it's a sort of average of all the results | 319 function returns; rather, it's a sort of average of all the results |
| 314 returned. For more information you can take a look at Wikipedia | 320 returned. For more information you can take a look at Wikipedia |
| 315 <https://en.wikipedia.org/wiki/Expected_value>. | 321 <https://en.wikipedia.org/wiki/Expected_value>. |
| 316 """ | 322 """ |
| 317 prob_given_x = self.Probability(x) | 323 prob_given_x = self.Probability(x) |
| 318 # N.B., the ``*`` below is vector scaling! If we want to make this | 324 # N.B., the ``*`` below is vector scaling! If we want to make this |
| 319 # method polymorphic in the return type of ``f`` then we'll need an | 325 # method polymorphic in the return type of ``f`` then we'll need an |
| 320 # API that provides both scaling and ``vsum``. | 326 # API that provides both scaling and ``vsum``. |
| 321 return vsum([prob_given_x(y) * f(y) for y in self._Y]) | 327 return vsum([prob_given_x(y) * f(y) for y in self._Y(x)]) |
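The comprehension above implements ``E[f | x] = sum over y of P(y | x) * f(y)``; a toy version with made-up probabilities and feature vectors, so the scaling-then-summing is visible:

```python
import numpy as np

# Hypothetical conditional distribution over Y = {"a", "b"} and the
# vectors f(y) being averaged.
probs = {"a": 0.75, "b": 0.25}
f = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}

# Probability-weighted vector sum, as in the ``vsum`` call above.
expected = sum(probs[y] * f[y] for y in probs)
print(expected)  # [0.75 0.25]
```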
| 322 | 328 |