appengine/findit/crash/changelist_features/min_distance.py - Issue 2517383005: Implementing loglinear classification (without training), for CL classification

Side by Side Diff: appengine/findit/crash/changelist_features/min_distance.py

Issue 2517383005: Implementing loglinear classification (without training), for CL classification (Closed)

Patch Set: rebase Created 4 years ago

```python
# Copyright 2016 The Chromium Authors. All rights reserved.
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.

from collections import namedtuple


# N.B., this must not be infinity, else we'll start getting NaN values
# from LinearMinDistanceFeature (and SquaredMinDistanceFeature).
DEFAULT_MAXIMUM = 50
```
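The N.B. on `DEFAULT_MAXIMUM` follows from ordinary floating-point arithmetic. Below is a minimal standalone Python 3 sketch, not part of `min_distance.py`. It assumes the scaling `(maximum - d) / maximum` that `LinearMinDistanceFeature` uses, and the helper name `linear_scale` is hypothetical. It shows that an infinite bound turns even a small finite distance into NaN.

```python
import math


def linear_scale(min_distance, maximum):
  """Hypothetical helper using LinearMinDistanceFeature's formula."""
  return (maximum - min_distance) / maximum


# A finite maximum gives a well-defined value: (50 - 3) / 50.
finite = linear_scale(3.0, 50.0)

# An infinite maximum gives (inf - 3) / inf == inf / inf, which is NaN,
# and squaring NaN (as SquaredMinDistanceFeature would) is still NaN.
infinite = linear_scale(3.0, float('inf'))
print(finite, math.isnan(infinite), math.isnan(infinite * infinite))
```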

```python
# N.B., if we make this a namedtuple then it becomes horrible to inherit from.
class MinDistanceFeature(object):
  def __init__(self, maximum=None):
    """
    Args:
      maximum (float): An upper bound on the return value. This
        argument is optional and (effectively) defaults to infinity.
    """
    self._maximum = maximum

  def __call__(self, result):
    """Return the minimum ``AnalysisInfo.min_distance`` across all files.

    Although this method looks like it should be a method on the
    ``Result`` class, we have it live here in order to make coverage
    tests happy. The downside of this is that we now have to modify
    multiple files whenever the guts of ``Result`` change. The upside
    is the aforementioned coverage tests, and that it helps keep the
    ``Result`` class looking cleaner.

    Args:
      result (Result): the result to analyze.

    Returns:
      The minimum distance between (the code for) a stack frame in the
      ``Result`` and the CL in the ``Result`` as a ``float``. If no
      ``maximum`` is given, then we return that minimum directly. If a
      ``maximum`` is given, then we return the smaller of it and the
      found minimum distance. If ``result.file_to_analysis_info`` is
      empty, we return ``maximum`` itself (``None`` if none was given).
    """
    if not result.file_to_analysis_info:
      return self._maximum

    minimum = min(analysis_info.min_distance
                  for analysis_info
                  in result.file_to_analysis_info.itervalues())
    if self._maximum is None:
      return minimum

    return min(float(self._maximum), float(minimum))
```
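To make the clipping behaviour of `MinDistanceFeature.__call__` concrete, here is a self-contained Python 3 sketch. `FakeAnalysisInfo`, `FakeResult` and `clipped_min_distance` are hypothetical stand-ins. They carry only the attributes the method reads; Findit's real `Result` and `AnalysisInfo` classes are defined elsewhere and have more fields. `.values()` replaces the Python 2 `itervalues()`.

```python
from collections import namedtuple

# Hypothetical stand-ins exposing only the fields the feature reads.
FakeAnalysisInfo = namedtuple('FakeAnalysisInfo', ['min_distance'])
FakeResult = namedtuple('FakeResult', ['file_to_analysis_info'])


def clipped_min_distance(result, maximum=None):
  """Sketch of MinDistanceFeature.__call__ written for Python 3."""
  if not result.file_to_analysis_info:
    # Nothing to analyze: fall back to the bound (which may be None).
    return maximum
  minimum = min(info.min_distance
                for info in result.file_to_analysis_info.values())
  if maximum is None:
    return minimum
  # Clip the distance so it never exceeds the bound.
  return min(float(maximum), float(minimum))


result = FakeResult({
    'a.cc': FakeAnalysisInfo(min_distance=12),
    'b.cc': FakeAnalysisInfo(min_distance=70),
})
print(clipped_min_distance(result))                    # 12
print(clipped_min_distance(result, maximum=10))        # 10.0
print(clipped_min_distance(FakeResult({}), maximum=50))  # 50
```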

```python
class LinearMinDistanceFeature(MinDistanceFeature):
  """Return the minimum distance scaled linearly between 0 and 1.

  That is, when the minimum distance is 0 we return 1; when it is greater
  than or equal to the ``maximum`` passed to the constructor, we return 0.
  In between we return values linearly interpolated between those points.

  In principle this normalization isn't strictly required, as the weight
  of this feature can be scaled to account for the normalization.
  However, by normalizing things we ensure that the feature's weight is
  independent of ``maximum``, which helps training.
  """
  def __init__(self, maximum=None):
    """
    Args:
      maximum (float): An upper bound on the return value. This
        argument is optional and defaults to ``DEFAULT_MAXIMUM``.
    """
    if maximum is None:
      maximum = DEFAULT_MAXIMUM
```

inferno 2016/12/06 18:07:06
nit: you can just do this in the constructor:
def __init__(self, maximum=DEFAULT_MAXIMUM)

Sharu Jiang 2016/12/06 20:49:19
I remember that pylint will complain if it is done this way.

```python
    super(LinearMinDistanceFeature, self).__init__(maximum)

  def __call__(self, result):
    min_distance = super(LinearMinDistanceFeature, self).__call__(result)
    return (self._maximum - min_distance) / self._maximum
```
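The linear interpolation described in the `LinearMinDistanceFeature` docstring can be checked on a few sample distances. Below is a minimal standalone Python 3 sketch; `linear_min_distance` is a hypothetical helper. It assumes the default bound of 50 (mirroring `DEFAULT_MAXIMUM`) and applies the same clipping `MinDistanceFeature` performs before scaling.

```python
def linear_min_distance(min_distance, maximum=50):
  """Sketch of LinearMinDistanceFeature's scaling, including the clip
  to ``maximum`` that MinDistanceFeature applies first."""
  clipped = min(float(maximum), float(min_distance))
  return (maximum - clipped) / maximum


for d in (0, 25, 50, 80):
  # Prints 1.0, 0.5, 0.0 and 0.0 respectively: distances at or beyond
  # the bound all collapse to 0.
  print(d, linear_min_distance(d))
```

Because the output always lies in [0, 1] whatever `maximum` is, a learned weight for this feature does not have to absorb the choice of bound. This is the training benefit the docstring mentions.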

```python
class SquaredMinDistanceFeature(LinearMinDistanceFeature):
  """Return the minimum distance scaled quadratically between 0 and 1.

  This feature, together with ``LinearMinDistanceFeature`` (and a
  constant feature), allows us to capture any quadratic polynomial of the
  ``MinDistance``. That is, suppose we had a single feature
  ``c2*x**2 + c1*x + 1`` with weight ``w``. Rather than using that feature
  directly (which would require us to specify the hyperparameters ``c2``
  and ``c1``), we can instead use three features, ``w2*x**2 + w1*x + w0``.
  This lets us avoid specifying the hyperparameters by pushing them into
  the weight parameters instead (``w2 = w*c2``, ``w1 = w*c1``, ``w0 = w``).
  """
  def __call__(self, result):
    linear_min_distance = (
        super(SquaredMinDistanceFeature, self).__call__(result))
    return linear_min_distance * linear_min_distance
```
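The `SquaredMinDistanceFeature` docstring argues that one weighted quadratic feature can be replaced by three linearly weighted features. The following standalone Python 3 sketch checks that numerically. Here `x` stands for the `LinearMinDistanceFeature` value, `x * x` for `SquaredMinDistanceFeature`, and `1` for the constant feature. The function names and the particular values of `w`, `c2` and `c1` are arbitrary illustrations, not anything Findit defines.

```python
import math


def squared_min_distance(min_distance, maximum=50):
  """Sketch of SquaredMinDistanceFeature: the square of the linear feature."""
  clipped = min(float(maximum), float(min_distance))
  x = (maximum - clipped) / maximum
  return x * x


def single_feature_score(x, w, c2, c1):
  """Weighted hand-designed feature: w * (c2*x**2 + c1*x + 1)."""
  return w * (c2 * x * x + c1 * x + 1)


def three_feature_score(x, w2, w1, w0):
  """Weighted sum of the squared, linear and constant features."""
  return w2 * (x * x) + w1 * x + w0 * 1


w, c2, c1 = 0.7, -2.0, 3.0
# Choosing w2 = w*c2, w1 = w*c1, w0 = w reproduces the single feature,
# so c2 and c1 never need to be fixed by hand; they fold into the weights.
for x in (0.0, 0.25, 0.5, 1.0):
  a = single_feature_score(x, w, c2, c1)
  b = three_feature_score(x, w * c2, w * c1, w)
  assert math.isclose(a, b)

print(squared_min_distance(25))  # 0.25
```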
