Issue 2544493004: [Predator] Implement training for loglinear models

Stats: +298 lines, -35 lines

A	appengine/findit/crash/loglinear/test/loglinear_testcase.py	1 chunk	+47 lines, -0 lines	0 comments
M	appengine/findit/crash/loglinear/test/model_test.py	2 chunks	+12 lines, -35 lines	0 comments
A	appengine/findit/crash/loglinear/test/training_test.py	1 chunk	+60 lines, -0 lines	0 comments
A	appengine/findit/crash/loglinear/training.py	1 chunk	+179 lines, -0 lines	0 comments

Dependent Patchsets:

Issue 2595263003 Patch 1

Messages

Total messages: 18 (8 generated)


Sharu Jiang

I have a general question, I remember you mentioned we want to use 2 labels ...

4 years ago (2016-12-21 21:30:01 UTC) #6

wrengr

On 2016/12/21 21:30:01, Sharu Jiang wrote: > I have a general question, I remember you ...

4 years ago (2016-12-21 22:08:11 UTC) #7

wrengr

https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash/loglinear/training.py File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash/loglinear/training.py#newcode24 appengine/findit/crash/loglinear/training.py:24: training_data (iterable): a collection of ``(x, y)`` pairs where ...

4 years ago (2016-12-21 22:34:32 UTC) #8

https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash...
File appengine/findit/crash/loglinear/training.py (right):

https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash...
appengine/findit/crash/loglinear/training.py:24: training_data (iterable): a
collection of ``(x, y)`` pairs where
On 2016/12/21 21:30:01, Sharu Jiang wrote:
> The ``y`` notation is a bit confusing: since we are using
> ``X->Y->list(Features)``, I think we should differentiate the label ``y`` from
> any changelog ``y``. How about using ``l`` or something else instead?

``Y`` is the type of the second argument of the feature function, so it is the
type of "labels". This file covers trainable log-linear models (LLMs) in
general, so I don't want to pin down ``Y`` (or ``X``): they vary from model to
model. For example, in an LLM-based component classifier ``Y`` would be the
components (or the component name augmented with some additional metadata, much
as a Suspect is a CL plus some metadata).
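
To make the roles of ``X`` and ``Y`` concrete, here is a minimal sketch of a
curried feature function of type ``X -> Y -> list(float)`` together with
training data given as ``(x, y)`` pairs. The name ``toy_feature_function``, the
string-valued inputs, and the two features are hypothetical and chosen only for
illustration; none of them come from training.py.

```python
def toy_feature_function(x):
    """Hypothetical feature function of type X -> Y -> list(float).

    Here X and Y both happen to be plain strings, but a trainable model
    should not care what they are as long as the feature function accepts them.
    """
    def features_given_x(y):
        return [
            1.0 if y in x else 0.0,          # does the label occur in the input?
            float(len(y)) / (len(x) or 1),   # label length relative to the input
        ]
    return features_given_x


# Training data is an iterable of (x, y) pairs; y is the observed label for x.
training_data = [
    ('crash in foo.cc', 'foo.cc'),
    ('crash in bar.cc', 'bar.cc'),
]

# One feature vector per training pair.
feature_vectors = [toy_feature_function(x)(y) for x, y in training_data]
```

Swapping in a different ``X`` or ``Y`` (say, crash reports and Suspects) only
requires a different feature function; the shape of the training data stays the
same.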

https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash...
appengine/findit/crash/loglinear/training.py:50: # re-provide it in order to
define the setter.
On 2016/12/21 21:30:01, Sharu Jiang wrote:
> Interesting to know. So we cannot override the setter of a superclass
> property? What kind of error would we get if we did?

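The problem has to do with the way the @property decorator works.
``@weights.setter`` looks up the name ``weights`` while the subclass body is
being executed, and names inherited from the superclass are not in scope there,
so the lookup fails unless the subclass defines the property itself first. If
you're curious what the exact crash stack looks like, this is what happens when
you run ``run.sh test``: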

  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/pipeline.py", line 516, in result_loop_single_context
    gen_loop_process(*test_gen_args)
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/pipeline.py", line 334, in gen_loop_process
    result_queue.put_nowait)
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/type_definitions.py", line 452, in gen_stage_loop
    for test in tests:
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/pipeline.py", line 277, in generate_tests
    gens = get_test_gens_package(testing_context, subpath=subpath)
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/pipeline.py", line 242, in get_test_gens_package
    mod = load_module(modname)
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/ENV/local/lib/python2.7/site-packages/expect_tests/pipeline.py", line 212, in load_module
    mod = __import__(modname)
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/appengine/findit/crash/loglinear/test/training_test.py", line 11, in <module>
    from crash.loglinear.training import TrainableLogLinearModel
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/appengine/findit/crash/loglinear/training.py", line 15, in <module>
    class TrainableLogLinearModel(LogLinearModel):
  File "/usr/local/google/home/wrengr/chromium-srcs/infra/appengine/findit/crash/loglinear/training.py", line 50, in TrainableLogLinearModel
    @weights.setter
NameError: name 'weights' is not defined
Exception occurred when listing tests. Aborting.
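
The failure in the stack above can be reproduced in isolation. The sketch below
uses hypothetical stand-in classes (``Base``, ``Broken``, ``Trainable``) rather
than the real ``LogLinearModel`` and ``TrainableLogLinearModel``, and assumes
only that the superclass exposes ``weights`` as a read-only property. It shows
the ``NameError`` and the workaround of re-providing the getter in the subclass
before attaching the setter.

```python
class Base(object):
    """Stand-in superclass with a read-only ``weights`` property."""

    def __init__(self, weights):
        self._weights = list(weights)

    @property
    def weights(self):
        return self._weights


def define_broken_subclass():
    """Executing this class body raises NameError, as in the stack above."""
    class Broken(Base):
        @weights.setter  # 'weights' is not a name in this class body
        def weights(self, new_weights):
            self._weights = list(new_weights)
    return Broken


class Trainable(Base):
    # Re-provide the getter so that 'weights' exists in this class body ...
    @property
    def weights(self):
        return self._weights

    # ... and only then attach a setter to it.
    @weights.setter
    def weights(self, new_weights):
        self._weights = list(new_weights)


try:
    define_broken_subclass()
    broken_error = None
except NameError as e:
    broken_error = e  # name 'weights' is not defined

model = Trainable([0.0, 0.0])
model.weights = [1.0, 2.0]  # works: Trainable has its own setter
```

An alternative that avoids repeating the getter is to decorate with
``@Base.weights.setter``, which builds a new property from the superclass
getter plus the new setter. Whether that reads better than re-providing the
getter is a style choice.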

Sharu Jiang

On 2016/12/21 22:08:11, wrengr wrote: > On 2016/12/21 21:30:01, Sharu Jiang wrote: > > I ...

4 years ago (2016-12-21 23:47:03 UTC) #9

Sharu Jiang

lgtm with nits. https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py File appengine/findit/crash/loglinear/test/training_test.py (right): https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py#newcode9 appengine/findit/crash/loglinear/test/training_test.py:9: import crash.loglinear.test.model_test as loglinear_test We'd better ...

4 years ago (2016-12-21 23:54:17 UTC) #10

wrengr

https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py File appengine/findit/crash/loglinear/test/training_test.py (right): https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py#newcode9 appengine/findit/crash/loglinear/test/training_test.py:9: import crash.loglinear.test.model_test as loglinear_test On 2016/12/21 23:54:17, Sharu Jiang ...

3 years, 12 months ago (2016-12-22 19:58:00 UTC) #11

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2544493004/240001

3 years, 12 months ago (2016-12-22 20:24:36 UTC) #14

commit-bot: I haz the power

Committed patchset #13 (id:240001) as https://chromium.googlesource.com/infra/infra/+/fa948f23b0d94d42093e73f26871328980c76acf

3 years, 12 months ago (2016-12-22 21:21:14 UTC) #17

wrengr

3 years, 12 months ago (2016-12-22 21:21:45 UTC) #18

Message was sent while issue was closed.

On 2016/12/21 23:47:03, Sharu Jiang wrote:
> On 2016/12/21 22:08:11, wrengr wrote:
> > On 2016/12/21 21:30:01, Sharu Jiang wrote:
> > > I have a general question, I remember you mentioned we want to use 2 labels -
> > > (is_culprit, is_not_culprit) instead of a collection of changelogs as labels.
> > > How are we gonna apply that to the training or recommendation?
> >
> > Thinking through it some more, I think the original idea of using the Suspects
> > as the labels works better overall. Just using the two labels ends up throwing
> > away too much information so isn't helpful.
> >
> > Using Y = Suspect does still run into the issues I mentioned about sparsity of
> > training data, but I think that's unavoidable for the task. Since we're using a
> > conditional LLM (rather than a joint LLM), we should be able to avoid the
>
> Would you mind explaining and adding reference in the doc to this?
>
> > problems about there being a lot of Suspects. That is, since the partition
> > function is conditional on some given CrashReport, we only need to consider the
> > Suspects for that particular CrashReport; those subsets of Suspects are pretty
> > small, and we could actually label a bunch of them so training data is less of
> > an issue. Technically this pushes the trainingdata-sparsity problem into ``X``
> > since there's still a huge number of CrashReports. But so long as we avoid
> > features which depend on gritty details of CrashReports, I think it'll work out.
> > (That is, so long as features don't do things like depend on whether some
> > specific file changed, or some specific diff occurred, etc. All our current
> > features are very general things that collapse a bunch of similar CrashReports,
> > which deals with the sparsity problems.)
> >
> > I can write this up in greater detail in the LLM description doc if it'd help.
>
> Yes, please add this in the doc.

I added a discussion of this to the final section of
https://docs.google.com/document/d/1v8YOl8WFSrEK2u7us8Cpb6P36-_T6VqUXTDDx6iG0_w
. If there's anything unclear, let me know and I'll try to flesh it out more.
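
As a rough illustration of the point about the conditional partition function,
here is a minimal sketch of log P(y | x) for a conditional log-linear model. The
function ``conditional_log_prob``, the dict-shaped report, the ``touched_by``
key, and the single toy feature are all hypothetical; this is not the code in
training.py. The point it tries to show is that the normalizer sums only over
the candidate labels for one particular input, not over every possible label.

```python
import math


def conditional_log_prob(weights, feature_function, x, y, candidates):
    """log P(y | x) under a conditional log-linear model.

    The partition function sums only over ``candidates``, the labels
    considered for this particular ``x``; ``y`` is assumed to be one of them.
    """
    def score(label):
        features = feature_function(x)(label)
        return sum(w * f for w, f in zip(weights, features))

    scores = [score(label) for label in candidates]
    # log-sum-exp, shifted by the max score for numerical stability
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return score(y) - log_z


def toy_features(report):
    """Hypothetical X -> Y -> list(float) feature function."""
    def given_report(suspect):
        # Single general-purpose feature: did this suspect touch the crash?
        return [1.0 if suspect in report['touched_by'] else 0.0]
    return given_report


report = {'touched_by': {'cl_1'}}
candidates = ['cl_1', 'cl_2', 'cl_3']  # suspects for this report only
log_probs = [
    conditional_log_prob([2.0], toy_features, report, y, candidates)
    for y in candidates
]
```

With a joint model the sum inside ``log_z`` would instead range over every
(report, suspect) combination, which is where the "lots of Suspects" problem
would come from.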


Issue 2544493004: [Predator] Implement training for loglinear models (Closed)

Description

Patch Set 1
Patch Set 2: rebase
Patch Set 3: rebase
Patch Set 4: rebase
Patch Set 5: rebase
Patch Set 6: rebase
Patch Set 7: first attempt at testing
Patch Set 8: debugging
Patch Set 9: Got training to work / training tests to pass.
Patch Set 10: cleaning up todos and adding more tests
Patch Set 11: minor linting
Patch Set 12: addressing nit
Patch Set 13: Breaking out the shared code of loglinear/{model,training}_test.py
