Chromium Code Reviews

Issue 2544493004: [Predator] Implement training for loglinear models (Closed)

Created: 4 years ago by wrengr
Modified: 3 years, 12 months ago
CC: chromium-reviews, infra-reviews+infra_chromium.org, aarya
Target Ref: refs/heads/master
Project: infra
Visibility: Public.

Description

Patch Set 1

Patch Set 2 : rebase

Patch Set 3 : rebase

Patch Set 4 : rebase

Patch Set 5 : rebase

Patch Set 6 : rebase

Patch Set 7 : first attempt at testing

Patch Set 8 : debugging

Patch Set 9 : Got training to work / training tests to pass.

Patch Set 10 : cleaning up todos and adding more tests

Patch Set 11 : minor linting

Total comments: 5

Patch Set 12 : addressing nit

Total comments: 2

Patch Set 13 : Breaking out the shared code of loglinear/{model,training}_test.py

Stats: +298 lines, -35 lines
A appengine/findit/crash/loglinear/test/loglinear_testcase.py (1 chunk, +47 lines, -0 lines, 0 comments)
M appengine/findit/crash/loglinear/test/model_test.py (2 chunks, +12 lines, -35 lines, 0 comments)
A appengine/findit/crash/loglinear/test/training_test.py (1 chunk, +60 lines, -0 lines, 0 comments)
A appengine/findit/crash/loglinear/training.py (1 chunk, +179 lines, -0 lines, 0 comments)

Messages

Total messages: 18 (8 generated)
wrengr
The loglinear-model training cl is ready for review. PTAL
4 years ago (2016-12-20 23:22:56 UTC) #5
Sharu Jiang
I have a general question, I remember you mentioned we want to use 2 labels ...
4 years ago (2016-12-21 21:30:01 UTC) #6
wrengr
On 2016/12/21 21:30:01, Sharu Jiang wrote: > I have a general question, I remember you ...
4 years ago (2016-12-21 22:08:11 UTC) #7
wrengr
https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash/loglinear/training.py File appengine/findit/crash/loglinear/training.py (right): https://codereview.chromium.org/2544493004/diff/200001/appengine/findit/crash/loglinear/training.py#newcode24 appengine/findit/crash/loglinear/training.py:24: training_data (iterable): a collection of ``(x, y)`` pairs where ...
4 years ago (2016-12-21 22:34:32 UTC) #8
Sharu Jiang
On 2016/12/21 22:08:11, wrengr wrote: > On 2016/12/21 21:30:01, Sharu Jiang wrote: > > I ...
4 years ago (2016-12-21 23:47:03 UTC) #9
Sharu Jiang
lgtm with nits. https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py File appengine/findit/crash/loglinear/test/training_test.py (right): https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py#newcode9 appengine/findit/crash/loglinear/test/training_test.py:9: import crash.loglinear.test.model_test as loglinear_test We'd better ...
4 years ago (2016-12-21 23:54:17 UTC) #10
wrengr
https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py File appengine/findit/crash/loglinear/test/training_test.py (right): https://codereview.chromium.org/2544493004/diff/220001/appengine/findit/crash/loglinear/test/training_test.py#newcode9 appengine/findit/crash/loglinear/test/training_test.py:9: import crash.loglinear.test.model_test as loglinear_test On 2016/12/21 23:54:17, Sharu Jiang ...
3 years, 12 months ago (2016-12-22 19:58:00 UTC) #11
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2544493004/240001
3 years, 12 months ago (2016-12-22 20:24:36 UTC) #14
commit-bot: I haz the power
Committed patchset #13 (id:240001) as https://chromium.googlesource.com/infra/infra/+/fa948f23b0d94d42093e73f26871328980c76acf
3 years, 12 months ago (2016-12-22 21:21:14 UTC) #17
wrengr
3 years, 12 months ago (2016-12-22 21:21:45 UTC) #18
Message was sent while issue was closed.
On 2016/12/21 23:47:03, Sharu Jiang wrote:
> On 2016/12/21 22:08:11, wrengr wrote:
> > On 2016/12/21 21:30:01, Sharu Jiang wrote:
> > > I have a general question, I remember you mentioned we want to use 2 labels -
> > > (is_culprit, is_not_culprit) instead of a collection of changelogs as labels.
> > > How are we gonna apply that to the training or recommendation?
> >
> > Thinking through it some more, I think the original idea of using the Suspects
> > as the labels works better overall. Just using the two labels ends up throwing
> > away too much information so isn't helpful.
> >
> > Using Y = Suspect does still run into the issues I mentioned about sparsity of
> > training data, but I think that's unavoidable for the task. Since we're using a
> > conditional LLM (rather than a joint LLM), we should be able to avoid the
>
> Would you mind explaining and adding reference in the doc to this?
>
> > problems about there being a lot of Suspects. That is, since the partition
> > function is conditional on some given CrashReport, we only need to consider the
> > Suspects for that particular CrashReport; those subsets of Suspects are pretty
> > small, and we could actually label a bunch of them so training data is less of
> > an issue. Technically this pushes the trainingdata-sparsity problem into ``X``
> > since there's still a huge number of CrashReports. But so long as we avoid
> > features which depend on gritty details of CrashReports, I think it'll work out.
> > (That is, so long as features don't do things like depend on whether some
> > specific file changed, or some specific diff occurred, etc. All our current
> > features are very general things that collapse a bunch of similar CrashReports,
> > which deals with the sparsity problems.)
> >
> > I can write this up in greater detail in the LLM description doc if it'd help.
>
> Yes, please add this in the doc.

I added a discussion of this to the final section of the LLM description doc:
https://docs.google.com/document/d/1v8YOl8WFSrEK2u7us8Cpb6P36-_T6VqUXTDDx6iG0_w
If there's anything unclear, let me know and I'll try to flesh it out more.
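
For anyone reading the thread without the doc, here is a minimal sketch of the conditional log-linear idea discussed above. It is not the code in training.py: the dict-based report and suspect shapes, the two feature functions, and every function name are hypothetical, chosen only to show that the partition function sums over the suspects of one CrashReport rather than over all Suspects, and that the log-likelihood gradient is observed minus expected feature values.

```python
import math


def touches_crashed_file(report, suspect):
    """Hypothetical general feature: the suspect touched a file in the stack."""
    return 1.0 if report["stack_files"] & suspect["files"] else 0.0


def is_small_change(report, suspect):
    """Hypothetical general feature: the suspect touched at most two files."""
    return 1.0 if len(suspect["files"]) <= 2 else 0.0


FEATURES = [touches_crashed_file, is_small_change]


def score(weights, report, suspect):
    """Unnormalized log-score: the weighted sum of feature values f_i(x, y)."""
    return sum(w * f(report, suspect) for w, f in zip(weights, FEATURES))


def log_partition(weights, report, suspects):
    """log Z(x), summing only over the suspects proposed for this report."""
    scores = [score(weights, report, y) for y in suspects]
    m = max(scores)  # subtract the max before exponentiating, for stability
    return m + math.log(sum(math.exp(s - m) for s in scores))


def conditional_log_prob(weights, report, suspect, suspects):
    """log P(y | x) for one suspect y among the report's own suspects."""
    return score(weights, report, suspect) - log_partition(
        weights, report, suspects)


def log_likelihood_gradient(weights, report, culprit, suspects):
    """Gradient of log P(culprit | report): observed minus expected features."""
    log_z = log_partition(weights, report, suspects)
    gradient = []
    for f in FEATURES:
        expected = sum(
            math.exp(score(weights, report, y) - log_z) * f(report, y)
            for y in suspects)
        gradient.append(f(report, culprit) - expected)
    return gradient


# Toy data (made up): one report and the three suspects proposed for it.
report = {"stack_files": {"a.cc", "b.cc"}}
suspects = [
    {"id": "cl1", "files": {"a.cc"}},
    {"id": "cl2", "files": {"x.cc", "y.cc", "z.cc"}},
    {"id": "cl3", "files": {"b.cc", "c.cc", "d.cc"}},
]

probabilities = [
    math.exp(conditional_log_prob([1.0, 0.5], report, y, suspects))
    for y in suspects
]
gradient_at_zero = log_likelihood_gradient(
    [0.0, 0.0], report, suspects[0], suspects)
```

Because the normalization only ever ranges over `suspects` for the given report, the cost of each training example depends on the size of that small subset, not on the total number of Suspects. The features here are deliberately general (they never name a specific file or diff), which is the property the message above relies on to keep the sparsity in ``X`` manageable.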
