Chromium Code Reviews

Issue 1808503002: Update distillability modeling scripts to predict long articles (Closed)

Created: 4 years, 9 months ago by wychen
Modified: 4 years, 4 months ago
Reviewers: mdjones
Base URL: git@github.com:chromium/dom-distiller.git@ml-visible
Target Ref: refs/heads/master
Visibility: Public.

Description

Update distillability modeling scripts to predict long articles

The model produced from these scripts was added to Chrome here: https://crrev.com/e9ef74ab7411ca08359015167b2c2bc1b566f95b

Highlights of improvements:
* Generate labels according to the distilled length
* Feature extraction and selection
  - Derive features on the fly for scalability
  - Output features by group or one by one
  - Support mobile emulation
  - Support native feature extraction
  - Support using MHTML as input
* Add sanity checking tools
* Add documentation

BUG=610944
R=mdjones@chromium.org

Committed: f8f3308f99ec3dcfa83420b304dec3cc083c9008
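The first highlight, generating labels according to the distilled length, means a page's training label depends on how much content DOM Distiller extracts from it rather than on a manual judgment. The sketch below illustrates that idea only; it is not the code in write_features_csv.py or get_screenshots.py. The `Page` record, its `distilled_text` field, and the `MIN_DISTILLED_LENGTH` cutoff are hypothetical, since the actual threshold and data layout are not stated on this page.

```python
from dataclasses import dataclass

# Hypothetical cutoff (in characters); the real threshold used by the
# scripts is not given in this review.
MIN_DISTILLED_LENGTH = 1000


@dataclass
class Page:
    url: str
    distilled_text: str  # assumed: text extracted by DOM Distiller for this page


def label_long_article(page: Page, min_length: int = MIN_DISTILLED_LENGTH) -> int:
    """Return 1 if the distilled content is long enough to count as a long article, else 0."""
    return 1 if len(page.distilled_text.strip()) >= min_length else 0


def label_pages(pages):
    """Map each page URL to its distilled-length label."""
    return {p.url: label_long_article(p) for p in pages}


if __name__ == "__main__":
    pages = [Page("https://example.com/a", "x" * 1500),
             Page("https://example.com/b", "short")]
    print(label_pages(pages))
```

Labeling by distilled length keeps the positive class tied to what the distiller can actually render, which matches the goal of predicting long articles stated in the issue title.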

Patch Set 1

Patch Set 2: fix load-mhtml, add dev mode arguments

Patch Set 3: update docs

Stats (+758 lines, -131 lines):
M heuristics/distillable/README.md: 5 chunks, +109 lines, -13 lines
M heuristics/distillable/calculate_derived_features.py: 4 chunks, +54 lines, -26 lines
A heuristics/distillable/check_derived_features.py: 1 chunk, +79 lines, -0 lines
A heuristics/distillable/check_distilled_mhtml.py: 1 chunk, +65 lines, -0 lines
M heuristics/distillable/extract_features.js: 2 chunks, +5 lines, -7 lines
M heuristics/distillable/get_screenshots.py: 6 chunks, +196 lines, -49 lines
M heuristics/distillable/server.py: 1 chunk, +1 line, -1 line
M heuristics/distillable/write_features_csv.py: 1 chunk, +248 lines, -35 lines
M install-build-deps.sh: 1 chunk, +1 line, -0 lines


Messages

Total messages: 8 (3 generated)
wychen
PTAL
4 years, 9 months ago (2016-03-15 22:11:28 UTC) #2
wychen
No more local patches after https://codereview.chromium.org/1972503002/!
4 years, 7 months ago (2016-05-12 00:19:58 UTC) #4
wychen
PTAL. The README.md file can be previewed here: https://github.com/wychen/dom-distiller/blob/ml-script/heuristics/distillable/README.md
4 years, 7 months ago (2016-05-12 20:21:34 UTC) #5
mdjones
Did a quick skim since this is an internal tool; lgtm
4 years, 4 months ago (2016-08-10 00:26:29 UTC) #6
wychen
4 years, 4 months ago (2016-08-10 01:52:55 UTC) #8
Message was sent while issue was closed.
Committed patchset #3 (id:40001) manually as
f8f3308f99ec3dcfa83420b304dec3cc083c9008 (presubmit successful).
