Index: heuristics/distillable/README.md
diff --git a/heuristics/distillable/README.md b/heuristics/distillable/README.md
index 685c6f7c6cc2d99e2384f4735b60bc782ac24b51..6f4a1571e4fedd3ce8e3046a3a1ab13f1eba830c 100644
--- a/heuristics/distillable/README.md
+++ b/heuristics/distillable/README.md
@@ -21,12 +21,11 @@ short list for dry run.
## Data preparation for labeling
-Use ```get_screenshots.py``` to generate the screenshots of the original and
-distilled web page, and extract the features by running
-```extract_features.js```. You can see how it works by running the following
-command.
+Use `get_screenshots.py` to generate screenshots of the original and distilled
+web pages, and to extract the features by running `extract_features.js`. You
+can see how it works by running the following command:
-```
+```bash
./get_screenshots.py --out out_dir --urls-file urls.txt
```
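+
+The `--urls-file` argument is a plain list of pages to process. A hypothetical
+`urls.txt` for a dry run (one URL per line is an assumption about the expected
+format, and the URLs are placeholders):
+
+```bash
+# Create a small urls.txt for a dry run (example URLs, one per line).
+cat > urls.txt <<EOF
+https://example.com/news/some-article.html
+https://example.com/blog/another-post.html
+EOF
+```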
@@ -34,22 +33,22 @@ If everything goes fine, run it inside xvfb. Specifying the screen resolution
makes the size of the screenshots consistent. It also prevents the Chrome window
from interrupting your work on the main monitor.
-```
+```bash
xvfb-run -a -s "-screen 0 1600x5000x24" ./get_screenshots.py --out out_dir --urls-file urls.txt
```
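+
+If `xvfb-run` is not available, it is typically provided by the xvfb package
+(assuming a Debian or Ubuntu machine; other distributions package it under a
+similar name):
+
+```bash
+# Install Xvfb, which ships the xvfb-run wrapper on Debian-based systems.
+sudo apt-get install xvfb
+```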
One entry takes about 30 seconds. Depending on the number of entries, it could
-be a lengthy process. If it is interrupted, you could use option ```--resume```
-to continue.
+be a lengthy process. If it is interrupted, you can use the `--resume` option
+to continue.
-```
+```bash
xvfb-run -a -s "-screen 0 1600x5000x24" ./get_screenshots.py --out out_dir --urls-file urls.txt --resume
```
Running multiple instances concurrently is recommended if the list is long
enough. You can create a Makefile like this:
-```
+```make
ALL=$(addsuffix .target,$(shell seq 1000))
all: $(ALL)
@@ -58,23 +57,23 @@ all: $(ALL)
xvfb-run -a -s "-screen 0 1600x5000x24" ./get_screenshots.py --out out_dir --urls-file urls.txt --resume
```
-And then run ```make -j20```. Adjust the parallelism according to how beefy your
+And then run `make -j20`. Adjust the parallelism according to how beefy your
machine is.
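+
+A reasonable starting point is one job per CPU core (assuming GNU coreutils
+`nproc` is available); tune the number up or down from there:
+
+```bash
+# Run one screenshot job per CPU core.
+make -j"$(nproc)"
+```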
A small proportion of URLs will time out or fail for other reasons. When
-you've collected enough data, run the command again with option
-```--write-index``` to export data for the next stage.
+you've collected enough data, run the command again with the `--write-index`
+option to export data for the next stage.
-```
+```bash
./get_screenshots.py --out out_dir --urls-file urls.txt --resume --write-index
```
## Labeling
-Use ```server.py``` to serve the web site for data labeling. Human effort is
-around 10~20 seconds per entry.
+Use `server.py` to serve the website for data labeling. Human effort is around
+10-20 seconds per entry.
-```
+```bash
./server.py --data-dir out_dir
```
@@ -86,20 +85,20 @@ It should print something like:
Then visit that address in your browser.
-The labels would be written to ```out_dir/archive/``` periodically.
+The labels are written to `out_dir/archive/` periodically.
## Data preparation for training
-In the step with ```--write-index```, ```get_screenshots.py``` writes the
-extracted raw features to ```out_dir/feature```. We can use
-```calculate_derived_features.py``` to convert it to the final derived features.
+In the `--write-index` step, `get_screenshots.py` writes the extracted raw
+features to `out_dir/feature`. Use `calculate_derived_features.py` to convert
+them into the final derived features.
-```
+```bash
./calculate_derived_features.py --core out_dir/feature --out derived.txt
```
-Then use ```write_features_csv.py``` to combine with the label.
+Then use `write_features_csv.py` to combine them with the labels.
-```
+```bash
./write_features_csv.py --marked $(ls -rt out_dir/archive/*|tail -n1) --features derived.txt --out labeled
```
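+
+The `$(ls -rt out_dir/archive/*|tail -n1)` substitution selects the most
+recently written label archive. An equivalent two-step form of the same
+command (a sketch, assuming the directory layout above):
+
+```bash
+# Pick the newest label archive produced by server.py, then combine it with
+# the derived features into the labeled output.
+latest_labels=$(ls -rt out_dir/archive/* | tail -n 1)
+./write_features_csv.py --marked "$latest_labels" --features derived.txt --out labeled
+```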