Chromium Code Reviews

Issue 240073007: recognize and parse Schema.org Markup (Closed)

Created: 6 years, 8 months ago by kuan
Modified: 6 years, 7 months ago
Reviewers: cjhopman
Base URL: https://code.google.com/p/dom-distiller/@master
Visibility: Public.

Description

recognize and parse Schema.org Markup

1) webpage tags are marked up with the Schema.org vocabulary, in microdata format, using the following attributes on any element, including the <html> tag:
   a) itemscope: the element and its children are about a particular item
   b) itemtype: the type of item as defined by Schema.org
   c) itemprop: the property name as defined by itemtype
   types currently supported: ImageObject, Article, Person, Organization.

2) implement SchemaOrgParser
   - for each supported type, the class recognizes and parses each element as per the specs in (1), in a one-time parse during object instantiation
   - if a type is unsupported, its children are still parsed for top-level (i.e. non-itemprop) items, but its itemprop attributes are not parsed.

3) implement SchemaOrgParserAccessor
   - implement the MarkupParser.Parser interface for SchemaOrgParser, added to the list of parsers
   - add tests that exercise the interface by accessing SchemaOrgParser.

BUG=364356
R=cjhopman@chromium.org

Committed: 8fe6964
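
The parsing rules in the description can be illustrated with a minimal, self-contained sketch. It is not the SchemaOrgParser from this change: the class name MicrodataSketch, the Item holder, the SAMPLE markup, and the use of the JDK XML parser (instead of the GWT DOM the project uses) are all assumptions made for illustration. It also ignores the link between a parent item and a nested item (for example, an Article's "author" pointing at a Person) and simply records the nested item separately.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the SchemaOrgParser under review.
class MicrodataSketch {
    // Types the description lists as currently supported.
    static final Set<String> SUPPORTED = new HashSet<>(
            Arrays.asList("ImageObject", "Article", "Person", "Organization"));

    // Assumed sample markup, written as well-formed XML so the JDK parser accepts it.
    static final String SAMPLE =
            "<html><body>"
            + "<div itemscope='' itemtype='http://schema.org/Article'>"
            + "<h1 itemprop='headline'>Hello</h1>"
            + "<span itemprop='author' itemscope='' itemtype='http://schema.org/Person'>"
            + "<span itemprop='name'>Ann</span></span>"
            + "</div>"
            + "<div itemscope='' itemtype='http://schema.org/Recipe'>"
            + "<span itemprop='name'>Soup</span>"
            + "<div itemscope='' itemtype='http://schema.org/Organization'>"
            + "<span itemprop='name'>Acme</span></div>"
            + "</div>"
            + "</body></html>";

    // One parsed item: its short type name and the itemprop values found under it.
    static final class Item {
        final String type;
        final Map<String, String> props = new LinkedHashMap<>();
        Item(String type) { this.type = type; }
    }

    // Parses the markup once and returns all supported items in document order.
    static List<Item> parse(String xhtml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<Item> items = new ArrayList<>();
            walk(root, null, items);
            return items;
        } catch (Exception e) {
            throw new IllegalStateException("markup must be well-formed XML", e);
        }
    }

    // owner is the nearest enclosing supported item, or null if there is none
    // (no enclosing itemscope, or the enclosing itemtype is unsupported).
    private static void walk(Element e, Item owner, List<Item> items) {
        Item current = owner;
        if (e.hasAttribute("itemscope")) {
            String url = e.getAttribute("itemtype");
            String name = url.substring(url.lastIndexOf('/') + 1);
            // An unsupported type resets the owner, so its itemprops are skipped,
            // but its children are still walked for nested top-level items.
            current = SUPPORTED.contains(name) ? new Item(name) : null;
            if (current != null) items.add(current);
        } else if (owner != null && e.hasAttribute("itemprop")) {
            owner.props.put(e.getAttribute("itemprop"), e.getTextContent().trim());
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) walk((Element) kid, current, items);
        }
    }

    public static void main(String[] args) {
        for (Item item : parse(SAMPLE)) {
            System.out.println(item.type + " " + item.props);
        }
    }
}
```

On the sample markup, the sketch yields three items (Article, Person, Organization). The "name" property under the unsupported Recipe type is skipped, while the Organization nested inside it is still picked up, which mirrors rule (2) of the description.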

Patch Set 1 #

Total comments: 18

Patch Set 2 : addressed all comments #

Total comments: 14

Patch Set 3 : addressed comments, impl some more specs #

Patch Set 4 : rm dbg info #

Patch Set 5 : fixed bug with nested types, added test #

Patch Set 6 : fix to not parse "itemprop" for unsupported parent #

Patch Set 7 : fine-tune prev bug fix #

Total comments: 21

Patch Set 8 : addressed all comments #

Patch Set 9 : addressed missed-out comments #

Total comments: 22

Patch Set 10 : addressed comments #

Patch Set 11 : rm unused props in image, mv fn to class-level #

Patch Set 12 : rm 1 more unused prop in image #

Total comments: 4

Patch Set 13 : addressed comments #

Stats (+1244 lines, -2 lines)
M src/com/dom_distiller/client/MarkupParser.java: 1 chunk, +2 lines, -2 lines, 0 comments
A src/com/dom_distiller/client/SchemaOrgParser.java: 1 chunk, +470 lines, -0 lines, 0 comments
A src/com/dom_distiller/client/SchemaOrgParserAccessor.java: 1 chunk, +170 lines, -0 lines, 0 comments
A test/com/dom_distiller/client/SchemaOrgParserAccessorTest.java: 1 chunk, +596 lines, -0 lines, 0 comments
M test/com/dom_distiller/client/TestUtil.java: 1 chunk, +6 lines, -0 lines, 0 comments

Messages

Total messages: 15 (0 generated)
kuan
chris, based on your reviews for the IEReadingViewParser cl about using non-map approach to store ...
6 years, 8 months ago (2014-04-17 16:19:15 UTC) #1
cjhopman
https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/MarkupParser.java File src/com/dom_distiller/client/MarkupParser.java (right): https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/MarkupParser.java#newcode148 src/com/dom_distiller/client/MarkupParser.java:148: if (schema != null) mParsers.add(schema); new Foo() never returns ...
6 years, 8 months ago (2014-04-17 17:35:26 UTC) #2
kuan
i've addressed all comments in patch set 2. ptal. thx. https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/MarkupParser.java File src/com/dom_distiller/client/MarkupParser.java (right): https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/MarkupParser.java#newcode148 ...
6 years, 8 months ago (2014-04-18 00:19:03 UTC) #3
cjhopman
https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/1/src/com/dom_distiller/client/SchemaOrgParser.java#newcode266 src/com/dom_distiller/client/SchemaOrgParser.java:266: for (int i = 0; i < allElems.getLength(); i++) ...
6 years, 8 months ago (2014-04-18 01:17:01 UTC) #4
kuan
i've addressed all comments, plus implemented some more of the specs: - multiple names in ...
6 years, 8 months ago (2014-04-18 23:34:37 UTC) #5
kuan
i discovered a bug with nested types, fixed it and added test in patch set ...
6 years, 8 months ago (2014-04-19 15:01:58 UTC) #6
cjhopman
https://codereview.chromium.org/240073007/diff/120001/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/120001/src/com/dom_distiller/client/SchemaOrgParser.java#newcode36 src/com/dom_distiller/client/SchemaOrgParser.java:36: public class SchemaOrgParser implements MarkupParser.Parser { Can we split ...
6 years, 8 months ago (2014-04-21 16:52:22 UTC) #7
kuan
i've addressed all comments, ptal at delta between patch sets 9 and 7. thx. in ...
6 years, 8 months ago (2014-04-23 15:32:35 UTC) #8
kuan
just a gentle reminder...
6 years, 8 months ago (2014-04-25 17:33:01 UTC) #9
cjhopman
https://codereview.chromium.org/240073007/diff/120001/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/120001/src/com/dom_distiller/client/SchemaOrgParser.java#newcode321 src/com/dom_distiller/client/SchemaOrgParser.java:321: // If <a> or <link> tags specify rel="author", extract ...
6 years, 8 months ago (2014-04-25 20:52:34 UTC) #10
kuan
i've addressed all comments in patch set 10. ptal. thx. https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java#newcode73 ...
6 years, 7 months ago (2014-04-29 00:23:10 UTC) #11
cjhopman
https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java#newcode200 src/com/dom_distiller/client/SchemaOrgParser.java:200: sTagAttributesMap.put("A", new String[] { "HREF", "REL" }); On 2014/04/29 ...
6 years, 7 months ago (2014-04-29 17:04:18 UTC) #12
kuan
i've addressed all comments in patch set 13. ptal. thx. https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java File src/com/dom_distiller/client/SchemaOrgParser.java (right): https://codereview.chromium.org/240073007/diff/160001/src/com/dom_distiller/client/SchemaOrgParser.java#newcode200 ...
6 years, 7 months ago (2014-04-29 23:26:42 UTC) #13
cjhopman
lgtm
6 years, 7 months ago (2014-04-29 23:47:07 UTC) #14
kuan
6 years, 7 months ago (2014-04-30 04:55:45 UTC) #15
Message was sent while issue was closed.
Committed patchset #13 manually as r8fe6964 (presubmit successful).
