third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst - Issue 2635033005: Add html5lib to the list of modules in webkitpy/thirdparty/wpt/wpt/tools.

Unified Diff: third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst

Issue 2635033005: Add html5lib to the list of modules in webkitpy/thirdparty/wpt/wpt/tools. (Closed)

Patch Set: Update webkitpy/thirdparty/README.chromium. Created 3 years, 11 months ago


Index: third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst

diff --git a/third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst b/third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst

new file mode 100644

index 0000000000000000000000000000000000000000..9e0a0f740485b1901b162f0ba59613810a56cee7

--- /dev/null

+++ b/third_party/WebKit/Tools/Scripts/webkitpy/thirdparty/wpt/wpt/tools/html5lib/README.rst

@@ -0,0 +1,157 @@

+html5lib

+========

+.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master

+ :target: https://travis-ci.org/html5lib/html5lib-python

+html5lib is a pure-Python library for parsing HTML. It is designed to

+conform to the WHATWG HTML specification, as implemented by all major

+web browsers.

+Usage

+-----

+Simple usage follows this pattern:

+.. code-block:: python

+ import html5lib

+ with open("mydocument.html", "rb") as f:

+     document = html5lib.parse(f)

+or:

+.. code-block:: python

+ import html5lib

+ document = html5lib.parse("<p>Hello World!")

+By default, the ``document`` will be an ``xml.etree`` element instance.

+Whenever possible, html5lib chooses the accelerated ``ElementTree``

+implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).

+Two other tree types are supported: ``xml.dom.minidom`` and

+``lxml.etree``. To use an alternative format, specify the name of

+a treebuilder:

+.. code-block:: python

+ import html5lib

+ with open("mydocument.html", "rb") as f:

+     lxml_etree_document = html5lib.parse(f, treebuilder="lxml")

+When used with ``urllib2`` (Python 2), the charset from HTTP should be

+passed into html5lib as follows:

+.. code-block:: python

+ from contextlib import closing

+ from urllib2 import urlopen

+ import html5lib

+ with closing(urlopen("http://example.com/")) as f:

+     document = html5lib.parse(f, encoding=f.info().getparam("charset"))

+When used with ``urllib.request`` (Python 3), the charset from HTTP

+should be passed into html5lib as follows:

+.. code-block:: python

+ from urllib.request import urlopen

+ import html5lib

+ with urlopen("http://example.com/") as f:

+     document = html5lib.parse(f, encoding=f.info().get_content_charset())

+To have more control over the parser, create a parser object explicitly.

+For instance, to make the parser raise exceptions on parse errors, use:

+.. code-block:: python

+ import html5lib

+ with open("mydocument.html", "rb") as f:

+     parser = html5lib.HTMLParser(strict=True)

+     document = parser.parse(f)

+When you're instantiating parser objects explicitly, pass a treebuilder

+class as the ``tree`` keyword argument to use an alternative document

+format:

+.. code-block:: python

+ import html5lib

+ parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))

+ minidom_document = parser.parse("<p>Hello World!")

+More documentation is available at http://html5lib.readthedocs.org/.

+Installation

+------------

+html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,

+use:

+.. code-block:: bash

+ $ pip install html5lib

+Optional Dependencies

+---------------------

+The following third-party libraries may be used for additional

+functionality:

+- ``datrie`` can be used to improve parsing performance (though in

+ almost all cases the improvement is marginal);

+- ``lxml`` is supported as a tree format (for both building and

+ walking) under CPython (but *not* PyPy where it is known to cause

+ segfaults);

+- ``genshi`` has a treewalker (but not builder);

+- ``charade`` can be used as a fallback when character encoding cannot

+ be determined; ``chardet``, from which it was forked, can also be used

+ on Python 2; and

+- ``ordereddict`` can be used under Python 2.6

+ (``collections.OrderedDict`` is used instead on later versions) to

+ serialize attributes in alphabetical order.

+Bugs

+----

+Please report any bugs on the `issue tracker

+<https://github.com/html5lib/html5lib-python/issues>`_.

+Tests

+-----

+Unit tests require the ``nose`` library and can be run using the

+``nosetests`` command in the root directory; ``ordereddict`` is

+required under Python 2.6. All should pass.

+Test data are contained in a separate `html5lib-tests

+<https://github.com/html5lib/html5lib-tests>`_ repository and included

+as a submodule, thus for git checkouts they must be initialized::

+ $ git submodule init

+ $ git submodule update

+If you have all compatible Python implementations available on your

+system, you can run tests on all of them using the ``tox`` utility,

+which can be found on PyPI.

+Questions?

+----------

+There's a mailing list available for support on Google Groups,

+`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,

+though you may get a quicker response asking on IRC in `#whatwg on

+irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
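
Note on the ``urllib.request`` example in the README above: it passes ``f.info().get_content_charset()`` as the ``encoding`` argument. The object returned by ``f.info()`` is a subclass of the standard library's ``email.message.Message``, so the charset lookup can be tried without any network access. The sketch below is illustrative only and is not part of this patch; the helper name ``charset_from_content_type`` is hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset declared in a Content-Type value, or None.

    Hypothetical helper: it builds a bare Message so that the same
    get_content_charset() call used in the README can be exercised
    offline.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


declared = charset_from_content_type("text/html; charset=UTF-8")
missing = charset_from_content_type("text/html")
print(declared, missing)
```

When the server declares no charset, the call returns ``None``, which is the value the README example would then pass as ``encoding``.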