Commit dc44f55c authored by Ezio Melotti


#11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module.
Parent b698d8e7
@@ -9,13 +9,25 @@
 --------------

-This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
-and ``entitydefs``. ``entitydefs`` is used to provide the :attr:`entitydefs`
+This module defines four dictionaries, :data:`html5`,
+:data:`name2codepoint`, :data:`codepoint2name`, and :data:`entitydefs`.
+:data:`entitydefs` is used to provide the :attr:`entitydefs`
 attribute of the :class:`html.parser.HTMLParser` class. The definition provided
 here contains all the entities defined by XHTML 1.0 that can be handled using
 simple textual substitution in the Latin-1 character set (ISO-8859-1).


+.. data:: html5
+
+   A dictionary that maps HTML5 named character references [#]_ to the
+   equivalent Unicode character(s), e.g. ``html5['gt;'] == '>'``.
+   Note that the trailing semicolon is included in the name (e.g. ``'gt;'``),
+   however some of the names are accepted by the standard even without the
+   semicolon: in this case the name is present with and without the ``';'``.
+
+   .. versionadded:: 3.3
+
+
 .. data:: entitydefs

    A dictionary mapping XHTML 1.0 entity definitions to their replacement text in
@@ -30,3 +42,8 @@ simple textual substitution in the Latin-1 character set (ISO-8859-1).
 .. data:: codepoint2name

    A dictionary that maps Unicode codepoints to HTML entity names.
+
+
+.. rubric:: Footnotes
+
+.. [#] See http://www.w3.org/TR/html5/named-character-references.html
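The four dictionaries named in the updated module overview can be exercised directly. The snippet below is a minimal sketch, not part of this commit, and assumes Python 3.3 or later (where `html5` exists). `to_named_refs` is a hypothetical helper written only for illustration, and `hellip` is simply one example of a name that the standard defines only with the trailing semicolon.

```python
from html.entities import codepoint2name, entitydefs, html5, name2codepoint


def to_named_refs(text):
    """Hypothetical helper (not in html.entities): replace every character
    that has an HTML 4 entity name in codepoint2name with '&name;' and
    leave all other characters unchanged."""
    return ''.join(
        '&%s;' % codepoint2name[ord(ch)] if ord(ch) in codepoint2name else ch
        for ch in text
    )


# html5 keys include the trailing ';' ...
print(repr(html5['gt;']))                      # '>'
# ... legacy names such as 'gt' are also stored without it ...
print('gt' in html5, 'gt;' in html5)           # True True
# ... but most names exist only in the ';' form.
print('hellip' in html5, 'hellip;' in html5)   # False True

# The older dictionaries use bare names (no ';') and integer code points;
# in Python 3, entitydefs maps each name to the character itself.
print(name2codepoint['eacute'], codepoint2name[233])  # 233 eacute
print(repr(entitydefs['eacute']))              # 'é'
print(to_named_refs('café <b>'))               # caf&eacute; &lt;b&gt;
```

Note the asymmetry: `html5` keys carry the semicolon, while `name2codepoint`, `codepoint2name` and `entitydefs` use bare names, so code that moves between them has to add or strip the `';'` itself.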
@@ -54,6 +54,10 @@ Library
   It is used automatically on platforms supporting the necessary os.openat()
   and os.unlinkat() functions. Main code by Martin von Löwis.

+- Issue #11113: add a new "html5" dictionary containing the named character
+  references defined by the HTML5 standard and the equivalent Unicode
+  character(s) to the html.entities module.
+
 - Issue #15114: the strict mode of HTMLParser and the HTMLParseError exception
   are deprecated now that the parser is able to parse invalid markup.
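The rule that some names are accepted without the trailing semicolon matters when decoding text: the HTML5 tokenizer consumes the longest name it finds in the named character reference table, so `&notit;` is read as `&not` followed by `it;`, while `&notin;` is read as a single reference. Below is a rough sketch of that longest-match lookup over the new `html5` dictionary. The regular expression and the names `decode_named_refs` and `_replace` are assumptions of this example. The sketch ignores numeric references and the special rules for attribute values, and for real use Python 3.4+ already provides `html.unescape()`.

```python
import re
from html.entities import html5

# Hypothetical, simplified pattern: '&' followed by a run of ASCII letters
# and digits and an optional ';'. The real HTML5 tokenizer is stricter.
_NAMED_REF = re.compile(r'&([A-Za-z][A-Za-z0-9]*;?)')


def _replace(match):
    name = match.group(1)
    # Try the full candidate first, then ever shorter prefixes, so that
    # legacy names stored without ';' (such as 'not') still match.
    for end in range(len(name), 0, -1):
        candidate = name[:end]
        if candidate in html5:
            # Keep whatever followed the matched name as literal text.
            return html5[candidate] + name[end:]
    return match.group(0)  # unknown name: leave the text untouched


def decode_named_refs(text):
    """Sketch: replace HTML5 named character references in ``text``."""
    return _NAMED_REF.sub(_replace, text)


print(decode_named_refs('a &lt; b &amp;&amp; c'))   # a < b && c
print(decode_named_refs("I'm &notit; I tell you"))  # I'm ¬it; I tell you
print(decode_named_refs("I'm &notin; I tell you"))  # I'm ∉ I tell you
print(decode_named_refs('&bogus; stays'))           # &bogus; stays
```

Trying prefixes from longest to shortest is what makes the "with and without the `';'`" entries useful: the semicolon-less keys only ever match when no longer name applies.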