Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update markdown to 3.7 #1463

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

pyup-bot
Copy link
Collaborator

This PR updates Markdown from 3.1.1 to 3.7.

Changelog

3.7

Changed

Refactor `abbr` Extension

A new `AbbrTreeprocessor` has been introduced, which replaces the now deprecated
`AbbrInlineProcessor`. Abbreviation processing now happens after Attribute Lists,
avoiding a conflict between the two extensions (1460).

The `AbbrPreprocessor` class has been renamed to `AbbrBlockprocessor`, which
better reflects what it is. `AbbrPreprocessor` has been deprecated.

A call to `Markdown.reset()` now clears all previously defined abbreviations.

Abbreviations are now sorted by length before executing `AbbrTreeprocessor`
to ensure that multi-word abbreviations are implemented even if an abbreviation
exists for one of those component words. (1465)

Abbreviations without a definition are now ignored. This avoids applying
abbr tags to text without a title value.

Added an optional `glossary` configuration option to the abbreviations extension.
This provides a simple and efficient way to apply a dictionary of abbreviations
to every page.

Abbreviations can now be disabled by setting their definition to `""` or `''`.
This can be useful when using the `glossary` option.


Fixed

* Fixed links to source code on GitHub from the documentation (1453).

3.6

Changed

Refactor TOC Sanitation

* All postprocessors are now run on heading content.
* Footnote references are now stripped from heading content. Fixes 660.
* A more robust `striptags` is provided to convert headings to plain text.
Unlike, the `markupsafe` implementation, HTML entities are not unescaped.
* The plain text `name`, rich `html`, and unescaped raw `data-toc-label` are
saved to `toc_tokens`, allowing users to access the full rich text content of
the headings directly from `toc_tokens`.
* The value of `data-toc-label` is sanitized separate from heading content
before being written to `name`. This fixes a bug which allowed markup through
in certain circumstances. To access the raw unsanitized data, retrieve the
value from `token['data-toc-label']` directly.
* An `html.unescape` call is made just prior to calling `slugify` so that
`slugify` only operates on Unicode characters. Note that `html.unescape` is
not run on `name`, `html`, or `data-toc-label`.
* The functions `get_name` and `stashedHTML2text` defined in the `toc` extension
are both **deprecated**. Instead, third party extensions should use some
combination of the new functions `run_postprocessors`, `render_inner_html` and
`striptags`.

Fixed

* Include `scripts/*.py` in the generated source tarballs (1430).
* Ensure lines after heading in loose list are properly detabbed (1443).
* Give smarty tree processor higher priority than toc (1440).
* Permit carets (`^`) and square brackets (`]`) but explicitly exclude
backslashes (`\`) from abbreviations (1444).
* In attribute lists (`attr_list`, `fenced_code`), quoted attribute values are
now allowed to contain curly braces (`}`) (1414).

3.5.2

Fixed

* Fix type annotations for `convertFile` - it accepts only bytes-based buffers.
Also remove legacy checks from Python 2 (1400)
* Remove legacy import needed only in Python 2 (1403)
* Fix typo that left the attribute `AdmonitionProcessor.content_indent` unset
(1404)
* Fix edge-case crash in `InlineProcessor` with `AtomicString` (1406).
* Fix edge-case crash in `codehilite` with an empty `code` tag (1405).
* Improve and expand type annotations in the code base (1401).
* Fix handling of bogus comments (1425).

3.5.1

Fixed

* Fix a performance problem with HTML extraction where large HTML input could
trigger quadratic line counting behavior (1392).
* Improve and expand type annotations in the code base (1394).

3.5

Added

Add `permalink_leading` configuration option to the toc extension (1339)

A new boolean option `permalink_leading` controls the position of the permanent
link anchors generated with `permalink`. Setting `permalink_leading` to `True`
will cause the links to be inserted at the start of the header, before any other
header content. The default behavior for `permalink` is to append permanent
links to the header, placing them after all other header content.

Changed

* Add support for cPython version 3.12 (and PyPy 3.10) and drop support for
Python version 3.7 (1357).
* Refactor changelog to use the format defined at <https://keepachangelog.com/>.
* Update the list of empty HTML tags (1353).
* Add customizable TOC title class to TOC extension (1293).
* Add API documentation of the code base which is generated by
[mkdocstrings](https://mkdocstrings.github.io/) (#1220).

Fixed

* Fix a corner case in admonitions where if an indented code block was provided
as the first block, the output would be malformed (1329).

3.4.4

Fixed

* Add a special case for initial `'s` to smarty extension (1305).
* Unescape any backslash escaped inline raw HTML (1358).
* Unescape backslash escaped TOC token names (1360).

3.4.3

Fixed

* Restore console script (1327).

3.4.2

Fixed
* Officially support Python 3.11.
* Improve standalone * and _ parsing (1300).
* Consider `<html>` HTML tag a block-level element (1309).
* Switch from `setup.py` to `pyproject.toml`.

3.4.1

Fixed

* Fix an import issue with `importlib.util` (1274).

3.4

Changed

The `tables` extension now uses a `style` attribute instead of an `align` attribute for alignment.

The [HTML4 spec](https://www.w3.org/TR/html4/present/graphics.html#h-15.1.2)
specifically deprecates the use of the `align` attribute and it does not appear
at all in the [HTML5
spec](https://www.w3.org/TR/html53/tabular-data.html#attributes-common-to-td-and-th-elements).
Therefore, by default, the [tables](extensions/tables.md) extension will now use
the `style` attribute (setting just the `text-align` property) in `td` and `th`
blocks.

The former behavior is available by setting the `use_align_attribute`
configuration option to `True` when enabling the extension.

For example, to configure the old `align` behavior:

python
from markdown.extensions.tables import TableExtension

markdown.markdown(src, extensions=[TableExtension(use_align_attribute=True)])


Backslash unescaping moved to Treeprocessor (1131).

Unescaping backslash escapes has been moved to a Treeprocessor, which  enables
proper HTML escaping during serialization. However, it is recognized that
various third-party extensions may be calling the old class at
`postprocessors.UnescapePostprocessor`. Therefore, the old class remains in the
code base, but has been deprecated and will be removed in a future release. The
new class `treeprocessors.UnescapeTreeprocessor` should be used instead.

Previously deprecated objects have been removed

Various objects were deprecated in version 3.0 and began raising deprecation
warnings (see the [version 3.0 release notes](30-2018-09-21) for details). Any of those objects
which remained in version 3.3 have been removed from the code base in version 3.4
and will now raise errors. The relevant objects are listed below.


| Deprecated Object                      | Replacement Object                  |
|----------------------------------------|-------------------------------------|
| `markdown.version`                     | `markdown.__version__`              |
| `markdown.version_info`                | `markdown.__version_info__`         |
| `markdown.util.etree`                  | `xml.etree.ElementTree`             |
| `markdown.util.string_type`            | `str`                               |
| `markdown.util.text_type`              | `str`                               |
| `markdown.util.int2str`                | `chr`                               |
| `markdown.util.iterrange`              | `range`                             |
| `markdown.util.isBlockLevel`           | `markdown.Markdown().is_block_level`|
| `markdown.util.Processor().markdown`   | `markdown.util.Processor().md`      |
| `markdown.util.Registry().__setitem__` | `markdown.util.Registry().register` |
| `markdown.util.Registry().__delitem__` |`markdown.util.Registry().deregister`|
| `markdown.util.Registry().add`         | `markdown.util.Registry().register` |

In addition, the `md_globals` parameter of
`Markdown.extensions.Extension.extendMarkdown()` is no longer recognized as a
valid parameter and will raise an error if provided.

Added

* Some new configuration options have been added to the
[footnotes](extensions/footnotes.md) extension (1218):

 * Small refactor of the `BACKLINK_TITLE` option; The use of `format()`
   instead of "old" `%d` formatter allows one to specify text without the
   need to have the number of the footnote in it (like footnotes on
   Wikipedia for example). The modification is backward compatible so no
   configuration change is required.

 * Addition of a new option `SUPERSCRIPT_TEXT` that allows one to specify a
   custom placeholder for the footnote itself in the text.
   Ex: `[{}]` will give `<sup>[1]</sup>`, `({})` will give `<sup>(1)</sup>`,
   or by default, the current behavior: `<sup>1</sup>`.

* The [Table of Contents](extensions/toc.md) extension now accepts a
`toc_class` parameter which can be used to set the CSS class(es) on the
`<div>` that contains the Table of Contents (1224).

* The CodeHilite extension now supports a `pygments_formatter` option that can
be set to a custom formatter class (1187).

 - If `pygments_formatter` is set to a string (ex: `'html'`), Pygments'
   default formatter by that name is used.
 - If `pygments_formatter` is set to a formatter class (or any callable
   which returns a formatter instance), then an instance of that class is
   used.

 The formatter class is now passed an additional option, `lang_str`, to
 denote the language of the code block (1258). While Pygments' built-in
 formatters will ignore the option, a custom formatter assigned to the
 `pygments_formatter` option can make use of the `lang_str` to include the
 code block's language in the output.

Fixed

* Extension entry-points are only loaded if needed (1216).
* Added additional checks to the `<pre><code>` handling of
`PrettifyTreeprocessor` (1261, 1263).
* Fix XML deprecation warnings.

3.3.7

Fixed

* Disallow square brackets in reference link ids (1209).
* Retain configured `pygments_style` after first code block (1240).
* Ensure fenced code attributes are properly escaped (1247).

3.3.6

Fixed

* Fix a dependency issue (1195, 1196).

3.3.5

Fixed

* Make the `slugify_unicode` function not remove diacritical marks (1118).
* Fix `[toc]` detection when used with `nl2br` extension (1160).
* Re-use compiled regex for block level checks (1169).
* Don't process shebangs in fenced code blocks when using CodeHilite (1156).
* Improve email address validation for Automatic Links (1165).
* Ensure `<summary>` tags are parsed correctly (1079).
* Support Python 3.10 (1124).

3.3.4

Fixed

* Properly parse unclosed tags in code spans (1066).
* Properly parse processing instructions in md_in_html (1070).
* Properly parse code spans in md_in_html (1069).
* Preserve text immediately before an admonition (1092).
* Simplified regex for HTML placeholders (928) addressing (932).
* Ensure `permalinks` and `anchorlinks` are not restricted by `toc_depth` (1107).
* Fix corner cases with lists under admonitions (1102).

3.3.3

Fixed

* Unify all block-level tags (1047).
* Fix issue where some empty elements would have text rendered as `None` when using `md_in_html` (1049).
* Avoid catastrophic backtracking in `hr` regex (1055).
* Fix `hr` HTML handling (1053).

3.3.2

Fixed

* Properly parse inline HTML in md_in_html (1040 & 1045).
* Avoid crashing when md_in_html fails (1040).

3.3.1

Fixed

* Correctly parse raw `script` and `style` tags (1036).
* Ensure consistent class handling by `fenced_code` and `codehilite` (1032).

3.3

Changed

The prefix `language-` is now prepended to all language classes by default on code blocks.

The [HTML5
spec](https://www.w3.org/TR/html5/text-level-semantics.html#the-code-element)
recommends that the class defining the language of a code block be prefixed with
`language-`. Therefore, by default, both the
[fenced_code](extensions/fenced_code_blocks.md) and
[codehilite](extensions/code_hilite.md) extensions now prepend the prefix when
code highlighting is disabled.

If you have previously been including the prefix manually in your fenced code blocks, then you will not want a second
instance of the prefix. Similarly, if you are using a third party syntax highlighting tool which does not recognize
the prefix, or requires a different prefix, then you will want to redefine the prefix globally using the `lang_prefix`
configuration option of either the `fenced_code` or `codehilite` extensions.

For example, to configure `fenced_code` to not apply any prefix (the previous behavior), set the option to an empty string:

python
from markdown.extensions.fenced_code import FencedCodeExtension

markdown.markdown(src, extensions=[FencedCodeExtension(lang_prefix='')])


!!! note
 When code highlighting is
 [enabled](extensions/fenced_code_blocks.mdenabling-syntax-highlighting),
 the output from Pygments is used unaltered. Currently, Pygments does not
 provide an option to include the language class in the output, let alone
 prefix it. Therefore, any language prefix is only applied when syntax
 highlighting is disabled.

Attribute Lists are more strict (898).

Empty curly braces are now completely ignored by the [Attribute List](extensions/attr_list.md) extension. Previously, the extension would
recognize them as attribute lists and remove them from the document. Therefore, it is no longer necessary to backslash
escape a set of curly braces which are empty or only contain whitespace.

Despite not being documented, previously an attribute list could be defined anywhere within a table cell and get
applied to the cell (`<td>` element). Now the attribute list must be defined at the end of the cell content and must
be separated from the rest of the content by at least one space. This makes it easy to differentiate between attribute
lists defined on inline elements within a cell and the attribute list for the cell itself. It is also more consistent
with how attribute lists are defined on other types of elements.

The extension has also added support for defining attribute lists on table header cells (`<th>` elements) in the same
manner as data cells (`<td>` elements).

In addition, the documentation for the extensions received an overhaul. The features (987) and limitations (965) of the extension are now fully documented.

Added

* All Pygments' options are now available for syntax highlighting (816).
 - The [Codehilite](extensions/code_hilite.md) extension now accepts any options
   which Pygments supports as global configuration settings on the extension.
 - [Fenced Code Blocks](extensions/fenced_code_blocks.md) will accept any of the
   same options on individual code blocks.
 - Any of the previously supported aliases to Pygments' options continue to be
   supported at this time. However, it is recommended that the Pygments option names
   be used directly to ensure continued compatibility in the future.

* [Fenced Code Blocks](extensions/fenced_code_blocks.md) now work with
[Attribute Lists](extensions/attr_list.md) when syntax highlighting is disabled.
Any random HTML attribute can be defined and set on the `<code>` tag of fenced code
blocks when the `attr_list` extension is enabled (816).

* The HTML parser has been completely replaced. The new HTML parser is built on Python's
[`html.parser.HTMLParser`](https://docs.python.org/3/library/html.parser.html), which
alleviates various bugs and simplify maintenance of the code (803, 830).

* The [Markdown in HTML](extensions/md_in_html.md) extension has been rebuilt on the
new HTML Parser, which drastically simplifies it. Note that raw HTML elements with a
`markdown` attribute defined are now converted to ElementTree Elements and are rendered
by the serializer. Various bugs have been fixed (803, 595, 780, and 1012).

* Link reference parsing, abbreviation reference parsing and footnote reference parsing
has all been moved from `preprocessors` to `blockprocessors`, which allows them to be
nested within other block level elements. Specifically, this change was necessary to
maintain the current behavior in the rebuilt Markdown in HTML extension. A few random
edge-case bugs (see the included tests) were resolved in the process (803).

* An alternate function `markdown.extensions.headerid.slugify_unicode` has been included
with the [Table of Contents](extensions/toc.md) extension which supports Unicode
characters in table of contents slugs. The old `markdown.extensions.headerid.slugify`
method which removes non-ASCII characters remains the default. Import and pass
`markdown.extensions.headerid.slugify_unicode` to the `slugify` configuration option
to use the new behavior.

* Support was added for Python 3.9 and dropped for Python 3.5.

Fixed

* Document how to pass configuration options to Extra (1019).
* Fix HR which follows strong em (897).
* Support short reference image links (894).
* Avoid a `RecursionError` from deeply nested blockquotes (799).
* Fix issues with complex emphasis (979).
* Fix unescaping of HTML characters `<>` in CodeHilite (990).
* Fix complex scenarios involving lists and admonitions (1004).
* Fix complex scenarios with nested ordered and unordered lists in a definition list (918).

3.2.2

Fixed

* Add `checklinks` tox environment to ensure all links in documentation are good.
* Refactor extension API documentation (729).
* Load entry_points (for extensions) only once using `importlib.metadata`.
* Do not double escape entities in TOC.
* Correctly report if an extension raises a `TypeError` (939).
* Raise a `KeyError` when attempting to delete a nonexistent key from the
extension registry (939).
* Remove import of `packaging` (or `pkg_resources` fallback) entirely.
* Remove `setuptools` as a run-time dependency (`install_required`).

3.2.1

Fixed

* The `name` property in `toc_tokens` from the TOC extension now
escapes HTML special characters (`<`, `>`, and `&`).

3.2

Changed

Drop support for Python 2.7

Python 2.7 reaches end-of-life on 2020-01-01 and Python-Markdown 3.2 has dropped
support for it. Please upgrade to Python 3, or use Python-Markdown 3.1.

`em` and `strong` inline processor changes

In order to fix issue 792, `em`/`strong` inline processors were refactored. This
translated into removing many of the existing inline processors that handled this
logic:

* `em_strong`
* `strong`
* `emphasis`
* `strong2`
* `emphasis`

These processors were replaced with two new ones:

* `em_strong`
* `em_strong2`

The [`legacy_em`](extensions/legacy_em.md) extension was also modified with new,
refactored logic and simply overrides the `em_strong2` inline processor.

CodeHilite now always wraps with `<code>` tags

Before, the HTML generated by CodeHilite looked like:
- `<pre><code>foo = 'bar'</code></pre>` if you **were not** using Pygments.
- `<pre>foo = 'bar'</pre>`  if you **were** using Pygments.

To make the cases more consistent (and adhere to many Markdown specifications and
HTML code block markup suggestions), CodeHilite will now always additionally wrap
code with `<code>` tags. See 862 for more details.

This change does not alter the Python-Markdown API, but users relying on the old
markup will find their output now changed.

Internally, this change relies on the Pygments 2.4, so you must be using at least
that version to see this effect. Users with earlier Pygments versions will
continue to see the old behavior.

`markdown.util.etree` deprecated

Previously, Python-Markdown was using either the `xml.etree.cElementTree` module
or the `xml.etree.ElementTree` module, based on their availability. In modern
Python versions, the former is a deprecated alias for the latter. Thus, the
compatibility layer is deprecated and extensions are advised to use
`xml.etree.ElementTree` directly. Importing `markdown.util.etree` will raise
a `DeprecationWarning` beginning in version 3.2 and may be removed in a future
release.

Therefore, extension developers are encouraged to replace
`from markdown.util import etree` with
`import xml.etree.ElementTree as etree` in their code.

Added

* Some new configuration options have been added to the [toc](extensions/toc.md)
extension:

 * The `anchorlink_class` and `permalink_class` options allow class(es) to be
   assigned to the `anchorlink` and `permalink` respectively. This allows using
   icon fonts from CSS for the links. Therefore, an empty string passed to
   `permalink` now generates an empty `permalink`. Previously no `permalink`
   would have been generated. (776)

 * The `permalink_title` option allows the title attribute of a `permalink` to be
   set to something other than the default English string `Permanent link`. (877)

* Document thread safety (812).

* Markdown parsing in HTML has been exposed via a separate extension called
[`md_in_html`](extensions/md_in_html.md).

* Add support for Python 3.8.

Fixed

* HTML tag placeholders are no longer included in  `.toc_tokens` (899).
* Unescape backslash-escaped characters in TOC ids (864).
* Refactor bold and italic logic in order to solve complex nesting issues (792).
* Always wrap CodeHilite code in `code` tags (862).
Links

@pyup-bot pyup-bot mentioned this pull request Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant