Python Markup

Meta Tags: #Python #Markup

Mostly python-creole, the little gem ...

It's big and has a large community, but ... Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. The Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model.

Over 10 million sites use Schema.org to mark up their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.
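As a rough illustration of the JSON-LD encoding mentioned above, here is a minimal sketch that builds a Schema.org description with nothing but the standard library. The helper names `person_jsonld` and `as_script_tag` and the sample values are made up for this example; `Person`, `name`, `jobTitle` and `url` are terms from the Schema.org vocabulary.

```python
import json


def person_jsonld(name, job_title, url):
    """Return a Schema.org Person description serialized as JSON-LD.

    Hypothetical helper for illustration only.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "url": url,
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # Sample values are placeholders, not real data.
    print(as_script_tag(person_jsonld("Jane Doe", "Editor", "https://example.com/jane")))
```

The same dictionary could in principle be expressed as RDFa or Microdata attributes instead; JSON-LD is used here only because it maps directly onto a Python dict.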

Python Wiki Engines

Mediawiki Markup

PythonTrac and PythonMoin markup are largely compatible with Mediawiki markup.


mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.

It supports Python 2 and Python 3 ...

... originally developed for EarwigBot

Jan 2015

A Python robot that edits Wikipedia and interacts with people over IRC

Mediawiki Parser

This is a parser for MediaWiki's (MW) syntax. Its goal is to transform wikitext into an abstract syntax tree (AST) and then render this AST into various formats such as plain text and HTML ...
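The parse-then-render pipeline described above can be sketched in a few lines. This is not the Mediawiki Parser's code or API: it handles only `'''bold'''` and `''italic''`, uses a flat list where a real parser would build a nested tree, and every name in it (`parse`, `render_html`, `render_text`) is invented for the example.

```python
import html
import re

# Tiny, assumed subset of wikitext: '''bold''', ''italic'' and plain text.
TOKEN = re.compile(r"'''(.+?)'''|''(.+?)''")


def parse(wikitext):
    """Turn wikitext into a flat AST: a list of (kind, text) tuples."""
    ast, pos = [], 0
    for match in TOKEN.finditer(wikitext):
        if match.start() > pos:
            ast.append(("text", wikitext[pos:match.start()]))
        if match.group(1) is not None:
            ast.append(("bold", match.group(1)))
        else:
            ast.append(("italic", match.group(2)))
        pos = match.end()
    if pos < len(wikitext):
        ast.append(("text", wikitext[pos:]))
    return ast


HTML_TAGS = {"bold": "b", "italic": "i"}


def render_html(ast):
    """Render the AST as HTML, escaping the text of every node."""
    parts = []
    for kind, text in ast:
        escaped = html.escape(text)
        tag = HTML_TAGS.get(kind)
        parts.append(f"<{tag}>{escaped}</{tag}>" if tag else escaped)
    return "".join(parts)


def render_text(ast):
    """Render the same AST as plain text by dropping the markup."""
    return "".join(text for _kind, text in ast)


if __name__ == "__main__":
    tree = parse("A '''bold''' and ''italic'' word")
    print(render_html(tree))
    print(render_text(tree))
```

Keeping the AST separate from the renderers is what lets one parse feed several output formats.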

You must install the latest version of Pijnu ...

Nov 2012

Pijnu is a PEG parser generator and processor, written in Python, intended to be clear, easy, practical ...

Pijnu syntax is a custom, extended version of Parsing Expression Grammars (PEG), which is itself a kind of mix of BNF and regular expressions.

The major difference is that PEG is a grammar for expressing string recognition patterns, while BNF and regular expressions describe string generation. As a consequence, PEG is better suited for parsing tasks. A PEG grammar directly encodes the algorithm for parsing a source string; it only needs to be rewritten as a parser in a programming language.
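To make the "grammar encodes the algorithm" point concrete, here is a hand-written sketch in which each rule of a small PEG becomes one Python function. The grammar and function names are assumptions chosen for the example and this is not Pijnu syntax or Pijnu output:

    expr   <- term ('+' term)*
    term   <- number / '(' expr ')'
    number <- [0-9]+

```python
import re

NUMBER = re.compile(r"[0-9]+")

# Each rule takes (text, pos) and returns (value, new_pos), or None on failure.


def number(text, pos):
    match = NUMBER.match(text, pos)
    if match is None:
        return None
    return int(match.group()), match.end()


def term(text, pos):
    # Ordered choice: try the first alternative, fall back to the second.
    result = number(text, pos)
    if result is not None:
        return result
    if text.startswith("(", pos):
        result = expr(text, pos + 1)
        if result is not None:
            value, pos = result
            if text.startswith(")", pos):
                return value, pos + 1
    return None


def expr(text, pos=0):
    result = term(text, pos)
    if result is None:
        return None
    total, pos = result
    # Repetition: consume '+' term as long as it matches, then stop.
    while text.startswith("+", pos):
        result = term(text, pos + 1)
        if result is None:
            break  # leave pos before the dangling '+'
        value, pos = result
        total += value
    return total, pos


if __name__ == "__main__":
    print(expr("1+(2+3)+4"))
```

The ordered choice in `term` is the recognisably PEG part: alternatives are tried in sequence and the first match wins, so the grammar is never ambiguous.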

April 2012


A MediaWiki parser for Python.

... requires=["lxml", "grako"] ...

Dec 2014

A generator of PEG/Packrat parsers from EBNF grammars.

March 2015

Grako (for grammar compiler) is a tool that takes grammars in a variation of EBNF as input, and outputs memoizing (Packrat) PEG parsers in Python.

Grako is different from other PEG parser generators:

Generated parsers use Python's very efficient exception-handling system to backtrack. Grako generated parsers simply assert what must be parsed. There are no complicated if-then-else sequences for decision making or backtracking. Memoization allows going over the same input sequence several times in linear time.

Positive and negative lookaheads, and the cut element (with its cleaning of the memoization cache) allow for additional, hand-crafted optimizations at the grammar level.

Delegation to Python's re module for lexemes allows for (Perl-like) powerful and efficient lexical analysis.

The use of Python's context managers considerably reduces the size of the generated parsers for code clarity, and enhanced CPU-cache hits.

Include files, rule inheritance, and rule inclusion give Grako grammars considerable expressive power.

Efficient support for direct and indirect left recursion allows for more intuitive grammars.

Grako, the runtime support, and the generated parsers have measurably low Cyclomatic complexity. At around 4.5 KLOC of Python, it is possible to study all its source code in a single session.

Grako's only dependencies are on the Python 2.7, 3.4, or PyPy 2.3 standard libraries.
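The first point in the list above, backtracking through exceptions combined with memoization, can be sketched as follows. This is an illustration of the general packrat technique under simplifying assumptions, not Grako's generated code or runtime API: `Parser`, `AssignParser`, `FailedParse`, `call`, `choice` and the toy grammar are all hypothetical names for this example.

```python
import re


class FailedParse(Exception):
    """Raised when a rule cannot match at the current position."""


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.memo = {}       # (rule name, start pos) -> (result, end pos) or FailedParse
        self.memo_hits = 0   # counts how often a cached outcome was reused

    def token(self, pattern):
        """Match a lexeme with the re module, or fail by raising."""
        match = re.compile(pattern).match(self.text, self.pos)
        if match is None:
            raise FailedParse(f"expected {pattern!r} at position {self.pos}")
        self.pos = match.end()
        return match.group()

    def call(self, rule):
        """Run a rule at the current position, caching success or failure."""
        key = (rule.__name__, self.pos)
        if key in self.memo:
            self.memo_hits += 1
        else:
            start = self.pos
            try:
                self.memo[key] = (rule(), self.pos)
            except FailedParse as error:
                self.pos = start
                self.memo[key] = error
        cached = self.memo[key]
        if isinstance(cached, FailedParse):
            raise cached
        result, self.pos = cached
        return result

    def choice(self, *rules):
        """Ordered choice: backtrack by catching the failure exception."""
        start = self.pos
        for rule in rules:
            try:
                return self.call(rule)
            except FailedParse:
                self.pos = start
        raise FailedParse(f"no alternative matched at position {start}")


class AssignParser(Parser):
    # Toy grammar:  stmt <- name '=' number / name '=' name
    def name(self):
        return self.token(r"[a-z]+")

    def number(self):
        return int(self.token(r"[0-9]+"))

    def assign_number(self):
        target = self.call(self.name)
        self.token(r"=")
        return (target, self.call(self.number))

    def assign_name(self):
        target = self.call(self.name)
        self.token(r"=")
        return (target, self.call(self.name))

    def stmt(self):
        return self.choice(self.assign_number, self.assign_name)


if __name__ == "__main__":
    parser = AssignParser("x=y")
    print(parser.stmt(), parser.memo_hits)
```

Note that the rule methods contain no if-then-else logic: they state what must be parsed and let the exception unwind to the nearest `choice`. When the first alternative fails on `x=y`, the second reuses the cached result for `name` at position 0 instead of re-scanning it.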


Pywikibot Nightlies -


mwclient is a lightweight Python client library to the MediaWiki API which provides access to most API functionality.

Wikitools:

- Contains the APIRequest class, for doing queries directly, see API examples below
- Contains the Wiki class, used for logging in to the site, storing cookies, and storing basic site information
- Contains the Page class for dealing with individual pages on the wiki. Can be used to get page info and text, as well as edit and other actions if enabled on the wiki
- Category is a subclass of Page with extra functions for working with categories
- File is a subclass of Page with extra functions for working with files - note that there may be some issues with shared repositories, as the pages for files on shared repos technically don't exist on the local wiki.
- Contains the User class for getting information about and blocking/unblocking users
- Contains several functions for getting a list of Page objects from lists of titles, pageids, or API query results
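Client libraries like these are wrappers around plain HTTP requests to the MediaWiki API. As a rough sketch of what such a request looks like, the snippet below only builds the URL for fetching a page's current wikitext and sends nothing over the network. The function name `page_text_query_url` is hypothetical and this is not how mwclient or Wikitools construct their requests; the query parameters themselves (`action=query`, `prop=revisions`, `rvprop=content`, `titles`, `format=json`) are standard MediaWiki API parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def page_text_query_url(api_url, title):
    """Build a MediaWiki API URL that asks for a page's latest revision text.

    Hypothetical helper for illustration; api_url is the wiki's api.php endpoint.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = page_text_query_url("https://en.wikipedia.org/w/api.php", "Python (programming language)")
    print(url)
    # Round-trip the query string to show how the title was encoded.
    print(parse_qs(urlsplit(url).query)["titles"])
```

A real client adds login, cookie handling, continuation of long result sets and error handling on top of this, which is the work the libraries above take off your hands.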

Mediawiki API Page includes and

Most Powerful Python Markup Packages

Embedded HTML from

Markup languages save you time by taking care of the details of HTML/CSS and PDF generation. They can display an awesome variety of things: text, graphics, and source code, and some even execute code while they do so.

Reports with plots

IPython - advanced Python shell and notebook tool that can handle everything from Python to plots, including LaTeX math notation.

pyreport - generate reports from Python scripts combining text, code, and plots.

Formatting source code

pygments - powerful tool that parses source code in virtually all languages and formats it to HTML, RTF, ASCII or LaTeX.

HTML pages

Markdown - the elegant markup format used by devcharm.

docutils - parses reStructuredText markup and formats it to HTML.

- straightforward HTML/XML generator.

Parsing And Generating In General

Haven't gone through each one ... yet.

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. Three features make it powerful:

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one. Then you just have to specify the original encoding.

Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.

Also See


Last modified on 03/13/2017 01:51:36 PM