
Python Markup

Meta Tags: #Python #Markup

Mostly python-creole, the little gem ...

https://pypi.python.org/pypi/python-creole

It's big and has a large community, but ...

https://pypi.python.org/pypi/Markdown


http://schema.org

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.

Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model.
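As a rough illustration of the JSON-LD encoding mentioned above, the sketch below builds a small Schema.org description with Python's standard json module. The function name make_software_jsonld and the sample values are made up for this note; only "@context", "@type" and the property names (name, codeRepository, programmingLanguage) come from the Schema.org vocabulary.

```python
import json


def make_software_jsonld(name, repository_url, language):
    """Return a JSON-LD string describing a source code project.

    Hypothetical helper: the keys follow the Schema.org type
    SoftwareSourceCode, the values are whatever the caller passes in.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "codeRepository": repository_url,
        "programmingLanguage": language,
    }
    # sort_keys gives a stable, diff-friendly output
    return json.dumps(doc, indent=2, sort_keys=True)


# Sample values only, taken from a project listed further down this page.
print(make_software_jsonld(
    "mwparserfromhell",
    "https://github.com/earwig/mwparserfromhell/",
    "Python",
))
```

On a web page, a string like this would typically be placed inside a script element of type application/ld+json.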

Over 10 million sites use Schema.org to mark up their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

http://schema.org/docs/documents.html

http://schema.org/docs/datamodel.html

https://schema.org/docs/gs.html

https://github.com/schemaorg/schemaorg

Python Wiki Engines

https://wiki.python.org/moin/PythonWikiEngines

Mediawiki Markup

PythonTrac and PythonMoin markup are largely compatible with Mediawiki markup.

mwparserfromhell

http://mwparserfromhell.readthedocs.org/en/latest/

https://github.com/earwig/mwparserfromhell/

mwparserfromhell (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.

It supports Python 2 and Python 3 ...

... originally developed for EarwigBot

Jan 2015

https://github.com/earwig/earwigbot

A Python robot that edits Wikipedia and interacts with people over IRC

https://github.com/earwig

Mediawiki Parser

https://github.com/peter17/mediawiki-parser

This is a parser for MediaWiki's (MW) syntax. Its goal is to transform wikitext into an abstract syntax tree (AST) and then render this AST into various formats such as plain text and HTML ...
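To make the "wikitext to AST to output format" idea concrete, here is a minimal sketch of such a two-stage pipeline. It is not the mediawiki-parser API: the functions parse, render_html and render_text are hypothetical, the "AST" is just a flat list of nodes, and only '''bold''' and ''italic'' are recognized (no nesting, no combined bold-italic).

```python
import re
from html import escape

# '''bold''' is tried before ''italic''; anything else is plain text.
TOKEN = re.compile(r"'''(?P<bold>.+?)'''|''(?P<italic>.+?)''")


def parse(wikitext):
    """Turn wikitext into a flat AST: a list of (kind, text) nodes."""
    nodes, pos = [], 0
    for m in TOKEN.finditer(wikitext):
        if m.start() > pos:
            nodes.append(("text", wikitext[pos:m.start()]))
        kind = m.lastgroup  # "bold" or "italic"
        nodes.append((kind, m.group(kind)))
        pos = m.end()
    if pos < len(wikitext):
        nodes.append(("text", wikitext[pos:]))
    return nodes


def render_html(nodes):
    """Render the AST as HTML, escaping all text content."""
    tags = {"bold": "b", "italic": "i"}
    out = []
    for kind, text in nodes:
        if kind == "text":
            out.append(escape(text))
        else:
            out.append("<{0}>{1}</{0}>".format(tags[kind], escape(text)))
    return "".join(out)


def render_text(nodes):
    """Render the AST as plain text by dropping the markup."""
    return "".join(text for _, text in nodes)


ast = parse("A '''bold''' and ''italic'' word")
print(render_html(ast))
print(render_text(ast))
```

The point of the intermediate list is that each renderer only has to understand node kinds, not the wikitext syntax itself.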

You must install the latest version of Pijnu ...

Nov 2012

https://github.com/peter17/pijnu

Pijnu is a PEG parser generator and processor, written in Python, intended to be clear, easy, practical ...

Pijnu syntax is a custom, extended version of Parsing Expression Grammars (PEG), which is itself a kind of mix of BNF and regular expressions.

The major difference is that PEG is a grammar for expressing string recognition patterns, while BNF and regular expressions describe string generation. As a consequence, PEG is better suited to parsing tasks. A PEG grammar directly encodes the algorithm for parsing a source string, which then only needs to be rewritten as a parser in a programming language.
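The claim that a PEG "encodes the algorithm" can be sketched with a few hand-written combinators. This is not Pijnu's syntax or API; the names lit, seq, choice, many and recognizes are invented for this note. Each rule is a function that takes the text and a position and returns the new position, or None when it does not match.

```python
def lit(s):
    """Match the literal string s."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule


def seq(*rules):
    """Match all rules one after another."""
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule


def choice(*rules):
    """Ordered choice: the first alternative that matches wins."""
    def rule(text, pos):
        for r in rules:
            end = r(text, pos)
            if end is not None:
                return end
        return None
    return rule


def many(r):
    """Greedy zero-or-more repetition."""
    def rule(text, pos):
        while True:
            end = r(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return rule


# number <- digit digit*
# total  <- number ('+' number)*
digit = choice(*[lit(c) for c in "0123456789"])
number = seq(digit, many(digit))
total = seq(number, many(seq(lit("+"), number)))


def recognizes(rule, text):
    """True if the rule consumes the whole text."""
    return rule(text, 0) == len(text)


print(recognizes(total, "12+3+456"))
print(recognizes(total, "12+"))
```

Ordered choice is where PEG departs from BNF: choice(lit("a"), lit("ab")) applied to "ab" consumes only "a", because the second alternative is never tried once the first one succeeds.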

April 2012

SMC.MW

https://github.com/lambdafu/smc.mw/

A MediaWiki parser for Python.

... requires=["lxml", "grako"] ...

Dec 2014

https://pypi.python.org/pypi/grako/3.5.1

A generator of PEG/Packrat parsers from EBNF grammars.

March 2015

http://pythonhosted.org//grako/

Grako (for grammar compiler) is a tool that takes grammars in a variation of EBNF as input, and outputs memoizing (Packrat) PEG parsers in Python.

Grako is different from other PEG parser generators:

Generated parsers use Python's very efficient exception-handling system to backtrack. Grako generated parsers simply assert what must be parsed. There are no complicated if-then-else sequences for decision making or backtracking. Memoization allows going over the same input sequence several times in linear time.

Positive and negative lookaheads, and the cut element (with its cleaning of the memoization cache) allow for additional, hand-crafted optimizations at the grammar level.

Delegation to Python's re module for lexemes allows for (Perl-like) powerful and efficient lexical analysis.

The use of Python's context managers considerably reduces the size of the generated parsers for code clarity, and enhanced CPU-cache hits.

Include files, rule inheritance, and rule inclusion give Grako grammars considerable expressive power.

Efficient support for direct and indirect left recursion allows for more intuitive grammars.

Grako, the runtime support, and the generated parsers have measurably low Cyclomatic complexity. At around 4.5 KLOC of Python, it is possible to study all its source code in a single session.

Grako's only dependencies are on the Python 2.7, 3.4, or PyPy 2.3 standard libraries.
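Several of the points above (exceptions for backtracking, memoization, delegating lexemes to the re module) can be shown in a small hand-written parser. This is only a sketch of the general technique, not code generated by Grako and not its runtime API; the class Parser, the exception FailedParse and the function evaluate are names chosen for this note. The grammar is expr <- term ('+' term)* and term <- NUMBER / '(' expr ')'.

```python
import re


class FailedParse(Exception):
    """Raised when a rule does not match at the current position."""


class Parser:
    NUMBER = re.compile(r"\d+")  # lexeme matching is delegated to re

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.memo = {}   # (rule, start) -> (value, end) or a FailedParse
        self.calls = 0   # number of rule bodies actually executed

    def memoized(self, name, body):
        """Run body once per (rule, position); replay the result afterwards."""
        start = self.pos
        key = (name, start)
        if key not in self.memo:
            self.calls += 1
            try:
                self.memo[key] = (body(), self.pos)
            except FailedParse as exc:
                self.pos = start
                self.memo[key] = exc
        cached = self.memo[key]
        if isinstance(cached, FailedParse):
            raise cached
        value, self.pos = cached
        return value

    def token(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise FailedParse("expected %r at %d" % (literal, self.pos))
        self.pos += len(literal)

    def number(self):
        def body():
            m = self.NUMBER.match(self.text, self.pos)
            if m is None:
                raise FailedParse("expected number at %d" % self.pos)
            self.pos = m.end()
            return int(m.group())
        return self.memoized("number", body)

    def term(self):
        def body():
            start = self.pos
            try:                      # first alternative: NUMBER
                return self.number()
            except FailedParse:
                self.pos = start      # backtrack, then try '(' expr ')'
            self.token("(")
            value = self.expr()
            self.token(")")
            return value
        return self.memoized("term", body)

    def expr(self):
        def body():
            value = self.term()
            while True:
                start = self.pos
                try:
                    self.token("+")
                    value += self.term()
                except FailedParse:
                    self.pos = start  # undo a partially matched '+ term'
                    return value
        return self.memoized("expr", body)


def evaluate(text):
    """Parse and sum the whole input, or raise FailedParse."""
    parser = Parser(text)
    value = parser.expr()
    if parser.pos != len(text):
        raise FailedParse("unexpected input at %d" % parser.pos)
    return value


print(evaluate("1+(2+3)+4"))
```

The rule bodies simply state what must come next and let FailedParse propagate; the memo table means that asking for the same rule at the same position a second time costs a dictionary lookup rather than a re-parse.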

PyWikiBot

https://www.mediawiki.org/wiki/Category:Pywikibot

https://www.mediawiki.org/wiki/Category:Pywikibot_scripts

https://www.mediawiki.org/wiki/Manual:Pywikibot

https://github.com/wikimedia/pywikibot-core

https://en.wikibooks.org/wiki/Pywikibot

https://pywikibot.readthedocs.org/en/latest/

https://en.wikiversity.org/wiki/Pywikipediabot

Pywikibot Nightlies - http://tools.wmflabs.org/pywikibot/

http://tools.wmflabs.org/pywikibot/core/

MWClient

https://github.com/mwclient/mwclient

mwclient is a lightweight Python client library to the MediaWiki API which provides access to most API functionality.
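For orientation, the kind of request a client library like this wraps can be assembled by hand. The sketch below does not use mwclient and makes no network call; it only builds the URL of an action=query request with the standard library. The helper name build_content_query and the English Wikipedia endpoint are assumptions made for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_content_query(api_url, title):
    """Return the URL of an action=query request for a page's latest wikitext.

    Hypothetical helper; api_url is the wiki's api.php endpoint.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


url = build_content_query(
    "https://en.wikipedia.org/w/api.php",  # assumed endpoint
    "Python (programming language)",
)
print(url)
```

A client library adds what this sketch leaves out: login, cookies, continuation of long result lists, and error handling.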

Wikitools

https://github.com/alexz-enwp/wikitools

api.py - Contains the APIRequest class, for doing queries directly, see API examples below

wiki.py - Contains the Wiki class, used for logging in to the site, storing cookies, and storing basic site information

page.py - Contains the Page class for dealing with individual pages on the wiki. Can be used to get page info and text, as well as edit and other actions if enabled on the wiki

category.py - Category is a subclass of Page with extra functions for working with categories

wikifile.py - File is a subclass of Page with extra functions for working with files - note that there may be some issues with shared repositories, as the pages for files on shared repos technically don't exist on the local wiki.

user.py - Contains the User class for getting information about and blocking/unblocking users

pagelist.py - Contains several functions for getting a list of Page objects from lists of titles, pageids, or API query results

https://code.google.com/p/python-wikitools/wiki/Documentation

Mediawiki API

http://www.mediawiki.org/wiki/API:Main_page

http://www.mediawiki.org/wiki/Category:MediaWiki_Development (this page includes http://www.mediawiki.org/wiki/Semantic_Bundle and http://www.mediawiki.org/wiki/Markup_spec)

http://www.mediawiki.org/wiki/Etherpad_index

Most Powerful Python Markup Packages

Embedded HTML from https://devcharm.com/articles/203/most-powerful-markup-packages-in-python/


Markup languages save you time by taking care of the details of HTML/CSS and PDF generation. They can display an awesome variety of things: text, graphics, and source code, and some even execute code while doing so.

Reports with plots

IPython - advanced Python shell and notebook tool that can handle everything from Python to plots, including LaTeX math notation.

pyreport - generate reports from Python scripts combining text, code, and plots.

Formatting source code

pygments - powerful tool that parses source code in virtually all languages and formats it as HTML, RTF, ASCII or LaTeX.

HTML pages

Markdown - the elegant markup format used by devcharm.

docutils - parses reStructuredText markup and formats it to HTML.

markup.py - straightforward HTML/XML generator.

Parsing And Generating In General

Haven't gone through each one ... yet.

http://pyparsing.wikispaces.com/

https://pypi.python.org/pypi/parse

https://pypi.python.org/pypi/python-creole/

https://pypi.python.org/pypi/Markdown

https://pythonhosted.org/Markdown/extensions/index.html

https://www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. Three features make it powerful:

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn't take much code to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one. Then you just have to specify the original encoding.

Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.


Also See

Wikis

Last modified on 03/13/2017 01:51:36 PM