mdpo.md2po package

Markdown to PO file extractor following the mdpo specification.

class mdpo.md2po.Md2Po(files_or_content, **kwargs)

Bases: object

Markdown to PO files extractor.

This class is where all the extraction process is carried out. If you are executing custom extraction events, you may want to read the documentation about the properties of this class to properly control the internal state of the parser.

Example

If you want to extract all “Foo” messages as “Bar”, regardless of the content of the Markdown input, you could do something like:

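from mdpo.md2po import markdown_to_pofile


def transform_foo(self, block, text):
    if text == 'Foo':
        self.current_msgid = 'Bar'  # self is the Md2Po instance
        # Returning False skips md2po's own handling of this text
        return False


markdown_to_pofile('Foo', events={'text': transform_foo})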
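The skip rule this example relies on can be illustrated with a small dispatcher. An event returning False makes md2po skip that part of the parsing, while returning None (no explicit value) does not. This is a minimal sketch, not mdpo's implementation: the run_event helper and the FakeParser class are hypothetical. It also assumes that every registered function still runs after one of them returns False, which this page does not state.

```python
from typing import Any, Callable, Dict, List, Union

EventFunc = Callable[..., Any]


def run_event(
    events: Dict[str, Union[EventFunc, List[EventFunc]]],
    name: str,
    parser: Any,
    *args: Any,
) -> bool:
    """Hypothetical dispatcher: return True if the parsing step should be skipped.

    Assumption: a single function or a list of functions is accepted,
    every function runs, and only an explicit False requests a skip.
    """
    funcs = events.get(name, [])
    if callable(funcs):
        funcs = [funcs]
    skip = False
    for func in funcs:
        # `is False`, so functions that return None never cause a skip
        if func(parser, *args) is False:
            skip = True
    return skip


class FakeParser:
    """Stand-in for Md2Po holding only the state this example touches."""

    def __init__(self) -> None:
        self.current_msgid = ''


def transform_foo(self, block, text):
    if text == 'Foo':
        self.current_msgid = 'Bar'
        return False


parser = FakeParser()
events = {'text': transform_foo}
print(run_event(events, 'text', parser, None, 'Foo'), parser.current_msgid)  # True Bar
print(run_event(events, 'text', parser, None, 'Baz'))  # False
```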

The public internal properties of this class are documented below:

_codespan_backticks
_codespan_start_index
_current_aspan_ref_target
_current_aspan_text
_current_imgspan
_current_markdown_filepath
_current_top_level_block_number
_current_top_level_block_type
_enterspan_replacer
_inside_aspan
_inside_codeblock
_inside_codespan
_inside_hblock
_inside_htmlblock
_inside_latexmath_display
_inside_latexmath_display_text
_inside_liblock
_inside_olblock
_inside_pblock
_inside_uspan
_leavespan_replacer
_process_command(text)
_quoteblocks_deep
_save_current_msgid(msgstr='', fuzzy=False)
_save_msgid(msgid, msgstr='', tcomment=None, msgctxt=None, fuzzy=False)
_saved_files_changed
_uls_deep
bold_end_string
bold_start_string
code_end_string
code_end_string_escaped
code_start_string
code_start_string_escaped
command(mdpo_command, comment, original_command)
command_aliases
content
current_msgctxt

Context message that will be saved in the next message.

Type:

str

current_msgid

The msgid currently being built for the next message entry. Keep in mind that if you are executing an event that will be followed by a span event (enter_span or leave_span), the content of the msgid will change before it is saved.

Type:

str

current_tcomment

Translator comment that will be saved in the next message.

Type:

str

disable

Indicates if the extractor is currently disabled. This happens after a <!-- mdpo-disable --> command is found and before any subsequent <!-- mdpo-enable --> command.

Type:

bool

disable_next_block

Indicates if the next block will be excluded from extraction.

Type:

bool

disable_next_codeblock

Indicates if the next codeblock will be skipped (not extracted) when include_codeblocks is enabled.

Type:

bool

disabled_entries

Entries that were not extracted because the extractor was disabled while processing them.

Type:

list

enable_next_block

Indicates if the next block will be extracted when the extractor is disabled (disable is True).

Type:

bool
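Taken together, the descriptions of disable, disable_next_block and enable_next_block suggest a simple rule for deciding whether a block is extracted. The sketch below encodes that reading as a hypothetical block_is_extracted function. It is an interpretation of the attribute descriptions on this page, not code taken from mdpo.

```python
def block_is_extracted(
    disable: bool,
    disable_next_block: bool,
    enable_next_block: bool,
) -> bool:
    """Hypothetical rule inferred from the attribute descriptions above."""
    if enable_next_block:
        # Forces extraction of the next block even while the extractor is disabled
        return True
    # Otherwise the extractor must be enabled and the block not singled out
    return not disable and not disable_next_block


print(block_is_extracted(False, False, False))  # True: normal extraction
print(block_is_extracted(True, False, False))   # False: after <!-- mdpo-disable -->
print(block_is_extracted(True, False, True))    # True: enabled for one block
print(block_is_extracted(False, True, False))   # False: disabled for one block
```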

enter_block(block, details)
enter_span(span, details)
events

Custom events executed during parsing while extracting content.

Type:

dict

extensions

MD4C extensions used to parse the content. See all available extensions in the mdpo.md4c module.

Type:

list(str)

extract(po_filepath=None, save=False, mo_filepath=None, po_encoding=None, md_encoding='utf-8', wrapwidth=78)
filepaths
found_entries

Extracted entries.

Type:

list

ignore_msgids

Msgids to ignore during extraction.

include_codeblocks

Indicates if code blocks will be extracted.

Type:

bool

include_next_codeblock

Indicates if the next codeblock will be extracted when include_codeblocks is disabled.

Type:

bool
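Logically, disable_next_codeblock only matters when include_codeblocks is enabled, and include_next_codeblock only matters when it is disabled. This matches include_codeblocks being described as equivalent to a <!-- mdpo-include-codeblock --> command before every code block. The sketch below encodes that reading as a hypothetical codeblock_is_extracted function. It is an interpretation, not mdpo's code.

```python
def codeblock_is_extracted(
    include_codeblocks: bool,
    include_next_codeblock: bool,
    disable_next_codeblock: bool,
) -> bool:
    """Hypothetical rule inferred from the attribute descriptions above."""
    if include_codeblocks:
        # Every code block is extracted unless the next one was disabled
        return not disable_next_codeblock
    # Code blocks are skipped unless the next one was explicitly included
    return include_next_codeblock


print(codeblock_is_extracted(False, False, False))  # False: default, code blocks skipped
print(codeblock_is_extracted(False, True, False))   # True: one code block included
print(codeblock_is_extracted(True, False, False))   # True: all code blocks included
print(codeblock_is_extracted(True, False, True))    # False: one code block skipped
```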

italic_end_string
italic_end_string_escaped
italic_start_string
italic_start_string_escaped
latexmath_end_string
latexmath_start_string
latexmathdisplay_end_string
latexmathdisplay_start_string
leave_block(block, details)
leave_span(span, details)
location
mark_not_found_as_obsolete
metadata
msgstr

Default msgstr used if the current msgid is not found in the previous content of the specified PO file.

Type:

str

not_plaintext_enter_span(span, details)
not_plaintext_leave_span(span, details)
plaintext
po_filepath

PO file path to which the content will be extracted.

Type:

str

pofile

PO file object representing the extracted content.

Type:

polib.POFile

preserve_not_found
strikethrough_end_string
strikethrough_start_string
text(block, text)
underline_end_string
underline_start_string
mdpo.md2po.markdown_to_pofile(files_or_content, ignore=frozenset({}), msgstr='', po_filepath=None, save=False, mo_filepath=None, plaintext=False, wrapwidth=78, mark_not_found_as_obsolete=True, preserve_not_found=True, location=True, extensions=['collapse_whitespace', 'tables', 'strikethrough', 'tasklists', 'latex_math_spans', 'wikilinks'], po_encoding=None, md_encoding='utf-8', xheader=False, include_codeblocks=False, ignore_msgids=frozenset({}), command_aliases=None, metadata=None, events=None, debug=False, **kwargs)

Extract all the msgids from Markdown content or files.

Parameters:
  • files_or_content (str, list) – Glob path to Markdown files, a list of files or a string with Markdown content.

  • ignore (list) – Paths of files to ignore. Useful when a glob does not precisely select the files from which to extract content. A filename or a directory name can also be given without indicating the full path.

  • msgstr (str) – Default message string for extracted msgids.

  • po_filepath (str) – File that will be loaded as a polib.POFile instance into which the new msgids are dumped. It is also used as the source for detecting strings that are no longer found, which will be marked as obsolete if applicable (see the save and mark_not_found_as_obsolete optional parameters).

  • save (bool) – Save the new content to the PO file indicated by the po_filepath parameter. If enabled and po_filepath is None, a ValueError will be raised.

  • mo_filepath (str) – The resulting PO file will be compiled to an MO file and saved at the path specified by this parameter.

  • plaintext (bool) – If True, the content will be extracted as plain text, without markup characters. If False (the default), extracted msgids will contain some markup characters used to indicate the location of `inline code`, **bold text**, *italic text* and `[links]`, which might be useful to you. Whether to activate this mode (plaintext=False) depends on how you intend to use this library.

  • wrapwidth (int) – Wrap width for the PO file indicated by the po_filepath parameter. If negative, 0, ‘inf’ or ‘math.inf’, the content won’t be wrapped.

  • mark_not_found_as_obsolete (bool) – Entries of the provided PO file that are not found among the strings extracted from the Markdown content will be marked as obsolete.

  • preserve_not_found (bool) – Entries of the provided PO file that are not found among the strings extracted from the Markdown content won’t be removed. Only has effect if mark_not_found_as_obsolete is False.

  • location (bool) – Store references to the top-level blocks in which the messages are found, as #: reference comments in the PO file.

  • extensions (list) – md4c extensions used to parse Markdown content, formatted as a list of ‘pymd4c’ keyword arguments. You can see all available extensions in the pymd4c documentation.

  • po_encoding (str) – Resulting PO file encoding.

  • md_encoding (str) – Markdown content encoding.

  • xheader (bool) – Indicates if the resulting PO file will have the mdpo x-header included.

  • include_codeblocks (bool) – Include in the resulting PO file all code blocks found in the Markdown content. This is useful if you want to translate all your code blocks. Equivalent to adding a <!-- mdpo-include-codeblock --> command before each code block.

  • ignore_msgids (list) – List of msgids to exclude from extraction.

  • command_aliases (dict) – Mapping of aliases for using custom mdpo command names in comments. The mdpo- prefix is optional when resolving command names. For example, if you want to use <!-- mdpo-on --> instead of <!-- mdpo-enable -->, you can pass either {"mdpo-on": "mdpo-enable"} or {"mdpo-on": "enable"} to this parameter.

  • metadata (dict) – Metadata to include in the produced PO file. If the file already contains metadata fields, they will be updated while preserving the values of those already defined.

  • events (dict) –

    Preprocessing events executed during the parsing process that can be used to customize the extraction process. Takes functions or lists of functions as values. If one of these functions returns False, that part of the parsing is skipped by md2po. The available events are:

    • enter_block(self, block, details): Executed when the parsing of a Markdown block starts.

    • leave_block(self, block, details): Executed when the parsing of a Markdown block ends.

    • enter_span(self, span, details): Executed when the parsing of a Markdown span starts.

    • leave_span(self, span, details): Executed when the parsing of a Markdown span ends.

    • text(self, block, text): Executed when the parsing of text starts.

    • command(self, mdpo_command, comment, original_command): Executed when an mdpo HTML command is found.

    • msgid(self, msgid, msgstr, msgctxt, tcomment, flags): Executed when a msgid is going to be stored.

    • link_reference(self, target, href, title): Executed when a link reference is going to be stored.

    You can also specify these functions as strings with the syntax path/to/file.py::function_name.

    In all events, self is the Md2Po parser instance. You can take advanced control of the parsing process by manipulating the state of the parser. For example, to prevent a certain msgid from being included, you can do:

    def msgid_event(self, msgid, *args):
        if msgid == 'foo':
            self.disable_next_block = True
    

  • debug (bool) – Add events displaying all parsed elements in the extraction process.

  • **kwargs – Extra arguments passed to mdpo.md2po.Md2Po constructor.
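The events parameter accepts strings of the form path/to/file.py::function_name in place of functions. The following is a minimal sketch of how such a string could be resolved with the standard library's importlib.util. The load_event_function helper is hypothetical and not part of mdpo's API. Splitting on the last "::" is an assumption about how the syntax is parsed.

```python
import importlib.util
import os
import tempfile
from pathlib import Path
from typing import Callable


def load_event_function(location: str) -> Callable:
    """Hypothetical loader for strings like 'path/to/file.py::function_name'."""
    filepath, sep, funcname = location.rpartition('::')
    if not sep or not filepath or not funcname:
        raise ValueError(f'expected "path/to/file.py::function_name", got {location!r}')
    spec = importlib.util.spec_from_file_location(Path(filepath).stem, filepath)
    if spec is None or spec.loader is None:
        raise ImportError(f'cannot load a module from {filepath!r}')
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return getattr(module, funcname)


# Usage: write a small events file and load its msgid event by location string
with tempfile.TemporaryDirectory() as tmpdir:
    events_file = os.path.join(tmpdir, 'events.py')
    with open(events_file, 'w', encoding='utf-8') as f:
        f.write(
            "def msgid_event(self, msgid, *args):\n"
            "    if msgid == 'foo':\n"
            "        return False\n"
        )
    msgid_event = load_event_function(f'{events_file}::msgid_event')
    print(msgid_event(None, 'foo'))  # False
    print(msgid_event(None, 'bar'))  # None
```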

Examples

>>> content = 'Some text with `inline code`'
>>> entries = markdown_to_pofile(content, plaintext=True)
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with inline code': ''}
>>> entries = markdown_to_pofile(content)
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with `inline code`': ''}
>>> entries = markdown_to_pofile(content, msgstr='Default message')
>>> {e.msgid: e.msgstr for e in entries}
{'Some text with `inline code`': 'Default message'}

Returns:

polib.POFile: PO file instance with the new msgids included.
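The command_aliases parameter accepts alias targets with or without the mdpo- prefix. The following is a minimal sketch of how a command name found in a comment could be resolved against such a mapping. The resolve_command and with_prefix helpers are hypothetical, not part of mdpo. The sketch also assumes that alias keys are matched exactly as written. It only reproduces the two equivalent mappings given in the parameter description.

```python
from typing import Dict, Optional

PREFIX = 'mdpo-'


def with_prefix(name: str) -> str:
    """Return the command name with the 'mdpo-' prefix, adding it if missing."""
    return name if name.startswith(PREFIX) else PREFIX + name


def resolve_command(command: str, aliases: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical resolution of a command found in an HTML comment.

    Assumption: alias keys are compared exactly as written, while alias
    targets may omit the 'mdpo-' prefix.
    """
    aliases = aliases or {}
    if command in aliases:
        return with_prefix(aliases[command])
    return command


print(resolve_command('mdpo-on', {'mdpo-on': 'mdpo-enable'}))  # mdpo-enable
print(resolve_command('mdpo-on', {'mdpo-on': 'enable'}))       # mdpo-enable
print(resolve_command('mdpo-disable', {'mdpo-on': 'enable'}))  # mdpo-disable
```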

Submodules