Pieter Belmans—Guest post: Inheritance in Pygments' LatexFormatter

This is a guest post by Pieter Goetschalckx. He is a classmate of mine, and we have worked together on most of our projects this year. That collaboration produced some LaTeX-related results such as UAStyle, as well as (unpublished) software for automated Enigma deciphering and for numerical continuation of n-dimensional manifolds.

Pygments, combined with the minted package, is a very powerful syntax highlighter for LaTeX, with support for colored, bold and italic text. Unfortunately, the Computer Modern Typewriter font has no bold shapes. Latin Modern does, but the difference is barely noticeable. I therefore prefer Beramono, which also combines nicely with Bitstream Charter. So much for my personal taste.
While using this font setup to highlight Cython code, I discovered a bug in Pygments' LatexFormatter. Highlighting the following (rather useless) Cython function with the default Pygments style gives:
cpdef int add(int a, int b):
    return a + b
Actually, this fragment was highlighted with the HtmlFormatter. The LatexFormatter, on the other hand, outputs something like the following (converted to HTML here, obviously), in which the int keywords also come out bold:
cpdef int add(int a, int b):
    return a + b
The actual LaTeX code is
\documentclass{minimal}
\usepackage{fancyvrb}
\usepackage{color}
\usepackage[T1]{fontenc}
\usepackage[scaled]{beramono}

\makeatletter
\def\PY@reset{\let\PY@it=\relax \let\PY@bf=\relax%
    \let\PY@ul=\relax \let\PY@tc=\relax%
    \let\PY@bc=\relax \let\PY@ff=\relax}
\def\PY@tok#1{\csname PY@tok@#1\endcsname}
\def\PY@toks#1+{\ifx\relax#1\empty\else%
    \PY@tok{#1}\expandafter\PY@toks\fi}
\def\PY@do#1{\PY@bc{\PY@tc{\PY@ul{%
    \PY@it{\PY@bf{\PY@ff{#1}}}}}}}
\def\PY#1#2{\PY@reset\PY@toks#1+\relax+\PY@do{#2}}

\def\PY@tok@nf{\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\def\PY@tok@o{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\def\PY@tok@kt{\def\PY@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
\def\PY@tok@k{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\makeatother

\begin{document}

\begin{Verbatim}[commandchars=\\\{\}]
\PY{k}{cpdef} \PY{k+kt}{int} \PY{n+nf}{add}\PY{p}{(}\PY{k+kt}{int} \PY{n}{a}\PY{p}{,} \PY{k+kt}{int} \PY{n}{b}\PY{p}{)}\PY{p}{:}
    \PY{k}{return} \PY{n}{a} \PY{o}{+} \PY{n}{b}
\end{Verbatim}
\end{document}
resulting in this PDF output. Since the two outputs differ in how they handle the type keywords, the question arises: which formatter is correct?
Let's take a look at the style definition in pygments/styles/default.py:
Keyword:      "bold #008000",
Keyword.Type: "nobold #B00040",
Because int is a Keyword.Type, whose style says nobold, it should not be bold. Simplified, the macro \PY{k+kt}{int} executes \PY@tok@k and then \PY@tok@kt before printing \PY@bf{int}. \PY@tok@k sets \PY@bf to \textbf, and \PY@tok@kt leaves \PY@bf untouched, so the result is \textbf{int}.
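To make that expansion order concrete, here is a small Python model of the macros. It is only a sketch, assuming each \PY@tok@<name> macro can be treated as a set of assignments to \PY@bf and \PY@tc (the other hooks are ignored). TOKEN_MACROS and expand_py are hypothetical names; the colour values are copied from the preamble above.

```python
# Hypothetical model of the \PY macros in the preamble above (not Pygments code).
# Each \PY@tok@<name> macro is reduced to the settings it assigns; a macro that
# does not mention a setting leaves it as it was.
TOKEN_MACROS = {
    "k": {"bf": True, "tc": "0.00,0.50,0.00"},   # \PY@tok@k: \textbf and green
    "kt": {"tc": "0.69,0.00,0.25"},              # \PY@tok@kt: colour only
    "nf": {"tc": "0.00,0.00,1.00"},              # \PY@tok@nf
    "o": {"tc": "0.40,0.40,0.40"},               # \PY@tok@o
}


def expand_py(names):
    """Model \\PY{names}{...}: reset, then run each \\PY@tok@<name> in order."""
    # \PY@reset: every hook is back to \relax (not bold, no colour).
    state = {"bf": False, "tc": None}
    # \PY@toks walks the +-separated list from left to right.
    for name in names.split("+"):
        # An undefined \PY@tok@<name> expands to \relax and changes nothing.
        state.update(TOKEN_MACROS.get(name, {}))
    return state


print(expand_py("k+kt"))  # {'bf': True, 'tc': '0.69,0.00,0.25'}: bold leaks in
print(expand_py("kt"))    # {'bf': False, 'tc': '0.69,0.00,0.25'}
```

In this model, dropping k from the list is exactly what removes the unwanted bold.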
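It looks like the LatexFormatter tries to implement inheritance and fails. The funny thing is that it doesn't need to implement inheritance at all, because another part of Pygments already takes care of it: the style itself resolves the settings of each token type, including whatever it inherits from its parent (this is where nobold overrides the bold of Keyword). So we can simply replace \PY{k+kt}{int} by \PY{kt}{int}. This can be done in pygments/formatters/latex.py by replacing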
styles = []
while ttype is not Token:
    try:
        styles.append(t2n[ttype])
    except KeyError:
        # not in current style
        styles.append(_get_ttype_name(ttype))
    ttype = ttype.parent
styleval = '+'.join(reversed(styles))
with
try:
    styleval = t2n[ttype]
except KeyError:
    # not in current style
    styleval = _get_ttype_name(ttype)
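Why a single name is enough can be sketched as follows. This is a simplified, hypothetical model, not Pygments' actual style code: STYLE, SHORT_NAMES, resolve, old_styleval and new_styleval are illustrative names, token types are plain tuples (the empty tuple standing in for Token), and only bold and colour are handled.

```python
# Simplified, hypothetical model of how a Pygments-like style resolves
# inheritance. Token types are tuples; the empty tuple stands in for Token.
STYLE = {
    ("Keyword",): "bold #008000",
    ("Keyword", "Type"): "nobold #B00040",
}
SHORT_NAMES = {("Keyword",): "k", ("Keyword", "Type"): "kt"}


def resolve(ttype):
    """Effective settings of ttype: parents first, then the child's overrides."""
    settings = {"bold": False, "color": None}
    for depth in range(1, len(ttype) + 1):
        for word in STYLE.get(ttype[:depth], "").split():
            if word == "bold":
                settings["bold"] = True
            elif word == "nobold":
                settings["bold"] = False
            elif word.startswith("#"):
                settings["color"] = word
    return settings


def old_styleval(ttype):
    """Original formatter logic: join the names of ttype and all its parents."""
    names = []
    while ttype:                 # like `while ttype is not Token`
        names.append(SHORT_NAMES[ttype])
        ttype = ttype[:-1]       # like `ttype = ttype.parent`
    return "+".join(reversed(names))


def new_styleval(ttype):
    """Patched formatter logic: only the token type's own name."""
    return SHORT_NAMES[ttype]


kt = ("Keyword", "Type")
print(resolve(kt))        # {'bold': False, 'color': '#B00040'}
print(old_styleval(kt))   # k+kt
print(new_styleval(kt))   # kt
```

Since the resolved entry for Keyword.Type already contains everything it inherits, emitting only kt loses nothing, while chaining k in front of it re-applies settings that Keyword.Type explicitly switched off.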
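Additionally, some macro definitions in the preamble can be simplified. Since every token now carries a single style name, the \PY@toks macro, which loops over a +-separated list of names, is no longer needed, and \PY can call \PY@tok directly. This does not change anything visually; it only removes a feature that is now unused. This can be done by replacing the beginning of STYLE_TEMPLATE in the same file (the rest of the template stays unchanged)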
STYLE_TEMPLATE = r'''
\makeatletter
\def\%(cp)s@reset{\let\%(cp)s@it=\relax \let\%(cp)s@bf=\relax%%
    \let\%(cp)s@ul=\relax \let\%(cp)s@tc=\relax%%
    \let\%(cp)s@bc=\relax \let\%(cp)s@ff=\relax}
\def\%(cp)s@tok#1{\csname %(cp)s@tok@#1\endcsname}
\def\%(cp)s@toks#1+{\ifx\relax#1\empty\else%%
    \%(cp)s@tok{#1}\expandafter\%(cp)s@toks\fi}
\def\%(cp)s@do#1{\%(cp)s@bc{\%(cp)s@tc{\%(cp)s@ul{%%
    \%(cp)s@it{\%(cp)s@bf{\%(cp)s@ff{#1}}}}}}}
\def\%(cp)s#1#2{\%(cp)s@reset\%(cp)s@toks#1+\relax+\%(cp)s@do{#2}}
with
STYLE_TEMPLATE = r'''
\makeatletter
\def\%(cp)s@reset{\let\%(cp)s@it=\relax \let\%(cp)s@bf=\relax%%
    \let\%(cp)s@ul=\relax \let\%(cp)s@tc=\relax%%
    \let\%(cp)s@bc=\relax \let\%(cp)s@ff=\relax}
\def\%(cp)s@tok#1{\csname %(cp)s@tok@#1\endcsname}
\def\%(cp)s@do#1{\%(cp)s@bc{\%(cp)s@tc{\%(cp)s@ul{%%
    \%(cp)s@it{\%(cp)s@bf{\%(cp)s@ff{#1}}}}}}}
\def\%(cp)s#1#2{\%(cp)s@reset\%(cp)s@tok{#1}\%(cp)s@do{#2}}
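STYLE_TEMPLATE is an ordinary Python %-format string, which is why it writes %(cp)s for the command prefix (the formatter's commandprefix option, PY by default) and %% for a literal %. As a quick check, the sketch below fills in a copy of the simplified macros, leaving out \makeatletter and the rest of the template; TEMPLATE and expanded are illustrative names, and PYG is just an example of another prefix.

```python
# Copy of the simplified macro definitions from STYLE_TEMPLATE above.
TEMPLATE = r'''
\def\%(cp)s@reset{\let\%(cp)s@it=\relax \let\%(cp)s@bf=\relax%%
    \let\%(cp)s@ul=\relax \let\%(cp)s@tc=\relax%%
    \let\%(cp)s@bc=\relax \let\%(cp)s@ff=\relax}
\def\%(cp)s@tok#1{\csname %(cp)s@tok@#1\endcsname}
\def\%(cp)s@do#1{\%(cp)s@bc{\%(cp)s@tc{\%(cp)s@ul{%%
    \%(cp)s@it{\%(cp)s@bf{\%(cp)s@ff{#1}}}}}}}
\def\%(cp)s#1#2{\%(cp)s@reset\%(cp)s@tok{#1}\%(cp)s@do{#2}}
'''

# %(cp)s becomes the command prefix and %% becomes a single %.
expanded = TEMPLATE % {"cp": "PY"}
print(expanded)

# The same template with another prefix gives an independent set of macros.
print(TEMPLATE % {"cp": "PYG"})
```

The expanded \PY macro no longer mentions \PY@toks: it hands its first argument straight to \PY@tok, so the formatter must emit exactly one style name per token, which is what the styleval change guarantees.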
With these changes the LaTeX backend is correct, although the CythonLexer still has some issues of its own. For most purposes the formatter fix suffices (unless other lexers contain bugs as well), but for completeness' sake I also provide my version of the CythonLexer. It can be found at 314eter / pygments-CythonLexer.
The result: after all the described modifications, we should arrive at this highlighted piece of code.
As a matter of fact, the Pygments developers know about this issue with the LatexFormatter (through this bug report), but they have put it on hold. If you run into it, this workaround could be the solution.

And this, kids, is why you should never dig into TeX: it's addictive :).