An open Markdown workflow

The “md workflow package” is a compilation of the technical tools and fixes we have developed to handle the entire production process of a scientific journal, from an accepted manuscript to a beautifully typeset and professionally designed end product in PDF, HTML and XML. The workflow package relies only on Open Source Software, which anyone can use, develop further and adapt for free. We believe that by making copy-editing both simpler and more cost-efficient, we will remove a central obstacle in the way of an Open Access future for all academic publications.

This workflow is used by Dialectica, and most of its development has come from solving the challenges that arose while implementing it for the journal's publishing needs.


Markdown and Pandoc

Our workflow includes different tools coordinated in a specific way.

Our articles are all written in Markdown, a simple, plain-text markup language designed to be comfortable for humans to read and write, which makes it an excellent format for writing documents. LaTeX and HTML are examples of other markup languages. Because a Markdown file is a source text rather than a finished layout, we need to render the Markdown documents into PDF or HTML to obtain a final product.

At the core, we use pandoc, a program written by the philosophy professor John MacFarlane, which converts Markdown to PDF and HTML. We call this process "compilation" (from Markdown to something else). Pandoc is an Open Source, very efficient and popular tool for this conversion, and it extends Markdown syntax to allow for more flexibility in the final PDFs. In particular, pandoc allows inserting LaTeX environments and commands, for example to render logical and mathematical formulas and proofs.
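As a rough illustration of this compilation step, the sketch below builds the pandoc command line for a PDF and an HTML target. It is not the journal's actual script: the file name article.md, the helper names and the choice of xelatex as PDF engine are assumptions made for the example. Only standard pandoc options are used.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, target: str, pdf_engine: str = "xelatex") -> list[str]:
    """Build a pandoc command that converts a Markdown file to PDF or HTML."""
    command = ["pandoc", source, "--from", "markdown", "--standalone", "--output", target]
    # pandoc infers the output format from the target's extension.
    # PDF output goes through LaTeX, so the engine is named explicitly
    # (xelatex is an assumption, not necessarily the journal's choice).
    if Path(target).suffix.lower() == ".pdf":
        command.append(f"--pdf-engine={pdf_engine}")
    return command


def compile_article(source: str, target: str) -> None:
    """Run pandoc, failing loudly if it is missing or reports an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # article.md is a hypothetical file name; this only prints the commands.
    print(" ".join(pandoc_command("article.md", "article.pdf")))
    print(" ".join(pandoc_command("article.md", "article.html")))
```

Calling compile_article("article.md", "article.pdf") would run the conversion, provided pandoc and a LaTeX distribution are installed.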


In-house Lua filters for Pandoc

On top of pandoc's Markdown, our workflow includes pandoc Lua filters, which we write ourselves in the Lua programming language, to further extend pandoc's rendering capabilities and syntax to better match our publishing needs. Julien Dutant is the main developer and maintainer of our Lua filters. We have produced many of these over the years; they are all free to use, and their source code is freely available on GitHub. As of 26.06.2024, our collection of Lua filters for pandoc includes the following:

  • columns: for multiple columns support

  • statement: for statements and theorems support

  • first-line-indent: for first-line indentation (LaTeX and HTML output) (Quarto / Pandoc)

  • recursive-citeproc: to handle self-citing bibliographies (Quarto / Pandoc)

  • labelled-lists: to handle custom labelled lists in LaTeX and HTML output

  • bib-place: for template control of the placement of a document's bibliography when using Pandoc citeproc

  • not-in-format: to keep part of a document out of selected output formats. Included in pandoc/lua-filters

  • secnumdepth: to enable the secnumdepth variable in formats other than LaTeX

  • prefix-ids: to add a prefix to all identifiers within a Pandoc document

  • longtable-to-xtab: to convert LaTeX longtable environments into xtab environments in Pandoc's LaTeX output (to be used within columns with the columns filter)

  • functions: to provide some utility functions to be reused across filters

An overview of this collection can be found on GitHub. A series of simple shell scripts (small programs for the terminal) coordinate the use of pandoc together with the filters in the different ways we need.
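To give an idea of what such coordination involves, here is a minimal sketch, written in Python rather than shell for illustration, that turns a list of filter names into pandoc's repeated --lua-filter options. The directory name filters, the one-file-per-filter layout (<name>.lua) and the helper names are assumptions, not a description of our actual scripts. Pandoc applies filters in the order they appear on the command line, so the order of the list matters.

```python
def filter_arguments(filter_dir: str, filter_names: list[str]) -> list[str]:
    """Turn filter names into repeated --lua-filter options, preserving order."""
    arguments: list[str] = []
    for name in filter_names:
        # Assumed layout: one <name>.lua file per filter inside filter_dir.
        arguments += ["--lua-filter", f"{filter_dir}/{name}.lua"]
    return arguments


def build_command(source: str, target: str, filter_dir: str,
                  filter_names: list[str]) -> list[str]:
    """Build a pandoc command that applies the given Lua filters in order."""
    return ["pandoc", source, "--output", target] + filter_arguments(filter_dir, filter_names)


if __name__ == "__main__":
    # Hypothetical selection and ordering of filters from the list above.
    chosen = ["columns", "statement", "first-line-indent"]
    print(" ".join(build_command("article.md", "article.html", "filters", chosen)))
```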


Compilation environment

Good software always comes in versions, and Pandoc, LaTeX, and Lua are no exception. A challenge when coordinating any team, however, is to ensure that the versions of the different pieces of software match (or match closely enough) so that the outputs produced are the same. A (software) "environment" can then be understood as a set of software tools, each at a specific version.

The versions may depend on the date on which you obtained a piece of software and on the Operating System you use (Windows, Mac, or Linux). If the versions of each team member's software differ too much, the output of the workflow (a PDF or an HTML file) may differ from one team member to another. They may also run into different compilation problems: one person may hit a compilation issue while a colleague compiles the same PDF without trouble.
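The mismatch problem can be made concrete with a small sketch that reads the version number from the first line of a tool's --version output and compares two installations. The function names, the sample strings and the idea of comparing only the first two version components are illustrative assumptions, not a rule we apply.

```python
import re


def parse_version(version_output: str) -> tuple[int, ...]:
    """Extract a dotted version number from the first line of `--version` output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.search(r"\d+(?:\.\d+)+", lines[0])
    if match is None:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    return tuple(int(part) for part in match.group().split("."))


def same_release(first: tuple[int, ...], second: tuple[int, ...], depth: int = 2) -> bool:
    """Treat two versions as matching if their first `depth` components agree."""
    return first[:depth] == second[:depth]


if __name__ == "__main__":
    # Illustrative strings standing in for two team members' pandoc installs.
    mine = parse_version("pandoc 3.1.11")
    theirs = parse_version("pandoc 2.19.2")
    print(mine, theirs, same_release(mine, theirs))
```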

To solve this problem, we integrated Docker into the workflow. Docker is a very popular software development tool used to deploy applications in standardized "containers". We thus defined a standardized compilation environment for Dialectica, inside which the versions of the software used (LaTeX, Pandoc, Lua) remain frozen and identical for everyone using the environment. This environment can be replicated on any machine with Docker installed, and thanks to integrations with modern text editors (such as those available in our text editor of choice, VSCode), we can edit our Markdown files and easily use our standardized environment to produce the same PDF and HTML files anywhere.

While we maintain a Docker image on Docker Hub to quickly replicate the compilation environment anywhere, unfortunately we cannot make it public, as it contains copyrighted fonts. The image is built following very simple rules, however: it is an Ubuntu image with LaTeX, pandoc, and our fonts installed.
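Running a compilation inside such a container might look roughly like the sketch below, which wraps a pandoc call in a docker run command. The image name example/journal-env:1.0, the /data mount point and the helper name are placeholders invented for the example, and the sketch assumes the image does not already set pandoc as its entrypoint.

```python
def docker_command(image: str, project_dir: str, pandoc_args: list[str]) -> list[str]:
    """Wrap a pandoc invocation so that it runs inside a container."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "--volume", f"{project_dir}:/data",   # expose the article folder to the container
        "--workdir", "/data",                 # run pandoc from inside that folder
        image,
        "pandoc", *pandoc_args,
    ]


if __name__ == "__main__":
    # Placeholder image name, project path and file names.
    command = docker_command(
        "example/journal-env:1.0",
        "/home/me/article",
        ["article.md", "--output", "article.pdf"],
    )
    print(" ".join(command))
```

Because the image tag is pinned, everyone who runs the same command gets the same versions of the tools, whatever is installed on the host machine.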


The Open Manual of Markdown Style

We have also produced an 'Open Manual of Markdown Style,' which states Dialectica's instructions for authors concerning matters of style and typography. It also includes technical details on how to set up the different tools needed for our Markdown workflow.

We strive to balance authors' freedom of expression with a desire for typographical uniformity, and we hope that this document will prove helpful to other journals in the humanities. The manual is hosted on GitHub Pages and is freely accessible. It is written in Quarto Markdown, and its source code is open and accessible on GitHub.