Generate Markdown from org-mode file using ox-pandoc

Finally made it possible: I can now write posts for Jekyll in org-mode.

Background

This blog site is generated by Jekyll and rendered beautifully using the Chirpy theme. I’m pretty happy with it, but I sometimes feel rather hesitant to add more posts. I did not have this feeling years ago, when I only used markdown to write posts.

It comes from the fear of repeating myself: I write all my private notes in org-mode. This format is powered by Emacs and has rich features beyond those of markdown. On the other hand, while Jekyll can recognize org-mode with the help of jekyll-org, markdown is still its favorite format. Before finding a good exporter, I had to copy the file, manually convert it from org-mode to markdown, and repeat this whenever I updated the original note. This is tedious work and does not add to my knowledge.

This fear, if not the underlying laziness[^1], has been holding me back from publishing more posts here. Now I finally cannot stand it any more and have decided to face it directly.

In the market

I am definitely not the only one with this problem. Searching the Internet gives me a few available approaches:

  1. ox-md: the built-in markdown exporter. In a quick try, I found that it fails to export the #+title property correctly. It does not write fenced (```) code blocks either, which makes the syntax highlighter in Jekyll useless.
  2. ox-gfm: an exporter for GitHub Flavored Markdown, derived from ox-md. It correctly exports fenced code blocks, but still does not export the title.
  3. ox-html: also a built-in exporter, but directly to HTML. This is a fundamental way to publish org-mode content online, and has been used to connect org-mode and Jekyll as described in this official tutorial. However, it seems rather cumbersome to take advantage of the Chirpy theme when posts are written directly in HTML.
  4. ox-hugo: a markdown exporter oriented to Hugo, which assumes the Hugo file hierarchy. For example, markdown posts have to be exported under the content directory.
  5. ox-pandoc: a wrapper around the wonderful universal converter pandoc. It correctly handles the title and code blocks when exporting markdown. It also supports reading the metadata of the org file, such as #+title and #+date.

ox-pandoc turns out to suit my needs best.

Installation

To use ox-pandoc, pandoc must of course be discoverable on the executable path, and ox-pandoc must be included in the Emacs configuration. The latter is almost a one-liner in Doom Emacs: simply switch on the +pandoc flag of the org module in init.el:

(doom! :lang
       (org +pandoc))

Decent defaults are already configured, so it is not necessary to add more code to config.el. If a bibliography is needed, the citeproc-el package should be installed and loaded.

;; in packages.el
(package! citeproc)

;; in config.el
(use-package! citeproc)

Writing and configuration

Pandoc options

pandoc command line options can be customized by #+PANDOC_OPTIONS in the org-mode header.

For example, the heading level of markdown posts in Jekyll starts from 2, but it usually starts from 1 in org-mode. Therefore, headings should be shifted down by one level when exporting to markdown for Jekyll. The built-in markdown exporter has a variable for exactly this purpose, org-md-toplevel-hlevel (org-html-toplevel-hlevel in the HTML exporter), but ox-pandoc does not honor it. Instead, we need to pass the --shift-heading-level-by command line option to pandoc:

#+PANDOC_OPTIONS: shift-heading-level-by:1

Pandoc metadata

Metadata, such as category and tags, have to be passed to pandoc with the #+PANDOC_METADATA option. Different entries can be specified in multiple #+PANDOC_METADATA lines.

#+PANDOC_METADATA: categories:tool
#+PANDOC_METADATA: "tags:Emacs org-mode pandoc ox-pandoc"

This results in the YAML header

---
categories: tool
tags: Emacs org-mode pandoc ox-pandoc
---

This is already handy, though it would be nice if ox-pandoc could recognize #+filetags.

Note that if tags containing spaces are needed, it is necessary to pass an empty tag in between. For example,

#+PANDOC_METADATA: "tags:First tag" tags: "tags:Second tag"

will generate

---
tags:
- First tag
-
- Second tag
---

The empty tag is fine, as it is ignored by Jekyll.

Liquid template

A simple Liquid filter can be used by wrapping it in a raw markdown snippet, @@markdown:...@@. For example, @@markdown:site.time | date_to_xmlschema@@ (double curly braces omitted here, as they would otherwise be rendered by Liquid) gives: 2025-04-04T12:01:41+02:00.

Internal links

Direct internal links such as [[Internal links]] work in org-mode, but are exported as something like [4.4](#Internal links) by ox-pandoc at the time of writing. A workaround is to use an explicit anchor, [[#internal-links][Internal links]], which is exported as a working link to the Internal links section. To make it also work within org-mode, a custom ID is required:

** Internal links
:PROPERTIES:
:CUSTOM_ID: internal-links
:END:

The custom ID is not exported by the gfm writer. Therefore, if the custom ID differs from the ID generated by Jekyll, the link will fail. A partial solution is to use the general markdown writer. Nevertheless, the markdown writer has its own issues in this case, which I will address later.

Equations

Typing equations is simple. In org-mode it is not required to add surrounding $$ for a display math equation.

\begin{equation}\label{eq:euler}
\begin{aligned}
e^{i \pi} + 1 = 0
\end{aligned}
\end{equation}

which is rendered as

$$ e^{i \pi} + 1 = 0 $$

It is possible in org-mode to cross-reference the equation by eqref:eq:euler using org-ref. Unfortunately, this link is neither understood by markdown nor converted to the correct \eqref{eq:euler} syntax by ox-pandoc.

The conversion can be done by using a pandoc filter (thanks to ChatGPT)

-- Pandoc filter to replace strings starting with 'eqref:' with \eqref{...} strings
-- In JSON AST, from
--   {"t":"Str","c":"eqref:eq:euler"}
-- to
--   {"t":"Str","c":"\\eqref{eq:euler}"}
-- Trailing punctuation is kept as a separate Str node.
function Str(el)
  local prefix = "eqref:"
  local content = el.text
  if content:sub(1, #prefix) == prefix then
    local eqref_content = content:sub(#prefix + 1)
    -- Check for trailing punctuation
    -- (.-) captures as few characters as possible (non-greedy)
    -- ([%p%s]*) captures zero or more punctuation or whitespace characters
    local label, punctuation = eqref_content:match("^(.-)([%p%s]*)$")
    local eqref = pandoc.Str('\\eqref{' .. label .. '}')

    if punctuation ~= "" then
      -- Return the target node followed by a Str node with the punctuation
      return {eqref, pandoc.Str(punctuation)}
    else
      -- Return only the target node if no relevant punctuation is found
      return eqref
    end
  end

  return el
end

Save the filter somewhere and add it to the pandoc options:

#+pandoc_options: lua-filter:convert_org_ref_eqref.lua

Then the org-ref equation link should work: \eqref{eq:euler}.
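The way the filter splits the label from trailing punctuation can be checked without running pandoc. The following is only a Python sketch that mirrors the Lua pattern; the function name convert_eqref is made up for illustration, and the character class approximates Lua's [%p%s] under the assumption that labels are plain ASCII.

```python
import re
import string

# ASCII punctuation plus whitespace, approximating Lua's [%p%s] class
# (assumption: labels are plain ASCII, as in eq:euler).
_TRAILING = "[" + re.escape(string.punctuation) + r"\s]*"
# Non-greedy label followed by the trailing punctuation, as in "^(.-)([%p%s]*)$".
_SPLIT = re.compile(r"(.*?)(" + _TRAILING + r")", re.DOTALL)


def convert_eqref(text):
    """Mirror of the Lua Str handler: return the list of replacement strings."""
    prefix = "eqref:"
    if not text.startswith(prefix):
        return [text]
    label, punctuation = _SPLIT.fullmatch(text[len(prefix):]).groups()
    eqref = "\\eqref{" + label + "}"
    # Keep the punctuation as a separate element, like the extra Str node.
    return [eqref, punctuation] if punctuation else [eqref]


if __name__ == "__main__":
    for word in ("eqref:eq:euler", "eqref:eq:euler.", "plain"):
        print(word, "->", convert_eqref(word))
```

The colon inside eq:euler is punctuation as well, but because the label part is matched non-greedily and the punctuation part must reach the end of the string, only trailing characters are split off.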

Citation and Bibliography

ox-pandoc is aware of org-cite-style citation links, for example [cite:@PerdewJ96PBE], and can render them into markdown links using citeproc-el before the document is passed to pandoc. One just needs to specify the bibliography file in which to look up the citation key with #+bibliography, and how the entries should be exported with #+cite_export; see this section of ox-pandoc.

In my case, I have a .bib file and a citation style file under the etc/ directory, so I just need to add the following two lines to the header:

#+bibliography: etc/bibliography.bib
#+cite_export: csl etc/american-physics-society-without-titles.csl

When exported to markdown, [cite:@PerdewJ96PBE] is rendered as \[[1](#citeproc_bib_item_1)\] and gives [1] on the Jekyll end. To print the bibliography, just insert the directive #+print_bibliography: at the place where you want the list of references. In this case, it is converted to markdown as

<span id="citeproc_bib_item_1"></span>\[1\] J. P. Perdew, K. Burke, and
M. Ernzerhof, Phys. Rev. Lett. **77**, 3865 (1996).

To make ox-pandoc also handle org-ref links, a workaround is to convert the org-ref-style cite:X links to org-cite [cite:@X]. This can be done using the following snippet:

(defun my/org-pandoc-convert-org-ref-link-to-org-cite (BACKEND &optional subtreep)
  "Hook function to convert org-ref cite link to org-cite cite link.

Currently it uses a naive implementation by `re-search-forward' for the conversion.
Caveats:
- only work with pandoc backend
- only handle cite and fullcite
- cannot handle notes"
  (if (not (equal BACKEND 'pandoc)) ()
    (goto-char (point-min))
    (while (re-search-forward
             "\\([=\~]\\)?\\[?\\[?\\(cite\\|fullcite\\):&?\\([^] @\t\r\n]+\\)\\]?\\]?\\([=\~]\\)?"
             nil t)
      ; do not convert those in a source code block or inline code
      (unless (or (org-in-src-block-p)
                  (string= (match-string 1) "=") (string= (match-string 1) "~")
                  (string= (match-string 4) "=") (string= (match-string 4) "~"))
        (let ((keys  ; handle multiple keys
                (replace-regexp-in-string "[,;]&?" ";@" (match-string 3))))
          (cl-case (intern (match-string 2))
                   (fullcite
                     (replace-match (format "[cite/bibentry/bare:@%s]" keys)))
                   (t
                     (replace-match (format "[\\2:@%s]" keys)))))))))

(add-to-list 'org-export-before-parsing-functions 'my/org-pandoc-convert-org-ref-link-to-org-cite)

Load and try to do the conversion:

  • cite:PerdewJ96PBE:  [1]
  • cite:&PerdewJ96PBE:  [1]
  • [cite:&PerdewJ96PBE]:  [1]
  • =cite:PerdewJ96PBE=: cite:PerdewJ96PBE
  • ~[cite:&PerdewJ96PBE]~: [cite:&PerdewJ96PBE]
  • fullcite:&PerdewJ96PBE: J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [DOI].
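The conversions listed above can also be reproduced outside Emacs, which is handy for trying new link forms quickly. The following Python sketch mirrors the regular expression of the hook; the function name org_ref_to_org_cite is hypothetical, and unlike the Emacs Lisp version it has no notion of source blocks, so it only skips inline code.

```python
import re

# Same pattern as the Emacs Lisp hook, written in Python syntax: optional
# verbatim marker, optional brackets, cite or fullcite, then the keys.
_ORG_REF = re.compile(
    r"([=~])?\[?\[?(cite|fullcite):&?([^\] @\t\r\n]+)\]?\]?([=~])?"
)


def org_ref_to_org_cite(text):
    """Convert org-ref cite links in text to org-cite syntax."""

    def replace(match):
        # Leave inline code (=...= or ~...~) untouched.
        if match.group(1) or match.group(4):
            return match.group(0)
        # Multiple keys: "A,&B" or "A;B" becomes "A;@B".
        keys = re.sub(r"[,;]&?", ";@", match.group(3))
        if match.group(2) == "fullcite":
            return "[cite/bibentry/bare:@" + keys + "]"
        return "[cite:@" + keys + "]"

    return _ORG_REF.sub(replace, text)


if __name__ == "__main__":
    for link in ("cite:PerdewJ96PBE", "[cite:&PerdewJ96PBE]",
                 "=cite:PerdewJ96PBE=", "fullcite:&PerdewJ96PBE"):
        print(link, "->", org_ref_to_org_cite(link))
```

Links that are already in org-cite form, such as [cite:@PerdewJ96PBE], do not match the pattern because the key may not contain @, so they pass through unchanged.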

For more about citation in org-mode, the nice series by William Denton (the first article found here) is worth reading.

Footnote

In org-mode, a footnote is written as

Hello world![fn:hw]

[fn:hw] this is a footnote

It will be rendered by pandoc in markdown as

Hello world![^1]

[^1]: this is a footnote

Note that there is an issue (also here opened by John himself) in pandoc: a footnote is duplicated when it is referenced more than once. A dirty workaround is to use the raw code @@markdown:[^1]@@, which renders as a reference to footnote 1. You have to track the footnote number yourself, since pandoc always converts the raw footnote IDs to sequential numbers.

Citations are not affected by the footnote duplication issue, because citeproc-el generates normal markdown links instead of footnote links.
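Tracking footnote numbers by hand gets error-prone in longer notes. As a rough aid, the Python sketch below lists the number each labeled footnote would receive, assuming that numbers follow the order of first reference and that every repeated reference has already been replaced by a raw @@markdown:[^N]@@ snippet. Both points are assumptions about the numbering rather than documented behavior, the helper name footnote_numbers is made up, and anonymous inline footnotes are not handled.

```python
import re

# Matches the label of [fn:label] and [fn:label: inline definition].
# Anonymous [fn:: text] footnotes are not handled in this sketch.
_FOOTNOTE_REF = re.compile(r"\[fn:([\w-]+)")


def footnote_numbers(org_text):
    """Map footnote labels to numbers, in order of first reference."""
    numbers = {}
    for line in org_text.splitlines():
        # A line that starts with [fn:label] is a definition, not a reference.
        if line.startswith("[fn:"):
            continue
        for label in _FOOTNOTE_REF.findall(line):
            # Only the first reference of a label assigns a new number.
            numbers.setdefault(label, len(numbers) + 1)
    return numbers


if __name__ == "__main__":
    sample = "Hello world![fn:hw] More text.[fn:other]\n\n[fn:hw] this is a footnote\n"
    print(footnote_numbers(sample))
```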

Chirpy prompt block quotes

Chirpy nicely renders block quotes that carry a prompt-* class. In kramdown, the markdown parser of Jekyll, the HTML attributes of a block can be specified by adding {: ...} before or after the element. For example,

@@markdown:{:.prompt-tip name=my-first-tip}@@
#+begin_quote
This is a tip block.

Another tip line.
#+end_quote

is converted by pandoc to markdown as

{:.prompt-tip name=my-first-tip}

> This is a tip block.
>
> Another tip line.

and rendered by Jekyll into a prompt block

This is a tip block.

Another tip line.

The block quote in HTML is <blockquote class="prompt-tip" name="my-first-tip">. Apart from tip, the types info, warning and danger are available.

This is an info block.

This is a warning block.

This is a danger block.

GitHub flavor (GFM) or other?

GFM is widely used, and gfm is a format specifically supported by pandoc. But one issue I have is that gfm does not export the custom identifier of a heading when the custom ID differs from the heading slug.

The good news is that the general markdown writer supports writing a custom ID as # heading {#id}. The bad news, however, is that it has its own caveats:

  • org-mode header lines, such as #+export_file_name, are written as RawBlock.
  • Tables are exported in an indented simple format rather than as pipe tables.
  • The “verbatim” class, added to the pandoc.Code object when parsing inline verbatim (=verb=) to AST, is exported as `verb`{.verbatim}. Unfortunately, it cannot be rendered by kramdown, which expects `verb`{:.verbatim}.
  • Attributes of source code blocks are exported as well, but kramdown fails to render them, similar to the inline verbatim.

The last two are actually related to the way the markdown writer writes the class and attributes of the object. Until a kramdown variant of the markdown writer is implemented (a reader is discussed in pandoc#2711), a pandoc filter can work around these issues:

-- Remove RawBlock nodes like "#+export_file_name" in markdown export
function RawBlock (el)
  return {}
end

-- Remove verbatim class of inline code
function Code(c)
  local filter_verbatim = function(le)
    return le ~= "verbatim"
  end
  c.classes = c.classes:filter(filter_verbatim)
  return c
end

-- Remove unnecessary attributes of code block to avoid render issue
function CodeBlock(c)
  c.attributes.eval = nil
  c.attributes.results = nil
  c.attributes.wrap = nil
  c.attributes.exports = nil
  return c
end
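If the verbatim class should be kept rather than removed, another option might be to post-process the exported markdown and rewrite the pandoc attribute syntax into the form kramdown expects. The following Python sketch shows the idea for inline code with class-only attributes. It is not part of ox-pandoc or pandoc, the function name to_kramdown_ial is hypothetical, and it does not skip fenced code blocks.

```python
import re

# Inline code followed by a pandoc attribute block that holds only classes,
# e.g. `verb`{.verbatim} or `verb`{.first .second}.
_INLINE_CODE_ATTR = re.compile(r"(`[^`\n]+`)\{((?:\.[\w-]+)(?:\s+\.[\w-]+)*)\}")


def to_kramdown_ial(markdown):
    """Rewrite `code`{.class} as `code`{:.class} for kramdown."""
    return _INLINE_CODE_ATTR.sub(r"\1{:\2}", markdown)


if __name__ == "__main__":
    print(to_kramdown_ial("Type `verb`{.verbatim} here."))
```

Text that already uses the kramdown form is left alone, since the pattern requires the opening brace to be followed directly by a dot.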

In cases where link attributes are required, we can keep them but export the link in HTML format. This can be achieved by dropping the link_attributes extension. Meanwhile, the issue with tables can be solved by switching off the simple_tables extension. With ox-pandoc, both can be done by setting the following in the header:

#+pandoc_extensions: markdown-simple_tables-link_attributes

Subtree export

It is possible to export a subtree of an org file and to have separate pandoc options for each subtree. The options go into the property drawer, with an export_ prefix.

* Title of my subtree
:PROPERTIES:
:export_file_name: ~/my_subtree_export
:export_author: anonymous
:export_options: toc:nil tags:nil title:t date:nil author:t
:export_pandoc_options: shift-heading-level-by:1
:export_pandoc_metadata: comments:true
:export_pandoc_metadata+: categories:tool math:false
:END:

Exporting the subtree, either by (org-pandoc-export-to-gfm nil t) or by changing the export scope to subtree with C-s in the export dispatcher, generates ~/my_subtree_export.md with the following header:

---
author: anonymous
categories: tool
comments: true
math: false
title: Title of my subtree
---

This is useful if notes are written in a single file but are to be published as a series of markdown posts. Note that multi-line options and metadata have to be defined with the <PROPERTY>+ syntax. Otherwise, only the last line of the same property is recognized.

Batch export

Emacs can be run in batch mode, so that the conversion can be done without opening the Emacs GUI:

emacs -Q -batch -load etc/common.el -load etc/jekyll.el \
        --eval '(setq enable-local-variables :all)' \
        --visit=generate-markdown-from-org-using-pandoc.org \
        -f org-pandoc-export-to-gfm --eval '(sleep-for 5)'

where etc/common.el defines org variables common to all org export tasks, e.g. user-full-name, user-mail-address, as well as org-link-abbrev-alist to decode link abbreviations. In etc/jekyll.el, ox-pandoc and citeproc-el are loaded.

;; Use packages under doom directory
;; ox-pandoc
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/ox-pandoc")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/dash.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/ht.el")

;; citeproc-el
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/citeproc-el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/queue")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/compat")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/s.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/f.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/parsebib")

(unless (executable-find "pandoc")
  (error "pandoc is not found"))

(require 'ox)
(require 'ox-pandoc)
(require 'citeproc)

(setq org-export-with-broken-links t)  ; t/'mark/nil
(add-to-list 'org-export-backends 'pandoc)

(setq org-pandoc-options
      '((standalone . t)
        (mathjax . t)
        (variable . "revealjs-url=https://revealjs.com")))

The final (sleep-for 5) command waits for the asynchronous pandoc process triggered by org-pandoc-export-to-gfm (see ox-pandoc#6) to finish. Otherwise it would be killed before the markdown is correctly generated.

Batch export of a subtree should be possible but needs more tweaking, which goes beyond the scope of this note.

Summary

With pandoc and ox-pandoc in Emacs, I can write my org-mode notes, convert them to markdown and finally publish them on my Jekyll site, without laborious manual changes. In fact, this post is written in this way.

References

[1] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [DOI].


This post is licensed under CC BY 4.0 by the author.
