Generate Markdown from org-mode file using ox-pandoc
Finally, I can write posts for Jekyll in org-mode.
Background
This blog site is generated by Jekyll and rendered beautifully using the Chirpy theme. I’m pretty happy with it, but I sometimes feel rather hesitant to add more posts. I did not have this feeling years ago, when I only used markdown to write posts.
It comes from the fear of repeating myself: I write all my private notes in org-mode. This format is powered by Emacs and has rich features beyond those of markdown. On the other hand, while Jekyll can recognize org-mode with the help of jekyll-org, markdown is still its favorite format. Before finding a good exporter, I had to copy the file, manually convert it from org-mode to markdown, and repeat this whenever I updated the original note. This is boring labor and does not add to my knowledge.
This fear, if not the underlying laziness¹, has been holding me back from publishing more posts here. Eventually, I found myself unable to stand it any longer and decided to face it directly.
In the market
I am definitely not the only one with this problem. Searching the Internet gives me a few available approaches:
- ox-md: the built-in markdown exporter. In a quick try, I found that it fails to export the #+title property correctly. It does not write ```-style code blocks either, which makes the syntax highlighter in Jekyll useless.
- ox-gfm: an exporter for GitHub Flavored Markdown, derived from ox-md. It correctly exports ``` code blocks, but still does not export the title.
- ox-html: also a built-in exporter, but directly to HTML. This is a fundamental way to publish org-mode content online, and has been used to connect org-mode and Jekyll as described in this official tutorial. However, it seems rather cumbersome to take advantage of the Chirpy theme when posts are written directly in HTML.
- ox-hugo: a markdown exporter oriented to Hugo, which assumes the Hugo file hierarchy. For example, markdown posts have to be exported under the content directory.
- ox-pandoc: a wrapper around the wonderful universal converter pandoc. It correctly handles the title and code blocks when exporting markdown. It also supports reading metadata of the org file such as #+title and #+date.
ox-pandoc turns out to suit my needs best.
Installation
To use ox-pandoc, pandoc should of course be discoverable on the executable path, and ox-pandoc should then be included in the Emacs configuration. The latter is almost a one-liner in Doom Emacs: simply switch on the +pandoc flag of the org module in init.el:
(doom! :lang
       (org +pandoc))
Decent defaults have already been configured, so it is not necessary to add more code to config.el. If a bibliography is needed, the citeproc-el package should be installed and loaded:
;; in packages.el
(package! citeproc)
;; in config.el
(use-package! citeproc)
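Before exporting, it can help to confirm that pandoc really is discoverable from the shell. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of ox-pandoc or Doom Emacs.

```shell
# Hypothetical helper: check that a command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Warn early instead of failing in the middle of an export.
require_cmd pandoc || echo "install pandoc before switching on +pandoc"
```

Note that an Emacs started from a desktop launcher may see a different PATH than the shell, so (executable-find "pandoc") inside Emacs remains the decisive check.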
Writing and configuration
Pandoc options
pandoc command line options can be customized with #+PANDOC_OPTIONS in the org-mode header.
For example, the heading level of markdown posts in Jekyll starts from 2, but it usually starts from 1 in org-mode. Therefore headings should be shifted down by one level when exporting to markdown for Jekyll. The built-in markdown exporter has a variable for exactly this purpose, org-md-toplevel-hlevel (org-html-toplevel-hlevel in the HTML exporter), but ox-pandoc does not acknowledge it. Instead, we need to pass the --shift-heading-level-by command line option to pandoc:
#+PANDOC_OPTIONS: shift-heading-level-by:1
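To see which pandoc flags such a header line stands for, the mapping can be sketched in a few lines of shell. This is only an illustration: the function name pandoc_flags is made up, and it assumes that a key:value option becomes --key=value, that key:t becomes a bare flag, and that key:nil is dropped.

```shell
# Sketch: turn ox-pandoc style "key:value" options into pandoc flags.
# Assumption: "key:t" becomes a bare flag and "key:nil" is dropped.
pandoc_flags() {
  for opt in "$@"; do
    key=${opt%%:*}   # text before the first colon
    val=${opt#*:}    # text after the first colon
    case $val in
      t)   printf '%s\n' "--$key" ;;
      nil) ;;
      *)   printf '%s\n' "--$key=$val" ;;
    esac
  done
}

pandoc_flags shift-heading-level-by:1 standalone:t
```

The call prints --shift-heading-level-by=1 and --standalone on separate lines, which are the flags one would type when running pandoc by hand.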
Pandoc metadata
Metadata, such as categories and tags, have to be passed to pandoc with the #+PANDOC_METADATA option. Different entries can be specified in multiple #+PANDOC_METADATA lines.
#+PANDOC_METADATA: categories:tool
#+PANDOC_METADATA: "tags:Emacs org-mode pandoc ox-pandoc"
This results in the YAML header
---
categories: tool
tags: Emacs org-mode pandoc ox-pandoc
---
This is already handy, though it would be nice if ox-pandoc could recognize #+filetags.
Note that if a single tag containing spaces is needed, it is necessary to pass an empty tag as well. For example,
#+PANDOC_METADATA: "tags:First tag" tags: "tags:Second tag"
will generate
---
tags:
- First tag
-
- Second tag
---
The empty tag is fine, as it will be ignored by Jekyll.
Liquid template
A simple liquid filter can be used by wrapping it in markdown raw code, @@markdown:@@. For example, @@markdown:site.time | date_to_xmlschema@@ (double curly braces omitted here, otherwise it would be rendered as liquid) gives: 2025-04-04T12:01:41+02:00.
Internal links
A direct internal link such as [[Internal links]] works in org-mode, but is exported to something like [4.4](#Internal links) by ox-pandoc at the time of writing. A workaround is to use the explicit anchor [[#internal-links][Internal links]], leading to Internal links. To make it also work within org-mode, a custom ID is required:
** Internal links
:PROPERTIES:
:CUSTOM_ID: internal-links
:END:
The custom ID is not exported by the gfm writer. Therefore, if the custom ID is not the same as the one generated by Jekyll, the link will fail. A partial solution is to use the general markdown writer. Nevertheless, the markdown writer has its own issues in this case, which I will address later.
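When staying with the gfm writer, one way to keep links working is to pick a custom ID that matches the ID the site generates for the heading. The sketch below approximates kramdown's documented auto-ID rule (drop everything except letters, digits, spaces and dashes, strip leading non-letters, lowercase, turn spaces into dashes). The function name heading_slug is hypothetical, and the exact IDs depend on the kramdown version and the Jekyll configuration, so treat the result as a guess to verify against the rendered page.

```shell
# Rough approximation of kramdown's auto-generated header IDs (an assumption;
# the exact rule depends on the kramdown version and Jekyll configuration).
heading_slug() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9 -]//g' -e 's/^[^a-z]*//' -e 's/ /-/g'
}

heading_slug "Internal links"
```

For the heading above this prints internal-links, which is the value used in the CUSTOM_ID property.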
Equations
Typing equations is simple. In org-mode, it is not required to add the surrounding $$ for a display math equation.
\begin{equation}\label{eq:euler}
\begin{aligned}
e^{i \pi} + 1 = 0
\end{aligned}
\end{equation}
which is rendered as a numbered display equation.
It is possible in org-mode to cross-reference the equation by eqref:eq:euler using org-ref. Unfortunately, it is neither acknowledged by markdown nor converted to the correct \eqref{eq:euler} syntax by ox-pandoc.
The conversion can be done with a pandoc Lua filter (thanks to ChatGPT):
-- Pandoc Lua filter: rewrite strings starting with 'eqref:' as LaTeX \eqref{...}
-- In the JSON AST, from
--   {"t":"Str","c":"eqref:eq:euler"}
-- to
--   {"t":"Str","c":"\\eqref{eq:euler}"}
-- Trailing punctuation is handled.
function Str(el)
  local prefix = "eqref:"
  local content = el.text
  if content:sub(1, #prefix) == prefix then
    local eqref_content = content:sub(#prefix + 1)
    -- Check for trailing punctuation
    -- (.-) captures as few characters as possible (non-greedy)
    -- ([%p%s]*) captures zero or more punctuation or whitespace characters
    local label, punctuation = eqref_content:match("^(.-)([%p%s]*)$")
    local eqref = pandoc.Str('\\eqref{' .. label .. '}')
    if punctuation ~= "" then
      -- Return the target node followed by a Str node with the punctuation
      return {eqref, pandoc.Str(punctuation)}
    else
      -- Return only the target node if no relevant punctuation is found
      return eqref
    end
  end
  return el
end
Save the filter somewhere and add it to the pandoc options
#+pandoc_options: lua-filter:convert_org_ref_eqref.lua
Then the org-ref equation link should work, and eqref:eq:euler is exported as \eqref{eq:euler}.
Citation and Bibliography
ox-pandoc is aware of org-cite-style citation links, for example [cite:@PerdewJ96PBE], and can render them into markdown-type links using citeproc-el before the document is passed to pandoc. One just needs to specify the bibliography file in which to look up the citation key with #+bibliography, and how the entry should be exported with #+cite_export; see this section of ox-pandoc.
In my case, I have a .bib file and a citation style file under the etc/ directory, so I just need to add the following two lines to the header:
#+bibliography: etc/bibliography.bib
#+cite_export: csl etc/american-physics-society-without-titles.csl
When exported to markdown, [cite:@PerdewJ96PBE] is rendered as \[[1](#citeproc_bib_item_1)\] and gives [1] on the Jekyll end. To print the bibliography, just insert the directive #+print_bibliography: at the place where you want the list of references. In this case, it is converted to the following markdown:
<span id="citeproc_bib_item_1"></span>\[1\] J. P. Perdew, K. Burke, and
M. Ernzerhof, Phys. Rev. Lett. **77**, 3865 (1996).
To make ox-pandoc also handle org-ref links, we can work around by converting the org-ref-style cite:X links to org-cite [cite:@X]. This can be done using the following snippet:
(require 'cl-lib) ; provides `cl-case'

(defun my/org-pandoc-convert-org-ref-link-to-org-cite (backend &optional _subtreep)
  "Hook function to convert org-ref cite links to org-cite cite links.
Currently it uses a naive implementation based on `re-search-forward'.
Caveats:
- only works with the pandoc backend
- only handles cite and fullcite
- cannot handle notes"
  (when (equal backend 'pandoc)
    (goto-char (point-min))
    (while (re-search-forward
            "\\([=\~]\\)?\\[?\\[?\\(cite\\|fullcite\\):&?\\([^] @\t\r\n]+\\)\\]?\\]?\\([=\~]\\)?"
            nil t)
      ;; Do not convert matches in a source code block or inline code.
      (unless (or (org-in-src-block-p)
                  (string= (match-string 1) "=") (string= (match-string 1) "~")
                  (string= (match-string 4) "=") (string= (match-string 4) "~"))
        (let ((keys ; handle multiple keys
               (replace-regexp-in-string "[,;]&?" ";@" (match-string 3))))
          (cl-case (intern (match-string 2))
            (fullcite
             (replace-match (format "[cite/bibentry/bare:@%s]" keys)))
            (t
             (replace-match (format "[\\2:@%s]" keys)))))))))

(add-to-list 'org-export-before-parsing-functions
             #'my/org-pandoc-convert-org-ref-link-to-org-cite)
Load and try to do the conversion:
- cite:PerdewJ96PBE: [1]
- cite:&PerdewJ96PBE: [1]
- [cite:&PerdewJ96PBE]: [1]
- =cite:PerdewJ96PBE=: cite:PerdewJ96PBE
- ~[cite:&PerdewJ96PBE]~: [cite:&PerdewJ96PBE]
- fullcite:&PerdewJ96PBE: J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [DOI].
For more about citation in org-mode, the nice series by William Denton (the first article found here) is worth reading.
Footnote
In org-mode, a footnote is written as
Hello world![fn:hw]
[fn:hw] this is a footnote
It will be rendered by pandoc in markdown as
Hello world![^1]

[^1]: this is a footnote
Note that there is an issue in pandoc (also here, opened by John himself) that a footnote will be duplicated when it is referenced more than once. A dirty workaround is to use the raw code @@markdown:[^1]@@, which renders as a reference to footnote 1. You have to track the footnote number yourself, since pandoc always converts the raw footnote IDs to sequential numbers.
Citations are not affected in this case, because citeproc-el generates a normal markdown link instead of a footnote link.
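Tracking the footnote numbers by hand gets tedious once a note has more than a few footnotes. The sketch below lists the named footnote labels of an org file in order of first appearance, together with the sequential number they would presumably receive. The helper name footnote_numbers is hypothetical, and it assumes that every footnote is referenced in the text before its definition and that no inline footnotes such as [fn:: text] are used.

```shell
# Hypothetical helper: number [fn:label] markers by first appearance,
# assuming references appear in the text before their definitions.
footnote_numbers() {
  grep -o '\[fn:[^]]*]' "$1" \
    | awk '!seen[$0]++ { printf "%d %s\n", ++n, $0 }'
}

# Usage: footnote_numbers my-post.org
```

The number printed next to a label is the one to put into the raw @@markdown:[^N]@@ snippet when the same footnote is referenced a second time.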
Chirpy prompt block quotes
Chirpy can render block quotes with a prompt- class nicely. In kramdown, the markdown parser of Jekyll, HTML attributes of a block can be specified by adding {:} before or after the element. For example,
@@markdown:{:.prompt-tip name=my-first-tip}@@
#+begin_quote
This is a tip block.
Another tip line.
#+end_quote
is converted by pandoc to markdown as
{:.prompt-tip name=my-first-tip}
> This is a tip block.
>
> Another tip line.
and rendered by Jekyll into a prompt block
This is a tip block.
Another tip line.
The block quote in HTML is <blockquote class="prompt-tip" name="my-first-tip">. Apart from tip, info, warning and danger are available.
This is an info block.
This is a warning block.
This is a danger block.
GitHub flavor (GFM) or other?
GFM is widely used, and gfm is a format specifically supported by pandoc. But one issue I have is that gfm does not export the custom identifier of a heading when the custom ID is different from the heading slug.
The good news is that the general markdown writer supports writing a custom ID as # heading {#id}. The bad news, however, is that it has its own caveats:
- org-mode headers are written as RawBlock.
- Tables are exported in an indented simple format rather than pipe tables.
- The “verbatim” class, added to the pandoc.Code object when parsing inline verbatim (=verb=) to AST, is exported as `verb`{.verbatim}. Unfortunately, it cannot be rendered by kramdown, which expects `verb`{:.verbatim}.
- Attributes of source code blocks will be exported as well, but kramdown fails to render them, similar to the inline verbatim.
The last two are actually related to the way the markdown writer writes the class and attributes of the object. Until a kramdown variant of the markdown writer is implemented (the reader is discussed in pandoc#2711), a pandoc filter can work around these issues.
-- Remove RawBlock nodes like "#+export_file_name" in markdown export
function RawBlock(el)
  return {}
end

-- Remove verbatim class of inline code
function Code(c)
  local filter_verbatim = function(le)
    return le ~= "verbatim"
  end
  c.classes = c.classes:filter(filter_verbatim)
  return c
end

-- Remove unnecessary attributes of code block to avoid render issues
function CodeBlock(c)
  c.attributes.eval = nil
  c.attributes.results = nil
  c.attributes.wrap = nil
  c.attributes.exports = nil
  return c
end
In cases where link attributes are required, we can keep them but export the link in HTML format. This can be achieved by dropping the link_attributes extension. Meanwhile, the issue with tables can be solved by switching off the simple_tables extension. With ox-pandoc, both can be set in the header:
#+pandoc_extensions: markdown-simple_tables-link_attributes
Subtree export
It is possible to export a subtree of an org file and to have separate pandoc options for each subtree. The options should go to the property drawer, with the export_ prefix.
* Title of my subtree
:PROPERTIES:
:export_file_name: ~/my_subtree_export
:export_author: anonymous
:export_options: toc:nil tags:nil title:t date:nil author:t
:export_pandoc_options: shift-heading-level-by:1
:export_pandoc_metadata: comments:true
:export_pandoc_metadata+: categories:tool math:false
:END:
Exporting the subtree, either by (org-pandoc-export-to-gfm nil t) or by changing the export scope to subtree with C-s in the export dispatcher, will generate ~/my_subtree_export.md as follows:
---
author: anonymous
categories: tool
comments: true
math: false
title: Title of my subtree
---
This is useful if notes are written in a single file but should be published as a series of markdown posts. Note that multi-line options and metadata have to be defined with the <PROPERTY>+ syntax. Otherwise, only the last line of the same property will be recognized.
Batch export
Emacs can be run in batch mode, so that the conversion can be done without opening the Emacs GUI:
emacs -Q -batch -load etc/common.el -load etc/jekyll.el \
--eval '(setq enable-local-variables :all)' \
--visit=generate-markdown-from-org-using-pandoc.org \
-f org-pandoc-export-to-gfm --eval '(sleep-for 5)'
where etc/common.el defines org variables common to all org export tasks, e.g. user-full-name and user-mail-address, as well as org-link-abbrev-alist to decode link abbreviations. In etc/jekyll.el, ox-pandoc and citeproc-el are loaded:
;; Use packages under doom directory
;; ox-pandoc
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/ox-pandoc")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/dash.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/ht.el")
;; citeproc-el
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/citeproc-el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/queue")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/compat")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/s.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/f.el")
(add-to-list 'load-path "~/.config/emacs/.local/straight/repos/parsebib")
(unless (executable-find "pandoc")
  (error "pandoc is not found"))
(require 'ox)
(require 'ox-pandoc)
(require 'citeproc)
(setq org-export-with-broken-links t) ; t/'mark/nil
(add-to-list 'org-export-backends 'pandoc)
(setq org-pandoc-options
      '((standalone . t)
        (mathjax . t)
        (variable . "revealjs-url=https://revealjs.com")))
The final (sleep-for 5) command waits for the asynchronous pandoc process triggered by org-pandoc-export-to-gfm (see ox-pandoc#6) to finish. Otherwise, it would be killed before the markdown is correctly generated.
Batch export of a subtree should be possible but needs more tweaking, which goes beyond the scope of this note.
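With several posts to convert, the batch command can be wrapped in a small shell function. This is a sketch rather than a tested workflow: the function name export_org_posts, the DRY_RUN switch and the file names are made up for illustration, while the Emacs arguments are copied from the command above.

```shell
# Hypothetical wrapper around the batch command above (the function name and
# the DRY_RUN switch are illustrative, not part of ox-pandoc).
export_org_posts() {
  for org in "$@"; do
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} emacs -Q -batch -load etc/common.el -load etc/jekyll.el \
      --eval '(setq enable-local-variables :all)' \
      --visit="$org" \
      -f org-pandoc-export-to-gfm --eval '(sleep-for 5)' || return 1
  done
}

# Print the commands for two posts without running Emacs.
DRY_RUN=1 export_org_posts first-post.org second-post.org
```

Each file is exported in its own Emacs process, so the five-second wait applies per post. Dropping DRY_RUN runs the real export.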
Summary
With pandoc and ox-pandoc in Emacs, I can write my org-mode notes, convert them to markdown and finally publish them on my Jekyll site without laborious manual changes. In fact, this post was written in this way.
References
[1] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [DOI].