Academic writing in Markdown with Obsidian, Pandoc and Zotero
Everything markdown
I enjoy writing most things in Markdown, and I enjoy having all my data on my own computers and servers in Markdown, not as proprietary binary data somewhere in the cloud. This includes my academic writing. Whether it’s a manuscript draft, a grant proposal, a review, or a thesis, the starting point is always a Markdown file in my favorite Markdown editor.
Everything in Markdown is just plaintext (or, well, probably stored in UTF-8 / RFC 3629). Headlines, quotes, bold, italic, code blocks, tables, lists – it is all right there at your fingertips, simple, easy, understandable. I don’t want to delve into the mysteries of what a table is in Word, LibreOffice, or RTF, or EndNote. I just want a table. Here’s what a Markdown table looks like:
| Variable | Value |
|---------------|---------------|
| Some variable | Some value |
| Some variable | Some value |
It is just text. Exactly how it is then rendered (if at all) is up to whatever software you use for rendering. It is a clear separation of concerns (see Dijkstra): first authoring content, and then rendering content. For example, a Markdown table in a Quarto document might be rendered as an HTML table using a certain theme, where the theme itself specifies table colors and such. Or it might be rendered as a table in a Word document. The output format does not matter: the table is just a table.
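To make the split between authoring and rendering concrete, here is a minimal Python sketch that reads the pipe table above as plain text and renders it as HTML. It is only an illustration: the function names are made up for this example, it handles only simple pipe tables, and real renderers such as Pandoc or Quarto do far more.

```python
from html import escape


def parse_pipe_table(markdown: str) -> list[list[str]]:
    """Split a simple pipe table into rows of cells, skipping the |---| separator row."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # The separator row consists only of dashes (and optional alignment colons).
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def render_html(rows: list[list[str]]) -> str:
    """Render parsed rows as an HTML table, treating the first row as the header."""
    header, *body = rows
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


TABLE = """
| Variable      | Value      |
|---------------|------------|
| Some variable | Some value |
| Some variable | Some value |
"""

rows = parse_pipe_table(TABLE)
print(render_html(rows))
```

The same `rows` could just as well be handed to a different renderer; the authored text does not change.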
Currently the spot for my favorite editor goes to Obsidian. Beautiful, sleek, with a simple folder structure with plaintext files for storage.
Zotero integration
All my reference handling is done in the open-source Zotero reference manager. Zotero integrates well with the Markdown workflow. In Obsidian, I use the Citations plugin to synchronize with my Zotero library. The plugin can watch a .bib file, which is generated and automatically updated with added references from Zotero via the Better BibTeX plugin.
The .bib file is another plaintext file in the BibTeX format. An entry can look something like the following:
@article{paulhan1887simultaneite,
  title = {La Simultan\'eit\'e Des Actes Psychiques},
  author = {Paulhan, Fr},
  year = 1887,
  journal = {Revue Scientifique},
  volume = {13},
  pages = {684--689}
}
Here you have a citekey (paulhan1887simultaneite in this case, written as @paulhan1887simultaneite when cited), along with the metadata about the article.
An inline citation in the text then looks like:
the work of Paulhan [@paulhan1887simultaneite] questioned whether...
You can of course insert the citekeys manually. However, the Citations plugin makes this a lot easier. When writing an article, you can access your entire library via a hotkey (mine is set to ⌘ + ⌥ + C) and fuzzy match title, author, or citekey.
The citation format is the same as Pandoc citations and Quarto citations. This makes it handy for rendering anywhere.
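Because both the manuscript and the .bib file are plaintext, it is easy to check that every cited key actually exists in the library. The following Python sketch does that with two regular expressions. Both patterns are simplified approximations of the BibTeX and Pandoc syntaxes, and the key `missing2020key` is a made-up example of a citation with no matching entry.

```python
import re

# Matches the key in entry headers such as "@article{paulhan1887simultaneite,".
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Matches citekeys such as "@paulhan1887simultaneite" (simplified: letters, digits,
# underscores, and internal ":", "." or "-"); the lookbehind skips e-mail addresses.
CITE_KEY = re.compile(r"(?<!\w)@([A-Za-z0-9_](?:[\w:.-]*\w)?)")


def bib_keys(bib_text: str) -> set[str]:
    """Return all entry keys defined in a BibTeX file."""
    return set(BIB_KEY.findall(bib_text))


def cited_keys(markdown: str) -> set[str]:
    """Return all citekeys used in a Markdown document."""
    return set(CITE_KEY.findall(markdown))


BIB = r"""
@article{paulhan1887simultaneite,
  title = {La Simultan\'eit\'e Des Actes Psychiques},
  author = {Paulhan, Fr},
  year = 1887
}
"""

TEXT = (
    "the work of Paulhan [@paulhan1887simultaneite] questioned whether "
    "this holds [@missing2020key; @paulhan1887simultaneite]."
)

missing = cited_keys(TEXT) - bib_keys(BIB)
print("Cited but not in the .bib file:", sorted(missing))
```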
Pandoc handles conversion to any format
The second stage of the separation of concerns, rendering content, depends on desired output format. No matter the output format, though, Pandoc will take the Markdown file and handle it. It will convert your document into HTML, LaTeX, docx, beamer, RTF, vimdoc, Haddock, and countless others. I mean look at the input-output diagram on the website. That’s a lot.
To convert your Markdown file into, for example, docx, you would use a command such as:
pandoc input.md --output=document.docx --bibliography=references.bib --citeproc
or, for HTML:
pandoc input.md --output=document.html --bibliography=references.bib --citeproc
or, for producing a PDF via LaTeX (in this case using LuaLaTeX):
pandoc input.md \
  --output=document.pdf \
  --bibliography=references.bib \
  --citeproc \
  --pdf-engine=lualatex
To make it more convenient, you can specify all of these options in a preamble to your Markdown document. For example, I am using the Templater plugin in Obsidian, which allows for adding pre-specified elements to any new file. I use it to add a preamble for conversion to the various document types I use.
In Obsidian, the preamble looks like:
---
title: "My Document"
output:
  latex:
    output: document.tex
    pdf-engine: lualatex
    bibliography: references.bib
  docx:
    output: document.docx
    bibliography: references.bib
    citeproc: true
---
This can then easily be processed using a simple tool like Panrun with the command:
panrun input.md
or, for only converting to one specific type:
panrun input.md -t docx
This will process the preamble and run the appropriate pandoc command with specified options.
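The idea of turning a preamble into a command line can be sketched in a few lines of Python. This is not Panrun's actual implementation, only an illustration of the principle. The dictionary mirrors the output section of the preamble above (written out by hand to avoid needing a YAML parser), and `pandoc_args` is a hypothetical helper name.

```python
import shlex

# Hand-written mirror of the "output" section of the preamble.
PREAMBLE_OUTPUT = {
    "latex": {
        "output": "document.tex",
        "pdf-engine": "lualatex",
        "bibliography": "references.bib",
    },
    "docx": {
        "output": "document.docx",
        "bibliography": "references.bib",
        "citeproc": True,
    },
}


def pandoc_args(input_file: str, fmt: str, options: dict) -> list[str]:
    """Translate one format's options into a pandoc argument list."""
    args = ["pandoc", input_file, f"--to={fmt}"]
    for key, value in options.items():
        if value is True:
            args.append(f"--{key}")  # boolean switch, e.g. --citeproc
        elif value is False or value is None:
            continue  # disabled option: emit nothing
        elif isinstance(value, (list, tuple)):
            args.extend(f"--{key}={item}" for item in value)  # repeatable option
        else:
            args.append(f"--{key}={value}")
    return args


for fmt, options in PREAMBLE_OUTPUT.items():
    # Print the command instead of running it; subprocess.run(args) would execute it.
    print(shlex.join(pandoc_args("input.md", fmt, options)))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a file name contains spaces.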
Quarto uses the same principle. In Quarto, the preamble would look like:
---
title: "My Document"
bibliography: references.bib
---
Add a references heading at the end of the document and processed references will be placed there.
# References
To select citation style, point to a Citation Style Language (CSL) file in the preamble. For example, using APA formatting:
---
title: "My Document"
output:
  latex:
    output: document.tex
    pdf-engine: lualatex
    bibliography: references.bib
    csl: apa.csl
---
Other useful features
Acronyms
When authoring long documents that use abbreviations, it can be tricky to keep track of where each abbreviation was first introduced and to avoid re-introducing it later. This can be solved by packages like pandoc-acro or pandoc-acronyms, which are based on Pandoc filters.
In pandoc-acro, an acronym is prepended with a plus sign (+), and an acronym definition list is defined in the document preamble.
Text:
Studies using +PET, +SPECT, and +EEG have found evidence of ...
Specifically, an +EEG study found that ...
Preamble:
---
title: "My Document"
output:
  latex:
    output: document.tex
    pdf-engine: lualatex
    bibliography: references.bib
    csl: apa.csl
    filter:
      - pandoc-acro
acronyms:
  PET:
    short: PET
    long: positron emission tomography
  SPECT:
    short: SPECT
    long: single-photon emission computed tomography
  EEG:
    short: EEG
    long: electroencephalography
---
This will produce the following text:
Studies using positron emission tomography (PET), single-photon emission computed tomography (SPECT), and electroencephalography (EEG) have found evidence of ...
Specifically, an EEG study found that ...
No need to keep track of where an abbreviation is used. When converting a file to .tex format, appropriate LaTeX acronym commands will be created, such as \ac{EEG}.
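The first-use logic itself is simple enough to sketch in Python. This is not how pandoc-acro is implemented (it works as a Pandoc filter on the document tree); it is only a plain-text illustration of the rule "long form on first use, short form afterwards", with `expand_acronyms` as a hypothetical function name.

```python
import re

ACRONYMS = {
    "PET": {"short": "PET", "long": "positron emission tomography"},
    "SPECT": {"short": "SPECT", "long": "single-photon emission computed tomography"},
    "EEG": {"short": "EEG", "long": "electroencephalography"},
}


def expand_acronyms(text: str, acronyms: dict) -> str:
    """Replace +KEY with 'long (short)' on first use and 'short' on later uses."""
    seen = set()

    def replace(match: re.Match) -> str:
        key = match.group(1)
        entry = acronyms.get(key)
        if entry is None:
            return match.group(0)  # unknown key: leave the text untouched
        if key in seen:
            return entry["short"]
        seen.add(key)
        return f'{entry["long"]} ({entry["short"]})'

    # A plus sign directly followed by a word, not preceded by another word character.
    return re.sub(r"(?<!\w)\+(\w+)", replace, text)


TEXT = (
    "Studies using +PET, +SPECT, and +EEG have found evidence of ...\n"
    "Specifically, an +EEG study found that ..."
)
print(expand_acronyms(TEXT, ACRONYMS))
```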
Filters
Pandoc filters can be a powerful tool. For example, I have image assets inside my Obsidian storage folder under img/, but if I’m working in a LaTeX template with predefined asset paths, I need to rewrite these paths to point to, for example, includes/figures. A custom filter can process the document and do this automatically. For example:
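-- Rewrite image paths that start with ../img/ so they point to includes/figures/.
function Image(img)
  -- gsub returns the new string and the number of substitutions made.
  local new, n = img.src:gsub("^%.%.%/img/", "includes/figures/")
  if n > 0 then
    img.src = new
  end
  return img
end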
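This can be saved to a .lua file, for example custom_filter.lua. Because it is a Lua filter rather than an executable filter like pandoc-acro, it goes under Pandoc's lua-filter option:
filter:
  - pandoc-acro
lua-filter:
  - custom_filter.lua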
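To see which paths the pattern in the Lua filter actually touches, the same substitution can be sketched in Python. In the Lua pattern, `%.` is a literal dot and `^` anchors the match at the start of the string, so only paths that begin with `../img/` are rewritten. The file names below are made up for illustration.

```python
import re


def rewrite_image_path(src: str) -> str:
    """Mirror of the Lua gsub call: rewrite a leading ../img/ to includes/figures/."""
    new_src, count = re.subn(r"^\.\./img/", "includes/figures/", src)
    # Like the Lua filter, only replace the path when a substitution was made.
    return new_src if count > 0 else src


for path in ["../img/plot.png", "img/plot.png", "assets/../img/plot.png"]:
    print(f"{path} -> {rewrite_image_path(path)}")
```

Only the first of the three paths changes; a path that starts with `img/` or has `../img/` in the middle is left alone, so the pattern may need adjusting depending on where the Markdown file sits relative to the image folder.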
Diagrams
To produce simple diagrams, there is no need to even leave the Markdown environment. Many Markdown environments support Mermaid, which is a language for producing graphs and diagrams.
quadrantChart
    title Interference effects
    x-axis Motor
    y-axis Cognitive
    quadrant-1 Mutual facilitation
    quadrant-2 Motor priority
    quadrant-3 Mutual interference
    quadrant-4 Cognitive priority
    P1: [0.31, 0.68]
    P2: [0.33, 0.11]
    P3: [0.48, 0.19]
    P4: [0.87, 0.65]
    P5: [0.31, 0.14]
    P6: [0.30, 0.75]
    P7: [0.68, 0.09]
Any rendering software that supports Mermaid will draw this as a chart with four labelled quadrants and the points P1 to P7 placed at their coordinates.
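As a rough guide to reading such a chart without rendering it, the following Python sketch assigns each point to a quadrant. It relies on Mermaid's numbering (quadrant-1 is top right, quadrant-2 top left, quadrant-3 bottom left, quadrant-4 bottom right) and assumes the quadrants split at 0.5 on both axes; how points exactly on the boundary are treated is an assumption of this sketch.

```python
QUADRANT_LABELS = {
    1: "Mutual facilitation",  # top right
    2: "Motor priority",       # top left
    3: "Mutual interference",  # bottom left
    4: "Cognitive priority",   # bottom right
}

POINTS = {
    "P1": (0.31, 0.68),
    "P2": (0.33, 0.11),
    "P3": (0.48, 0.19),
    "P4": (0.87, 0.65),
    "P5": (0.31, 0.14),
    "P6": (0.30, 0.75),
    "P7": (0.68, 0.09),
}


def quadrant(x: float, y: float, midpoint: float = 0.5) -> int:
    """Return the Mermaid quadrant number (1-4) for a point in the unit square."""
    right = x >= midpoint
    top = y >= midpoint
    if top:
        return 1 if right else 2
    return 4 if right else 3


for name, (x, y) in POINTS.items():
    q = quadrant(x, y)
    print(f"{name}: quadrant-{q} ({QUADRANT_LABELS[q]})")
```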