Data to manuscript with Quarto

Data to article

I am a big fan of transparency when it comes to academic writing, even though I don’t always fully achieve it myself.

With modern tooling, it is (in my opinion) possible to go from raw data to finished manuscript in a single file. Enter Quarto. This is an open-source scientific and technical publishing system with notebook-style documents (similar to Jupyter notebooks), supporting both Python and R, as well as a few other languages.

You author your manuscript in markdown blocks, add code blocks in whichever language you prefer, and generate any table or figure right there in your Quarto file from raw data. It is also built on the extremely powerful Pandoc, a central tool in my academic workflows.
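To make the single-file idea concrete, here is a minimal sketch that writes a small Quarto document to disk from Python. The file name `manuscript.qmd`, the title, and the cell contents are placeholders chosen for illustration, not taken from any real study. The `code-fold` option is what makes code blocks collapsed by default in the HTML output.

```python
from pathlib import Path

FENCE = "`" * 3  # built programmatically so this example can itself sit inside a code fence

# Hypothetical minimal Quarto document: YAML header, prose, and one executable cell.
QMD_TEMPLATE = """---
title: "Example manuscript"
format:
  html:
    code-fold: true
jupyter: python3
---

## Methods

Participants were split into two groups.

{fence}{{python}}
#| label: group-sizes
#| echo: true
groups = ["control", "control", "treatment"]
print({{g: groups.count(g) for g in set(groups)}})
{fence}
"""


def write_minimal_qmd(path: Path) -> str:
    """Write the template to `path` and return the text that was written."""
    text = QMD_TEMPLATE.format(fence=FENCE)
    path.write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    write_minimal_qmd(Path("manuscript.qmd"))
    # Rendering is then done from a terminal: quarto render manuscript.qmd
```

Prose and code live side by side in the resulting file, so the text and the numbers it reports cannot drift apart between drafts.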

Not only for manuscripts

Even if you don’t write your full manuscript in a Quarto document, it is a useful tool for things like open lab notebooks, for example when performing secondary data analyses where you want to increase transparency and support the severity of your statistical tests. I write an open lab notebook for one of my fNIRS studies in Quarto, render it as HTML using Quarto’s built-in rendering functions, and host the resulting webpage on GitHub Pages, where anyone can read it. You can see the Quarto file in the docs folder in the GitHub repository for the study.

Example

The following is from an example study where I present a simple demographics table generated with arsenal/tableby in R. You can read precisely which data and which statistical tests were used to generate the table, leaving little room for ambiguity. If the raw data are preprocessed, you can simply follow along in the relevant code blocks. If your datasets are open, the entire manuscript can be reproduced with the click of a button (just remember to set your seed when relying on randomness).
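As a rough Python analogue of that idea (not the arsenal/tableby code from the study), the sketch below builds a small demographics table and runs a seeded permutation test using only the standard library. The data, group names, seed value, and function names are all made up for illustration.

```python
import random
import statistics

# Made-up raw data for illustration only: (group, age in years).
RAW = [
    ("control", 24), ("control", 31), ("control", 28), ("control", 35),
    ("treatment", 29), ("treatment", 41), ("treatment", 33), ("treatment", 38),
]


def ages_by_group(rows):
    """Collect ages per group, preserving first-seen group order."""
    groups = {}
    for group, age in rows:
        groups.setdefault(group, []).append(age)
    return groups


def permutation_p_value(a, b, n_permutations=5000, seed=2024):
    """Two-sided permutation test on the difference in means.

    The seed is fixed so that re-rendering the document gives the same p-value.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


def demographics_table(rows):
    """Return the table as plain-text lines: n and mean (SD) of age per group."""
    lines = [f"{'Group':<12}{'n':>4}{'Age, mean (SD)':>18}"]
    for name, ages in ages_by_group(rows).items():
        summary = f"{statistics.mean(ages):.1f} ({statistics.stdev(ages):.1f})"
        lines.append(f"{name:<12}{len(ages):>4}{summary:>18}")
    return lines


if __name__ == "__main__":
    print("\n".join(demographics_table(RAW)))
    control, treatment = ages_by_group(RAW).values()
    print(f"Permutation test p-value: {permutation_p_value(control, treatment):.3f}")
```

Because the random number generator is created from a fixed seed inside the function, two renders of the same document report the same p-value, which is the point of the reminder about seeds.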

Collapsed code blocks

Expanded code blocks

Of course, this workflow limits you to the languages supported by the Quarto environment. This has been an issue for me, since I have many MATLAB-based data pipelines, and MATLAB is obviously not part of any open-source ecosystem. In that case, I just link the MATLAB script (hosted on GitHub) that generated the data, and continue from the output file generated by that script.
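A minimal sketch of that handoff, assuming the MATLAB script exports its results as a CSV file with a header row. The file name `matlab_output.csv` and the column names `subject` and `hbo_mean` are hypothetical; the actual output format of your pipeline may well differ.

```python
import csv
import tempfile
from pathlib import Path


def load_matlab_export(path):
    """Read a CSV exported by an upstream MATLAB pipeline into a list of dicts.

    Assumes a header row, a 'subject' column holding IDs, and numeric values elsewhere.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            row = {"subject": record["subject"]}
            # Every other column is treated as a numeric measurement.
            for key, value in record.items():
                if key != "subject":
                    row[key] = float(value)
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Stand-in for the real MATLAB output so that the sketch runs on its own.
    demo = Path(tempfile.mkdtemp()) / "matlab_output.csv"
    demo.write_text("subject,hbo_mean\ns01,0.12\ns02,0.08\n", encoding="utf-8")
    for row in load_matlab_export(demo):
        print(row)
```

Everything downstream of this loading step stays inside the Quarto file, so only the MATLAB stage has to be documented by linking to the script.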

Collaboration

How about getting input from your co-authors? Can you escape the ever-present Word?

Maybe. The most workable solution I have found is Hypothes.is, which allows highlighting and annotating any webpage and supports private invite-only groups. Here’s an example of a comment on a bit of text, similar to adding a comment in Word:

Example hypothes.is annotation

Of course, this still requires your co-authors to register an account. I have considered comments in a GitHub pull request, or perhaps an issues page, but haven’t really come across anything spot-on. If you have any better ideas, I would love to hear them.



