I am very committed to the idea of reproducibility. As I understand the term, there should be a close link between the results presented in a paper and the raw data. All too often, some pre-processing step essential for those results is modified slightly during the preparation of the manuscript, but the figures, tables and statistics are not updated accordingly.
Let’s take a simple table. Tables usually contain some pre-processed data from experiments or simulations. The standard workflow to produce a table is to import data from a database or a file into a word processor and then to format it by adding borders, merging cells, etc. The problem arises when one wants to modify the data within the table (for example, to change units, significance levels or pre-processing parameters). In such a case the complete workflow has to be repeated, which may be very time-consuming. This is especially annoying when the manuscript is still at an early stage or the reviewers ask for more analyses or experiments.
My solution to the problem is to use LaTeX templates that are filled in with data from a CSV (comma-separated values) file, a simple and common text data format used by many applications. LaTeX files can be easily converted to HTML, PDF or even DOC (the last option is not 100% functional yet), and the result can then be copy-pasted into your document or attached as supplementary data.
You will need:
- a LaTeX environment (for example TeX Live or teTeX for Linux/Mac, or MiKTeX for Windows),
- Python >= 2.6 (earlier versions may work but I haven’t tested them),
- a templating system: a Python module which can render a final LaTeX file from a template. In the example I use the Django template system, but Cheetah or Jinja can also be used,
- a CSV file: it can contain anything from your shopping list through simulation results to data from experiments.
The CSV file might look like this:
First,Last,Phone
Bartosz, Telenczuk, 12345
Joseph, Stuart, 345566
Maria, Curie-Sklodowska, 77777
King, John II, 88888
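To check how such a file is parsed, here is a minimal sketch using Python's standard csv module. The sample is embedded as a string so that the snippet runs on its own. The skipinitialspace option is an assumption on my part: it drops the blank that follows each comma in the sample, which you may or may not want.

```python
import csv
import io

# The sample phonebook, embedded as a string for illustration only.
SAMPLE = """First,Last,Phone
Bartosz, Telenczuk, 12345
Joseph, Stuart, 345566
Maria, Curie-Sklodowska, 77777
King, John II, 88888
"""

# skipinitialspace=True strips the space that follows each comma.
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
head = next(reader)   # first row: the column names
table = list(reader)  # remaining rows: the data

print(head)
for row in table:
    print(row)
```

Note that every value comes back as a string, so any number formatting has to be done before the data reaches the template.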
Here is how to generate a PDF table from this (or a similar) CSV file via LaTeX.
- Create a LaTeX template.
Here you will need some working knowledge of LaTeX, but you can start with the following template:
\documentclass[11pt]{report}
\begin{document}
\begin{table}
\centering
\begin{tabular}{lcr}
\hline
{% for col in head %} \textbf{ {{col}} } {% if not forloop.last %} & {% endif %} {% endfor %} \\
\hline
{% for row in table %}{% for cell in row %} {{cell}} {% if not forloop.last %} & {% endif %} {% endfor %} \\
{% endfor %}\hline
\end{tabular}
\caption{Simple Phonebook}
\label{tab:phonebook}
\end{table}
\end{document}
If you are familiar with LaTeX, you might have noticed the strange commands inside {% ... %} brackets: these are Django template commands (for a comprehensive list see the Django documentation).
The syntax is similar to (reduced) Python, so we have for loops which iterate over rows and columns, and conditional statements to make sure that we do not have too many column separators (LaTeX does not like that). The head and table variables contain the data which is filled into the template. In order to define them we need a little bit of Python code.
- Import the CSV file and render the template to obtain a final LaTeX file. This is where the actual conversion from CSV to LaTeX occurs. In Python it is extremely simple (I may be biased here, though):
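#!/usr/bin/env python
# coding=utf-8
import csv

import django
from django.conf import settings
from django.template import Context, Template

if __name__ == "__main__":
    # Django must be configured before templates can be used.
    # Recent Django versions also need a template backend to be declared.
    settings.configure(TEMPLATES=[
        {"BACKEND": "django.template.backends.django.DjangoTemplates"}])
    # django.setup() exists only in Django 1.7 and later
    if hasattr(django, "setup"):
        django.setup()

    # Open and read the CSV file: the first row is the header
    with open("names.csv") as fid:
        reader = csv.reader(fid)
        head = next(reader)
        table = list(reader)

    # Open and read the template
    with open("table_template.tex") as f:
        t = Template(f.read())

    # Define the context with the table data
    c = Context({"head": head, "table": table})

    # Render the template
    output = t.render(c)

    # Write the output to a file
    with open("table.tex", "w") as out_f:
        out_f.write(output)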
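To make the separator logic of the template more concrete, here is a rough sketch of the same table body built with plain Python string joins and no templating system. The helper names render_row and render_tabular are hypothetical, chosen only for this illustration, and the sketch covers the tabular environment only, not the surrounding document.

```python
def render_row(cells, bold=False):
    """Join cells with ' & ' and end the row with a LaTeX line break."""
    if bold:
        cells = [r"\textbf{%s}" % c for c in cells]
    # str.join puts the separator only between cells, which is what the
    # {% if not forloop.last %} check achieves in the template.
    return " & ".join(cells) + r" \\"


def render_tabular(head, table, colspec="lcr"):
    """Return a tabular environment as a string (hypothetical helper)."""
    lines = [r"\begin{tabular}{%s}" % colspec, r"\hline",
             render_row(head, bold=True), r"\hline"]
    lines += [render_row(row) for row in table]
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


head = ["First", "Last", "Phone"]
table = [["Bartosz", "Telenczuk", "12345"], ["Joseph", "Stuart", "345566"]]
print(render_tabular(head, table))
```

This removes the dependency on a templating module, but the layout is then hard-coded in Python, so changing the look of the table means editing code rather than a template.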
- Generate a PDF output from the generated LaTeX file. Once you have the final LaTeX file, you can use the LaTeX system to generate output in plenty of formats. For example, to obtain a PDF file, just call from the command line:
pdflatex table.tex
If everything went fine a PDF file should be created in your current directory. If not, something is probably wrong with your template or your LaTeX installation.
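One likely cause of a failed run is a cell that contains a character with special meaning in LaTeX, such as & or %. Below is a minimal sketch of an escaping helper that could be applied to every cell before rendering. The name latex_escape and the exact replacement table are my assumptions, not part of the script above.

```python
# Characters with special meaning in LaTeX and a common replacement for each.
LATEX_SPECIAL = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one character at a time.

    Hypothetical helper for illustration. Working per character avoids
    escaping the backslashes introduced by earlier replacements.
    """
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)


print(latex_escape("R&D_budget 50%"))
```

With the Django template system you would probably also have to switch off HTML autoescaping, for example by creating the context with Context(..., autoescape=False), because as far as I know Django otherwise turns & into &amp; in the rendered output.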
- All of the above steps can of course be automated with a simple Makefile. This is an optional step for those readers who are crazy about reproducibility (like me!).
all: table.pdf

table.pdf: table.tex
	pdflatex $<

table.tex: names.csv table_template.tex csv2latex.py
	python csv2latex.py
Now, whenever your data change, it is enough to call make to get a nicely formatted PDF table!
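For readers who have not used make before: roughly speaking, it rebuilds a target when the target is missing or older than one of the files it depends on. The following sketch imitates that check in Python. The function needs_rebuild is a hypothetical name, and the real make does more than this (for example, it follows chains of rules).

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any source file."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "names.csv")
    tex_path = os.path.join(tmp, "table.tex")

    with open(csv_path, "w") as f:
        f.write("First,Last,Phone\n")
    # Case 1: the target does not exist yet
    print(needs_rebuild(tex_path, [csv_path]))

    with open(tex_path, "w") as f:
        f.write("% generated table\n")
    # Set the modification times explicitly so the demo is deterministic
    os.utime(csv_path, (1000, 1000))
    os.utime(tex_path, (2000, 2000))
    # Case 2: the target is newer than its source
    print(needs_rebuild(tex_path, [csv_path]))

    os.utime(csv_path, (3000, 3000))
    # Case 3: the data changed after the target was built
    print(needs_rebuild(tex_path, [csv_path]))
```

In the Makefile above the same idea is applied twice: table.tex depends on the CSV file, the template and the script, and table.pdf depends on table.tex.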
All of the source files and generated output are available for download.
Edit 1: There is also a tool which converts CSV to LaTeX (csv2latex). I haven’t tested it, but it seems that it does not offer the flexibility the templating system gives you. As always, though, everything comes at some cost (in the case of the template system, cost = dependencies).
Edit 2: Some readers suggested other tools for importing data into LaTeX documents. Two popular LaTeX packages are pgfplotstable and datatool. Thanks to all for the comments!
Alex says
This is great, thanks a lot for posting it!
Simon says
I also like your approach. I’m thinking about a similar setup right now, because I have to generate many quite similar reports from a huge set of data. Thanks for the hint with the Makefile.
Tim Staley says
Great, just what I was looking for – thanks for posting!
Xiaolei says
Thanks a lot for your tips.
Xiaolei from Singapore
Gus says
Great tip, thanks. How would you go about using this to insert multiple tables in a single .tex file?
admin says
Hi Gus,
Thanks for the comment. You can easily add another table by copy-pasting the table environment. Then you have to replace the head and table objects with something similar, like tab2_head and table2. You will have to add the two variables to the Context in the Python file, for example like this:
c = Context({"head": head, "tab2_head": tab2_head, "table": reader, "table2": reader2})
Let me know if you have any questions.
fs says
pgfplotstable provides exactly what you want!
top says
Hi,
Thanks for this. It seems to work, but after trying it I discovered datatool — a LaTeX plugin that does all this and more. It seems to me to be a more powerful and generalised solution. Simpler too, because it is not necessary to pre-compile the LaTeX code.
Cheers
btel says
Thanks for the comment. I have not tried datatool, but it looks very promising. I guess that what you want to do is to import CSV data into a LaTeX document. My use case was slightly different: formatting a table and exporting it to PDF. However, you should use whatever fits your needs better.
rainbru says
Exactly what I was searching for. It saved me a lot of time with CSV files larger than 350 lines :)
Thanks for posting.
Mikhail says
It is easier to use R and (pgf)Sweave. If you use LyX you could use knitr. There is an R package named Hmisc that contains a function named latex() for exporting various R objects. And reading CSV is just a matter of read.csv().
admin says
Thanks for the comment. I do not use R myself, so I cannot comment on the packages you mention. For Python there is apparently a package similar to Sweave called Pweave, but I haven’t tried it either. In Python there are also easier ways to load CSV files (look at the pandas or numpy libraries) than what I show in the post.
isaac says
I want to import my Excel table into LaTeX using the method described above. I pray I’ll get it. Thanks.
qiongzhu says
Very useful example to make things reproducible
Ayayo says
Online tool: https://tableconvert.com/