This is a rather good read for anyone interested in reproducible research (i.e., anyone pursuing research as a career).
TLDR:
Why reproducible research
- Reproducible research helps researchers remember how and why they performed specific analyses during the course of a project.
- Reproducible research enables researchers to quickly and simply modify analyses and figures.
- Reproducible research enables quick reconfiguration of previously conducted research.
- Conducting reproducible research is a strong indicator to fellow researchers of rigor, trustworthiness, and transparency in scientific research.
- Reproducible research increases paper citation rates (Piwowar et al. 2007, McKiernan et al. 2016) and allows other researchers to cite code and data in addition to publications.
A three-step framework for conducting reproducible research
Before data analysis: data storage and organization
- Data should be backed up at every stage of the research process and stored in multiple locations.
- Digital data files should be stored in useful, flexible, portable, nonproprietary formats.
- It is often useful to transform data into a “tidy” format (Wickham 2014) when cleaning up and standardizing raw data.
- Metadata explaining what was done to clean up the data and what each of the variables means should be stored along with the data.
- Finally, researchers should organize files in a sensible, user-friendly structure and make sure that all files have informative names.
- Throughout the research process, from data acquisition to publication, version control can be used to record a project’s history and provide a log of changes that have occurred over the life of a project or research group.
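To make the "tidy" idea concrete, here is a minimal sketch using only the Python standard library. It assumes a made-up "wide" table with one column per sampling year; the data, the column names, and the `wide_to_tidy` helper are all hypothetical and only illustrate the reshaping, not a method from the paper.

```python
import csv
import io

# Hypothetical raw data in "wide" layout: one column per sampling year.
WIDE_CSV = """site,2019,2020
A,12,15
B,7,9
"""


def wide_to_tidy(wide_text, id_column, variable_name, value_name):
    """Reshape wide CSV text so that each row holds exactly one observation."""
    reader = csv.DictReader(io.StringIO(wide_text))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy_rows


tidy = wide_to_tidy(WIDE_CSV, "site", "year", "count")
for record in tidy:
    print(record)
```

Each row of the result describes one site in one year, which is usually easier to filter, group, and plot than the wide layout. Values stay as strings here; converting types would be a separate, documented cleaning step.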
During analysis: best coding practices
- When possible, all data wrangling and analysis should be performed using coding scripts—as opposed to using interactive or point-and-click tools—so that every step is documented and repeatable by yourself and others.
- Analytical code should be thoroughly annotated with comments.
- Following a clean, consistent coding style makes code easier to read.
- There are several ways to prevent coding mistakes and make code easier to use.
- First, researchers should automate repetitive tasks.
- Similarly, researchers can use loops to make code more efficient by performing the same task on multiple values or objects in series.
- A third way to reduce mistakes is to reduce the number of hard-coded values that must be changed to replicate analyses on an updated or new data set.
- To ensure that analyses can still be run in the future, researchers can create a software container, such as a Docker (Merkel 2014) or Singularity (Kurtzer et al. 2017) image (Table 1).
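The three mistake-prevention tips above (automate repetitive tasks, use loops, avoid hard-coded values) could look roughly like this in Python. This is only a sketch: the `CONFIG` dictionary, the group names, the measurements, and the `summarize` function are hypothetical.

```python
from statistics import mean

# Parameters live in one place instead of being hard-coded throughout the script.
# All names and values here are hypothetical.
CONFIG = {
    "min_valid": 0.0,  # measurements below this are treated as invalid
    "groups": {
        "control": [4.1, 3.9, -1.0, 4.4],
        "treatment": [5.2, 5.0, 5.5, -1.0],
    },
}


def summarize(values, min_valid):
    """Drop values below min_valid, then return the count and mean of the rest."""
    kept = [v for v in values if v >= min_valid]
    return {"n": len(kept), "mean": round(mean(kept), 2)}


# The repetitive step is a function, and a loop applies it to every group.
summaries = {}
for name, values in CONFIG["groups"].items():
    summaries[name] = summarize(values, CONFIG["min_valid"])

print(summaries)
```

Rerunning the analysis on an updated data set then means editing `CONFIG` only, not hunting for values scattered through the script.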
After data analysis: finalizing results and sharing
- It is better to produce tables and figures directly from code than to manipulate them using Adobe Illustrator, Microsoft PowerPoint, or other image editing programs. (Comment: for example, the csvsimple package can be used in LaTeX.)
- Make data wrangling, analysis, and creation of figures, tables, and manuscripts a “one-button” process using GNU Make (https://www.gnu.org/software/make/).
- To increase access to publications, authors can post preprints of final (but preacceptance) versions of manuscripts on a preprint server, or postprints of manuscripts on postprint servers.
- Data can be archived in online general-purpose repositories such as Dryad, Zenodo, and Figshare.
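As one way to produce a table directly from code, the sketch below turns a list of result rows into a LaTeX `tabular` that a manuscript could pull in. The `to_latex_table` helper and the example numbers are hypothetical, and the function assumes the cells contain no LaTeX special characters (it does no escaping).

```python
def to_latex_table(header, rows):
    """Build a simple LaTeX tabular from a header and rows (no escaping is done)."""
    lines = ["\\begin{tabular}{" + "l" * len(header) + "}"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


# Hypothetical results; in a real project these would come from the analysis script.
header = ["group", "n", "mean"]
rows = [["control", 3, 4.13], ["treatment", 3, 5.23]]

table = to_latex_table(header, rows)
print(table)
```

Writing this string to a `.tex` file from the analysis script means the table in the manuscript is regenerated whenever the results change, with no manual copying.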