A beginner’s guide to conducting reproducible research

This is a good read for anyone interested in reproducible research (i.e., anyone pursuing a career as a researcher).

TLDR:

Why reproducible research

  1. Reproducible research helps researchers remember how and why they performed specific analyses during the course of a project.
  2. Reproducible research enables researchers to quickly and simply modify analyses and figures.
  3. Reproducible research enables quick reconfiguration of previously conducted research.
  4. Conducting reproducible research is a strong indicator to fellow researchers of rigor, trustworthiness, and transparency in scientific research.
  5. Reproducible research increases paper citation rates (Piwowar et al. 2007, McKiernan et al. 2016) and allows other researchers to cite code and data in addition to publications.

A three-step framework for conducting reproducible research

Before data analysis: data storage and organization

  1. Data should be backed up at every stage of the research process and stored in multiple locations.
  2. Digital data files should be stored in useful, flexible, portable, nonproprietary formats.
  3. It is often useful to transform data into a “tidy” format (Wickham 2014) when cleaning up and standardizing raw data.
  4. Metadata explaining what was done to clean up the data and what each of the variables means should be stored along with the data.
  5. Finally, researchers should organize files in a sensible, user-friendly structure and make sure that all files have informative names.
  6. Throughout the research process, from data acquisition to publication, version control can be used to record a project’s history and provide a log of changes that have occurred over the life of a project or research group.
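
To make the "tidy" format in item 3 concrete, here is a minimal sketch using only the Python standard library. The data, the column names, and the `wide_to_tidy` helper are all made up for illustration and are not from the paper; the idea is simply that each row of the output holds one observation.

```python
import csv
import io

# Hypothetical "wide" raw data: one row per site, one column per year.
WIDE_CSV = """site,2019,2020,2021
A,12,15,11
B,7,9,10
"""


def wide_to_tidy(wide_text, id_column, variable_name, value_name):
    """Reshape wide CSV text into tidy rows: one observation per row."""
    reader = csv.DictReader(io.StringIO(wide_text))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


tidy = wide_to_tidy(WIDE_CSV, "site", "year", "count")
for record in tidy:
    print(record)
```

Values stay as strings here, exactly as read from the file; converting types would be a separate, documented cleaning step.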

During analysis: best coding practices

  1. When possible, all data wrangling and analysis should be performed using coding scripts—as opposed to using interactive or point-and-click tools—so that every step is documented and repeatable by yourself and others.
  2. Analytical code should be thoroughly annotated with comments.
  3. Following a clean, consistent coding style makes code easier to read.
  4. There are several ways to prevent coding mistakes and make code easier to use.
    1. First, researchers should automate repetitive tasks.
    2. Similarly, researchers can use loops to make code more efficient by performing the same task on multiple values or objects in series.
    3. A third way to reduce mistakes is to reduce the number of hard-coded values that must be changed to replicate analyses on an updated or new data set.
  5. Create a software container, such as a Docker (Merkel 2014) or Singularity (Kurtzer et al. 2017) image (Table 1), to ensure that analyses can be rerun in the future.
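
The points in item 4 about loops and hard-coded values can be sketched in a few lines of Python. Everything named here (`CONFIG`, `DATASETS`, `summarize`, and the numbers themselves) is a hypothetical example, not something from the paper: the settings live in one place, and one loop handles every data set.

```python
from statistics import mean

# Hypothetical settings collected in one place instead of being scattered
# through the script as hard-coded values.
CONFIG = {
    "min_valid": 0.0,   # measurements below this are treated as errors
    "round_digits": 2,  # precision of reported results
}

# Hypothetical data sets; in a real project these would be read from files.
DATASETS = {
    "site_a": [1.2, 3.4, -1.0, 2.2],
    "site_b": [0.5, 0.7, 0.9],
}


def summarize(values, min_valid, round_digits):
    """Drop invalid measurements and return the rounded mean of the rest."""
    valid = [v for v in values if v >= min_valid]
    return round(mean(valid), round_digits)


# One loop replaces a copy-pasted block per data set.
summaries = {
    name: summarize(values, CONFIG["min_valid"], CONFIG["round_digits"])
    for name, values in DATASETS.items()
}
print(summaries)
```

Rerunning the analysis on a new data set then means adding one entry to `DATASETS`, and changing the validity threshold means editing one line in `CONFIG`.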

After data analysis: finalizing results and sharing

  1. It is better to produce tables and figures directly from code than to manipulate them using Adobe Illustrator, Microsoft PowerPoint, or other image editing programs. (Comment: for example, the csvsimple package in LaTeX can typeset tables from CSV files.)
  2. Make data wrangling, analysis, and creation of figures, tables, and manuscripts a “one-button” process using GNU Make (https://www.gnu.org/software/make/).
  3. To increase access to publications, authors can post preprints of final (but preacceptance) versions of manuscripts on a preprint server, or postprints of manuscripts on postprint servers.
  4. Archive data in online general-purpose repositories such as Dryad, Zenodo, and Figshare.
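
As a rough illustration of item 1, the sketch below writes a results table straight from code instead of editing it by hand. The `RESULTS` values and the `results_to_csv` helper are invented for this example. The output is plain CSV text, which could then be saved to a file and picked up by a tool such as csvsimple.

```python
import csv
import io

# Hypothetical analysis results; in practice these come from earlier steps.
RESULTS = [
    {"model": "baseline", "n": 120, "accuracy": 0.81},
    {"model": "extended", "n": 120, "accuracy": 0.8667},
]


def results_to_csv(rows, columns, float_digits=3):
    """Return the results as CSV text with consistently formatted numbers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        formatted = [
            f"{row[c]:.{float_digits}f}" if isinstance(row[c], float) else row[c]
            for c in columns
        ]
        writer.writerow(formatted)
    return buffer.getvalue()


table_text = results_to_csv(RESULTS, ["model", "n", "accuracy"])
print(table_text)
```

Because the table is regenerated every time the script runs, a change in the data or the analysis shows up in the manuscript without any manual copying.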


Copyright OU-Tulsa Lab of Image and Information Processing 2024