A template for data analysis projects structured as R packages
Rmarkdown documents are great for keeping scientific workflows reproducible, as they tightly integrate code, results and text. Yet, once a project involves more complicated data analysis and custom code and functions, structuring it as an R package can bring many advantages (see e.g. Marwick et al., although counterpoints have also been raised).
Hence this package works as a template for new research or data analysis projects, with the idea of having everything (data, R scripts, functions, and manuscript reporting results) self-contained in the same package (a “research compendium”) to facilitate collaboration and promote reproducibility.
A short presentation introducing this approach on ‘Structuring data analysis projects as R packages’ is available here: https://doi.org/10.6084/m9.figshare.12479984.v1
# install.packages("remotes")
remotes::install_github("Pakillo/template")
First, load the package:
library(template)
Now run the function new_project to create a directory with all the scaffolding (slightly modified from the standard R package structure). For example, to start a new project about tree growth, just use:
new_project("treegrowth")
If you want to create a GitHub repository for the project at the same time, use instead:
new_project("treegrowth", github = TRUE, private.repo = FALSE)
You can choose either a public or a private repository. Note that to create a GitHub repo you will need to have configured your system as explained in https://usethis.r-lib.org/articles/articles/usethis-setup.html.
There are other options you could choose, like setting up testthat or continuous integration (Travis-CI, GitHub Actions…). See ?new_project for all options.
Fill in the DESCRIPTION file with some basic information about your project: title, brief description, licence, package dependencies, etc.
Place original (raw) data in the data-raw folder. Save all R scripts (or Rmarkdown documents) used for data preparation in the same folder.
Save final (clean, tidy) datasets in the data folder. You may write documentation for these data.
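As a rough illustration of the raw-to-clean data flow described above, a data preparation script might look like the sketch below. The project path, file names and values are hypothetical and invented for the example; a temporary directory stands in for the project root so the code runs as is.

```r
# Hypothetical project root; a temporary directory keeps the sketch self-contained.
project_dir <- file.path(tempdir(), "treegrowth")
dir.create(file.path(project_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# Stand-in for an original data file (invented values, for illustration only).
raw_file <- file.path(project_dir, "data-raw", "trees_raw.csv")
writeLines(c("tree_id,height_cm",
             "1,150",
             "2,NA",
             "3,210"),
           raw_file)

# Data preparation: read the raw file, drop incomplete records, convert units.
trees_raw <- read.csv(raw_file)
trees <- trees_raw[!is.na(trees_raw$height_cm), ]
trees$height_m <- trees$height_cm / 100

# Save the clean dataset in the data folder.
save(trees, file = file.path(project_dir, "data", "trees.rda"))
```

In a real package, usethis::use_data(trees) is a common alternative to the final save() call, as it writes the object to the data folder in the format R packages expect.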
R scripts or Rmarkdown documents used for data analyses may be placed in the analyses folder. The final manuscript/report may be placed in the manuscript folder. You could use one of the many Rmarkdown templates available out there (e.g. rticles, rrtools or rmdTemplates).
If you write custom functions, place them in the R folder. Document all your functions with Roxygen. Write tests for your functions and place them in the
If your analysis uses functions from other CRAN packages, include these as dependencies (Imports) in the DESCRIPTION file (e.g. using rrtools::add_dependencies_to_description()). Also, use @importFrom in the Roxygen documentation of your functions to import them into the package namespace, or alternatively call them explicitly as package::function().
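To make the two ways of using a dependency more concrete, here is a minimal sketch of a Roxygen-documented function. The function coef_var is a hypothetical helper, not part of this package, and stats (which ships with R) stands in for a CRAN dependency so that the example runs without installing anything.

```r
#' Coefficient of variation
#'
#' Hypothetical helper for illustration; not part of the template package.
#'
#' @param x Numeric vector.
#' @param na.rm Logical. Remove missing values before computing?
#'
#' @return A single numeric value: standard deviation divided by the mean.
#'
#' @importFrom stats sd
#' @export
#'
#' @examples
#' coef_var(c(2, 4, 6))
coef_var <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Same computation without @importFrom: the call is qualified explicitly.
coef_var2 <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}
```

Either way, the package providing the function still has to be listed under Imports in DESCRIPTION; the explicit package::function() form has the side benefit of making the origin of each function visible where it is used.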
Consider using targets to manage your project workflow. A simpler alternative might be writing a makefile or master script to organise and execute all parts of the analysis. A template makefile is included with this package (use makefile = TRUE when calling new_project).
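For the master script option, one possible shape is sketched below. The script names and their contents are hypothetical stand-ins, written to a temporary directory so the example is self-contained; in a real project they would be the existing scripts in the project folders.

```r
# Hypothetical analysis steps, listed in the order they must run.
# Stand-in scripts are written to a temporary directory so the sketch runs as is.
project_dir <- file.path(tempdir(), "treegrowth_workflow")
dir.create(project_dir, showWarnings = FALSE)

steps <- c("01_prepare_data.R", "02_fit_model.R")
writeLines("growth <- data.frame(year = 1:5, height = c(1.0, 1.4, 1.9, 2.3, 2.8))",
           file.path(project_dir, steps[1]))
writeLines("fit <- lm(height ~ year, data = growth)",
           file.path(project_dir, steps[2]))

# Run every step in a shared environment so later steps see earlier results.
workflow_env <- new.env()
for (step in steps) {
  message("Running ", step)
  source(file.path(project_dir, step), local = workflow_env)
}
```

Unlike targets or a makefile, a script like this reruns every step each time, which is acceptable for quick analyses but wasteful once some steps become slow.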
Render Rmarkdown reports using rmarkdown::render(), and use the RStudio Build menu to create or update documentation, run tests, build the package, etc.
Record the exact dependencies of your project. One option is simply running sessionInfo(), but many more sophisticated alternatives exist. For example, renv::snapshot() will create a file recording the exact versions of all packages used, which can be used to recreate that environment in the future or on another computer. If you want to use Docker, you could use e.g.
Archive your repository (e.g. in Zenodo), get a DOI, and include citation information in your README.
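As a minimal sketch of the simplest way to record dependencies mentioned above, the output of sessionInfo() can be written to a text file kept with the project. The file name is a hypothetical choice, and a temporary directory is used here only so the example runs anywhere.

```r
# Write the R version and loaded package versions to a plain-text file.
# The file name is a hypothetical choice; a temporary directory is used here.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# With renv installed and the project initialised, a lockfile could be written instead:
# renv::snapshot()
```

A plain-text record like this documents the environment but cannot restore it, which is the main reason to prefer a lockfile-based tool for longer-lived projects.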