11. Sharing research

Communicating research findings

In this recipe, I cover the tools and strategies for sharing research findings with the public and with peers. We begin by assuming that a Quarto website is the primary tool for sharing findings with both audiences. From there, we look in more detail at articles, presentations, and publishing research code and data.

Concepts and strategies

Public facing communication

R (or Python) research projects that take advantage of Quarto websites have access to a wide range of tools for sharing research. First, the entire research tool chain can be published as a website, which is a great way to share the research process. Second, the website can be used to share the research findings in particular formats, including articles and presentations. Let’s focus on these latter formats and discuss strategies for setting up and formatting research articles and presentations.

We will assume the project directory structure in Snippet 1.

Snippet 1: Quarto website structure for reproducible research
project/
  ├── data/
  │    ├── analysis/
  │    ├── derived/
  │    └── original/
  ├── process/
  │    ├── 1_acquire.qmd
  │    ├── 2_curate.qmd
  │    ├── 3_transform.qmd
  │    └── 4_analyze.qmd
  ├── reports/
  │    ├── figures/
  │    ├── slides/
  │    │    ├── workshop/
  │    │    │    └── index.qmd
  │    │    └── conference/
  │    ├── tables/
  │    ├── article.qmd
  │    ├── citation-style.csl
  │    ├── presentations.qmd
  │    └── bibliography.bib
  ├── renv/
  │    └── ...
  ├── _quarto.yml
  ├── index.qmd
  ├── README.md
  └── renv.lock

  • project/ (project root): the root directory for the project.
  • data/: the directory for storing data files.
  • process/: the directory for storing process files.
  • reports/: the directory for storing reports.
  • reports/figures/: the directory for storing figures.
  • reports/slides/: the directory for storing presentations.
  • reports/slides/workshop/: the directory for storing a “workshop” presentation.
  • reports/slides/workshop/index.qmd: the file for the “workshop” presentation.
  • reports/tables/: the directory for storing tables.
  • reports/article.qmd: the file for the article.
  • reports/citation-style.csl: the file for the citation style.
  • reports/presentations.qmd: the file for the presentations listing.
  • reports/bibliography.bib: the file for the references.
  • _quarto.yml: the file for the Quarto configuration.

I will also assume the following Quarto configuration file _quarto.yml in Snippet 2.

Snippet 2: Quarto configuration file for reproducible research
project:
  title: "Web project"
  type: website
  execute-dir: project
  render:
    - index.qmd
    - process/1_acquire.qmd
    - process/2_curate.qmd
    - process/3_transform.qmd
    - process/4_analyze.qmd
    - reports/

website:
  sidebar:
    style: "docked"
    contents:
      - index.qmd
      - section: "Process"
        contents: process/*
      - section: "Reports"
        contents: reports/*

format:
  html:
    theme: cosmo
    toc: true

execute:
  freeze: auto

  • execute-dir: the root directory for all execution.
  • render: the order in which files are rendered.
  • sidebar style: the style of the sidebar.
  • sidebar contents: the contents of the sidebar.
  • "Process" section contents: the contents of the process section.
  • "Reports" section contents: the contents of the reports section.
  • freeze: the option for rendering only changed files.

Looking more closely at the directory structure in Snippet 1, let’s focus on the aspects that are shared between articles and presentations. You will notice that the reports/ directory contains a figures/ directory for saving figures, a tables/ directory for saving tables, and a bibliography.bib file for saving references. These are shared resources that can be used in both articles and presentations. From the files in the process/ directory, you can save tables, figures, and other resources generated during the research process that you believe will be useful in communicating the research findings. Then, when you write your article and presentations, you can include the same files in either format. If changes are made to the figures or tables, they will be updated in both the article and the presentation(s).

Next, it is worth pointing out some important features of the configuration file in Snippet 2. The execute-dir option specifies the root directory for all execution. That is, paths to directories and files are the same no matter which file the code is executed from. The render option specifies the order in which the files are rendered. This is important for ensuring that the research process is executed in the correct order. The files that are executed and rendered appear in the website, and the style and contents options specify the style and contents of the sidebar, respectively. Another key option is the freeze option under execute. With freeze: auto, a file is only re-rendered when its source has changed. This avoids re-rendering unchanged files, which can be time-consuming and computationally expensive.
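
To make the idea of shared resources and project-relative paths concrete, here is a minimal sketch in R. It assumes code runs from the project root, as it does with execute-dir: project. A temporary directory stands in for the project, and the object and file names (summary_tbl, summary_tbl.rds) are hypothetical.

```r
# A temporary directory stands in for the project root (assumption for this
# sketch). In a real project, `execute-dir: project` makes the root the
# working directory for every .qmd file.
project_root <- file.path(tempdir(), "project")
dir.create(file.path(project_root, "reports", "tables"),
           recursive = TRUE, showWarnings = FALSE)
old_wd <- setwd(project_root)

# In a process file (e.g. process/4_analyze.qmd): save a table once.
summary_tbl <- data.frame(group = c("a", "b"), mean_score = c(2.5, 3.1)) # hypothetical data
saveRDS(summary_tbl, file.path("reports", "tables", "summary_tbl.rds"))

# In reports/article.qmd or a slide deck: read the same file by the same path.
shared_tbl <- readRDS(file.path("reports", "tables", "summary_tbl.rds"))

# Restore the original working directory.
setwd(old_wd)
```

Because every document refers to the saved table by the same project-relative path, re-running the process file updates what the article and the slides display.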

Articles

In the reports/ directory there is a file named article.qmd. This file, which can be named anything, is the document in which we will draft the research article. It is a standard Quarto document. However, we can take advantage of some options that we have not seen so far that add functionality to the document.

In Snippet 3, we see an example of the YAML frontmatter for a Quarto article.

Snippet 3: Quarto article YAML frontmatter
title: "Article"
date: 2024-02-20
author:
  - name: "Your name"
    email: youremail@school.edu
    affiliation: "Your affiliation"
abstract: |
  This is a sample article. It is a work in progress and will be updated as the research progresses.
keywords:
  - article
  - example
csl: citation-style.csl
bibliography: bibliography.bib
citation: true
format:
  html: default
  pdf:
    number-sections: true

  • author: the author information for the article.
  • abstract: the abstract for the article.
  • keywords: the keywords for the article.
  • csl: the citation style for the article.
  • bibliography: the bibliography for the article.
  • citation: the citation for the article itself.
  • format: the format for the article.
  • html: the HTML format for the article (for web presentation).
  • pdf: the PDF format for the article (for printing).

In addition to the typical YAML frontmatter, we see a number of new items. Looking at the first three, we see that we can add author information, an abstract, and keywords. These are standard for articles and are used to provide information about the article to readers.

When rendered, the article header information will now contain this new information, as seen in Figure 1.

Figure 1: Appearance of the header format for a Quarto article.

The next two items are the citation style and the bibliography. These are used to create and format citations in the article. The citation style is a CSL file that specifies how citations and references are formatted. You can find a database of citation styles at the Zotero Style Repository, where you can search by style name or by field. Once you find a style you like, you can download the CSL file and add it to your project. The bibliography is a BibTeX file that contains the references for the article. You can create this file (as mentioned before) in a reference manager like Zotero or Mendeley.

Now the citation option is not for references that we have gathered. Rather, it is for generating a citation for the current article. This is useful if someone else would like to cite your article. When the article is rendered, the citation will appear at the bottom of the article, as seen in Figure 2.

Figure 2: Appearance of the citation attribution in a Quarto article.

There are two other features to mention. One is the format option. Since the article is a Quarto document, it can be rendered in multiple formats. The html option ensures that our article is rendered in HTML format as part of the website. However, in addition, we can add a pdf option that will render the article in PDF format. Note that in Figure 1, the pdf option has created an “Other formats” listing on the right side below the table of contents. Clicking this will open the PDF version of the article.

Although not employed in this example, it is also possible to make more extensive format changes with Quarto extensions. Currently, there are various extensions for different journals and publishing houses. For more information and examples, consult the Quarto documentation on extensions.

Presentations

In the reports/ directory we can also include presentations and associated slide decks. A popular web-based presentation framework is reveal.js, and Quarto uses it to create presentations. In Snippet 1, the slides/ directory contains a directory for each presentation with an index.qmd file inside. The index.qmd file contains the presentation content, which we will see soon. To provide a listing page for the presentations, the presentations.qmd file contains special YAML instructions that make it a listing page.

Let’s first dive into the index.qmd file for a presentation and discuss some of the key features. In Snippet 4, we see a basic example of a Quarto presentation.

Snippet 4: Quarto presentation YAML frontmatter index.qmd
title: "Example presentation"
date: 2024-02-20
author: "Jerid Francom"
format: revealjs

The YAML frontmatter for a Quarto presentation is similar to that of most Quarto documents. The title, date, and author are all included. The format option specifies that the presentation will be rendered in reveal.js format. When rendered, the slide deck is interactive and can be navigated by the user. It is also responsive and can be viewed on any device.

In Figure 3, we see an example of a Quarto presentation rendered in reveal.js format. I discuss some of the key features of the presentation in the presentation itself.

Figure 3

Peer facing communication

For those who are interested in interacting with your work, it is key to prepare materials that can be reliably shared. This includes the research code and data, of course, but also the computational environment in which the research was conducted so that the research can be reproduced.

Figure 4: Computational environment for reproducible research

In the next section, we will discuss some of the strategies for sharing a reproducible computational environment, along with code and data. This will include version control with Git, pinned package versions with {renv}, containerization with Docker, and automation with GitHub Actions.

Tools and strategies

As seen in Figure 4, the computational environment for a research project includes various components. Let’s start with the inner components and work our way out.

As has been stressed throughout this text, version control is a key part of reproducible research. It allows you to track changes to your research compendium, that is, your research code and data, over time. This is important for ensuring that the research record is transparent. One of the most popular version control systems is Git. If you have interacted with the supplementary materials provided with this text (lessons, recipes, and labs), you are now familiar with working with Git (and GitHub) in day-to-day tasks. These tasks include:

  • Cloning a repository
  • Making changes to a repository
  • Committing changes to a repository
  • Pushing changes to a repository
  • Pulling changes from a repository
  • Forking a repository

These tasks are essential for collaborating with others on research projects and contributing to open source projects. However, if the goal is to produce research that is reproducible, it is important to be aware of some additional tools and strategies. In Git, these include:

  • Branching to maintain different versions of the research code
  • Merging branches to combine different versions
  • Pull requests to review and merge changes, either from yourself or other contributors

We will not cover these in detail here, but they are important to be aware of. For more information, see the GitHub documentation.

Moving out a level, we have the software layer. This includes the R packages that you use in your project, the version of R that you use, and the system dependencies that are required for your project. Sharing your research project, code and all, does not guarantee that others will be able to reproduce your research. R package versions are constantly being updated, and this can lead to ‘breaking changes’ that render your code inoperable. R itself also changes over time, although less frequently than packages. …

To ensure that your research is reproducible, it is important to share the software layer as well. This can be done with the {renv} package. {renv} allows you to create a snapshot of the R packages that you use in your research project. This snapshot can then be shared with others, ensuring that they have the same versions of the R packages that you used.

Pinned package versions

As you develop your research code, you will likely use a number of R packages. Because these packages are constantly being updated, code that runs today may fail, or give different results, with later versions. Pinning the versions of the packages you use guards against this, and the {renv} snapshot introduced above is the tool for doing so.

While you are still adding packages, it is usually not necessary to keep track of the version of each one. When you are ready to share your research code, however, it is important. You can check the version of a single package with the packageVersion() function, and record the versions of all the packages used in the project with renv::snapshot().

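# Check the installed version of a single package
packageVersion("knitr")
#> [1] '1.47'
# Record the versions of all project packages in renv.lock
renv::snapshot()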
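
Building on packageVersion(), the following sketch reports the installed versions of several packages at once, which is roughly the kind of information a snapshot pins. The helper name installed_versions() is hypothetical, and base packages are used here only so that the example runs on any R installation.

```r
# Hypothetical helper: report the installed version of each package in `pkgs`.
installed_versions <- function(pkgs) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(package = pkgs, version = unname(versions))
}

# Base packages are used so the example runs without extra installs;
# in a project you would list the packages your code loads.
pkg_versions <- installed_versions(c("stats", "utils"))
print(pkg_versions)
```

For the project as a whole, renv::snapshot() writes this information to renv.lock, and a collaborator who has cloned the project can reinstall the recorded versions with renv::restore().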

Containerization

Containerization is another key part of reproducible research. It allows you to create a self-contained environment for your research code and data. One popular containerization tool is Docker. Docker allows you to create a container that includes all of the dependencies for your research project. This container can then be shared with others, ensuring that they have everything they need to reproduce your research.

# Dockerfile
FROM rocker/r-ver:4.1.1
# ...

Dive deeper

When creating a reproducible computing environment, it will be necessary to include system dependencies to ensure that the project is reproducible. The pak package provides a way to determine the dependencies for a package or set of packages on a given operating system (platform).

pak::pkg_sysreqs(pkg = c("knitr", "systemfonts"), sysreqs_platform = "ubuntu")
── Install scripts ──── Ubuntu NA
apt-get -y update
apt-get -y install pandoc libfontconfig1-dev libfreetype6-dev

── Packages and their system dependencies
knitr       – pandoc
systemfonts – libfontconfig1-dev, libfreetype6-dev

These install commands can be added to the Dockerfile for the project as RUN instructions.
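
One way to do this, sketched here in base R, is to join the install commands into a single RUN instruction that can be pasted into the Dockerfile. The commands in the vector are copied by hand from the output above, and the variable names are hypothetical.

```r
# Install commands copied from the pak output above.
install_scripts <- c(
  "apt-get -y update",
  "apt-get -y install pandoc libfontconfig1-dev libfreetype6-dev"
)

# Chain the commands with && so they run as one Dockerfile RUN instruction.
run_line <- paste("RUN", paste(install_scripts, collapse = " && "))
writeLines(run_line)
#> RUN apt-get -y update && apt-get -y install pandoc libfontconfig1-dev libfreetype6-dev
```

Chaining the commands in one instruction keeps the package index update and the install in the same image layer.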

In GitHub, additional tools and strategies for sharing reproducible research include:

  • Using issues to track and respond to inquiries
  • GitHub Actions for automating tasks
  • GitHub Pages for hosting websites

We will have more to say about these tools and strategies later in this recipe.

Check your understanding

  1. Using Quarto websites for sharing research findings is the only way to meet the requirements of a reproducible research project.
  2. Using Snippet 2, what option specifies the order in which files are rendered?
  3. What is the name of the framework that Quarto uses to create presentations?
  4. TBD
  5. TBD
  6. TBD

Lab preparation

TBD