packageVersion("knitr")
[1] '1.47'
Communicating research findings
In this recipe, I cover the tools and strategies for sharing research findings with the public and peers. We will begin by assuming that Quarto websites are the primary tool for sharing research findings in both forums. From there, we will turn to some of the details of articles, presentations, and publishing research code and data.
R (or Python) research projects that take advantage of Quarto websites have access to a wide range of tools for sharing research. First, the entire research toolchain can be published as a website, which is a great way to share the research process. Second, the website can be used to share the research findings in particular formats, including articles and presentations. Let’s focus on these latter formats and discuss strategies for setting up and formatting research articles and presentations.
We will assume the project directory structure in Snippet 1.
I will also assume the following Quarto configuration file _quarto.yml in Snippet 2.
Looking more closely at the directory structure in Snippet 1, let’s focus on the aspects that are shared between articles and presentations. You will notice that the reports/ directory contains a figures/ directory for saving figures, a tables/ directory for saving tables, and a references.bib file for saving references. These are shared resources that can be used in both articles and presentations. In the process directory, you can save tables, figures, and other resources that are generated during the research process and that you believe will be useful in communicating the research findings. Then, when you create your articles and presentations, you can include the same files in either format. If changes are made to the figures or tables, they will be updated in both the article and the presentation(s).
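To make the idea of shared resources concrete, here is a minimal sketch in base R of a processing step that writes one table and one figure to the shared directories. The file names, the use of the built-in mtcars dataset, and the use of a temporary directory as a stand-in for the project root are all assumptions made for illustration; they are not part of the project in Snippet 1.

```r
# Stand-in for the project root (the directory that holds _quarto.yml).
# A temporary directory is used here so the sketch can run anywhere.
project_root <- tempdir()

# Shared output locations, mirroring reports/figures/ and reports/tables/
fig_dir <- file.path(project_root, "reports", "figures")
tab_dir <- file.path(project_root, "reports", "tables")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(tab_dir, recursive = TRUE, showWarnings = FALSE)

# Build a small summary table from a dataset that ships with R
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Save the table once; the article and the slides can both read this file
table_path <- file.path(tab_dir, "mpg-by-cyl.csv")  # hypothetical file name
write.csv(mpg_by_cyl, table_path, row.names = FALSE)

# Save the figure once; both formats can include the same image file
figure_path <- file.path(fig_dir, "mpg-by-cyl.png")  # hypothetical file name
png(figure_path, width = 800, height = 600)
barplot(
  mpg_by_cyl$mpg,
  names.arg = mpg_by_cyl$cyl,
  xlab = "Cylinders",
  ylab = "Mean miles per gallon"
)
dev.off()
```

Because both documents point at the same files, re-running this step after a change in the analysis updates the article and the slides on the next render.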
Next, it is worth pointing out some important features that appear in the Snippet 2 configuration file. The execute-dir option specifies the root directory for all execution. That is, paths to directories and files will be the same no matter which file the code is executed from. The render option specifies the order in which the files are rendered. This is important for ensuring that the research process is executed in the correct order. The files that are executed and rendered for display appear in the website, and the style and contents options specify the style and contents of the sidebar, respectively. Another key option is the freeze option under execute. This option specifies that only changed files will be rendered. This helps avoid re-rendering files that have not changed, which can be time-consuming and computationally expensive.
In the reports/ directory, a file named article.qmd appears. This file, which can be named anything, will be the document in which we will draft the research article. This file is a standard Quarto document. However, we can take advantage of some options that we have not seen so far that add functionality to the document.
In Snippet 3, we see an example of the YAML frontmatter for a Quarto article.
In addition to typical YAML frontmatter, we see a number of new items. Looking at the first three, we see that we can add author information, an abstract, and keywords. These are standard for articles and are used to provide information about the article to readers.
When rendered, the article header information will now contain this new information, as seen in Figure 1.
The next two items are the citation style and bibliography. These are used to create and format citations in the article. The citation style is a CSL file that specifies how citations and references are formatted. You can find a database of various citation styles at the Zotero Style Repository, where you can search for a style by name or by field. Once you find a style you like, you can download the CSL file and add it to your project. The bibliography is a BibTeX file that contains the references for the article. You can create this file (as mentioned before) in a reference manager like Zotero or Mendeley.
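Reference managers cover the literature you cite, but the software you used is also worth citing. As a small sketch, base R can generate a BibTeX entry for R itself that can be added to the bibliography file. The citation key R-base is a hypothetical name chosen for illustration, and the entry is written to a temporary file here rather than to the project's references.bib.

```r
# Retrieve the citation for R itself as a bibentry object
r_cite <- citation()

# Assign a citation key so the entry can be cited in the text.
# "R-base" is a hypothetical key; use any key that suits your project.
r_cite$key <- "R-base"

# Convert the entry to BibTeX lines
bib_lines <- toBibtex(r_cite)

# Write the entry to a temporary .bib file (a stand-in for references.bib)
bib_file <- tempfile(fileext = ".bib")
writeLines(bib_lines, bib_file)

# Inspect the result
print(bib_lines)
```

The same approach works for a package by passing its name to citation(), for example citation("knitr"), provided the package is installed.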
Now, the citation option is not for references that we have gathered. Rather, it is for generating a citation for the current article. This is useful if someone else would like to cite your article. When the article is rendered, the citation will appear at the bottom of the article, as seen in Figure 2.
There are two other features to mention. One is the format option. Since the article is a Quarto document, it can be rendered in multiple formats. The html option ensures that our article is rendered in HTML format as part of the website. However, in addition, we can add a pdf option that will render the article in PDF format. Note that in Figure 1, the pdf option has created an “Other formats” listing on the right side below the table of contents. Clicking this will open the PDF version of the article.
Although not employed in this example, it is also possible to use more extensive format changes with Quarto extensions. Currently, there are various extensions for different journals and publishing houses. For more information and examples, consult the documentation above.
In the reports/ directory, we can also include presentations and associated slide decks. A popular web-based presentation framework is reveal.js, and Quarto uses this framework to create presentations. In Snippet 1, the slides/ directory contains a directory for each presentation, each with an index.qmd file inside. The index.qmd file contains the presentation content, which we will see soon. To provide a listing page for the presentations, the presentations.qmd file contains special YAML instructions that make it a listing page.
Let’s first dive into the index.qmd file for a presentation and discuss some of the key features. In Snippet 4, we see a basic example of a Quarto presentation.
The YAML frontmatter for a Quarto presentation is similar to that of most Quarto documents. The title, date, and author are all included. The format option specifies that the presentation will be rendered in reveal.js format. When rendered, the slide deck will be interactive and can be navigated by the user. The slide deck will also be responsive and can be viewed on any device.
In Figure 3, we see an example of a Quarto presentation rendered in reveal.js format. I will discuss some of the key features of the presentation in the presentation itself.
For those who are interested in interacting with your work, it is key to prepare materials that can be reliably shared. This includes the research code and data, of course, but also the computational environment in which the research was conducted so that the research can be reproduced.
In the next section, we will discuss some of the strategies for sharing a reproducible computational environment, along with code and data. This will include version control with Git, pinned package versions with {renv}, containerization with Docker, and automation with GitHub Actions.
As seen in Figure 4, the computational environment for a research project includes various components. Let’s start with the inner components and work our way out.
As has been stressed throughout this text, version control is a key part of reproducible research. It allows you to track changes over time to the research compendium, that is, your research code and data. This is important for ensuring that the research record is transparent. One of the most popular version control systems is Git. If you have interacted with the supplementary materials provided with this text (lessons, recipes, and labs), you are now familiar with working with Git (and GitHub) in day-to-day tasks. These tasks include:
These tasks are essential for collaborating with others on research projects and contributing to open source projects. However, if the goal is to produce research that is reproducible, it is important to be aware of some additional tools and strategies. In Git, these include:
We will not cover these in detail here, but they are important to be aware of. For more information, see the GitHub documentation.
Moving out a level, we have the software layer. This includes the R packages that you use in your project, the version of R that you use, and the system dependencies that are required for your project. Sharing your research project, code and all, does not guarantee that others will be able to reproduce your research: R package versions are constantly being updated, and this can lead to ‘breaking changes’ that render your code inoperable. R itself also changes over time, although less frequently than packages. …
To ensure that your research is reproducible, it is important to share the software layer as well. As you develop your research code, you will likely use a number of R packages, so it is important to pin the versions of the packages that you use. This can be done with the {renv} package. {renv} allows you to create a snapshot of the R packages that you use in your research project. This snapshot can then be shared with others, ensuring that they have the same versions of the R packages that you used.
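As a rough sketch of what this looks like in practice, the calls below record and check package versions with {renv}. This assumes that {renv} is installed, that the project was set up once with renv::init(), and that the code is run interactively from the project root. The guard variable renv_ready is a hypothetical name added only so that the sketch does nothing when those assumptions do not hold.

```r
# Only act when {renv} is installed and someone is at the console, so that
# sourcing this file non-interactively never modifies a project.
renv_ready <- requireNamespace("renv", quietly = TRUE) && interactive()

if (renv_ready) {
  # Record the packages the project uses, with exact versions, in renv.lock
  renv::snapshot()

  # Report any differences between the lockfile, the library, and the code
  renv::status()
} else {
  message("Skipping: install {renv} and run this interactively from the project root.")
}

# A collaborator who has obtained the project (including renv.lock)
# would then reinstall the recorded versions with:
# renv::restore()
```

The renv.lock file is plain text, so it can be committed to version control alongside the research code.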
As you develop your research code, you will be adding packages along the way. In most cases, it is not necessary at this step to identify the version of the package(s) that you are using. However, when you are ready to share your research code, it becomes important. For example, you can see the version of a package using the packageVersion() function.
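When a project uses several packages, it can be handy to collect their versions in one place. The following is a small sketch that does this with packageVersion(). Base packages are used here only so that the example runs on any R installation; in a real project, the vector would list the packages your code actually loads.

```r
# Packages to report on. Base packages are used so the sketch runs anywhere;
# replace them with the packages your project actually loads.
pkgs <- c("stats", "utils", "tools")

# packageVersion() returns a 'package_version' object; convert each one
# to a character string so the versions can be stored in a table
versions <- vapply(
  pkgs,
  function(pkg) as.character(packageVersion(pkg)),
  character(1)
)

# Combine package names and versions into a data frame
version_table <- data.frame(
  package = pkgs,
  version = versions,
  row.names = NULL
)

# The version of R itself, followed by the package versions
print(R.version.string)
print(version_table)
```

A table like this is a quick manual record; a lockfile created by {renv} serves the same purpose in a form that can also be used to reinstall the packages.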
Containerization is another key part of reproducible research. It allows you to create a self-contained environment for your research code and data. One popular containerization tool is Docker. Docker allows you to create a container that contains all of the dependencies for your research project. This container can then be shared with others, ensuring that they have everything they need to reproduce your research.
Dive deeper
When creating a reproducible computing environment, it will be necessary to include system dependencies to ensure that the project is reproducible. The pak package provides a way to determine the system dependencies for a package or set of packages on a given operating system (platform).
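As a hedged sketch, a lookup of this kind can be made with pak::pkg_sysreqs(). The package names and the platform string "ubuntu-22.04" below are assumptions chosen for illustration, and the call needs {pak} (version 0.6.0 or later) and an internet connection. The error handler is there only so that the sketch fails gracefully when the lookup cannot be completed.

```r
# Packages whose system requirements we want to look up (illustrative choice)
pkgs <- c("knitr", "systemfonts")

if (requireNamespace("pak", quietly = TRUE)) {
  # Query the system requirements for a target platform.
  # "ubuntu-22.04" is an assumed platform; use the one your container targets.
  sysreqs <- tryCatch(
    pak::pkg_sysreqs(pkgs, sysreqs_platform = "ubuntu-22.04"),
    error = function(e) {
      message("Lookup failed: ", conditionMessage(e))
      NULL
    }
  )
  # Printing the result shows install commands and per-package requirements
  print(sysreqs)
} else {
  message("Install {pak} to run this lookup: install.packages('pak')")
}
```

The printed report lists the install commands for the platform and then the system libraries required by each package, in the same general layout as the listing that follows.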
── Install scripts ──── Ubuntu NA
apt-get -y update
apt-get -y install pandoc libfontconfig1-dev libfreetype6-dev
── Packages and their system dependencies
knitr – pandoc
systemfonts – libfontconfig1-dev, libfreetype6-dev
These install commands can be added to the Dockerfile for the project so that the system libraries are installed when the image is built.
In GitHub, these include:
We will have more to say about these tools and strategies later in this recipe.