Continuous Integration for Your Private R Projects with CircleCI

If you have ever developed or used an open-source R package, you’re likely familiar with Continuous Integration. By automatically testing every proposed change to the source code, you reduce the risk of errors, avoid unnecessary overhead and improve the quality of the solution you deliver. For data scientists, Hadley Wickham has a good explanation of why it pays to <a href="https://r-pkgs.org/r-cmd-check.html">automate code checks in R</a>.

The most popular CI solution in the R world is Travis CI. Overall it works great, has built-in community support for R and is free for any open source project. CircleCI offers a great alternative with a free plan that includes private repositories. This is a perfect solution if you’re building a package that cannot be released publicly and you don’t have a paid Travis account.

This post will quickly take you through setting up Continuous Integration for your private R project with CircleCI. Even if you are not developing an R package but simply working on a data science project for a client, you can use the same approach to run your unit tests and linter each time you push a commit to the repo.

At Appsilon, we strive to have tests and Continuous Integration for every project so that we catch errors as quickly as possible. Naturally, the source code of most commercial projects has to stay private (not to mention training data, which we don’t send to CI anyway). CircleCI was our first choice for Continuous Integration, and we were not disappointed. We even started using it for our <a href="https://shiny.tools/">open source packages</a>. If you’re interested in these packages, read our latest posts about <a href="https://appsilon.com/shiny-semantic-0-4-0-update/">shiny.semantic</a> and <a href="https://appsilon.com/shiny-router-020/">shiny.router</a>.
<h2 id="continuous-integration-for-all-your-private-r-projects">Continuous Integration for R projects</h2>
Our goal is to set up Continuous Integration for a private R project hosted on GitHub. We’ll assume that the project is a package and that it already has some unit tests. We want CircleCI to install all the necessary dependencies and run all tests in an isolated environment every time we push a commit to the repository. We’ll use the free CircleCI plan for that; you can always move to a paid plan once your project needs it. <span class="marginnote">You can also use CircleCI with Bitbucket but, at the time of writing, not with GitLab.</span> You can follow virtually the same steps if the project is not an R package; you’ll just need to adjust how the project is tested.
<h2 id="configuring-project-for-circleci">Configuring project for CircleCI</h2>
For CircleCI to know how to handle our project, we need to add a configuration file to the project repo. There are two versions of the CircleCI API you can use for that, and we’ll be using the newer version 2. CircleCI supports running tests in <a href="https://www.docker.com/" target="_blank" rel="noopener noreferrer">Docker</a> images, which is great for cleanly managing system-level dependencies. We’ll use a Docker image that contains all system-level dependencies of our project plus everything needed to build it. If you don’t need any additional system libraries, you can use the image we built, <code class="highlighter-rouge">appsilon/ci-base:1.0</code>. If your package or its dependencies require extra libraries to be installed in the system, you will need to modify the Docker image; we show how to do that at the end of this article. All R packages that your package needs will be installed automatically based on the package’s <code class="highlighter-rouge">DESCRIPTION</code>, so there is no need to add them to the Docker image.
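To see which packages that automatic installation will pick up, you can list the dependencies declared in <code class="highlighter-rouge">DESCRIPTION</code>. The configuration in this post uses <code class="highlighter-rouge">devtools::install_deps(dependencies = TRUE)</code>, which reads the Depends, Imports, LinkingTo and Suggests fields. The sketch below is a hypothetical base-R helper, <code class="highlighter-rouge">declared_deps()</code> (not part of devtools), that reads those fields with <code class="highlighter-rouge">read.dcf()</code> and strips version requirements. Treat it as a quick local sanity check, not a replacement for <code class="highlighter-rouge">install_deps()</code>.

```r
# declared_deps() is a hypothetical helper, not a devtools function.
# It lists the packages named in the DESCRIPTION fields that
# devtools::install_deps(dependencies = TRUE) looks at.
declared_deps <- function(path = "DESCRIPTION",
                          fields = c("Depends", "Imports", "LinkingTo", "Suggests")) {
  # read.dcf() returns a one-row character matrix; absent fields are NA
  values <- read.dcf(path, fields = fields)[1, ]
  values <- values[!is.na(values)]

  # Entries are comma-separated and may carry version requirements,
  # e.g. "dplyr (>= 0.7.0)"; continuation lines add extra whitespace
  pkgs <- unlist(strsplit(values, ","), use.names = FALSE)
  pkgs <- trimws(sub("\\(.*\\)", "", pkgs))

  # Drop empty entries and the R version requirement from Depends
  unique(pkgs[nzchar(pkgs) & pkgs != "R"])
}

# Usage with a throwaway DESCRIPTION file (package names are illustrative)
desc <- tempfile()
writeLines(c(
  "Package: demo",
  "Version: 0.1.0",
  "Depends: R (>= 3.4.0)",
  "Imports: dplyr (>= 0.7.0),",
  "    shiny",
  "Suggests: testthat"
), desc)

declared_deps(desc)  # returns c("dplyr", "shiny", "testthat")
```

Because Suggests is part of the list, packages used only by your tests (such as testthat) are installed on CI as well.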
Let’s put the following contents in <code class="highlighter-rouge">.circleci/config.yml</code>:
<div class="highlighter"> <pre class=" language-r"><code class=" language-r" data-lang="r">version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: appsilon/ci-base:1.0
    steps:
      - checkout
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - run:
          command: |
            R -e 'devtools::check()'
      - store_artifacts:
          path: man/
          destination: man
</code></pre> </div>
We start by specifying the CircleCI API version and the working directory. Then we declare the container to run the tests in by providing a Docker image. You can add more containers here, e.g. a database if some of your tests need to talk to external resources (those would no longer be unit tests, but the higher levels of the <a href="https://martinfowler.com/bliki/TestPyramid.html" target="_blank" rel="noopener noreferrer">test pyramid</a> are valuable too). Next, we define the steps to run for every build. We check out the code from the repo, install the package dependencies (as defined in <code class="highlighter-rouge">DESCRIPTION</code>) using <em>devtools</em>, and then run <code class="highlighter-rouge">devtools::check</code> on the package. Nothing else is needed, because the check also runs all of the package’s tests. <span class="marginnote">If your project is not a package, this will look similar, but you will need to change how dependencies are installed and how tests are run (perhaps a topic for another short article).</span> Finally, we store the generated files in <code class="highlighter-rouge">man/</code> as artifacts so that we can download them later.
<h3 id="caching-r-packages-library">Caching the R package library</h3>
The above setup is all that we need. If the package check or any other step fails, CircleCI will automatically report this as an error.
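Why does a failing check fail the whole build? A CircleCI <code class="highlighter-rouge">run</code> step fails when its command exits with a non-zero status, and a non-interactive R session stops with a non-zero status when it hits an uncaught error. The build is therefore marked as failed whenever <code class="highlighter-rouge">devtools::check()</code> raises an error. In recent devtools versions, its <code class="highlighter-rouge">error_on</code> argument controls whether errors, warnings or notes trigger one. The base-R sketch below demonstrates the mechanism locally. <code class="highlighter-rouge">exit_status()</code> is a hypothetical helper, and it calls <code class="highlighter-rouge">Rscript -e</code>, which behaves like <code class="highlighter-rouge">R -e</code> in this respect.

```r
# exit_status() is a hypothetical helper for illustration only.
# It runs an R expression in a fresh, non-interactive R process
# (as a CircleCI `run` step does) and returns the process exit status.
exit_status <- function(expr) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # stdout/stderr = FALSE discards the output; system2() then
  # returns the exit status of the command
  system2(rscript, c("-e", shQuote(expr)), stdout = FALSE, stderr = FALSE)
}

# A command that completes normally exits with status 0,
# so the corresponding CI step would pass
ok <- exit_status("x <- 1 + 1")

# An uncaught error (e.g. a failed check or test) halts R with a
# non-zero status, so the corresponding CI step would fail
failed <- exit_status("stop('a check failed')")

c(ok = ok, failed = failed)
```

Here `ok` should be 0 and `failed` non-zero. CircleCI uses exactly this signal to decide whether a step passed.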
However, to save time (and computing power), it’s worth doing one more thing: caching the installed R packages, so that we don’t reinstall every package from scratch each time the tests run. To cache the R package library, let’s add <code class="highlighter-rouge">restore_cache</code> and <code class="highlighter-rouge">save_cache</code> steps around the dependency installation:
<div class="highlighter"> <pre class=" language-r"><code class=" language-r" data-lang="r">version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: appsilon/ci-base:1.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - deps1-{{ .Branch }}-{{ checksum "DESCRIPTION" }}
            - deps1-{{ .Branch }}
            - deps1-
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - save_cache:
          key: deps1-{{ .Branch }}-{{ checksum "DESCRIPTION" }}
          paths:
            - "/usr/local/lib/R/site-library"
      - run:
          command: |
            R -e 'devtools::check()'
      - store_artifacts:
          path: man/
          destination: man
</code></pre> </div>
<code class="highlighter-rouge">restore_cache</code> tries the keys in order and restores the most recently saved cache whose key starts with the given key. When <code class="highlighter-rouge">DESCRIPTION</code> changes, the build therefore still starts from the branch’s previous library (or, failing that, from any <code class="highlighter-rouge">deps1-</code> cache) rather than from an empty one. Bumping the <code class="highlighter-rouge">deps1</code> prefix is a simple way to start from a clean cache. You can read more about caching in the <a href="https://circleci.com/docs/2.0/caching/" target="_blank" rel="noopener noreferrer">CircleCI docs</a>. I recommend experimenting with cache keys to best fit your scenario.
<h2 id="create-project-on-circleci">Create project on CircleCI</h2>
Our project is ready! The final step is to go to <a href="https://circleci.com" target="_blank" rel="noopener noreferrer">circleci.com</a>, log in with your GitHub account, go to Projects and choose “Setup project” for your project’s repo. This configures the necessary keys on GitHub and starts monitoring the repo for changes, so you need to be an admin of that repo. By default, CircleCI will build all new commits, whether they are in a PR or not. You can customize that if you need to.
One setting that I particularly recommend is cancelling running builds of a branch when new commits are pushed to that branch. This can save a lot of time if you’re iterating quickly on a branch.
<h2 id="using-custom-docker-image-for-running-tests">Using a custom Docker image for running tests</h2>
Preparing a Docker image to run our tests is simple, because we can use the popular Rocker images as a base. Here’s the Dockerfile we used to build <code class="highlighter-rouge">appsilon/ci-base:1.0</code>:
<div class="highlighter"> <pre class=" language-r"><code class=" language-r" data-lang="r">FROM r-base:3.4.1

RUN apt-get update \
  &amp;&amp; apt-get install git libssl-dev ssh texlive-latex-base texlive-fonts-recommended libcurl4-openssl-dev libxml2-dev -y \
  &amp;&amp; rm -rf /var/lib/apt/lists/*

RUN R -e "install.packages(c('devtools', 'roxygen2'), repos='http://cran.us.r-project.org')"
</code></pre> </div>
As you can see, on top of the <code class="highlighter-rouge">r-base</code> image we install several system libraries that are required to build an R package. If your package needs additional libraries, you can add them here as well. We also install two R packages, <code class="highlighter-rouge">devtools</code> and <code class="highlighter-rouge">roxygen2</code>, which are not dependencies of our package but are needed to build it. To use your image on CircleCI, you’ll need to build it and push it to Docker Hub, which is where CircleCI looks for images by default. Run these commands in the directory containing your <code class="highlighter-rouge">Dockerfile</code>, replacing <code class="highlighter-rouge">appsilon</code> with your Docker Hub account name (you’ll need to run <code class="highlighter-rouge">docker login</code> first if you haven’t done so before):
<div class="highlighter"> <pre class=" language-r"><code class=" language-r" data-lang="r">docker build -t appsilon/ci-base:1.0 .
docker push appsilon/ci-base:1.0 </code></pre> </div>
<h2 id="seeing-the-results">Seeing the results</h2>
Each time you push a new commit, CircleCI will check the package and report the results back to GitHub, where they are shown next to the commit and in related PRs. You can also add the following line to the project’s README.md to display a badge with the build status of the main branch (replace <code class="highlighter-rouge">Appsilon/ci.example</code> with your own organization and repository; for a private repository, CircleCI also requires an API token in the badge URL, as described in its documentation on status badges):
<div class="highlighter"> <pre class=" language-r"><code class=" language-r" data-lang="r">[![CircleCI](https://circleci.com/gh/Appsilon/ci.example.svg?style=svg)](https://circleci.com/gh/Appsilon/ci.example) </code></pre> </div>
I hope this is helpful! You can find a sample project with the complete configuration (including a code linter) on <a href="https://github.com/Appsilon/ci.example" target="_blank" rel="noopener noreferrer">GitHub</a>.
