How to diagnose and remedy R installation issues:

First, one general piece of advice is to add an MRAN snapshot from a day when you know that the package installed successfully. This is fairly simple on Code Ocean.

If you are still getting installation issues, you may need to dig into the required system-level dependencies. These are typically available through APT, the 'Advanced Package Tool,' which Code Ocean exposes as the apt-get package manager in each capsule's environment.

The rest of this article walks through that process with reference to geojsonio, a CRAN package that converts "data to 'GeoJSON' or 'TopoJSON' from various R classes, including vectors, lists, data frames, shape files, and spatial classes."

Interpreting error messages:

  • Let's say you start a new capsule, select a base environment of R 3.6 (the underlying operating system for which is Ubuntu:18.04), and try to install geojsonio. Your build will fail with the following message:
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘sf’
* removing ‘/usr/local/lib/R/site-library/sf’
Error in i.p(...) :
  (converted from warning) installation of one or more packages failed,
  probably ‘jqr’, ‘protolite’, ‘rgdal’, ‘rgeos’, ‘V8’, ‘geojson’, ‘sf’
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
Execution halted
  • Figuring out how to parse this for the information that you need can take some practice. For starters, you might try entering the first line into a search engine.
  • Doing so will lead to a Stack Overflow post suggesting that you need an apt-get package called libgdal-dev, which provides the development files for GDAL, the Geospatial Data Abstraction Library.
  • Add that to the capsule's apt-get packages and re-run, and you'll get a different, more verbose error:
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because  was not found. Try installing:
 * deb: libv8-dev or libnode-dev (Debian / Ubuntu)
 * rpm: v8-devel (Fedora, EPEL)
 * brew: v8 (OSX)
 * csw: libv8_dev (Solaris)
To use a custom libv8, set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘V8’
* removing ‘/usr/local/lib/R/site-library/V8’
cat: geojson.out: No such file or directory
Error in i.p(...) :
  (converted from warning) installation of one or more packages failed,
  probably ‘jqr’, ‘protolite’, ‘V8’, ‘geojson’
Calls: <Anonymous> ... with_rprofile_user -> with_envvar -> force -> force -> i.p
  • Note: "ANTICONF ERROR" is your friend. This class of errors typically tells you the name of the package you need. Because each base environment tells you which Linux variant is running (in this example, Ubuntu:18.04), you now know that you need libv8-dev or libnode-dev. (If you search for these packages, you'll discover that libnode-dev is not available for Ubuntu:18.04, so you need libv8-dev.)
  • When you add libv8-dev and try to reinstall, you'll get another ANTICONF ERROR telling you to add libprotobuf-dev.
  • Rinse and repeat, and you'll get an ANTICONF ERROR about libjq-dev.
  • Rinse and repeat once more, and you'll see: Please install the 'protobuf-compiler' package for your system (here the apt-get package really is called protobuf-compiler).
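Reading the Debian/Ubuntu hint out of a long build log can also be scripted. The following is a minimal shell sketch, not part of Code Ocean's tooling: the function name anticonf_deb_hint is a made-up helper, and the pattern only matches the "* deb:" line format shown in the error above.

```shell
# Hypothetical helper: read a build log on stdin and print the package
# names suggested on the " * deb: ... (Debian / Ubuntu)" line of an
# ANTICONF block. Lines that do not match are ignored.
anticonf_deb_hint() {
  sed -n 's/^[[:space:]]*\* deb: \(.*\) (Debian.*$/\1/p'
}

# Example using the relevant lines from the V8 error shown above.
anticonf_deb_hint <<'EOF'
Configuration failed because  was not found. Try installing:
 * deb: libv8-dev or libnode-dev (Debian / Ubuntu)
 * rpm: v8-devel (Fedora, EPEL)
EOF
# prints: libv8-dev or libnode-dev
```

You would still need to check which of the suggested packages exists for your Ubuntu release, as described in the note above.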

And that should do it. So the needed apt-get packages, on top of those already installed into the R 3.6 environment, are libgdal-dev libjq-dev libprotobuf-dev libv8-dev protobuf-compiler.
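If you are installing by hand rather than through the capsule's environment (for example, in a root shell or an image build on Ubuntu 18.04), the same set could be installed roughly as follows. This is a sketch under those assumptions; the variable and function names are illustrative, and the install call is left commented out because it requires root and network access.

```shell
# System libraries identified in the walkthrough above.
APT_PACKAGES="libgdal-dev libjq-dev libprotobuf-dev libv8-dev protobuf-compiler"

# Hypothetical wrapper: refresh the package index, then install the list.
# $APT_PACKAGES is intentionally unquoted so each name is a separate argument.
install_system_deps() {
  apt-get update &&
    apt-get install -y $APT_PACKAGES
}

# Uncomment to run (requires root on a Debian/Ubuntu system):
# install_system_deps
```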

Wow, that was horrible! Can you help me avoid this pain?

Yes, we are really good at dealing with such things. If you let us know that you are having installation issues via live chat or an email to support@codeocean.com, we will be glad to help.

Why was this so hard?

A variety of interrelated reasons.

  • Code Ocean is based on Linux containers, and capsules typically start from an Ubuntu:16.04 or Ubuntu:18.04 operating system.
  • R packages are often not written entirely in R, but partly in lower-level, compiled languages, most typically C, C++, and Fortran, for speed.
  • On Mac or Windows, CRAN offers ‘precompiled’ binary packages, which means that any low-level C/C++/Fortran source code bundled with the package has already been compiled into machine code appropriate for that platform, so these packages tend to download and install quickly.
  • But on Linux, and therefore generally on any container-based platform, this is not the case, because the Linux ecosystem comprises not only different versions (as with macOS El Capitan vs. Catalina), but also many distributions that aren't fully compatible with one another. For example, if you compile C code on an Ubuntu Linux distribution and share the compiled binary with a friend who uses Fedora, there’s a reasonable chance that you and your friend have different C compilers, which, for a bunch of complicated reasons, means that when they try to load and use your package, it will not work. This makes creating binaries for all the possible permutations of distributions and versions a daunting task.
  • So R packages for Linux are typically distributed as source code, which then needs to be compiled at install time on your system, which means that you need all the necessary compilers, headers, libraries, etc. to be already installed.
  • R packages may also bring in other R dependencies. On a fresh R instance with geojsonio installed, if you run library(geojsonio); sessionInfo(), you'll see, inter alia:
other attached packages:
[1] geojsonio_0.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         magrittr_1.5       maptools_0.9-5     units_0.6-4      
 [5] lattice_0.20-38    R6_2.4.0           httr_1.4.1         tools_3.6.0      
 [9] rgdal_1.4-4        parallel_3.6.0     grid_3.6.0         geojson_0.3.2    
[13] KernSmooth_2.23-15 e1071_1.7-2        DBI_1.0.0          jqr_1.1.0        
[17] rgeos_0.5-1        class_7.3-15       lazyeval_0.2.2     sf_0.7-7          
[21] curl_4.0           sp_1.3-1           V8_2.3             compiler_3.6.0    
[25] classInt_0.4-1     jsonlite_1.6       foreign_0.8-70    
  • This means that geojsonio relies on functions from 27 other packages. A few of these (such as tools, grid, and parallel) ship with R itself, but the rest each need to be installed, and each may depend on other R packages, system-level dependencies, and so on. As the authors of Tinyverse phrase it, "[e]very dependency you add to your project is an invitation to break your project."
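Because source packages are compiled at install time, one quick sanity check is whether the basic build tools are present at all. The sketch below is only illustrative: missing_tools is a hypothetical helper, and the list of tools passed to it is an assumption about what a typical source build needs. It says nothing about headers such as those in libgdal-dev, which still have to be found from the error messages.

```shell
# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed list of tools commonly needed to build R packages from source.
# Prints nothing if all of them are available.
missing_tools gcc g++ gfortran make
```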

What can Code Ocean, as a platform, do to make this easier? 

One option we are considering is to provide many apt-get packages by default. Rocker's geospatial environment, for instance, contains 24 apt-get packages beyond those available in its default R 3.6 image, which means that most geospatial packages will install without a problem.

On the other hand, environments with more things installed tend to be larger and slower to run, and therefore less portable; and there is value in having each capsule contain nothing but the necessary and sufficient software for reproducing an academic result, so that others can reconstruct the environment easily.

Point being, this question doesn't have an easy answer, and it is the kind of thing that Code Ocean's product team spends a lot of time thinking about. Multiply each problem in this category by the number of scientific languages and workflows Code Ocean supports, and you get a sense of the scope of the problem, and also of our ambitions. 

But, in the meantime, we devote a lot of resources to technical support, so if any questions come up, by all means, write to us and we'll be happy to help.
