At the moment, all capsules published on Code Ocean are verified to be computationally reproducible.
In practice, this means that Code Ocean staff run some checks on everything that is submitted for publication on the platform. In rough order of importance, here is what we check for.
Code Ocean's computational reproducibility checklist:
First, save all concrete results to the /results folder. Because Code Ocean runs headlessly by default, files should be saved explicitly. If your result is a computation printed to the console, it will be saved in a file called 'output'.
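As a rough illustration of saving results explicitly, here is a minimal Python sketch. Only the /results folder comes from the checklist; the helper name `save_table`, the file name, and the example rows are hypothetical.

```python
import csv
from pathlib import Path

# /results is the folder named in the checklist; everything else here
# (function name, file name, columns) is a hypothetical placeholder.
RESULTS_DIR = Path("/results")


def save_table(rows, header, filename, results_dir=RESULTS_DIR):
    """Write rows to a CSV file inside results_dir and return its path."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / filename
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)   # column names first
        writer.writerows(rows)    # then one line per result row
    return out_path
```

Calling `save_table([("a", 1)], ["name", "value"], "summary.csv")` would write /results/summary.csv. The same idea applies to figures: write them to a file in /results rather than opening them in a display window.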
Second, write a master script to reproduce your analysis as completely as possible. If you have five analysis scripts, your main script should run them all in sequence. If you have multiple datasets, your code should analyze them all by default (whenever possible). Because all published results are verified to be reproducible, there is no need for others to run your published code unless they are making modifications or extensions -- so long run times are not an issue.
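A master script can be very small. The sketch below assumes the analysis steps are separate Python scripts; the function name `run_all` and the script names in the usage note are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_all(scripts, code_dir="."):
    """Run each script in order, stopping at the first failure."""
    for name in scripts:
        script_path = Path(code_dir) / name
        print(f"Running {script_path}", flush=True)
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run on incomplete inputs.
        subprocess.run([sys.executable, str(script_path)], check=True)
```

A main script would then end with a single call such as `run_all(["01_clean.py", "02_fit.py", "03_plot.py"])`, so that one run reproduces the whole analysis.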
Third, install all libraries and dependencies via the environment screen, and not during runtime. This is to guarantee long-term reproducibility. If you install things every time the code is run, we cannot guarantee that those commands will continue to execute successfully in the future. By contrast, packages installed via the built-in package managers, and commands run via the postInstall script, are executed just once, when the environment is built, and their results are then cached in the environment. For published capsules, this means that the installed libraries are also part of the Docker image available for download via export capsule.
- Note: dependencies should generally be downloaded rather than uploaded to the /code pane; but if a dependency is no longer available online, this is at your discretion. If necessary, please clarify for readers in your documentation what is a dependency and what code is uniquely responsible for generating your results.
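One way to keep installation out of runtime is to have the code check for its dependencies and stop with a clear message, rather than calling an installer itself. The following Python sketch is only an illustration; the function names are hypothetical, and it handles top-level package names only.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level package names in `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def require_packages(required):
    """Stop with a clear message instead of installing anything at runtime."""
    missing = missing_packages(required)
    if missing:
        raise RuntimeError(
            "Missing packages: " + ", ".join(missing)
            + ". Add them on the environment screen and rebuild."
        )
```

A main script might call `require_packages(["numpy", "pandas"])` as its first step; the package names are placeholders for whatever your analysis needs.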
Fourth, upload all necessary data to the /data pane. Data should not go in the /code pane, both for clarity for the reader, and also because when Code Ocean integrates version tracking, this will simplify the process of knowing what to track and how. Data also should not be downloaded during runtime, as URLs, as well as download syntax, are prone to change over time.
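In code, this amounts to reading inputs from the /data folder instead of fetching them from a URL. A minimal Python sketch, assuming the datasets are CSV files; the function name `list_datasets` and the file pattern are hypothetical.

```python
from pathlib import Path

# /data is the folder named in the checklist; the pattern is an assumption.
DATA_DIR = Path("/data")


def list_datasets(pattern="*.csv", data_dir=DATA_DIR):
    """Return all files in data_dir matching pattern, in sorted order."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Data folder not found: {data_dir}")
    # Sorting keeps the processing order the same from run to run.
    return sorted(path for path in data_dir.glob(pattern) if path.is_file())
```

Looping over `list_datasets()` also makes the code analyze every uploaded dataset by default, with no download step that could break later.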
Fifth, upload source code rather than compiled binaries, and then compile binaries during runtime. Reproducibility implies inspectability, and whenever possible, readers should be able to verify the inner workings of your algorithm or analysis.
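A compile step can be run from the main script before the analysis starts. The Python sketch below assumes a single C source file and a compiler named `cc` that is already installed in the environment; the function names, the `-O2` flag, and the paths in the usage note are all hypothetical.

```python
import shutil
import subprocess


def compile_command(source, output, compiler="cc"):
    """Build the argument list for compiling one C source file."""
    return [compiler, "-O2", "-o", str(output), str(source)]


def compile_source(source, output, compiler="cc"):
    """Compile source into output, failing loudly if anything goes wrong."""
    if shutil.which(compiler) is None:
        raise RuntimeError(f"Compiler not found in the environment: {compiler}")
    # check=True raises CalledProcessError if compilation fails.
    subprocess.run(compile_command(source, output, compiler), check=True)
    return output
```

For example, `compile_source("/code/solver.c", "/tmp/solver")` would build the binary from the uploaded source on each run, so readers can inspect exactly what was compiled.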
Sixth, provide sufficient metadata for widespread intelligibility, including
- a title related to that of your paper;
- any appropriate tags;
- a few lines from your abstract, or the entire thing, in the description pane;
- any information about an associated publication;
- all affiliations for authors (use 'N/A' if none is available);
- new licenses, if you wish to change the defaults (MIT for code, CC0 for data);
- a representative image (hover over the language symbol in metadata -> 'Upload Image').
If you have any questions about this process, we'd be happy to hear from you at email@example.com.