All capsules published on Code Ocean are verified to be computationally reproducible.
In practice, this means that Code Ocean staff check everything that is submitted for publication on the platform. In rough order of importance, here is what we look for.
Code Ocean's computational reproducibility checklist:
1. Save all concrete results to the /results folder.
Because Code Ocean runs headlessly by default (that is, behind the scenes, without user input during runtime), files should be saved explicitly. If your result is a computation printed to the console, it will be saved in a file called 'output'.
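As a rough illustration of saving explicitly, here is a minimal Python sketch. The helper name `save_summary`, the file name `summary.csv`, and the column headers are hypothetical; only the /results folder comes from the checklist itself.

```python
import csv
from pathlib import Path


def save_summary(rows, results_dir="/results", filename="summary.csv"):
    """Write (metric, value) pairs to a CSV file inside the results folder."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # no-op if the folder already exists
    out_path = out_dir / filename
    # newline="" lets the csv module control line endings itself.
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["metric", "value"])  # hypothetical header names
        writer.writerows(rows)
    return out_path
```

Calling `save_summary(rows)` without a `results_dir` argument writes to /results/summary.csv; the parameter exists only so the same code can be tried out locally against another folder.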
2. Write a master script to reproduce your analysis as completely as possible.
If you have five analysis scripts, your main script should run them all in sequence. If you have multiple datasets, your code should analyze them all by default (whenever possible). Because all published results are verified to be reproducible, there is no need for others to run your published code unless they are making modifications or extensions -- so long run times are not an issue.
Note: you can manually modify the runscript to run multiple scripts, and we advise taking out all comments to the effect of '# The previous version of this file was commented-out and follows below.'
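One way to write such a master script in Python is sketched below. The function name `run_all` and the script file names are hypothetical, and this is only one possible arrangement, not a required layout.

```python
import subprocess
import sys
from pathlib import Path


def run_all(scripts, code_dir="."):
    """Run each script in order, stopping at the first failure."""
    for name in scripts:
        script = Path(code_dir) / name
        print(f"Running {script} ...", flush=True)
        # check=True raises CalledProcessError if a script exits non-zero,
        # so a failed step is never silently skipped.
        subprocess.run([sys.executable, str(script)], check=True)


# Hypothetical file names, listed in the order they should run.
ANALYSIS_SCRIPTS = [
    "01_clean_data.py",
    "02_fit_model.py",
    "03_make_figures.py",
]
```

Ending the main script with `run_all(ANALYSIS_SCRIPTS)` would then reproduce the whole analysis in a single run.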
3. Install all libraries and dependencies via the environment screen, and not during runtime.
This is to guarantee long-term reproducibility. If you install things every time the code is run, we cannot guarantee that those commands will continue to execute successfully in the future. By contrast, packages installed via the built-in package managers, and commands run via the postInstall script, are executed just once, when the environment is built, and their results are then cached in the environment. For published capsules, this means that the installed libraries are also part of the Docker image available for download via export capsule.
- Note: dependencies should generally be downloaded rather than uploaded to the /code folder; but if a dependency is no longer available online, this is at your discretion. If necessary, please clarify for readers in your documentation what is a dependency and what code is uniquely responsible for generating your results.
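If your code currently installs packages when it starts, one alternative is to check for them and fail with a clear message instead. The sketch below assumes Python; the helper names `missing_packages` and `require` are hypothetical, and the check only handles top-level package names.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level package names in `required` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in required if importlib.util.find_spec(name) is None]


def require(required):
    """Fail fast with a clear message rather than installing anything at runtime."""
    missing = missing_packages(required)
    if missing:
        raise RuntimeError(
            "Missing packages: " + ", ".join(missing)
            + ". Add them on the environment screen and rebuild."
        )
```

A call such as `require(["numpy", "pandas"])` at the top of the main script (package names here are just examples) performs no installation; it only reports what the environment lacks.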
4. Upload all necessary data files to the
Data should not go in the /code folder, both for clarity for the reader and because this simplifies tracking in Git in Code Ocean. Data also should not be downloaded during runtime, as URLs and download syntax are prone to change over time.
Note: if you feel strongly about having large files somewhere other than /data, please add them to your .gitignore.
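For code that previously fetched its inputs from a URL, a small lookup helper can make the dependence on uploaded files explicit. This is a sketch in Python; the function name `data_path` and the example file name are hypothetical, and /data is taken from the note above.

```python
from pathlib import Path


def data_path(relative_name, data_dir="/data"):
    """Return the path to a file under the data folder, or raise if it is absent."""
    path = Path(data_dir) / relative_name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; upload it to the data folder "
            "instead of downloading it at runtime."
        )
    return path
```

For example, `data_path("measurements.csv")` resolves to /data/measurements.csv when that file has been uploaded, and raises a descriptive error otherwise.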
5. Upload source code rather than compiled binaries, and then compile binaries during runtime.
Reproducibility implies inspectability, and whenever possible, readers should be able to verify the inner workings of your algorithm or analysis.
6. Provide sufficient metadata for widespread intelligibility, including
- a title related to that of your paper;
- any appropriate tags;
- a few lines from your abstract, or the entire thing, in the description pane;
- any information about an associated publication;
- all affiliations for authors (use 'N/A' if none is available);
- new licenses, if you wish to change the defaults (MIT for code, CC0 for data);
- a representative image (hover over the language symbol in metadata -> 'Upload Image').
If you have any questions about this process, we'd be happy to hear from you at email@example.com.