What should I do with my intermediate files?
Reproduce them. This is especially important when those intermediate files cannot be browsed easily, e.g., .bin files. A reproducible workflow starts from raw data as much as possible and demonstrates every step towards generating results. If you would like to save your readers runtime on a future reproduction, you can either:
- save or copy intermediate files to your results folder (perhaps in a subfolder labeled intermediate_files) and provide instructions for how and where to properly place them for future runs;
- upload intermediate files to /data and give users the option (via a flag/parameter) of loading them if they wish to execute only specific steps rather than the entire pipeline.
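The second option could look roughly like the sketch below. It is only an illustration: the `--use-cached` flag, the `features.json` file name, the `/data/intermediate_files` path, and the `expensive_step` function are all hypothetical stand-ins for whatever your own pipeline uses.

```python
import argparse
import json
from pathlib import Path

# Hypothetical location of a precomputed intermediate file; adjust to your layout.
CACHE_FILE = Path("/data/intermediate_files/features.json")


def expensive_step(raw):
    """Stand-in for a slow processing step (here: squaring each value)."""
    return [x * x for x in raw]


def get_features(raw, use_cached, cache_file=CACHE_FILE):
    """Load the cached result if requested and present; otherwise recompute."""
    if use_cached and cache_file.exists():
        with cache_file.open("r", encoding="utf-8") as fh:
            return json.load(fh)
    return expensive_step(raw)


def parse_args(argv=None):
    """Parse the command line; --use-cached is an illustrative flag name."""
    parser = argparse.ArgumentParser(description="Run the pipeline.")
    parser.add_argument(
        "--use-cached",
        action="store_true",
        help="load intermediate files instead of recomputing them",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Run the pipeline from raw data, optionally skipping the slow step."""
    args = parse_args(argv)
    return get_features([1, 2, 3], use_cached=args.use_cached)
```

Recomputing is the default here, so a run without any flag still starts from raw data; the cached file is only used when the reader asks for it. Calling `main()` at the bottom of the script wires it to the command line.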
Downloading data/dependencies or using an API during runtime?
Just say no. Instead, take advantage of Code Ocean's package management system for dependencies. For anything else, the postInstall script can be used to download files and process them, and the output is baked into your Docker image. This both reduces runtime for users and ensures reproducibility (that data/model/API might not be available at the same URL in 10 years; with Code Ocean capsules, everything is archived).
Guaranteeing that random number generation leads to identical results between runs?
Set a random seed for every random number generator your code uses. This ensures consistency of results between runs and therefore supports reproducibility.
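As a minimal sketch of seeding, the example below uses only Python's standard `random` module. The seed value 42 and the `sample_run` helper are arbitrary choices for illustration. Other libraries (NumPy, for example) keep their own generators, which need to be seeded separately.

```python
import random

SEED = 42  # arbitrary value chosen for illustration


def sample_run(seed):
    """Draw five pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # dedicated generator, independent of global state
    return [rng.random() for _ in range(5)]


first = sample_run(SEED)
second = sample_run(SEED)
print(first == second)  # prints True: same seed, same sequence
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence from being disturbed by other code that also draws random numbers.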
Will my interactive results work indefinitely?