Common reproducibility challenges & gotchas

Intermediate files, downloading during runtime, random number generation, and interactive results

Updated over 5 years ago

What should I do with my intermediate files?

Reproduce them. This is especially important when those intermediate files cannot be browsed easily, e.g., .bin files. A reproducible workflow starts from raw data wherever possible and demonstrates every step towards generating results. If you would like to save your readers runtime on a future reproduction, you can either:

  • save or copy intermediate files to your results folder, perhaps in a subfolder labeled intermediate_files, and provide instructions for how and where to properly place them for future runs; or

  • upload intermediate files to /data and give users the option (via a flag/parameter) of loading them if they wish to execute only specific steps rather than the entire pipeline.
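
The second option can be sketched roughly as follows. This is a minimal illustration, not a prescribed layout: the flag name --use-intermediate, the file name squares.json, the helper names, and the default paths are all hypothetical, and expensive_step merely stands in for a slow stage of your own pipeline.

```python
import argparse
import json
from pathlib import Path


def expensive_step(raw_values):
    """Hypothetical stand-in for a slow step that produces intermediate data."""
    return [v * v for v in raw_values]


def run_pipeline(raw_values, saved_path, results_dir, use_intermediate=False):
    """Run the pipeline, optionally reusing a previously saved intermediate file.

    Returns (final_result, reused), where reused reports whether the
    intermediate file was loaded instead of being recomputed.
    """
    saved_path = Path(saved_path)
    if use_intermediate and saved_path.exists():
        # Skip the slow step and load the uploaded intermediate file.
        intermediate = json.loads(saved_path.read_text())
        reused = True
    else:
        # Default: regenerate the intermediate data from the raw data and
        # keep a copy in the results folder for future runs.
        intermediate = expensive_step(raw_values)
        out_dir = Path(results_dir) / "intermediate_files"
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / "squares.json").write_text(json.dumps(intermediate))
        reused = False
    return sum(intermediate), reused


def build_parser():
    """Command-line options; names and default paths are assumptions."""
    parser = argparse.ArgumentParser(description="Example pipeline")
    parser.add_argument("--use-intermediate", action="store_true",
                        help="load saved intermediate files instead of recomputing")
    parser.add_argument("--saved", default="/data/intermediate_files/squares.json")
    parser.add_argument("--results", default="/results")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    result, reused = run_pipeline([1, 2, 3], args.saved, args.results,
                                  args.use_intermediate)
    print(f"result={result} (intermediate files reused: {reused})")
```

Calling main() with no arguments runs every step from the raw data; passing --use-intermediate loads the saved file when it exists and otherwise falls back to recomputing it, so the full pipeline remains the default path.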

Downloading data/dependencies or using an API during runtime?

Just say no. Instead, take advantage of Code Ocean's package management system for dependencies. For anything else, the postInstall script can be used to download files and process them; the output is then baked into your Docker image. This both reduces runtime for users and ensures reproducibility: the data, model, or API might not be available at the same URL in 10 years, whereas everything in a Code Ocean capsule is archived.

How do I guarantee that random number generation leads to identical results between runs?

Set a random seed. This ensures consistency of results between runs and therefore supports reproducibility.
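
As a minimal sketch of what this looks like in practice, here using only Python's standard library: the seed value 42 and the function name draw_sample are arbitrary choices for illustration. Other languages and numerical libraries have their own seeding mechanisms, and each random number generator your code uses needs to be seeded separately.

```python
import random

SEED = 42  # arbitrary value; what matters is that it is fixed and recorded


def draw_sample(seed, n=5):
    """Draw n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # dedicated generator, independent of global state
    return [rng.random() for _ in range(n)]


# Two runs with the same seed produce exactly the same numbers.
first_run = draw_sample(SEED)
second_run = draw_sample(SEED)
assert first_run == second_run

# Alternatively, seed the module-level generator once at the top of a script.
random.seed(SEED)
```

Using a dedicated random.Random instance rather than the global generator is a design choice: it keeps the sequence from being shifted by unrelated code that also draws random numbers.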

Will my interactive results work indefinitely?

Maybe not. JavaScript, HTML, and browser standards change over time, which could alter or break the rendering of interactive figures. Make sure to save a static copy of your results (e.g., PNG or PDF) as well.
