Reproducible software runs consistently and produces the same result each time.
The most common challenges with making code reproducible are:
- Switching between projects is hard.
- You can't get your code to work again, or you can't reproduce your earlier results.
- Your code errors when somebody else runs it.
- Other users don't know how to run your program.
Address these challenges with the following tools and practices:
- MATLAB® Projects
- Git™ version control (Version Control with Git in MATLAB, Source Control panel)
- Dependency Analyzer
- README files
Once your work in MATLAB outgrows a single folder, you have to consider the MATLAB path. If each of your projects is in its own single folder, managing the path can be as simple as changing your current folder to the project folder. But that runs out of steam quickly. When a project has code in multiple folders, you need those folders on the path whenever you use the project. You likely don't want them on the path when you aren't using the project, so that nothing from the project accidentally shadows other work you are doing.
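To make the manual approach concrete, here is a minimal sketch of adding a code folder to the path while you work and removing it afterward. The folder layout and the function name `demoHelper` are hypothetical placeholders, and the sketch assumes a release that provides `writelines` (R2022a or later).

```matlab
% Hypothetical layout: a throwaway project folder with code in "src".
projectRoot = tempname;                      % assumed temporary location
srcFolder = fullfile(projectRoot, 'src');
mkdir(srcFolder);

% A placeholder function so there is something to find on the path.
writelines(["function out = demoHelper()"; "out = 42;"; "end"], ...
    fullfile(srcFolder, 'demoHelper.m'));

% Add the folder when you start working on the project ...
addpath(srcFolder);
onPathWhileWorking = exist('demoHelper', 'file') == 2;

% ... and remove it when you finish, so it cannot shadow other work.
rmpath(srcFolder);
onPathAfterCleanup = exist('demoHelper', 'file') == 2;
```

Every folder in the project needs its own `addpath` and `rmpath` call, and forgetting either one is easy, which is where this approach starts to break down.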
Projects organize your work into self-contained units. MATLAB adds project folders to the path when you open a project and removes them when you close it. Projects can also run code when they are opened or closed and keep track of dependencies. This makes it easy to switch between projects without worrying about path conflicts or manual setup. Projects offer many additional capabilities that you will discover if you create reproducible or production software, but the project path is likely the feature you'll find most useful first.
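You can also set up a project programmatically. The following is a sketch rather than a recommended workflow: it creates a project in a temporary folder, adds a hypothetical `src` folder to the project path, and then checks whether a placeholder function (`demoAnalysis`, invented for this example) is visible with the project closed and open.

```matlab
% Hypothetical project in a throwaway folder.
projectRoot = tempname;
srcFolder = fullfile(projectRoot, 'src');
mkdir(srcFolder);
writelines(["function out = demoAnalysis()"; "out = 1;"; "end"], ...
    fullfile(srcFolder, 'demoAnalysis.m'));

% Create the project (this also opens it) and register the code folder.
proj = matlab.project.createProject(projectRoot);
proj.Name = "Demo Reproducible Project";       % illustrative name
addFolderIncludingChildFiles(proj, srcFolder); % folder must be in the project
addPath(proj, srcFolder);                      % put it on the project path

% Closing the project removes its folders from the MATLAB path.
close(proj);
visibleWhenClosed = exist('demoAnalysis', 'file') == 2;

% Opening the project puts them back.
proj = openProject(projectRoot);
visibleWhenOpen = exist('demoAnalysis', 'file') == 2;
close(proj);
```

In day-to-day use you would open the project once at the start of a session and let it manage the path for you. The repeated open and close here is only to show the effect.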
Have you ever found yourself looking at last month's project and wondered "Should I use script_v3_final_final.m or script_v3_final2.m?"
As you update your code to explore new ideas or refine your analysis, it becomes difficult to remember what changed and when. You might copy files with names like script_v3_final_final.m to preserve versions, but this quickly becomes messy and unreliable. It gets worse when a project has several files that can get out of sync with each other.
Git is a tool that records changes to your files over time. Tools like this are called Source Code Management (SCM) tools. Instead of hunting through folders with variations of the same program, you have a single source of truth. You can save snapshots (called commits), compare versions, and recover previous states. Make regular commits with clear, concise descriptions of your changes so that reverting a change is easier when you need to. If you don't already, organize your code with a different top-level folder for each project. Each top-level folder corresponds to a single repository in Git.
MATLAB integrates with Git, allowing you to use version control interactively and programmatically. The easiest way to use Git is via the MATLAB Projects interface and the Source Control panel.
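As a rough illustration of the programmatic route, the sketch below creates a repository, commits a file, changes it, and commits again. It assumes a recent MATLAB release that provides the `gitinit` function and the associated repository object functions (`add`, `commit`, `log`), and that your Git user name and email are already configured. The folder, file name, and commit messages are hypothetical.

```matlab
% Hypothetical throwaway folder that becomes a new repository.
repoFolder = tempname;
mkdir(repoFolder);
repo = gitinit(repoFolder);

% First snapshot: create a script, put it under source control, commit it.
scriptFile = fullfile(repoFolder, 'analysis.m');
writelines("result = 1 + 1;", scriptFile);
add(repo, scriptFile);
commit(repo, Files=scriptFile, Message="Add first version of analysis script");

% Second snapshot: change the script and commit with a clear description.
writelines("result = 2 + 2;", scriptFile);
commit(repo, Files=scriptFile, Message="Update analysis calculation");

% Review the recorded history instead of comparing copied files by hand.
history = log(repo);
disp(history);
```

Each commit message describes one change, so the history reads as a record of what changed and why, which file names like script_v3_final_final.m cannot give you.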
We learned above how MATLAB Projects make it easier for you to switch between projects by setting up the project path and running startup code. This also makes it easier for others to run your code, since opening the project sets up their MATLAB session the same way as yours.
When you share your code with somebody else, how many messages do you send back and forth, on average, before it works on the other person's machine? If this number is greater than 0, the most likely culprit is that either a) you forgot to include one or more files when you shared the code, or b) you used a MathWorks® toolbox that the other person doesn't have installed.
The Dependency Analyzer is an interactive tool for visualizing and analyzing dependencies. It figures out which data, code files, and MathWorks products your code needs to run, so your collaborator or end user doesn't have to. The Dependency Analyzer works even better when your code is in a project. It analyzes all dependencies of your project at one time, flagging any code you call that isn't in your project or in one of your project's dependencies.
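For a quick programmatic check alongside the interactive tool, you can use `matlab.codetools.requiredFilesAndProducts`. This function is separate from the Dependency Analyzer app. The sketch below, with hypothetical files `demoMain.m` and `demoHelperFcn.m`, lists the files and products that one entry-point function needs.

```matlab
% Hypothetical two-file program in a throwaway folder.
workFolder = tempname;
mkdir(workFolder);
writelines(["function out = demoHelperFcn(x)"; "out = 2 * x;"; "end"], ...
    fullfile(workFolder, 'demoHelperFcn.m'));
writelines(["function out = demoMain()"; "out = demoHelperFcn(21);"; "end"], ...
    fullfile(workFolder, 'demoMain.m'));

% The files must be on the path for the analysis to resolve the calls.
addpath(workFolder);
cleanupPath = onCleanup(@() rmpath(workFolder));

% Ask which files and MathWorks products demoMain.m depends on.
[fileList, productList] = matlab.codetools.requiredFilesAndProducts( ...
    fullfile(workFolder, 'demoMain.m'));

disp(fileList');             % user files to include when you share the code
disp({productList.Name}');   % products the recipient needs installed
```

Comparing `fileList` against what you plan to send, and `productList` against what your collaborator has installed, addresses the two culprits described above.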
If your program involves multiple code files, it might not be obvious to another user how to run your program.
A README file is a simple text file, named README.md, in the top-level folder of a Git repository. This file tells other people why your program is useful, what they can do with your program, and how they can use it. README files use Markdown syntax, which is an easy way to create nicely formatted documents from simple text files. Popular Git-hosting platforms like GitHub® and GitLab® automatically show your README to viewers of your code. Along with code comments, a README file is the most basic piece of documentation. See this article for a good tutorial on writing a README file.
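If you want a starting point, the following sketch writes a skeleton README.md from MATLAB. The project name, section headings, and the `demoMain` entry point are all placeholders to replace with your own, and the sketch assumes a release that provides `writelines`.

```matlab
% Hypothetical project folder; in practice, use your repository's top level.
projectRoot = tempname;
mkdir(projectRoot);

% Skeleton content in Markdown: what the program does, what it needs, how to run it.
readmeLines = [
    "# Demo Analysis"
    ""
    "Estimates a demo quantity from sample data."
    ""
    "## Requirements"
    ""
    "- MATLAB (list the release and any toolboxes here)"
    ""
    "## How to run"
    ""
    "1. Open the project in MATLAB."
    "2. Run `demoMain` from the Command Window."
    ];

readmeFile = fullfile(projectRoot, 'README.md');
writelines(readmeLines, readmeFile);
```

You can equally create the file in any text editor. The point is that the three questions (why, what, and how) each get a short, findable answer.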