Oπnions: Should I share my code?

Scroggs debates whether sharing truly is caring


Six years ago, in September 2014, I started working on my PhD. By Christmas, I was doing calculations using code that would’ve taken the average PhD student three to four years to write. But this wasn’t down to my own programming skills: this was thanks to my supervisor and his previous students deciding to work on open source software.

Open source software is software whose developers release the source code that makes up the software, rather than just releasing a binary or application that the user can run. Almost always, open source software is free to use and adapt.

You are, perhaps without realising it, using open source software every day. If you use WordPress, Firefox, Audacity, or VLC, you are using open source software: you could download the source code of any of these programs and edit it however you like. Even if you don’t use any of these, most websites on the internet are served from machines running the open source operating system Linux.

The popularity of open source software has been increasing in academia too. The plot below shows the proportion of papers in three numerical analysis journals (SIAM Journal on Scientific Computing, Computers and Mathematics with Applications, and Numerische Mathematik) that are found when using each journal’s search functionality to search for “open source”. This proportion has risen sharply in recent years, suggesting a significant increase in the number of researchers writing or using open source software.

[Figure: a scatter graph showing a clear upward trend. Caption: the percentage of papers mentioning “open source”.]

(There are, of course, going to be some false positives in this plot: for example, a paper about a problem in an open domain with a source term in the equation may well show up in the search. There will also be some papers that use open source software but don’t say so, which will not show up. But the increase shown over recent years is large enough that it is likely to be meaningful even with these issues.)

So while my achievements early in my PhD sound impressive, I actually only wrote a few lines of code to extend code that already existed. Someone else had already done most of the three to four years of work, so I didn’t have to do this again. This sounds great, but would I have perhaps learned something during these three to four years that I’ve missed out on?

Will I learn more by writing the code myself?

I think one of the best ways to really understand how an algorithm works is to implement it, then tweak it to see how each step works. It is possible, therefore, to use other people’s code without really understanding what the methods you are using do.

I do not, however, think that this is a reason not to use other people’s code.

The best way to test out an algorithm is usually to implement a version of it for a small, simplified problem. For example, you could use it to solve a one-dimensional problem, when in fact you are interested in a more realistic three-dimensional problem; or you could set some values in your equation to 0, giving a simpler equation to solve.
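A minimal sketch of what such a small test problem could look like, assuming a model problem chosen purely for illustration: the one-dimensional equation -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0, discretised with finite differences. The function names `solve_poisson_1d` and `max_error` are hypothetical, and the choice of method is an assumption rather than anything used in the article.

```python
import math


def solve_poisson_1d(f, n):
    """Solve -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.

    Illustrative sketch only: second-order finite differences on n
    interior grid points, solved with the Thomas algorithm.
    """
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Row i of the system: -u[i-1] + 2*u[i] - u[i+1] = h^2 * f(x[i])
    diag = [2.0] * n
    rhs = [h * h * f(x) for x in xs]
    # Forward elimination (every off-diagonal entry is -1)
    for i in range(1, n):
        m = -1.0 / diag[i - 1]
        diag[i] += m
        rhs[i] -= m * rhs[i - 1]
    # Back substitution
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return xs, u


def max_error(n):
    """Largest grid error for the assumed test case u(x) = sin(pi x)."""
    xs, u = solve_poisson_1d(
        lambda x: math.pi ** 2 * math.sin(math.pi * x), n
    )
    return max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(xs, u))


if __name__ == "__main__":
    for n in (49, 99, 199):
        print(n, max_error(n))
```

Each step from 49 to 99 to 199 interior points halves the grid spacing, and the error should fall by roughly a factor of four each time. Checking this is a quick way to confirm that the method has been understood and implemented correctly before moving on to a larger code.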

Turning this simple code into more general code that can be used for the actual problems you are interested in, on the other hand, is a much larger job. Extending your code in this way is very time-consuming, and there is much less to be learned by doing it. So once you’ve tried out and understood the methods you want to use, you would be much better off using someone else’s open source code to save yourself a lot of time.

But what if no one else has written an open source implementation of your method?

Why should I give away my work for free?

If no one else has already done the work for you, you may have no choice but to do all the programming yourself. You may then be reluctant to give away all your hard work to benefit others, but there are plenty of ways in which this can also benefit you.

If your code does something that other code cannot do, you will probably find other people that want to use it. Having other people use your code has a number of benefits.

The more people use your code, the more likely it is that errors and bugs in your code will be spotted. Having a userbase that reports bugs to you will lead to a more reliable piece of software, allowing you to be more confident about your computations.

Additionally, you may find that some of your users fix the bugs themselves, or add functionality to your code to make it more applicable to the problems they are solving. The time you spent starting the software can then start to pay off, as you can now benefit from the work of others. This is also a great way to find collaborators and new problems to work on.

Some really great academic communities grow up around open source software projects, and there are even conferences organised just for the developers and users of particular pieces of software.

The Journal of Open Source Software (Joss)

Academics are often judged by their paper output, and time spent writing good code can reduce this output, even though the code often benefits the community much more than another paper would. It is possible, however, to write a good publication about your software: either a full paper about the algorithms used in your software and the results obtained, or a short paper for the Journal of Open Source Software (Joss), an open journal that publishes short papers about software libraries.

Publishing a paper like this makes your software easily citable. If lots of people use your software, you may well get a lot of citations. As an example, the paper “Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities”, about the open source software Gmsh, has 4677 citations. By comparison, the second most cited papers of its two authors, Christophe Geuzaine and Jean-Francois Remacle, have just 260 and 727 citations respectively.

This is all great, but maybe your software is nowhere near as good as other people’s open source code and not worth releasing.

My code isn’t good enough

If you’ve written some code from scratch to do some computations, the code is almost certainly a little bit messy and lacking in documentation. In this state, it’s quite unlikely that anyone will use your code.

At this point, you’re going to have to spend a significant amount of time learning about documentation and good code style in your preferred programming language. My best advice here would be to find a few popular pieces of open source software written in the same language as yours and see what they do. You’ll probably find that some contributors to these projects are willing to give you guidance on what to do. (If your preferred language is Python, feel free to bug me for more specific advice.)

Tidying and documenting code can seem like a very long and boring job, and might not seem worth it. Doing this, however, is likely to benefit you, as well as helping out others.

In a few months’ (or even years’) time, you may find that you need to revisit your code: perhaps your paper has been reviewed and you need to adjust some calculations, or maybe you want to build on your old work and extend the code to do something else. By this point, you’ll probably have forgotten exactly how your code works. If it’s an undocumented mess, it’s going to take you ages to work out what it does. But if you tidied up your code and added some documentation, this task will be much easier.

If you’re still daunted by the idea of tidying your code up, you might be able to do less work than you’re dreading. One or two well-documented examples of how you use your code would probably be enough to allow someone to work out how to use it. You can always do more tidying at a later date, maybe adding bits of documentation as and when users ask you about them so you cover the important areas first.
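As a rough sketch of what one small, well-documented example might look like in Python: the function `trapezium_rule` below is hypothetical and stands in for whatever your own code does. The point is the docstring and the short usage example, not the numerical method.

```python
def trapezium_rule(f, a, b, n=100):
    """Approximate the integral of f over [a, b] using n trapezia.

    Hypothetical example function, used only to show documentation style.

    Parameters
    ----------
    f : callable
        The function to integrate; takes a float and returns a float.
    a, b : float
        The lower and upper limits of integration.
    n : int
        The number of equal-width trapezia to use (at least 1).

    Example
    -------
    >>> round(trapezium_rule(lambda x: x ** 2, 0.0, 1.0, n=1000), 4)
    0.3333
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # The two end points carry half weight; interior points carry full weight.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


if __name__ == "__main__":
    # Usage example: the integral of x^2 over [0, 1] is exactly 1/3.
    approx = trapezium_rule(lambda x: x ** 2, 0.0, 1.0, n=1000)
    print("approximation:", approx, "exact:", 1 / 3)
```

A new user can copy the usage example at the bottom, swap in their own function, and work outwards from there, which is often all the documentation they need to get started.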

Hopefully, you’re now strongly considering sharing your code, but perhaps wondering what to do next.

What do I need to do to make my code open source?

If you want to release your code, you’re probably going to have to learn to use Git. Git is a version control tool that underpins many popular online code hosting services, such as GitHub, GitLab and Bitbucket. Git can do an awful lot of things, but you probably don’t need to use most of them. You almost certainly work alongside someone who uses Git, who can show you the basics and help you decide on the best place to put your code.

You’ll also need to decide on a licence to give your code. This licence will tell users what they can do with your code. There are a few common open source licences that allow users to do slightly different things with your code; there are plenty of guides online that can help you decide which is best for you.

Then you’re ready to go ahead and release your code. Perhaps one day soon you’ll be able to boast that you saved a PhD student four years of programming.

Matthew Scroggs is a postdoctoral researcher in the Department of Engineering at the University of Cambridge working on finite and boundary element methods. His website, mscroggs.co.uk, is full of maths.