Open Science

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • How can version control help me make my work more open?

Objectives
  • Explain how a version control system can be leveraged as an electronic lab notebook for computational work.

The opposite of “open” isn’t “closed”. The opposite of “open” is “broken”.

— John Wilbanks

Free sharing of information might be the ideal in science, but the reality is often more complicated. Normal practice today looks something like this:

For a growing number of scientists, though, the process looks like this:

This open model accelerates discovery: the more open work is, the more widely it is cited and re-used. However, people who want to work this way need to make some decisions about what exactly “open” means and how to do it. You can find more on the different aspects of Open Science in this book.

This is one of the (many) reasons we teach version control. When used diligently, it answers the “how” question by acting as a shareable electronic lab notebook for computational work:

Making Code Citable

Anything that is hosted in a version control repository (data, code, papers, etc.) can be turned into a citable object. You’ll learn how to do this in lesson 12: Citation.

How Reproducible Is My Work?

Ask one of your labmates to reproduce a result you recently obtained using only what they can find in your papers or on the web. Try to do the same for one of their results, then try to do it for a result from a lab you work with.

How to Find an Appropriate Data Repository?

Spend a couple of minutes browsing the data repositories mentioned above: Figshare, Zenodo, Dryad. Depending on your field of research, you might find community-recognized repositories that are well known in your field. You might also find these data repositories recommended by Nature useful. Discuss with your neighbor which data repository you might want to approach for your current project, and explain why.

How to Track Large Data or Image Files using Git?

Large data or image files, such as .md5 or .psd files, can be tracked within a GitHub repository using Git Large File Storage (Git LFS), an open source Git extension. Git LFS automatically uploads the contents of large files to a remote server and replaces each file in the repository with a small text pointer.
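To make the idea of a text pointer concrete, here is a minimal Python sketch of what such a pointer might contain. It assumes the pointer layout described in the Git LFS specification (a version line, a SHA-256 object ID, and the size in bytes). The function name lfs_pointer_text and the stand-in file contents are hypothetical, chosen only for illustration. Git LFS generates real pointers for you, so this is not something you would write yourself.

```python
import hashlib


def lfs_pointer_text(content: bytes) -> str:
    """Build the text of a Git LFS-style pointer for some file content.

    Assumption: the layout follows the Git LFS pointer format
    (version line, then "oid sha256:<hash>", then "size <bytes>").
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # Hypothetical stand-in for a large image file: 5 MB of zero bytes.
    fake_image = b"\x00" * 5_000_000
    pointer = lfs_pointer_text(fake_image)
    print(pointer, end="")
    print(f"pointer: {len(pointer)} bytes, original: {len(fake_image)} bytes")
```

The point of the sketch is the size difference: the repository stores only a few lines of text, while the actual content lives on the remote server and is identified by its hash.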

Try downloading and installing the Git Large File Storage extension, then use it to track a large file in your GitHub repository. Ask a colleague to clone your repository and describe what they see when they access that large file.

Key Points

  • Open scientific work is more useful and more highly cited than closed work.