The topic of open science has been on my mind a lot lately, and it has also been on the list of potential posts for this blog. As I sat down to map the ideas I wanted to clarify and share, I realized that I wasn't looking at one post, but several. I think this will take us most of the summer to get through. Why? Well, there's a lot here to talk about, and there are also a few exciting projects I'm working on that will hopefully line up with their respective blog posts.
This post series was finally pushed to the top of my stack by a few events. First, I had some great discussions on open data with some colleagues and promised to show their work as an example of open science. Next, I was frustrated by reading a high-impact paper whose data and methods were decidedly not open. I also wanted to raise awareness that US federally funded research is now supposed to be open through a mandate from the Office of Science and Technology Policy (OSTP). You can read the mandate here. Finally, several people had mentioned to me that they wanted to know more about open science, some even mentioning a social networking site for scientists called ResearchGate.
For this first post, we are going to ask ourselves: 1) What is science anyway? 2) Why should it be open? 3) Why is this hard? Without further delay, let's jump in!
What is science anyway?
When I was in seventh grade, our science teacher, Mr. Paskiewicz, asked us to write down the answer to "what is science?" I don't remember what I wrote then, but I know that the question has never stopped being something I wonder about. While my answer changes depending on the level of detail requested, I believe that science is basically the name for structured curiosity. While this is broader than Merriam-Webster's definition of "knowledge about or study of the natural world based on facts learned through experiments and observation," I think it encompasses a similar idea. Science is the process that we have devised to learn about universal truths of our existence in a way that safeguards us against our own human tendencies. These tendencies include bias, preconceived ideas, literature inertia, and career pressure.
In an effort to protect us from ourselves, the scientific method came into existence. The scientific method is a recursive process of making predictions, testing them, revising our ideas and models, then making more predictions. This process seems bulletproof until we step into the equation. If a scientist is working alone (or in a vacuum, as we often say), they may end up chasing a line of work that supports their "pet" hypothesis. This is easy to do! What happens if that idea turns out to be false? Sometimes we still pursue different ways to show that it may be true, not wanting to let go of our self-conceived brilliance. The missing piece of this process was filled in long after the early days of science, first by the sharing of discoveries in publications and finally by the peer review process.
Peer review is, ideally, how we serve as one another's system of checks and balances. I think that my work has made a contribution to the knowledge base that is science, so I submit it for publication. The paper (and ideally the evidence) is sent to other scientists to read, review, and comment on. These comments should address how good the work is, but should consist mostly of recommendations or remarks on the contribution. This sounds like it should be fertile ground from which we continually reap a harvest, but instead it has turned into one of the most mocked processes in science. Reviews of publications are sometimes very helpful, but just as often they seem to contain requests to add references (often to the reviewer's work), mundane requests to change things in the figures or writing style, or are completely unhelpful (such as "good paper"). The publication and review processes are part of the called-for revisions in open science, so we'll discuss them in depth in a later post.
Why should science be open?
As we said above, one of the goals of science is to create a curated body of knowledge. The definition didn't say what was included in that body of knowledge. Is it our conclusions and "laws" of nature? Sure, but it should also consist of the methods, hardware, software, and data to back up those conclusions. If we do not continually re-evaluate our "laws," we run the risk of continuing to obey a false, self-imposed misconception.
Open science is one of the best ways to help us keep a trusted and verified scientific knowledge base (though it cannot totally prevent errors; we are human after all). Open science is the idea that our data, methods, software, hardware, ideas, notes, and all materials are available to be checked, tested, compared, and scrutinized by anyone. How available? That is part of the core argument going on currently. Should things that I worked on for years and devoted thousands of hours to be posted online for anyone to download? I'm going to argue that they should. This is absolutely not a trivial thing to do, and it certainly doesn't end in a problem-free situation where scientists worldwide sit in a virtual circle around a hologram campfire singing Kumbaya. It's certainly closer than where we are now, though.
One last note on this before we move on. I am not at all against companies that make software, hardware, etc. I am not against making money; we all need to make a living. I am not against patents, the patent process, or copyright. There are so many licenses out there currently (copyleft, share-alike, the CERN Open Hardware License, and many others) that it is confusing. In fact, your work needs some kind of license or it is assumed that others cannot use it! The issue is further confused because we often sign over the rights to our figures and text to a publishing house when we publish a paper. I'm not sure what the best path is, nor is anyone else, so let's explore that together as well.
Also have a look at Michael Nielsen's TEDxWaterloo talk on open science (below). He shows some excellent examples of both the successes and failures of open science. (As with anything, acknowledging failures is crucial to eventual success.)
Why is this hard?
So why are all scientists and R&D folks not sitting around that virtual campfire already? There are so many roadblocks and difficult decisions we will have to make as a community that it can seem very imposing. Why would we share our hard-earned results and ideas for free? How can we get funding from a company to do our work and remain open? Where will all of this go, and who maintains it? Who sets standards? This list seems daunting to me and may even sound discouraging. It's not! Challenges of equal magnitude have been faced by information workers during every major revolution. No solution will be perfect, but most of the solutions we have devised have been good enough. Just think about standardization in the early days of the internet or when plans for early scientific apparatus were released from corporate secrecy.
As you can see, this is a subject that I am very passionate about. My goal is to explain things briefly so that these posts are not so long that you have trouble making it through! I hope that these posts will promote discussion amongst my colleagues, both known and unknown. Please contribute by commenting: sharing your ideas is the first step!
There is an entire spectrum of openness that runs from top-secret documents that will never see the sun to those who freely give all of their knowledge and content. The community will never reach either end of that spectrum, but it is my hope that the slow migration towards open-source science can be promoted through discussion. I'm really looking forward to working through my own thoughts on this subject and hearing what others have to say about it.