This is the first of a series of interviews with notable people involved with the free software community in the scientific and engineering fields. Future interviews will include project leaders, bloggers, teachers and CEOs.
I was fortunate enough to get Gaël Varoquaux to accept a written interview. He is a very, very busy man. He was recently heavily involved in SciPy 2011 where he gave a presentation entitled Python for Brain Mining: (Neuro)science with State of the Art Machine Learning and Data Visualization. I hope you enjoy this interview as much as I did!
F4S: Please give us a brief introduction about yourself.
Gaël: I am 30 years old, French born, living in Paris. I have no children, but
have been living happily with Emmanuelle Gouillart for ten years. I do
research in computer science and applied mathematics for a living. I work
at the French computer science institute, INRIA, in Saclay, on
statistical modeling and methods for brain imaging. My projects for the
future are to keep doing what I am doing right now: blend academic
research with software development and try to make progress in scientific
computing accessible to more people in academia and in industry.
F4S: How did you get involved in open source?
Gaël: To make a long story short, I have been toying with computers ever since
I was a kid. At university, I got into Linux in 2000 because it offered
more tinkering than Windows and could teach me more. As time went on, I grew
a fascination for open source as it was giving me more opportunities to
innovate and learn. During my PhD in physics, I progressively grew from a
user to a contributor driven by a desire to improve tools I could use for
my work. During a trip to the US I visited Fernando Perez, of IPython
fame. He made me realize that my lack of formal training in computing was
not a shame and that I could have an impact on software. Starting from
there, the SciPy community, incredibly friendly and positive-minded,
provided the grounds I needed to develop my skills and involvement.
F4S: Which open source projects do you contribute to?
Gaël: Too many. Everything that I do is related to scientific computing in
Python, as this is the platform that has proved most useful for me in my
scientific projects. Mayavi, 3D plotting in Python, is the first major
project I got involved with, and I am now one of the 2 core developers. I
worked a bit on IPython, an interactive Python environment. For my
research work on brain imaging, I contribute to the nipy project,
neuroimaging in Python. To streamline my research, I have created the
joblib project, a small project dedicated to lightweight pipeline-like
computing in Python. But my largest involvement currently is
scikit-learn, a Python toolkit for machine learning. I lead the
investment that my research group is putting in it and the project
benefits from a huge momentum.
F4S: What are the proprietary software equivalents to those projects?
Gaël: Matlab, with its various toolboxes, is the most serious contender. IDL is
also a historically important player, although its role is going down.
F4S: What programming languages and open source software platforms do you use in your work?
Gaël: I work personally under Linux, but I collaborate with people using Macs
and Windows. My favorite language is Python. I do a bit of C, and when I
need the speed of compiled languages and the ease of use of Python, I use Cython.
I do all my work using Vim, and no IDE. My interactive environment for
scientific computing is IPython.
F4S: Why is open source scientific software important for you?
Gaël: First, because I use it in my work, scientific research. But also because
I believe that science should be done in a way that makes it reproducible
and accessible to everybody. I see open source scientific software as the
textbooks of the future: an enabler for the new generation. Science built on
closed source software is controlled by those who control the software.
Finally, I believe that better innovation and technical or scientific
discoveries come from unrestricted tinkering. In my experience, being
able to modify and reuse the building blocks of scientific software opens
new horizons in many fields of science. For instance, in physics, I was
able to write better experiment-control code using open source. Here I am
not only referring to modifying the source code of software, but more
generally to the fact that an open-source environment is made up much more
of reusable tools that can be wired together for purposes differing from
their initial target market.
F4S: How can people contribute to advancing the use of open source scientific software?
Gaël: First by using the software, and helping other people use it. Second by
giving credit to it: citing it in publications, naming it in project
descriptions. Then by investing in the general ecosystem: mailing lists,
forums, documentation wikis. All this can be done without programming
knowledge. Finally, contributing to the various projects’ code is a good
way to move things forward. Bug fixes and focused improvements are great
to raise the projects’ quality. Developing new features requires more
experience in open source projects. I find that often people overestimate
the importance of lines of code. I spend more time dealing with
documentation, communication, and project management than writing code.
F4S: What other open source projects would you recommend science professionals use or follow?
Gaël: Of course the general scientific Python ecosystem. I really think that
Python is going somewhere as a scientific tool. It enables combining so
many tools from various communities. As far as other projects go, I think
that the two frontiers in scientific software are making it faster, and
making it easier to use. Projects working on making computing faster that
I keep an eye on are StarCluster and Hadoop. To make life simpler in the
lab, my advice is to use and follow the developments of a good Linux
distribution supporting scientific computing, such as Debian or Ubuntu.
It makes trying out software much easier.
F4S: Which people, blogs or sites related to open source software for science can you recommend?
Gaël: I follow http://planet.scipy.org closely. Random Twitter accounts that I
look at are CompSciFactn, ctitusbrown, enthought, ogrisel, scipytip,
F4S: Is there any other topic you would like to talk about?
Gaël: I would like to comment on the choice of licenses in scientific software.
While open source software has often been fueled by copyleft licenses,
such as the GPL, that force open source on derived products, many in the
scientific community feel that this is not the right model for science
and prefer BSD-like licenses, which have very few restrictions. The
reason behind this choice is that we do open source science because we
believe that articles are not enough to communicate scientific results:
software is needed to reproduce them. This software is often
highly technical, and reimplementing the full stack of scientific
computing is not a viable option. Thus it cannot come with strings
attached. Historically, scientific software was open before the rise of
the GNU movement: netlib contains a lot of age-old public domain code
that serves as the foundation of today's numerical computing.
I also want to stress the importance of the community surrounding open
source scientific software. Investing in open source software, as a user
or as a contributor, is a fantastic way of learning through the
interaction with others. I have myself gained insights into numerical and
mathematical methods, experience in programming, and discovered
project management and the related trade-offs. In some sense, this
community complements the interaction with my lab colleagues, and my
academic peers, as it is formed of people with different background and
F4S: How can people contact you?
Gaël: I have a pretty bad information overload with email, although I do try to
answer it. My website and blog can be found at
http://gael-varoquaux.info. I am not on any social network.
F4S: Thanks Gaël.