Interview: SciRuby Team


This week we interviewed three members of the team behind SciRuby, a collection of tools for scientific computation written in the Ruby language: John Woods, John Prince and Claudio Bustos. We think you will find this a very interesting and thorough interview.

F4S: Please give us a brief introduction about yourself.

 John Woods: I’m a Ph.D. student at the University of Texas at Austin, in Molecular Biology, working with Edward Marcotte. My research involves methods for searching for deep homology – in other words, finding genes that cause birth defects based on what related genes do in other species. I’m very grateful to be supported by a U.S. National Science Foundation fellowship.

I’m also a community organizer, in the political sense. It’s actually very similar to organizing an open source project. You find people who share a common goal, you bring them together, and you find a way to get it done.

F4S: What is SciRuby?

 John Woods:  The SciRuby Project is two things: a community and a set of libraries. We think Ruby is a great language, and so do a lot of other people. We liked what had been done with SciPy and NumPy, as far as opening up Python to those who had been stuck in Matlab and R, and decided Ruby needed something similar.

A lot of the tools already exist in one form or another. Claudio Bustos wrote statsample and a distribution package. He also ported the beautiful Protovis to Ruby (rubyvis).

Community means having an engaged user base that likes to document what works and what doesn’t work. You can’t really figure out what needs to be coded if you don’t know what you already have. To that end, John Prince and students in his lab are working on documenting NArray.

One thing that we really admire about SciPy is its community. But we think the Ruby world's strong tradition of sharing libraries via Git should make Ruby even more conducive to building such a community.

F4S: Why and when did SciRuby come to be?

 John Woods: SciRuby was really conceived in October when I emailed John Prince — who used to occupy my desk in the Marcotte Lab, and who got me hooked on Ruby — to complain about the apparent lack of science and numerical tools in the language.

Then we found Claudio and Rubyvis, and asked him if he would be willing to get involved, since he had already written so many of the libraries that we would most like to have in SciRuby. He’s been an extremely enthusiastic contributor.

We got the name from Ara Howard, who had put together a SciRuby wiki a few years ago.

F4S: What are the strengths and weaknesses of the Ruby language when applied to scientific computing?

John Prince: Perhaps our biggest challenges are at the level of public perception.  Ruby is perceived by many as a “web” language, so they wouldn’t necessarily think about using it for numerical computation.  Also, as far as scripting languages go, Python/SciPy had a head start and won the scientific computing battle years ago — not because Ruby isn’t capable of outstanding scientific computing but because SciPy was so well-built and has matured so well.  So, why even try to do science computing with Ruby?  What does Ruby bring to the table that Matlab, R, or SciPy doesn’t already offer?

    1. Because of its consistent object oriented design and because everything returns a value, chaining is natural in Ruby.

    Chaining is very powerful in scientific computation — think about reverse polish notation calculators or the ‘J’ programming language — but it is not common in mainstream science libraries. Typically, layers of nested parentheses are used for series of calculations, e.g.:

    floor(10*random.random((3,4)))

    In Ruby, because the dot operator has highest priority, and even arithmetical operators are methods on objects, we can do things like:

    float(3,4).random.*(10).floor

    We can organize computation using traditional order of operations and parentheses, or we can simply chain things from left to right, which is often the cleanest solution.  Plus, the object model is easily compatible with Protovis/D3 style code, as demonstrated with Rubyvis.

    2. Avoid index errors and for loops with powerful block Enumerators.

    Indices are used inside ‘for’ loops to accomplish a variety of tasks in mainstream science libraries, but indexing errors are easy to make.  Ruby has an extensive set of enumerators that make the majority of these repetitive, error-prone operations much easier.  For example, we can take each consecutive 3 integers and compute a new array in one tidy line:

    new_array = [1,3,4,2,7].each_cons(3).map {|a,b| a+b**2 }

    Indexing errors account for a huge portion of coding errors on these other platforms, but these kinds of errors are largely absent from Ruby code because indexing is so rare.  The result is cleaner, error-free, readable code.  Achieving speed with Ruby enumerators and proper integration with numerical array operations is a remaining challenge.

    3. Scientific data and services are moving to the web, and Ruby is a web language.

    Although it is by no means “just” a web language, Ruby deserves its web-dev reputation, not just for Rails, but for a host of other resources and platforms like Sinatra.  As more data sets become web-accessible and more computation is performed as web services, we expect Ruby to become a major player in web-oriented scientific analysis.

    4. The Ruby community is highly innovative and dynamic.

    Behavior-driven development, use of distributed source control (mostly revolving around GitHub), and great package management in RubyGems and rubygems.org have fostered a community willing to experiment widely to achieve established and emerging objectives.  We think this dynamic can be leveraged to solve new scientific problems.  Keeping up and establishing foundational libraries in this environment can be challenging, but we hope to succeed by defining clear goals and letting the Ruby community innovate.

    As for weaknesses, the elephant in the room is “speed.”  Historically, native Ruby has not been particularly fast, but that has changed somewhat with version 1.9 and the new virtual machine implementation (YARV).  And, of course, achieving speed where it counts is a significant motivation for constructing SciRuby, building C/Fortran extensions, etc.  Syntactically, Ruby can easily accommodate multiple coding paradigms and is great for extending at the C level, so we really aren’t constrained by the language or its implementation.
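Points 1 and 2 above can be tried without any extra libraries. The following is a minimal sketch in plain Ruby, not SciRuby or NArray code: the variable names are made up for illustration, and the window example reuses the array from the interview. It shows an arithmetic operator called as a method inside a chain, and the same windowed computation written once with each_cons and once with explicit indices.

```ruby
# Point 1: operators are ordinary methods, so a calculation can be
# chained left to right instead of nested in parentheses.
rng  = Random.new(42) # seeded only so repeated runs give the same grid
grid = Array.new(3) { Array.new(4) { rng.rand.*(10).floor } }

# Point 2: each_cons walks overlapping windows with no index arithmetic.
data     = [1, 3, 4, 2, 7]
windowed = data.each_cons(3).map { |a, b, _c| a + b**2 }

# The same computation with explicit indices. The loop bound
# (data.length - 3) is exactly the kind of detail that is easy to get wrong.
indexed = []
(0..data.length - 3).each { |i| indexed << data[i] + data[i + 1]**2 }

puts windowed.inspect # => [10, 19, 8]
```

Both versions produce the same array. The enumerator version simply has no loop bound to miscalculate.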

F4S: Does SciRuby have sponsors?

John Woods: Right now SciRuby is supported by enthusiastic Rubyists working in both academic and industrial settings.  The project hopes to be increasingly supported by academic grants and/or private sponsorship.

F4S: How many users do you estimate SciRuby has?

John Woods: Only a handful at present. We’re really still in the building stage.

F4S: How many team members does the project have?

John Woods: You can find a list here: https://github.com/SciRuby/sciruby/wiki/Roster

F4S: In what areas of SciRuby development do you currently need help?

 John Woods: Most of all right now we need someone to put together some basic tutorials and use cases, to help newcomers get started with SciRuby.

Our number-one priority right now is finding people interested in working on a rewrite of our numerical core, NArray. Eventually our hope is that NArray becomes part of Ruby-core.

Our number-two priority is statsample and distribution, which are Claudio’s statistical gems. They plug into GSL and Statistics2, but we want there to be a native Ruby option, so we’re translating a lot of C and C++ code to Ruby right now.

F4S: How can people get involved with SciRuby?

 John Woods: The easiest way is to join our Google Group, which is sciruby-dev, and make a suggestion. Folks are also welcome to email any of us.

F4S: What features are in the roadmap?

Claudio Bustos:

Distribution

We’ve made good progress on a C ruby extension with pdf, cdf and random variate generation for frequently used probability distributions. There are many efforts out there, but we need a solid and fast core of functions. R code is solid and very easy to port. GSL is another source.
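As a rough illustration of what a pdf, cdf and random variate generator look like for one distribution, here is a pure-Ruby sketch for the normal distribution. The module name NormalSketch and its method names are hypothetical and are not the API of the distribution gem. The cdf relies on Ruby's built-in Math.erf, and the sampler uses the Box-Muller transform, which is one common choice among several.

```ruby
# Hypothetical module for illustration only; not the distribution gem's API.
module NormalSketch
  SQRT2   = Math.sqrt(2.0)
  SQRT2PI = Math.sqrt(2.0 * Math::PI)

  # Probability density function of N(mean, sd^2) at x.
  def self.pdf(x, mean = 0.0, sd = 1.0)
    z = (x - mean) / sd
    Math.exp(-0.5 * z * z) / (sd * SQRT2PI)
  end

  # Cumulative distribution function, expressed through the error function.
  def self.cdf(x, mean = 0.0, sd = 1.0)
    0.5 * (1.0 + Math.erf((x - mean) / (sd * SQRT2)))
  end

  # One random variate via the Box-Muller transform.
  def self.sample(mean = 0.0, sd = 1.0, random = Random.new)
    u1 = 1.0 - random.rand # lies in (0, 1], so the logarithm stays finite
    u2 = random.rand
    mean + sd * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  end
end

puts NormalSketch.pdf(0.0)
puts NormalSketch.cdf(1.96)
```

A C extension would presumably expose the same three operations, only faster.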

Statsample

For the next release, I want to implement code for generalized linear models. I’m a psychologist, and we use Rasch models a lot — so I’m planning to implement parameter estimation for Rasch models, using joint and conditional maximum likelihood.
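For readers unfamiliar with Rasch models, the quantity being estimated can be sketched briefly. In the standard Rasch model, the probability of a correct response depends only on the difference between a person's ability and an item's difficulty. The function names below are hypothetical and are not taken from statsample. The sketch only evaluates the response probability and one person's log-likelihood, which is the quantity that maximum likelihood estimation would maximize. It does not perform the estimation itself.

```ruby
# Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty))).
# Hypothetical helper names; not part of statsample.
def rasch_probability(ability, difficulty)
  1.0 / (1.0 + Math.exp(-(ability - difficulty)))
end

# Log-likelihood of one person's 0/1 responses, given the item difficulties.
def rasch_log_likelihood(ability, difficulties, responses)
  difficulties.zip(responses).sum do |difficulty, response|
    prob = rasch_probability(ability, difficulty)
    response == 1 ? Math.log(prob) : Math.log(1.0 - prob)
  end
end

puts rasch_probability(0.0, 0.0) # => 0.5
puts rasch_log_likelihood(0.5, [-1.0, 0.0, 1.0], [1, 1, 0])
```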

Visualization

Protovis development has stalled, so complete support (for static graphs, at least) could be possible in our Ruby port, Rubyvis. We’re also keeping an eye on D3, the successor to Protovis.

Math extensions

We are implementing methods for numerical analysis in Claudio’s gems, Distribution, Minimization and Statsample. It would be very cool to implement independent libraries to perform numerical integration, maximization with and without derivatives, and maximization with arbitrary constraints. GSL is great, but having smaller libraries written in pure Ruby ensures compatibility and allows us to test not only common cases, but extreme ones too.
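To give a sense of how small a pure-Ruby numerical routine can be, here is a sketch of numerical integration using composite Simpson's rule. The simpson helper is hypothetical and is not taken from any SciRuby gem. It is only meant to show the shape such a library function might take, with the integrand passed as a block.

```ruby
# Composite Simpson's rule on [a, b] with n subintervals (n must be even).
# Hypothetical helper for illustration; not part of any SciRuby gem.
def simpson(a, b, n = 100)
  raise ArgumentError, "n must be a positive even integer" unless n > 0 && n.even?

  h   = (b - a).to_f / n
  sum = yield(a) + yield(b)
  (1...n).each do |i|
    # Interior points alternate between weights 4 (odd i) and 2 (even i).
    sum += (i.odd? ? 4 : 2) * yield(a + i * h)
  end
  sum * h / 3.0
end

area = simpson(0.0, Math::PI) { |x| Math.sin(x) } # exact value is 2
puts area
```

Because the routine is plain Ruby, edge cases such as a tiny interval or an odd n can be tested directly with ordinary unit tests.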

F4S: Which projects, blogs or sites related to open source software for science can you recommend?

John Prince: Outside Ruby: SciPy, R, J. In Ruby: Hornet’s Eye, Rserve, Ara Howard’s work, bioruby.

F4S: How can people contact you?

John Woods: You can find me, John Woods, on Google+ or on Twitter. I’m mohawkjohn just about everywhere, including GitHub. You can also email me at john.o.woods@gmail.com.

John Prince is jtprince on Gmail and GitHub.

Claudio Bustos is clbustos on both Gmail and GitHub.

You can also typically find us in #sciruby on Freenode (IRC).

F4S:  Thank you all!  You have been very generous with your responses.

You can also follow SciRuby on Twitter.
SciRuby website: http://www.sciruby.com
