This week we interviewed Mikhail Fursov, software engineer at Unipro, a software services company with an open source bioinformatics product: UGENE. We will learn about their business model, what role open source plays in it, and how they are integrating cloud computing into their services. Let’s get into it!
F4S: Please give us a brief introduction to your company, Unipro.
Unipro: Unipro is a small company with about 60-70 software engineers. The company's expertise is focused on the following areas: compiler and low-level optimization development, virtual machine development, quality testing, and parallel and cloud-based computing.
F4S: When and why was Unipro founded? Where is it located?
Unipro: Unipro was founded in 1992. At that time most Unipro engineers worked on testing and compiler development for the Russian Elbrus computer. Unipro is located in a small scientific town, “Akademgorodok”, 15 miles from the city of Novosibirsk, one of the large central Russian cities in the Siberia region.
F4S: What is your business model?
Unipro: The business model is quite complex: it combines long-term contracts with large IT players, small outsourcing projects in bioinformatics, cloud-based computing support, and customization of the UGENE open-source project.
F4S: How many clients do you have and which of them can you mention?
Unipro: All projects supported or developed by Unipro are different. We have had a long (about 13 years) and positive experience of working with big companies like Sun Microsystems, where we participated in the development and testing of the Java, C++ and Fortran languages. Intel's local site in Novosibirsk was founded in collaboration with Unipro. And for several years now we have had joint development projects with one of the most advanced and revolutionary software development companies of today: Google.
If we focus on the UGENE project only, the clients are primarily academic users. UGENE packages are downloaded more than 1000 times per month from our central web site alone, not counting other software distribution web sites. UGENE internals, like the cloud-based computing part or the visualization, are reused to build customized software packages for internal use in client organizations.
F4S: What is UGENE?
Unipro: UGENE is a free cross-platform bioinformatics toolkit for molecular biologists. It runs smoothly on both slow and high-performance desktop workstations. UGENE integrates more than 20 existing popular bioinformatics tools, like BLAST, CLUSTAL, PHYLIP, SMITH-WATERMAN, BOWTIE and HMMER, supports dozens of file formats and remote databases, and allows a user to visualize the results of computations in sequence, alignment, chromatography, assembly and 3D viewers, and more.
The main feature of UGENE is the Workflow Designer. The idea behind the Workflow Designer is the following: all algorithms and tools integrated into UGENE are reworked to use a common data model. This means that a user can combine the tools in a workflow and run them in a batch. The graphical interface of the Workflow Designer is very simple; it contains a lot of hints and can be used by a user without programming skills. The workflows created in the Workflow Designer can be run both on a local computer and with a cloud engine. Currently we have adapters for the Amazon EC2 and Microsoft Azure clouds.
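To make the common-data-model idea concrete, here is a minimal sketch in Python. It is not UGENE's code or API: the `SequenceRecord` type, the two example steps and the `run_workflow` helper are all hypothetical names invented for illustration. The only point it shows is that when every tool reads and writes the same record type, tools can be chained in any order and applied to a whole batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SequenceRecord:
    """Hypothetical shared data model passed between workflow steps."""
    name: str
    sequence: str
    annotations: Dict[str, float] = field(default_factory=dict)


# Every step has the same shape: record in, record out.
Step = Callable[[SequenceRecord], SequenceRecord]


def reverse_complement(record: SequenceRecord) -> SequenceRecord:
    """Example step: return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    seq = "".join(pairs[base] for base in reversed(record.sequence))
    return SequenceRecord(record.name, seq, dict(record.annotations))


def annotate_gc_content(record: SequenceRecord) -> SequenceRecord:
    """Example step: attach the fraction of G/C bases as an annotation."""
    length = len(record.sequence)
    gc = sum(base in "GC" for base in record.sequence)
    ratio = gc / length if length else 0.0
    annotations = dict(record.annotations, gc_content=ratio)
    return SequenceRecord(record.name, record.sequence, annotations)


def run_workflow(steps: List[Step],
                 batch: Iterable[SequenceRecord]) -> List[SequenceRecord]:
    """Apply each step in order to every record in the batch."""
    results = []
    for record in batch:
        for step in steps:
            record = step(record)
        results.append(record)
    return results


workflow = [reverse_complement, annotate_gc_content]
batch = [SequenceRecord("seq1", "ATGC"), SequenceRecord("seq2", "GGCA")]
results = run_workflow(workflow, batch)
```

Because each step only depends on the shared record type, reordering the list or adding a new step does not require changing any of the existing ones. In this sketch the same `run_workflow` call could in principle be handed to a local or a remote executor, though how UGENE actually dispatches workflows to a cloud engine is not described in the interview.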
F4S: Why and when did UGENE come to be?
Unipro: The UGENE project was started in 2008 and combined the results of several unrelated projects developed earlier at Unipro in collaboration with several academic organizations.
F4S: Why did you decide to make UGENE free software?
Unipro: We think that the free and open-source model helps the project to get a large user base and to become well-known among bioinformaticians. Once users like UGENE, use it on a daily basis and want to get support or create a customized version of the software, our team becomes the best candidate to complete this task.
F4S: In which language and/or platform is UGENE developed?
Unipro: We use the C++ language and the Qt 4 library. We test UGENE on the Windows, Linux and Mac OS X platforms. The cloud server part is written in Java. The web interface is written in Java using the Wicket and jQuery libraries.
F4S: How many users do you estimate UGENE has?
Unipro: The current download rate is about 1000 per month. Since we release updates every two months, I think that the active user base is about the same, or maybe about 2000 users worldwide.
F4S: How many team members does the project have?
Unipro: UGENE has more than 15 team members today. To learn about them, please check our issue tracking system: https://ugene.unipro.ru/tracker
The activity of each member varies. So sometimes we have periods with 12-13 members heavily involved in the project and sometimes the active team size is reduced to 3-4 engineers.
F4S: In what areas of UGENE development do you currently need help?
Unipro: First, we are focused on quality improvement for the project. Having a test base with about 5 thousand automated tests is not enough for a codebase with more than one million lines of C++ code. As a result, any feedback on how we can improve the existing UGENE user interface is appreciated.
It would be great if someone could help us to improve our documentation and the translations of the graphical interface into different languages. Note that even English is not a native language for most of our team.
And the last important area where we lack developers is the Mac OS X user interface. None of us uses Mac OS X today. This is typical for Russia, where Windows and Linux have dominated for a long time and the Mac has only recently started to gain popularity. So any user with the technical skills to fix the Mac OS X user interface, or to recommend how to make it more standard, is really welcome!
F4S: How can people get involved with UGENE?
Unipro: Contact us using any option: email, discussion board, issue tracker. Take the initiative or show real interest in some feature.
Web site: http://ugene.unipro.ru
Discussion board: http://ugene.unipro.ru/forum
Issue tracker: https://ugene.unipro.ru/tracker
F4S: What features are in the roadmap?
- Workflow Designer: more efficient code to run workflows in clouds.
- Assembly Browser: a lot of visualization improvements planned. Check the issue tracking system for the ‘view-assembly’ component for details.
- Development and optimizations of the assembly algorithms: we plan to integrate and optimize more assembly and aligner tools.
F4S: Which projects, blogs or sites related to open source software for science can you recommend?
- http://www.bioinformatics.org/ – one of the best mailing lists.
- http://www.molecularstation.com/forum/bioinformatics/ – popular forums on bioinformatics.
- http://seqanswers.com/ – another good resource to get answers.
- http://www.biosolutions.info/ – a lot of high quality media materials to learn bioinformatics.
F4S: Is there any other topic you would like our readers to know about?
Unipro: There will be no end of the world in 2012. The UGENE team will continue development.
F4S: Thank you, Mikhail, for sharing more about Unipro and UGENE with us.