Matrix Algebra on GPU: An interview with MAGMA lead developers


In this Floss4Science interview we bring you the leaders of the MAGMA (Matrix Algebra on GPU and Multicore Architectures) project: Jack Dongarra and Stanimire (Stan) Tomov. MAGMA is a set of linear algebra libraries with a strong relationship to the venerable LAPACK and ScaLAPACK libraries. On to the interview!


Jack Dongarra


Stanimire (Stan) Tomov

F4S: Please, give us an introduction about yourselves.
Jack: I am a University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee (UTK), as well as Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), Turing Fellow at Manchester University, and an Adjunct Professor in the Computer Science Department at Rice University. I am also the director of the Innovative Computing Laboratory (ICL) at UTK, and the director of the Center for Information Technology Research at UTK.
I specialize in numerical algorithms in linear algebra, parallel computing, the use of advanced computer architectures, programming methodology, and tools for parallel computers. I have contributed to the design and implementation of the following open source software packages and systems: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI. Of great current interest are three additional projects that we are now developing at ICL: MAGMA, PLASMA, and DAGuE.

Stan: I am a Research Director in Prof. Dongarra’s linear algebra group at ICL and Adjunct Assistant Professor in the Electrical Engineering and Computer Science Department at UTK.

My research interests are in parallel algorithms, numerical analysis, and high-performance scientific computing. I have been involved in the development of numerical algorithms and software tools in a variety of fields ranging from scientific visualization and data mining to accurate and efficient numerical solution of PDEs. Currently, my work is concentrated on the development of the MAGMA libraries. I am also co-Principal Investigator of the CUDA Center of Excellence (CCOE) at UTK.

F4S: What is MAGMA?

Jack: MAGMA (Matrix Algebra on GPU and Multicore Architectures) is a collection of next generation linear algebra (LA) libraries designed and implemented by the team that developed LAPACK and ScaLAPACK.

MAGMA is designed for heterogeneous GPU-based architectures. It supports interfaces to current LA packages and standards, e.g., LAPACK and BLAS, to allow computational scientists to effortlessly port any software components that rely on LA.

The main benefit of MAGMA is that it enables applications to fully exploit the power of current heterogeneous systems of multicore/manycore CPUs and multiple GPUs, and to deliver the fastest possible time to an accurate solution within given energy constraints.

By combining the strengths of the different architectures, MAGMA overcomes the bottlenecks associated with multicore CPUs or GPUs alone, significantly outperforming the corresponding packages for any of these homogeneous components taken separately. MAGMA’s one-sided factorizations (and linear solvers) on a single Fermi GPU (and a basic CPU host) can outperform state-of-the-art CPU libraries on high-end multi-socket, multicore nodes (e.g., using up to 48 modern cores). The benefits for the two-sided factorizations (the bases for eigenproblem and SVD solvers) are even greater, as their performance can exceed 10x that of systems with 48 modern CPU cores. Architecture-specific performance results and comparisons can be found on the MAGMA site.

F4S: Why and when did MAGMA come to be?

Jack: There have been several major shifts in the development of dense linear algebra libraries over the years, each triggered by major hardware developments. For example, LINPACK in the 1970s targeted the vector machines of the time, for which cache reuse was not essential, and as a result LINPACK relied on just Level 1 BLAS. In the 1980s LINPACK had to be rewritten, leading to LAPACK, which relies on Level 3 BLAS for cache-based machines. In the 1990s it was extended to ScaLAPACK for parallel platforms, relying on the message-passing-based PBLAS.

Now, in the 2000s, with the explosion in parallelism and heterogeneity as well as ever-increasing data-communication costs, the old libraries had to be redesigned once again. Thus, the hardware trends triggered the need for, and the development of, MAGMA. The project started in 2008 and relies on hybrid algorithms and scheduling, as well as new synchronization- and communication-avoiding algorithms.

Performance of MAGMA 1.1 — shown is LU on single GPU compared to LAPACK (Left) and LU for multi-GPUs (Right)


F4S: In which language and platform is MAGMA developed?

Stan: MAGMA is developed in C for multicore/manycore CPU systems enhanced with accelerators. Within a CPU node MAGMA uses pthreads, and MPI for inter-node communication. Low-level accelerator kernels (e.g., in MAGMA BLAS) are written in CUDA. Currently, for cross-platform portability, we are developing an OpenCL port of MAGMA as well as a pragma-based port that would be suitable for the upcoming Intel MIC-based architectures.

F4S: Does MAGMA have sponsors?

Jack: MAGMA is sponsored by both government (DOE and NSF) and industry (NVIDIA, Intel, AMD, MathWorks, and Microsoft Research).

We have, and try to maintain, long-term relationships with our industry partners. Our software is fundamental in the sense that many scientific computing applications rely on and need high-performance linear algebra. This also motivates hardware vendors’ interest, as fundamental libraries provide a way of enabling new hardware for scientific computing.

F4S: How are the sponsors supporting the project?

Jack: DOE and NSF support comes through grants, awarded through various programs. Industry support comes in many forms – from hardware donations, and expertise on hardware and software from vendors, to grants and advertising support. UTK is also one of NVIDIA’s CUDA centers of excellence (CCOE).

F4S: How many users do you estimate MAGMA has?

Stan: There is a lot of interest in MAGMA. For example, there is an average of 3,000 MAGMA page hits per day, but it is difficult to know the exact number of users. MAGMA 1.0 is freely available as open source software, so we can estimate the number of users through download statistics; e.g., MAGMA’s download site has been visited more than 28,000 times since January 2009.

F4S: Do you know where MAGMA is used?

Jack: We know it is used in DOE labs (e.g., ORNL, LLNL, and LBNL) and in corporations, some of which even incorporate it in their products (e.g., MathWorks in Matlab; NVIDIA used our GEMM in CUBLAS for Fermi; etc.), as well as at universities (for CUDA educational activities and research involving GPU computing).

F4S: How many team members does MAGMA have?

Stan: We aim to build MAGMA, similarly to LAPACK, as a community effort. We have several collaborating partners (from UC Berkeley, UC Denver, INRIA France, and KAUST) and a number of contributors from the community (e.g., from ETH Zurich, ICHEC Ireland, etc.). At ICL we have three full-time team members, three graduate students, and a number of part-time contributors (from other ICL projects).

F4S: In what areas of MAGMA development do you currently need help?

Stan: Current work is on the development, and incorporation into MAGMA, of algorithms for multi-GPU and distributed systems. This includes new algorithms that reduce synchronization as well as communication. Ongoing work also covers the development of dynamic runtime/scheduling systems. We need, and would be happy to get, contributions in all these areas.

F4S: How can people get involved with MAGMA?

Jack: People can email me or Stan Tomov directly. Contact can also be made through our partners. A sense of what functionality users may need, and discussion on how to provide it (or contribute it), can be found on the MAGMA users’ forum.

Our software license is a modified BSD license, and contributors must be willing to release their code under it, as well as to write their contributions following the MAGMA style and guidelines. Credit is given in the release documentation as well as directly in the files contributed. Once software is contributed, we are committed to its maintenance.

F4S: What features are in the roadmap?


Key MAGMA features are

1) Top performance and high accuracy (LAPACK compliant),
2) Multiple precision arithmetic support (S/D/C/Z and mixed precision), and
3) Hybrid algorithms using both multicore CPUs and GPUs.

We will release MAGMA 1.1 at SC’11. This release will have enhancements to the main features, mostly in terms of expanded functionality. For example, we will release algorithms for multiple GPUs on a node, e.g., hybrid LAPACK-compliant LU, QR, and Cholesky with static scheduling, as well as tile LU, QR, and Cholesky with dynamic scheduling (using the StarPU scheduler). There will be improvements in the eigenproblem and SVD MAGMA solvers, some new non-GPU-resident algorithms, etc.

Beyond the SC’11 release, we are developing an OpenCL port of MAGMA and support for MIC-based architectures. An auto-tuning framework is also on the roadmap. For users who want to minimize energy consumption, we are identifying various parameters that influence it, and will incorporate their tuning into the auto-tuning framework to automate the search for energy-efficient versions of MAGMA’s algorithms.

F4S: Which projects, blogs or sites related to open source software for science can you recommend?

Jack: The Netlib repository at UTK and ORNL

See, for example, the survey on freely available software for linear algebra.

F4S: Why do you consider free/libre open source software important for the advancement of your field?

Stan: Advancements in our field have always relied heavily on collaboration, exchange of ideas, and ultimately, open software. Open software facilitates the spread of new ideas and the competition to bring the “next big improvement”. Open software can also feed into innovative business ideas, leading to commercial versions, which is also great, and an ultimate goal for open source initiatives. For example, we are very happy that MAGMA has been used to provide GPU support in Matlab.

F4S: Is there any other topic you would like our readers to know about?

Jack: Exascale computing capabilities and the challenges to get there are now on the horizon! Here is a report on the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in the coordinated International Exascale Software Project:

“The International Exascale Software Roadmap,” Dongarra, J., Beckman, P. et al., Volume 25, Number 1, 2011, International Journal of High Performance Computer Applications, ISSN 1094-3420.

F4S: Where can people contact you and learn more about MAGMA?


F4S: Thank you Jack and Stan for taking the time to share with us more about the MAGMA project.


