We’re very excited to announce the open sourcing and splitting of the proprietary Anaconda Accelerate library into several new projects. This change has been a long time coming, and we are looking forward to moving this functionality out into the open for the community.
A Brief History of Accelerate/NumbaPro
Continuum Analytics has always been a products, services and training company focused on open data science, especially Python. Prior to the introduction of Anaconda Enterprise, our flagship enterprise product, we created two smaller proprietary libraries: IOPro and NumbaPro. You may recall that IOPro was open sourced last year.
NumbaPro was one of the early tools that could compile Python for execution on NVIDIA GPUs. Our goal with NumbaPro was to make cutting-edge GPUs more accessible to Python users, and to improve the performance of numerical code in Python. NumbaPro proved this was possible, and we offered free licenses to academic users to help jump start early adoption.
In March 2014, we decided that the core GPU compiler in NumbaPro really needed to become open source to help advance GPU usage in Python, and it was merged into the open source Numba project. Later, in January 2016, we moved the compiler features for multithreaded CPU and GPU ufuncs (and generalized ufuncs) into the Numba project and renamed NumbaPro to Accelerate.
Our philosophy is that we should open source a technology when (1) we think it should become core infrastructure in the PyData community and (2) we want to build a user/developer community around it. If you look at our other open source projects, we hope that spirit comes through, and it has guided us as we have transferred features from Accelerate/NumbaPro to Numba.
What is Changing?
Accelerate currently consists of three feature sets:
- Python wrappers around NVIDIA GPU libraries for linear algebra, FFT, sparse matrix operations, sorting and searching.
- Python wrappers around some of Intel’s MKL Vector Math Library functions.
- A “data profiler” tool based on cProfile and SnakeViz.
NVIDIA CUDA Libraries
Today, we are releasing two new Numba sub-projects, pyculib and pyculib_sorting, which contain the NVIDIA GPU library Python wrappers and sorting functions from Accelerate. These wrappers work with NumPy arrays and Numba GPU device arrays to provide access to accelerated functions from:
- cuBLAS: Linear algebra
- cuFFT: Fast Fourier Transform
- cuSparse: Sparse matrix operations
- cuRand: Random number generation (host functions only)
- Sorting: Fast sorting algorithms ported from CUB and ModernGPU
Going forward, the Numba project will take stewardship of pyculib and pyculib_sorting, releasing updates as needed when new Numba releases come out. These projects are BSD-licensed, just like Numba.
MKL Accelerated NumPy Ufuncs
The second Accelerate feature was a set of wrappers around Intel’s Vector Math Libraries to compute special math functions on NumPy arrays in parallel on the CPU. Shortly after we implemented this feature, Intel released their own Python distribution based on Anaconda. The Intel Distribution for Python includes a patched version of NumPy that delegates many array math operations to either Intel’s SVML library (for small arrays) or their MKL Vector Math Library (for large arrays). We think this is a much better alternative to Accelerate for users who want accelerated NumPy functions on the CPU. Existing Anaconda users can create new conda environments with Intel’s full Python distribution, or install Intel’s version of NumPy using these instructions.
Note that the free Anaconda distribution of NumPy and SciPy has used MKL to accelerate linear algebra and FFT operations for several years now, and will continue to do so.
Data Profiler
The final feature in Accelerate is what we have decided to call a “data profiler.” This tool arose out of our experiences doing optimization projects for customers. Every optimization task should start by profiling the application to see what functions are consuming the most compute time. However, in a lot of scientific Python applications that use NumPy, it is important to also consider the shape and data type of the arrays being passed around, as that determines what optimization strategies are viable. Operations on a very large array could be accelerated with a multi-threaded CPU or GPU implementation, whereas many operations on small arrays might require some refactoring to batch processing for higher efficiency.
The traditional Python profiler, cProfile, doesn’t capture information about data types or array sizes, so we extended it to record this extra information along with the function signature. Any tool that works with cProfile stats files should be able to display this information. We also modified the SnakeViz tool to more easily embed its interactive graphics into a Jupyter Notebook.
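To make the idea concrete, here is a minimal sketch of recording array shapes and data types alongside ordinary cProfile timings. It is not the data_profiler API: the names `describe`, `record_signatures` and `signature_counts` are hypothetical helpers invented for illustration, and the approach (a decorator next to a standard profiler run) is a stand-in for the extended profiler described above. NumPy is only needed for the demo at the bottom.

```python
import cProfile
import functools
import pstats
from collections import Counter

try:
    import numpy as np
except ImportError:  # NumPy is only used by the demo below
    np = None

# Hypothetical store: how often each function was called with each signature.
signature_counts = Counter()


def describe(arg):
    """Label array-likes as 'dtype(shape)'; fall back to the type name."""
    shape = getattr(arg, "shape", None)
    dtype = getattr(arg, "dtype", None)
    if shape is not None and dtype is not None:
        return f"{dtype}{tuple(shape)}"
    return type(arg).__name__


def record_signatures(func):
    """Hypothetical decorator: count calls per (function, argument layout)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        labels = [describe(a) for a in args]
        labels += [f"{k}={describe(v)}" for k, v in kwargs.items()]
        signature_counts[f"{func.__name__}({', '.join(labels)})"] += 1
        return func(*args, **kwargs)
    return wrapper


if np is not None:
    @record_signatures
    def scale(a, factor):
        return a * factor

    profiler = cProfile.Profile()
    profiler.enable()
    scale(np.ones((1000, 1000)), 2.0)             # one call on a large array
    for _ in range(100):
        scale(np.ones(8, dtype=np.float32), 2.0)  # many calls on tiny arrays
    profiler.disable()

    # Standard cProfile report: where the time went.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
    # Extra information: which array layouts each function actually saw.
    for signature, count in signature_counts.most_common():
        print(count, signature)
```

Read together, the two reports separate the cases discussed above: a single call on a large array points toward a multi-threaded CPU or GPU implementation, while many calls on tiny arrays point toward batching.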
Today, we are open sourcing this tool in the data_profiler project on GitHub, also under the Numba organization. Again, like Numba, data_profiler is BSD-licensed.
Next Steps
- If you were using the accelerate.cuda package, you can install the pyculib package today:
conda install -c numba pyculib
- The documentation for pyculib shows how to map old Accelerate package names to the new version. The pyculib packages will appear in the default conda channel in a few weeks.
- If you are interested in accelerated NumPy functions on the CPU, take a look at the Intel Python conda packages: Using Intel Distribution for Python with Anaconda.
- If you want to try out the data_profiler, you can take a look at the documentation here.
We will continue to support current Anaconda Accelerate licensees until August 1, 2018, but we encourage you to switch over to the new projects as soon as possible. If you have any questions, please contact [email protected] for more information.