We’re happy to announce the latest update to Accelerate with the release of version 2.2. This version of Accelerate adds compatibility with the recently released Numba 0.25, and also expands the Anaconda Platform in two new directions:
- Data profiling
- MKL-accelerated ufuncs
I’ll discuss each of these in detail below.
We’ve built up quite a bit of experience over the years optimizing numerical Python code for our customers, and these projects follow some common patterns. First, the most important step in the optimization process is profiling a realistic test case. You can’t improve what you can’t measure, and profiling is critical to identify the true bottlenecks in an application. Even experienced developers are often surprised by profiling results when they see which functions are consuming the most time. Ensuring the test case is realistic (but not necessarily long) is also very important, as unit and functional tests for applications tend to use smaller, or differently shaped, input data sets. The scaling behavior of many algorithms is non-linear, so profiling with a very small input can give misleading results.
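The effect of input size on profiling results can be sketched with the standard library's `cProfile` alone. The function names below (`linear_step`, `pairwise_step`, `pipeline`, `pairwise_share`) are hypothetical stand-ins and are not part of Accelerate; the sketch only assumes a workload that mixes a linear step with a quadratic one.

```python
import cProfile
import pstats


def linear_step(n):
    """Work that grows linearly with n, with a large constant factor."""
    total = 0
    for i in range(200 * n):
        total += i
    return total


def pairwise_step(n):
    """Work that grows quadratically with n (visits all pairs)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def pipeline(n):
    """Hypothetical application entry point that runs both steps."""
    return linear_step(n) + pairwise_step(n)


def pairwise_share(n):
    """Profile pipeline(n); return pairwise_step's share of the two steps' time."""
    prof = cProfile.Profile()
    prof.runcall(pipeline, n)
    cumulative = {}
    # Stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (_, _, name), (_, _, _, cumtime, _) in pstats.Stats(prof).stats.items():
        cumulative[name] = cumtime
    both = cumulative["pairwise_step"] + cumulative["linear_step"]
    return cumulative["pairwise_step"] / both


# Compare a unit-test-sized input with a more realistic one.
for n in (10, 1000):
    print(f"n={n}: pairwise_step share = {pairwise_share(n):.0%}")
```

With the small input the linear step should account for most of the time, while the larger input shifts the bottleneck to the quadratic step, which is the kind of misleading result a too-small test case can produce.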
The second step in optimization is to consider alternative implementations for the critical functions identified in the first step, possibly adopting a different algorithm, parallelizing the calculation to make use of multiple cores or a GPU, or moving up a level to eliminate or batch unnecessary calls to the function. In this step of the process, we often found ourselves lacking a critical piece of information: what data types and sizes were being passed to this function? The best approach often depends on this information. Are these NumPy arrays or custom classes? Are the arrays large or small? 32-bit or 64-bit float? What dimensionality? Large arrays might benefit from GPU acceleration, but small arrays often require moving up the call stack in order to see if calculations can be batched.
Rather than having to manually modify the code to collect this data type information in an ad-hoc way, we’ve added a new profiling tool to Accelerate that can record this type information as a part of normal profiling. For lack of a better term, we’re calling this “data profiling.”
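To illustrate what the ad-hoc approach tends to look like, here is a minimal sketch of a hand-written decorator that counts the argument types seen by a function. It is not how Accelerate collects this information; the names `describe`, `record_types`, `call_signatures`, and `scale` are hypothetical, and the sketch relies only on the standard library.

```python
import functools
from collections import Counter

# Maps a rendered call signature to the number of times it was observed.
call_signatures = Counter()


def describe(value):
    """Summarize a value by type name, adding dtype and shape when both exist."""
    type_name = type(value).__name__
    dtype = getattr(value, "dtype", None)
    shape = getattr(value, "shape", None)
    if dtype is not None and shape is not None:
        return f"{type_name}(dtype={dtype}, shape={shape})"
    return type_name


def record_types(func):
    """Decorator that records the argument types of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parts = [describe(a) for a in args]
        parts += [f"{k}={describe(v)}" for k, v in sorted(kwargs.items())]
        call_signatures[f"{func.__name__}({', '.join(parts)})"] += 1
        return func(*args, **kwargs)
    return wrapper


@record_types
def scale(values, factor):
    """Hypothetical function under investigation."""
    return [v * factor for v in values]


scale([1.0, 2.0], 3)
scale([1.0], 2.5)
for signature, count in call_signatures.items():
    print(count, signature)
```

Because NumPy arrays expose `dtype` and `shape` attributes, `describe` would summarize a 3×3 float64 array as `ndarray(dtype=float64, shape=(3, 3))`. The drawback is that every function of interest has to be decorated by hand.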
We collect this extra information using a modified version of the built-in Python profiling mechanism, and can display it using the standard pstats-style table:
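| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) |
|---|---|---|---|---|---|
| 300/100 | 0.01313 | 0.0001313 | 0.03036 | 0.0003036 | linalg.py:532(cholesky(a:ndarray(dtype=float64, shape=(3, 3)))) |
| 200/100 | 0.003431 | 3.431e-05 | 0.005312 | 5.312e-05 | linalg.py:106(_makearray(a:ndarray(dtype=float64, shape=(3, 3)))) |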
The recorded function signatures now include data types, and NumPy arrays also have dtype and shape information. In the above example, we’ve selected only the linear algebra calls from the execution of a PyMC model. Here we can clearly see the Cholesky decomposition is being done on 3×3 matrices, which would dictate our optimization strategy if cholesky were the bottleneck in the code (in this case, it is not).
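For comparison, the plain pstats table produced by the standard library, without any data type information, can be generated as in the sketch below. `my_function_to_profile` is a hypothetical workload; `cProfile` and `pstats` are the built-in modules, not the Accelerate profiler.

```python
import cProfile
import io
import pstats


def my_function_to_profile():
    """Hypothetical workload standing in for real application code."""
    total = 0
    for i in range(100_000):
        total += i * i
    return total


# Collect profile data for a single call.
p = cProfile.Profile()
p.runcall(my_function_to_profile)

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(p, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The printed columns are ncalls, tottime, percall, cumtime, percall, and filename:lineno(function); in the standard output the last column contains only the function name, with no argument types.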
We’ve also integrated the SnakeViz profile visualization tool into the Accelerate profiler, so you can easily collect and view profile information right inside your Jupyter notebooks.
All it takes to profile a function and view it in a notebook is a few lines:

    from accelerate import profiler

    p = profiler.Profile()
    p.run('my_function_to_profile()')
    profiler.plot(p)

MKL is perhaps best known for high performance, multi-threaded linear algebra functionality, but MKL also provides highly optimized math functions, like cos() for arrays. Anaconda already ships with the numexpr library, which is linked against MKL to provide fast array math support. However, we have future plans for Accelerate that go beyond what numexpr can provide, so in the latest release of Accelerate, we’ve exposed the MKL array math functions as NumPy ufuncs you can call directly.
For code that makes extensive use of special math functions on arrays with many thousands of elements, the performance speedup is quite amazing:

    import numpy as np
    from accelerate.mkl import ufuncs as mkl_ufuncs

    def spherical_to_cartesian_numpy(r, theta, phi):
        # Baseline: standard NumPy ufuncs
        cos_theta = np.cos(theta)
        sin_theta = np.sin(theta)
        cos_phi = np.cos(phi)
        sin_phi = np.sin(phi)
        x = r * sin_theta * cos_phi
        y = r * sin_theta * sin_phi
        z = r * cos_theta
        return x, y, z

    def spherical_to_cartesian_mkl(r, theta, phi):
        # Same calculation using the MKL-backed ufuncs
        cos_theta = mkl_ufuncs.cos(theta)
        sin_theta = mkl_ufuncs.sin(theta)
        cos_phi = mkl_ufuncs.cos(phi)
        sin_phi = mkl_ufuncs.sin(phi)
        x = r * sin_theta * cos_phi
        y = r * sin_theta * sin_phi
        z = r * cos_theta
        return x, y, z

    # 100,000 random points in spherical coordinates
    n = 100000
    r = np.random.uniform(1, 10, n)
    theta = np.random.uniform(0, np.pi, n)
    phi = np.random.uniform(-np.pi, np.pi, n)

    %timeit spherical_to_cartesian_numpy(r, theta, phi)
    %timeit spherical_to_cartesian_mkl(r, theta, phi)

    100 loops, best of 3: 7.01 ms per loop
    1000 loops, best of 3: 978 µs per loop

A speedup of roughly 7x is not bad for a 2.3 GHz quad-core laptop CPU from 2012. In future releases, we are looking to expand and integrate this functionality further into the Anaconda Platform, so stay tuned!
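Outside IPython, where the `%timeit` magic is not available, a comparable best-of-three measurement can be sketched with the standard library's `timeit` module. The helper name `best_of` is hypothetical, and the scalar function below is only a stand-in so that the sketch runs without NumPy or Accelerate; in practice you would pass the two array functions shown above.

```python
import math
import timeit


def best_of(func, *args, repeat=3, number=100):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


def spherical_to_cartesian_scalar(r, theta, phi):
    """Pure-Python, single-point version of the same coordinate conversion."""
    sin_theta = math.sin(theta)
    x = r * sin_theta * math.cos(phi)
    y = r * sin_theta * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


elapsed = best_of(spherical_to_cartesian_scalar, 2.0, math.pi / 2, 0.0)
print(f"{elapsed * 1e6:.3f} us per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `%timeit` reports, since the fastest run is the one least disturbed by other activity on the machine.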
For more information about these new features, take a look at the Accelerate manual:
You can install Accelerate with conda and use it free for 30 days:

    conda install accelerate

Try it out, and let us know what you think. Academic users can get a free subscription to Anaconda (including several useful tools, like Accelerate) by following these instructions. Contact [email protected] to find out how to get a subscription to Anaconda at your organization.