Over the past decade, compute capacity at the cluster scale has grown faster than Moore's Law. The relentless pursuit of exascale systems and beyond is putting enormous compute power within reach of developers and users on "everyday" systems. Call it "trickle-down" high performance computing if you like, but its effect on the amount of computation that can be accessed is profound. A teraflop of compute can easily be had in a workstation today, ready to tackle scientific computing problems and financial modeling exercises, or to plow through huge amounts of data for machine learning.

Programming these high performance systems used to be the domain of native-language developers working in Fortran or C/C++, scaling up and out with distributed computing via the Message Passing Interface (MPI) to take advantage of clusters. While those languages remain the mainstay of high performance computing, scripting languages such as Python have been adopted by a broad community of users for their ease of use and short learning curve. Opening computing to more users is a good thing, but one limitation makes it difficult for Python users to get good performance: the global interpreter lock, or "GIL," which allows only one thread to execute Python bytecode at a time. This effectively keeps pure-Python code single threaded, leaving the parallelism of modern multicore/many-core, multithreaded CPUs untapped. If only there were an easy, seamless way to get performance from Python, we could broaden the availability of compute power to even more users.
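To see the limitation in action, here is a minimal, standard-library-only sketch: a CPU-bound loop run twice in sequence, then run on two threads. Under the GIL, the threads take turns executing bytecode, so the threaded version yields essentially no speedup on CPU-bound pure-Python work (the exact timings will vary by machine).

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop; the GIL lets only one thread
    # execute Python bytecode at a time, so running two of these
    # on separate threads does not parallelize the work.
    while n > 0:
        n -= 1

N = 5_000_000

# Run the work twice serially.
start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

# Run the same total work on two threads.
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.2f}s  two threads: {threaded:.2f}s")
```

On a stock CPython interpreter the two-thread run typically takes about as long as the serial one, and sometimes longer due to lock contention.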

My colleagues at Intel in the engineering and product marketing teams examined this limitation and saw that the existing solutions were challenging to implement. Thus began our close association with Continuum Analytics, a leader in the Open Data Science and Python community, to make these performance enhancements widely available to all. The collaboration has helped us bring the Intel® Distribution for Python powered by Anaconda to the Python community. The Distribution leverages the Intel® Performance Libraries, including the Intel® Math Kernel Library (Intel® MKL), Intel® Data Analytics Acceleration Library, Intel® MPI Library and Intel® Threading Building Blocks, providing a path to greater performance for Python developers and users.
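The key idea is that heavy numerical operations can sidestep the GIL entirely by dispatching to optimized, multithreaded native libraries. As a small illustration, a NumPy matrix multiplication hands the work to the underlying BLAS implementation (Intel® MKL in the Intel Distribution); the script below is generic NumPy and runs on any distribution, so it is a sketch of the mechanism rather than anything Intel-specific.

```python
import numpy as np

# Build two large matrices with a fixed seed for reproducibility.
rng = np.random.RandomState(0)
a = rng.rand(1000, 1000)
b = rng.rand(1000, 1000)

# The multiplication below is dispatched to the BLAS library that
# NumPy was built against. That native code runs outside the GIL
# and can use all available cores, which is where distributions
# linked against Intel MKL deliver their speedups.
c = a @ b

print(c.shape)
```

The Python code never changes; swapping in an MKL-backed NumPy is what makes calls like this faster.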

And today, we are happy to announce a major milestone in our journey with the Intel Distribution. After a year in beta, the Distribution is now available in its first public release as the Intel® Distribution for Python 2017. It's been a wild ride—the thrills of successful compiles and builds, the agony of managing dependencies, chasing down bugs, the race to meet project deadlines, the highs of good press, the lows of post-release bug reports—but above all, we have the satisfaction of having delivered a solid product.
                                    
Our work is not done. We will continue to push the boundaries of performance to enable more flops to more users to solve more computing challenges. Live long and Python!

Questions about the Intel® Distribution for Python powered by Anaconda? Read our FAQ.