Don’t miss talks and tutorials from the Continuum team at PyData DC! Learn more about parallel Python from Matt Rocklin and Aron Ahmadia in their tutorial ‘Parallel Python – Analyzing Large Data Sets’ on Friday, October 7, at 11:30AM, and get a dose of Dask in Matt Rocklin’s talk ‘Dask for Ad-Hoc Distributed Computing’ on Sunday, October 9, at 11:30AM.
Also, Continuum CTO and co-founder Peter Wang will present a keynote address with Intel’s Robert Cohn on Saturday, October 8, at 10:00AM.
TUTORIAL: Parallel Python – Analyzing Large Data Sets, Matt Rocklin and Aron Ahmadia
In the first half, we cover basic ideas and common patterns encountered when analyzing large data sets in parallel. We start by diving into a sequence of examples that require increasingly complex tools. Starting from the most basic parallel API, map, we move on to general asynchronous programming with Futures, then to high-level APIs for large data sets, such as Spark RDDs and Dask collections, and finally to streaming patterns. In the second half, we focus on the traits of particular parallel frameworks, including strategies for picking the right tool for your job. We finish with some common challenges in parallel analysis, such as debugging parallel code when it goes wrong, as well as deployment and setup strategies.
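To give a flavor of the first two patterns named above, here is a minimal sketch of a parallel map and of Futures using only Python’s standard `concurrent.futures` module. It is not taken from the tutorial materials; the `square` workload and the helper names `parallel_map` and `parallel_sum_with_futures` are hypothetical, chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Toy stand-in for a per-item computation (hypothetical workload)."""
    return x * x


def parallel_map(func, items, max_workers=4):
    """Apply func to every item in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def parallel_sum_with_futures(func, items, max_workers=4):
    """Submit each task as a Future and add up results as they complete."""
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in items]
        # as_completed yields futures in completion order, not submission order
        for future in as_completed(futures):
            total += future.result()
    return total


if __name__ == "__main__":
    data = range(10)
    print(parallel_map(square, data))
    print(parallel_sum_with_futures(square, data))
```

Threads keep the sketch simple, but in CPython they rarely speed up CPU-bound work; `ProcessPoolExecutor` offers the same `map` and `submit` interface for that case.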
TUTORIAL: Anaconda, Dhavide Aruliah, @dhavidearuliah
This tutorial (aimed at novices) is tailored to get you started using Anaconda. If you’re not already using Anaconda, bring your laptop and we’ll get you set up. You don’t need to have anything pre-installed, but it may be helpful to have Anaconda downloaded (or perhaps Miniconda, the lightweight version of Anaconda for faster downloading) before the tutorial. You’ll get an overview of the Anaconda ecosystem and a more in-depth discussion of conda and Anaconda Cloud. You’ll have a chance to experiment hands-on with both conda and Anaconda Cloud.
TALK: Dask for Ad-Hoc Distributed Computing, Matt Rocklin, @mrocklin
This talk lays out the benefits and challenges of parallelizing a numeric analytic stack, then describes Dask, a parallel framework gaining traction within the Python community for interactive, performant parallel computing, and finally walks through a few domains where this work is enabling novel science today.
Continuum is the founding sponsor of PyData Conferences, and we’re proud to have a presence at these important open source conferences all over the world. Our team looks forward to seeing our #AnacondaCrew in Washington, D.C. and talking all things Python, data science and open source.