Software Developer Siu Kwan Lam and Data Scientist Daniel Rodriguez will present talks at Spark Summit 2016. Siu will discuss GPU Computing with Apache Spark and Python on Tuesday, June 7 at 4:15PM, and Daniel will present Connecting Python to the Spark Ecosystem on Wednesday, June 8 at 11:15AM.
TALK: GPU Computing with Apache Spark and Python, Siu Kwan Lam
Siu will demonstrate how Python and the Numba JIT compiler can be used for GPU programming that easily scales from your workstation to an Apache Spark cluster. Using an example application, he will show how to write CUDA kernels in Python, compile and call them using the open source Numba JIT compiler, and execute them both locally and remotely with Spark. Siu will also describe techniques for managing Python dependencies in a Spark cluster with the tools in the Anaconda Platform. Finally, he’ll conclude with tips and tricks for getting the best performance when doing GPU computing with Spark and Python.
TALK: Connecting Python to the Spark Ecosystem, Daniel Rodriguez (@danielfrg)
In this talk, Daniel will discuss different ways to use Anaconda with Spark. He will cover techniques that combine the best of both worlds: Spark’s RDD abstraction for handling big data and Python’s collection of scientific libraries for making the work faster and easier. The examples will cover multiple areas of data analysis: the canonical word count in pure Python, natural language processing with NLTK, machine learning with scikit-learn and TensorFlow, image analysis with SciPy and Numba, and deep learning on GPUs with Caffe.
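The canonical word count Daniel mentions can be sketched in pure Python by mimicking Spark’s `flatMap`/`reduceByKey` pipeline; the sample corpus below is invented for illustration, and the commented Spark version assumes a running `SparkContext` named `sc`:

```python
from collections import Counter

# On a real Spark cluster the same pipeline would read, for example:
#   sc.textFile("corpus.txt").flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
# Here we express the identical flatMap -> count-by-key logic locally.
lines = ["spark makes big data simple", "python makes spark simple"]
words = (w for line in lines for w in line.split())   # flatMap
counts = Counter(words)                               # reduceByKey
print(counts.most_common(3))
```

Because the local version and the Spark version share the same shape, code prototyped this way on a laptop with Anaconda translates naturally to an RDD pipeline on a cluster.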