PyData Dallas, the first PyData conference in Texas, is taking place next week, April 24-26, 2015. PyData has been a wonderful conference for fostering the Python community and giving developers and other Python enthusiasts the opportunity to share their ideas and projects and to discuss the future of Python. Continuum Analytics is proud to be a founding sponsor for such an innovative, community-driven conference.
The Continuum team will have seven speakers and eight talks/tutorials at PyData Dallas, including a keynote from our co-founder, Peter Wang. We’ll also have a booth set up, with plenty of t-shirts, stickers, hiring updates, demos, and more. Please stop by to see us, check out our talks and tutorials, and email [email protected] if you’d like to schedule a meeting – many members of our team will be attending and would love to speak with you about any of our open source or enterprise tools.
Registration for the conference is still open – get your tickets here.
Take a look at our talks and tutorials below, and find the full PyData Dallas schedule here.
Friday, April 24
Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients. This tutorial will walk users through the steps to create different kinds of interactive plots using Bokeh. We will cover using Bokeh for static HTML output, the IPython notebook, and plot hosting and embedding using the Bokeh server.
Building Python Data Applications with Blaze & Bokeh, Andy Terrel (@aterrel) & Christine Doig (@ch_doig)
We use the Blaze and Bokeh libraries to interactively query and visualize large datasets through Python. Blaze provides a consistent query experience on data ranging from small local CSV files to large remote Impala or Spark clusters. It automates data migration and brings the power of other database systems into the hands of the armchair analyst. Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. It provides elegant, concise construction of novel graphics in the style of D3.js, but also delivers this capability with high-performance interactivity over large or streaming datasets.
Saturday, April 25
Python in Scientific Research, Andy Terrel
Whether you’re modelling an earthquake, hurricane, or medical device, Python is there. The language has become so ubiquitous in scientific research that it is the go-to tool. In this presentation, I present the different ways in which we see Python being used and how it continues to slither its way into new business. Whether you are running on a large cluster or teaching new students how to get their next big idea done, Python delivers as a language that can be used by all, scales as needed, and does not get in your way while it does so.
Blaze is a library for harnessing the power of big data technologies. We show motivating use cases illustrating why you might want to use Blaze, including a comparison of out-of-core pandas to other backends designed to scale both horizontally and vertically. Time permitting, we’ll show how easy it is for users of Blaze to scratch their own itch by hooking an existing API into Blaze via a small set of multiply dispatched functions.
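To give a feel for what "multiply dispatched functions" means, here is a toy sketch in plain Python. It is not Blaze's actual API: the `dispatch` decorator, the `Count` expression class, and the `compute` function are all hypothetical names made up for illustration, and the lookup only matches exact argument types.

```python
# Toy multiple-dispatch registry. Every name here is hypothetical and
# chosen for illustration; this is not how Blaze itself is implemented.
_IMPLEMENTATIONS = {}


def dispatch(*types):
    """Register the decorated function for one tuple of argument types."""
    def decorator(func):
        name = func.__name__
        _IMPLEMENTATIONS[(name, types)] = func

        def dispatcher(*args):
            # Pick the implementation from the exact types of ALL arguments.
            key = (name, tuple(type(arg) for arg in args))
            try:
                impl = _IMPLEMENTATIONS[key]
            except KeyError:
                raise TypeError(f"no {name} implementation for {key[1]}")
            return impl(*args)

        return dispatcher
    return decorator


class Count:
    """Hypothetical expression node meaning 'count the rows'."""


@dispatch(Count, list)
def compute(expr, data):
    # "Backend" 1: rows held in a plain Python list.
    return len(data)


@dispatch(Count, dict)
def compute(expr, data):
    # "Backend" 2: rows stored under a "rows" key (an assumed layout).
    return len(data["rows"])


print(compute(Count(), [1, 2, 3]))
print(compute(Count(), {"rows": [1, 2]}))
```

Supporting a new kind of data in this sketch means registering one more `compute` implementation for its type; the existing implementations stay untouched.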
Sunday, April 26
Python is for the Curious, Jon Riehl
What is it you are curious about? If you are more curious about your data than your tools, then Python is for you. If you are more curious about your tools than your data, then Python is for you. If you are curious, then Python is for you.
Data are always messy and ill-formatted. We spend a seemingly unnecessary number of hours writing software to convert between common formats, databases, and newer filesystems. Typically, we spend just enough mental energy to get the job done, hopefully giving us more time in the next stage of the data pipeline. This results in non-performant, non-reusable, non-extensible code. In this talk we present Odo, a new open-source software package that simplifies common data migration tasks. Odo can migrate between CSVs, JSON, DataFrames, and databases just as easily as it can migrate between NumPy arrays, HDF5, HDFS, and S3, and much more in between. We will cover different real-world use cases and scenarios and compare these with the “common” answers repeated amongst us data mungers.
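For a sense of the "common" answer, here is a minimal sketch of the kind of one-off, hand-written converter the abstract describes. It uses only the Python standard library and is not Odo's API; the function name `csv_text_to_json` is hypothetical.

```python
import csv
import io
import json


def csv_text_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first line as the field names for each row.
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    # Every value stays a string: no type inference, no streaming.
    return json.dumps(rows)


print(csv_text_to_json("name,qty\nbolt,3\nnut,5\n"))
```

A converter like this handles exactly one pair of formats, and each new source or target needs another function of its own, which is the repetition the talk sets out to address.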
Reproducible Multi-Language Data Science with Conda, Christine Doig
Reproducibility is one of the main principles of the scientific method: anyone should be able to repeat our analysis and arrive at the same results. As Data Science projects grow in variety (applications, libraries, standalone analysis…) and complexity (DBs, computing engines, multiple programming languages, backwards incompatibilities…) we need solutions to handle reproducibility in every case. In this talk, we’ll explore how conda, a cross-platform package manager written in Python, can make our lives simpler and our Data Science projects easily shareable and reproducible.
Follow us on Twitter @ContinuumIO for up-to-date info during PyData Dallas!