Python Data Visualization 2018: Where Do We Go From Here?

This post is the third in a three-part series on the current state of Python data visualization and the trends that emerged from SciPy 2018. By James A. Bednar. As we saw in Part I and Part II of this series, having so many separate Python visualization libraries to choose from is often confusing to […]
Intake for Cataloging Spark

Intake is an open-source project that provides easy, Pythonic access to a wide variety of data formats, along with a simple cataloging system for these data sources. Intake is a new project, and all are encouraged to try it and comment on it. PySpark is the Python interface to Apache Spark, a fast and general-purpose cluster computing […]
Using Pip in a Conda Environment

Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of these issues stem from the fact that conda, like other package managers, has limited abilities to control packages it did […]
Python Data Visualization 2018: Moving Toward Convergence

This post is the second in a three-part series on the current state of Python data visualization and the trends that emerged from SciPy 2018. In my previous post, I provided an overview of the myriad Python data visualization tools currently available, how they relate to each other, and their many differences. In this post […]
Understanding Conda and Pip

Conda and pip are often considered nearly identical. Although some of the functionality of these two tools overlaps, they were designed for different purposes and should be used accordingly. Pip is the Python Packaging Authority’s recommended tool for installing packages from the Python Package Index, PyPI. Pip installs Python software packaged as wheels or […]
Deriving Business Value from Data Science Deployments

One of the biggest challenges facing organizations trying to derive value from data science and machine learning is deployment. In this post, we’l…
Python Data Visualization 2018: Why So Many Libraries?

This post is the first in a three-part series on the state of Python data visualization tools and the trends that emerged from SciPy 2018. By James A. Bednar. At a special session of SciPy 2018 in Austin, representatives of a wide range of open-source Python visualization tools shared their visions for the future of […]
Choose Your Anaconda IDE Adventure: Jupyter, JupyterLab, or Apache Zeppelin

As humans, we are faced with multiple choices every day. Every person is different: some people prefer Firefox while others like Chrome; some people prefe…
Who You Gonna Call? Halloween Tips & Treats to Protect You from Ghosts, Gremlins and Software Vulnerabilities

At Anaconda, we’re not too scared about things that go bump in the night. We’ve examined the data and concluded that it’s just the cleaning staff upstairs. We are, however, kept awake by the ever-present concern of the security and experience of our users! We’d like to take this opportunity to discuss some of the […]
Bringing Dataframe Acceleration to the GPU with RAPIDS Open-Source Software from NVIDIA

Today we are excited to talk about the RAPIDS GPU dataframe release along with our partners in this effort: NVIDIA, BlazingDB, and Quansight. RAPIDS is …