2018 Anaconda State of Data Science Report Released


We at Anaconda greatly value our data science community and are always striving to learn more about how you are using our products and how we can improve your overall experience.

With this goal in mind, we recently launched our first Anaconda State of Data Science Survey to better understand what users accomplish with Anaconda, what they think about it, and which data sources, visualization tools, and scale-out approaches they use. The survey ran from March 22 to April 30, 2018, and received 4,218 responses with a 100 percent completion rate.

In addition to giving us key insights into how to improve our products, the resulting report reveals current trends in data science and machine learning within the Anaconda community.

The State of Data Science

The Anaconda State of Data Science is strong. With 2 to 2.5 million downloads per month from January through March 2018, Anaconda is easily the most popular Python distribution, and it has a growing R following. Key findings of the survey include:

  • Applying cloud-native technologies such as Docker containers and Kubernetes to data science is growing at the expense of traditional Big Data (Hadoop/Spark).
  • Google Cloud’s data services outrank those of Amazon Web Services (AWS) and Microsoft Azure. Although Google Cloud is only the third-largest cloud provider, its focus on data services is paying off with the Anaconda community.
  • Anaconda is gaining popularity with software developers (15%), in addition to data scientists (16%) and academics (16%).
  • Matplotlib continues to enjoy its first-mover advantage in visualization, sweeping the category, but the space is highly crowded with many strong competitors, both open source and commercial. Plotly, Tableau, Microsoft Power BI, and Tibco Spotfire are all strong commercial competitors to Matplotlib and to other open source projects such as ggplot, Bokeh, D3, and Altair.
  • It matters a lot to our users that Anaconda is free, but much less that it is open source. Being free was ranked the most important attribute, while open source licensing ranked second to last.

To learn more about the 2018 State of Data Science, we invite you to download the full report here.

We’d like to thank all the respondents for taking the time to complete our survey and help us gain insights into the state of data science in 2018. Let’s do it again next year!
