Scalable Machine Learning in the Enterprise with Dask


You’ve been hearing the hype for years: machine learning can have a magical, transformative impact on your business, putting key insights into the hands of decision-makers and driving industries forward. But many organizations today still struggle to extract value from their machine learning initiatives. Why?

Building & Training Models on Your Laptop is No Longer Good Enough

One of the biggest reasons machine learning projects fail to produce tangible results in the enterprise is the inability to scale models. Leading data scientists understand that, by harnessing larger training sets, they can build models that are more effective.

According to OpenAI, the compute used in the largest AI training runs has been growing exponentially, doubling roughly every 3.5 months. Their research shows that this compute has grown by more than 300,000x since 2012, the year AlexNet was released.

What this means is that data scientists must plan to scale their model training, as simply building and training models on their laptops is no longer good enough. But even the most popular tools for machine learning are not designed to scale. Scikit-learn, for example, works well with data that fits in memory on a single machine, but once your data or workload calls for a cluster of nodes, it cannot help you on its own.

Historically, data scientists have turned to distributed computing frameworks like Spark to train models on large datasets. This approach is not ideal, however. Even if an enterprise data science team has access to a Spark cluster, they still must rewrite their code for Spark, which is time-consuming and can introduce reproducibility errors.

However, as we learned in Anaconda Data Scientist Tom Augspurger’s recent webinar, Scalable Machine Learning with Dask, there is now an easy alternative for scaling model training: Dask.

Scale Your Machine Learning Systems in the Enterprise with Dask

With Dask, data scientists can use the familiar APIs they know (including scikit-learn, XGBoost, and TensorFlow) and, with slight modification, scale their model training to thousands of nodes on a cluster. As Tom demonstrated, data scientists can now perform compute-intensive tasks like hyperparameter optimization on large datasets with relative ease. This means that model training can take less time, and training on more data can improve model accuracy.
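To make the "slight modification" concrete, here is a minimal sketch of a scikit-learn grid search whose work is handed to Dask through joblib's Dask backend. It is an illustration under stated assumptions, not the code from Tom's webinar: the synthetic dataset, the SVC estimator, and the parameter grid are all made up for the example, and the Client here starts a small local cluster rather than connecting to a real one.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: a local, thread-based cluster for illustration only.
# To use a real cluster, pass its scheduler address to Client(...) instead.
client = Client(processes=False)

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Ordinary scikit-learn code: an estimator plus a small, illustrative grid.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)

# The only Dask-specific change: run the fit inside joblib's Dask backend,
# so the individual cross-validation fits are scheduled on the Dask workers.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
client.close()
```

The scikit-learn portion is unchanged from what you would run on a laptop; only the context manager around `fit` differs. Note that this pattern parallelizes the search itself and still assumes the training data fits in memory on each worker.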

Even better, Dask scales down nicely. For data volumes that don’t fit in memory but are not so large that a cluster is required, Dask can still parallelize computation on a single machine. This is extremely useful for data scientists who experiment with samples of their data. With Dask, they do not have to rewrite their code as they move from sample data to larger training sets, saving them significant time.
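As a rough sketch of what scaling down looks like, the snippet below uses a Dask array on a single machine with no cluster at all. The array sizes and chunk shape are arbitrary values chosen for illustration; the idea is that the array is processed one chunk at a time, so the same code can be pointed at a small sample or at data larger than memory.

```python
import dask.array as da

# Illustrative sizes: 20,000 rows x 1,000 columns, split into 1,000-row chunks.
# Each chunk is an ordinary NumPy array that Dask processes independently.
x = da.random.random((20_000, 1_000), chunks=(1_000, 1_000))

# Building the computation is lazy: nothing has been calculated yet.
col_means = x.mean(axis=0)

# compute() runs the work chunk by chunk, in parallel on the local machine.
result = col_means.compute()

print(result.shape)  # (1000,)
```

Changing the first dimension of the array, or swapping the random data for a dataset loaded from disk, leaves the rest of the code untouched.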

While Dask is open source and freely available, many of our enterprise customers leverage Dask as part of their Anaconda Enterprise deployments. Anaconda Enterprise is the AI enablement platform for data science teams at scale. It offers not only a scalable environment to build and train models on infrastructure ranging from a single machine to thousands of nodes, but also a simple, robust platform for model deployment and management.

With the single click of a button, data scientists can deploy models into production in seconds, quickly delivering insights into the hands of decision-makers without IT intervention or laborious server-side coding. Meanwhile, Anaconda Enterprise keeps IT happy by offering security, governance, and scale that provide them with control without sacrificing productivity.

If you missed Tom’s webinar, you can check it out here on-demand. To learn more about how Anaconda Enterprise can take your organization to the next level, contact us anytime for a demonstration.
