How PNC Financial Services Leveraged Anaconda to Enable Data Science and Machine Learning Capabilities Across the Company
Aug 29, 2018, by Anaconda Team
As an AI software company passionate about the real-world practice of data science, machine learning, and predictive analytics, we love hearing about the inspiring and innovative ways our customers use our products to drive their businesses forward and change the world around them.
Earlier this year, we hosted our second annual AnacondaCON data science and AI conference (mark your calendars for AnacondaCON 2019!), where several of our favorite customers shared their experiences with the Anaconda Enterprise AI enablement platform and extolled its virtues. And we won’t lie to you, dear readers: seeing their infectious enthusiasm for our work felt pretty good.
One such customer is PNC, a banking and financial services corporation operating in 19 states. Data Manager Ann Manchella and Data Scientist Jim Ogle regaled the AnacondaCON crowd with the story of how they went about building a data science "competency center" to enable data science and machine learning capabilities across the company. (Spoiler alert: they used Anaconda Enterprise.)
Making Python a First-Class Citizen
Back in 2015, PNC created a new Enterprise Data Management team that relied primarily on a proprietary data science platform. Based on that experience, the team took it upon themselves to convince management to switch to open source analytics and make Python a first-class citizen in their analytics environment.
The main argument at the time was that using open source Python and R in lieu of SAS and other commercial alternatives would, of course, dramatically reduce software costs. But there were other compelling arguments to support their case.
The team found that Python made debugging easier, shortened development time, and improved performance. The explosive growth of Python in recent years also meant a larger pool from which to recruit new talent, a strong user base to provide online community support, and easily accessible, inexpensive training. Furthermore, Python and R offer a huge collection of libraries, for everything from machine learning to visualization, that support the full analytics lifecycle.
Choosing Anaconda Enterprise as PNC’s AI Enablement Platform
Next, the team needed to choose an AI platform that would support their open source aspirations while providing the security and governance that enterprise IT requires. They developed a proof of concept with Anaconda and two other vendors, using a variety of residential real estate data that they loaded into their Hadoop environment. The team soon discovered that Anaconda Enterprise gave their analysts the powerful open source tools they wanted while letting them easily access data on their Hadoop cluster.
The beating heart of Anaconda Enterprise is the collection of core AI/ML tools in Python and R that data scientists use to build models. These tools are provided via Anaconda Distribution, the world’s most popular open source distribution of Python and R for data science. Anaconda Distribution brings powerful open source packages to over six million data scientists and developers worldwide, allowing them to build and train models on their laptops with ease.
The team found that Anaconda Enterprise was the ideal platform for porting SAS code to Python; training new data scientists; building, training, and deploying models; and storing packages. So the team chose Anaconda Enterprise as the core Python platform for PNC.
Building an Analytics Competency Center (ACC)
In 2017, the team began to set up an analytics competency center (ACC) to enable data science and machine learning capabilities across the entire company. The ACC comprised five main components: Community, Training, Help/Support, Enhancements, and Package Maintenance. With the Anaconda Enterprise platform, the ACC was able to institute an enterprise-wide capability for data science and machine learning, enabling the various departments within PNC to access tools, manage packages, build models, and receive training all from one place. The user base within PNC grew explosively, with departments from all over the bank eagerly taking advantage of the ACC’s many valuable services.
Other PNC departments quickly began using the ACC to build models that predict losses, protect the bank, and set prices, and the Management Information Systems group began building Python applications that support bank operations. Then Marketing came in with their market basket and pricing analyses, and even HR had metrics they wanted to process and report to stakeholders. The ACC made a point of bringing the Model Risk Management department into the fold, both to help shape how the ACC environment would grow and to validate its open source models under the bank's strict regulatory requirements. In short, departments all across the bank have come on board.
Improving Traditional Modeling Methods
With Anaconda, the ACC now handles all the modeling activities one would expect from a bank, including machine learning models, Probability of Default (PD), Loss Given Default (LGD), Exposure at Default (EAD), rate/price forecasting, scorecards, and Pre-Provision Net Revenue (PPNR).
They were even able to improve upon traditional Monte Carlo methods, which are widely used in banking applications. The ACC's approach was to perform deterministic computations on individual sets of inputs, send them to the Hadoop cluster to run, and then aggregate the results. Starting from event-level historical loss data, they built an aggregate loss distribution using traditional Monte Carlo techniques. The team noticed that two factors drove performance: the number of simulations and the size of the dataset. So they implemented the simulation in PySpark and compared that implementation with their existing SAS implementation while varying those two factors.
Their analysis showed that, by using Python and Hadoop, they could cut run times from hours to just minutes, making their data scientists faster and much more productive.
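To make the Monte Carlo step a little more concrete, here is a minimal single-machine sketch of how an aggregate loss distribution can be simulated from event-level loss data. It is not PNC's implementation. The Poisson event-frequency model, the bootstrap (resample-with-replacement) severity model, the function names, and the sample losses are all hypothetical assumptions chosen for illustration. Each batch gets its own seed, so every batch is a deterministic computation on its own inputs, and the batch results are aggregated at the end, loosely mirroring the run-then-aggregate pattern described above.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_batch(seed, losses, annual_rate, n_sims):
    """Simulate n_sims years of aggregate loss in one independent, seeded batch."""
    rng = random.Random(seed)  # per-batch seed: the batch is deterministic given its inputs
    totals = []
    for _ in range(n_sims):
        n_events = sample_poisson(rng, annual_rate)  # assumed Poisson event frequency
        # Assumed severity model: resample historical event losses with replacement
        totals.append(sum(rng.choice(losses) for _ in range(n_events)))
    return totals


def aggregate_loss_distribution(losses, years_observed, n_batches, sims_per_batch):
    """Run independent batches and concatenate their simulated annual losses."""
    annual_rate = len(losses) / years_observed  # average events per year in the history
    results = []
    for seed in range(n_batches):
        results.extend(simulate_batch(seed, losses, annual_rate, sims_per_batch))
    return results


def empirical_quantile(values, q):
    """Nearest-rank quantile of the simulated distribution, for 0 < q <= 1."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


if __name__ == "__main__":
    # Hypothetical event-level losses observed over 4 years (illustrative, not real data)
    historical_losses = [12_000.0, 3_500.0, 48_000.0, 7_200.0,
                         150_000.0, 2_100.0, 9_800.0, 23_000.0]
    dist = aggregate_loss_distribution(historical_losses, years_observed=4,
                                       n_batches=4, sims_per_batch=5_000)
    print(f"simulated years:  {len(dist)}")
    print(f"mean annual loss: {sum(dist) / len(dist):,.0f}")
    print(f"99.9% quantile:   {empirical_quantile(dist, 0.999):,.0f}")
```

Because each batch depends only on its seed and inputs, the loop over seeds is the natural unit to distribute. In PySpark, one could, for example, parallelize the seeds with `sc.parallelize(range(n_batches)).flatMap(lambda s: simulate_batch(s, losses, rate, sims_per_batch))`; that line is an untested sketch, not the ACC's code.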
The team at PNC is looking forward to migrating to Anaconda Enterprise 5.2. According to Ann, the ACC is particularly excited about the platform’s simple one-click deployments, because what good is a model if it can’t be deployed into production? Anaconda Enterprise makes it easy to schedule or live-deploy notebooks, dashboards, and machine learning models, and to publish any project to production with a single click.
As a bank spanning multiple states, PNC also needs its users to collaborate and share work while maintaining security and governance. Developers need to work together on centralized code bases from different locations. According to Ann and Jim, Anaconda has played a major role in making that happen for them, and the ACC expects that migrating to Anaconda Enterprise 5.2 will make it even easier.
We at Anaconda loved hearing about PNC’s journey to making data science and machine learning capabilities available across the company, and cannot wait to see what’s in store for them next! If you’d like to learn more about what Anaconda Enterprise can do for your organization, we suggest you watch our recent webinar, Deploying AI at Scale, or reach out to us anytime to schedule a demo.