When examining the success of one of the most influential and iconic rock bands of all time, there’s no doubt that talent played a huge role. However, it would be unrealistic to attribute the phenomenon that was The Beatles to musical talent alone. Much of their success can be credited to the behind-the-scenes work of trusted advisors, managers and producers. Many layers beneath the surface contributed to their incredible fame, including assembling the right team and tools to propel them from obscurity to global commercial success.

 

 

Open Source: Where to Start

As in the music industry, success in Open Data Science relies on many layers, including motivated data scientists, proper tools and the right vision for how to leverage data and perspective. Open Data Science is not a single technology, but a revolution within the data science community. It is an inclusive movement that connects open source tools for data science (preparation, analytics and visualization) so they can easily work together as a connected ecosystem. The challenge lies in navigating that ecosystem successfully and identifying the right Open Data Science enterprise vendors to partner with for the journey.

Most organizations have come to understand the value of Open Data Science, but they often struggle with how to adopt and implement it. Some select a “DIY” method when addressing open source, choosing one of the languages or tools available at low or no cost. Others augment an open source base and build proprietary technology into existing infrastructures to address data science needs. 

Most organizations already choose enterprise-grade products and services in other areas, such as unified communication and collaboration tools, rather than opting for short-run cost savings. For example, using consumer-grade instant messaging and mobile phones might save money this quarter, but over time this choice will end up costing an organization much more. The added cost comes from the labor and other services needed to make up for the missing enterprise features, performance for enterprise use cases, and support and maintenance that are essential to successful production usage.

The same standards apply to Open Data Science and the open source software that surrounds this movement. While it is tempting to go it alone with open source and avoid paying a vendor, that strategy has fundamental problems: delayed deliverables, staffing challenges, software maintenance headaches and frustration when innovative open source communities move faster than an organization can manage or in an unexpected direction. All of this hurts the bottom line and can be avoided by finding an open source vendor that can navigate the complexity and ensure the best use of what is available in Open Data Science. The next section discusses three specific reasons it is important to choose vendors that can leverage open source effectively in the enterprise.

Finding Success: The Importance of Choosing the Right Vendor/Partner

First, look for a vendor who is contributing significantly to the open source ecosystem. An open source vendor will not only provide enterprise solutions and services on top of existing open source, but will also produce significant open source innovations themselves—building communities like PyData, as well as contributing to open source organizations like The Apache Software Foundation, NumFOCUS or Software Freedom Conservancy. In this way, the software purchase translates directly into sustainability for the entire open source ecosystem. This will also ensure that the open source vendor is plugged into where the impactful open source communities are heading. 

Second, raw open source provides a fantastic foundation of innovation, but invariably does not contain all the common features necessary to adapt to an enterprise environment. Integration with disparate data sources, enterprise databases, single sign-on systems, scale-out management tools, tools for governance and control, as well as time-saving user interfaces, are all examples of things that typically do not exist in open source or exist in a very early form that lags behind proprietary offerings. Using internal resources to provide these common, enterprise-grade additions costs more money in the long run than purchasing these features from an open source vendor. 

The figure on the left below shows the kinds of ad-hoc layers that a company must typically create to adapt their applications, processes and workflows to what is available in open source. These ad-hoc layers are not unique to any one business, are hard to maintain and end up costing a lot more money than a software subscription from an open source vendor that would cover these capabilities with some of their enterprise offerings. 

The figure on the right above shows the addition of an enterprise layer that should be provided by an open source vendor. This layer can be proprietary, which will enable the vendor to build a sustainable software business that attracts investment, while it solves the fundamental adaptation problem as well. As long as the vendor is deeply connected to open source ecosystems and is constantly aware of what part of the stack is better maintained as open source, businesses receive the best of supported enterprise software without the painful lock-in and innovation gaps of traditional proprietary-only software.

Maintaining ad-hoc interfaces to open source becomes very expensive, very quickly. Each interface is typically understood by only a few people in an organization, and if they leave or move to different roles, the organization's ability to make changes evaporates. In addition, rather than amortizing the cost of these interfaces over thousands of companies as a software vendor can, the business pays the entire cost on its own. This does not yet include the opportunity cost of tying up internal resources to build and maintain these common enterprise features instead of working on the software that is unique to the business. The best return from scarce software development talent comes from software that is critical to the business and gives it a unique edge. Nor does it include the time-to-market gaps that occur when organizations try to go it alone rather than selecting an open source vendor who becomes a strategic partner. Engaging an open source vendor who has in-depth knowledge of the technology, is committed to growing the open source ecosystem and has the ability to make the Open Data Science ecosystem work for enterprises saves organizations significant time and money.

Finally, working with an open source vendor provides a much-needed avenue for the integration services, training and long-term support that are necessary when adapting an open source ecosystem to the enterprise. Open source communities develop for many reasons, but they are typically united in a passion for rapid innovation and continual progress. Adapting the rapid pace of this innovation to the more methodical gear of enterprise value creation requires a trusted open source vendor. Long-term support of older software releases, bug fixes that are less interesting to the community but essential to enterprises, and industry-specific training for data science teams are all needed to fully leverage Open Data Science in the enterprise. The right enterprise vendor will help an enterprise obtain all of this seamlessly.

The New World Order: Adopting Open Data Science in the Enterprise

Successful data science in the enterprise depends on combining the proper resources and tools. In-house IT typically does not have the expertise needed to exploit the immense possibilities inherent in Open Data Science.

Open Data Science platforms, like Anaconda, are a key mechanism for adopting Open Data Science across an organization. These platforms offer differing levels of empowerment for everyone from the citizen data scientist to the global enterprise data science team. An enterprise has different Open Data Science needs from an individual or a small business. While the free foundational core of Anaconda may be enough for the individual data explorer or the small business looking to use marketing data to target market segments, a large enterprise will typically need much more support and more enterprise features in order to successfully implement open source, and therefore Open Data Science, across the organization. Because of this, it is critical that larger organizations identify an enterprise open source vendor to provide both support and guidance as they implement Open Data Science. This vendor should also be able to provide the enterprise layer between the applications, processes and workflows that the data science team produces and the diverse open source ecosystem. The complexity inherent in maximizing insights from data will demand proficiency from both the team and its vendors in order to harness the power of the data and transform the business into one that is first data-aware and then data-driven.

Anaconda allows enterprises to innovate faster. It exposes previously unknown insights and improves the relationship between all members of the data science team. As a platform that embraces and deeply supports open source, it helps businesses take full advantage of both the innovation at the core of the Open Data Science movement and the enterprise adaptation that is essential to leveraging the full power of open source effectively in the business. It’s time to remove the chaos from open source and use Open Data Science platforms to simplify things, so that enterprises can fully realize their own superpowers to change the world.


About the Author

Travis Oliphant

President, Chief Data Scientist & Co-Founder

Travis holds a PhD from the Mayo Clinic and BS and MS degrees in mathematics and electrical engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably …
