Data science innovation requires availability, transparency and interoperability. But what does that mean in practice? At Anaconda, it means providing data scientists with open source tools that facilitate collaboration and move beyond analytics to intelligence. Open source projects are the foundation of modern data science and are popping up across industries, making data science more accessible, more interactive and more effective. So, who’s leading the open source charge in the data science community? Here are five organizations to keep your eye on:

1. TaxBrain. TaxBrain is a platform that enables policy makers and the public to simulate and study the effects of tax policy reforms using open source economic models. Using the open source platform, anyone can plug in elements of the administration’s proposed tax policy to get an idea of how it would perform in the real world.

2. Recursion Pharmaceuticals. Recursion is a pharmaceutical company dedicated to finding remedies for rare genetic diseases. Its drug discovery assay is built on an open source software platform, combining biological science with machine learning techniques to visualize cell data and test drugs efficiently. This approach shortens the research and development process, reducing time to market for remedies to these rare genetic diseases. Its goal is to treat 100 diseases by 2026 using this method.

3. The U.S. Government. Under the previous administration, the U.S. government launched Data.gov, an open data initiative that offers more than 197K datasets for public use. This database exists, in part, thanks to the former U.S. chief data scientist, DJ Patil. He helped drive the government’s data science projects forward at the city, state and federal levels. Recently, concerns have been raised over the Data.gov portal, as certain information has started to disappear. Data scientists are keeping a sharp eye on the portal to ensure that these resources are updated and preserved for future innovative projects.

4. Comcast. Telecom and broadcast giant Comcast runs its projects on open source platforms to drive data science innovation in the industry. For example, earlier this month, Comcast’s advertising branch announced it was creating a Blockchain Insights Platform to make the planning, targeting, execution and measurement of video ads more efficient. This data-driven, secure approach would be a game changer for the advertising industry, which eagerly awaits its launch in 2018.

5. DARPA. The Defense Advanced Research Projects Agency (DARPA) is behind the Memex project, a program dedicated to fighting human trafficking, which is a top mission for the Defense Department. DARPA estimates that in two years, traffickers spent $250 million posting the temporary advertisements that fuel the human trafficking trade. Using an open source platform, Memex is able to index and cross-reference interactive and social media, text, images and video across the web. This allows Memex to find the patterns in web data that indicate human trafficking. Memex’s data science approach is already credited with generating at least 20 active cases and nine open indictments.

These are just a few examples of open source-fueled data science turning industries on their heads, bringing important data to the public and generally making the world a better place. What will be the next open source project to put data science in the headlines? Let us know what you think in the comments below!