Anaconda Supports Pandata, the Scalable Open-Source Analysis Stack for High-Powered Scientific Data Analysis
The Pandata stack is a groundbreaking set of powerful, general-purpose open-source data analytics tools offering fully scalable data processing for any domain.
AUSTIN, TX – August 8, 2023 – Anaconda Inc., the provider of the world’s most popular data science & AI platform, today announced its support of the Pandata stack, a collection of freely available open-source data analytics tools for processing data of any size in any domain. The collection includes high-performance, cloud-friendly, OS-independent Python libraries for data analysis, visualization, and processing at scale across all areas of research, development, and operations.
Python has become the most popular programming language in the world, partly because of the wide range of open-source libraries that cover almost every area of science, engineering, and data analysis. However, many of these tools are highly specialized and limited to solving narrow problems within a single domain.
Pandata’s stack of general-purpose, interoperable, and compositional tools includes Dask, Xarray, Numba, hvPlot, Jupyter, and more, providing a versatile and sustainable shared platform for data analysis and scientific computation. Collectively, Pandata covers data access, distributed computation, and interactive visualization for any domain at any scale, letting researchers and practitioners in each field focus on the much smaller body of code specific to their own domain.
As stewards of the open-source Python ecosystem, Anaconda has invested heavily in the creation of many of the tools within the Pandata stack, including Dask, Bokeh, and Panel. Anaconda staff also contribute extensively to other tools in the stack to ensure that all components work well together and to eliminate data bottlenecks. This critically important coordination work puts free, high-performance, and trusted data tools into the hands of data scientists, researchers, and developers for the first time.
“As the scale of scientific data analysis grows, traditional domain-specific software tools are hitting limits when managing increased data size and complexity,” said Dr. James Bednar, Director of Custom Services at Anaconda. “Within Python’s open-source ecosystem, general data-processing tools allow for flexibility and cross-domain collaboration with data of any kind. The Pandata stack takes advantage of this open-source model to bring a wide array of functionality to the traditional data stack that overcomes the limitations of legacy domain-specific stacks.”
The tools that make up the Pandata stack are maintained independently and can be used individually. When used together, however, they form a firm foundation for processing data of nearly any kind. The Pandata Python libraries are:
- Domain-independent: Maintained, used, and tested by people from many different backgrounds
- Efficient: Run at machine-code speeds using vectorized data or JIT compilation
- Scalable: Run on anything from a single-core laptop to a thousand-node cluster
- Cloud-friendly: Fully usable for local or remote compute using data on any file storage system
- Multi-architecture: Run on your desktop and on Mac/Windows/Linux CPUs and GPUs
- Scriptable: Fully support batch mode for parameter searches and unattended operation
- Compositional: Select which tools you need and put them together to solve your problem
- Visualizable: Support rendering even the largest datasets without conversion or approximation
- Interactive: Support fully interactive exploration, not just rendering static images or text files
- Shareable: Deployable as web apps for use by anyone anywhere
- OSS: Free, open, and ready for research or commercial use, without restrictive licensing
“We use the tools available in the Pandata stack on a daily basis for everything from building backend data engineering workflows to user-facing, data-driven applications,” said Dharhas Pothina, CTO of Quansight. “Having a healthy ecosystem of interoperable tools that scales from exploratory research to production deployments on massive datasets is critical to solving the hard problems our clients encounter. As avid users and contributors to Pandata, we are excited for its future.”
Anaconda is proud to support the open-source Python community and to contribute to advancing science and industry with tools like the Pandata stack. With its “mix and match” approach, data professionals across industries can build the data workflows that best fit their needs. To learn more about the Pandata stack, visit github.com/panstacks/pandata and follow Anaconda on Twitter and LinkedIn.
With more than 35 million users, Anaconda is the world’s most popular platform to develop and deploy secure Python solutions, faster. We pioneered the use of Python for data science, champion its vibrant community, and steward the open-source projects behind tomorrow’s artificial intelligence (AI) and machine learning (ML) breakthroughs. Our solutions enable practitioners and institutions around the world to securely harness the power of open source for competitive advantage and groundbreaking discoveries.
Visit Anaconda.com to learn more.