Open Source is at the Core of Modern Software

Innovation through open collaboration has changed the technology industry forever

Anaconda Open Source Ecosystem

At Anaconda, we value open source software

We believe it is a privilege to be able to share ideas-as-code with people around the world. We continuously seek productive, sustainable ways to strengthen the open source foundation and create the architecture of the future. Through our work, we aim to empower people to improve lives and solve the world’s greatest challenges.

Innovating with New Projects to Meet Enterprise Needs

While much of the software we write at Anaconda is open source from the beginning, some of our software is not immediately freely available. Our software provides livelihoods for developers, which allows them to focus on writing software contributions to open source. We hope you will download our software and be satisfied with both the software and the knowledge that you are contributing to the present and future ecosystem of open source.

Blaze scales Python analytics to Big Data on multiple compute engines

Fast, scalable out-of-core computations on Big Data

Blaze extends the successful array-oriented programming model of NumPy and pandas to out-of-core, distributed, and streaming data. Blaze allows analysts and scientists to productively write robust and efficient code without getting bogged down in the details of how to distribute computation. It handles all kinds of data, but especially semi-structured, sparse, and columnar data.

Blaze supports data stores and stream engines including:

  • Bcolz: compressed columnar
  • MongoDB: NoSQL store
  • SQLAlchemy: SQL store
  • Apache Spark: cluster computing framework
  • PyTables: high performance HDF5
  • Streaming Python: streaming data

Bokeh scales visualization to Big Data

Interactive and real-time streaming visualization framework that scales to Big Data with data shading

Bokeh is a framework for creating versatile, interactive, browser-based visualizations of streaming or Big Data from Python, R or Scala, without writing any JavaScript. Its primary output backend is HTML5 Canvas.

There are many excellent plotting packages for Python, but they generally do not optimize for the particular needs of statistical plotting or multidimensional datasets. Additionally, advanced visual customization is typically difficult for non-programmers, and most libraries do not build a reified data processing pipeline that supports rich interactivity like linked brushing. Bokeh addresses these problems at their core by using a declarative data transformation scheme, and is engineered to operate in a client/server model for the modern web.

conda easily packages Python, R, NumPy, SciPy & more

Eliminates package dependency and version control issues

Conda is an innovative package manager tool that allows users to mix-and-match different versions of Python, NumPy, SciPy and other packages in isolated environments and easily switch between them.

The conda command is the primary interface for managing Anaconda installations. It is great for solving enterprise integration and application deployment challenges. It can query and search the Anaconda package index and current Anaconda installation, create new Anaconda environments, and install and update packages into existing Anaconda environments.

Dask parallelizes data science workloads on multi-cores and distributed clusters

Makes it easy to write complex parallel algorithms for task execution

Dask is a framework for parallelizing algorithms. It takes advantage of the available memory and compute power to optimize the memory use, execution time, and overall performance of complex algorithms. Dask creates a task graph based on the data and then intelligently schedules the execution of the tasks to optimize throughput.

While developers can parallelize Python manually, Dask helps to automate the task with rich primitives that are aware of the execution environment and optimize the analytic execution. Dask collections build on Dask to provide dask.array and dask.dataframe, collections that mimic NumPy and pandas but operate in parallel and on larger-than-memory datasets.
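As a rough illustration of the dask.array collection described above, the sketch below builds a chunked array, composes a NumPy-style expression, and only then asks Dask to execute the resulting task graph. The array shape and chunk sizes are arbitrary values chosen for the example, not recommendations.

```python
import dask.array as da

# A 4,000 x 4,000 array of ones, split into sixteen 1,000 x 1,000 chunks.
# The sizes are illustrative assumptions only.
x = da.ones((4000, 4000), chunks=(1000, 1000))

# NumPy-style expression. Nothing is computed yet: Dask only records
# the operations as a task graph.
total = (x + x.T).sum()

# compute() hands the task graph to the scheduler and returns the result.
result = total.compute()
print(result)
```

Because each chunk is processed as a separate task, the same code can work on arrays that do not fit in memory at once.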

Report bugs and make feature requests through the GitHub issue tracker. For community discussion, please use [email protected]

Datashader is a graphics pipeline system for creating meaningful representations of large amounts of data

Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way. Using highly optimized rendering routines written in Python but compiled to machine code using Numba, datashader makes it practical to work with extremely large datasets even on standard hardware.

HoloViews is a library for analyzing and visualizing scientific or engineering data

HoloViews is a Python library that makes analyzing and visualizing scientific or engineering data much simpler, more intuitive, and more easily reproducible. Instead of specifying every step for each plot, HoloViews lets you store your data in an annotated format that is instantly visualizable, with immediate access to both the numeric data and its visualization.

Jupyter Notebooks allow you to create and share documents that contain live code, equations, visualizations and explanatory text

The Notebook supports over 40 programming languages, including those popular in data science such as Python, R, Julia, and Scala. Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.

matplotlib is an easy-to-use interactive tool for publication-quality scientific plotting

matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits.
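A minimal sketch of the script-based, hardcopy workflow mentioned above might look like the following. It assumes the non-interactive Agg backend so that it runs without a display, and it writes the PNG to an in-memory buffer rather than a file. The data and labels are made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display required
import matplotlib.pyplot as plt

# Illustrative data only.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A simple line plot")

# Render the figure as PNG into memory; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Changing the `format` argument (for example to "pdf" or "svg") produces other hardcopy formats from the same figure.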

Numba speeds up NumPy and SciPy

Compiles Python into machine code for lightning fast execution

Numba is a just-in-time compiler for numerical Python code, including code that uses NumPy arrays and functions. It uses the LLVM compiler infrastructure to compile Python bytecode to machine code at run time.

NumPy provides fast vectors, matrices, and arrays in Python

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
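Two of the points above, broadcasting and user-defined data types, can be sketched briefly. The field names and values below are invented for illustration.

```python
import numpy as np

# Broadcasting: a 1-D array of three offsets is applied to every row
# of a 2 x 3 matrix without an explicit loop.
matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
offsets = np.array([10, 20, 30])
shifted = matrix + offsets            # [[10, 21, 32], [13, 24, 35]]

# A structured data type: each element is a record with named fields,
# similar to a row in a database table. Field names are hypothetical.
record_type = np.dtype([("name", "U10"), ("price", "f8")])
records = np.array([("apple", 1.5), ("pear", 2.25)], dtype=record_type)

# Fields can be accessed as whole columns.
mean_price = records["price"].mean()  # (1.5 + 2.25) / 2 = 1.875
print(shifted)
print(mean_price)
```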

NumPy is licensed under the BSD license, enabling reuse with few restrictions.

pandas provides fast, flexible, and expressive data structures for working with relational or labeled data

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
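As a brief sketch of what working with labeled data looks like, the example below builds a small DataFrame and aggregates it by a label column. The column names and figures are made up for the example.

```python
import pandas as pd

# A small table of labeled data. All values are illustrative.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "units": [10, 20, 30, 40],
    }
)

# Group rows by region and sum the units sold in each group.
totals = sales.groupby("region")["units"].sum()
print(totals)
```

The result is a Series indexed by region, so `totals["east"]` looks up a value by its label rather than by position.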

PhosphorJS simplifies and speeds up web apps

Fast, flexible, and efficient web framework

PhosphorJS is a framework for building high-performance, pluggable, desktop-style web applications that integrate easily with existing web frameworks. The PhosphorJS framework has well-defined, efficient widgets and layouts that allow a developer to design high-performance, responsive desktop-style apps for the web that consistently achieve sub-millisecond layouts. This efficient design maximizes the execution speed of business logic.

SciPy is a rich, powerful library for scientific computing

SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:

  • NumPy: Base N-dimensional array package
  • SciPy library: Fundamental library for scientific computing
  • Matplotlib: Comprehensive 2D plotting
  • IPython: Enhanced interactive console
  • SymPy: Symbolic mathematics
  • pandas: Data structures & analysis
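For a sense of what the SciPy library itself adds on top of NumPy, the sketch below uses `scipy.integrate.quad` to integrate a simple function numerically. The integrand and limits are arbitrary choices for the example.

```python
from scipy import integrate


def integrand(x):
    # f(x) = x^2, whose exact integral from 0 to 3 is 9.
    return x ** 2


# quad returns the estimated integral and an estimate of the absolute error.
area, error_estimate = integrate.quad(integrand, 0.0, 3.0)
print(area, error_estimate)
```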

Spyder is the Scientific Python Development Environment

Powerful interactive development and numerical computing environment for Python

Spyder is a powerful interactive development environment for the Python language, with advanced editing, interactive testing, debugging, and introspection features. It also serves as a numerical computing environment, thanks to its support for IPython (an enhanced interactive Python interpreter) and popular Python libraries such as NumPy (linear algebra), SciPy (signal and image processing), and matplotlib (interactive 2D/3D plotting).

Spyder may also be used as a library providing powerful console-related widgets for your PyQt-based applications – for example, it may be used to integrate a debugging console directly in the layout of your graphical user interface.