Hundreds of millions of Python packages are downloaded using Conda every month. That's why we are excited to announce the release of condastats, a conda statistics API with a Python interface and a command-line interface. Now anyone can use this tool to conduct research on usage statistics for Conda packages. This project is inspired by pypistats, which is a Python client and CLI for retrieving PyPI package statistics.
Data source
Since May 2019, we have published hourly summarized download data for all Conda packages, the conda-forge channel, and a few other channels. The dataset starts in January 2017 and is uploaded once a month. Condastats is built on top of this public Anaconda package data and returns monthly package download statistics.
Installation
condastats is released on conda-forge. To install condastats, run this command in your terminal:
conda install -c conda-forge condastats
Command-line interface
There are five sub-commands in the condastats command: overall, pkg_platform, data_source, pkg_version, and pkg_python. Run condastats --help in a terminal, or run !condastats --help in a Jupyter Notebook, to see all sub-commands:
In [1]:
!condastats --help
overall
condastats overall returns overall download statistics for one or more packages, optionally restricted to specific months and to a specified package platform, Python version, package version, and data source. Run condastats overall --help in a terminal, or run !condastats overall --help in a Jupyter Notebook, for details:
In [2]:
!condastats overall --help
The only required argument is package, which can be one or more packages. When given only package name(s), the command returns the total number of package downloads across all the available Anaconda public datasets, which cover 2017 through the end of last month. Here we show total package download statistics for one package (e.g., pandas), and for multiple packages (e.g., pandas, dask, and numpy).
In [3]:
!condastats overall pandas
In [4]:
!condastats overall pandas dask numpy
We can also get package download statistics for a specified month, package platform, data source, package version, and Python version:
In [5]:
!condastats overall pandas --month 2019-01 --pkg_platform linux-32 --data_source anaconda \
--pkg_version 0.10.0 --pkg_python 2.6
And finally, when we pass in the monthly argument, we will get monthly values.
In [6]:
!condastats overall pandas --start_month 2019-01 --end_month 2019-03 --monthly
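When these calls are scripted rather than typed, it can help to assemble the command line programmatically. The sketch below is a hypothetical helper (build_condastats_command is not part of condastats) that uses only the flags shown in the examples above and returns an argument list:

```python
import shlex

# Flags that take a value, as used in the examples in this post.
# The helper itself is hypothetical and not part of condastats.
VALUE_FLAGS = (
    "month", "start_month", "end_month",
    "pkg_platform", "data_source", "pkg_version", "pkg_python",
)


def build_condastats_command(subcommand, packages, monthly=False, **options):
    """Return the argument list for one condastats call."""
    if isinstance(packages, str):
        packages = [packages]
    argv = ["condastats", subcommand, *packages]
    for name in VALUE_FLAGS:
        value = options.pop(name, None)
        if value is not None:
            argv += [f"--{name}", str(value)]
    if options:
        # Reject anything that is not one of the flags listed above.
        raise ValueError(f"unknown options: {sorted(options)}")
    if monthly:
        argv.append("--monthly")
    return argv


cmd = build_condastats_command(
    "overall", ["pandas"],
    start_month="2019-01", end_month="2019-03", monthly=True,
)
print(shlex.join(cmd))
```

If condastats is installed, the resulting list can be handed to subprocess.run(cmd, check=True). Building a list rather than a single string avoids shell quoting issues.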
pkg_platform, data_source, pkg_version, and pkg_python
The other four subcommands have similar functions:
condastats pkg_platform returns package download counts by package platform.
condastats data_source returns package download counts by the data source.
condastats pkg_version returns package download counts by package version.
condastats pkg_python returns package download counts by python version.
Each subcommand has its own help page, for example condastats pkg_platform --help and condastats data_source --help:
In [7]:
!condastats pkg_platform --help
In [8]:
!condastats data_source --help
As with condastats overall, we can specify a month, or provide the start month and the end month of the time period we are interested in. For example, we can see package download counts for each python version for pandas for a specific month.
In [9]:
!condastats pkg_python pandas --month 2019-01
And we can see the monthly counts for each python version with the monthly flag.
In [10]:
!condastats pkg_python pandas --start_month 2019-01 --end_month 2019-02 --monthly
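Month arguments throughout are written as YYYY-MM strings. As a rough aid for checking a start_month and end_month pair before running a query, the sketch below lists the months in a window. It assumes both endpoints are inclusive, which this post does not state explicitly, and months_between is a hypothetical helper, not part of condastats:

```python
from datetime import date


def parse_month(text):
    """Parse a 'YYYY-MM' string into a date on the first day of that month."""
    year, month = text.split("-")
    return date(int(year), int(month), 1)  # raises ValueError for a bad month


def months_between(start_month, end_month):
    """List 'YYYY-MM' strings from start_month to end_month, both inclusive."""
    start, end = parse_month(start_month), parse_month(end_month)
    if start > end:
        raise ValueError("start_month must not be later than end_month")
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


print(months_between("2019-01", "2019-03"))
```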
Python interface
To use the Python interface, we need to import the functions from the condastats package by running:
In [11]:
from condastats.cli import overall, pkg_platform, pkg_version, pkg_python, data_source
Here are the function signatures for these five functions:
In [12]:
help(overall)
In [13]:
help(pkg_platform)
In [14]:
help(pkg_version)
In [15]:
help(pkg_python)
In [16]:
help(data_source)
Similar to the command-line interface, we can get the total package download counts for all the available data since 2017, for a given month, or for a given combination of specifications:
In [17]:
overall(['pandas', 'dask'])
Out[17]:
In [18]:
overall(['pandas', 'dask'], month='2019-01')
Out[18]:
In [19]:
overall('pandas', month='2019-01', pkg_platform='linux-32', data_source='anaconda', pkg_version='0.10.0', pkg_python=2.6)
Out[19]:
Similarly, pkg_platform, pkg_version, pkg_python, and data_source functions will give us package counts for each package platform, package version, python version, and data source for a given package. Here are two examples with pkg_python:
In [20]:
pkg_python('pandas', month='2019-01')
Out[20]:
In [21]:
pkg_python('pandas', start_month='2019-01', end_month='2019-02', monthly=True)
Out[21]:
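The pkg_python examples above query one package at a time. To gather the same breakdown for several packages, one option is a small loop around the function. The sketch below is a hypothetical helper (counts_by_package is not part of condastats) that takes the query function as an argument, so it can be tried offline with a stand-in; the stand-in's values are placeholders, not real download counts:

```python
def counts_by_package(fetch, packages, **filters):
    """Call fetch(package, **filters) once per package and collect the results."""
    return {package: fetch(package, **filters) for package in packages}


# Stand-in for condastats.cli.pkg_python so the sketch runs without network
# access. It only echoes its arguments and returns no download data.
def fake_pkg_python(package, month=None):
    return {"package": package, "month": month}


results = counts_by_package(fake_pkg_python, ["pandas", "dask"], month="2019-01")
print(sorted(results))
```

With condastats installed, pkg_python imported from condastats.cli could be passed in place of fake_pkg_python. The helper stores whatever the function returns unchanged, since the shape of the returned object is not shown here.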
We hope you find condastats useful! If you have any requests or issues, please open an issue or a pull request. If you have any questions regarding the Anaconda public dataset, please check out https://github.com/ContinuumIO/anaconda-package-data and open an issue there.