Get Python Package Download Statistics with Condastats

Dec 02, 2019
Hundreds of millions of Python packages are downloaded using Conda every month. That's why we are excited to announce the release of condastats, a tool for querying Conda package download statistics through both a Python interface and a command line interface. Anyone can now use it to research usage statistics for Conda packages. The project is inspired by pypistats, a Python client and CLI for retrieving PyPI package statistics.

Data source

Since May 2019, we have published hourly summarized download data for all Conda packages, the conda-forge channel, and a few other channels. The dataset starts in January 2017, and new data is uploaded once a month. Condastats is built on top of this public Anaconda package data and returns monthly package download statistics.

Installation

condastats is released on conda-forge. To install condastats, run this command in your terminal:
conda install -c conda-forge condastats

Command line interface

The condastats command has five sub-commands: overall, pkg_platform, data_source, pkg_version, and pkg_python. Run condastats --help in a terminal or !condastats --help in a Jupyter Notebook to see all sub-commands:
In [1]:
!condastats --help
usage: condastats [-h]
 {overall,pkg_platform,data_source,pkg_version,pkg_python}
 ...

positional arguments:
 {overall,pkg_platform,data_source,pkg_version,pkg_python}

optional arguments:
 -h, --help show this help message and exit

overall

condastats overall returns overall download statistics for one or more packages, optionally restricted to specific months and to a specified package platform, Python version, package version, and data source. Run condastats overall --help in a terminal or !condastats overall --help in a Jupyter Notebook for details:
In [2]:
!condastats overall --help
usage: condastats overall [-h] [--month MONTH] [--start_month START_MONTH]
 [--end_month END_MONTH] [--monthly]
 [--pkg_platform PKG_PLATFORM]
 [--pkg_python PKG_PYTHON]
 [--pkg_version PKG_VERSION]
 [--data_source DATA_SOURCE]
 package [package ...]

positional arguments:
 package package name(s)

optional arguments:
 -h, --help show this help message and exit
 --month MONTH month - YYYY-MM (defalt: None)
 --start_month START_MONTH
 start month - YYYY-MM (defalt: None)
 --end_month END_MONTH
 end month - YYYY-MM (defalt: None)
 --monthly return monthly values (defalt: False)
 --pkg_platform PKG_PLATFORM
 package platform e.g., win-64, linux-32, osx-64.
 (defalt: None)
 --pkg_python PKG_PYTHON
 Python version e.g., 3.7 (defalt: None)
 --pkg_version PKG_VERSION
 Python version e.g., 0.1.0 (defalt: None)
 --data_source DATA_SOURCE
 Data source e.g., anaconda, conda-forge (defalt: None)
The only required argument is package, which can be one or more package names. Given only package name(s), the command returns the total download count across the entire Anaconda public dataset, which runs from 2017 through the end of last month. Here we show total download statistics for one package (pandas) and for multiple packages (pandas, dask, and numpy).
In [3]:
!condastats overall pandas
pkg_name
pandas 24086379
Name: counts, dtype: int64
In [4]:
!condastats overall pandas dask numpy
pkg_name
dask 7958854
numpy 53752580
pandas 24086379
Name: counts, dtype: int64
We can also get package download statistics for a specified month, package platform, data source, package version, and Python version:
In [5]:
!condastats overall pandas --month 2019-01 --pkg_platform linux-32 --data_source anaconda \
--pkg_version 0.10.0 --pkg_python 2.6
pkg_name
pandas 12
Name: counts, dtype: int64
Finally, when we pass the --monthly flag, we get monthly values instead of a single total.
In [6]:
!condastats overall pandas --start_month 2019-01 --end_month 2019-03 --monthly
pkg_name time 
pandas 2019-01 932443.0
 2019-02 1049595.0
 2019-03 1268802.0
Name: counts, dtype: float64
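When calling the CLI from a script rather than a notebook, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of condastats: the `build_overall_command` helper is a hypothetical name, and it uses only the flags listed in the `condastats overall --help` output above.

```python
def build_overall_command(packages, month=None, start_month=None,
                          end_month=None, monthly=False, **filters):
    """Build the argument list for `condastats overall` (hypothetical helper)."""
    # Only the filter flags documented in the help output are accepted.
    allowed = {"pkg_platform", "pkg_python", "pkg_version", "data_source"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")

    cmd = ["condastats", "overall", *packages]
    for flag, value in (("month", month),
                        ("start_month", start_month),
                        ("end_month", end_month)):
        if value is not None:
            cmd += [f"--{flag}", value]
    if monthly:
        cmd.append("--monthly")
    # Sort the filters so the resulting command is deterministic.
    for name in sorted(filters):
        cmd += [f"--{name}", str(filters[name])]
    return cmd


cmd = build_overall_command(["pandas"], start_month="2019-01",
                            end_month="2019-03", monthly=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the standard library, assuming condastats is installed and on the PATH.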

pkg_platform, data_source, pkg_version, and pkg_python

The other four subcommands have similar functions:
  • condastats pkg_platform returns package download counts by package platform.
  • condastats data_source returns package download counts by data source.
  • condastats pkg_version returns package download counts by package version.
  • condastats pkg_python returns package download counts by python version.
The arguments and optional arguments are the same across the four subcommands. Let's take a look at condastats pkg_platform --help and condastats data_source --help:
In [7]:
!condastats pkg_platform --help
usage: condastats pkg_platform [-h] [--month MONTH]
 [--start_month START_MONTH]
 [--end_month END_MONTH] [--monthly]
 package [package ...]

positional arguments:
 package package name(s)

optional arguments:
 -h, --help show this help message and exit
 --month MONTH month - YYYY-MM (defalt: None)
 --start_month START_MONTH
 start month - YYYY-MM (defalt: None)
 --end_month END_MONTH
 end month - YYYY-MM (defalt: None)
 --monthly return monthly values (defalt: False)
In [8]:
!condastats data_source --help
usage: condastats data_source [-h] [--month MONTH] [--start_month START_MONTH]
 [--end_month END_MONTH] [--monthly]
 package [package ...]

positional arguments:
 package package name(s)

optional arguments:
 -h, --help show this help message and exit
 --month MONTH month - YYYY-MM (defalt: None)
 --start_month START_MONTH
 start month - YYYY-MM (defalt: None)
 --end_month END_MONTH
 end month - YYYY-MM (defalt: None)
 --monthly return monthly values (defalt: False)
As with condastats overall, we can specify a single month, or provide the start month and end month of the period we are interested in. For example, we can see pandas download counts for each Python version in a specific month.
In [9]:
!condastats pkg_python pandas --month 2019-01
pkg_name pkg_python
pandas 2.6 1466.0
 2.7 247949.0
 3.3 1119.0
 3.4 9251.0
 3.5 104445.0
 3.6 468838.0
 3.7 99375.0
Name: counts, dtype: float64
And we can see the monthly counts for each Python version with the --monthly flag.
In [10]:
!condastats pkg_python pandas --start_month 2019-01 --end_month 2019-02 --monthly
pkg_name time pkg_python
pandas 2019-01 2.6 1466.0
 2.7 247949.0
 3.3 1119.0
 3.4 9251.0
 3.5 104445.0
 3.6 468838.0
 3.7 99375.0
 2019-02 2.6 1542.0
 2.7 242518.0
 3.3 1227.0
 3.4 8134.0
 3.5 83393.0
 3.6 541670.0
 3.7 171111.0
Name: counts, dtype: float64
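The monthly breakdown above lends itself to a quick month-over-month comparison. Here is a minimal sketch that works on the counts copied by hand from that output into a plain dictionary. The `monthly_counts` variable and the `change_by_python` helper are illustrative names, not part of condastats.

```python
# Counts copied from the `condastats pkg_python pandas --monthly` output above.
monthly_counts = {
    "2019-01": {"2.6": 1466, "2.7": 247949, "3.3": 1119, "3.4": 9251,
                "3.5": 104445, "3.6": 468838, "3.7": 99375},
    "2019-02": {"2.6": 1542, "2.7": 242518, "3.3": 1227, "3.4": 8134,
                "3.5": 83393, "3.6": 541670, "3.7": 171111},
}


def change_by_python(counts, earlier, later):
    """Return {python_version: later_count - earlier_count} (hypothetical helper)."""
    before, after = counts[earlier], counts[later]
    # A version missing from one month is treated as zero downloads.
    return {version: after.get(version, 0) - before.get(version, 0)
            for version in sorted(set(before) | set(after))}


changes = change_by_python(monthly_counts, "2019-01", "2019-02")
for version, delta in changes.items():
    print(f"Python {version}: {delta:+d}")
```

In this sample, downloads for Python 3.6 and 3.7 grew from January to February 2019, while those for 2.7, 3.4, and 3.5 declined.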

Python interface

To use the Python interface, we need to import the functions from the condastats package by running:
In [11]:
from condastats.cli import overall, pkg_platform, pkg_version, pkg_python, data_source
Here are the function signatures for these five functions:
In [12]:
help(overall)
Help on function overall in module condastats.cli:

overall(package, month=None, start_month=None, end_month=None, monthly=False, pkg_platform=None, data_source=None, pkg_version=None, pkg_python=None)

In [13]:
help(pkg_platform)
Help on function pkg_platform in module condastats.cli:

pkg_platform(package, month=None, start_month=None, end_month=None, monthly=False)

In [14]:
help(pkg_version)
Help on function pkg_version in module condastats.cli:

pkg_version(package, month=None, start_month=None, end_month=None, monthly=False)

In [15]:
help(pkg_python)
Help on function pkg_python in module condastats.cli:

pkg_python(package, month=None, start_month=None, end_month=None, monthly=False)

In [16]:
help(data_source)
Help on function data_source in module condastats.cli:

data_source(package, month=None, start_month=None, end_month=None, monthly=False)

As with the command line interface, we can get the total package download counts for all the available data since 2017, for a given month, or for a given combination of specifications:
In [17]:
overall(['pandas', 'dask'])
Out[17]:
pkg_name
dask 7958854
pandas 24086379
Name: counts, dtype: int64
In [18]:
overall(['pandas', 'dask'], month='2019-01')
Out[18]:
pkg_name
dask 221200
pandas 932443
Name: counts, dtype: int64
In [19]:
overall('pandas', month='2019-01', pkg_platform='linux-32', data_source='anaconda', pkg_version='0.10.0', pkg_python=2.6)
Out[19]:
pkg_name
pandas 12
Name: counts, dtype: int64
Similarly, the pkg_platform, pkg_version, pkg_python, and data_source functions return download counts broken down by package platform, package version, Python version, and data source for a given package. Here are two examples with pkg_python:
In [20]:
pkg_python('pandas', month='2019-01')
Out[20]:
pkg_name pkg_python
pandas 2.6 1466.0
 2.7 247949.0
 3.3 1119.0
 3.4 9251.0
 3.5 104445.0
 3.6 468838.0
 3.7 99375.0
Name: counts, dtype: float64
In [21]:
pkg_python('pandas', start_month='2019-01', end_month='2019-02', monthly=True)
Out[21]:
pkg_name time pkg_python
pandas 2019-01 2.6 1466.0
 2.7 247949.0
 3.3 1119.0
 3.4 9251.0
 3.5 104445.0
 3.6 468838.0
 3.7 99375.0
 2019-02 2.6 1542.0
 2.7 242518.0
 3.3 1227.0
 3.4 8134.0
 3.5 83393.0
 3.6 541670.0
 3.7 171111.0
Name: counts, dtype: float64
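Counts like these are often easier to read as shares of the total. The sketch below uses the January 2019 numbers from Out[20], copied by hand into a plain dictionary keyed by Python version. This is an assumption made for illustration: the actual result is indexed by both pkg_name and pkg_python and would need to be reduced to this shape first. The `share_by_major` helper is a hypothetical name, not part of condastats.

```python
# Counts copied from Out[20] above (pandas downloads in 2019-01 by Python version).
counts = {"2.6": 1466, "2.7": 247949, "3.3": 1119, "3.4": 9251,
          "3.5": 104445, "3.6": 468838, "3.7": 99375}


def share_by_major(counts):
    """Return {major_version: percent of total downloads} (hypothetical helper)."""
    total = sum(counts.values())
    by_major = {}
    for version, n in counts.items():
        major = version.split(".")[0]  # "3.6" -> "3"
        by_major[major] = by_major.get(major, 0) + n
    return {major: 100 * n / total for major, n in by_major.items()}


shares = share_by_major(counts)
for major, pct in sorted(shares.items()):
    print(f"Python {major}: {pct:.1f}%")
```

The per-version counts sum to 932443, which matches the January 2019 total for pandas reported by overall in Out[18].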
We hope you find condastats useful! If you have any requests or issues, please open an issue or a pull request. If you have any questions regarding the Anaconda public dataset, please check out https://github.com/ContinuumIO/anaconda-package-data and open an issue there.