Behind the Code of Conda: Q&A with principal developer Cheng Lee
Data science and related fields have been born in and pushed forward by open-source projects. Open-source communities allow people to work together to solve larger problems. As stewards of the data science community, we believe it is important to go behind the lines of code to shine a light on those doing the work in open source. In a series of blog posts, we’ll highlight several Anaconda employees, the open-source projects they work on, and how their work is making an impact on the larger field.
Q: What is Conda?
Conda is the flagship open-source project that Anaconda manages. It’s a package and environment management system that makes doing data science much easier. Instead of having to contact your system administrator or go through multiple sources to install software for data science, you can grab Conda and let it handle the work of installing software, managing multiple versions and software environments, and dealing with upgrades.
Conda itself is a tool, but it’s also a broad ecosystem of community-supported packages. It’s not just Anaconda that produces packages; we rely on a really large community to support our users. Conda-forge and Bioconda are two of the biggest and best-known sub-communities, but partner companies and projects, other volunteer groups, and individuals all play a big role in the broader Conda ecosystem.
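To make the workflow described in this answer a little more concrete, here is a minimal Python sketch that assembles the Conda commands for creating an environment, installing a package from a community channel, and upgrading everything. The helper names, the environment name `analysis`, and the package specs are hypothetical choices for illustration. The commands are only built as argument lists and printed, not executed, so the sketch runs without Conda installed.

```python
from typing import List, Optional


def create_env(name: str, specs: List[str]) -> List[str]:
    """Build the command that creates a named environment with the given specs."""
    return ["conda", "create", "--yes", "--name", name, *specs]


def install(name: str, specs: List[str], channel: Optional[str] = None) -> List[str]:
    """Build the command that installs packages into an existing environment.

    If a channel (for example, conda-forge) is given, it is passed via --channel.
    """
    cmd = ["conda", "install", "--yes", "--name", name]
    if channel is not None:
        cmd += ["--channel", channel]
    return cmd + list(specs)


def update_all(name: str) -> List[str]:
    """Build the command that upgrades every package in an environment."""
    return ["conda", "update", "--yes", "--name", name, "--all"]


if __name__ == "__main__":
    # Hypothetical environment name and package specs, for illustration only.
    commands = [
        create_env("analysis", ["python=3.8", "numpy"]),
        install("analysis", ["pandas"], channel="conda-forge"),
        update_all("analysis"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

To actually run one of these commands, the list could be handed to `subprocess.run(cmd, check=True)` on a machine where Conda is installed.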
Q: What is your role within Conda?
I am one of the principal Conda core team members. I do a lot of the development work on Conda itself, and I also handle the community aspects, such as responding to issues and pull requests, managing the public roadmap for the Conda technology stack, and meeting with community members. All my time at Anaconda, in some way or another, touches this project. That doesn’t mean I am directly coding on Conda at all times; I may be meeting with our customers and partners, gathering feedback from community teams, or working with other members of the Anaconda distribution team who develop Conda-related software and packages.
Q: What contribution are you most proud of?
I’ve been working on Conda as an Anaconda employee for a little over six months. My last job was at a company that was an Anaconda partner, so I’ve been working within the Conda ecosystem for more than five years.
Helping to build out the package ecosystem is the contribution I’m most proud of. At my last job at a biotech startup, we built Conda packages for common bioinformatics and analysis tools, and we strongly recommended Conda as the standard way for our customers to install and manage the software environments they used to analyze their scientific data.
Conda and its package ecosystem make creating, managing, and sharing complex technical computing (software) environments easier, especially for users who are not necessarily trained as software developers or system administrators. This allows them to focus on doing their science and analytics without having to worry about building, installing, and managing software.
Q: What are you working on now that you are most excited to release?
We’re trying to make it easier for people who are not Anaconda employees to help develop and maintain Conda. We’re moving Conda and its development processes in a direction where community members can more quickly and easily contribute code, triage issues and PRs, and develop additional tools for Conda and its package ecosystem.
The feedback we’ve heard from the community is that the way Conda is currently structured makes the code somewhat hard to understand and modify. We want to simplify and better modularize Conda’s code base, so that people can feel confident working on it without having to worry about horribly breaking things. For example, the solver’s performance and rather opaque messages have been long-standing issues for users; we’re constantly trying to improve those aspects of Conda and would love to get more community contributions.
Conda the application is focused on creating environments and installing packages. However, growing and sustaining the ecosystem requires thinking about things beyond Conda itself. How can users reliably reproduce environments across different platforms? What should “next generation” Conda packages and recipes look like? How do we package and distribute tools written in other languages and runtimes (Julia, for example)? How can the community expand the ecosystem to platforms Anaconda, Inc. does not “officially” support (e.g., the open-source BSDs)? The community has been working on answering these questions, and that’s something we’re excited about and want to continue encouraging.
Q: How important is feedback from users to the evolution of Conda?
The Conda roadmap is driven almost entirely by its user base. In addition to the large open-source community, this user base includes Anaconda’s various commercial partners and customers. Anaconda’s Commercial, Team, and Enterprise Edition products are built on Conda, so the Conda roadmap must include items to support those solutions. A big part of my role is ensuring we get input from across this diverse user base and balancing their requirements so no one group has an outsized influence on Conda’s development.
In the past, we haven’t done as good a job as we should have of communicating feedback between user groups and making each group aware of the others’ needs. We are working on being more transparent about why we make certain design decisions or code changes.
Q: What use cases for your project do you find the most interesting or surprising? What software does Conda enable?
I’ve always been aware of Conda’s popularity, but until I started working at Anaconda, I don’t think I truly grasped the scale of its reach. As of early October, the average number of monthly Conda users in 2020 was 4.13 million, and the number of packages shipped in 2020 was more than 3 billion. We have become the standard way of installing frameworks like PyTorch and TensorFlow.
I love discovering the use cases people have found for Conda beyond “a tool for installing data science software”. As I alluded to before, Conda has become rather popular in the bioinformatics space; the Bioconda channel specializes in packaging software for biology and is nearing 8,000 packages. At least one DNA sequencer manufacturer uses it to distribute the analysis tools for its instruments. I’ve seen Conda being used at high-performance computing centers to support having multiple versions of various software environments on the same supercomputer. There are Conda packages for astronomy, CFD, GIS, etc., and I know of at least one group working on packaging the Robot Operating System using Conda.
One of the great things about Conda that maybe more people should know about is that it’s not limited to Python or to data science tools. Conda is especially useful if, as in bioinformatics, your analysis stack requires multiple tools and running scripts written in multiple languages (e.g., Python, R, and Perl). And it’s nearly indispensable if various steps in your workflow require different versions of the same tools, a challenge that’s more difficult to deal with using most other packaging systems.
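One way to picture that last point is to give each workflow step its own environment, so two versions of the same tool never have to coexist. The following is a minimal Python sketch under stated assumptions: the step names, environment names, script names, and the `plan` helper are all hypothetical, and the commands are printed rather than run.

```python
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    env: str            # environment dedicated to this step
    specs: List[str]    # package specs, including the pinned tool version
    command: List[str]  # what the step runs inside its environment


# Hypothetical workflow: two steps need different versions of the same tool
# (here, the Python interpreter itself).
WORKFLOW: Dict[str, Step] = {
    "legacy_cleanup": Step("cleanup-py27", ["python=2.7"], ["python", "cleanup.py"]),
    "modern_report": Step("report-py38", ["python=3.8"], ["python", "report.py"]),
}


def plan(workflow: Dict[str, Step]) -> List[List[str]]:
    """List the Conda commands: first create every environment, then run each step."""
    commands: List[List[str]] = []
    for step in workflow.values():
        commands.append(["conda", "create", "--yes", "--name", step.env, *step.specs])
    for step in workflow.values():
        commands.append(["conda", "run", "--name", step.env, *step.command])
    return commands


if __name__ == "__main__":
    for cmd in plan(WORKFLOW):
        print(" ".join(cmd))
```

Because each step names its own environment, changing the version pinned for one step does not disturb the other.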
Q: What do you envision for this project a year from now?
As a company, we remain committed to keeping Conda open source and want to make it easier for the community to participate. We want to foster increased community contributions to Conda through 2020 and into 2021; a year from now, I would like it if a significant fraction of contributions to Conda came from people outside of Anaconda.
In addition, we’ve started a Conda tools community organization, and I would like to see some of its tools come out of the incubator phase to become fully fledged, non-Anaconda-maintained tools early next year.
Q: In your mind, what is the value of open-source projects?
The biggest value is a sense of community ownership over the modern world. Software is such a huge part of the modern world, and open source allows software to be more available to everyone. It makes it easier for people to get software; it makes it easier for people to address their needs rather than relying on a vendor; and it lowers the barrier to entry for using and developing software.
At Anaconda, we’re proud to support our employees’ involvement in open-source initiatives. To learn more about Conda and how we contribute to other open-source projects, visit our Open Source page.