Behind the Code of HoloViz: Q&A with Sr. Software Engineer Philipp Rudiger
Nov 10, 2020, by Kelly Davis-Felner
Data science and related fields were born in, and have been pushed forward by, open-source projects. Open-source communities allow people to work together to solve larger problems. As stewards of the data science community, we believe it is important to go behind the lines of code and shine a light on those doing the work in open source. In a series of blog posts, we'll highlight several Anaconda employees, the open-source projects they work on, and how their work is making an impact on the larger field.
Q: What is HoloViz?
HoloViz is a set of seven Anaconda-developed tools that make visualization and dashboarding easier. It includes libraries like HoloViews, which was the first library and gave the project its name. The idea behind HoloViews was to make exploratory visualization easier. GeoViews extends HoloViews for geographic data.
From HoloViews, we built hvPlot, which provides an even quicker entry point for people familiar with the Pandas .plot interface. hvPlot also interfaces with Datashader, another one of the tools. Datashader takes large volumes of data and renders them very quickly into a form small enough to send to a web browser, so you can quickly explore really large data sets.
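Datashader's core idea can be sketched as "aggregate, then shade": every point is counted into a fixed-size grid, and the counts are then mapped to pixel intensities, so what reaches the browser scales with the grid rather than with the number of points. The sketch below is a hedged, standard-library-only illustration of that idea; `aggregate_counts` and `shade` are hypothetical names, not Datashader's API, and Datashader itself is far faster and more general.

```python
"""Aggregate-then-shade sketch of the idea behind Datashader.

Hypothetical helpers using only the standard library; this is NOT
Datashader's API. The output size is fixed by the grid ("canvas"),
no matter how many points are aggregated.
"""
import random


def aggregate_counts(xs, ys, width, height, x_range, y_range):
    """Count how many points fall into each cell of a height x width grid."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Points outside the visible ranges are simply dropped.
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # Map data coordinates to cell indices; the clamp guards float rounding.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


def shade(grid):
    """Linearly map counts to 0-255 intensities (the busiest cell gets 255)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[round(255 * count / peak) for count in row] for row in grid]


if __name__ == "__main__":
    random.seed(0)
    n = 100_000  # try larger values: the image below stays the same size
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    image = shade(aggregate_counts(xs, ys, 40, 20, (-3.0, 3.0), (-3.0, 3.0)))
    print(len(image), "rows x", len(image[0]), "columns")
```

A linear mapping is used here only for simplicity; with skewed data it washes out sparse regions, which is why Datashader also offers nonlinear color mappings such as histogram equalization.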
About a year and a half ago, we noticed a recurring issue while creating dashboards for our Anaconda clients. Many of our clients had done their analysis in a notebook or a script, were producing plots, and wanted to share the results with someone. To address this, we created a library called Panel. Panel allows you to take your analysis, drop it into rows, columns, and more complex layouts, and publish that as a polished web application.
What we've seen in lots of companies is that they have a data science team that does daily analyses and needs to share them. This involves a handover to a separate team of web developers, whose dashboard is then passed on to management. There is often a crazy feedback loop that takes forever and loses a lot in translation between these teams. We want to empower data scientists to take their analyses, bundle them up, and share them quickly. Together, these tools make browser-based data visualization in Python easier to use and to learn, as well as more powerful.
Q: What is your role within HoloViz?
I wrote a lot of these tools: HoloViews, hvPlot, and Panel. HoloViews actually came out of a PhD project that I did at the University of Edinburgh; that's where Jean-Luc Stevens and I developed the HoloViews software to visualize huge amounts of data. I had a horrible workflow where I would generate 200-page PDF files that I would flip through just to see results. I built software to solve that, and that's how HoloViews was born. From there I've been the main author of tools like GeoViews, hvPlot, and most recently Panel.
I am also a core developer on Bokeh, the library that powers these tools. I have been driving a lot of the tools that are part of HoloViz and contributing to the others, like Datashader.
Q: Who are the primary users of HoloViz?
In addition to Anaconda clients, there is also the general user community. We have a tight feedback loop with our clients and an active Discourse forum where users can ask questions, post examples, and share experiences. There is also the GitHub issue tracker, where people file issues and ask for features or bug fixes.
It depends on the project: certain projects are more tailored to what our customers are doing. In general, the community has recently adopted many of these projects at a healthy rate. I'd love to start doing a weekly stand-up meeting with the community.
Q: What contribution to this project are you most proud of?
I'm really excited about the Panel library, which was announced publicly in the summer of 2019. It was the first project where I had six months of runway to design a library from scratch and develop a vision. I'm very happy with it. In terms of code, it's the best software I've written. In terms of functionality, it's really exciting to be able to build a data science application so easily.
Q: What are you working on now that you are most excited to release?
I'm currently working on a new library that builds on Panel. It offers a new way to quickly monitor any data source using a declarative YAML specification. With 100 lines of quite readable YAML, you get a full dashboard that monitors any data source. We originally built it to monitor deployments of our dashboards, but while writing it, I quickly realized that this was a general tool that many organizations need. Organizations have data sources they want to monitor, like a database or a Kubernetes cluster, and they can now write a YAML specification to build a monitoring dashboard very easily. The project is called Lumen and should be announced before the end of the year.
Q: What do you envision for this project in a year from now?
A lot of the exciting work we've been doing is integration with new data backends. In particular, we now have a data backend for GPUs, which is built by NVIDIA. We're working on the ability to build highly performant dashboards that explore really large data sets and drill down into individual observations. I'm really excited about the integration with these new tools and the ability to perform complex cross-filtering on these data sets.
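Cross-filtering means that a selection made in one view of a data set filters what every other linked view shows. The sketch below illustrates the idea on plain Python lists under stated assumptions: rows are dicts, selections are inclusive `(lo, hi)` ranges per column, and `passes` and `cross_filter` are hypothetical names rather than any HoloViz or cuxfilter API. A GPU data backend would presumably apply such filters to whole columns at once instead of looping over rows.

```python
"""Pure-Python sketch of cross-filtering (not the HoloViz or cuxfilter API)."""


def passes(row, selections, skip=None):
    """True if `row` lies inside every (lo, hi) selection except the one on `skip`."""
    return all(
        lo <= row[col] <= hi
        for col, (lo, hi) in selections.items()
        if col != skip
    )


def cross_filter(rows, views, selections):
    """Return, per view column, the values that survive the *other* views' selections.

    Each view ignores its own selection so its full range stays visible
    for re-brushing, a common crossfilter convention (assumed here).
    """
    return {
        view: [row[view] for row in rows if passes(row, selections, skip=view)]
        for view in views
    }


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [{"x": 1, "y": 10}, {"x": 4, "y": 20}, {"x": 9, "y": 30}]
    # Brushing x in [0, 5] narrows what the y view shows, and vice versa.
    print(cross_filter(rows, ["x", "y"], {"x": (0, 5), "y": (15, 40)}))
    # prints {'x': [4, 9], 'y': [10, 20]}
```

Whether a view should also respect its own selection is a design choice; this sketch follows the crossfilter-style convention of excluding it.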
Q: What has been the biggest challenge while working on this project?
Being an open-source maintainer is hard. Everyone has requests for features and bug fixes. Being aware of all the different stakeholders and keeping up with the sheer volume of issues can be challenging. Another thing I'm still actively learning about is community building, which is a central part of any open-source project but something no one teaches you.
Q: What use cases for your project do you find the most interesting or surprising? What software does HoloViz enable?
I mentioned that some of the libraries within HoloViz have recently been adopted by the community. Tools like Datashader and hvPlot have been adopted quite a bit by the geoscience community. In particular, there's an initiative called Pangeo, which is made up of climate and ocean modelers who visualize large amounts of data in the cloud.
NVIDIA has been building a data science stack that runs on GPUs, which speeds things up considerably. They've been building a framework called cuxfilter to quickly build dashboards; it builds on Panel and Bokeh to showcase some of their tools, such as cuDF and cuGraph.
It’s great to see all the dashboards that people have been building. I only recently opened the Showcase category on our Discourse, so people should check that out to post their examples and see what others have built using our tools.
Q: In your mind, what is the value of open-source projects?
The power of open source comes from the fact that people with concrete use cases drive the development of these libraries. With HoloViews, for example, I built a visualization library because I had huge quantities of data that I needed to explore. Many popular open-source libraries have a history like that. You build something to solve your own problem and realize that many others have the same challenge.
You get a great sense of community when you build tools with people in an open environment. You get contributions from unexpected sources. If you’re closed source, you only get contributions from your narrow team, which has a limited set of perspectives and use cases. If you manage or collaborate on open-source projects, you hear new perspectives and make projects better that way.
There are huge enterprises that derive tens of millions in value and benefit immensely from open-source projects. Ideally, companies that benefit from open source would contribute back and shepherd the ecosystem forward.
At Anaconda, we're proud to support our employees' involvement in open-source initiatives. To learn more about HoloViz and how we contribute to other open-source projects, visit our Open Source page.