Python Data Visualization 2018: Where Do We Go From Here?
This post is the third in a three-part series on the current state of Python data visualization and the trends that emerged from SciPy 2018.
By James A. Bednar
As we saw in Part I and Part II of this series, having so many separate Python visualization libraries to choose from is often confusing to new users and likely to lead them down suboptimal paths. Once they have learned one library, it is difficult to switch to another that may be more suitable for later tasks. Is there any hope that Python could have a simpler story to tell, steering users to a smaller number of starting points without cutting them off from important functionality?
Visions for the Future
At SciPy 2018, I proposed my PyViz.org initiative as one step in that direction. HoloViews and GeoViews provide a single, concise, high-level declarative data API for using multiple InfoVis libraries (currently including Bokeh, Matplotlib, Datashader, Cartopy, and Plotly), and Panel now provides a unified dashboarding approach across dozens of libraries and data formats (see highlighted libraries above). If other InfoVis library authors were to support the high-level HoloViews API, users could easily switch between backends depending on their immediate needs (e.g., for selecting different plot types), without having to learn a completely new library’s API. Even without such support, Panel already allows plots from any of the above sources to be combined into the same figure or dashboard.
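To make the "declarative API with switchable backends" idea concrete, here is a minimal standard-library sketch. It is not HoloViews code and does not use any real plotting library; the names `Curve`, `register_backend`, `render`, and the two toy backends are all hypothetical, chosen only to illustrate the pattern of describing a plot once and letting different backends draw it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Curve:
    """Declarative description of a line plot: data and labels, no drawing code."""
    points: Tuple[Tuple[float, float], ...]
    xlabel: str = "x"
    ylabel: str = "y"


# Hypothetical registry mapping a backend name to a function that renders a Curve.
BACKENDS: Dict[str, Callable[[Curve], str]] = {}


def register_backend(name: str):
    """Decorator that records a rendering function under the given backend name."""
    def decorator(func: Callable[[Curve], str]) -> Callable[[Curve], str]:
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("text")
def render_text(curve: Curve) -> str:
    # Toy backend: a tab-separated table with a header row.
    rows = [f"{curve.xlabel}\t{curve.ylabel}"]
    rows += [f"{x}\t{y}" for x, y in curve.points]
    return "\n".join(rows)


@register_backend("svg")
def render_svg(curve: Curve) -> str:
    # Toy backend: a bare SVG polyline through the same points.
    coords = " ".join(f"{x},{y}" for x, y in curve.points)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        f'<polyline fill="none" stroke="black" points="{coords}"/>'
        "</svg>"
    )


def render(element: Curve, backend_name: str) -> str:
    """Render the same declarative element with whichever backend is requested."""
    if backend_name not in BACKENDS:
        raise ValueError(f"unknown backend: {backend_name!r}")
    return BACKENDS[backend_name](element)


# The plot is declared once; only the backend name changes between outputs.
curve = Curve(points=((0, 0), (1, 1), (2, 4)))
text_output = render(curve, "text")
svg_output = render(curve, "svg")
```

The user-facing object (`Curve`) never mentions how it is drawn, so adding another backend means registering one more function rather than asking users to learn a new API. Real libraries are of course far richer than this, but the division of labor is the same in spirit.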
Ben Root, representing Matplotlib, suggested that the large number of existing libraries was not necessarily an issue. He believes it’s entirely appropriate for Matplotlib to be the core workhorse for a large number of libraries building on it—not everything needs to be in Matplotlib itself, and Matplotlib makes an excellent basis on which to build other, higher-level 2D static-plotting functionality, due to its comprehensive support for low-level primitives and output formats.
Noelle Held, from the audience, argued that her scientific colleagues have indeed been totally overwhelmed by the sheer number of plotting possibilities in Python, yet she saw the benefit of having so many different libraries available. It seems unlikely that all the separate package authors would be able to coordinate closely, but perhaps the SciPy community could do better at educating users on the data models, assumptions, and outputs of each of the main visualization tools. In other words, perhaps we do not need to centralize the libraries themselves, but rather the educational resources that can guide users to the appropriate libraries. I agreed that if people were willing to work on this, PyViz.org would be a natural place to host such resources.
Maarten Breddels, representing ipyvolume, bqplot, and ipywidgets, argued that ipywidgets (aka Jupyter widgets) is already emerging as a de facto standard, supported by a wide range of libraries (ipyvolume, ipyleaflet, pythreejs, bqplot, and now Plotly) that can now be mixed and matched as needed to provide interactive apps and plots in a Jupyter notebook.
Prabhu Ramachandran, representing Mayavi, emphasized that mature SciVis tools like VTK, Mayavi, or ParaView cover important functionality not addressed by InfoVis-focused libraries, offering advanced and specialized visualization techniques for large and complex finite-element-method simulations. These tools support visualization of a variety of data structures and the “long tail” of scientific research beyond just the initial visualization itself.
Conclusions and Outlook
Overall, it was clear that each of the main libraries represents a vibrant community of users and developers using different techniques to achieve different goals. It seemed both unlikely and perhaps undesirable for the libraries to consolidate significantly, because that would remove major differences in functionality. One participant also cautioned that any unification efforts were likely to be distressing to some users of libraries not included in those efforts.
Clearly, we can do better at educating the public about what each library and initiative is most useful for, steering users more efficiently toward effective solutions for their various goals. In particular, users need to consider the type of plots they want to use, the data sizes they work with, how they want to interact with and publish their plots, and what type of API they want (focusing on high-level capabilities or low-level control); see the first blog post in this series. Library authors can help by making these differences clear for each project. Hopefully these blog posts will also help clarify the situation a bit!