this post was submitted on 29 Jun 2023

So I'm considering going deep into a data viz library, and I'm wondering what you people think. I'm not asking reddit because I know for a fact that all the hardcore people that know their stuff are on lemmy.

Here are my requirements:

  • API must at least pretend to be reasonably designed.
    • I know that viz libraries are complex. But I want something with carefully chosen primitives that scale reasonably well from "data goes in, chart goes out" to nit-picky adjustments.
  • Defaults must not be ugly.
    • Or at least there should be an easy way to bypass the default ugliness. I know that design is subjective, but how am I supposed to trust a library that operates on the visual space and yet decides that a bad default is ok?
    • Here it looks like ggplot has the upper hand. But there is a stylesheet that makes matplotlib look like ggplot, so maybe that's not a big problem.
  • Must have a future.
    • The GitHub contribution chart for matplotlib just keeps going up, it's insane. ggplot's, not so much. But maybe it's hard to compete with the Python hype machine, and that is that.
  • Bonus points if interactive and renders to web too.
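
For reference, the stylesheet mentioned above is presumably matplotlib's bundled `ggplot` style. A minimal sketch of applying it, assuming matplotlib is installed; the data and the output file name are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up sample data, purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
visits = [120, 150, 90, 180]

# Apply the bundled "ggplot" style only inside this block,
# so the global defaults are left untouched.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.plot(months, visits, marker="o")
    ax.set_title("Visits per month")
    ax.set_ylabel("Visits")
    fig.tight_layout()
    fig.savefig("visits_ggplot.png", dpi=150)  # hypothetical file name
```

`plt.style.use("ggplot")` would switch the style globally instead; `plt.style.available` lists the other bundled styles.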

Non-requirements:

  • Easy learning curve.
    • I am a hardcore programm0r. I like it rough, as long as it's worth the effort.
  • Heavy math stuff.
    • I'm not designing rockets or wind turbines. I just want a way to visually represent data as lines, charts, pies, or maps, or maybe violins if I'm feeling fancy.

Thanks

top 9 comments
[–] [email protected] 7 points 1 year ago (1 children)

Check out plotly, it's pretty nice and easy to get going.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

Plotly is more web and "interactive" focused, but if that's what you're looking for it's a fantastic library.

[–] [email protected] 6 points 1 year ago* (last edited 1 year ago) (1 children)

Given these criteria, ggplot2 wins by a landslide. The API, thanks to R's nonstandard evaluation feature, is crazy good compared to whatever is available in Python. Not having to use numpy/pandas as inputs is a bonus as well: somehow pandas managed to duplicate many bad features of R's data frame and introduce its own inconsistencies, without providing many of the good features¹. Styling defaults are decent, definitely much better than matplotlib's, and it's much easier to consistently apply custom styling. The future of ggplot2 is defined by downstream libraries; ggplot2 is just the core of the ecosystem, which, at this point, is mature and stable. Matplotlib's activity is mostly because the lack of nonstandard evaluation makes it more cumbersome to implement flexible APIs, and so it just takes more work. Both have very minimal support for interactive and web; it's easier to just use shiny/Dash to wrap them than to force them alone to do web/interactive stuff. Which, btw, again I'd say shiny » Dash, if for nothing else than R's nonstandard evaluation feature.

Note though that learning proper R takes time, and if you don't know it yet, you will underestimate the time necessary to get friendly with it. Nonstandard evaluation alone is so nonstandard that it gives headaches to people who'd otherwise be skilled programmers already. matplotlib would hugely win on flexibility, which you apparently don't need, but there's always that one tiny tweak you wish you were able to do. Also, it's usually much easier to use the platform's default, whatever publishing platform you're going to use.

As for me, if I have a choice, I'm picking ggplot2 as a default. So far it has been good enough for a significant majority of my academic and professional work.

¹ Admittedly, numpy was not designed for data analysis directly, and pandas has some nice features missing from R's data frames.

[–] [email protected] 2 points 1 year ago

Very nice and nonstandard answer, most appreciated.

[–] pixelpop3 4 points 1 year ago* (last edited 1 year ago)

For the types of visualizations you're describing, the choice probably won't matter. I view matplotlib as "matlab flavor" and ggplot2 as "R flavor". For R-type work (a certain type of table-based stats) I just use R.

For matlab type work (image processing, simulations, etc) I now use matplotlib. This is mostly numpy/scipy things rather than... pandas things. Python is interesting because it has things that are beyond matplotlib (VTK, etc) and beyond matlab. Typically when you're prototyping in matlab you're assuming you will have to rewrite in a different system eventually, but with python you can move the prototype further down to more polished prototype easily.

I do a lot of image processing and am too familiar with matlab, so matplotlib generally came naturally for translating that prior knowledge. So really it depends on what sorts of things you are familiar with, languages you use, and would want to do in the future. I think with either choice you will eventually hit some wall of difficulty.

There are also more visualization and plot focused things (TeX family or PostScript and PDF) as well as the "processing" language.

I use R for... not-image-type analysis stats and generate plots in R using R's plotting. I mostly use python for matlab-type things and matplotlib seems more natural for that.

Julia is on my todo-list and I have heard good things about their plotting ecosystem but I have not looked into it.

Incidentally, VTK is extremely well designed for the type of language it's based on and the problems it's solving... but that's not really 2D plotting.

[–] acow 3 points 1 year ago (2 children)

I went with ggplot2 some time ago, despite not using or knowing R at all. What pushed me in that direction was that I was using other plotting libraries (I don't recall which at the time), and there was some aspect of spacing between elements or some such that was making a particular plot look ever so slightly ugly in my eyes... and I couldn't fix it!

In my frustration, I consciously decided to set aside my version of your "reasonably designed" requirement (I find R consistently frustrating in this regard, though I know some people do all their programming in it and I salute them). I gave ggplot2 a try with a cargo culting approach: search for how to make the kind of plot you want to make, and just tweak that template. I was blown away. I could find recipes for everything I wanted to do, the results were instantly more attractive than what I had before, and I could tweak everything.

matplotlib is absolutely a reasonable option, but even years later I still have R environments attached to most projects specifically for data visualization, and still produce plots that are delightfully aesthetic. So here's one voice to say that ggplot2 has real merit, especially if your aim is specifically to produce visualizations rather than explore a programming ecosystem.

[–] pixelpop3 2 points 1 year ago* (last edited 1 year ago)

Just about everything is modifiable in matplotlib... It may not be easy, but all plotting libraries are designed to make some things easy at the expense of making other tasks more difficult. For matplotlib you just have to think about things the way matlab thinks about things... which is more computer graphics based. It can get ugly until you understand it. But if you understand how any plotting library actually works it's not that bad. All plotting libraries are ultimately built on graphical primitives like lines and fonts and triangles and patches: they compute where things belong by transforming coordinates and feeding them to a layout engine. It's not as magical as the APIs make it seem. So if you're willing to dig into their bowels (as OP mentions) there really aren't many limits. Sometimes it's actually easiest to just declare a canvas in memory and draw it all by hand. Ultimately, things are either vector or raster formats (or some abstraction that supports both) and fed into some computer graphics engine (like postscript or some OS's or GPU canvas).
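
To make the "transform coordinates, then draw primitives" point concrete, here is a rough sketch in plain Python with no plotting library at all. The helper names (`make_scale`, `line_chart_svg`), the sizes, and the data are all made up for illustration, and this is not how matplotlib or ggplot2 are implemented internally:

```python
def make_scale(domain, pixel_range):
    """Return a function mapping a data value linearly onto pixel coordinates.

    Assumes the domain is not degenerate (d0 != d1).
    """
    (d0, d1), (p0, p1) = domain, pixel_range
    return lambda v: p0 + (v - d0) / (d1 - d0) * (p1 - p0)


def line_chart_svg(xs, ys, width=300, height=150, margin=20):
    """Draw a line chart by hand: scale the data, then emit SVG primitives."""
    x_scale = make_scale((min(xs), max(xs)), (margin, width - margin))
    # SVG's y axis points down, so the pixel range is flipped.
    y_scale = make_scale((min(ys), max(ys)), (height - margin, margin))
    points = " ".join(
        f"{x_scale(x):.1f},{y_scale(y):.1f}" for x, y in zip(xs, ys)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="{width}" height="{height}" fill="white"/>'
        f'<polyline points="{points}" fill="none" stroke="steelblue" stroke-width="2"/>'
        "</svg>"
    )


# Made-up data; the result is a vector image any browser can open.
svg = line_chart_svg([0, 1, 2, 3], [10, 40, 20, 30])
with open("chart.svg", "w") as f:
    f.write(svg)
```

Axes, ticks, and labels would be more of the same: extra lines and text elements placed with the same two scale functions.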

Anyway, sometimes the easiest answer is you export and edit the labels in the final figure. One really nasty way if you don't have PS or PDF tools is to sidetrack through Windows EMF and mess with fonts and positioning of text in PowerPoint.

[–] [email protected] 1 points 1 year ago

Interesting. This matches my one experience using ggplot2, in which I found it easy to modify existing code. Looks like the library works very well with the "cargo cult" approach.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

ggplot is absolutely the best in town, for a ton of reasons, if you are doing real viz and stats. Unfortunately it is R only, which, as a hardcore programmer, you'd hate. (I honestly like it, but we are not many.)

Go for plotly as others suggested
