this post was submitted on 09 Mar 2024
25 points (96.3% liked)

[email protected] 11 points 8 months ago* (last edited 8 months ago)

I think it's a good thing polars developers are heading toward interoperability. The Dataframe Interchange Protocol the article mentions sounds interesting.

For example, if you read the documentation for Plotly Express

I know this seems to be an important topic in the community. But honestly, I rarely use the plotting backends at all. They are nice for quick visualizations, but most of the time I prefer to throw my data into matplotlib myself, just for the sake of customization.

polars.DataFrame.to_pandas() by default uses NumPy arrays, so it will have to convert all your data from Arrow to NumPy; this will double your memory usage at least, and take some computation too. If you use Arrow, however, the conversion will take essentially no time and no extra memory is needed (“zero copy”).

I don't want to complain; it is definitely a good thing that the polars developers are addressing this. pandas is the standard, and as long as full interoperability between polars and the pandas ecosystem is lacking, this "hack" is needed. However, data conversion can be an incredibly sensitive topic. I do not even trust pandas or TensorFlow to always do the right thing when converting data. Processing data in polars, converting it to pandas and then processing it further? I am sceptical. And I am not even talking about performance here.

If you’re doing heavy geographical work, there will likely someday be a replacement for GeoPandas, but for now you’re probably going to spend a lot of time using Pandas.

This is important. GeoPandas is one of the most important libraries derived from pandas and is widely used in the geoscience community. The idea of an equivalent like "geopolars" is insane in my eyes. I am biased as a data scientist mostly working on spatial data, but this is the main reason I watch the development of polars only from the sidelines. Even if I didn't work with geographic data, GeoAI is such an important topic that you can't just ignore it. And that's only the perspective from my field; who knows what other important communities out there rely on pandas.