this post was submitted on 18 Mar 2024
83 points (100.0% liked)

Python

If you care about performance, you may want to avoid CSV files. But since our data sources, like our family, are often not something we get to choose, this blog post looks at how to process a CSV file as fast as possible.

[–] [email protected] 27 points 7 months ago (1 children)

Holy shit, switching to PyArrow is going to make me seem like a mystical wizard when I merge in the morning. I've easily halved the execution time of a horrible but unavoidable job (yay crappy vendor "API" that returns a huge CSV).
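
(Not the commenter's actual job, but for anyone landing here, a minimal sketch of what "switching to PyArrow" for CSV parsing can look like; the file name and the pandas hand-off are assumptions:)

```python
import pyarrow.csv as pacsv

# PyArrow's multithreaded CSV reader; "vendor_dump.csv" is a placeholder name.
table = pacsv.read_csv("vendor_dump.csv")

# Stay in Arrow, or hand off to pandas only if the rest of the job needs it.
df = table.to_pandas()
print(len(df), "rows")
```

pandas can also delegate to the same reader with `pd.read_csv(path, engine="pyarrow")` (pandas 1.4+), which is often the smallest possible change to an existing job.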

[–] [email protected] 3 points 7 months ago

You and me both. I've been parsing CSVs of around 10-100 million rows lately, and... this will hopefully help.
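
At that scale it can also help to stream the file in record batches instead of materialising the whole table at once; a rough sketch, assuming the work can be done batch by batch (the file name is made up):

```python
import pyarrow.csv as pacsv

# open_csv returns a streaming reader, so the whole file never
# has to sit in memory at once.
reader = pacsv.open_csv("huge.csv")  # placeholder file name

total_rows = 0
for batch in reader:  # each batch is a pyarrow.RecordBatch
    total_rows += batch.num_rows
    # ...per-batch processing goes here...

print(f"processed {total_rows} rows")
```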