this post was submitted on 12 Oct 2023
CSV Parsing (mander.xyz)
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/haskell
 

New-ish to Haskell. Can't figure out the best way to get Cassava (Data.Csv) to do what I want. Can't tell if I'm missing some Haskell type idioms or common knowledge or what.

Task: I need to read in a CSV, but I don't know ahead of time what its headers/columns will be. The user will say which headers from the CSV they want processed, but I won't know where (index-wise) those columns are in the CSV, nor how many columns there will be, either in the user's selection or in the file overall. Say I have a [String] listing the headers they want.

Cassava is able to read CSVs with and without headers.

Without headers, Cassava can read in entire rows even if it doesn't know how many columns each row has. But then I wouldn't have the header data I need to pick out the values I want.
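A note on the header-less route: the header line is still decoded, it just arrives as the first ordinary record. A minimal sketch, assuming the cassava, vector and bytestring packages; `selectColumns` and the sample data are hypothetical. It decodes every row as a `Vector ByteString` with `NoHeader`, treats row 0 as the header, finds the index of each requested name, and projects those fields out of the remaining rows.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Csv (HasHeader (NoHeader), decode)
import qualified Data.Vector as V

-- | Hypothetical helper: decode with NoHeader, treat the first row as the
-- header, and keep only the requested columns, in the order requested.
selectColumns :: [String] -> BLC.ByteString -> Either String [[BC.ByteString]]
selectColumns wanted input = do
  rows <- decode NoHeader input :: Either String (V.Vector (V.Vector BC.ByteString))
  hdr <- maybe (Left "empty CSV") Right (rows V.!? 0)
  -- Position of each requested header; fails if any name is absent.
  idxs <- traverse (findIdx hdr) wanted
  -- Pull those positions out of every data row (everything after row 0).
  traverse (\row -> traverse (field row) idxs) (V.toList (V.drop 1 rows))
  where
    findIdx hdr name =
      maybe (Left ("missing column: " ++ name)) Right (V.elemIndex (BC.pack name) hdr)
    field row i = maybe (Left "short row") Right (row V.!? i)

main :: IO ()
main = print (selectColumns ["price", "name"] "id,name,price\n1,apple,0.5\n2,pear,0.75\n")
-- Right [["0.5","apple"],["0.75","pear"]]
```

Looking columns up by position also tolerates duplicate header names (the first match wins), which a map keyed by name cannot represent.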

With headers, Cassava requires(?) you to define a record type that is an instance of its FromNamedRecord typeclass, which is how you access a row's fields by column name (via the record fields). But for this to be well defined, you need to know everything about the headers ahead of time: their names, their number, and their order. You then mirror that in your record type.
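For reference, my understanding is that a dedicated record type isn't mandatory: cassava defines `NamedRecord` as a synonym for `HashMap ByteString ByteString` and provides a `FromNamedRecord` instance for `HashMap`, so `decodeByName` can return one map per row, keyed by whatever headers the file actually has. A minimal sketch under that assumption (packages cassava, unordered-containers, vector, bytestring); `pickFields` and the sample input are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Csv (Header, NamedRecord, decodeByName)
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V

-- | Hypothetical helper: decode each row into a NamedRecord (a HashMap from
-- header name to raw field), then look up only the user's columns.
-- No record type has to be declared up front.
pickFields :: [String] -> BLC.ByteString -> Either String [[BC.ByteString]]
pickFields wanted input = do
  (_, rows) <- decodeByName input :: Either String (Header, V.Vector NamedRecord)
  traverse (\row -> traverse (lookupField row) names) (V.toList rows)
  where
    names = map BC.pack wanted
    lookupField row name =
      maybe (Left ("missing column: " ++ BC.unpack name)) Right (HM.lookup name row)

main :: IO ()
main = print (pickFields ["price", "name"] "id,name,price\n1,apple,0.5\n2,pear,0.75\n")
-- Right [["0.5","apple"],["0.75","pear"]]
```

Fields stay as raw `ByteString`s here; if typed values are needed, cassava's `(.:)` together with `runParser` can convert a looked-up field through its `FromField` instance.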

Hopefully I'm missing something obvious, but it feels a lot like I have my hands tied behind my back dealing with the types provided by Cassava.

Help greatly appreciated :)

jadero 0 points 1 year ago

I've never worked with Cassava or Haskell, but I've done a lot of CSV processing.

Is there a way to just read in the header and the first dozen or so rows, let the user select what they want, and then do the real import or processing? That has always been my main tactic across a variety of languages. To minimize user effort, I let users save their choices so that they could simply select a saved pattern the next time they got data from that source. Even better, CSVs from a given source usually follow a consistent file-naming pattern that can be leveraged to recommend, or even automatically apply, a saved pattern.
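In Haskell, a saved pattern could be as simple as the list of header names the user picked last time, checked against each new file's header before processing. A hypothetical sketch (`SavedPattern` and `missingColumns` are illustrative names, not library API) that needs only the vector and bytestring packages; the header vector has the same shape as the `Header` that cassava's `decodeByName` returns.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.Vector as V

-- | Hypothetical: a saved selection is just the header names the user chose.
type SavedPattern = [String]

-- | Given a file's header row (a Vector of strict ByteStrings), return the
-- saved columns that the new file lacks. An empty result means the pattern fits.
missingColumns :: V.Vector BC.ByteString -> SavedPattern -> [String]
missingColumns hdr = filter (\name -> BC.pack name `V.notElem` hdr)

main :: IO ()
main = do
  let hdr = V.fromList (map BC.pack ["id", "name", "price"])
  print (missingColumns hdr ["name", "price"]) -- []
  print (missingColumns hdr ["name", "qty"])   -- ["qty"]
```

The same check could back the file-name heuristic: try each saved pattern whose name rule matches, and use the first one that reports nothing missing.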

Of course, that depends on being able to define "Record Types" on the fly from what the user selects/saves. I can't imagine that being a problem, but, as I said, I've never used Haskell or Cassava.