CSV type provider: handling column with list of tags?



One column of my CSV file is a list of tags, separated by commas, e.g.:

“tag1, tag2, my third tag with spaces, tag 4”.

I loaded the csv file with string as the type of that column. However, to manipulate the data, I will need to split that string and get a list of tags so I can filter the rows according to their tags.

What is the best approach here?
The type of the column is string, but after the split it will be a list of strings…

Here is the type definition I use for the csv:

type Expenses = CsvProvider<"/notebooks/expenses.csv",HasHeaders=false, Schema="date(string),category(string),amount(float),currency(string),note(string),tags(string)", Culture="DE">

Do I need to define another type? How do I best handle the conversion from string to list of string, and which types do I need to define and use? Or is there a way to pass a column transformation to CsvProvider?



You can, of course, split the strings manually using something like

let separator = System.Text.RegularExpressions.Regex(@"\s*,\s*")
let stuff = "tag1, tag2, my third tag with spaces, tag 4"
stuff |> separator.Split

and recombine them using

let separator = ", "
let stuff = [|"foo"; "bar"; "baz"|]
stuff |> String.concat separator
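As a variation on the two snippets above, here is a minimal sketch of a split/join pair that also trims whitespace at the ends of the string and drops empty entries (for instance when the tags field is empty). The names splitTags and joinTags are made up for illustration; they are not part of FSharp.Data.

```fsharp
// Hypothetical helpers; the names are illustrative only.

/// Split a comma-separated tag string into a list of trimmed, non-empty tags.
let splitTags (raw: string) : string list =
    raw.Split(',')
    |> Array.map (fun t -> t.Trim())
    |> Array.filter (fun t -> t <> "")
    |> Array.toList

/// Recombine a list of tags into a single comma-separated string.
let joinTags (tags: string list) : string =
    String.concat ", " tags

let tags = splitTags "tag1, tag2, my third tag with spaces, tag 4"
// tags = ["tag1"; "tag2"; "my third tag with spaces"; "tag 4"]

let roundTrip = joinTags tags
// roundTrip = "tag1, tag2, my third tag with spaces, tag 4"
```

Returning a list rather than an array is just a choice made here so that List.contains and pattern matching work directly on the result.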

I have no idea whether it’s possible to hook those transformations directly into the object created by the CSV provider. It does provide a .Map method, but the documentation suggests that transformed rows would have to have the same column types. If necessary, one can always use standard sequence operations to process .Rows, though.
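To illustrate the last point, here is a rough sketch of filtering rows by tag with ordinary sequence functions. It does not use the CSV provider at all: the rows are made-up sample tuples in the column order of the schema from the question, and hasTag is a hypothetical helper. With the real provider you would presumably start from the .Rows sequence and read the tags column through the row's property instead of a tuple pattern.

```fsharp
// Made-up sample data standing in for the provider's rows:
// (date, category, amount, currency, note, tags)
let rows =
    [ "2020-01-03", "food", 12.5, "EUR", "lunch", "work, restaurant"
      "2020-01-04", "travel", 40.0, "EUR", "train", "work"
      "2020-01-05", "food", 8.0, "EUR", "groceries", "home" ]

/// Hypothetical helper: true if the comma-separated tag string contains the tag.
let hasTag (tag: string) (rawTags: string) : bool =
    rawTags.Split(',') |> Array.exists (fun t -> t.Trim() = tag)

// Keep only the rows tagged "work"; the tags column stays a plain string.
let workRows =
    rows
    |> Seq.filter (fun (_, _, _, _, _, tags) -> hasTag "work" tags)
    |> Seq.toList

// Sum the amounts of the matching rows.
let workTotal =
    workRows |> List.sumBy (fun (_, _, amount, _, _, _) -> amount)
```

This splits the tag string on every check, which is fine for small files; for repeated queries it is probably better to convert each row once into a type that stores the tags as a list.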


Thanks, @murphy.

Regarding the type, how do I best define a type that is basically the same as the row, but with one column changed from string to list of string? A few other questions also came up while writing this.

Can I override the type of one column when I have the row type?
I managed to get the type of a row like this (is there a better way using |> ?):
Seq.head(e.Rows).GetType()
System.Tuple`6[System.String,System.String,System.Double,System.String,System.String,System.String]

But now I see that the first column, which should be a date, is a string?

Or is the simplest way (which I’m experimenting with now) to manually and explicitly define the type of a row, with a list of strings for the tags?



I think the type of the rows can also be obtained directly from the type provider, i.e. as Expenses.Row in your case. If you want a reflection object for that type, you should be able to write typeof<Expenses.Row>.
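On the question of using |>, a small sketch follows. It uses a plain tuple alias, Row, as a stand-in for Expenses.Row (an assumption, so that it runs without the type provider) and one made-up sample row.

```fsharp
// Stand-in for Expenses.Row: a plain 6-tuple in the schema's column order.
type Row = string * string * float * string * string * string

// One made-up sample row.
let rows : Row list =
    [ ("2020-01-03", "food", 12.5, "EUR", "lunch", "work") ]

// Pipelined version of Seq.head(rows).GetType()
let runtimeType = rows |> Seq.head |> fun r -> r.GetType()

// The same reflection object, obtained statically without touching any data.
let staticType = typeof<Row>

printfn "%s" runtimeType.Name   // Tuple`6
```

The typeof version has the advantage that it also works when the file has no rows, whereas Seq.head throws on an empty sequence.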

I don’t think there is a construction that “modifies” the type of one tuple element. You will have to declare a complete new tuple or record type to store the transformed information.
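A minimal sketch of what such a new type could look like is below. RawRow is a plain tuple standing in for the provider's row type, and TaggedExpense and toTagged are names invented for this example. With the real provider, the conversion function would read the columns through the row's properties rather than a tuple pattern.

```fsharp
// Stand-in for the provider's row type (all names here are hypothetical).
type RawRow = string * string * float * string * string * string

// Same columns as the raw row, except that Tags is a list of strings.
type TaggedExpense =
    { Date: string
      Category: string
      Amount: float
      Currency: string
      Note: string
      Tags: string list }

/// Convert one raw row, splitting the tags column on commas.
let toTagged ((date, category, amount, currency, note, tags): RawRow) : TaggedExpense =
    { Date = date
      Category = category
      Amount = amount
      Currency = currency
      Note = note
      Tags =
        tags.Split(',')
        |> Array.map (fun t -> t.Trim())
        |> Array.filter (fun t -> t <> "")
        |> Array.toList }

// Usage with one made-up row:
let expense = toTagged ("2020-01-03", "food", 12.5, "EUR", "lunch", "work, restaurant")
let isWork = expense.Tags |> List.contains "work"
// isWork = true
```

A record is used here instead of a second tuple type so that the fields can be accessed by name; a tuple with string list in the last position would work as well.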

Concerning the type of the first column in your data model, note that your type provider instantiation includes Schema="date(string),..." – if you want that column to have a date type, you should write Schema="date(date),..." instead.


I got it working, thanks!