CSV type provider: handling column with list of tags?


#1

Hi,

A column of my CSV file is a list of tags, separated by commas, e.g.:

“tag1, tag2, my third tag with spaces, tag 4”.

I loaded the CSV file with string as the type of that column. However, to work with the data I need to split that string into a list of tags, so that I can filter the rows by tag.

What is the best approach here?
The type of the column is string, but after the split it will be a list of strings…

Here is the type definition I use for the csv:

type Expenses = CsvProvider<"/notebooks/expenses.csv",HasHeaders=false, Schema="date(string),category(string),amount(float),currency(string),note(string),tags(string)", Culture="DE">

Do I need to define another type? How do I best handle the conversion from string to list of string, and which types do I need to define and use? Or is there a way to pass a column transformation to CsvProvider?

Thanks


#2

You can, of course, split the strings manually using something like

let separator = System.Text.RegularExpressions.Regex(@"\s*,\s*")
let stuff = "tag1, tag2, my third tag with spaces, tag 4"
stuff |> separator.Split

and recombine them using

let separator = ", "
let stuff = [|"foo"; "bar"; "baz"|]
stuff |> String.concat separator

I have no idea whether it’s possible to hook those transformations directly into the object created by the CSV provider. It does provide a .Map method, but the documentation suggests that transformed rows would have to have the same column types. If necessary, one can always use standard sequence operations to process .Rows, though.
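As a rough sketch of the "standard sequence operations" route: the snippet below filters rows by tag without touching the provider's types. The `rows` list is made-up sample data standing in for `.Rows`, modelled as the six-element tuple from the schema above; `splitTags` and `withTag` are hypothetical helper names, not part of FSharp.Data.

```fsharp
open System.Text.RegularExpressions

// Made-up stand-in for the provider's rows:
// (date, category, amount, currency, note, tags)
let rows =
    [ ("2020-01-03", "food", 12.5, "EUR", "lunch", "tag1, tag2, my third tag with spaces")
      ("2020-01-04", "travel", 40.0, "EUR", "train", "tag2, tag 4") ]

let separator = Regex(@"\s*,\s*")

// Split the raw tags column into an array of tags, dropping empty entries.
let splitTags (raw: string) =
    separator.Split(raw.Trim())
    |> Array.filter (fun t -> t <> "")

// Keep only the rows whose tags column contains the given tag.
let withTag tag rows =
    rows
    |> Seq.filter (fun (_, _, _, _, _, tags) -> splitTags tags |> Array.contains tag)

let tagged = rows |> withTag "tag 4" |> Seq.toList
```

This re-splits the tags string on every filter, which is fine for small files; for repeated queries it is probably nicer to convert each row once into a type that already holds the split tags.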


#3

Thanks @murphy.

Regarding the type, how do I best define a type that is basically the same as the row, but with one column's type changed from string to list of string? A few other questions came up while writing this.

Can I override the type of one column when I have the row type?
I managed to get the type of a row like this (is there a better way using |> ?):
Seq.head(e.Rows).GetType()
System.Tuple`6[System.String,System.String,System.Double,System.String,System.String,System.String]

But now I see that the first column should be a date, yet it is a string?

Or is the simplest way (which I’m experimenting with now) to manually and explicitly define the type of a row, with a list of strings for the tags?

Thanks!


#4

I think the type of the rows can also be obtained directly from the type provider, i.e. as Expenses.Row in your case. If you want a reflection object for that type, you should be able to write typeof<Expenses.Row>.

I don’t think there is a construction that “modifies” the type of one tuple element. You will have to declare a complete new tuple or record type to store the transformed information.
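A minimal sketch of what such a new record type could look like, assuming the six columns from your schema. The names `Expense` and `toExpense` are made up for illustration, and the input is modelled as a plain tuple rather than the provider's row type so that the snippet runs on its own.

```fsharp
// Hypothetical record mirroring the CSV row, with tags as a list of strings.
type Expense =
    { Date: string
      Category: string
      Amount: float
      Currency: string
      Note: string
      Tags: string list }

// Build an Expense from the six column values, splitting the tags column
// on commas and trimming the whitespace around each tag.
let toExpense (date, category, amount, currency, note, tags: string) =
    { Date = date
      Category = category
      Amount = amount
      Currency = currency
      Note = note
      Tags =
        tags.Split(',')
        |> Array.map (fun t -> t.Trim())
        |> Array.filter (fun t -> t <> "")
        |> List.ofArray }

let example =
    toExpense ("2020-01-03", "food", 12.5, "EUR", "lunch", "tag1, tag2, my third tag with spaces, tag 4")
```

With the real provider you would presumably map over `.Rows` once, passing each row's column values to `toExpense`, and then work with the resulting sequence of records, e.g. filtering with `List.contains "tag2" expense.Tags`.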

Concerning the type of the first column in your data model, note that your type provider instantiation includes Schema="date(string),..." – if you want that column to have a date type, you should write Schema="date(date),..." instead.


#5

I got it working, thanks!