Friday, 18 January 2013

.Net DSV DAO

Delimiter-separated Values files (CSV, TSV...) are a really convenient way to store data (I've started to use the term DSV recently, I used to wrongly employ the CSV term when I was referring to TSV's...).
They're simple to understand, read and modify both manually or by code. As I mentioned in my previous entry, dealing with Excel files tends to be more hassle than it should, so in many occasions when presented with a large excel file to read, I'll save it to TSV and use one home made parser to read it.

You can write a basic DSV reader in just a few lines of code, just using ReadLine and Split:

public IEnumerable<List<string>> GetDataEntries(string filePath)
{

using (StreamReader sr = new StreamReader(filePath))
{
while (!sr.EndOfStream)
{
string curEntry = sr.ReadLine();
yield return curEntry.Split(this.separator).ToList();
}
}
}

The code above will work fine with simple data where your fields won't need to include break lines or field separators, otherwise, you'll get a mess of results. The standard way to store these special characters in your fields is wrapping fields in quotes (usually double quotes, I'll call it "quote character"). If you also want to have "quote characters" as part of the normal content of your field, you'll have to double them (i.e. you'll have 2 double quotes or 2 single quotes).
With this in mind, I decided to write a "decent" DSV parser that contemplates these cases. For the fun of it, I ended up adding some Add-Remove functionality to it, so it turned into a DSV DAO (Data Access Object).

I've uploaded it to my GitHub account, so you can view the code and download it in case you think it can be of any interest to you.

The BasicDsvDao class should only be used for those cases when we're 100% certain that our fields won't include "especial" characters, otherwise use the AdvancedDsvDao, that carefully tries to deal with those cases (break lines or separators as part of your fields...). Though I think that the standard uses double quotes as field delimiters, I've given the option to use single quotes.

Both DAO's will work with files with or without headers and with your Delimiter of choice. You can read the Headers, read the Data Entries, Update/Remove the headers, and Add/Remove Data Entries.

These DAO's have no knowledge about what the data in your DSV's represent, so they just return (or expect, if used for adding new entries) an IEnumerable of Strings for each "real line" in your DSV. If these Collections of Strings can be mapped to Objects, that's your job.

After writing code for mapping those collections to Objects multiple times, I set out to find some way to generalize this a little bit. I winded up with a ColletionToFromObjectMapper<T>, that will map into a T object one of those string collections obtained by your DAO, and viceversa. For each field in your DSV-Object that you want to map, you'll need a function that maps from String to Object, a function that maps from Oject to String, the name of the Property in the Object, and as the DAO works with header and headerless DSV's the position of the field in the DSV line. We use the CollectionItemToObjectMappingInfo class to express this.

public class CollectionItemToObjectMappingInfo 
    {
        public int Index { get; set; }
        public string PropertyName { get; set; }
        public Func<string, Object> FromStringMapping { get; set; }
        public Func<Object, string> ToStringMapping { get; set; }
    }

When working with DSVs with header, we would need the name of the name of the header corresponding to that DSV field, instead of the position, so we'll use for that this DictionaryItemToObjectMappingInfo class:

public class DictionaryItemToObjectMappingInfo /* where T : new()*/
    {
        public string Key { get; set; }
        public string PropertyName { get; set; }
        public Func<string, Object> FromStringMapping { get; set; }
        public Func<Object, string> ToStringMapping { get; set; }
    }

and then we'll use the MappingInfoConverter class to obtain a CollectionItemToObjectMappingInfo from a DictionaryItemToObjectMappingInfo and the List of Headers in the DSV. we'll use it all like this:

var dicToObjMappingInfos = new List<DictionaryItemToObjectMappingInfo>()
            {
                new DictionaryItemToObjectMappingInfo()
                {
                    Key = "person name",
                    PropertyName = "Name",
                    FromStringMapping = st => st,
                    ToStringMapping = ob => (string)ob
                }, 
                new DictionaryItemToObjectMappingInfo()
                {
                    Key = "how old",
                    PropertyName = "Age",
                    FromStringMapping = st => Convert.ToInt32(st),
                    ToStringMapping = num => num.ToString()
                }
            };

            var dao = new AdvancedDsvDao(filePath, '\t', true);
            var headersList = dao.GetHeaders();

            var colToObjMappingInfos = new MappingInfoConverter().Convert(dicToObjMappingInfos, headersList).ToList();
            
            //the "real life" usage would go on like this:
            var mapper = new CollectionToFromObjectMapper<Person>(colToObjMappingInfos);
            var people = dao.GetDataEntries()
                .Select(personRow => mapper.Map(personRow))
                .ToList();

And well, I think that's all. I haven't done an thorough use of this code, but so far it has served me well.

No comments:

Post a Comment