TDIing out loud

Ramblings on the paradigm-shift that is TDI.

Wednesday, March 16, 2011

CSV Parsing with a twist

So the question I got was this: how can I get the line being parsed by the CSV Parser?

Unfortunately, the CSVParser class does not have any public method for this, so the following is not possible:

lineRead = thisConnector.getParser().getCurrentLine()
Instead, with the help of Jens Thomassen, TDI surgeon, and the indispensable AL Debugger, I created this example TDI 7.1 AL to do just that. You can download the linked Example AssemblyLine and just drop the file onto a TDI Project. It's self-contained thanks to the ever-handy Form Entry Connector.

This AL starts with a Form Entry Iterator in the Feed section that reads a CSV bytestream line-by-line using the LineReader Parser. It returns each line of the CSV loaded into the Connection parameter of the Connector. Then in the Flow section there is another Form Entry Connector, this one with the CSV Parser, which gets us the actual CSV attributes.

To make this magic work, I did a couple of things:
  1. First I set the Flow Section Iterator to initialize 'only when used'. You do this by pressing the More... button out to the right of the Inherit From setting and changing the Initialize drop-down accordingly. This prevents the Connector and Parser from initializing until we have data for them.
  2. Then I added this code to the After Initialize Hook of this second Iterator:
    outStream = new java.io.PipedOutputStream()
    inpStream = new java.io.PipedInputStream(outStream)
    newline = new java.lang.String("\r\n")
    
    formEntry = thisConnector.getConnector()
    formEntry.initParser(inpStream, null)
    
    csvColumns = formEntry.getParser().getParam("csvColumns")
    firstRow = true

    The Piped stream allows me to write into one end of the pipe and have my CSV Parser read from the other end.
  3. Now in the Before GetNext Hook I need to write the current line into my pipe.
    outStream.write(work.getString("line").getBytes())
    outStream.write(newline.getBytes())
    
    if (firstRow && (csvColumns == "")) {
      outStream.write(work.getString("line").getBytes())
      outStream.write(newline.getBytes())
      firstRow = false;
    }

    If this is the first row of the file and no Field Names have been specified for the Parser, I am assuming this must be the column title line of the file, so I have to write it twice to the pipe.
And presto! I am getting both the 'line' Attribute and those parsed out of the CSV. Note that this is a slightly simplistic approach, and as a result the first Entry returned by the CSV Parser contains the column names as Attribute values.
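
To see why the column title line has to go into the pipe twice, here is a minimal sketch in plain JavaScript that runs outside TDI. Everything in it is an assumption made for illustration: makePipe, makeCsvParser and parseWithRawLine are hypothetical stand-ins, not TDI or Java APIs, and the comma splitting is naive (no quoting or embedded commas).

```javascript
// Hypothetical stand-in for the piped stream: lines written at one end
// come out at the other end in the same order.
function makePipe() {
  const buffer = [];
  return {
    write: (text) => buffer.push(text),
    readLine: () => (buffer.length > 0 ? buffer.shift() : null),
  };
}

// Hypothetical stand-in for a CSV parser. With no field names configured,
// it consumes the first line it sees as the column titles.
function makeCsvParser(pipe, fieldNames) {
  let columns = fieldNames;
  return {
    readEntry() {
      if (!columns) {
        const header = pipe.readLine();
        if (header === null) return null;
        columns = header.split(",");
      }
      const line = pipe.readLine();
      if (line === null) return null;
      const values = line.split(",");
      const entry = {};
      columns.forEach((name, i) => { entry[name] = values[i]; });
      return entry;
    },
  };
}

// Feed one line at a time, keeping the raw line next to the parsed values.
function parseWithRawLine(lines) {
  const pipe = makePipe();
  const parser = makeCsvParser(pipe, null);
  let firstRow = true;
  const results = [];
  for (const line of lines) {
    pipe.write(line);
    if (firstRow) {
      // The parser eats one copy as column titles, so a second copy is
      // needed for it to have a row to return on this cycle.
      pipe.write(line);
      firstRow = false;
    }
    results.push(Object.assign({ line: line }, parser.readEntry()));
  }
  return results;
}
```

Calling parseWithRawLine(["name,mail", "Ann,ann@example.com"]) gives two entries. The first one carries the column names as its values, which is the same side effect described above.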

So I made a Second Attempt using just a single Iterator in the Feed section, and scripting the setup and calls to the CSV Parser. You can also just drop this .assemblyline file onto a Project and then play with it. I am using the code that I built for the first example in a slightly different way:
  1. The After Initialize Hook script is a bit shorter:
    outStream = new java.io.PipedOutputStream()
    inpStream = new java.io.PipedInputStream(outStream)
    newline = new java.lang.String("\r\n")
    
    firstRow = true;
    initParser = true;
  2. All the work is done in After GetNext, once I have read in the line:
    if (initParser) {
      // In this next line you could instead get a pre-configured
      // Parser from your Project Resource library:
      //    csvParser = system.getParser("Parsers/MyCSV")
      //
      // AND to find out what the 'true name' of a Parser is
      // just add it to a Connector and then use the More...
      // Select Inheritance button to see the inheritance link
      // for the Parser (yeah, it's not that elegant ;)
      //
      csvParser = system.getParser("ibmdi.CSV")
      csvParser.setInputStream(inpStream)
      csvParser.initParser()
      csvColumns = csvParser.getParam("csvColumns")
    
      initParser = false;
    }
    
    if (firstRow && (csvColumns == "")) {
      outStream.write(conn.getString("line").getBytes())
      outStream.write(newline.getBytes())
    }
    
    outStream.write(conn.getString("line").getBytes())
    outStream.write(newline.getBytes())
    
    csvEntry = csvParser.readEntry()
    if (firstRow && (csvColumns == "")) {
      firstRow = false
      system.skipEntry() // skip the column names
    }
    
    if (csvEntry != null)
      conn.merge(csvEntry)
This time only the actual data values are returned.
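
Stripped of the pipe and the TDI calls, the net effect of this second attempt can be sketched in plain JavaScript as follows. The readMerged helper is hypothetical and exists only for this illustration. It assumes simple comma-separated values with no quoting, and it treats the first row as column names whenever no field names are passed in.

```javascript
// Hypothetical helper, not a TDI API: yields one merged entry per data row.
function* readMerged(lines, fieldNames) {
  let columns = fieldNames && fieldNames.length > 0 ? fieldNames : null;
  for (const line of lines) {
    if (columns === null) {
      // First row supplies the column names and is not returned,
      // which mirrors the system.skipEntry() call in the Hook.
      columns = line.split(",");
      continue;
    }
    // Start from the raw line, then merge in the parsed values.
    const entry = { line: line };
    line.split(",").forEach((value, i) => {
      if (i < columns.length) entry[columns[i]] = value;
    });
    yield entry;
  }
}
```

For example, Array.from(readMerged(["name,mail", "Ann,ann@example.com"])) returns a single entry holding both the raw line and the name and mail values, with no entry for the title row.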

And now for more coffee :)

4 comments:

Vitor Pereira said...

Hi there, I know this is a very old post but would you by any chance still have the example AL for your second attempt? I'm trying to wrap my head around this but I'm not getting far :)

Eddie Hartman said...

@Vitor - darn, my old Dropbox links are either no longer public, or gone. I'm going to have to ask for your patience while I make a new copy (and I think I may actually do this a bit differently). Watch this space :)

Eddie Hartman said...

@Vitor - Ok, so now the links have been updated and should work. Please let me know if there are any dead ones. And the same goes for any other links that I've published through the years. Dropbox no longer just supports a 'Public' folder so I fear that many of the assets I've published have become unavailable :/

Vitor Pereira said...

Sorry, only back here now. All links in this blog post are working. Brilliant! Thanks!