TDIing out loud, ok SDIing as well

Ramblings on the paradigm-shift that is TDI.

Monday, November 21, 2011

Early Christmas Stocking Stuffers

Since this past summer, TDI 7.1 Fixpacks now include a number of new components. However, since they are dropped in the examples folder during fixpack updating and not under jars, they won't show up in the Add Component wizard until you make them available to your TDI installation. This can be done in a couple of ways, as I have outlined here.

To whet your appetite, here is a list of powerful new components provided under examples:
  • TPAE IF Connectors include a 'Simple' version which is comparable to the 'Generic Maximo Connector', although it has been extended to work with a broader range of MBOs. The other TPAE IF Connector (without 'Simple' in its name) further enhances integration capabilities to Maximo application data.
  • TPAE IF Change Detection Connector allows you to hook into change notifications coming from the Maximo integration framework, catching and propagating new, modified and deleted data to any number of targets.
  • File Management Connector is for reading and modifying file system structures and file system metadata. More specifically, it can create, find and delete files and directories. You still use the 'File System' Connector to work with file contents, but now you could iterate through a directory and load each returned filepath into a File System Connector (for example, in a Connector Loop) to process the file.
  • File Transfer Function component allows you to securely copy files between any two systems, either to or from a remote system, or between two remote systems.
  • JSON Parser allows you to read serialized JSON into the hierarchical Entry-Attribute-value model of TDI, where you can use features like XPath searches and 'DOM tree walking' techniques to access and manipulate parsed objects. And of course to turn a hierarchical Entry into a JSON stream. Makes going from JSON to XML, and vice versa, a snap :)
Of course there are more components available under examples, like the RegExp Parser and Exchange Change Detection Connector, ready for eager hands to explore and put to good work. Each example folder also includes documentation on that component, as well as example Configs that can be imported into TDI and used to jump start their usage and understanding.

Saturday, November 12, 2011

Getting TDI installed on Ubuntu

Courtesy of TDI champion and support guru, Jason Williams: Adventures with IBM TDI

Monday, August 1, 2011

work.FullName.toUpperCase() - What's wrong with this picture?

The fact that TDI lets you use '.' to reference attributes (work.FullName), coupled with the JavaScript engine's (limited) automatic type conversion, is a constant source of confusion. Although I feel a long rant coming on, I will resist and save some content on this subject for future blogs. For now, suffice to say that using dot notation to reference an attribute gets you just that: a com.ibm.di.entry.Attribute.

Quick refresher alert: Attributes are named containers of values. So for example, work.mail is a container named 'mail' that can in turn hold zero or more values. The values themselves can be any kind of Java object: String, Date, java.sql.Blob - even another attribute, as is the case when working with hierarchical data in TDI (for example, parsing xml, rdf, json, ...).
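As a rough plain-Java analogy (this is illustration only, not the real com.ibm.di.entry.Entry and Attribute classes), an Entry behaves like a map of named, multi-valued containers:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Analogy only: an Attribute is a named container holding zero or more
// values of any Java type, and an Entry is a bag of such Attributes.
// The addresses below are made up.
public class EntryAnalogy {
    static int mailValueCount() {
        Map<String, List<Object>> entry = new HashMap<>();
        List<Object> mail = new ArrayList<>();      // the 'mail' Attribute
        mail.add("slim@shady.example");             // value 0
        mail.add("marshall@mathers.example");       // value 1
        entry.put("mail", mail);
        return entry.get("mail").size();
    }

    public static void main(String[] args) {
        System.out.println(mailValueCount());       // prints 2
    }
}
```

The real classes add type-aware accessors on top of this idea, which is what the rest of this post is about.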

The JS Engine in TDI sometimes converts the attribute being referenced to a string representation of its value, as with this snippet:

"My name is " + work.FullName >> My name is Slim Shady

Since the Attribute is part of a String concatenation, TDI's JSEngine conveniently returns its value as a Java String instead of the attribute itself. But that's about as far as auto-conversion goes. In most cases when you send an Attribute to do a String's job, the Attribute's toString() method is used. This results in a string value that starts with the attribute name, then a colon and its value.

work.FullName >> "FullName": "Slim Shady"

The toString() method for both the Entry and Attribute classes gives you this JSON-like result (of course, you use the JSON Parser for real JSON) and this may not be what you want.

Even worse is when you start calling String functions directly from your dot-referenced Attribute:
work.FullName.toUpperCase() >> com.ibm.jscript.InterpretException: Script interpreter error, line=1, col=12: Error calling method 'toUpperCase()' on an object of type 'com.ibm.di.entry.Attribute'

So what's a poor TDI'er to do? As usual, there is more than one way to flay this feline:

work.getString("FullName").toUpperCase() >> SLIM SHADY
work.FullName.getValue().toUpperCase() >> SLIM SHADY
work["FullName"].getValue().toUpperCase() >> SLIM SHADY
(work.FullName + "").toUpperCase() >> SLIM SHADY

The key here is to reference the value of the attribute, not the attribute itself. Sometimes you'll want a String representation and will use methods like entry.getString(attributeName) or attribute.getValue(). In those situations where you need the actual Java class used to store the value, you use attribute.getValue(index), where index is an integer value denoting which value you want: from zero to attribute.size() - 1.

For the sake of completeness, note that calling entry.getObject(attributeName) will return the Java object used to store the first value of the named attribute.

And if any of this is still unclear, please drop a note in the forum and I'll strive to clarify without further pontification :)

Forum: https://groups.google.com/forum/#!forum/ibm.software.network.directory-integrator

Friday, July 1, 2011

Higher Availability

A question oft asked: How do I make my TDI solution highly available? The answer often boils down to what 'available' means to your solution. In many cases it means that one or more AssemblyLines continue to function. This can be 'wired into' a solution with surprisingly little effort.

To start with, don't create long-running ALs; in other words, don't set the timeout parameter for your Iterator to 'never time out'. Instead, let it wait a bit for new input (e.g. new changelog entries or messages on a queue) and then report End-of-Data so the AL stops. Then restart it.

So when doing a directory sync you let the Change Detection Connector 'listen' for changes and stop if none appear in, say, 10 minutes. Then you have some other process that relaunches the Sync AL: like a cronjob or Windows Scheduled task, or even another AL.
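As a sketch, here is what such a cronjob might look like. The install path, Config filename and AL name are all assumptions; -c (Config file) and -r (AssemblyLine name) are the standard ibmdisrv command-line flags for launching an AL:

```
# hypothetical crontab entry: try to (re)launch the Sync AL every 10 minutes
*/10 * * * * /opt/IBM/TDI/V7.1/ibmdisrv -c SyncSolution.xml -r SyncAL
```

If the AL is already running and bound to its change source, the fresh launch simply fails or exits, so the schedule doubles as a cheap restart mechanism.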

This is the simplest form of HA design, and it also gives you an opportunity to check status and send reports/alerts if needed whenever the Sync AL stops - simply by checking the error object in the AL's After Close Epilog Hook. Since you expect the AL to stop every once in a while, its running state can be checked - for example, using the TDI commandline utility, tdisrvctl, or even another AL. If you detect that the AL is hanging, you can stop and restart it. And if an unhandled error occurs and the Sync AL stops abnormally, it simply gets restarted again.

The idea of monitoring an AL and starting another if it stops seems straightforward. But it is not really that easy. Just because an AL appears to have stopped does not mean it's a good idea to launch a backup. Unless you design for this, running multiple copies of the same AL simultaneously (e.g. reading changelog) may not be a good idea. Also, it is very hard to determine where the failure lies: did the AL stop, has the connection to the TDI Server api been lost, did the TDI server or JVM die, was there a network failure or did the server HW crash, ...? It might be that the AL is waiting for a lock situation on some connected system or resource, or working to re-establish a lost connection. Starting a second copy may serve no purpose.

From experience, the most common situation is that either the AL is hanging - which could be an error in script logic, or I/O latency of connected systems - or it has stopped due to an unexpected (and unhandled) exception. If the AL is actually hanging then it can be killed using api calls or via the tdisrvctl utility, and then restarted. If it has stopped abruptly, restarting is the answer again. If you use a cronjob or scheduled task to (re)launch the AL, then you can be sure the TDI Server is restarted anew each time as well.

Of course, you can use a message queue and divide up your solution into 'feed' and 'handler' AssemblyLines, allowing you to run multiple copies to increase both performance and availability. You can also use Server mode Connectors to drive your solution, since Server mode provides features for pooling and reuse/restart of concurrent AssemblyLines.

These and other techniques and reflections on building robust solutions have been captured by TDI architect Johan Varno and can be found here: http://www.redbooks.ibm.com/redpieces/pdfs/redp4672.pdf

Often the simplest answer is the best one: expect ALs to stop and then restart them again.

Thursday, June 16, 2011

Adding custom header lines to CSV output

I've gotten this question a few times recently: how do you put custom content at the top of a CSV file?

One solution would be to code the After Close Hook for the File System Connector, creating a new file that combines the headers and the CSV content. However, there is an easier solution (thanks again to Jens 'Beautiful Mind' Thomassen):

Add the following script to the After Initialize Hook of the Connector:
// At this point, the Parser is also initialized,
// so you can grab the java.io.BufferedWriter it uses.
//
stream = thisConnector.connector.getParser().getWriter()
//
// Then add your custom content
//
stream.writeln("this is the first line")
stream.writeln("this is the second line")

And that's all there is to it. When the Connector does its first write operation, the field names will be written after the custom lines.
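For the curious, here is the same idea as a standalone plain-Java sketch (a StringWriter stands in for the output file, and the field-name line is made up) - custom lines pushed through the writer before any CSV content goes out:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

// Standalone illustration of the trick above: anything written to the
// writer before the Parser's first write ends up at the top of the file.
public class CustomHeaderDemo {
    static String buildOutput() throws IOException {
        StringWriter sink = new StringWriter();      // stands in for the CSV file
        BufferedWriter out = new BufferedWriter(sink);
        out.write("this is the first line");
        out.newLine();
        out.write("this is the second line");
        out.newLine();
        // ...the CSV Parser would emit field names and data rows next...
        out.write("FullName,mail");
        out.newLine();
        out.flush();                                 // push buffered content to the sink
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(buildOutput());
    }
}
```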

Tuesday, April 26, 2011

skipEntry() I thought I knew you...

When you want to abruptly end the current cycle of an AssemblyLine, a script call to
system.skipEntry()
is the most commonly used method. However, this is not always the best choice. In fact, it is only relevant when you have an Iterator in the Feed section. If you don't then
system.exitFlow()
is the better choice.

Except
if your AL has a Server mode Connector. To explain why, I'll have to say a bit first about how Server mode Connectors work.

A Server mode Connector (or SMC for short) adds two important features to your AL:
  • redundancy
  • AL pooling
Redundancy comes from the fact that when the AL starts, the SMC starts a thread that binds to the specified resource, for example an IP port. Whenever a client connects, a new thread is spawned to service this client. This thread is an AssemblyLine consisting of a copy of the SMC in Iterator mode, paired up with a Flow section from the AL pool. When this child AL terminates - for example, when the connection is closed or an error occurs in AL processing - the Flow section components are returned to the pool, ready for new connections.

As long as nothing breaks the SMC itself (e.g. the bound resource failing), new client connections will be handled by (re)launching ALs as needed.

The AL pool is not to be confused with Connector Pooling, which you set up in your Connectors library (Resources > Connectors). Instead, under AssemblyLine Options you can define both the initial size and max size of this AL's pool. These settings only apply when a Server mode Connector is present in the AL Feed section. And note that Connectors (and connections to target systems) will be initialized for each copy in the pool. Pool initialization occurs when the SMC starts so that all connections are hot-and-ready to service client requests.

As a result of all this, when you want to stop Flow section execution you need to use
task.shutdown()
so that Connectors are released to the pool. Note that this will not result in the Server Connector making a reply back to the client. To do this you still need to use:
system.exitFlow()

Friday, March 25, 2011

Reference error: 'java' not found

Just to let you know, if you get the above error then it means you've probably spelled the Java class name wrong.
now = new java.util.Calender()
I was getting ready to start pulling out hair when I saw the light: 'Calendar'.

And I've also gotten other error messages that turned out to be caused by spelling. Just wanted to share that :)
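For reference, the working spelling in plain Java (note that java.util.Calendar is abstract, so the usual way to obtain one is the getInstance() factory rather than 'new'):

```java
import java.util.Calendar;

// 'Calendar', not 'Calender' - one vowel saves a lot of hair.
public class CalendarDemo {
    public static void main(String[] args) {
        Calendar now = Calendar.getInstance();   // current date/time, default locale
        System.out.println(now.getTime());
    }
}
```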

Wednesday, March 23, 2011

To Google, or not to Google? That is the wrong question.

If you've ever hit an error or connection problem or logical challenge using TDI and cursed the lack of helpful content to be found online, you may not be looking in the right places.

Firstly, use the term 'tdi' in your search arguments only if your interest is motors or diving. Searches that include 'tivoli directory integrator' will be less exciting but more relevant.

Secondly, remember that TDI is pure Java, and also that the core of an issue might lie in some system that you are integrating with. So the trick is to examine the available clues, and then make an informed search.

For example, let's look at this error returned while trying to update a Domino server:
10:03:21,410 ERROR - [UpdateDomino] CTGDIS810E handleException - cannot handle exception , update
java.lang.Exception: CTGDKC002E Failed to execute the command. NotesException occurred: Invalid object type for method argument
at com.ibm.di.connector.dominoUsers.DominoUsersConnector.executeCommand(DominoUsersConnector.java:647)
at com.ibm.di.connector.dominoUsers.DominoUsersConnector.putEntry(DominoUsersConnector.java:732)
at com.ibm.di.server.AssemblyLineComponent.executeOperation(AssemblyLineComponent.java:3139)
at com.ibm.di.server.AssemblyLineComponent.add1(AssemblyLineComponent.java:1930)
at com.ibm.di.server.AssemblyLineComponent.update(AssemblyLineComponent.java:1681)
at com.ibm.di.server.AssemblyLine.msExecuteNextConnector(AssemblyLine.java:3669)
at com.ibm.di.server.AssemblyLine.executeMainStep(AssemblyLine.java:3294)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:2930)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:2913)
at com.ibm.di.server.AssemblyLine.executeAL(AssemblyLine.java:2882)
at com.ibm.di.server.AssemblyLine.run(AssemblyLine.java:1296)
10:03:21,410 ERROR - CTGDIS266E Error in NextConnectorOperation. Exception occurred: java.lang.Exception: CTGDKC002E Failed to execute the command. NotesException occurred: Invalid object type for method argument
java.lang.Exception: CTGDKC002E Failed to execute the command. NotesException occurred: Invalid object type for method argument
at com.ibm.di.connector.dominoUsers.DominoUsersConnector.executeCommand(DominoUsersConnector.java:647)
at com.ibm.di.connector.dominoUsers.DominoUsersConnector.putEntry(DominoUsersConnector.java:732)
at com.ibm.di.server.AssemblyLineComponent.executeOperation(AssemblyLineComponent.java:3139)
at com.ibm.di.server.AssemblyLineComponent.add1(AssemblyLineComponent.java:1930)
at com.ibm.di.server.AssemblyLineComponent.update(AssemblyLineComponent.java:1681)
at com.ibm.di.server.AssemblyLine.msExecuteNextConnector(AssemblyLine.java:3669)
at com.ibm.di.server.AssemblyLine.executeMainStep(AssemblyLine.java:3294)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:2930)
at com.ibm.di.server.AssemblyLine.executeMainLoop(AssemblyLine.java:2913)
at com.ibm.di.server.AssemblyLine.executeAL(AssemblyLine.java:2882)
at com.ibm.di.server.AssemblyLine.run(AssemblyLine.java:1296)
Step one is to find the topmost (first) stack dump, and then read its top two or three lines:
10:03:21,410 ERROR - [UpdateDomino] CTGDIS810E handleException - cannot handle exception , update
java.lang.Exception: CTGDKC002E Failed to execute the command. NotesException occurred: Invalid object type for method argument
at com.ibm.di.connector.dominoUsers.DominoUsersConnector.executeCommand(DominoUsersConnector.java:647)
at ...
The first line is timestamped and filled with info coming from the AssemblyLine itself. In the above snippet there is the component name in brackets, UpdateDomino, followed by a codified error message - in this case, a very general one that tells us that the AL was unprepared to handle an exception thrown by one of its components. At the very end of the first line is the operation which failed: update. Although this information gives us context, it brings us no closer to solving the problem.

The second line of the snippet is more interesting here. It also has a numbered error message that translates to 'the desired operation failed because Notes flagged an exception'. After the colon is this error: Invalid object type for method argument. Now we have bait for our hook and can go fishing for answers.

In Google I look for: 'Invalid object type for method argument' update notes
Ok, so this wasn't the best example :) Plenty of TDI content here. The third link in the result page takes me to a redpaper entitled 'Domino Integration using TDI', and here on page 26 is the same Notes error message and the cause: the type of an attribute is not recognized by Notes. A typical situation is that a date value is being written but was not converted to a Domino Date type, or that a set of values was being written and one of them was null.

On a side note: this can happen if you are using the template example for AD - Domino synchronization and the AD instance you're working against has a different schema for Users than the standard, out-of-the-box one. In this case the 'Location' attribute in the Output Map of the Domino Connector may be in error.

Getting back to my rant, sometimes the search results aren't this promising. That's when you add 'java' to the list of terms, hoping that some Java developer, deployer or application user has seen this before, and an answer lies beckoning in some forum thread, blog post, presentation or page of documentation somewhere.

And finally, if you learn how to read a JavaDoc then you may find answers in TDI's JavaDocs, or those of the libraries that your solution uses. This includes stuff like database drivers and client APIs, as well as standard Java classes.

So the answer is to Google, but it's the question that's key.

Wednesday, March 16, 2011

CSV Parsing with a twist

So the question I got was this: how can I get the line being parsed by the CSV Parser?

Unfortunately, the CSVParser class does not have any public method for this, so the following is not possible:

lineRead = thisConnector.getParser().getCurrentLine()
Instead, with the help of Jens Thomassen, TDI surgeon, and the indispensable AL Debugger, I created this example TDI 7.1 AL to do just that. You can download the linked Example AssemblyLine and just drop the file onto a TDI Project. It's self-contained thanks to the ever-handy Form Entry Connector.

The Feed section of this AL holds a Form Entry Iterator that reads a CSV bytestream line-by-line using the LineReader Parser; it returns, one at a time, each line of the CSV data loaded into the Connector's Connection parameter. Then in the Flow section there is another Form Entry Connector, this one with the CSV Parser, which gets us the actual CSV attributes.

To make this magic work, I did a couple of things:
  1. First I set the Flow Section Iterator to initialize 'only when used'. You do this by pressing the More... button out to the right of the Inherit From setting and changing the Initialize drop-down accordingly. This is to prevent the Connector and Parser from initializing until we have data for it.
  2. Then I added this code to the After Initialize Hook of this second Iterator:
    outStream = new java.io.PipedOutputStream()
    inpStream = new java.io.PipedInputStream(outStream)
    newline = new java.lang.String("\n\r")
    
    formEntry = thisConnector.getConnector()
    formEntry.initParser(inpStream, null)
    
    csvColumns = formEntry.getParser().getParam("csvColumns")
    firstRow = true

    The Piped stream allows me to write into one end of the pipe and have my CSV Parser read from the other end.
  3. Now in the Before GetNext Hook I need to write the current line into my pipe.
    outStream.write(work.getString("line").getBytes())
    outStream.write(newline.getBytes())
    
    if (firstRow && (csvColumns == "")) {
      outStream.write(work.getString("line").getBytes())
      outStream.write(newline.getBytes())
      firstRow = false;
    }

    If this is the first row of the file and no Field Names have been specified for the Parser, I am assuming this must be the column title line of the file, so I have to write it twice to the pipe.
And presto! I am getting both the 'line' Attribute and those parsed out of the CSV. Note that this is a slightly simplistic approach, and as a result the first Entry returned by the CSV Parser contains the column names as Attribute values.
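The piped-stream trick at the heart of this can be seen in a standalone plain-Java sketch (the sample line is made up): whatever you push into one end of the pipe can be read back from the other, which is exactly how each line gets fed to the second Parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// One end of the pipe is written to (our Hook script's role), the other
// end is handed to the reader (the CSV Parser's role). Single-threaded
// use like this is fine as long as each write stays below the pipe's
// internal buffer size; across threads it blocks instead.
public class PipeDemo {
    static String roundTrip(String line) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);   // connected to 'out'
        BufferedReader parserSide = new BufferedReader(new InputStreamReader(in));
        out.write((line + "\n").getBytes());               // write into the pipe
        return parserSide.readLine();                      // read it back out
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("Slim Shady,28"));    // prints Slim Shady,28
    }
}
```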

So I made a Second Attempt using just a single Iterator in the Feed section, and scripting the setup and calls to the CSV Parser. You can also just drop this .assemblyline file onto a Project and then play with it. I am using the code that I built for the first example in a slightly different way:
  1. The After Initialize Hook script is a bit shorter:
    outStream = new java.io.PipedOutputStream()
    inpStream = new java.io.PipedInputStream(outStream)
    newline = new java.lang.String("\n\r")
    
    firstRow = true;
    initParser = true;
  2. All the work is done in After GetNext, once I have read in the line:
    if (initParser) {
      // In this next line you could instead get a pre-configured
      // Parser from your Project Resource library:
      //    csvParser = system.getParser("Parsers/MyCSV")
      //
      // AND to find out what the 'true name' of a Parser is
      // just add it to a Connector and then use the More...
      // Select Inheritance button to see the inheritance link
      // for the Parser (yeah, it's not that elegant ;)
      //
      csvParser = system.getParser("ibmdi.CSV")
      csvParser.setInputStream(inpStream)
      csvParser.initParser()
      csvColumns = csvParser.getParam("csvColumns")
    
      initParser = false;
    }
    
    if (firstRow && (csvColumns == "")) {
      outStream.write(conn.getString("line").getBytes())
      outStream.write(newline.getBytes())
    }
    
    outStream.write(conn.getString("line").getBytes())
    outStream.write(newline.getBytes())
    
    csvEntry = csvParser.readEntry()
    if (firstRow && (csvColumns == "")) {
      firstRow = false
      system.skipEntry() // skip the column names
    }
    
    if (csvEntry != null)
      conn.merge(csvEntry)
This time only the actual data values are returned.

And now for more coffee :)

Wednesday, February 23, 2011

Portable Solutions

There are two simple tricks that will make your TDI solutions more portable:
  1. Use forward slash in pathnames
  2. Use relative pathnames
A forward slash will work on Unix and Windows, whereas backslash is Windows-only. The reason I use relative pathnames is that my ultimate goal is a single zip file distributable.

The Solution Directory is the root for all relative paths. If you look in the ibmdisrv and ibmditk batch-files/scripts used to start TDI, you can see that before the TDI Java executable is launched the current directory is changed to the solution directory (bin/defaultSolDir.bat or .sh).

The CE workspace on the other hand is an Eclipse construct. Since TDI follows the Eclipse Project paradigm, you get a workspace with folders for the various projects, and with sub-directories below these that reflect the onscreen Navigator hierarchy.

All the TDI Server itself really needs is a single XML file that is 'compiled' based on Project assets, and this file is written to the sub-directory of the Project that starts with 'Runtime-'. This is subsequently where the Default test Server loads it from when you Run or Debug your AssemblyLines. You can change this preference in the Project properties by using the Linked File option.

The Runtime folder is also where TDI puts your default Property store (also named after the Project). If you look at the Connector tab for the Property store and click on the label of the Collection/Path parameter, you'll see that this value is tied to a substitution Expression:

{config.$directory}/ProjectName.properties

The substitution token {config.$directory} translates at run-time to 'wherever the TDI Server loaded the Config from'. Filepaths set this way ensure that supporting files need only reside in the same folder as the Config file, wherever that happens to be. As you'll see below, I like to be a bit more explicit.

Armed with this knowledge, let me share how I start new projects. For the sake of illustration I'm going to call this project TDI4SyncService.
  1. Make a sub-folder of my Solution Directory named 'TDI4SyncService'.
  2. Create the project called TDI4SyncService and set the properties to write the Config file to the folder created in the step above: TDI4SyncService/TDI4SyncService.xml.
  3. Edit the TDI4SyncService Property store so that the collection/filepath is 'TDI4SyncService/TDI4SyncService.properties'
Now I have a single folder that I can zip down and share. It contains both the Config xml and properties files, plus anything else my solution accumulates during development. I will even store .jar files here if they are project specific, editing the com.ibm.di.userjars property as needed. Plus I can drop in batchfiles/scripts for launching the solution so that TDI skills won't be required to use it.
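Under the (illustrative) names used above, the solution ends up laid out something like this:

```
<Solution Directory>/
    TDI4SyncService/
        TDI4SyncService.xml          <- the 'compiled' Config
        TDI4SyncService.properties   <- the Property store
        ...                             project-specific jars, launch scripts, etc.
```

Zip the TDI4SyncService folder, and the whole solution travels as one file.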