TDIing out loud, ok SDIing as well

Ramblings on the paradigm-shift that is TDI.

Wednesday, November 21, 2007

What do JDBC commit/rollback and LDAP rebind have in common?

Both are features of their respective protocols and both are available for use from script. All you have to do is get hold of the Connector Interface, aka the "CI".

var ci = thisConnector.getConnector();

Note that "thisConnector" is a handy variable that always references the current component. Also, for TDI versions prior to 6.1.1 you don't have the getConnector() method and must reference the connector member field directly:

var ci = thisConnector.connector;

Once you have the CI, you have direct access to technology/vendor/platform-specific functionality, as well as the standard methods that all CI's must implement in order to support Connector Modes. However, working with the CI directly will not invoke any Hook flows.

Hook flow execution is initiated when an "AL Component" method is called, like getnext() or lookup(). These standard AL component functions can be found in the TDI JavaDocs (Help > Low Level API). If you look at the class com.ibm.di.server.AssemblyLineComponent you'll see these calls. For example, the update() method invokes the Update Mode Hook flow, which performs a Lookup and then branches to either Add or Modify. The actual read and write operations are provided by calling methods in the attached CI: findEntry(), putEntry() and modEntry(), in that order. Calling any of these CI functions directly from script means bypassing the Hook flow logic provided by the AL Component.

In addition to the methods required to support desired Modes, a CI can contain any number of supplementary functions. The JDBCConnector class offers JDBC-specific calls like commit() and rollback(). The LDAPConnector lets you rebind() and getServerInfo().
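As a minimal sketch of how these supplementary calls might be used, the helper below wraps a unit of work so that it is committed on success and rolled back on failure. It assumes only that the object passed in exposes the commit() and rollback() calls mentioned above. The name runInTransaction and the work callback are made up for illustration, and whether auto-commit must be switched off first depends on how your Connector is configured.

```javascript
// Hypothetical helper (not part of TDI): commit if the work succeeds,
// roll back if it throws. "ci" is assumed to expose commit() and
// rollback(), as described for the JDBC Connector Interface.
function runInTransaction(ci, work) {
    try {
        var result = work(ci);   // do the reads/writes
        ci.commit();             // make them permanent
        return result;
    } catch (excptn) {
        ci.rollback();           // undo whatever was done so far
        throw excptn;            // re-throw so the caller still sees the failure
    }
}
```

From a TDI script you would pass in the CI obtained as shown earlier, i.e. thisConnector.getConnector(), or thisConnector.connector on versions prior to 6.1.1.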

But the fun doesn't stop here. Many CI's can also return a handle to the underlying system and vendor libraries. For example, the JDBCConnector offers a getConnection() call to return the underlying driver's Connection object, implemented according to this standard interface:

http://java.sun.com/j2se/1.5/docs/api/java/sql/Connection.html

plus vendor-specific additions.

Another example is the Notes Connector that lets you get a reference to the currently opened Domino Session, Database or View object. Then you can make direct Domino API calls to do stuff like re-certify users or invoke AdminP processes.

And, of course, Functions and Parsers have Interfaces too.

Wednesday, October 31, 2007

Made-to-order music videos

I found this very cool online service and made TDI ready for MTV ;)

BlueGlue remix #2

Guess I'll need to add a concert t-shirt here as well then:

Prepare to be integrated...

Monday, October 29, 2007

Why throw multiple ALs at a single problem?

I get this one occasionally and it's a very valid question from a simplicity standpoint (a TDI mantra). However, there are good reasons for dividing a workload up among multiple AssemblyLines, and better performance is one of them -- as long as the task at hand allows for multi-threading. One example is data movement/migration/sync where the ordering of updates is not significant as long as all are processed. Here is a whitepaper that outlines a number of patterns that lend themselves to multi-AL solutions:

Performance Best Practices Paper for TDI 6.1 and 6.1.1

Another reason is to improve availability. Having several simultaneous worker ALs as mentioned above also means no single point of failure. Imagine a scenario where one or more AssemblyLines (on one or more TDI Servers) read source data and write to MQ. At the other end of this secure, persistent data pipe is another set of ALs picking up these messages on a first-come, first-served basis. If an AssemblyLine fails then the integration service is degraded, not dead.

One oft-used multi-AL technique is to have a secondary "launcher" AL with a "while(true)" loop that starts the primary long-running AssemblyLine and waits for it to complete - which it should never do (in theory). If for some reason the mission-critical AL stops, then postmortem failure handling, including logs/alerts, can be done by the calling AL. Finally, the AL can be re-started in the next cycle of the loop.
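The launcher pattern can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: runPrimary and onFailure are hypothetical callbacks standing in for "start the primary AL and block until it ends" and "do the postmortem", and maxRestarts is added purely so the sketch can terminate, whereas a real launcher would typically use while(true).

```javascript
// Hypothetical launcher loop (names are illustrative, not TDI API).
// runPrimary(): starts the long-running AL and returns only when it stops.
// onFailure(problem, restartNo): postmortem handling such as logs/alerts.
// maxRestarts: an assumption so that this sketch ends; use while(true)
//              if the launcher should keep going forever.
function superviseAL(runPrimary, onFailure, maxRestarts) {
    var restarts = 0;
    while (restarts < maxRestarts) {
        var problem;
        try {
            runPrimary();                          // should never return (in theory)
            problem = "primary AL ended unexpectedly";
        } catch (excptn) {
            problem = String(excptn);              // the AL failed with an exception
        }
        onFailure(problem, restarts);              // logs/alerts go here
        restarts++;                                // next cycle re-starts the AL
    }
    return restarts;
}
```

In an actual launcher AL, runPrimary would start the primary AssemblyLine from script and wait for its thread to finish. How you do that depends on your TDI version, so check the JavaDocs for the calls available to you.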

I have also seen complex logic divided up into smaller, simpler ALs for reasons of simplicity and maintenance. The developer specs out each step in their integration flows - along with constraints, schema and invariants - and then implements these as individual ALs. This allows for incremental upgrades to a solution as well as unit testing.

Finally, entire AssemblyLines can be exposed as components (called "Adapters") facilitating sharing and reuse. You can find more on this topic here:

Introducing Adapters with Tivoli Directory Integrator 6.1

Monday, September 24, 2007

Online backup of the System Store (Derby/Cloudscape)

Follow these steps to build an automated System Store backup utility. Note that your System Store must be set up for Network mode.
  1. Create a new AL (call it "BackupSystemStore")

  2. Add a JDBC Connector (call it "Derby") in AddOnly mode and put it in Passive state.

  3. Configure the JDBC Connector to point at the System Store. The easiest way to do this is to click on the label of the JDBC URL parameter. In the resulting dialog, use the Expression drop-down to choose the com.ibm.di.store.database property. Or you could just enter this into the Expression field:

    {property.Solution-Properties:com.ibm.di.store.database}

    Now the JDBC URL parameter will be dynamically configured using the named property from your Solution-Properties (solution.properties). This is the same property TDI uses when working with its System Store database.

  4. Repeat step 3 for the other mandatory parameters:

    • JDBC Driver - com.ibm.di.store.jdbc.driver

    • Username - com.ibm.di.store.jdbc.user

    • Password - com.ibm.di.store.jdbc.password

    Since the Connector is not in Iterator mode, we don't have to set Table Name (i.e. no selectEntries).

  5. Add a Script component to the AL with this code:

    // Make sure the path uses forward "/" and ends with one
    // as required by SQL syntax.
    //
    function sqlPath( pth ) {
        pth = system.mapString(pth, "\\", "/");
        if (!pth.endsWith("/"))
            return pth + "/";
        else
            return pth;
    }

    // Here is the function for backing up the System Store
    // to the specified directory.
    //
    function backupDB( backupdir ) {
        // Get today's date as a string:
        var todaysDate = new java.text.SimpleDateFormat("yyyy-MM-dd");
        var backupdirectory = sqlPath(backupdir) +
            todaysDate.format((java.util.Calendar.getInstance()).getTime());

        task.logmsg("Backing up Derby DB to " + backupdirectory + "...");

        // execSQL() hands back an error string, or nothing on success.
        var res = Derby.getConnector().execSQL("CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE('" +
            backupdirectory + "')");

        if (res == null || res == "")
            task.logmsg("Done!");
        else
            task.logmsg("** Error: " + res);
    }

    // Here the function is called to perform the backup
    // to C:/_Backup/ (which is created automagically the
    // first time).
    //
    backupDB("C:\\_Backup\\");

Now you can set up a Scheduler AL with the Timer Connector to periodically call your BackupSystemStore AssemblyLine and make a fresh backup.

Thanks to Boli of Ascendant for the link that made my day :) The page is "Backing up Derby databases", from the admin guide found with the other online Derby docs.

Friday, September 21, 2007

Buried treasure

A favored pastime of my youth was treasure hunting, whether it was scouring a hillside for Easter eggs, rummaging in the attic of my grandparents' house or just cleaning out the pockets of my jeans. I have this same, if somewhat unusual, relationship to troubleshooting error messages from TDI. The secret is knowing where to dig.

The first thing you need to do is scan your log output for the very first error message and stack dump, which will be the root cause of the failure. Subsequent error messages are likely to be related to this initial fault.

Once you've found the first error then you can start digging. This is done by first splitting the text of the message into two parts: what TDI is reporting, and what the underlying library or system is complaining about. For example, TDI could be giving you a standard message like:
CTGDIJ001E
No default JDBC driver. The 'jdbcDriver' parameter must be set to use the JDBC Connector.
Here the entire prose of the error message comes from TDI. In this case there is no data source involved (yet) since the Connector itself can't be initialized without the JDBC Driver parameter being set correctly. These are also the types of errors that Google may be less than helpful for, and your best bet may be the discussion forum, the community website or the online docs.

To illustrate the type of message that a net search can help you decipher, let's look at a message I helped debug last week (formatted in bold and italics to illustrate my point):

10:12:03 [DB2_Update] CTGDIS810E handleException - cannot handle exception , initialize

Unable to obtain schema: com.ibm.db2.jcc.c.SqlException: DB2 SQL error:
SQLCODE: -443, SQLSTATE: 38553, SQLERRMC: SYSIBM.SQLCOLUMNS; COLUMNS;SYSIBM:CLI:-805

In a case like this where the exception originated in a call to some driver or API, the first part of the message (italicized above) will help you find the point in your AssemblyLine where the failure occurred.

At the start of the error message is the name of an AssemblyLine component in brackets: [DB2_Update]. This tells you where the error occurred in your AL. Immediately following the component name we can furthermore see TDI telling us that this unhandled exception occurred during initialization of the component. So far so good: we now know where and when. The next step is to discover why we are getting this exception.

That is where the text that I've formatted as bold comes in: this is the error that TDI received from the underlying RDBMS, i.e. the SqlException text with its SQLCODE and SQLSTATE. Since it is a DB2 error (not specific to TDI) there is a much greater chance that someone else has fixed and documented this already.

So when I did a Google search with tidbits gleaned from this part of the message (e.g. "sqlcode -443 sqlstate 38553 cli -805") I uncovered plenty of relevant links, eventually leading me to a newsgroup post that described a bind problem easily fixed with "bind db2schema.bnd". The search was over, the problem solved and I am ready for new adventures.
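To make the "where" versus "why" split concrete, here is a rough sketch of a helper that pulls the component name, the TDI message ID and a few search terms out of a log message. The function name dissectError is hypothetical, and the regular expressions are assumptions based only on the two sample messages above, so expect to adjust them for other drivers and message formats.

```javascript
// Hypothetical helper: split a TDI log message into the "where"
// (component name and TDI message ID) and the "why" (search terms taken
// from the underlying DB2 error). The patterns are guesses based on the
// sample messages in this post, not on any documented format.
function dissectError(text) {
    var component = /\[([^\]]+)\]/.exec(text);           // e.g. [DB2_Update]
    var tdiId = /\bCTGDI[A-Z]\d+[A-Z]\b/.exec(text);     // e.g. CTGDIS810E
    var sqlcode = /SQLCODE:\s*(-?\d+)/i.exec(text);      // e.g. SQLCODE: -443
    var sqlstate = /SQLSTATE:\s*(\w+)/i.exec(text);      // e.g. SQLSTATE: 38553

    var terms = [];
    if (sqlcode) terms.push("sqlcode " + sqlcode[1]);
    if (sqlstate) terms.push("sqlstate " + sqlstate[1]);

    return {
        component: component ? component[1] : null,
        tdiMessageId: tdiId ? tdiId[0] : null,
        searchTerms: terms.join(" ")
    };
}
```

Feeding it the DB2 message above gives you the component to look at in your AL, plus a ready-made search string for the part of the error that did not come from TDI.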

Wednesday, August 29, 2007

What's in a name?

Ok, so it's not immediately obvious that the pre-defined script variable system is actually detailed in the TDI JavaDocs under the com.ibm.di.function.UserFunctions class. Or that main goes by the whimsical class name com.ibm.di.server.RS and task is com.ibm.di.server.AssemblyLine.

But this is no problem when all you have to do is ask:

task.logmsg(" system: " + system.getClass());
task.logmsg(" task: " + task.getClass());
task.logmsg(" main: " + main.getClass());

The getClass() method is available for all Java objects and so will work for any variables that reference one. However, it will not work for JavaScript types (e.g. Number, String or Boolean).

To get around this limitation we can make our own getClass() function:

function getClass( v ) {
    if (typeof(v) == "object")
        return v.getClass();
    else
        return typeof(v);
}

Then you'll be on speaking terms with variables from both worlds.
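One caveat: the function above assumes that anything whose typeof is "object" has a getClass() method, which is not the case for null or for plain JavaScript objects and arrays. A slightly more defensive sketch follows. The name describeType is made up, and the check that typeof v.getClass is "function" is an assumption about how your script engine exposes Java methods, so try it out in your own environment first.

```javascript
// Hypothetical, more defensive variant of the helper above.
// Assumption: a Java object's getClass member reports typeof "function"
// in the script engine; plain JavaScript values simply lack the member.
function describeType(v) {
    if (v === null)
        return "null";
    if (typeof v == "object" && typeof v.getClass == "function")
        return String(v.getClass());   // Java object: report its class
    return typeof v;                   // JavaScript value: report its type
}
```

You could then write task.logmsg("system: " + describeType(system)) without worrying about which world the variable comes from.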

Monday, August 27, 2007

Exceptional trick for initialization code

Let's say you have an AssemblyLine that will be processing thousands of entries, and you want a progress message written every 100 cycles. This will require a counter.

var entryCount = 0;

And you'll need a snippet of script to write your message.

// Increment the counter and test if it's time to write the progress message.
// I chose here to write to standard output and not the log
//
entryCount++;
if (entryCount % 100 == 0)
    java.lang.System.out.println("Entries processed: " + entryCount);

Since you don't want the counter set to 0 every cycle, you need to have your initialization code outside the main loop of the AL. This leaves three choices.

1) Put the init code in a Prolog Hook (either of the AL or some component). This is a common approach, but it does make ALs a bit harder to navigate since references to the same variable are spread across components and the AssemblyLine itself.

2) Another technique is to use a Connector Loop instead of standard Feeds-Flow behavior to drive data in your AssemblyLine. Without an active Feeds section, your initialization code can simply be handled by a Script component at the start of the Flow section. Legibility is improved since you get a component, preferably named something like "Initialization", visible at the start of your AL. Of course, the AssemblyLine is a tad more complex, and you don't get to exploit End Of Cycle behaviors (like Iterator State Persistence). Plus you still initialize variables in one place and then use them someplace else.

3) That leads me to my final point where I reveal the pun in my title above: use exception handling to ensure that code is run once and only once.

Exceptions are how errors are flagged and passed around in development languages like C++, Java and JavaScript. When some piece of code gets into trouble, it sends up a flare - which is called throwing an exception in the parlance. This exception causes normal processing to stop and control to be passed back up the call stack until it is either caught, or it causes the application to abort with an "unhandled exception" message.

TDI has exception handling logic that passes control to Error Hooks, as described in the Flow Diagrams, but you can implement your own using the JavaScript try-catch statement.

try {
    // ... try some code that may fail with an exception ...
} catch (excptn) {
    // ... end up here if the above fails (and is passed the "excptn" variable) ...
}
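Here is a small, self-contained illustration of an exception travelling up the call stack until it is caught. The plain JavaScript object stands in for an entry, and both function names are invented for this sketch.

```javascript
// Throws (sends up the flare) when the named attribute is missing.
// "entry" is a plain JavaScript object used as a stand-in here.
function requireAttribute(entry, name) {
    if (!(name in entry))
        throw new Error("missing attribute: " + name);
    return entry[name];
}

// Calls requireAttribute() and catches whatever it throws, so that one bad
// entry results in a message instead of an "unhandled exception" abort.
function describeEntry(entry) {
    try {
        return "mail=" + requireAttribute(entry, "mail");
    } catch (excptn) {
        return "skipped (" + excptn.message + ")";
    }
}
```

Calling describeEntry() with an object that has a mail property returns the formatted value. Calling it with an empty object returns the "skipped" text, because the exception thrown one level down is caught one level up.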

Going back to the initial scenario of writing progress messages, all counter-related logic can be implemented in a single Script component at the top of your Flow section - regardless of whether you are using an Iterator in the Feeds or not.

try {
    // This next line fails the first time since entryCount is not yet defined.
    entryCount++;
    if (entryCount % 100 == 0)
        java.lang.System.out.println("Entries processed: " + entryCount);
} catch (excptn) {
    // The code below is invoked when the above fails (first time only).
    // Note that we must init entryCount to 1 here, instead of 0 as before.
    entryCount = 1;
}

With this method you still get good readability (although your Script component should probably be named "Init and show progress") and you keep the initialization and usage of your script variable in one manageable snippet.