TDIing out loud, ok SDIing as well

Ramblings on the paradigm-shift that is TDI.

Monday, November 24, 2014

Hardening an AssemblyLine

This is done with a little effort in three areas:
  1. Error handling
  2. Log handling
  3. Auto-reconnect
For Error Handling you need to add code to the Prolog - On Error Hook for connection errors in components, and to the DataFlow - Default On Error (the catch-all Error Hook for all Connector modes). Here you will want to capture all error information to the log in order to help troubleshoot the issue. This can be done very simply by dumping the 'error' Entry to the log:

    task.dumpEntry(error);

The 'error' Entry holds a number of Attributes that indicate the error message and exception class, as well as where and when the error occurred. The 'exception' Attribute of the 'error' Entry is the Java exception itself, and you can print out the stacktrace using script like this:

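   var ex = error.getObject('exception');

   if (ex != null) {
      var trace;
      try {
         // Print the stack trace into an in-memory buffer
         var sw = new java.io.StringWriter();
         var pw = new java.io.PrintWriter(sw);
         ex.printStackTrace(pw);
         pw.close();
         trace = sw.toString();
      } catch (ex2) {
         // Fall back to the exception's own string form
         trace = '' + ex;
      }
      task.logmsg("ERROR", trace);
   }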

Note also that if you enable Error Hooks then execution will continue in the AssemblyLine as if nothing had occurred. So you will want to deal with the error yourself, for example by issuing a call like system.exitFlow() to stop the current cycle, or system.skipEntry() to pass control to the Feed Iterator in order to read the next entry. Or you may wish to re-throw the exception in some situations:

   throw error.getObject('exception');
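
One way to keep that decision in one place is a small helper that maps the exception class name to an action. This is only a sketch: the function name chooseErrorAction, the action labels 'skip' and 'rethrow', and the class-name patterns are illustrative assumptions, not anything TDI provides.

```javascript
// Hypothetical helper: decide what an Error Hook should do with an error.
// The patterns below are illustrative assumptions; adjust them to the
// exception classes you actually see in your own logs.
function chooseErrorAction(exceptionClass) {
   var name = String(exceptionClass || '');

   // Data problems usually affect one entry only: skip it and read the next
   if (name.indexOf('NumberFormatException') >= 0 ||
       name.indexOf('ParseException') >= 0) {
      return 'skip';
   }

   // Anything else is unexpected: let the AssemblyLine fail
   return 'rethrow';
}

// Inside the Default On Error Hook this could be used roughly like so:
//
//    var ex = error.getObject('exception');
//    if (chooseErrorAction(ex.getClass().getName()) == 'skip') {
//       system.skipEntry();
//    } else {
//       throw ex;
//    }
```

Keeping the rules in a function like this means every Error Hook in the solution reacts the same way to the same kind of failure.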

Log Handling means making conscious decisions about how your ALs perform logging. Of course all log messages are added to the TDI Server's logfile (ibmdi.log). However, this can make it difficult to find and isolate messages, particularly if your solution has several AssemblyLines. Instead each AL should likely have its own log. My favorite LogAppender is the FileRollerAppender, which maintains a set of historical log files, rolling these by appending a number to the file extension each time the AL is run.

You may also want to provide a couple of log files for each AssemblyLine: one Appender set to log at INFO or DEBUG level, and another that captures only warnings and errors, e.g. by setting the LogAppender level to WARN, which will include WARN, ERROR and FATAL.
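
To make the threshold idea concrete, here is a minimal sketch of how a level setting filters messages, assuming the usual DEBUG < INFO < WARN < ERROR < FATAL ordering. The names LEVELS and passesAppender are made up for illustration and are not part of TDI.

```javascript
// Severity order, lowest to highest (assumed ordering).
var LEVELS = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];

// Returns true if a message logged at msgLevel passes an Appender
// whose threshold is appenderLevel. Illustrative helper, not a TDI API.
function passesAppender(appenderLevel, msgLevel) {
   var threshold = LEVELS.indexOf(appenderLevel);
   var severity = LEVELS.indexOf(msgLevel);
   if (threshold < 0 || severity < 0) {
      return false; // unknown level name
   }
   return severity >= threshold;
}
```

With this model, an Appender set to WARN lets WARN, ERROR and FATAL through and drops INFO and DEBUG, while an Appender set to DEBUG receives everything.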

I typically have my own scripted log() function that includes information about where in my solution the logging is taking place.

   function log(lvl, msg) {
      var toConsole = false;
      if (typeof lvl === 'undefined') { // if no arguments
         lvl = 'INFO';
      }
      if (typeof msg === 'undefined') { // if only one argument
         msg = lvl;
         lvl = 'INFO';
      }

      // In case the log level is not in uppercase
      lvl = lvl.trim().toUpperCase();

      // CONSOLE level = print to console
      if (lvl.startsWith('CON')) { // 'CONSOLE' log message
         toConsole = true;
         lvl = 'INFO';
      }
      // All error messages also go to the console
      if ('WARN'.equals(lvl) || 'ERROR'.equals(lvl) ||
          'FATAL'.equals(lvl)) {
         toConsole = true;
      }

      // Now to determine where log was called from.
      // First get the AL name
      var where = '[' + task.getShortName();

      // Afterwards comes the current component (if not an AL Hook)
      try {
         var compName = thisComponent.getName();
         where += '/' + compName;
      } catch (ex) {
         // was not in an AL component
      }

      // Now add the Hook name (if inside a Hook)
      try {
         var hookName = thisScriptObject.HookName.getValue();
         where += '/' + hookName;
      } catch (ex) {
         // was not in a Hook
      }

      // Log the message
      task.logmsg(lvl, where + '] ' + msg);

      // Print some messages to console as well
      if (toConsole) {
         java.lang.System.out.println(lvl + " - " + msg);
      }

   }

I have this function defined in a Resources > Scripts Script that is tagged to be implicitly included for all ALs.

Finally, Auto-Reconnect is a feature of Connectors and Function components, found in the Connection Errors tab. Here you enable the reconnect feature, typically only for the case where a connection is lost. Set the Number of Retries parameter to an adequate value, along with the Delay Between Retries in seconds. In this way, if the connection goes down then the component can attempt to reconnect and then continue.

Note that the Reconnect rules must be set for this component. If you don't see built-in rules here then you will need to define your own, as described here.
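
Conceptually, the two parameters describe a retry loop like the one below. This is not how TDI implements Auto-Reconnect, just a sketch of the idea: withReconnect and its arguments are hypothetical names, and the sleep function is passed in so the sketch can run anywhere.

```javascript
// Conceptual sketch of retry-with-delay. Not TDI's implementation.
// operation:    function that throws on connection failure
// retries:      how many reconnect attempts to make after a failure
// delaySeconds: pause before each reconnect attempt
// sleep:        function(seconds) that performs the pause
function withReconnect(operation, retries, delaySeconds, sleep) {
   var attempt = 0;
   while (true) {
      try {
         return operation();
      } catch (ex) {
         if (attempt >= retries) {
            throw ex; // out of retries: surface the error
         }
         attempt++;
         sleep(delaySeconds);
      }
   }
}
```

Once the retries are used up the error surfaces as usual, which is where the Error Hooks described earlier take over.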


Monday, August 4, 2014

Null Behavior


I've gotten this question (again) and decided to explain it here so that Google can find it.

Null Behavior allows you to deal with missing data without having to write JavaScript. The Null Behavior feature lets you define what a 'null' attribute is and how it should be handled. By default the definition of 'null' is that the source attribute is missing or has no value (null value). The default handling is that an attribute with this name will not be found in the target entry. So for an Input Map, the Work Entry will not have this attribute after the mapping is done. For an Output Map it will be the Conn Entry that does not have the attribute. Furthermore, if an attribute with this name was found in the target entry prior to the map, then it is deleted.

To illustrate this functionality, imagine you have an Input Map from a database Connector with an attribute that gets its source value from a db column named 'TITLE', and that this column is nullable. In other words, not all rows need to have a value in this column. Alternatively, the source could be an object repository (like an LDAP directory) where the attribute in question is not found in all the entries you are reading. During the mapping process, SDI discovers that the source attribute (conn.TITLE) is not found. Null Behavior will detect the 'null' and remove the attribute from the Work Entry.

Now imagine you are reading from a CSV file. In this case there are no missing attributes or null values, only empty string values for some attributes. So now you can change Null Behavior to define 'null' as being an empty string (plus all the other definitions above it in the Null Behavior dialog). Furthermore, you can define handling to be that you want a default value returned, for example N/A. Voilà, now all rows with an empty value for 'TITLE' will be returned with this attribute value set to 'N/A'.

Final note: Null Behavior can be defined at the map level using the More... button at the top of the map, or for a single attribute by right-clicking it and selecting Null Behavior.
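
For those who think better in script, the behavior described in this post can be modeled roughly as follows. This is a conceptual sketch only, using plain objects in place of Entries: mapWithNullBehavior and the option names emptyStringIsNull and defaultValue are invented for illustration and do not correspond to actual SDI settings or APIs.

```javascript
// Conceptual model of Null Behavior for one mapped attribute.
// options.emptyStringIsNull: also treat '' as 'null' (assumed option name)
// options.defaultValue: value to return for 'null'; if undefined, the
//                       attribute is removed from the target
function mapWithNullBehavior(target, name, sourceValue, options) {
   options = options || {};
   var isNull = sourceValue === null ||
                typeof sourceValue === 'undefined' ||
                (options.emptyStringIsNull === true && sourceValue === '');

   if (!isNull) {
      target[name] = sourceValue;
   } else if (typeof options.defaultValue !== 'undefined') {
      target[name] = options.defaultValue;
   } else {
      delete target[name]; // default handling: attribute is removed
   }
   return target;
}
```

In this model, a missing TITLE with no options removes TITLE from the target, even if it was there before the map, while an empty TITLE with emptyStringIsNull set and a defaultValue of 'N/A' comes back as 'N/A'.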

Saturday, July 12, 2014

mapReduce Revealed!

Here is an excellent explanation of mapReduce that I've had to share (again and again). Enjoy!

http://www.slideshare.net/okurow/couchdb-mapreduce-13321353

Monday, May 26, 2014

Sometimes I dream in Javascript

Very nice article about big data, MapReduce and Javascript.

http://www.joelonsoftware.com/items/2006/08/01.html

And a rant I enjoyed greatly :)

http://steve-yegge.blogspot.no/2006/03/execution-in-kingdom-of-nouns.html

-E)