TDIing out loud, ok SDIing as well

Ramblings on the paradigm-shift that is TDI.

Wednesday, September 2, 2009

Clustering, HA and Scaling, Oh My!

Ok, here come some ramblings on making better TDI solutions. Risk reduction is always a matter of expense versus exposure, and this approach requires an upfront investment and adds lots of moving parts. So it's not one-size-fits-all: it pays the most dividends for larger projects, or for a series of similar smaller ones. That said...

You start by deconstructing a solution into a set of individual service AssemblyLines:
  1. Error Handler - First consider that all error hooks/scripts in all other ALs dispatch to a central error queue. Then you have this Error Handler AL that continuously iterates this queue and does logging/alerting/reacting as needed. This AL also emits a periodic heartbeat to an event queue*.

    * That's right, 2 queues. And note that you will want persistent queueing like MQ, the System Store/DB, files, etc.

  2. Event Handlers (plural) - This set of service ALs catches events - e.g. detecting changes; listening for incoming messages, mail or requests; polling for new/changed files, and so forth. Each AL pushes caught events to the event queue, along with its own heartbeats. As mentioned above, all errors are dispatched to the error queue.

  3. Workers (plural) - Each of these grabs the next relevant event off the event queue and then performs a required action, like writing detected changes to a single target (e.g. for a sync solution you would have one Worker AL to write to Domino, one for AD, one for SAP, etc.), passing events to other systems (i.e. switching), performing searches and building responses (RSS/REST/SOAP/...) or whatever. Each also reports status through the event queue and problems to the error queue.

  4. Heartbeat Monitor - Polls the queues to make sure things are happening, for example that events are being processed in a timely fashion, and that heartbeats are received (and cleaned up). If the Error Handler is down, it does its own alerting and logging (it can even send events to AMC or a backup TDI Server).

This approach is well suited to unit testing and provides better solution availability and maintainability than one-stop-shopping ALs tend to. It also scales easily - just run more ALs, using inter-AL comms for coordination.
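
To make the four roles above a bit more concrete, here is a rough sketch in plain Python rather than in TDI itself. Everything in it is an assumption made for illustration: the names (`Queues`, `take`, `beat`, `event_handler`, `worker`, `error_handler`, `heartbeat_monitor`), the event fields, and above all the in-memory lists, which merely stand in for the persistent queues (MQ, the System Store, files) that a real solution would want. It is not how TDI implements any of this, just the shape of the idea.

```python
class Queues:
    """In-memory stand-in for the two persistent queues (illustration only)."""
    def __init__(self):
        self.events = []   # events, status and heartbeats
        self.errors = []   # everything the error hooks dispatch


def take(q, wanted):
    """Remove and return the first item in list q that matches, else None."""
    for i, item in enumerate(q):
        if wanted(item):
            del q[i]
            return item
    return None


def beat(queues, source, now):
    """Emit a heartbeat for the named service to the event queue."""
    queues.events.append({"kind": "heartbeat", "source": source, "ts": now})


def event_handler(queues, name, changes, now):
    """Push caught changes to the event queue, then report a heartbeat."""
    for change in changes:
        queues.events.append({"kind": "change", **change})
    beat(queues, name, now)


def worker(queues, name, target, write, now):
    """Handle the next event for one target; send failures to the error queue."""
    event = take(queues.events,
                 lambda e: e["kind"] == "change" and e["target"] == target)
    if event is not None:
        try:
            write(event["data"])
        except Exception as exc:
            queues.errors.append(
                {"source": name, "error": str(exc), "event": event})
    beat(queues, name, now)
    return event


def error_handler(queues, name, log, now):
    """Drain the error queue (log/alert/react), then report a heartbeat."""
    while queues.errors:
        log(queues.errors.pop(0))
    beat(queues, name, now)


def heartbeat_monitor(queues, expected, last_seen, now, max_age):
    """Clean up heartbeats and return the services that have gone quiet."""
    while True:
        hb = take(queues.events, lambda e: e["kind"] == "heartbeat")
        if hb is None:
            break
        previous = last_seen.get(hb["source"], hb["ts"])
        last_seen[hb["source"]] = max(previous, hb["ts"])
    return sorted(s for s in expected
                  if now - last_seen.get(s, float("-inf")) > max_age)
```

Each function takes its queues, clock value and side-effecting callbacks (`write`, `log`) as arguments, so every role can be exercised on its own with a fake target, which is the unit-testing benefit in miniature. Scaling out would then mean running more workers against the same queues, provided the real queue implementation hands each event to exactly one of them.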

But it's definitely not for the fainthearted, or for those not comfortable in the AL Debugger. However, if you do it right, you end up with a reusable AL service framework.

And if you send it to me for publication then I'll send you a limited edition, orange plastic Metamerge pen :)




These really are great pens :)