Observing and Tracking Data Pipelines

We define a data pipeline as a chain of linked activities that runs from start to finish. In the Jitterbit operation logs, a pipeline is the executed chain of operations, shown as a tree of operations. Troubleshooting a chain of operations relies on examining these logs, and frequently recurring, active data pipelines can complicate that effort.
In Design Studio (DS), if an operation in the tree fails, ancestral operations will show a warning status, rather than success. This warning status does not appear unless your display shows the downstream failing operation.
DS shows at most 1000 logs at a time, which can prevent you from viewing all the logs of the pipeline you are interested in. In the Web Management Console (WMC), unless you name and save your queries, you also have to deal with environment filtering.
There is a strategy that improves your visibility into an entire pipeline’s logs: view only the logs for a chain of operations, or only the logs for a particular instance of that chain.
Update your existing Operation Setup scripts
Add this line to the Setup script of the initial operation of a chain:
WriteToOperationLog("PIPELINE ID::"+$pipelineId=IfEmpty($pipelineId,$jitterbit.operation.guid+' / '+$jitterbit.operation.instance_guid));
For each downstream operation in a pipeline, include the following in an initial Setup script:
WriteToOperationLog("PIPELINE ID::"+$pipelineId);
These Operation Setup scripts are the first Operation component. They appear before Sources and other non-Script components. For various reasons to be shared in a future blog, it’s a good practice to have a Setup script for every operation in your design.
Most of the time you can ignore this log line, but every operation log in a chain will now include the $pipelineId variable value. It consists of
- the unique id of the first operation in the chain,
- followed by a slash,
- followed by the unique instance id of the first operation in the chain.
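To make the composition concrete, here is a minimal Python sketch of the same logic, outside Jitterbit and for illustration only. The function name resolve_pipeline_id and the GUID values are made up, and treating both a missing and an empty value as "empty" is an assumption about how IfEmpty behaves.

```python
def resolve_pipeline_id(current_id, operation_guid, instance_guid):
    """Keep an existing pipeline ID; otherwise build one from the two GUIDs."""
    if current_id:  # assumption: None and "" both count as empty
        return current_id
    return operation_guid + " / " + instance_guid


# Placeholder GUIDs, invented for this example.
first = resolve_pipeline_id("", "op-guid-1111", "inst-guid-aaaa")

# A downstream operation finds the value already set and reuses it unchanged.
downstream = resolve_pipeline_id(first, "op-guid-2222", "inst-guid-bbbb")

print("PIPELINE ID::" + first)
```

Because the value is only built when it is empty, every operation in the chain logs the ID of the first operation rather than its own.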
We’re using the first operation in the chain to collect and reuse its operation GUID (Globally Unique Identifier) and its instance GUID. To find an Operation’s GUID, use the “Additional Details…” item in the operation’s toolbar pick list menu. (Instance GUIDs are defined at runtime.)
Enter the appropriate GUID as the Message value of the Operation log filter/query (DS/WMC). With the operation GUID, you see only the chain’s operation logs; with an instance GUID, you see only the logs of that one instance of the chain.
For example, let’s say you see a failed operation in the operation log. You can open this failed operation’s log and copy the pipeline ID value to your clipboard. Here, we’ll copy just the second GUID, the instance GUID after the slash character (‘/’). If we use this as the message value of a filter or query, we see the entire operation chain leading up to, including, and after the failed operation. We can now easily find the relevant Operation logs that precede the failed Operation. The values logged there might explain the error.
We can also see all the historical (30 days at most) pipeline operation logs for this chain. To see this pipeline log history, use the first GUID, the Operation GUID before the slash character, as the filter or query message value.
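If you copy the whole log message rather than one GUID, the two filter values can be separated as in the following Python sketch. This is only an illustration of the string format written by the Setup script; the function name filter_values and the sample GUIDs are hypothetical, and it assumes the message contains the "PIPELINE ID::" prefix followed by the two GUIDs on one line.

```python
PREFIX = "PIPELINE ID::"


def filter_values(log_message):
    """Return (operation_guid, instance_guid) from a logged pipeline ID line."""
    # Skip everything up to and including the prefix.
    start = log_message.index(PREFIX) + len(PREFIX)
    # Only the first line after the prefix holds the ID.
    line = log_message[start:].splitlines()[0]
    # GUIDs contain hyphens but no slashes, so one split on "/" is enough.
    operation_guid, instance_guid = (part.strip() for part in line.split("/", 1))
    return operation_guid, instance_guid


# Placeholder message, invented for this example.
op_guid, inst_guid = filter_values("PIPELINE ID::op-guid-1111 / inst-guid-aaaa")
print(op_guid)    # use for the full history of the chain
print(inst_guid)  # use for a single run of the chain
```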
Always take care when designing and instrumenting (logging) your Operation chains. Using the above pipeline ID approach, you can more quickly troubleshoot by examining just the relevant logs in a sea of operation logs.