Hierarchically Sorting XML and CSV Records

XML is an extremely versatile data transport format. By using the Jitterbit XSLT Transform Component, you can manipulate your XML using XSLT. Many stylesheets are posted online. If you’re new to XSLT, see this XSLT introduction. When wanting to sort a CSV file hierarchically, first convert it to XML. (In the wild, I’ve seen this need with Concur’s Employee import file.) With a parameterized XSLT style sheet, we can sort data in a parent/child sort order.
For example, if I have an XML file of <employees\> with <employee\> stanzas ..
<employees> <employee> <TRX_TYPE> 300 </TRX_TYPE> <EMPLOYEE_NUMBER> B3303 </EMPLOYEE_NUMBER> <EMAIL> simple.sample@example.com </EMAIL> <FIRSTNAME> Simple </FIRSTNAME> <LASTNAME> Sample </LASTNAME> <MIDDLENAME> E </MIDDLENAME> <SUPERVISOR_NUMBER> 36617 </SUPERVISOR_NUMBER> <EXPENSE_APPROVER> N </EXPENSE_APPROVER> </employee> <employee> <TRX_TYPE> 300 </TRX_TYPE> <EMPLOYEE_NUMBER> 36617 </EMPLOYEE_NUMBER> <EMAIL> singleton.patterno@example.com </EMAIL> <FIRSTNAME> Singleton </FIRSTNAME> <LASTNAME> Patterno </LASTNAME> <SUPERVISOR_NUMBER/> <EXPENSE_APPROVER> Y </EXPENSE_APPROVER> </employee> </employees>
..and want to sort so that supervisors appear before their staff, the XSLT uses three parameters (global variables):
$_HIER.parent = "SUPERVISOR_NUMBER"; // <SUPERVISOR_NUMBER> $_HIER.child = "EMPLOYEE_NUMBER"; // <EMPLOYEE_NUMBER>
The third is a composite of the other two:
//to avoid the misinterpretation of Jitterbit's square-bracket notation $_notHIER.parent = "[not("+$_HIER.parent+" = ../employee/"+$_HIER.child+")]";
The XSLT stylesheet reads as follows.
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" xmlns:mf="http://sortExample.com/mf" exclude-result-prefixes="mf"> <xsl:output indent="yes"/> <xsl:strip-space elements="*"/> <xsl:key name="ref" match="employee" use="[_HIER.parent]"/> <xsl:variable name="main-root" select="/"/> <xsl:function name="mf:refs" as="element(employee)*"> <xsl:param name="input" as="element(employee)*"/> <xsl:copy-of select="$input"/> <xsl:sequence select="if (key('ref', $input/[_HIER.child], $main-root)) then mf:refs(key('ref', $input/[_HIER.child], $main-root)) else ()"/> </xsl:function> <xsl:template match="employees"> <xsl:copy> <xsl:sequence select="mf:refs(employee[_notHIER.parent])"/> </xsl:copy> </xsl:template> </xsl:stylesheet>
Notice use of
$_HIER.parent in line 8
$_HIER.child (twice) in line 15 and
$_notHIER.parent in line 20
In addition to their use above, note the references to the employee (lines 12 & 13) and employees (line 18). Your customization will reflect your equivalent tags.
Now you can take advantage of stylesheets meant for a command-line processor. By using the Jitterbit variable’s square-bracket notation, parameterization is easy. Cut and paste a stylesheet found elsewhere and parameterize without great XSLT knowledge. You will find your Jitterbit XSLT solutions will grow.
Below is a pipeline that converts a CSV employee file to XML, sorts as described above, then converts the XML back in CSV. If your data is already in XML format, then you only execute “Hierarchically sort Employees by their Supervisor”. The script “Setup XSLT Hier. Sort” will make the three global variable assignments above. The Error Handler script is simply a call to the Event Handler script.
No Comments