Huge Message Processing with WSO2 ESB Smooks Mediator


Smooks is a powerful framework for processing, manipulating and transforming XML and non-XML data. WSO2 ESB supports executing Smooks features through the Smooks Mediator.

One of the main features introduced in Smooks v1.0 is the ability to process huge messages (GBs in size) [1]. From the WSO2 ESB 4.5.0 release onwards, this huge message processing feature is supported through the Smooks Mediator!

Smooks supports three approaches to processing huge messages:
1. one-to-one transformation
2. splitting and routing
3. persistence

This post shows how to process large input messages using the splitting and routing approach.

Step 1: Create a Sample Huge Input File

This post assumes the input message is in the following format.

<order id="332">
    <header>
        <customer number="123">Joe</customer>
    </header>
    <order-items>
        <order-item id="1">
            <product quantity="4">Pen</product>
            <price>8.80</price>
        </order-item>
        <order-item id="2">
            <product quantity="1">Book</product>
            <price>8.80</price>
        </order-item>
        <order-item id="3">
            <product quantity="2">Bottle</product>
            <price>8.80</price>
        </order-item>
        <order-item id="4">
            <product quantity="8">Note Book</product>
            <price>8.80</price>
        </order-item>
    </order-items>
</order>

You can write a simple Java program to generate a file with a large number of entries.

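import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class HugeInputGenerator {
    public static void main(String[] args) throws IOException {
        // .xml extension so that the VFS FileNamePattern (.*\.xml) used in Step 3 picks the file up
        try (PrintWriter pw = new PrintWriter(new FileWriter("input-message.xml"))) {
            /* XML header: order, customer and the opening order-items element */
            pw.print("<order id=\"332\">\n <header>\n   <customer number=\"123\">Joe</customer>\n </header>\n <order-items>\n");
            // One order-item per iteration: 2,000,001 items in total (i = 0..2000000)
            for (int i = 0; i <= 2000000; i++) {
                pw.print("\t<order-item id=\"" + i + "\">\n\t\t<product quantity=\"4\">Pen</product>\n\t\t<price>8.80</price>\n\t</order-item>\n");
            }
            pw.print(" </order-items>\n</order>");
        }
    }
}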


Step 2: Smooks Configuration 

Let's write the Smooks configuration to split and route the above message. When processing huge messages with Smooks, make sure to use the SAX filter, which streams the message instead of building an in-memory DOM of the whole document.
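The effect of a streaming filter is easy to see with plain JDK code. The following is a minimal sketch, not Smooks code, that uses the StAX XMLStreamReader to count order-item elements in the generated file one event at a time, so memory use stays flat no matter how large the file is. The class name OrderItemCounter and the default file name are illustrative assumptions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class OrderItemCounter {

    // Counts <order-item> start tags by pulling one parser event at a time.
    // No DOM tree is built, so memory use does not grow with the input size.
    static long countOrderItems(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "order-item".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close(); // does not close the underlying stream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default: the file produced by the generator program
        Path path = Paths.get(args.length > 0 ? args[0] : "input-message.xml");
        try (InputStream in = Files.newInputStream(path)) {
            System.out.println("order-item elements: " + countOrderItems(in));
        }
    }
}
```

The count printed here is also the number of split files you should expect at the end of Step 3.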

The basic steps of this Smooks process are:
1. Java Binding - Bind the input message to Java beans
2. Templating - Apply a template that represents a single split message to the input message elements
3. Routing - Route each split message

Each of these steps uses the relevant Smooks cartridge.

1. Java Binding

The Smooks JavaBean Cartridge allows you to create and populate Java objects from your message data [2]. We can map input message elements either to real Java objects, by writing bean classes, or to virtual objects, which are Maps and Lists. Here we bind to virtual objects; that way we can build a complete object model without writing our own business classes.

Let's assume that we are going to split the input message so that each split message contains the information for a single order item (item-id, product, quantity, price) together with the order information (order-id, customer-id, customer-name).

So we define two beans in our Smooks configuration: order and orderItem.

<smooks-resource-list xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.2.xsd" xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">

    <core:filterSettings type="SAX"/>

    <!-- Extract and decode data from the message. Later used in the FreeMarker template.
         Note that we could also use a NodeModel here... -->
    <jb:bean beanId="order" class="java.util.Hashtable" createOnElement="order">
        <jb:value data="order/@id" decoder="Integer" property="orderId"/>
        <jb:value data="header/customer/@number" decoder="Long" property="customerNumber"/>
        <jb:value data="header/customer" property="customerName"/>
        <jb:wiring beanIdRef="orderItem" property="orderItem"/>
    </jb:bean>

    <jb:bean beanId="orderItem" class="java.util.Hashtable" createOnElement="order-item">
        <jb:value data="order-item/@id" decoder="Integer" property="itemId"/>
        <jb:value data="order-item/product" property="product"/>
        <jb:value data="order-item/product/@quantity" decoder="Integer" property="quantity"/>
        <jb:value data="order-item/price" decoder="Double" property="price"/>
    </jb:bean>

</smooks-resource-list>
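For readers coming from Java, here is a rough plain-JDK picture of the model the two jb:bean definitions above describe at the moment one order-item is being processed. It is an illustrative sketch, not the Smooks API: the decoders are approximated with Integer.valueOf, Long.valueOf and Double.valueOf, the jb:wiring is shown as a nested map entry, and the class and method names are hypothetical.

```java
import java.util.Hashtable;
import java.util.Map;

class VirtualBeanModel {

    // Roughly what <jb:bean beanId="orderItem" class="java.util.Hashtable"> holds.
    static Map<String, Object> newOrderItem(String id, String product, String quantity, String price) {
        Map<String, Object> item = new Hashtable<>();
        item.put("itemId", Integer.valueOf(id));          // decoder="Integer"
        item.put("product", product);                     // no decoder: stays a String
        item.put("quantity", Integer.valueOf(quantity));  // decoder="Integer"
        item.put("price", Double.valueOf(price));         // decoder="Double"
        return item;
    }

    // Roughly what <jb:bean beanId="order"> holds, including the wired orderItem bean.
    static Map<String, Object> newOrder(String id, String customerNumber, String customerName,
                                        Map<String, Object> currentItem) {
        Map<String, Object> order = new Hashtable<>();
        order.put("orderId", Integer.valueOf(id));                 // decoder="Integer"
        order.put("customerNumber", Long.valueOf(customerNumber)); // decoder="Long"
        order.put("customerName", customerName);
        order.put("orderItem", currentItem);                       // <jb:wiring property="orderItem">
        return order;
    }

    public static void main(String[] args) {
        Map<String, Object> item = newOrderItem("1", "Pen", "4", "8.80");
        Map<String, Object> order = newOrder("332", "123", "Joe", item);
        System.out.println(order);
    }
}
```

This nesting is why the template can use paths such as order.orderItem.itemId.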

2. Templating

Smooks Templating allows fragment-level templating using different templating solutions. Smooks supports FreeMarker and XSL templating; here we use FreeMarker.

Configuring FreeMarker templates in Smooks is done through the http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd configuration namespace. In the template definition we can refer to the message content through the Java beans defined in the previous step.

FreeMarker templates can be defined in two ways: inline, or as an external template reference. In this example we use an inline template.

First we need to decide the format of a single split message. Since each split message contains a single order item's information (item-id, product, quantity, price) along with the order information (order-id, customer-id, customer-name), it will look as follows.

The template uses the Java object model populated above.

         
<orderitem id="${order.orderItem.itemId}" order="${order.orderId}">
    <customer>
        <name>${order.customerName}</name>
        <number>${order.customerNumber?c}</number>
    </customer>
    <details>
        <product>${order.orderItem.product}</product>
        <quantity>${order.orderItem.quantity}</quantity>
        <price>${order.orderItem.price}</price>
    </details>
</orderitem>
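To preview the output, the sketch below renders one split message for the first order item of the sample input using plain string concatenation. It only imitates what FreeMarker does inside Smooks, and the class and method names are hypothetical. It also shows why the template uses ?c for the customer number: FreeMarker's default number format is locale-aware, so a large number printed without ?c may come out with grouping separators (for example 1,234,567 in an English locale) instead of plain digits. The same may apply to other numeric fields rendered without ?c.

```java
import java.text.NumberFormat;
import java.util.Locale;

class SplitMessageRenderer {

    // Hand-written stand-in for the inline template above (no XML escaping is done).
    static String render(int orderId, long customerNumber, String customerName,
                         int itemId, String product, int quantity, double price) {
        return "<orderitem id=\"" + itemId + "\" order=\"" + orderId + "\">\n"
             + "    <customer>\n"
             + "        <name>" + customerName + "</name>\n"
             + "        <number>" + customerNumber + "</number>\n" // like ?c: plain digits
             + "    </customer>\n"
             + "    <details>\n"
             + "        <product>" + product + "</product>\n"
             + "        <quantity>" + quantity + "</quantity>\n"
             + "        <price>" + price + "</price>\n"
             + "    </details>\n"
             + "</orderitem>\n";
    }

    // Locale-aware formatting, similar in spirit to a number printed without ?c.
    static String groupedFormat(long n) {
        return NumberFormat.getNumberInstance(Locale.US).format(n);
    }

    public static void main(String[] args) {
        System.out.print(render(332, 123L, "Joe", 1, "Pen", 4, 8.80));
        System.out.println(groupedFormat(1234567L) + " vs " + 1234567L);
    }
}
```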


Let's add the templating configuration to our Smooks configuration.

<smooks-resource-list xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd" xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.2.xsd" xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">

    <core:filterSettings type="SAX"/>

    <!-- Extract and decode data from the message. Later used in the FreeMarker template.
         Note that we could also use a NodeModel here... -->
    <jb:bean beanId="order" class="java.util.Hashtable" createOnElement="order">
        <jb:value data="order/@id" decoder="Integer" property="orderId"/>
        <jb:value data="header/customer/@number" decoder="Long" property="customerNumber"/>
        <jb:value data="header/customer" property="customerName"/>
        <jb:wiring beanIdRef="orderItem" property="orderItem"/>
    </jb:bean>

    <jb:bean beanId="orderItem" class="java.util.Hashtable" createOnElement="order-item">
        <jb:value data="order-item/@id" decoder="Integer" property="itemId"/>
        <jb:value data="order-item/product" property="product"/>
        <jb:value data="order-item/product/@quantity" decoder="Integer" property="quantity"/>
        <jb:value data="order-item/price" decoder="Double" property="price"/>
    </jb:bean>

    <ftl:freemarker applyOnElement="order-item">
        <ftl:template><!--<orderitem id="${order.orderItem.itemId}" order="${order.orderId}">
    <customer>
        <name>${order.customerName}</name>
        <number>${order.customerNumber?c}</number>
    </customer>
    <details>
        <product>${order.orderItem.product}</product>
        <quantity>${order.orderItem.quantity}</quantity>
        <price>${order.orderItem.price}</price>
    </details>
</orderitem>-->
        </ftl:template>
        <ftl:use>
            <!-- Output the templating result to the "orderItemSplitStream" file output stream... -->
            <ftl:outputTo outputStreamResource="orderItemSplitStream"/>
        </ftl:use>
    </ftl:freemarker>

</smooks-resource-list>

Note that with <ftl:outputTo>, you can direct Smooks to write the templating result directly to an OutputStreamResource.

3. Routing

So far we have defined the bean model of the message and the template for a single split message. Now we extend the Smooks configuration to route each message fragment to an endpoint. These endpoints can be file, database or JMS endpoints.

In this sample, let's route the message fragments to files. Since the <ftl:outputTo> element in the previous step writes to the orderItemSplitStream resource, let's add a file output stream named orderItemSplitStream to our Smooks configuration.

We need to define the following settings for the output stream:

fileNamePattern

Can be composed by referring to the Java object model we created. The resulting name must be unique for each message fragment.

destinationDirectoryPattern

Destination where files should be created.

highWaterMark

Maximum number of files that can be created in the directory. This may need to be increased depending on the input message size.
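As a quick sanity check of these settings, the following plain-Java sketch (hypothetical class and method names) expands the sample fileNamePattern for a given order and item id, and compares the number of split files the Step 1 generator would produce with the highWaterMark value of 10000000 used in the configuration. It assumes one file per order-item and treats the mark as an inclusive maximum; Smooks itself evaluates the pattern as a FreeMarker template.

```java
class SplitFilePlanner {

    // Expands order-${order.orderId}-${order.orderItem.itemId}.xml for one fragment.
    static String fileName(int orderId, int itemId) {
        return "order-" + orderId + "-" + itemId + ".xml";
    }

    // One split file per order-item; the generator loops i = 0..lastIndex inclusive.
    static long expectedFiles(long lastIndex) {
        return lastIndex + 1;
    }

    // Assumption: the high water mark is an inclusive upper bound on files in the directory.
    static boolean fitsUnderHighWaterMark(long files, long highWaterMark) {
        return files <= highWaterMark;
    }

    public static void main(String[] args) {
        long files = expectedFiles(2_000_000L);
        System.out.println(fileName(332, 0) + " .. " + fileName(332, 2_000_000));
        System.out.println(files + " files, fits under 10000000: "
                + fitsUnderHighWaterMark(files, 10_000_000L));
    }
}
```

Because every generated item has a distinct id, each expanded name is unique, which is what the fileNamePattern requirement asks for.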

<smooks-resource-list xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:file="http://www.milyn.org/xsd/smooks/file-routing-1.1.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd" xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.2.xsd" xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">

    <core:filterSettings type="SAX"/>

    <!-- Extract and decode data from the message. Later used in the FreeMarker template.
         Note that we could also use a NodeModel here... -->
    <jb:bean beanId="order" class="java.util.Hashtable" createOnElement="order">
        <jb:value data="order/@id" decoder="Integer" property="orderId"/>
        <jb:value data="header/customer/@number" decoder="Long" property="customerNumber"/>
        <jb:value data="header/customer" property="customerName"/>
        <jb:wiring beanIdRef="orderItem" property="orderItem"/>
    </jb:bean>

    <jb:bean beanId="orderItem" class="java.util.Hashtable" createOnElement="order-item">
        <jb:value data="order-item/@id" decoder="Integer" property="itemId"/>
        <jb:value data="order-item/product" property="product"/>
        <jb:value data="order-item/product/@quantity" decoder="Integer" property="quantity"/>
        <jb:value data="order-item/price" decoder="Double" property="price"/>
    </jb:bean>

    <ftl:freemarker applyOnElement="order-item">
        <ftl:template><!--<orderitem id="${order.orderItem.itemId}" order="${order.orderId}">
    <customer>
        <name>${order.customerName}</name>
        <number>${order.customerNumber?c}</number>
    </customer>
    <details>
        <product>${order.orderItem.product}</product>
        <quantity>${order.orderItem.quantity}</quantity>
        <price>${order.orderItem.price}</price>
    </details>
</orderitem>-->
        </ftl:template>
        <ftl:use>
            <!-- Output the templating result to the "orderItemSplitStream" file output stream... -->
            <ftl:outputTo outputStreamResource="orderItemSplitStream"/>
        </ftl:use>
    </ftl:freemarker>

    <!-- Create/open a file output stream. This is written to by the FreeMarker template (above). -->
    <file:outputStream openOnElement="order-item" resourceName="orderItemSplitStream">
        <file:fileNamePattern>order-${order.orderId}-${order.orderItem.itemId}.xml</file:fileNamePattern>
        <file:destinationDirectoryPattern>/home/lakmali/dev/test/smooks/orders</file:destinationDirectoryPattern>
        <file:highWaterMark mark="10000000"/>
    </file:outputStream>

</smooks-resource-list>

Step 3: Process with WSO2 ESB Smooks Mediator

Now we have finished writing the Smooks configuration that splits and routes an incoming message. Next we need to execute it against our huge message. The WSO2 ESB Smooks Mediator, which integrates Smooks features with WSO2 ESB, does exactly that.

So our next step is to write a Synapse configuration that fetches the file containing the incoming message through the VFS transport and mediates it through the Smooks Mediator.

Here is the Synapse configuration:
<definitions xmlns="http://ws.apache.org/ns/synapse">
    <proxy name="SmooksSample" startOnLoad="true" transports="vfs">
        <target>
            <inSequence>
                <smooks config-key="smooks-key">
                    <input type="xml"/>
                    <output type="xml"/>
                </smooks>
            </inSequence>
        </target>
        <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
        <parameter name="transport.PollInterval">5</parameter>
        <parameter name="transport.vfs.MoveAfterProcess">file:///home/lakmali/dev/test/smooks/original</parameter>
        <parameter name="transport.vfs.FileURI">file:///home/lakmali/dev/test/smooks/in</parameter>
        <parameter name="transport.vfs.MoveAfterFailure">file:///home/lakmali/dev/test/smooks/original</parameter>
        <parameter name="transport.vfs.FileNamePattern">.*\.xml</parameter>
        <parameter name="transport.vfs.ContentType">application/xml</parameter>
        <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
    </proxy>
    <localEntry key="smooks-key" src="file:repository/samples/resources/smooks/smooks-config-658.xml"/>
    <sequence name="fault">
        <log level="full">
            <property name="MESSAGE" value="Executing default fault sequence"/>
            <property name="ERROR_CODE" expression="get-property('ERROR_CODE')"/>
            <property name="ERROR_MESSAGE" expression="get-property('ERROR_MESSAGE')"/>
        </log>
        <drop/>
    </sequence>
    <sequence name="main">
        <log/>
        <drop/>
    </sequence>
</definitions>
Make sure to change the following VFS transport configuration parameters to match your environment:


transport.vfs.MoveAfterProcess - Move the input file to this location after processing
transport.vfs.FileURI - Input File location
transport.vfs.MoveAfterFailure - Move the input file to this location after a failure
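To make these parameters concrete, the following JDK-only sketch imitates one VFS poll: it selects files in the FileURI directory whose names match FileNamePattern, passes each to a processing callback, and then moves it to the MoveAfterProcess or MoveAfterFailure directory depending on the outcome. It is not how WSO2 ESB implements the transport; the class and method names are hypothetical, and treating the pattern as a whole-name regular expression match is an assumption. Under that assumption, .*\.xml matches input-message.xml but not input-message.txt, so the input file needs an .xml extension to be picked up.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

class VfsPollSketch {

    // Assumption: FileNamePattern must match the whole file name.
    static boolean matches(String fileName, String fileNamePattern) {
        return Pattern.matches(fileNamePattern, fileName);
    }

    // Simulates one poll with ActionAfterProcess=MOVE and ActionAfterFailure=MOVE.
    // Returns how many files the processor handled successfully.
    static int pollOnce(Path fileUri, String fileNamePattern, Path moveAfterProcess,
                        Path moveAfterFailure, Predicate<Path> processor) throws IOException {
        // Collect candidates first so the directory is not modified while it is being listed.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fileUri)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)
                        && matches(entry.getFileName().toString(), fileNamePattern)) {
                    candidates.add(entry);
                }
            }
        }
        int succeeded = 0;
        for (Path file : candidates) {
            boolean ok = processor.test(file);
            Path targetDir = ok ? moveAfterProcess : moveAfterFailure;
            Files.move(file, targetDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            if (ok) {
                succeeded++;
            }
        }
        return succeeded;
    }
}
```

In the sample configuration both move targets point to the same original directory, so every matching input file ends up there whether or not mediation succeeds.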

Create a proxy service with the given Synapse configuration. This configuration is also available as an ESB sample, which you can run with the following command.

Go to ESB_HOME/bin and run:

./wso2esb-samples.sh -sn 658

Now drop the sample huge input file into the transport.vfs.FileURI location.

Then check the destinationDirectoryPattern location, where you will find the split files produced from the huge input file.
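To check the result, a small JDK-only sketch (hypothetical class name) can count the files in the destination directory whose names follow the order-<orderId>-<itemId>.xml pattern from the configuration; the count should equal the number of order-item elements in the input. The default directory and the order id 332 are taken from the sample.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class SplitResultCheck {

    // Counts regular files named order-<orderId>-<digits>.xml directly inside dir.
    static long countSplitFiles(Path dir, int orderId) throws IOException {
        Pattern name = Pattern.compile("order-" + orderId + "-\\d+\\.xml");
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> name.matcher(p.getFileName().toString()).matches())
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/home/lakmali/dev/test/smooks/orders");
        System.out.println("split files: " + countSplitFiles(dir, 332));
    }
}
```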

[1] http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Processing_Huge_Messages_.28GBs.29
[2] http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Java_Binding

Comments

  1. Hi lakmali,

    I am following your blog. I am working on WSO2 ESB and Data Services Server.

    When I run the curl command against the ESB from the terminal,
    the JSON looks as follows.

    {"Body":{"Id":0,"Body":[{"UserGroupId":-1,"UserGroupCode":"NONE","UserGroupName":"NONE","UserGroupCreatedById":-1,"UserGroupCreatedOn":"/Date(1340024633000)/","UserGroupModifiedById":-1,"UserGroupModifiedOn":"/Date(1340024633000)/","UserGroupSortOrder":9999,"UserGroupStatus":1,"UserGroupVersion":1,"UserGroupSourceType":1,"UserGroupDetailArray":[{"UserGroupDetailId":-1,"UserGroupId":-1,"UserGroupCode":"NONE","UserGroupName":"NONE","UserGroupDetailSlNo":1,"UserId":-1,"UserCode":"GBADMIN","UserName":"Administrator"},{"UserGroupDetailId":-149999779,"UserGroupId":-1799999942,"UserGroupCode":"SHSBCPLUM","UserGroupName":"HSBCMALAD_Plumber","UserGroupDetailSlNo":9,"UserId":-2147483640,"UserCode":"r2416","UserName":"AlexRupan"}]}]},"Current":"","ETag":"","First":"","From":"","Id":"","Last":"","Next":"","Previous":"","ReplyTo":"","Status":200,"To":"","Total":5}



    For this I tried nested queries in DSS, but it's not working.
    My DSS configuration is as follows.


    USCProduction



    select usergroupdetailid as UserGroupDetailId ,musergroup.usergroupid as UserGroupId,usergroupcode as UserGroupCode,usergroupname as UserGroupName,slno as UserGroupDetailSlNo,muser.userid as UserId,usercode as UserCode,username as UserName from muser join musergroupdetail on muser.userid= musergroupdetail.userid join musergroup on musergroupdetail.usergroupid=musergroup.usergroupid

    select musergroup.usergroupid as UserGroupId,usergroupcode as UserGroupCode,usergroupname as UserGroupName from musergroup join musergroupdetail on musergroup.usergroupid=musergroupdetail.usergroupid

    My question: how can I create this as an ESB proxy service?

    Could you please help me.
    Thanks in Advance
    Anil

  2. Hi Lakmali,


    Here is my Gist URL, which contains my query:

    https://gist.github.com/anonymous/5300686

    Could you please help me.
    Thanks in Advance
    Anil

  3. Hi,

    I'm trying to use Smooks to process a huge file in 4.8.1. I have installed the patch for the Smooks mediator, but when I process a 3 GB file I get a Java heap space error.

    I also followed the steps for transferring large files in the documentation: https://docs.wso2.com/display/ESB481/VFS+Transport

    Any suggestions on how to fix it?

    Thank you

    Replies
    1. Hi,
      It should be due to https://wso2.org/jira/browse/ESBJAVA-4229. We have fixed this issue and the fix will be available in the next ESB release.

  4. Hi, this is a very helpful article, thank you. However, I am receiving the following error when I update my existing proxy with the proxy code you are recommending:
    "Proxy service requires a valid name"
    It does not seem to like the Definition or localentry tags. I am using a custom proxy at the moment. Is there a certain type of proxy I need to use that would resolve this error?

