Monday, March 25, 2013

Huge Message Processing with WSO2 ESB Smooks Mediator


Smooks is a powerful framework for processing, manipulating and transforming XML and non-XML data. WSO2 ESB supports executing Smooks features through the Smooks Mediator.

One of the main features introduced in Smooks v1.0 is the ability to process huge messages (GBs in size) [1]. Starting with the WSO2 ESB 4.5.0 release, this huge message processing feature is supported through the Smooks Mediator.

Smooks supports three types of processing for huge messages:
1. one-to-one transformation
2. splitting and routing
3. persistence

This post shows how to process large input messages using the splitting and routing approach.

Step 1: Create a Sample Huge Input File

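This post assumes the input message is an XML order document that carries the order information (order-id, customer-id, customer-name) followed by a list of order items (item-id, product, quantity, price). The sample uses the following data.

Customer name: Joe
Order items (product and price): Pen 8.80, Book 8.80, Bottle 8.80, Note Book 8.80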

You can write a simple Java program to generate a file with a large number of entries.

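FileWriter fw = new FileWriter("input-message.txt");
PrintWriter pw = new PrintWriter(fw);

/* XML */
// order information (customer Joe) and the start of the order item list
pw.print("\n \n Joe\n\n \n");
// one order item per iteration: product Pen, price 8.80
for (int i = 0; i <= 2000000; i++) {
    pw.print("\t\n\t\tPen\n\t\t8.80\n\t\n");
}
// end of the order item list and of the order
pw.write(" \n");
// flush the buffered output and close the file
pw.close();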
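As a self-contained variant of the generator, here is a minimal sketch. The element and attribute names (order, header, customer, order-items, order-item, product, quantity, price), the ids and the quantity value are assumptions modelled on the order example in the Smooks user guide [1], not taken from this post, so adjust them to your real message. The file is written with an .xml extension so that it matches the transport.vfs.FileNamePattern used in Step 3.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Writes a large sample order message. All element and attribute names are assumptions. */
final class OrderFileGenerator {

    /** Writes one order containing itemCount order items to the target file. */
    static void generate(Path target, int itemCount) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
             PrintWriter pw = new PrintWriter(bw)) {
            // Order information: order id, customer number and customer name (assumed layout).
            pw.print("<order id=\"332\">\n");
            pw.print("  <header>\n    <customer number=\"123\">Joe</customer>\n  </header>\n");
            pw.print("  <order-items>\n");
            // One small element per order item; the item id makes each split file name unique.
            for (int i = 1; i <= itemCount; i++) {
                pw.print("    <order-item id=\"" + i + "\">\n"
                        + "      <product>Pen</product>\n"
                        + "      <quantity>1</quantity>\n"
                        + "      <price>8.80</price>\n"
                        + "    </order-item>\n");
            }
            pw.print("  </order-items>\n</order>\n");
        } // try-with-resources flushes and closes the writers
    }

    public static void main(String[] args) throws IOException {
        generate(Paths.get("input-message.xml"), 2_000_000);
    }
}
```

Start with a small item count to check the output before generating the full-size file.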

Step 2: Smooks Configuration 

Let's write the Smooks configuration to split and route the above message. When processing huge messages with Smooks, make sure to use the SAX filter, which processes the message as a stream of events instead of building the whole document in memory.

The basic steps of this Smooks process are:
1. Java Binding - Bind the input message to Java beans
2. Templating - Apply a template, which represents a split message, to the input message elements
3. Routing - Route each split message

Each of these steps uses the relevant Smooks cartridge.
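To make the three steps concrete, the sketch below walks through the same idea in plain JDK code using the StAX streaming parser. It is not Smooks and not how the Smooks cartridges are implemented; it only illustrates why streaming keeps memory use flat: only the order information and the current order item are held at any time. The element names are assumptions (they are not given in this post), while the property names follow the bean model used later in the post.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Plain-JDK illustration of bind, then template/route per fragment. Element names are assumed. */
final class StreamingSplitSketch {

    /**
     * Streams through the message and hands the order map to the route callback once per
     * order item. The same order map is reused, so the callback must consume it immediately.
     */
    static int split(InputStream in, Consumer<Map<String, Object>> route) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Map<String, Object> order = new HashMap<>();
        Map<String, Object> item = null;
        int count = 0;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                switch (name) {
                    case "order":
                        order.put("orderId", reader.getAttributeValue(null, "id"));
                        break;
                    case "customer":
                        order.put("customerNumber", reader.getAttributeValue(null, "number"));
                        order.put("customerName", reader.getElementText());
                        break;
                    case "order-item":
                        // Step 1 (binding): a fresh virtual object for each order item.
                        item = new HashMap<>();
                        item.put("itemId", reader.getAttributeValue(null, "id"));
                        order.put("orderItem", item);
                        break;
                    case "product":
                    case "quantity":
                    case "price":
                        if (item != null) {
                            item.put(name, reader.getElementText());
                        }
                        break;
                    default:
                        break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "order-item".equals(reader.getLocalName())) {
                // Steps 2 and 3 (templating and routing) happen in the callback.
                route.accept(order);
                count++;
                order.remove("orderItem");
                item = null;
            }
        }
        reader.close();
        return count;
    }
}
```

The callback is where a real implementation would render the split message and write it to its destination; in the Smooks configuration those roles are played by the templating and routing cartridges.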

1. Java Binding

The Smooks JavaBean Cartridge allows you to create and populate Java objects from your message data [2]. Input message elements can be mapped either to real Java objects, by writing bean classes, or to virtual objects, which are Maps and Lists. Here we bind to virtual objects, which lets us build a complete object model without writing our own business classes.

Let's assume that we are going to split the input message so that each split message contains the information for a single order item (item-id, product, quantity, price) together with the order information (order-id, customer-id, customer-name).

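So we can define two beans in our Smooks configuration: order and orderItem. The order bean holds orderId, customerNumber and customerName, and has the current orderItem bean wired into it as its orderItem property. The orderItem bean holds itemId, product, quantity and price.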
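To picture what binding to virtual objects produces, here is a minimal sketch of the bean model as nested Maps, with a small helper that resolves dotted paths the way a template expression such as order.orderItem.product does. The property names come from the expressions used later in this post; the concrete values (332, 123, quantity 1) and the helper itself are illustrative assumptions, not Smooks code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrates the "order" and "orderItem" virtual objects as plain Maps. */
final class VirtualOrderModel {

    /** Builds the order map with its current orderItem map wired in. */
    static Map<String, Object> sampleOrder() {
        Map<String, Object> orderItem = new HashMap<>();
        orderItem.put("itemId", 1);          // assumed value
        orderItem.put("product", "Pen");
        orderItem.put("quantity", 1);        // assumed value
        orderItem.put("price", 8.80);

        Map<String, Object> order = new HashMap<>();
        order.put("orderId", 332);           // assumed value
        order.put("customerNumber", 123L);   // assumed value
        order.put("customerName", "Joe");
        order.put("orderItem", orderItem);   // the wiring between the two beans
        return order;
    }

    /** Resolves a dotted path such as "orderItem.product"; returns null if any step is missing. */
    static Object resolve(Map<String, Object> root, String path) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```

Because the orderItem entry is replaced for every order item while the rest of the order map stays in place, each fragment sees the shared order information plus exactly one item.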


    
 

 
 

     
     
     
     
     

 

     
     
     
     

      


2. Templating

Smooks Templating allows fragment-level templating using different templating solutions. The supported templating technologies are FreeMarker and XSL. Here we are going to use FreeMarker.

Configuring FreeMarker templates in Smooks is done through the http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd configuration namespace. The template definition can refer to the message content through the Java beans defined in the previous step.

There are two ways to define a FreeMarker template: in-line, or as an external template reference. This example uses in-line templating.

First we need to decide the format of a single split message. Since each split message contains the information for a single order item (item-id, product, quantity, price) together with the order information (order-id, customer-id, customer-name), it will look as follows.

The template definition uses the Java object model populated above, through these FreeMarker expressions:

${order.customerName}
${order.customerNumber?c}
${order.orderItem.product} ${order.orderItem.quantity} ${order.orderItem.price}
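The ?c built-in in ${order.customerNumber?c} asks FreeMarker for the computer-readable form of the number, without locale-specific grouping separators, which is what you want inside a generated message. The Java sketch below shows the same difference using the JDK formatting classes; it is an analogy for illustration, not FreeMarker code, and the class and method names are made up for this example.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Contrasts locale-aware number rendering with plain computer-readable digits. */
final class ComputerFormatSketch {

    /** Human-oriented rendering, comparable to a number interpolated without ?c. */
    static String humanFormat(long number, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(number);
    }

    /** Plain digits with no grouping, comparable to using ?c. */
    static String computerFormat(long number) {
        return Long.toString(number);
    }

    public static void main(String[] args) {
        System.out.println(humanFormat(1234567L, Locale.US)); // 1,234,567
        System.out.println(computerFormat(1234567L));         // 1234567
    }
}
```

A customer number rendered with grouping separators would no longer parse as a number on the receiving side, which is why the template applies ?c to it.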


Let's add the templating configuration to our Smooks configuration.


    
 

 
 

     
     
     
     
     

 

     
     
     
     

      


  
  
  
       
       
  



Note that with <ftl:outputTo> you can direct Smooks to write the templating result directly to an OutputStreamResource.

3. Routing

So far we have defined the bean model of the message and the template of a single split message. Now we have to extend the Smooks configuration to route each message fragment to an endpoint. These endpoints can be file, database or JMS endpoints.

In this sample, let's route the message fragments to file locations. In the previous step we set the outputTo element to write to the orderItemSplitStream resource, so let's add an outputStream named orderItemSplitStream to our Smooks configuration.

We need to define the following attributes when defining the outputStream:

fileNamePattern

The file name, which can be composed by referring to the Java object model we created. The resulting name should be unique for each message fragment.

destinationDirectoryPattern

The destination directory where the files should be created.

highWaterMark

Maximum number of files that can be created in the directory. This should be increased according to the input message size.
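To see why the file name should include both the order id and the item id, the sketch below expands a pattern such as order-${order.orderId}-${order.orderItem.itemId}.xml against the Map-based bean model. This is a plain Java illustration of the substitution, assuming simple dotted-path lookups; it is not how Smooks evaluates the pattern internally, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Expands ${a.b.c} placeholders in a file name pattern against nested Maps. */
final class FileNamePatternSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every placeholder with the value found by walking the dotted path through the beans. */
    static String expand(String pattern, Map<String, Object> beans) {
        Matcher matcher = PLACEHOLDER.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = beans;
            for (String key : matcher.group(1).split("\\.")) {
                value = (value instanceof Map) ? ((Map<?, ?>) value).get(key) : null;
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With orderId 332 and itemId 7 the pattern expands to order-332-7.xml. Dropping the item id from the pattern would give every fragment of the same order the same file name.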


    
 

 
 

     
     
     
     
     

 

     
     
     
     

      


  
  
  
       
       
  




fileNamePattern: order-${order.orderId}-${order.orderItem.itemId}.xml
destinationDirectoryPattern: /home/lakmali/dev/test/smooks/orders
     
     

 


Step 3: Process with WSO2 ESB Smooks Mediator

Now we have finished writing the Smooks configuration that splits and routes an incoming message. Next we need to execute it against our huge message. The WSO2 ESB Smooks Mediator does this by integrating Smooks features with WSO2 ESB.

So our next step is to write a Synapse configuration that fetches the file containing the incoming message through the VFS transport and mediates it through the Smooks Mediator.

Here is the Synapse configuration:
<definitions xmlns="http://ws.apache.org/ns/synapse">
   <proxy name="SmooksSample" startOnLoad="true" transports="vfs">
      <target>
         <inSequence>
            <smooks config-key="smooks-key">
               <input type="xml" />
               <output type="xml"/>
            </smooks>
         </inSequence>
      </target>
      <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
      <parameter name="transport.PollInterval">5</parameter>
      <parameter name="transport.vfs.MoveAfterProcess">file:///home/lakmali/dev/test/smooks/original</parameter>
      <parameter name="transport.vfs.FileURI">file:///home/lakmali/dev/test/smooks/in</parameter>
      <parameter name="transport.vfs.MoveAfterFailure">file:///home/lakmali/dev/test/smooks/original</parameter>
      <parameter name="transport.vfs.FileNamePattern">.*\.xml</parameter>
      <parameter name="transport.vfs.ContentType">application/xml</parameter>
      <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
   </proxy>
   <localEntry key="smooks-key" src="file:repository/samples/resources/smooks/smooks-config-658.xml"></localEntry>
   <sequence name="fault">
         <log level="full"/>
         <property name="MESSAGE" value="Executing default fault sequence"/>
         <property expression="get-property('ERROR_CODE')" name="ERROR_CODE"/>
         <property expression="get-property('ERROR_MESSAGE')" name="ERROR_MESSAGE"/>
         <drop/>
   </sequence>
   <sequence name="main">
      <log/>
      <drop/>
   </sequence>
</definitions>
Make sure to change the following VFS transport configuration parameters to match your environment:


transport.vfs.MoveAfterProcess - Move the input file to this location after processing
transport.vfs.FileURI - Input File location
transport.vfs.MoveAfterFailure - Move the input file to this location after a failure
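One more parameter worth checking is transport.vfs.FileNamePattern, which is set to .*\.xml in the configuration above. Assuming the pattern is applied as a Java regular expression against the whole file name, the sketch below shows which names would be picked up. The class name is hypothetical and the check is an illustration, not the transport's own code.

```java
import java.util.regex.Pattern;

/** Checks file names against a VFS-style file name pattern (assumed to be a full-match Java regex). */
final class VfsFileNamePatternSketch {

    /** Returns true when the entire file name matches the regular expression. */
    static boolean accepts(String fileNamePattern, String fileName) {
        return Pattern.matches(fileNamePattern, fileName);
    }

    public static void main(String[] args) {
        String pattern = ".*\\.xml"; // same pattern as in the proxy configuration
        System.out.println(accepts(pattern, "input-message.xml")); // true
        System.out.println(accepts(pattern, "input-message.txt")); // false
    }
}
```

Under that assumption, a file named input-message.txt would stay untouched in the input directory, so give the generated file an .xml extension or adjust the pattern.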

Create a proxy service with the given Synapse configuration. An ESB sample with this configuration is also available, which you can run by executing the following command.

Go to ESB_HOME/bin
And run
./wso2esb-samples.sh -sn 658

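Now drop the sample huge input file into the transport.vfs.FileURI location. The file name must match transport.vfs.FileNamePattern (.*\.xml), so a file generated as input-message.txt has to be given an .xml extension first.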

Then check the destinationDirectoryPattern location, where you will find the split files produced from the huge file.
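With a very large input, listing the output directory by hand is impractical. A small sketch like the following can count the split files instead; it assumes the files follow the order-...xml naming from the fileNamePattern above, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Counts the split result files in the destination directory. */
final class SplitResultCounter {

    /** Counts entries named order-*.xml directly inside the given directory. */
    static long countSplitFiles(Path directory) throws IOException {
        // Files.list is lazy, so the stream must be closed to release the directory handle.
        try (Stream<Path> entries = Files.list(directory)) {
            return entries
                    .map(path -> path.getFileName().toString())
                    .filter(name -> name.startsWith("order-") && name.endsWith(".xml"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the destinationDirectoryPattern location as the first argument.
        System.out.println(countSplitFiles(Paths.get(args[0])));
    }
}
```

The count should equal the number of order items in the input message once processing has finished.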

[1] http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Processing_Huge_Messages_.28GBs.29
[2] http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Java_Binding

2 comments:

  1. Hi Lakmali,

    I am following your blog. I am working on the WSO2 ESB and Data Services Server.

    When I run the curl command from the terminal against the ESB,
    the JSON looks as follows.

    {"Body":{"Id":0,"Body":[{"UserGroupId":-1,"UserGroupCode":"NONE","UserGroupName":"NONE","UserGroupCreatedById":-1,"UserGroupCreatedOn":"/Date(1340024633000)/","UserGroupModifiedById":-1,"UserGroupModifiedOn":"/Date(1340024633000)/","UserGroupSortOrder":9999,"UserGroupStatus":1,"UserGroupVersion":1,"UserGroupSourceType":1,"UserGroupDetailArray":[{"UserGroupDetailId":-1,"UserGroupId":-1,"UserGroupCode":"NONE","UserGroupName":"NONE","UserGroupDetailSlNo":1,"UserId":-1,"UserCode":"GBADMIN","UserName":"Administrator"},{"UserGroupDetailId":-149999779,"UserGroupId":-1799999942,"UserGroupCode":"SHSBCPLUM","UserGroupName":"HSBCMALAD_Plumber","UserGroupDetailSlNo":9,"UserId":-2147483640,"UserCode":"r2416","UserName":"AlexRupan"}]}]},"Current":"","ETag":"","First":"","From":"","Id":"","Last":"","Next":"","Previous":"","ReplyTo":"","Status":200,"To":"","Total":5}



    For this I tried nested queries in DSS, but it's not working.
    My DSS configuration is as follows.


    USCProduction



    select usergroupdetailid as UserGroupDetailId ,musergroup.usergroupid as UserGroupId,usergroupcode as UserGroupCode,usergroupname as UserGroupName,slno as UserGroupDetailSlNo,muser.userid as UserId,usercode as UserCode,username as UserName from muser join musergroupdetail on muser.userid= musergroupdetail.userid join musergroup on musergroupdetail.usergroupid=musergroup.usergroupid












    select musergroup.usergroupid as UserGroupId,usergroupcode as UserGroupCode,usergroupname as UserGroupName from musergroup join musergroupdetail on musergroup.usergroupid=musergroupdetail.usergroupid

















    My question: how can I create this in an ESB proxy service?

    Could you please help me.
    Thanks in Advance
    Anil

  2. Hi Lakmali,


    Here is my Git URL, which contains my query:

    https://gist.github.com/anonymous/5300686

    Could you please help me.
    Thanks in Advance
    Anil
