
March 22, 2011

Hardeep Singh
(hardeep@us.ibm.com), Advanced Technologies Development, IBM, Software Group

Summary: This article is the second of a two-part series about the InfoSphere™ MashupHub, part of the IBM® Mashup Center. Part 1 discussed the product architecture and its tools and utilities, and introduced a simple use-case scenario. Now, in Part 2, you’ll explore the tools more deeply and extend the use-case scenario to showcase the different components and illustrate the advantages of using Web 2.0 concepts, such as data feeds and feed mashups in an enterprise.


Introduction

Individual streams of business information in the form of XML feeds are interesting in their own right. However, it is only when you start amalgamating multiple feeds to solve a business problem that the real power of the Web 2.0 mashup becomes apparent. Information integration solutions that previously eluded enterprises due to the lack of a common data infrastructure suddenly seem achievable. Instead of needing dedicated applications and buy-in from huge hierarchies, individuals can now leverage cross-corporate information to solve a business problem.

Feed mashups

In order to enable this feed fusion, there needs to be an infrastructure in which data from diverse feeds can be cached, filtered, joined, sorted, and transformed. IBM Mashup Center’s Feed Mashup Editor and Feed Mashup Engine provide this infrastructure.
Figure 1. Feed Mashup architecture

Since there is no standardized language for creating mashups, it is up to the individual engine to define the format in which a mashup application is implemented. The IBM Mashup Center’s Feed Mashup Engine defines a number of operations that can be performed on feed data in a mashup. These operations are exposed to a mashup developer through an XML-based model.

Mashup Editor Client

The Mashup Editor Client provides the integrated development environment (IDE) to create mashup applications by graphically exposing the operations to developers. This article focuses mostly on the Mashup Editor Client, showcasing operator usage and functionality through an easy-to-understand use-case scenario.

Model

The client mashup is in XML and is maintained and modified using an in-memory DOM model. Any changes made to the mashup are automatically reflected in the model and the GUI is then updated to reflect the changes.

At any time, the mashup model can be viewed by clicking on the canvas and then pressing CTRL+M. This brings up the Mashup Model View dialog that visualizes the client XML model either as a hierarchy or serialized to a string format.

The mashup flow information is located in the /mashup/flow branch, while the feed data is stored in the /mashup/data branch.

Note: In all references to paths in this article, namespaces are ignored to improve readability.
Figure 2. Mashup Editor Client XML model

Preview

When the Preview tab is clicked, the model is serialized and sent to the IBM Mashup Center’s MashupHub server. Here the client XML model is transformed to the engine XML model before it is processed by the feed mashup engine.
Listing 1. Preview XML


The resulting XML output of the operator being previewed is returned to the client where it is stored in the client model.

When a source URL is loaded, two calls are made to the server. The first call returns the actual data representation of the source which is stored at /mashup/data/sourcexml.

The editor tries to analyze this data to figure out the feed type (for example RSS/ATOM/XML). It then makes a second call to the server with the feed type set in the model. The engine returns an ATOM representation of the feed result, which is stored at /mashup/data/feed in the client model.
Figure 3. Client model source data cache

Note: All the other operators make a single call and the output result, which is in an ATOM format, is stored in the /mashup/data/feed path. A maximum of 50 entries are returned in a preview call. There is no maximum limit for the data returned in the source operator’s first load call.

After the editor makes a guess at the feed type, it automatically sets the path for the repeating element that defines the business data in the source. It is also possible to manually identify the repeating data element so that only that element is added to each entry of the preview result.

For example, in the sales lead feed result, only the data in the row element and its descendants is needed for the mashup.
Figure 4. Feed result with default repeating element

If you look at the source element in the client model, you will notice that the path for the repeating element that was set automatically for the second call is /feed/entry.
Figure 5. Client model with default repeating element

In order to fetch only the row elements in the feed entries, you need to change the repeating element path to /feed/entry/content/row. This is done on the Advanced tab using the drop-down tree representing the original feed data (/mashup/data/sourcexml).

When the XML feed type is selected from the Feed Type drop-down box, the Repeating Element drop-down becomes enabled and the element path can be selected from the tree. It is also possible to modify this path manually as long as you take care of the namespace prefixes.
Figure 6. Set a new repeating element on the Advanced tab

The underlying model is updated to reflect the new path for the repeating element in the source.
Figure 7. Client model reflects changes to the repeating element path set on the Advanced tab

When the source data is previewed again, the XML generated for the feed mashup engine now reflects the new repeating element and feed type.
Listing 2. XML preview with repeating element and feed type


The result of the new preview call to the engine has each entry in the result set containing only the row element and its descendants. The main advantage of this is that when this source feed entry is used in other operators it is less cluttered with irrelevant data hierarchies.
Figure 8. Source operator output with repeating element set to row
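The effect of a repeating-element path can be sketched with plain XPath-style selection over a miniature stand-in feed. This is an illustrative sketch using Python's standard ElementTree module on hypothetical data, not the engine's own processing; the element names (entry, content, row, col_C) follow the sales-leads example above.

```python
import xml.etree.ElementTree as ET

# A miniature stand-in for a sales-leads feed (hypothetical data).
FEED = """
<feed>
  <entry><content><row><col_C>100-101-10</col_C><col_E>5</col_E></row></content></entry>
  <entry><content><row><col_C>100-102-20</col_C><col_E>3</col_E></row></content></entry>
</feed>
"""

def repeating_elements(xml_text, path):
    """Return the elements matched by a repeating-element path.

    With path 'entry' each result keeps the whole entry subtree;
    with 'entry/content/row' only the business rows survive.
    """
    root = ET.fromstring(xml_text)
    return root.findall(path)

rows = repeating_elements(FEED, "entry/content/row")
print([r.find("col_C").text for r in rows])  # ['100-101-10', '100-102-20']
```

Narrowing the path this way is what keeps the irrelevant wrapper hierarchy out of each preview entry.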

Scenario

The scenario used in Part 1, for creating a mashup to better forecast inventory requirements based on sales leads, was intentionally kept bare bones. Only a single sales lead was considered and the resulting mashup feed calculated the inventory requirement for each product sale to a customer.
Figure 9. Simple use case mashup of Part 1

The scenario for this article extends the previous one to make the mashup feed more useful to the product manager and in the process also incorporates most of the operators. It considers sales leads from all sales executives and calculates the combined inventory requirement for a given product for all customers.

Combine all sales leads into a single output

The original scenario required all the sales executives to create feeds from their sales leads spreadsheets and allow the product manager to access the relevant information. In Part 1, for simplicity's sake, only one sales feed was considered, but the enhanced scenario for this article considers all the sales leads available.

The next step is to combine all the different sales leads sources into a single output.

Note: This step is equivalent to doing a UNION in a SQL query.

Drag the combine operator from the palette and plug the outputs of each sales leads source into the socket of the combine operator.
Figure 10. Combine operator used to append all sales feeds into a single output

The combine operator requires two or more inputs; the merge and foreach operators require two inputs; all other operators accept only a single input; and the source operator does not accept any input. The four connected inputs for the combine operator are reflected in the client model.
Figure 11. Client model shows multiple inputs of combine operator

Entries from each source are unconditionally appended to the output of the combine operation, and the result contains all the entries from the input sources.
Figure 12. Output of combine operator contains all entries
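The behavior of the combine operator can be sketched in a few lines: every input feed's entries are appended, unconditionally, to a single output, just like UNION ALL in SQL. This is an illustrative sketch (feeds modeled as plain lists of dicts), not the engine's implementation.

```python
# Combine unconditionally appends the entries of every input feed,
# like UNION ALL in SQL. Feeds are modeled here as lists of dicts.
def combine(*feeds):
    out = []
    for feed in feeds:
        out.extend(feed)
    return out

# Hypothetical sales-leads feeds from two sales executives.
east = [{"pid": "100-101-10", "units": "5"}]
west = [{"pid": "100-101-10", "units": "2"}, {"pid": "100-102-20", "units": "4"}]

all_leads = combine(east, west)
print(len(all_leads))  # 3
```

Note that no de-duplication happens: the same product can appear in several entries, which is why the next step groups by product ID.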

Group entries by product ID

As the previous figure shows, the output of the combine operation can have multiple entries for each product. To calculate the inventory requirements for an individual product across all tentative customers, all sales for a given product must be grouped together in a single entry.

Drag and drop a group operator to the canvas and connect the output of the combine operator to the socket of the group operator.

The next step is to identify the key path that is used to group the sales entries. This is done by clicking on the Group element drop-down box and selecting the path to the row/col_C/text() node (which points to the product ID value of the entry).

Select the row element path from the Associated Data Element drop-down box to indicate the element that will be added to the group.
Figure 13. Group entries by product ID

Modifications made to the group operator are applied to the underlying client model. Notice that each row in the group operator grid is represented as a param element in the model, and the path information is stored in the text node of the element. The param element is used to store any text, variable, or path data in the model, and the type of data is defined by the type attribute. The operatorspecific attribute is overloaded and is interpreted based on the containing operator type.
Figure 14. Client model shows the parameter values for the group operator

To see the result of the group operation, go to the Preview tab. All rows for a given product ID are grouped together under the nestval element.
Figure 15. Result of grouping by product ID
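The group operation itself is a standard group-by: a key value (the product ID in col_C) selects the bucket, and each entry's associated data element is appended to that bucket, mirroring the nestval output above. A minimal sketch, using dicts as a stand-in for the feed entries:

```python
from collections import defaultdict

# Group entries by a key field; every entry is appended to its group's
# list, mirroring how rows are collected under nestval in the preview.
def group_by(entries, key):
    groups = defaultdict(list)
    for e in entries:
        groups[e[key]].append(e)
    return dict(groups)

# Hypothetical combined sales-leads entries.
leads = [
    {"col_C": "100-101-10", "col_E": "5"},
    {"col_C": "100-102-20", "col_E": "4"},
    {"col_C": "100-101-10", "col_E": "2"},
]
grouped = group_by(leads, "col_C")
print(sorted((pid, len(rows)) for pid, rows in grouped.items()))
# [('100-101-10', 2), ('100-102-20', 1)]
```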

Transform output to aggregate quantities for each sale

Only the total number of units ordered for each product is relevant for calculating the inventory requirements, so you can remove the extraneous information by using a transform operator.

Drag and drop a transform operator to the canvas and connect the output of the group operator to it.

The left side Input tree shows a single entry from the output of the group operator while the right side tree defines the template for the Output of the transformation. You can add new elements or attributes to the Output tree by either dragging and dropping them from the Input tree or by using the tree’s context menu.

Create a new element called product in the Output tree and add a pid attribute to it. Now drag and drop the entry/groupval text node onto the attribute.

Note: If you are having any problems with the drag and drop, you can also use the context menu for the Input tree. First select the target node in the Output tree, then right-click on the source node in the Input tree and select Copy to the output tree.
Figure 16. Create new output format containing product ID and total_units for grouped feeds using transform operator

Functions

In order to aggregate the total units to be manufactured for each product, add a summation function to the output tree.

First, create a new element total_units under the product element, and then add a function to this element by selecting Specify a function value in the context menu.

In the function’s dialog, select the aggregate function Sum from the drop-down list of functions. A property grid listing all the parameters for the function appears under the function selection box.

In the Value cell of the expression parameter for the function, you can either enter a text string, specify a path in the Input from where to pick the value at run time, get the result from an embedded function, or fetch the value from a variable at run time.

Since you need to add all the quantities in one entry, you have to specify the path to the quantity value in the input. To do this, select the Specify values from the Input tree option in the Value drop-down box. In the Input tree dialog, select the path to the quantity node (the text node for col_E).
Figure 17. Add an aggregate function to the transform output to sum all units of one product
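Conceptually, the Sum aggregate walks the rows of one group, picks the numeric value at the given path from each, and adds them up. A minimal sketch on clean, numeric data (illustrative only; in the real feed the quantities arrive as strings at the col_E path):

```python
# The Sum aggregate: pick the value at a path from each grouped row
# (quantities arrive as strings in the feed) and add them up.
def sum_units(rows, path="col_E"):
    return sum(int(r[path]) for r in rows)

# One hypothetical group of sales rows for a single product.
group = [{"col_C": "100-101-10", "col_E": "5"},
         {"col_C": "100-101-10", "col_E": "2"}]
print({"pid": group[0]["col_C"], "total_units": sum_units(group)})
# {'pid': '100-101-10', 'total_units': 7}
```

As the next section shows, this only works once every value at the path is actually numeric.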

When you try to preview the result of the transformation, you get a server exception indicating that non-numeric data was found at the specified path.
Figure 18. Server error due to non-numeric data in the input feed

On previewing the output of the combine operator, you can see that the extraneous data is caused by the header row that was specified while creating the sales leads feed from an Excel spreadsheet (refer to Part 1‘s section “Feed from a spreadsheet”).
Figure 19. Reason for the illegal data is the header information in each sales feed

Filter out erroneous data

In order to aggregate the number of units ordered for any product, it is necessary to first remove all non-numeric entries in that path from the output of the combine operator. This requires the output of the combine to be filtered before it is passed to the group operator.

Drag and drop a filter operator to the canvas. Disconnect the output of the combine operator from the group operator and connect it to the filter operator. Connect the output of the filter operator to the input of the group operator.

In the filter operator, add a condition to select entries that do not contain the word “units” in the value of the quantities node (col_E).
Figure 20. Filter the combined sales feed to remove entries containing non-numeric characters in the units node
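The notcontains condition behaves like a simple substring test with the sense inverted: an entry survives only if the value at the given path does not contain the search string. A minimal sketch (illustrative data; the real condition addresses the col_E node by XPath):

```python
# The 'notcontains' filter condition: keep entries whose value at the
# given field does not contain the search string.
def not_contains(entries, path, needle):
    return [e for e in entries if needle not in e[path]]

# Hypothetical combined feed, still carrying a spreadsheet header row.
combined = [
    {"col_C": "Product ID", "col_E": "units"},  # header row from the spreadsheet
    {"col_C": "100-101-10", "col_E": "5"},
]
clean = not_contains(combined, "col_E", "units")
print(len(clean))  # 1
```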

If you look at this operator in the client model, you will notice that a condition element has been added to it and the operation attribute for the condition is set to notcontains. Also, the text value of the first param contains the xpath location of the quantities node while the text value of the second param contains the string “units.”
Figure 21. Client model reflects the filter condition for removing non-numeric data

Important: A consequence of disconnecting the input of the group operator is that all the conditions and parameters set in the group operator and its dependent operators become invalid, so they need to be recreated.

Once all the actions from the group operator onward have been redone, clicking Preview for the transform operator produces a result that contains a single entry for each product, with the product ID and the total number of units ordered.
Figure 22. Once the non-numeric data is removed, the combined sales feed can be properly transformed

Use product ID to get product details

The next step is to get the detailed information for each product in the sales list. To do this you first need to load the product source from the catalog and then add a filter to it. This step is defined in Part 1 (see the “Create a filter on the product feed” section of that article).

Get details for each product in the sales list

For each transformed entry in the combined sales list, fetch the detailed information about the product, using the product ID as the key.

This step is computed by looping over each entry in the combined sales list (master list) and getting the details for each product from the filtered product feed. The product ID (/entry/product/pid) from the combined sales list entry is passed to the pid variable of the get one product filter to select details for that product.
Figure 23. Use the foreach operator to get product details for each entry in the transformed sales list

The result of this operation is an entry for each unique product in the sales list that contains the total units ordered as well as detailed information about the product.
Figure 24. Results of the foreach operation adds product details to each transformed sales entry
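The foreach operation is essentially a lookup join: for each master entry, the product ID is bound to the inner filter's pid variable and the matching details are merged in. A minimal sketch with a hypothetical in-memory catalog standing in for the filtered product feed:

```python
# Hypothetical stand-in for the filtered product-catalog feed.
catalog = {
    "100-101-10": {"name": "Silver ring", "weight": "12"},
    "100-102-20": {"name": "Silver chain", "weight": "30"},
}

def get_one_product(pid):
    """The inner 'get one product' filter: details keyed by product ID."""
    return catalog[pid]

def foreach_add_details(master):
    """Loop over the master (sales) list, binding each entry's pid to the
    inner filter and merging the returned details into the entry."""
    out = []
    for entry in master:
        merged = dict(entry)
        merged.update(get_one_product(entry["pid"]))
        out.append(merged)
    return out

sales = [{"pid": "100-101-10", "total_units": 7}]
print(foreach_add_details(sales))
# [{'pid': '100-101-10', 'total_units': 7, 'name': 'Silver ring', 'weight': '12'}]
```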

Transform list to get silver needed for each product

In order to determine the silver needed for each ordered product, you need to calculate the total weight of all the units ordered for that product.

Once again, you use the transform operator to filter out the irrelevant data as well as compute the combined weight of each product.

Since you need to multiply the total units by the weight of one unit, select the Numeric Multiply function from the functions list. For the first parameter, select the path to the total_units node, and for the second parameter, select the path to the weight node.
Figure 25. Use the transform operator to calculate the total weight of all the units ordered for each product

The resulting transformation now contains the total weight of all units ordered for a given product. This information can now be used to calculate the raw material requirements.
Figure 26. Results of the transformation shows the total weight of all units ordered for each product
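The Numeric Multiply step reduces to a single arithmetic operation per entry, with the caveat that both operands arrive as feed strings and must be interpreted as numbers first. A minimal sketch (hypothetical field names matching the scenario):

```python
# Numeric Multiply: total silver per product is total_units x unit weight.
# Both values arrive as strings in the feed and must be cast to numbers.
def total_weight(entry):
    return int(entry["total_units"]) * int(entry["weight"])

entry = {"pid": "100-101-10", "total_units": "7", "weight": "12"}
print(total_weight(entry))  # 84
```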

Sort the output in ascending order of product ID

You can sort the output of the transform so that the final feed result lists the entries in ascending order, based on the product IDs.

Drag and drop a sort operator and connect the output of the transform operator to its input.

Select the pid attribute from the drop-down tree. This provides the path whose value is used for sorting the entries.
Figure 27. Sort the result by product ID

Set feed type related information in the Publish operator

Finally, connect the output of the sort operator to the publish (endpoint) operator.

Select the feed type to be RSS and enter the header information needed for the RSS feed.
Figure 28. Set the mashup feed type

Click Run to display the mashup feed output in your browser.
Figure 29. Feed result from invoking the mashup URL

Modify the mashup to try these out

• Calculate inventory requirement for each item by a specified date
• Calculate packaging type and quantity required (hint: use the size)

Use variables to dynamically bind run time parameters

In many cases, there is a need to pass a parameter value dynamically at run time. For example, you used a variable in the “get one product filter” condition to return details of a product whose ID was passed at run time to the operator.

If it was a SQL query, it would be equivalent to the following parameterized query:

SELECT * FROM product WHERE pid = ?

If it was a program, it would be equivalent to a function in which the variable value is passed in an argument:

Object get_one_product(String pid) {
    Hashtable product_list = call_get_product_list_hashtable_function();
    return product_list.get(pid);
}

During run time there are two ways that a variable defined in the mashup is initialized:

• Initialize the variable of the inner feed (details) of a foreach operation with data from the outer (master) feed. This was done in this scenario when you added product details to each product in the sales list. The following pseudo code explains the process of variable binding in the foreach operation.
for (i = 0; i < masterlist.length; i++) {
    String pid = masterlist[i].getpid();
    Object productdetails = get_one_product(pid);
    masterlist[i].adddetails(productdetails);
}
• Passing the value as a parameter to the mashup URL. For example, if you had created the inner (product details) part of your mashup as a separate feed, then the URL of this product details mashup would have exposed the pid variable as one of its parameters.

The feed URL for the mashup exposes the pid variable as a parameter with its value set to the default value defined while creating the variable.

http://localhost:8080/mashuphub/client/plugin/generate/entryid/79/pluginid/10?pid=100-101-10

If you click on the View Feed link, the description string defined in the variable is used as a prompt for changing the pid value in the URL parameter.

Note: The default value is already set in the parameter field.
Figure 32. When the feed is viewed the user is prompted to set the variable value

Scope of the variable

Variables are created in the same dialog that is used to assign them to a parameter or condition (by selecting the Use a variable to return the value option in the Value drop-down box of the condition/parameter grid).

By default, the variable that is created is automatically selected for assignment to the condition, but it is possible to create multiple variables at one time and assign any or none of them to that parameter or condition value of the operator or function.

This means that even though a variable is created in one operator, its scope is based on where it is assigned. This fact also decides if it is exposed externally (parameterized in the mashup URL) or is set internally.

Whether a variable is parameterized in the mashup URL depends on whether it is assigned a value in a foreach loop:

• If a variable associated with the details input of the foreach operator is initialized with data from the master input then the variable is not added as a parameter to the mashup URL.
• If a variable is not associated with any parameter or condition in the mashup it is ignored.
• All other variables that are associated with some parameter or condition in any operator in the mashup are parameterized in the mashup URL. This enables consumers of the mashup to dynamically alter its results by changing the parameter values in the mashup URL.

All variables that are created in the mashup are added as parameters to the model at the /mashup/flow/variables path.
Figure 33. All variables are stored in the client model under the /mashup/flow/variables element

Casting data type

In the absence of a schema, XML data values are treated as strings. This fact did not affect the sort result in this scenario since the product ID (pid) is of string type. However, in many other situations (like sorting by date) it becomes necessary to cast the value at the selected path to a specific data type. This can be done by using the context menu of the drop-down tree.

Right-click on the node you want to select and pick the data type from the menu option.

Note: This step does not close the drop-down, as setting a data type does not implicitly select the node.

The data type casting is only local to the operator where you set it and the type information is used when the engine mashup XML is generated to cast the value at the specified path to the data type indicated.

Note: Although data type casting is primarily useful for sorting, it can also be applied in other operators.
Figure 34. Data type casting
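The difference casting makes is easy to see with a tiny sort: schema-less feed values are strings, so a sort compares them character by character, while casting restores numeric order. An illustrative sketch:

```python
# Without a schema, feed values are strings, so sorting is lexicographic;
# casting the sort key to a number restores the expected order.
values = ["9", "10", "2"]

as_strings = sorted(values)           # compares character by character
as_numbers = sorted(values, key=int)  # casts each value before comparing

print(as_strings)  # ['10', '2', '9']
print(as_numbers)  # ['2', '9', '10']
```

The same mismatch shows up when sorting dates stored as strings, which is the case where casting matters most in practice.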

Summary

In the Web 2.0 world, if you consider the analogy of the intranet as a database of loosely coupled XML feed data sources, then the feed mashup engine is equivalent to the database engine and the feed mashup editor is equivalent to a SQL query builder. The holy grail of simple information sharing and integration across organizational and departmental boundaries appears to be within reach.

Resources

Get products and technologies

• IBM Mashup Center: Find an easy-to-use business mashup solution, supporting line-of-business assembly of dynamic situational applications, with the management, security, and governance capabilities IT requires.