[repost ]Building A Social Music Service Using AWS, Scala, Akka, Play, MongoDB, And Elasticsearch


This is a guest repost by Rotem Hermon, former Chief Architect for serendip.me, on the architecture and scaling considerations behind making a startup music service.

serendip.me is a social music service that helps people discover great music shared by their friends, and also introduces them to their “music soulmates” – people outside their immediate social circle who share a similar taste in music.

Serendip is running on AWS and is built on the following stack: scala (and some Java), akka (for handling concurrency), the Play framework (for the web and API front-ends), MongoDB and Elasticsearch.

Choosing The Stack

One of the challenges of building serendip was the need to handle a large amount of data from day one, since a main feature of serendip is that it collects every piece of music being shared on Twitter from public music services. So when we approached the question of choosing the language and technologies to use, an important consideration was the ability to scale.

The JVM seemed the right basis for our system because of its proven performance and tooling. It’s also the platform of choice for a lot of open source systems (like Elasticsearch), which enables using their native clients – a big plus.

When we looked at the JVM ecosystem, scala stood out as an interesting language option that allowed a modern approach to writing code, while keeping full interoperability with Java. Another argument in favour of scala was the akka actor framework which seemed to be a good fit for a stream processing infrastructure (and indeed it was!). The Play web framework was just starting to get some adoption and looked promising. Back when we started, at the very beginning of 2011, these were still kind of bleeding edge technologies. So of course we were very pleased that by the end of 2011 scala and akka consolidated to become Typesafe, with Play joining in shortly after.

MongoDB was chosen for its combination of developer friendliness, ease of use, feature set and possible scalability (using auto-sharding). We learned very soon that the way we wanted to use and query our data would require creating a lot of big indexes on MongoDB, which would cause us to hit performance and memory issues pretty fast. So we kept using MongoDB mainly as a key-value document store, also relying on its atomic increments for several features that required counters.
With this type of usage MongoDB turned out to be pretty solid. It is also rather easy to operate, but mainly because we managed to avoid using sharding and went with a single replica-set (the sharding architecture of MongoDB is pretty complex).

For querying our data we needed a system with full blown search capabilities. Out of the possible open source search solutions, Elasticsearch came as the most scalable and cloud oriented system. Its dynamic indexing schema and the many search and faceting possibilities it provides allowed us to build many features on top of it, making it a central component in our architecture.

We chose to manage both MongoDB and Elasticsearch ourselves and not use a hosted solution for two main reasons. First, we wanted full control over both systems. We did not want to depend on another element for software upgrades/downgrades. And second, the amount of data we process meant that a hosted solution was more expensive than managing it directly on EC2 ourselves.

Some Numbers

Serendip’s “pump” (the part that processes the Twitter public stream and Facebook user feeds) digests around 5,000,000 items per day. These items are passed through a series of “filters” that detect and resolve music links from supported services (YouTube, Soundcloud, Bandcamp etc.), and add metadata on top of them. The pump and filters run as akka actors, and the whole process is managed by a single m1.large EC2 instance. If needed it can be scaled easily by using akka’s remote actors to distribute the system to a cluster of processors.
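The pump-and-filter chain described above is implemented with akka actors in the real system. As a rough plain-Java sketch of the same idea – all names and the link-detection logic below are invented for illustration – each item passes through a chain of filters and is dropped unless a supported music link is found:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java analogue of serendip's pump/filter chain.
public class PumpSketch {

    interface Filter {
        // returns the enriched item, or null to drop it from the pipeline
        String apply(String item);
    }

    // A toy filter that keeps only items containing a supported music link
    static Filter musicLinkFilter = item ->
            item.contains("youtube.com") || item.contains("soundcloud.com")
                    ? item + " [resolved]" : null;

    static List<String> process(List<String> items, Filter... filters) {
        List<String> valid = new ArrayList<>();
        for (String item : items) {
            String current = item;
            for (Filter f : filters) {
                if (current != null) current = f.apply(current);
            }
            if (current != null) valid.add(current);
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> out = process(
                List.of("check this http://youtube.com/watch?v=x", "just text"),
                musicLinkFilter);
        System.out.println(out.size()); // prints 1 – only the YouTube item survives
    }
}
```

In the real architecture each filter is an actor, so stages run concurrently and can be distributed with remote actors; the sequential loop above only shows the data flow.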

Out of these items we get around 850,000 valid items per day (that is, items that really contain relevant music links). These items are indexed in Elasticsearch (as well as in MongoDB for backup and for keeping counters). Since every valid item means updating several objects, we get an index rate of ~40/sec in Elasticsearch.
We keep a monthly index of items (tweets and posts) in Elasticsearch. Each monthly index contains ~25M items and has 3 shards. The cluster is running with 4 nodes, each on a m2.2xlarge instance. This setup has enough memory to run the searches we need on the data.

Our MongoDB cluster gets ~100 writes/sec and ~300 reads/sec as it handles some more data types, counters and statistics updates. The replica set has a primary node running on an m2.2xlarge instance, and a secondary on an m1.xlarge instance.

Building A Feed

When we started designing the architecture for serendip’s main music feed, we knew we wanted the feed to be dynamic and reactive to user actions and input. If a user gives a “rock-on” to a song or “airs” a specific artist, we want that action to reflect immediately in the feed. If a user “dislikes” an artist, we should not play that music again.

We also wanted the feed to be a combination of music from several sources, like the music shared by friends, music by favorite artists and music shared by “suggested” users that have the same musical taste.
These requirements meant that a “fan-out-on-write” approach to feed creation would not be the way to go. We needed an option to build the feed in real time, using all the signals we have concerning the user. The set of features Elasticsearch provides allowed us to build this kind of real-time feed generation.

The feed algorithm consists of several “strategies” for selecting items which are combined dynamically with different ratios on every feed fetch. Each strategy can take into account the most recent user actions and signals. The combination of strategies is translated to several searches on the live data that is constantly indexed by Elasticsearch. Since the data is time-based and the indexes are created per month, we always need to query only a small subset of the complete data.
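As a toy illustration of combining strategies with different ratios on every fetch – the strategy names and the mixing rule below are hypothetical, since the real system translates its strategies into Elasticsearch queries – each strategy contributes a share of the final feed:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-fetch strategy blending.
public class FeedMixer {

    // Each strategy yields an ordered candidate list; its ratio decides how
    // many items it may contribute to a single feed fetch.
    static List<String> mix(Map<List<String>, Double> strategyResults, int feedSize) {
        List<String> feed = new ArrayList<>();
        for (Map.Entry<List<String>, Double> e : strategyResults.entrySet()) {
            int take = (int) Math.round(feedSize * e.getValue());
            e.getKey().stream().limit(take).forEach(feed::add);
        }
        return feed;
    }

    public static void main(String[] args) {
        Map<List<String>, Double> strategies = new LinkedHashMap<>();
        strategies.put(List.of("friend-1", "friend-2", "friend-3"), 0.5); // friends' shares
        strategies.put(List.of("artist-1", "artist-2"), 0.3);             // favorite artists
        strategies.put(List.of("soulmate-1", "soulmate-2"), 0.2);         // music soulmates
        // each strategy contributes up to its share of the feed
        System.out.println(mix(strategies, 10));
    }
}
```

The important property mirrored here is that the ratios can be changed on every fetch in reaction to fresh user signals, because nothing is precomputed per user.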

Fortunately enough, Elasticsearch handles these searches pretty well. It also provides a known path to scaling this architecture – writes can be scaled by increasing the number of shards. Searches can be scaled by adding more replicas and physical nodes.

The process of finding “music soulmates” (matching users by musical taste) is making good use of the faceting (aggregation) capabilities of Elasticsearch. As part of the constant social stream processing, the system is preparing data by calculating the top shared artists for social network users it encounters (using a faceted search on their shared music).
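A facet request of the kind described – computing the top shared artists for a user – might look roughly like the following against the pre-aggregations facet API of that era; the index fields and values here are made up for illustration:

```json
{
  "query": { "term": { "twitter_user_id": "12345" } },
  "facets": {
    "top_artists": {
      "terms": { "field": "artist", "size": 50 }
    }
  }
}
```

The `terms` facet returns the most frequent values of the `artist` field across the user's matching shares, which is exactly the "top shared artists" signal fed into the soulmate matching.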

When a serendip user gives out a signal (either by airing music or interacting with the feed), it can trigger a recalculation of the music soulmates for that user. The algorithm finds other users that are top matched according to the list of favorite artists (which is constantly updated), weighing in additional parameters like popularity, number of shares etc. It then applies another set of algorithms to filter out spammers (yes, there are music spammers…) and outliers.

We found out that this process gives us good enough results while saving us from needing additional systems that can run more complex clustering or recommendation algorithms.

Monitoring And Deployment

Serendip is using ServerDensity for monitoring and alerting. It’s an easy-to-use hosted solution with a decent feature set and reasonable pricing for start-ups. ServerDensity natively provides server and MongoDB monitoring. We’re also making heavy use of the ability to report custom metrics into it for reporting internal system statistics.

An internal statistic collection mechanism collects events for every action that happens in the system, and keeps them in a MongoDB collection. A timed job reads those statistics from MongoDB once a minute and reports them to ServerDensity. This allows us to use ServerDensity for monitoring and alerting Elasticsearch as well as our operational data.
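The collect-then-flush pattern described above can be sketched in plain Java. All names here are hypothetical (the real system persists events in MongoDB and reports to ServerDensity; this sketch uses in-memory counters and standard output):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: events increment counters, a timed job flushes them.
public class StatsReporter {

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // called for every action that happens in the system
    public void record(String event) {
        counters.computeIfAbsent(event, k -> new AtomicLong()).incrementAndGet();
    }

    // snapshot and reset, so each report covers only the last interval
    public Map<String, Long> snapshot() {
        Map<String, Long> snap = new ConcurrentHashMap<>();
        counters.forEach((k, v) -> snap.put(k, v.getAndSet(0)));
        return snap;
    }

    // the timed job: report once per period (once a minute in the article)
    public void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("reporting: " + snapshot()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        StatsReporter stats = new StatsReporter();
        stats.record("item.indexed");
        stats.record("item.indexed");
        stats.record("user.rock-on");
        // prints the counts gathered since the last snapshot
        System.out.println(stats.snapshot());
        stats.scheduler.shutdown();
    }
}
```

Decoupling collection (cheap increments) from reporting (one batched write per minute) is what keeps the monitoring overhead negligible.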

Managing servers and deployments is done using Amazon Elastic Beanstalk. Elastic Beanstalk is AWS’s limited PaaS solution. It’s very easy to get started with, and while it’s not really a full featured PaaS, its basic functionality is enough for most common use cases. It provides easy auto-scaling configuration and also gives complete access via EC2.

Building the application is done with a Jenkins instance that resides on EC2. The Play web application is packaged as a WAR. A post-build script pushes the WAR to Elastic Beanstalk as a new application version. The new version is not deployed automatically to the servers – it’s done manually. It is usually deployed first to the staging environment for testing, and once approved is deployed to the production environment.


To conclude, here are some of the top lessons learned from building serendip, in no particular order.

  1. Know how to scale. You probably don’t need to scale from the first day, but you need to know how every part of your system can scale and to what extent. Give yourself enough time in advance if scaling takes time.
  2. Prepare for peaks. Especially in the life of a start-up, a single lifehacker or reddit post can bring your system down if you’re always running at near top capacity. Keep enough margin so you can handle a sudden load or be ready to scale really fast.
  3. Choose a language that won’t hold you back. Make sure the technologies you want to use have native clients in your language, or at least actively maintained ones. Don’t get stuck waiting for library updates.
  4. Believe the hype. You want a technology that will grow with your product and will not die prematurely. A vibrant and active community and some noise about the technology can be a good indication for its survival.
  5. Don’t believe the hype. Look for flame posts about the technology you’re evaluating. They can teach you about its weak points. But also don’t take them too seriously, people tend to get emotional when things don’t work as expected.
  6. Have fun. Choose a technology that excites you. One that makes you think “oh this is so cool what can I do with it”. After all, that’s (also) what we’re here for.

[repost ]Servlet 3.0 In Action: Asynchronous Servlets And Comet-Style Applications


Ever since the draft of JSR 315 (i.e. Servlet 3.0) was published, the new features of the latest generation of the Servlet specification have drawn growing attention from developers. The specification’s high-level goals – pluggable web frameworks, ease-of-development features, improved security support and so on – are all promising, but the feature attracting by far the most attention is, without question, the asynchronous Servlet. This article explains in detail how Comet-style applications are implemented, and how the asynchronous processing features of Servlet 3.0 apply to them.


JSR 315, an important member of the Java EE 6 family, raises the Servlet API from version 2.5 to 3.0 – the biggest version bump in almost ten years. The upgrade introduces several features developers can get excited about, such as:

  • Web framework pluggability.
  • Ease of development (EOD): annotations replacing the traditional web.xml configuration file.
  • Asynchronous Servlet processing support.
  • Security improvements, such as HttpOnly cookies and a login/logout mechanism.
  • Other improvements, such as built-in support for file upload.

Of these, the one most discussed in the open source community is asynchronous Servlet processing, which covers several capabilities: non-blocking input/output, asynchronous event notification, deferred request processing and deferred response output. Most of these were not invented by JSR 315. For non-blocking I/O, for example, Tomcat 6.0 already offered an “Advanced NIO” facility that let one Servlet thread handle multiple HTTP requests, and Jetty and GlassFish had similar support. But because the existing Servlet API did not cover such usage, these container features required container-specific classes, interfaces or annotations, coupling the application to a particular container – unacceptable for many projects. Only now that JSR 315 writes these capabilities into the specification, behind a unified API, do they become practical in a broad sense: any web container supporting Servlet 3.0 can run such applications unmodified.

The asynchronous processing features of JSR 315 are useful in many scenarios, but the first place people saw their value is the “server-side push” – also known as Comet – interaction model. In the list of JSR 315 goals published on the JCP (Java Community Process) site, the section on asynchronous processing is titled simply “Async and Comet support”.

The rest of this article walks through the ways Comet-style applications can be implemented, and then puts the asynchronous processing features of Servlet 3.0 to work in a real Comet-style program.

Breaking The Classic Request-Response Interaction Model

“Comet”, “server-side push” and “reverse Ajax” are three names for the same thing, and you may have heard of one or more of them. If not, one sentence captures the whole idea: “the server sends data to the client without the client requesting it”.

That sounds simple enough, but anyone who has spent time building browser/server applications knows it is not – for a long time it was even considered impossible. It runs completely against the traditional HTTP interaction model: only Socket-level applications can achieve peer-to-peer communication between server and client, whereas in an HTTP application the server merely responds to requests from the client; it does not track client state and never initiates contact. That is why HTTP is called a stateless, one-way protocol, and why this interaction style is called the Request-Response model.

The stateless, one-way Request-Response model has many virtues, such as efficiency and scalability. It suits applications that mostly respond passively to user requests – CMS, MIS, ERP systems and the like. It copes poorly, however, with features that require the server to take the initiative, such as a chat room (other users’ messages must keep arriving even while you stay silent) or a log console (log output should be pushed to the client without any request). For such applications the classic Request-Response model is a bad fit, and where “bad fit” and “real demand” coexist, people start looking for ways around the limitation.

Ways Of Implementing Comet

  • Simple polling. The earliest web applications detected server-side changes by refreshing the page on a timer, via JavaScript or an HTML meta tag. With timed refreshes the server is of course still responding passively; the client’s requests are merely so frequent and continuous that the user gets the illusion of the server pushing data. The approach is trivially easy, but its flaws are obvious: most requests may be pointless, because the event the server is waiting for has not happened and there is nothing to send, yet the whole page must be re-sent to the browser each time. And when something does change on the server, it is not delivered in “real time”: too short a refresh interval wastes a great deal of capacity, too long an interval delays the notification beyond what the user expects.

    Once nearly all browsers supported the XHR (XMLHttpRequest) object, Ajax appeared and spread quickly. Polling with Ajax no longer has to return the whole page on every request; when the server has no event to report, it returns an HTTP body with barely any content. Ajax thus saves most of the bandwidth wasted by polling, but it cannot reduce the number of requests, so Ajax-based simple polling still has polling’s limitations – the flaws are mitigated, not removed.

  • Long polling (hybrid polling). Long polling differs from simple polling chiefly in how long the connection lives: in simple polling the connection is closed as soon as the page has been sent, whereas a long poll is typically held open for 30 seconds or more. When the awaited event occurs on the server, the notification is written to the client immediately and the connection is closed; the client then opens the next connection and a new long poll begins.


  • Comet streaming (forever frame). Comet streaming takes the long-polling idea one step further: after an event notification has been sent to the client, the connection is not closed but kept open until a timeout occurs, and only then is a new connection established. The client can use the readyState property of the XMLHttpRequest object to distinguish Receiving from Loaded. In theory a Comet stream can deliver any number of server-side event notifications over a single connection, reducing the number of requests sent to the server even further.

Whether long polling or Comet streaming, both server and client must hold the connection open for a fairly long time. For the client that is no great burden, but the server is serving many clients at once. Under the classic Request-Response model, if every request holds a web thread without releasing it, the container’s thread pool is quickly exhausted – while most of those threads sit idle, waiting. This is why Comet-style services long for asynchronous processing: the goal is for web threads not to handle client requests synchronously, one to one, but for a single web thread to serve many client requests.
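The long-poll flow, and the idle waiting it causes, is easiest to see in a minimal self-contained sketch. The example below uses the JDK's built-in com.sun.net.httpserver (not the Servlet API – class and path names here are invented for illustration): the handler thread parks in poll() until an event arrives or the timeout expires, which is exactly the per-request thread occupancy the article describes:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Minimal long-poll sketch with the JDK's built-in HTTP server.
public class LongPollSketch {

    static final BlockingQueue<String> EVENTS = new ArrayBlockingQueue<>(16);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/poll", exchange -> {
            String event;
            try {
                // the handler thread blocks here until an event occurs
                // (or the 30-second long-poll timeout passes)
                event = EVENTS.poll(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                event = null;
            }
            byte[] body = (event == null ? "timeout" : event).getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        int port = server.getAddress().getPort();

        EVENTS.put("server-side event!"); // simulate the awaited event
        try (InputStream in = new URL("http://localhost:" + port + "/poll").openStream()) {
            // prints the event that was put on the queue
            System.out.println(new String(in.readAllBytes()));
        }
        server.stop(0);
    }
}
```

One thread per parked request is affordable here, but with thousands of clients it is not – which is the motivation for the asynchronous Servlet API discussed next.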

Servlet Asynchronous Processing In Practice

Several web containers already support Servlet API 3.0, such as GlassFish v3, Tomcat 7.0 and Jetty 8.0. At the time of writing, Tomcat 7 and Jetty 8 are still in beta; they support Servlet 3.0, but their sample code is still container-coupled NIO. The GlassFish v3 sample (a chat room) is a fully standard Servlet 3.0 implementation, so if you need a reference example, look in GlassFish’s example directory first. The rest of this article presents a more practical example – a web log console – as a hands-on demonstration of Servlet API 3.0.

Building A Web Log Console

Apache Log4j is currently the most popular logging framework. Its many Appenders can send log output to the console, files, databases, email and so on. In most deployments users cannot watch the server console or tail the log files, so being able to watch logs “live” in a browser is a real convenience for development and maintenance. This example implements an Appender that streams log output to the browser.

Listing 1. The asynchronous web Appender for Log4j
 /**
  * An Appender based on AsyncContext support
  * @author zzm
  */
 public class WebLogAppender extends WriterAppender {

     /** Queue of asynchronous Servlet contexts */
     public static final Queue<AsyncContext> ASYNC_CONTEXT_QUEUE
         = new ConcurrentLinkedQueue<AsyncContext>();

     /** Writer that broadcasts to the AsyncContext queue */
     private Writer writer = new AsyncContextQueueWriter(ASYNC_CONTEXT_QUEUE);

     public WebLogAppender() {
         setWriter(writer);   // hand the asynchronous writer to WriterAppender
     }

     public WebLogAppender(Layout layout) {
         this();
         super.layout = layout;
     }
 }
Above is the skeleton of the Appender class. It derives from org.apache.log4j.WriterAppender, the class all of Log4j’s bundled Appenders inherit from; the subclass’s only job is to tell WriterAppender where to get its Writer. The part we care most about – asynchronously pushing log messages to the browser – happens in AsyncContextQueueWriter.

Listing 2. The asynchronous context queue Writer
 /**
  * A Writer that writes to the Writer of every AsyncContext in a queue
  * @author zzm
  */
 public class AsyncContextQueueWriter extends Writer {

     /** The AsyncContext queue */
     private Queue<AsyncContext> queue;

     /** The message queue */
     private static final BlockingQueue<String> MESSAGE_QUEUE
         = new LinkedBlockingQueue<String>();

     /**
      * Hand the message to the asynchronous thread, which eventually
      * writes it to the http response streams
      * @param cbuf
      * @param off
      * @param len
      * @throws IOException
      */
     private void sendMessage(char[] cbuf, int off, int len) throws IOException {
         try {
             MESSAGE_QUEUE.put(new String(cbuf, off, len));
         } catch (Exception ex) {
             IOException t = new IOException();
             t.initCause(ex);
             throw t;
         }
     }

     /**
      * Asynchronous thread: when data is put on the message queue, the
      * blocking take() call returns and the data is sent to every
      * registered http response stream
      */
     private Runnable notifierRunnable = new Runnable() {
         public void run() {
             boolean done = false;
             while (!done) {
                 String message = null;
                 try {
                     message = MESSAGE_QUEUE.take();
                     for (AsyncContext ac : queue) {
                         try {
                             PrintWriter acWriter = ac.getResponse().getWriter();
                             acWriter.println(htmlEscape(message));
                             acWriter.flush();
                         } catch (IOException ex) {
                             // the client is gone – drop its context
                             queue.remove(ac);
                         }
                     }
                 } catch (InterruptedException iex) {
                     done = true;
                 }
             }
         }
     };

     /**
      * Wrap the message in a script block that calls update() in the
      * parent window
      * @param message
      * @return the wrapped message
      */
     private String htmlEscape(String message) {
         return "<script type='text/javascript'>\nwindow.parent.update(\""
             + message.replaceAll("\n", "").replaceAll("\r", "") + "\");</script>\n";
     }

     /**
      * Keep a default writer that outputs to the console.
      * This writer is synchronous; the writers sending output to the
      * response streams are asynchronous.
      */
     private static final Writer DEFAULT_WRITER = new OutputStreamWriter(System.out);

     /**
      * Construct an AsyncContextQueueWriter
      * @param queue
      */
     AsyncContextQueueWriter(Queue<AsyncContext> queue) {
         this.queue = queue;
         Thread notifierThread = new Thread(notifierRunnable);
         notifierThread.setDaemon(true);
         notifierThread.start();
     }

     public void write(char[] cbuf, int off, int len) throws IOException {
         DEFAULT_WRITER.write(cbuf, off, len);
         sendMessage(cbuf, off, len);
     }

     public void flush() throws IOException {
         DEFAULT_WRITER.flush();
     }

     public void close() throws IOException {
         DEFAULT_WRITER.close();
         for (AsyncContext ac : queue) {
             ac.complete();
         }
     }
 }
This class is one of the keys to the web log console. It extends Writer but is really a collection of Writers: at least one default Writer that sends output to the console, plus zero or more response Writers, determined by the Queue<AsyncContext>, that send output to clients. Console output is written synchronously and directly; output to HTTP clients is written asynchronously by the notifierRunnable thread. Messages are placed on the blocking queue MESSAGE_QUEUE; the worker thread loops on the queue’s take() method, which blocks until new data is put on the queue.
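The producer/consumer handoff at the heart of AsyncContextQueueWriter can be demonstrated in isolation. In this stand-alone sketch, client response writers are simulated with StringBuilders (a stand-in for illustration only): producers put() messages, and a single notifier thread blocks in take() and fans each message out to every registered client:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-alone demo of the blocking-queue fan-out used by the article's Writer.
public class NotifierDemo {

    static final BlockingQueue<String> MESSAGES = new LinkedBlockingQueue<>();
    static final List<StringBuilder> CLIENTS = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        CLIENTS.add(new StringBuilder());
        CLIENTS.add(new StringBuilder());

        Thread notifier = new Thread(() -> {
            try {
                while (true) {
                    String msg = MESSAGES.take(); // blocks while the queue is empty
                    for (StringBuilder client : CLIENTS) {
                        client.append(msg).append('\n'); // stand-in for a response writer
                    }
                }
            } catch (InterruptedException e) {
                // interrupted: shut down, like the 'done' flag in the article
            }
        });
        notifier.setDaemon(true);
        notifier.start();

        MESSAGES.put("DEBUG first log line");
        MESSAGES.put("INFO second log line");
        Thread.sleep(200); // give the notifier time to drain the queue

        notifier.interrupt();
        System.out.println(CLIENTS.get(0)); // both lines, delivered to every client
    }
}
```

Note that only one thread serves all clients, no matter how many are registered – the same property that makes the AsyncContext approach scale.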

A small change in log4j.xml switches the Appender to WebLogAppender, and the Log4j side of the extension is done:

Listing 3. log4j.xml configuration
   <appender name="CONSOLE" class="org.fenixsoft.log.WebLogAppender">
      <param name="Threshold" value="DEBUG"/>
      <layout class="org.apache.log4j.PatternLayout">
         <!-- The default pattern: Date Priority [Category] Message\n -->
         <param name="ConversionPattern" value="%d %p [%c] %m%n"/>
      </layout>
   </appender>

Next we create an async-enabled Servlet. Every client that accesses this Servlet registers an asynchronous context object in ASYNC_CONTEXT_QUEUE, so that subsequent Logger output is sent to all of those clients. The Servlet also registers a listener on the asynchronous context, so that on timeout, error and similar events the context is removed from the queue.

Listing 4. The web log registration Servlet
 /**
  * Servlet implementation class WebLogServlet
  */
 @WebServlet(urlPatterns = { "/WebLogServlet" }, asyncSupported = true)
 public class WebLogServlet extends HttpServlet {

     /** serialVersionUID */
     private static final long serialVersionUID = -260157400324419618L;

     /**
      * Register the client in the queue listening to Logger messages
      */
     protected void doGet(HttpServletRequest req, HttpServletResponse res)
             throws ServletException, IOException {
         res.setHeader("Cache-Control", "private");
         res.setHeader("Pragma", "no-cache");
         PrintWriter writer = res.getWriter();
         // for IE
         writer.println("<!-- Comet is a programming technique that enables web "
             + "servers to send data to the client without having any need for the client "
             + "to request it. -->\n");
         writer.flush();

         final AsyncContext ac = req.startAsync();
         ac.setTimeout(10 * 60 * 1000);
         ac.addListener(new AsyncListener() {
             public void onComplete(AsyncEvent event) throws IOException {
                 WebLogAppender.ASYNC_CONTEXT_QUEUE.remove(ac);
             }

             public void onTimeout(AsyncEvent event) throws IOException {
                 WebLogAppender.ASYNC_CONTEXT_QUEUE.remove(ac);
             }

             public void onError(AsyncEvent event) throws IOException {
                 WebLogAppender.ASYNC_CONTEXT_QUEUE.remove(ac);
             }

             public void onStartAsync(AsyncEvent event) throws IOException {
             }
         });
         WebLogAppender.ASYNC_CONTEXT_QUEUE.add(ac);
     }
 }
That is essentially all of the server side. On the client, simply visiting this Servlet already shows log output continuously appearing in the browser, and the page keeps loading forever – evidence that the HTTP connection is never closed. For presentation, though, we wrap the client in a page that reads WebLogServlet’s output through a hidden frame, i.e. the Comet streaming approach.

Listing 5. The client page
 <html>
 <head>
 <script type="text/javascript" src="js/jquery-1.4.min.js"></script>
 <script type="text/javascript" src="js/application.js"></script>
 <style type="text/css">
     .consoleFont{font-size:9; color:#DDDDDD; font-family:Fixedsys}
     .inputStyle{font-size:9; color:#DDDDDD; font-family:Fixedsys; width:100%;
            height:100%; border:0; background-color:#000000;}
 </style>
 </head>
 <body style="margin:0; overflow:hidden" >
 <table width="100%" height="100%" border="0" cellpadding="0"
     cellspacing="0" bgcolor="#000000">
    <tr>
    <td colspan="2"><textarea name="result" id="result" readonly="true" wrap="off"
         style="padding: 10; overflow:auto" class="inputStyle" ></textarea></td>
    </tr>
 </table>
 <iframe id="comet-frame" style="display: none;"></iframe>
 </body>
 </html>
Listing 6. application.js, referenced by the client page
 $(document).ready(function() {
     var url = '/AsyncServlet/WebLogServlet';
     $('#comet-frame')[0].src = url;
 });

 function update(data) {
     var resultArea = $('#result')[0];
     resultArea.value = resultArea.value + data + '\n';
 }

To simulate log output, we read an existing log file and write its contents through a Logger, which streams them to the browser. To try it, just run the TestServlet in the source package. The overall effect looks like this:

Figure 1. The running result



Comet brings a brand-new interactive experience to the web, and Servlet 3.0 together with asynchronous I/O provides a standard solution to the web-thread occupancy problem that Comet implementations face on the server side. As containers supporting Servlet 3.0 appear, Comet applications will become more and more common. Building Comet applications still carries some challenge today, but with demand driving the technology forward, Comet may well become as widespread as Ajax.


Description Name Size
Sample source code used in this article AsyncServlet.rar 377 KB




[repost ]WebSphere Liberty To WebSphere eXtreme Scale Setup


Add Java to your path and set your JAVA_HOME environment variable appropriately

set JAVA_HOME=C:\work\java\ibm-java-sdk-60-win-i386

Download the following free for developers or trial versions of WebSphere software

WebSphere Liberty

  • Download link: https://www.ibmdw.net/wasdev/downloads/
  • Click on the “Download V8.5.5” image under “WebSphere Application Server Liberty Profile”
  • Click on the “Download” image, review and agree to the license, click the “Download Now” link and save the resulting “wlp-developers-runtime-” file.
  • Download link: https://www.ibmdw.net/wasdev/downloads/
  • Click on the “Download V8.6” image under “WebSphere eXtreme Scale for Developers Liberty Profile”
  • Click on the “I confirm” button, review and agree to the license, click the “Download Now” link and save the resulting “wxs-wlp_8.6.0.2.jar” file.
  • Install the WebSphere Liberty Profile Developers Runtime file into a directory of your choice
java -jar wlp-developers-runtime-
  • At the end of the install it will ask for the directory; choose something like C:\work\java and it will create a directory called C:\work\java\wlp
  • Install the “WebSphere eXtreme Scale for Developers Liberty Profile” file into the same directory as the developers runtime
java -jar wxs-wlp_8.6.0.2.jar
  • For the rest of these instructions we will assume this to be the WLP_SERVERDIR
set WLP_SERVERDIR=C:\work\java\wlp

WebSphere eXtreme Scale

cd \work\java
unzip extremescaletrial860.zip
cd \work\java\ObjectGrid
set WXS_SERVERDIR=C:\work\java\ObjectGrid

Install the following development tools

Install the dependency that isn’t available in Maven public repositories:

mvn install:install-file -Dfile=objectgrid.jar -DgroupId=com.ibm.websphere.objectgrid -DartifactId=objectgrid -Dversion= -Dpackaging=jar

Get the Acme Air codebase

  • Go into a directory that you want to have the code in and use git to clone it
cd \work\eclipse
git clone https://github.com/acmeair/acmeair.git
  • For the rest of these instructions we will assume this to be the ACMEAIR_SRCDIR
set ACMEAIR_SRCDIR=\work\eclipse\acmeair

Build the Acme Air codebase

mvn clean compile package install

Create a WebSphere eXtreme Scale configuration

  • To create a new configuration we will create a copy of the “gettingstarted” configuration and customize it to a configuration and directory called “acmeair”
xcopy gettingstarted\*.* acmeair\. /s/e/i/v/q
  • Under %WXS_SERVERDIR%\acmeair, customize the env.bat to include pointers to the classes you have built
  • You will find a line with SAMPLE_SERVER_CLASSPATH, modify it as below (ensure these directories and jars exist based on your environment variables)
SET SAMPLE_SERVER_CLASSPATH=%SAMPLE_HOME%\server\bin;%SAMPLE_COMMON_CLASSPATH%;%ACMEAIR_SRCDIR%\acmeair-common\target\classes;%ACMEAIR_SRCDIR%\acmeair-services-wxs\target\classes;%HOMEPATH%\.m2\repository\commons-logging\commons-logging\1.1.1\commons-logging-1.1.1.jar

  • Next we copy the Acme Air specific eXtreme Scale configuration files from our source directory
cd %WXS_SERVERDIR%\acmeair
copy /y %ACMEAIR_SRCDIR%\acmeair-services-wxs\src\main\resources\deployment.xml server\config\.
copy /y %ACMEAIR_SRCDIR%\acmeair-services-wxs\src\main\resources\objectgrid.xml server\config\.

Now start the Acme Air WebSphere eXtreme Scale configuration catalog server and container server

  • In one window start the catalog server
cd %WXS_SERVERDIR%\acmeair
  • In another window start a single container server
    • Ensure that you have set JAVA_HOME and have Java in the path as before
cd %WXS_SERVERDIR%\acmeair
.\runcontainer.bat c0

Now we will load sample data into eXtreme Scale

  • In another window, we do this by running a Acme Air loader program
    • Ensure you have JAVA_HOME, Java and Maven in your path as before
cd %ACMEAIR_SRCDIR%\acmeair-loader
mvn exec:java
  • You should see output that indicates flights and customers (200) were loaded

Create and start the WebSphere Liberty server and then deploy the application

bin\server create server1
  • Edit %WLP_SERVERDIR%\usr\servers\server1\server.xml to change the featureManager section to:
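The original post omits the actual featureManager content here. As a hedged example only – the exact feature names depend on your Liberty and eXtreme Scale versions, so verify them against the product documentation – the section might look something like:

```xml
<featureManager>
    <feature>jsp-2.2</feature>
    <!-- eXtreme Scale client support for Liberty (name may vary by version) -->
    <feature>eXtremeScale.client-1.1</feature>
</featureManager>
```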
  • Copy the web application you previously built
copy %ACMEAIR_SRCDIR%\acmeair-webapp\target\acmeair-webapp-1.0-SNAPSHOT.war %WLP_SERVERDIR%\usr\servers\server1\dropins\.
  • Start the WebSphere Liberty server
bin\server start server1

Finally look at the application

  • Load the following url:
  • Login (use the provided credentials), search for flights (suggest today between Paris and New York), book the flights, use the checkin link to cancel the bookings one at a time, view your account profile

Optionally, if you want you can use eclipse to work with the codebase

  • Tell eclipse to import existing projects into the workspace looking in %ACMEAIR_SRCDIR%

[repost ]WebSphere Liberty Profile Cluster Sharing an In-Memory Data Grid




WebSphere Liberty Profile is a fast, lightweight and simple Java web application container that lets developers develop, test and deploy applications easily.  In my previous articles, I explained how to install Liberty Profile on Mac and how to develop and deploy your first REST-based services.

Liberty Profile is a standalone Java container.  It is not designed to be included in larger deployments based on WebSphere Application Server ND cells.

However, Liberty Profile can take advantage of a shared persistence engine to store HTTP Session data. This allows two or more independent Liberty Profile instances to share a common user session for web applications.  When one instance fails, the surviving instances can continue to serve user requests as if nothing happened.

The persistent data store might be a relational database (such as Derby, used for development purposes) or an in-memory data grid. In-memory data grids (IMDG) are software solutions providing in-memory data storage, replicated across different containers (or machines). Many IMDG solutions are available from different vendors or in open source.  The most common ones are Memcached, Terracotta (Software AG), Coherence (Oracle) and IBM’s WebSphere eXtreme Scale.

If you are totally new to eXtreme Scale, I would recommend reading some basic information about its architecture before continuing with this article.

Configuring WebSphere Application Server (WAS – full profile) to store HTTP Session data in an eXtreme Scale container is a matter of three clicks in the WAS admin console.  It is slightly more complicated with Liberty Profile – just the few configuration steps described below.

There are four different ways to install eXtreme Scale (XS) with Liberty :

  • Run XS Container in a separate JVM or separate machine than Liberty Profile
  • Run XS Container inside the same JVM as Liberty Profile
  • Use Liberty Profile as client for an XS container
  • Configure Liberty Profile to store HTTP Session data to an XS container

In this article, I will show you how to configure Liberty Profile to

  1. Start an XS server within the same JVM as Liberty profile
  2. Store HTTP Session data in this in-memory data grid, allowing clusters of Liberty Profile instances to be created

My final architecture is depicted in the image below.

0. Download and Install Liberty Profile and eXtreme Scale for Liberty Profile (both solutions are available at no charge from IBM – with forum based and peer-to-peer support only).

  • Liberty Profile installation is described in my previous blog entry.
  • eXtreme Scale for Liberty Profile installation is just a matter of unzipping the file in the directory above wlp

1. Create two servers instances

cd wlpBLOG
sst:wlpBLOG sst$ ./bin/server create ServerONE
Server ServerONE created.
sst:wlpBLOG sst$ ./bin/server create ServerTWO
Server ServerTWO created.

2. Change the default HTTP port in both server.xml files so that the two instances can run in parallel

<httpEndpoint host="localhost" httpPort="9080" httpsPort="9443" id="defaultHttpEndpoint"/>

3. Add two features in server.xml for each server: one to tell Liberty to run an embedded XS server, and one to tell Liberty to use XS as the HTTP Session store for web applications.

<!-- Enable features -->
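The feature list itself is missing from the text above. As an illustrative guess only – verify the exact feature names against your eXtreme Scale version's documentation – the two features might be declared like this:

```xml
<featureManager>
    <feature>jsp-2.2</feature>
    <!-- run an XS server embedded in Liberty -->
    <feature>eXtremeScale.server-1.1</feature>
    <!-- store HTTP Session data in XS -->
    <feature>eXtremeScale.webGrid-1.1</feature>
</featureManager>
```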

4. Configure the WXS container inside Liberty Profile: add the WXS configuration to Liberty Profile

<!-- Configuration for XS Server -->
<xsServer isCatalog="true" serverName="XS_ServerONE"/>
<!-- Configuration for Web Application XS HTTP Session data storage -->
<xsWebApp catalogHostPort="localhost:2809"/>

5. Configure the WXS container inside Liberty Profile: add the XML configuration files to the WLP runtime directory

In the directory WLP_HOME/usr/servers/ServerONE, create a “grids” directory and drop the two files below into it

<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
    xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
  <objectgridDeployment objectgridName="session">
    <mapSet name="sessionMapSet" numberOfPartitions="47" minSyncReplicas="0"
        maxSyncReplicas="0" maxAsyncReplicas="1" developmentMode="false"
        placementStrategy="FIXED_PARTITIONS">
      <map ref="objectgridSessionMetadata"/>
      <map ref="objectgridSessionAttribute.*"/>
      <map ref="objectgridSessionTTL.*"/>
    </mapSet>
  </objectgridDeployment>
</deploymentPolicy>

<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
  <objectGrids>
    <objectGrid name="session" txTimeout="30">
      <bean id="ObjectGridEventListener" className="com.ibm.ws.xs.sessionmanager.SessionHandleManager"/>
      <backingMap name="objectgridSessionMetadata" pluginCollectionRef="objectgridSessionMetadata" readOnly="false" lockStrategy="PESSIMISTIC" ttlEvictorType="LAST_ACCESS_TIME" timeToLive="3600" copyMode="COPY_TO_BYTES"/>
      <backingMap name="objectgridSessionAttribute.*" template="true" readOnly="false" lockStrategy="PESSIMISTIC" ttlEvictorType="NONE" copyMode="COPY_TO_BYTES"/>
      <backingMap name="objectgridSessionTTL.*" template="true" readOnly="false" lockStrategy="PESSIMISTIC" ttlEvictorType="LAST_ACCESS_TIME" timeToLive="3600" copyMode="COPY_TO_BYTES"/>
    </objectGrid>
  </objectGrids>
  <backingMapPluginCollections>
    <backingMapPluginCollection id="objectgridSessionMetadata">
      <bean id="MapEventListener" className="com.ibm.ws.xs.sessionmanager.MetadataMapListener"/>
    </backingMapPluginCollection>
  </backingMapPluginCollections>
</objectGridConfig>
6. Tell Liberty’s session manager to reuse the same session ID for all of a user’s requests, even when they are handled by different JVMs (see Liberty’s documentation for more details)

<httpSession idReuse="true"/>

7. Start Liberty Profile

sst:wlpBLOG sst$ ./bin/server start ServerONE
Server ServerONE started with process ID 11769.

In the logs, wait for the following line

[AUDIT ] CWWKF0011I: The server ServerONE is ready to run a smarter planet.

8. Create & Deploy a simple JSP file for testing

Create a Dynamic Web Project in Eclipse, and add the following index.jsp page

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Liberty Profile Cluster Demo</title>
</head>
<body>
<h1>Liberty Profile - eXtreme Scale HTTP Session Demo!</h1>
<%
    Integer count;
    Object o = session.getAttribute("COUNT");
    if (o != null) {
        count = (Integer) o;
        count = count + 1;
    } else {
        count = 1;
    }
    // Store the value on every request so the increment is persisted to the grid
    session.setAttribute("COUNT", count);
%>
<h3>This counter is increased each time the page is loaded.  Its value is stored in the <code>HttpSession</code></h3>
<h3><font color="#FF0000">Counter = <%=count%></font></h3>
<h4>Page served by cluster instance : <font color="#FF0000"><b><%= System.getProperty("wlp.server.name") %></b></font></h4>
Page generated at = <%=new java.util.Date().toString()%><br/>
</body>
</html>
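The scriptlet’s counter logic can also be read as plain Java. Here is a minimal sketch (hypothetical `SessionCounter` helper, with a `Map` standing in for the `HttpSession`) showing why `setAttribute` must run on every request, not only when the counter is first created:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the JSP counter logic, with a Map in place of HttpSession.
public class SessionCounter {
    static int increment(Map<String, Object> session) {
        Integer count;
        Object o = session.get("COUNT");
        if (o != null) {
            count = (Integer) o + 1;   // existing session: increment
        } else {
            count = 1;                 // first request: initialize
        }
        // Write back on every call so a replicated session store sees the update
        session.put("COUNT", count);
        return count;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        System.out.println(increment(session)); // 1
        System.out.println(increment(session)); // 2
        System.out.println(increment(session)); // 3
    }
}
```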

Then deploy the WAR to the server instance (an example of creating a WAR and deploying it to Liberty is given in my previous blog post)

9. Test: open your favorite browser and connect to http://localhost:9080/

You should see the demo page with the counter.

Each time you refresh the page (Ctrl-R), the counter should increase by one

Congrats, your first instance is up and running. Let’s now configure a second instance.

Repeat steps 2-7 on a second Liberty instance to create a second cluster member.  Remember to change the following:

  • The name of the instance
  • The HTTP and HTTPS ports used by Liberty Profile (step 2 above)
  • The WXS configuration – only one catalog server is needed (step 3 above: change isCatalog="no")
  • You do not need to copy the XML files into the grids directory of the second instance (step 5) – this is only required on the instance running the XS catalog server

Then deploy your test application to instance #2.  To test your application, point your browser to

http://localhost:9081/<YOUR APPLICATION NAME>

You should see a page similar to the one shown at step 9 above.  Try alternately reloading the page from ServerONE and from ServerTWO: you should see the session counter increase in sequence across the two server instances.
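The behavior you are observing can be sketched in a few lines of Java (a hypothetical simulation, with a `ConcurrentHashMap` standing in for the eXtreme Scale grid): two "instances" handle alternating requests for the same session ID, and each one sees the counter left behind by the other.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: two "server instances" sharing one session store (the grid).
public class SharedSessionDemo {
    // Shared grid: sessionId -> session attributes
    static final Map<String, Map<String, Object>> grid = new ConcurrentHashMap<>();

    static int handleRequest(String serverName, String sessionId) {
        Map<String, Object> session =
            grid.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>());
        int count = (int) session.getOrDefault("COUNT", 0) + 1;
        session.put("COUNT", count);
        System.out.println("Page served by " + serverName + ", Counter = " + count);
        return count;
    }

    public static void main(String[] args) {
        String sid = "abc123"; // same JSESSIONID on every request, thanks to idReuse="true"
        handleRequest("ServerONE", sid); // Counter = 1
        handleRequest("ServerTWO", sid); // Counter = 2
        handleRequest("ServerONE", sid); // Counter = 3
    }
}
```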

You’ve just created your first Liberty Profile cluster with two instances and a shared in-memory grid for HTTP session storage.

I leave it to you as an exercise to install and configure a load balancer in front of these two instances.  Hint: I am using the open-source balance for demo/test purposes.

If you find errors / typos in this (long) article, let me know – I will fix them – Thanks !

Enjoy !

[repost ]The Great Java Application Server Debate with Tomcat, JBoss, GlassFish, Jetty and Liberty Profile


Summary of Findings

On an interesting note, although JBoss won the overall competition, it never earned the top score in any single category. Here is a quick overview of the 8 sections we covered above.

  • Download and Installation – all Application Servers see the need for lowering the barriers to access and provide easy download and installation procedures.

  • Tooling support – most favored Eclipse, but Tomcat, JBoss and GlassFish all had excellent support for the big 3 IDEs.

  • Server Configuration – the Liberty Profile won here with its new dynamic model of hot-reloading both runtime components and configuration. Jetty and Tomcat were also strong.

  • Documentation & Community – Tomcat has a big, vibrant and established community that beats the competition hands down. Docs are also very good.

  • Real Performance Metrics – Tomcat outperformed the others in our developer-oriented performance tests. The Liberty Profile suffered long initialization times, while the others chased Tomcat.

  • Features & Open Standards compliance – JBoss and GlassFish rule the feature war, both being Java EE 6 compliant and supporting OSGi applications. Jetty and Tomcat are happier remaining as lightweight as possible as web containers, while the Liberty Profile looks to be growing each year.

  • Administration & Management/UI – JBoss and GlassFish both provide top-class admin consoles, while Jetty doesn’t ship an ounce of UI. The Liberty Profile and Tomcat provide glimpses of administration but don’t go far enough.

  • Cost $$$/Licensing – all servers are free for the developer – WIN! The Liberty Profile is a little more restrictive when it comes to licenses.

  • WebSphere and WebLogic – both prove to be ‘enterprise’ from download all the way to production.