
November 23, 2012

original:http://sujitpal.blogspot.com/2012/11/an-elasticsearch-web-client-with-scala.html

In this post, I describe the second part of my submission for the Typesafe Developer Contest. This part is a rudimentary web based search client to query an ElasticSearch (ES) server. It is a Play2/Scala web application that communicates with the ES server via its JSON query DSL.

The webapp has a single form that allows you to specify a Lucene query and various parameters, and returns an HTML or JSON response. It will probably remind Solr developers of the admin form. I find the Solr admin form very useful for trying out queries before baking them into code, and I envision a similar use for this webapp for ES search developers.

Since ES provides a rich JSON-based Query DSL, the form here has a few more features than the Solr admin form, such as faceting and sorting. In the interests of full disclosure, though, it provides only a subset of the variations possible via direct use of JSON and curl on the command line. But it’s good for quick and dirty verification of search ideas. In order to quickly get started with ES’s query DSL, I found this DZone article by Peter Kar and this blog post by Pulkit Singhal very useful (apart from the ES docs themselves, of course).
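To give a feel for what a Query DSL request body looks like, here is a minimal sketch in plain Scala that renders the simplest payload the webapp sends: paging parameters plus either a match_all or a query_string query. The object and method names (PayloadSketch, queryPayload) are made up for this illustration, and the hand-rolled string escaping only handles quotes and backslashes; the real application builds the same shape with Play's JSON library.

```scala
// Illustrative only: PayloadSketch and queryPayload are hypothetical names,
// not part of the webapp or of any library.
object PayloadSketch {
  // Minimal JSON string quoting: escapes backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // An empty query or "*:*" becomes match_all, anything else a query_string query.
  def queryPayload(query: String, start: Int, rows: Int): String = {
    val q =
      if (query.isEmpty || query == "*:*") "{\"match_all\":{}}"
      else "{\"query_string\":{\"query\":" + quote(query) + "}}"
    "{\"from\":" + start + ",\"size\":" + rows + ",\"query\":" + q + "}"
  }
}
```

A string like this is what gets POSTed to the index's _search endpoint, whether from the webapp or from curl.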

Since Play2 was completely new to me a week ago and now I am the proud author of a working webapp, I would like to share with you some of my insights into this framework. I typically learn new things by making analogies to stuff I already know, so I will explain Play2 by making analogies to Spring. If you know Spring, it may be helpful, and if you don’t, well, maybe it was not that terribly helpful anyway…

Routing in Play2 is done using the conf/routes file, which maps URL patterns and HTTP methods to Play2 controller actions. Actions can be thought of as @RequestMapping methods in a multi-action Spring controller, and are basically functions that transform a Request into a Response. A response can be a String wrapped in an Ok() method or it can be a method call into a view with some data, which returns a templated string to Ok(). There, that’s it – about everything you need to know about Play2 to get to using it.
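The "action is a function from Request to Response" idea can be modelled in a few lines of plain Scala. This is a toy model only: Request, Response, ok and dispatch below are stand-ins invented for this sketch, not Play's actual classes, and the Map plays the role that conf/routes plays in a real application.

```scala
// A toy model of Play2-style routing. None of these types are Play's own;
// they are hypothetical stand-ins used to illustrate the idea.
object RoutingSketch {
  case class Request(method: String, path: String,
                     query: Map[String, String] = Map.empty)
  case class Response(status: Int, body: String)

  // An action is just a function from a request to a response.
  type Action = Request => Response

  // Stand-in for wrapping a String in Ok().
  def ok(body: String): Response = Response(200, body)

  val form: Action = _ => ok("<form>...</form>")
  val search: Action = req =>
    ok("results for " + req.query.getOrElse("query", "*:*"))

  // Plays the role of conf/routes: (HTTP method, URL path) -> action.
  val routes: Map[(String, String), Action] = Map(
    ("GET", "/form") -> form,
    ("GET", "/search") -> search
  )

  // Look up the action for a request, falling back to a 404 response.
  def dispatch(req: Request): Response =
    routes.get((req.method, req.path)) match {
      case Some(action) => action(req)
      case None => Response(404, "no route for " + req.method + " " + req.path)
    }
}
```

In Play2 itself the lookup table is generated from conf/routes at compile time, so a typo in a controller name shows up as a compile error rather than a 404.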

Unlike the last time (with Akka), this time around I did not use the Typesafe Play tutorial. Instead I downloaded Play2 and used the play command to build a new web application template (play new search), then to compile and run it. The best tutorial I found was this one on flurdy.com, which covers everything from choice of IDE to deployment on Heroku and everything in between. Other useful sources are Play’s documentation (available with the Play2 download) and this example Play2 app on GitHub.

Here is my conf/routes file. I added the two entries under Search pages. They both respond to HTTP GET requests and call the form() and search() Actions respectively. The other two entries come with the generated project and are needed (so don’t delete them).

    # conf/routes

    # Home page
    GET     /                 controllers.Application.index

    # Search pages
    GET     /form             controllers.Application.form
    GET     /search           controllers.Application.search

    # Map static resources from the /public folder to the /assets URL path
    GET     /assets/*file     controllers.Assets.at(path="/public", file)

There is another file in the conf directory, called conf/application.conf. It contains properties required by the default application. I added a new property for the URL for the ES server in this file.

    # conf/application.conf
    ...
    es.server="http://localhost:9200/"

The Play2 “new” command also generates a skeleton controller app/controllers/Application.scala, into which we add the two new form and search Actions. Here is the completed Application.scala file.

    // app/controllers/Application.scala
    package controllers

    import models.{Searcher, SearchParams}
    import play.api.data.Forms.{text, number, mapping}
    import play.api.data.Form
    import play.api.libs.json.{Json, JsValue}
    import play.api.libs.ws.WS
    import play.api.mvc.{Controller, Action}
    import play.api.Play

    object Application extends Controller {

      // define the search form
      val searchForm = Form(
        mapping (
          "index" -> text,
          "query" -> text,
          "filter" -> text,
          "start" -> number,
          "rows" -> number,
          "sort" -> text,
          "writertype" -> text,
          "fieldlist" -> text,
          "highlightfields" -> text,
          "facetfields" -> text
        ) (SearchParams.apply)(SearchParams.unapply)
      )

      // configuration parameters from conf/application.conf
      val conf = Play.current.configuration
      val server = conf.getString("es.server").get

      // home page - redirects to search form
      def index = Action {
        Redirect(routes.Application.form)
      }

      // form page
      def form = Action {
        val rsp = Json.parse(WS.url(server + "_status").
          get.value.get.body)
        val indices = ((rsp \\ "indices")).
          map(_.as[Map[String,JsValue]].keySet.head)
        Ok(views.html.index(indices, searchForm))
      }

      // search results action - sends the result to one of two
      // views (raw JSON or the HTML search template) depending
      // on the value of writertype
      def search = Action {request =>
        val params = request.queryString.
          map(elem => elem._1 -> elem._2.headOption.getOrElse(""))
        val searchParams = searchForm.bind(params).get
        val result = Searcher.search(server, searchParams)
        searchParams.writertype match {
          case "json" => Ok(result.raw).as("text/javascript")
          case "html" => Ok(views.html.search(result)).as("text/html")
        }
      }
    }

We first define a Search form and map it to the SearchParams class (defined in the model, below). The index Action has been changed to redirect to the form Action. The form method makes a call to the ES server to get a list of indexes (ES can support multiple indexes with different schemas within the same server), and then delegates to the index view with this list and an empty searchForm.

The search Action binds the request to the searchParams bean, then sends this bean to the Searcher.search() method, which returns a SearchResult object containing the results of the search. Two different views are supported – the HTML view (delegating to the search view template) and the raw JSON view that just dumps the JSON response from ES.
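The first step of the search Action, before the form binding, is to flatten the request's query string. A query string can carry several values per parameter name, while the form binding wants one String per name, so the Action keeps only the first value and substitutes an empty string when there is none. Here is that step on its own as a small plain-Scala sketch; QueryStringSketch and flatten are names I made up for the illustration.

```scala
// Illustrative only: QueryStringSketch and flatten are hypothetical names.
object QueryStringSketch {
  // Keep the first value of each parameter; a name with no values maps to "".
  def flatten(queryString: Map[String, Seq[String]]): Map[String, String] =
    queryString.map { case (name, values) =>
      name -> values.headOption.getOrElse("")
    }
}
```

For example, a request carrying query=hedge+fund twice would bind only the first occurrence, and any later duplicates are silently dropped.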

The respective views for the form and search are shown below. Not much to explain here, except that it’s another templating language that you have to learn. It’s set up like a function – you pass in parameters that you use in the template. I followed the lead of the flurdy.com tutorial referenced above and kept it as HTML-ish as possible, but Play2 has an extensive templating language of its own that you may prefer.

    @** app/views/index.scala.html **@
    @(indices: Seq[String], searchForm: Form[SearchParams])

    @import helper._

    @main("Search with ElasticSearch") {

      Search with ElasticSearch

      @form(action = routes.Application.search) {
        Index Name
        Lucene Query
        Filter Query
        Start Row
        Maximum Rows Returned
        Sort Fields
        Output Type
        Fields To Return
        Fields to Highlight
        Fields to Facet
      }
    }

The resulting input form looks like this:

    @** app/views/search.scala.html **@
    @(result: SearchResult)

    @import helper._

    @main("Search with ElasticSearch - HTML results") {

      Search Results

      @result.meta("start") to @result.meta("end") results of @result.meta("numFound") in @result.meta("QTime") ms

      JSON Query: @result.meta("query_json")

      @for(doc <- result.docs) {
        @for((fieldname, fieldvalue) <- doc) {
          @fieldname @fieldvalue
        }
      }
    }

Finally, we come to the part of the application that is not autogenerated by Play2 and which contains all the business logic of the application – the model. Here is the code.

    // app/models/Searcher.scala
    package models

    import scala.Array.canBuildFrom

    import play.api.libs.json.{Json, JsValue}
    import play.api.libs.ws.WS

    case class SearchResult(
      meta: Map[String,Any],
      docs: Seq[Seq[(String,JsValue)]],
      raw: String
    )

    case class SearchParams(
      index: String,
      query: String,
      filter: String,
      start: Int,
      rows: Int,
      sort: String,
      writertype: String,
      fieldlist: String,
      highlightfields: String,
      facetfields: String
    )

    object Searcher {

      def search(server: String, params: SearchParams): SearchResult = {
        val payload = Searcher.buildQuery(params)
        val rawResponse = WS.url(server + params.index +
          "/_search?pretty=true").post(payload).value.get.body
        println("response=" + rawResponse)
        val rsp = Json.parse(rawResponse)
        val meta = (rsp \ "error").asOpt[String] match {
          case Some(x) => Map(
            "error" -> x,
            "status" -> (rsp \ "status").asOpt[Int].get
          )
          case None => Map(
            "QTime" -> (rsp \ "took").asOpt[Int].get,
            "start" -> params.start,
            "end" -> (params.start + params.rows),
            "query_json" -> payload,
            "numFound" -> (rsp \ "hits" \ "total").asOpt[Int].get,
            "maxScore" -> (rsp \ "hits" \ "max_score").asOpt[Float].get
          )
        }
        val docs = if (meta.contains("error")) Seq()
        else {
          val hits = (rsp \ "hits" \ "hits").asOpt[List[JsValue]].get
          val idscores = hits.map(hit => Map(
            "_id" -> (hit \ "_id"),
            "_score" -> (hit \ "_score")))
          val fields = hits.map(hit =>
            (hit \ "_source").asOpt[Map[String,JsValue]].get)
          idscores.zip(fields).
            map(tuple => tuple._1 ++ tuple._2).
            map(doc => doc.toSeq.sortWith((doc1, doc2) => doc1._1 < doc2._1))
        }
        new SearchResult(meta, docs, rawResponse)
      }

      def buildQuery(params: SearchParams): String = {
        val queryQuery = Json.toJson(
          if (params.query.isEmpty || "*:*".equals(params.query))
            Map("match_all" -> Map.empty[String,String])
          else Map("query_string" -> Map("query" -> params.query)))
        val queryFilter = if (params.filter.isEmpty) null
          else Json.toJson(
            Map("query_string" -> Json.toJson(params.filter)))
        val queryFacets = if (params.facetfields.isEmpty) null
          else {
            val fields = params.facetfields.split(",").map(_.trim)
            Json.toJson(fields.zip(fields.
              map(field => Map("terms" -> Map("field" -> field)))).toMap)
          }
        val querySort = if (params.sort.isEmpty) null
          else Json.toJson(params.sort.split(",").map(_.trim).map(field =>
            if (field.toLowerCase.endsWith(" asc") ||
                field.toLowerCase.endsWith(" desc"))
              (field.split(" ")(0), field.split(" ")(1))
            else (field, "")).map(tuple =>
              if (tuple._2.isEmpty) Json.toJson(tuple._1)
              else Json.toJson(Map(tuple._1 -> tuple._2))))
        val queryFields = if (params.fieldlist.isEmpty) null
          else Json.toJson(params.fieldlist.split(",").map(_.trim))
        val queryHighlight = if (params.highlightfields.isEmpty) null
          else {
            val fields = params.highlightfields.split(",").map(_.trim)
            Json.toJson(Map("fields" -> fields.zip(fields.
              map(field => Map.empty[String,String])).toMap))
          }
        Json.stringify(Json.toJson(Map(
          "from" -> Json.toJson(params.start),
          "size" -> Json.toJson(params.rows),
          "query" -> queryQuery,
          "filter" -> queryFilter,
          "facets" -> queryFacets,
          "sort" -> querySort,
          "fields" -> queryFields,
          "highlight" -> queryHighlight).
          filter(tuple => tuple._2 != null)))
      }
    }

The first two are simple case classes: SearchParams and SearchResult play the roles of an FBO (Form Backing Object) and a DTO (Data Transfer Object) respectively from the Spring world. The search() method takes the ES server URL and the filled-in SearchParams object, calls buildQuery() to build the ES Query JSON, then hits the ES server. It then parses the JSON response from ES to create the SearchResult bean, which it passes back to the search Action. The SearchResult object contains a Map of response metadata, a List of Lists of key-value pairs which contain the documents, and the raw JSON response from ES.
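The densest part of buildQuery() is probably the handling of the sort parameter, which turns a comma-separated string such as "date desc, title" into a list where a field with an explicit order becomes a field-to-order pair and a bare field stays as it is. Below is a simplified plain-Scala variant of that parsing step, without the JSON conversion. It is not a copy of the code above: SortSketch and parseSort are hypothetical names, it returns an Option for the order instead of an empty string, and it lowercases the order, which the original does not.

```scala
// Illustrative variant of the sort parsing; SortSketch and parseSort are
// hypothetical names, and the behaviour differs slightly from buildQuery().
object SortSketch {
  private val orders = Set("asc", "desc")

  // Parse "date desc, title" into (field, optional order) pairs.
  def parseSort(sort: String): Seq[(String, Option[String])] =
    sort.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { spec =>
      spec.split("\\s+") match {
        // exactly two tokens and the second is a known order
        case Array(field, order) if orders(order.toLowerCase) =>
          (field, Some(order.toLowerCase))
        // anything else is treated as a bare field name
        case _ => (spec, None)
      }
    }
}
```

Returning an Option makes the two cases explicit at the type level, so the later JSON step can pattern match on Some and None rather than test for an empty string.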

Here are some screenshots of the results for “hedge fund” from our Enron index that we built using the code from the previous post.

The one on the left shows HTML results (and also shows the JSON query that one would need to use to get the results). The one on the right shows the raw JSON results from the ES server.

That’s all I have for this week. Hope you found it interesting.

Update 2011-11-20 – There were some minor bugs caused by the fields parameter being blank. If the fields parameter is blank, the _source JSON field is returned by ES instead of an array of field objects. The fix is to pass in a “*” (all fields) as the default for the fields parameter. The updated code can be found on my GitHub page.
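The updated code itself lives on GitHub and is not reproduced here, but the idea of the fix can be sketched in a few lines of plain Scala: split the comma-separated field list as before, and fall back to a single "*" entry when nothing usable was supplied. FieldListSketch and effectiveFields are names invented for this sketch and may not match the actual fix.

```scala
// One possible shape of the fix; FieldListSketch and effectiveFields are
// hypothetical names, not taken from the updated code on GitHub.
object FieldListSketch {
  // Split a comma-separated field list, defaulting to "*" (all fields)
  // when the input is blank.
  def effectiveFields(fieldlist: String): Seq[String] = {
    val fields = fieldlist.split(",").toSeq.map(_.trim).filter(_.nonEmpty)
    if (fields.isEmpty) Seq("*") else fields
  }
}
```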