[repost ]Indri Parameter Files

original:http://lemur.sourceforge.net/indri/IndriParameters.html

The indri applications, IndriBuildIndex, IndriDaemon, and IndriRunQuery accept parameters from either the command line or from a file. The parameter file uses an XML format. The command line uses dotted path notation. The top level element in the parameters file is named parameters.

Repository construction parameters

memory
an integer value specifying the number of bytes to use for the indexing process. The value can include a scaling factor by adding a suffix. Valid values are (case insensitive) K = 1000, M = 1000000, G = 1000000000. So 100M would be equivalent to 100000000. The value should contain only decimal digits and the optional suffix. Specified as <memory>100M</memory> in the parameter file and as -memory=100M on the command line.
corpus

a complex element containing parameters related to a corpus. This element can be specified multiple times. The parameters are

path
The pathname of the file or directory containing documents to index. Specified as <corpus><path>/path/to/file_or_directory</path></corpus> in the parameter file and as -corpus.path=/path/to/file_or_directory on the command line.
class
The FileClassEnvironment of the file or directory containing documents to index. Specified as <corpus><class>trecweb</class></corpus> in the parameter file and as -corpus.class=trecweb on the command line. The known classes are:

  • html — web page data.
  • xml — xml marked up data.
  • trecweb — TREC web format, eg terabyte track.
  • trectext — TREC format, eg TREC-3 onward.
  • trecalt — TREC format, eg TREC-3 onward, with only the TEXT field included.
  • warc — WARC (Web ARChive) format, such as can be output by the heritrix webcrawler.
  • warcchar — WARC (Web ARChive) format, such as can be output by the heritrix webcrawler. Tokenizes individual characters, enabling indexing of unsegmented text.
  • doc — Microsoft Word format (windows platform only).
  • ppt — Microsoft Powerpoint format (windows platform only).
  • pdf — Adobe PDF format.
  • txt — Plain text format.
annotations
The pathname of the file containing offset annotations for the documents specified in path. Specified as <corpus><annotations>/path/to/file</annotations></corpus> in the parameter file and as -corpus.annotations=/path/to/file on the command line.
metadata

The pathname of the file or directory containing offset metadata for the documents specified in path. Specified as <corpus><metadata>/path/to/file</metadata></corpus> in the parameter file and as -corpus.metadata=/path/to/file on the command line.

Combining the first two of these elements, the parameter file would contain:
<corpus>
<path>/path/to/file_or_directory</path>
<class>trecweb</class>
</corpus>

metadata

a complex element containing one or more entries specifying the metadata fields to index, eg title, headline. There are three options

  1. field — Make the named field available for retrieval as metadata. Specified as <metadata><field>fieldname</field></metadata> in the parameter file and as -metadata.field=fieldname on the command line.

  2. forward — Make the named field available for retrieval as metadata and build a lookup table to make retrieving the value more efficient. Specified as <metadata><forward>fieldname</forward></metadata> in the parameter file and as -metadata.forward=fieldname on the command line. The external document id field “docno” is automatically added as a forward metadata field.

  3. backward — Make the named field available for retrieval as metadata and build a lookup table for inverse lookup of documents based on the value of the field. Specified as <metadata><backward>fieldname</backward></metadata> in the parameter file and as -metadata.backward=fieldname on the command line. The external document id field “docno” is automatically added as a backward metadata field.
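
Combining these, a metadata block in the parameter file might look like the following sketch (the field names are illustrative):

<metadata>
  <field>title</field>
  <forward>url</forward>
  <backward>headline</backward>
</metadata>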

field

a complex element specifying the fields to index as data, eg TITLE. This parameter can appear multiple times in a parameter file. If provided on the command line, only the first field specified will be indexed. The subelements are:

name
the field name, specified as <field><name>fieldname</name></field> in the parameter file and as -field.name=fieldname on the command line.
numeric
the symbol true if the field contains numeric data, otherwise the symbol false, specified as <field><numeric>true</numeric></field> in the parameter file and as -field.numeric=true on the command line. This is an optional parameter, defaulting to false. Note that 0 can be used for false and 1 can be used for true.
parserName
the name of the parser to use to convert a numeric field to an unsigned integer value. The default is NumericFieldAnnotator. If numeric field data is provided via offset annotations, you should use the value OffsetAnnotationAnnotator. If the field contains a formatted date (see Date Fields) you should use the value DateFieldAnnotator.
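For example, a plain text field and a numeric date field could be declared together in the parameter file (the names are illustrative):

<field>
  <name>title</name>
</field>
<field>
  <name>date</name>
  <numeric>true</numeric>
  <parserName>DateFieldAnnotator</parserName>
</field>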
stemmer

a complex element specifying the stemming algorithm to use in the subelement name. Valid options are:

  • porter — Porter stemmer
  • krovetz — Krovetz stemmer
  • arabic_stop — Larkey stemmer, remove stopwords
  • arabic_norm2 — Larkey stemmer, table normalization
  • arabic_norm2_stop — Larkey stemmer, table normalization with stopping
  • arabic_light10 — Larkey stemmer, light9 plus ll prefix
  • arabic_light10_stop — Larkey stemmer, light10 and remove stop words

Specified as <stemmer><name>stemmername</name></stemmer> and as -stemmer.name=stemmername on the command line. This is an optional parameter with the default of no stemming.

normalize
true to perform case normalization when indexing, false to index with mixed case. The default is true.
stopper
a complex element containing one or more subelements named word, specifying the stopword list to use. Specified as <stopper><word>stopword</word></stopper> and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.
offsetannotationhint
An optional parameter providing a hint to the indexer to speed up indexing of offset annotations when using offset annotation files as specified in the <corpus> parameter. Valid values are “unordered” and “ordered”. An “unordered” hint (the default) tells the indexer that the document IDs in the annotations file are not necessarily in the same order as the documents in the corpus; the indexer will adjust its internal memory allocations and pre-allocate enough memory before reading in the annotations file. If you are absolutely certain that the annotations in the offset annotation file are in exactly the same order as the documents, you can use the “ordered” hint, which tells the indexer not to read the entire file at once, but to read the offset annotations file incrementally, fetching only the annotations for the document currently being indexed.
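
Declared like the other top-level indexing parameters, e.g.:

<offsetannotationhint>ordered</offsetannotationhint>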

QueryEnvironment Parameters

Retrieval Parameters

index
path to an Indri Repository. Specified as <index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line. This element can be specified multiple times to combine Repositories.
server
hostname of a host running an Indri server (IndriDaemon). Specified as <server>hostname</server> in the parameter file and as -server=hostname on the command line. The hostname can include an optional port number to connect to, using the form hostname:portnum. This element can be specified multiple times to combine servers.
count
an integer value specifying the maximum number of results to return for a given query. Specified as <count>number</count> in the parameter file and as -count=number on the command line.
query

An indri query language query to run. This element can be specified multiple times. The query element may take numerous optional parameters. With none of the optional parameters, the query text can be the body of the element, eg:

<query>combine(query terms)</query>

The optional parameters are:

type
one of indri, to use the indri query language, or nexi to use the nexi query language. The default is indri. This element may appear 0 or 1 times.
number
The query number or identifier. This may be a non-numeric symbol. The default is to number the queries in the parameters in order, starting with 0. This element may appear 0 or 1 times.
text
The query text, eg, “#combine(query terms)”. This element may appear 0 or 1 times and must be used if any of the other parameters are supplied.
workingSetDocno
The external document id of a document to add to the working set for the query. This element may appear 0 or more times. When specified, query evaluation is restricted to the document ids specified.
feedbackDocno
The external document id of a document to add to the relevance feedback set for the query. This element may appear 0 or more times. When specified, query expansion is performed using only the document ids specified. It is still necessary to specify a non-zero value for the fbDocs parameter when specifying feedbackDocno elements.
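
Putting the subelements together, a fully specified query entry might look like this sketch (the number, text, and working-set ids are illustrative):

<query>
  <number>101</number>
  <text>#combine(query terms)</text>
  <workingSetDocno>WTX010-B01-1</workingSetDocno>
  <workingSetDocno>WTX010-B02-17</workingSetDocno>
</query>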
rule

specifies the smoothing rule (TermScoreFunction) to apply. Format of the rule is:

( key ":" value ) [ "," key ":" value ]*

Here’s an example rule in command line format:

-rule=method:linear,collectionLambda:0.2,field:title

and in parameter file format:
<rule>method:linear,collectionLambda:0.2,field:title</rule>

This corresponds to Jelinek-Mercer smoothing with background lambda equal to 0.2, only for items in a title field.

If nothing is listed for a key, all values are assumed. So, a rule that does not specify a field matches all fields. This makes -rule=method:linear,collectionLambda:0.2 a valid rule.

Valid keys:

method
smoothing method (text)
field
field to apply this rule to
operator
type of item in query to apply to { term, window }

Valid methods:

dirichlet
(also ‘d’, ‘dir’) (default mu=2500)
jelinek-mercer
(also ‘jm’, ‘linear’) (default collectionLambda=0.4, documentLambda=0.0), collectionLambda is also known as just “lambda”, either will work
twostage
(also ‘two-stage’, ‘two’) (default mu=2500, lambda=0.4)

If the rule doesn’t parse correctly, the default is Dirichlet, mu=2500.

stopper
a complex element containing one or more subelements named word, specifying the stopword list to use. Specified as <stopper><word>stopword</word></stopper> and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.
maxWildcardTerms
(optional) An integer specifying the maximum number of wildcard terms that can be generated for a synonym list for this query or set of queries. If this limit is reached for a wildcard term, an exception will be thrown. If this parameter is not specified, a default of 100 will be used.
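
Putting these retrieval parameters together, a minimal IndriRunQuery parameter file might look like this sketch (the paths and values are illustrative):

<parameters>
  <index>/path/to/repository</index>
  <count>1000</count>
  <rule>method:dirichlet,mu:2500</rule>
  <query>
    <number>1</number>
    <text>#combine(query terms)</text>
  </query>
</parameters>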

Baseline (non-LM) retrieval

baseline

Specifies the baseline (non-language modeling) retrieval method to apply. This enables running baseline experiments on collections too large for the Lemur RetMethod API. When running a baseline experiment, the queries may not contain any indri query language operators; they must contain only terms.

Format of the parameter value:

(tfidf|okapi) [ "," key ":" value ]*

Here’s an example rule in command line format:

-baseline=tfidf,k1:1.0,b:0.3

and in parameter file format:
<baseline>tfidf,k1:1.0,b:0.3</baseline>

Methods:

tfidf

Performs retrieval via tf.idf scoring as implemented in lemur::retrieval::TFIDFRetMethod using BM25TF term weighting. Pseudo-relevance feedback may be performed via the parameters below.

Parameters (optional):

k1
k1 parameter for term weight (default 1.2)
b
b parameter for term weight (default 0.75)
okapi

Performs retrieval via Okapi scoring as implemented in lemur::retrieval::OkapiRetMethod. Pseudo-relevance feedback may not be performed with this baseline method.

Parameters (optional):

k1
k1 parameter for term weight (default 1.2)
b
b parameter for term weight (default 0.75)
k3
k3 parameter for query term weight (default 7)

Formatting Parameters

queryOffset
an integer value specifying one less than the starting query number, eg 150 for TREC formatted output. Specified as <queryOffset>number</queryOffset> in the parameter file and as -queryOffset=number on the command line.
runID
a string specifying the id for a query run, used in TREC scorable output. Specified as <runID>someID</runID> in the parameter file and as -runID=someID on the command line.
trecFormat
the symbol true to produce TREC scorable output, otherwise the symbol false. Specified as <trecFormat>true</trecFormat> in the parameter file and as -trecFormat=true on the command line. Note that 0 can be used for false, and 1 can be used for true.

Pseudo-Relevance Feedback Parameters

fbDocs
an integer specifying the number of documents to use for feedback. Specified as <fbDocs>number</fbDocs> in the parameter file and as -fbDocs=number on the command line.
fbTerms
an integer specifying the number of terms to use for feedback. Specified as <fbTerms>number</fbTerms> in the parameter file and as -fbTerms=number on the command line.
fbMu
a floating point value specifying the value of mu to use for feedback. Specified as <fbMu>number</fbMu> in the parameter file and as -fbMu=number on the command line.
fbOrigWeight
a floating point value in the range [0.0..1.0] specifying the weight for the original query in the expanded query. Specified as <fbOrigWeight>number</fbOrigWeight> in the parameter file and as -fbOrigWeight=number on the command line.
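
For example, a typical feedback configuration in the parameter file (the values are illustrative, not recommendations):

<fbDocs>10</fbDocs>
<fbTerms>50</fbTerms>
<fbMu>0</fbMu>
<fbOrigWeight>0.5</fbOrigWeight>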

IndriDaemon Parameters

index
path to the Indri Repository to act as server for. Specified as <index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line.
port
an integer value specifying the port number to use. Specified as <port>number</port> in the parameter file and as -port=number on the command line.
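
A minimal IndriDaemon parameter file combining the two (the port value is illustrative):

<parameters>
  <index>/path/to/repository</index>
  <port>9001</port>
</parameters>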

[repost ]Indri Repository Structure

original:http://sourceforge.net/p/lemur/wiki/Indri%20Repository%20Structure/

Overview

An Indri Repository is a set of files in a specified format that together contain all the relevant information about a collection: the indexed documents, any fields or metadata, the inverted indexes, and other necessary items. It is important to note that when you open an Indri index, there is no single file to point to; rather, you should use the root of the collection's folder structure.

Technical Details

While building an Indri index, the indexer will build indexes in memory before writing them out to disk. This increases the speed of indexing, and allows the indexer to flush only when ready (or necessary) from a separate thread. As collections are indexed, the indexer will keep multiple indexes in memory which act as one repository.

The indexer will automatically merge in-memory indexes when the soft-limit for memory is reached. Indri will typically also merge indexes after a few very big documents or a lot of very small documents. For example, if you are using a gigabyte of memory, I would guess that Indri would write to disk after about 100,000 documents.

When merging two small indexes, Indri always chooses to merge the most recent index with the one before it. In many cases, though, Indri will choose to merge many indexes together at once (as many as 50). The last index is always included.

The detailed explanation of the Indri repository structure and index build can be found in the paper Dynamic Collections in Indri (PDF format) by Trevor Strohman.

For details of the index building and merging operations, see Low Latency Index Maintenance in Indri (PDF format), also by Trevor Strohman.

Disk Structure

On disk, an Indri collection is made up of several files:

  • Frequent Vocabulary Files:
    • for any term that appears more than 1000 times in a corpus
    • File: “frequentID” – a BulkTree structure (essentially a B-Tree) mapping from termID to a term string. The value entries also store the start offset in the inverted list file and the length of the entry in the inverted list file.
    • File: “frequentString” – a BulkTree structure mapping from term string to a termID. The value entries also store the start offset in the inverted list file and the length of the entry in the inverted list file.
    • File: “frequentTerms” – a list (not a tree) of <termID, term string> tuples – used only at index merge time while building a collection.
  • Infrequent Vocabulary Files:
    • File: “infrequentID” – a BulkTree structure mapping from termID to a term string. The value entries also store the start offset in the inverted list file and the length of the entry in the inverted list file.
    • File: “infrequentString” – a BulkTree structure mapping from term string to a termID. The value entries also store the start offset in the inverted list file and the length of the entry in the inverted list file.
  • Inverted Lists
    • File: “invertedFile” – the inverted lists for all terms in the collection. This file consists of (for each term):
      • corpus statistics such as the document frequency and corpus frequency, plus per-field statistics <document frequency, corpus frequency>
      • the maxDocumentLength and minDocumentLength of the documents the term occurs in
      • the actual term string
      • the top document ID list (the top 1% of documents by document frequency of the term)
      • the actual inverted list (in RVL compressed format; see the decoding sketch after this list) consisting of:
        • docID (delta-encoded from the previous docID)
        • size of position data
        • the actual position data – 1 integer per position in this document (also delta-encoded from the previous position)
  • Field Information File – The inverted lists file for all fields in the collection. This file consists of (for each field):
    • like the regular inverted lists, RVL Compression used and for each entry:
      • the docID (delta encoded from last doc ID)
      • number of extents in the document and for each extent:
        • extent begin (delta-encoded from last begin)
        • extent end (delta-encoded from last end)
        • extent ordinal
        • numeric value (if applicable)
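
To make the delta-plus-RVL scheme concrete, here is a minimal Java sketch of decoding a variable-byte compressed, delta-encoded integer list. It is illustrative only: the convention shown (the high bit marks the final byte of a value) is one common RVL variant, and Indri's exact byte layout may differ.

import java.util.ArrayList;
import java.util.List;

public class RvlSketch {
    // Decode one variable-byte integer: the low 7 bits of each byte carry data;
    // a set high bit marks the final byte of the value.
    static int decodeOne(byte[] buf, int[] pos) {
        int value = 0, shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) != 0) return value;   // terminator bit set
        }
    }

    // Decode a delta-encoded list (e.g., docIDs or positions) back to absolute values.
    static List<Integer> decodeDeltas(byte[] buf, int count) {
        int[] pos = {0};
        int current = 0;
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            current += decodeOne(buf, pos);      // each stored value is a gap
            out.add(current);
        }
        return out;
    }
}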

[repost ]Indri Document Scoring

original:http://sourceforge.net/p/lemur/wiki/Indri%20Document%20Scoring/

Indri uses the language modeling approach to information retrieval. Language modeling assigns a probability value to each document, meaning that every score is a value between 0 and 1. For computational accuracy reasons, Indri returns the log of the actual probability value. log(0) equals negative infinity, and log(1) equals zero, so Indri document scores are always negative.

Without diving into a lot of math, it’s probably best to assume that these values are not comparable across queries. In particular, you’ll probably notice that as you add words to a query, the average document score tends to drop, even though the system probably gets better at finding good documents.

By default, Indri uses a query likelihood function with Dirichlet prior smoothing to weight terms. The formulation is given by:

c(w;D) = count of the word in the document

c(w;C) = count of the word in the collection

|D| = number of words in the document

|C| = number of words in the collection

numerator = c(w;D) + mu * c(w;C) / |C|

denominator = |D| + mu

score = log( numerator / denominator )

By default, mu is equal to 2500, which means that for the very small documents you’re using, the score differences will be very small.
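
As a sanity check, the per-term score is easy to compute directly; a minimal Java sketch of the formula above (the counts in main are made up for illustration):

public class DirichletScore {
    // Log query-likelihood score of one term under Dirichlet prior smoothing.
    // cwd: count of the word in the document, cwc: count in the collection,
    // docLen: |D|, colLen: |C|, mu: smoothing parameter (Indri default 2500).
    static double score(double cwd, double cwc, double docLen, double colLen, double mu) {
        double numerator = cwd + mu * cwc / colLen;
        double denominator = docLen + mu;
        return Math.log(numerator / denominator);
    }

    public static void main(String[] args) {
        // A term occurring twice in a 100-word document, 500 times in a 10M-word collection:
        System.out.println(score(2, 500, 100, 10_000_000, 2500)); // negative, as expected
    }
}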

More information can be found on [Indri Retrieval Model].


Related

Wiki: Home
Wiki: Indri Retrieval Model
Wiki: Scored Query Evaluation
Wiki: Technical Details

[repost ]Looking Ahead to the CDI 2.0 Specification

original:http://wildfly.iteye.com/blog/2029429

The author, Antoine Sabot-Durand, is a Red Hat software engineer and a member of the CDI specification expert group. This article presents his ideas for the future CDI 2.0 specification, reflects real thinking about how components fundamentally work, and is well worth reading.

[Translator's note] The original appears to have been written in one sitting, so it contains some slips and unclear passages. My translation was also done in a hurry and is bound to contain mistakes; please bear with me.

====== Translation follows =========

CDI is probably one of the most overlooked specifications in Java EE. When version 1.0 was released four years ago, CDI was presented as "the extension point for Java EE". Technically that is true, but not all Java EE JSRs fully adopt CDI to provide a consistent programming experience. Today the IT world moves fast: after many years of solutions built on huge memory and many cores, we are back to the days of optimizing resource usage, developing for mobile or embedded platforms. Since CDI can be used outside Java EE, it can play an interesting role in these new scenarios. But before that, it should keep evolving and get ready for this new challenge (while continuing to become more and more important within Java EE). In this article I will try to share my thoughts on how CDI should evolve to meet future needs. I will discuss new features and a more modular architecture that would let CDI scale easily, from a small Raspberry Pi device to a large cluster solution.

Disclaimer: since I will be leading the CDI 2.0 specification (together with Pete Muir), I must stress that this article contains only my personal opinions and does not represent official CDI 2.0 content, nor any position of Red Hat or the CDI expert group.

New features I would like to see

Some of these features have already been discussed by the expert group; some that did not make it into 1.1 were postponed to 2.0. Some are standardizations of third-party work (mainly Apache DeltaSpike), and some are brand new. Either way, most of these ideas matter and could help third-party projects or JSRs shape the best possible CDI specification.

Bootstrapping in Java SE

Today every CDI implementation provides a way to boot CDI in Java SE or in a simple server environment (e.g., Servlet-only), and Apache DeltaSpike has a generic solution for this. It is now time to consider adding a standard Java SE bootstrap to the CDI specification.

Container hot swapping

Strongly typed injection is very useful, but I think CDI is too strict about it. To support dynamic JVM languages, we should be able to trigger a restart of the BeanManager to load additional beans, for instance an API to start a new CDI container with augmented content. If that boot goes well, further steps would copy state and existing instances into the new container and leave the old container to be garbage collected; if something goes wrong, we keep the current BeanManager. It would be an expensive but very useful feature, especially for advanced tooling or specific kinds of applications (CMS, e-commerce platforms, and so on).

Lightweight CDI

As mentioned above, people would like CDI to run on embedded systems such as the Raspberry Pi, Arduino, Lego Mindstorms, or Android. Today the heavy use of proxies blocks this direction.

Java proxies were very useful back in 2000, but the technology is old and brings many problems (huge stack traces, JVM optimization issues, high resource consumption, etc.), and the CDI specification currently takes for granted that implementations use proxies.

Given that, a good option would be to provide a subset of the CDI specification with fewer constraints and lighter weight. We could call it "CDI Lite" (like EJB Lite in EE 6). This subset would include everything related to DI but would probably drop contexts, interceptors, decorators, and so on.

There has also been discussion about making CDI lighter without removing any features.

Goodbye proxies, hello annotation processing and InvokeDynamic

Another approach is to find a replacement for proxies in CDI implementations. Right now we have two candidates:

  • Annotation processing: process annotations at compile time and statically generate the "magic" injection/decoration code, similar to Dagger.
  • InvokeDynamic: link at runtime, providing proxy-like behavior with fewer drawbacks and probably better performance. For this second approach I have started some research and hope to provide a proof of concept soon.

XML configuration file

It is time to provide this feature. Existing solutions use the extension point to provide it, but this should be core functionality: it can be used at deployment time to define how beans are registered and how annotations are overridden, so it should be defined in the specification. If CDI becomes more and more applicable to other specifications, this XML definition could become the entry point for a common Java EE configuration file format!

Today, some frameworks such as Apache Camel cannot adopt CDI precisely because there is no configuration story, so providing this capability would broaden CDI's reach.

Asynchronous messages and events

This is the era of Vert.x and Node.js style asynchronous processing (think: callbacks). This could be specified in terms of the concurrency utilities. We would provide a way to support asynchronous invocation without resorting to EJB asynchronous calls, and asynchronous events, for instance by adding an asynchronous boolean member to @Observes and providing options to handle callbacks.

Support for @Startup

A simple addition: CDI beans that are automatically instantiated right after the initialization phase, much like EJB offers today.

This is a common developer request, and it would not be hard to do.

Promoting the portable extension API and simplifying its use

I think the portable extension API is the best feature in CDI. Of course IoC, events, and context management are all great, but they were not new concepts when CDI 1.0 introduced them.

The portable extension API, on the other hand, is a genuine CDI innovation: it adds to the Java EE DNA a natural way to extend the platform without resorting to proprietary tricks.

I find it a pity that this is not promoted more (the CDI acronym does not reflect the idea either) and that the material sits at the end of the specification text, while many projects and specifications could benefit from it.

My analysis is that poor communication around a number of complex concepts makes it hard to get into extension development. We could provide a higher-level simplification layer to make writing extensions easier to pick up. Don't get me wrong: the existing mechanism is good and should stay (perhaps enhanced), but we could provide helpers that make creating extensions easier. For example:

  1. Standardize DeltaSpike's AnnotationTypeBuilder and BeanBuilder to make creating beans easier.
  2. Introspection utilities for types and annotations.
  3. Simpler ways to create new scopes and extend existing ones. We encourage other specifications to reuse the lifecycle of existing scopes (such as @RequestScoped), but today this can only be done at the implementation level.

We should also pay special attention to the first events of the initialization process (up to AfterTypeDiscovery), because CDI does not yet know how to deal with modifications of type and annotation metadata. These could become part of a future Java EE configuration system.

Ordered event delivery

CDI 1.1 uses @Priority to define the ordering of decorators and interceptors. Could it also be used to order events? Putting @Priority on @Observes is not a good idea, since that annotation comes from the interceptor package, but we could add a priority member to @Observes.

No more second-class treatment for producers and custom beans

Why are producers and custom beans second-class citizens in CDI? I would like to be able to apply decorators or interceptors to a producer, or at least have an API that lets me do so.

Event scopes, from the package up to the whole server

Let the CDI event bus operate at higher levels of the Java EE world, so that you can choose an event's propagation scope: the current application, the current module (EAR), the current package, or every application listening for the event.

Transient injection

When a dependent-scoped bean is injected into a long-lived bean, the injection is performed once, when the latter is initialized. If I want the dependency re-initialized and re-injected on every access (a situation I ran into in the Agorava project), today I have to write:

@Inject Instance<MyBean> myBeanInstances;
public MyBean getMyBean() { return myBeanInstances.get(); }

I would like to be able to write instead:

@Inject @Transient MyBean myBean;

It is only syntactic sugar, but this style is easy to write and easy to read. We could look for more examples of code that could be simplified this way.

A friendlier fluent query syntax

Instance<T> and programmatic lookup are very useful, but sometimes very verbose, especially when dealing with qualifiers.

This could be simplified by providing helpers and by using Java 8 type annotations to generate qualifiers; why not a query DSL?

myBeanInstance.restrictedTo(BeanImp.class).withQualifier(new @MyQualifier("Binding") AnnotationLiteral<>(), new @MyOtherQualifier AnnotationLiteral<>()).select();

Doesn't that look friendlier?

Monitoring facilities

Remember the excellent debug page in Seam 2? I would like the same thing, or tooling that makes it easy to build the same functionality, to monitor beans and active scopes. CDI is magic, and it deserves good tooling to inspect everything: the beans and their cost, the contexts, and the interceptors we deploy.

Better modularity: a new architecture for CDI (and Java EE)?

Many JSRs complain that the CDI specification is monolithic and that, compared to them, its implementations are huge (they do not want to depend on something bigger than themselves). This, together with the lack of Java SE bootstrap support, is probably one of the two main reasons CDI integration has not gone further. We could therefore offer a more modular approach, a collection of modules usable as the same code stack inside and outside Java EE. My ideal set of JSRs/modules would be:

Container

The container module stores all the beans used by an application. As a standalone module it would provide:

  • a minimal API/implementation allowing client programs to obtain beans through JNDI
  • a mechanism for adding plugins to the container to support new kinds of components (Servlets, JPA entities, Guice or Spring beans)
  • the groundwork for a future universal Java EE container

Event bus

Events and the observer pattern are among the most powerful features of the CDI specification, but they could equally be used outside CDI.

We could imagine a new specification, or a CDI module, based on the CDI event bus, providing a broader event model for Java EE. An API would only need to depend on half a dozen classes (a few more if we add asynchronous processing, ordering, and event scopes) to do the job.

Component scanning and extension engine

Today every specification scans classes at startup, and application servers generally do this scanning in proprietary ways. The process could be standardized through scan-phase events and metadata operations, giving a consistent experience and a standard way to extend Java EE. CDI already provides most of this functionality with its initialization mechanism, which allows "observing" the deployment of existing classes and modifying their metadata (i.e., annotations).

What if the ProcessAnnotatedType event could be captured at the server level and allow a "veto" on a given servlet or a set of JPA entities? That would let us have a single container and a single configuration file, something many developers dream of.

Basic DI

This module would include all the simple injection APIs: @Inject, @Qualifier, Instance<>, @Produces, InjectionPoint, and the reflection-related types; in other words, the "CDI Lite" I mentioned earlier.

Context management

Contexts are a great CDI feature, but not everybody needs them; they should live in an optional API package that handles all the normal scoped contexts and the complex lifecycle management.

Interceptors and decorators

Interceptors now have their own JSR; adding decorators to it would make it complete.

Conclusion: CDI needs you!

The above is my personal wish list for CDI; you may have your own. I do not know whether these are good ideas or whether they are all feasible; we need help to build the future CDI 2.0. If you are willing, follow the official CDI website, the @cdispec Twitter account (or mine), and this blog, and send us feedback through the CDI mailing list or the CDI IRC channel (jsr346 on freenode). The coming months will decide the future of CDI (and Java EE).

[repost ]CDI Dependency Injection – An Introductory Tutorial Part 1 – Java EE

original:http://java.dzone.com/articles/cdi-di-p1?page=0,5

You can have more than one member of the qualifier annotation as follows:

Code Listing: Transport qualifier annotation with more than one member

package org.cdi.advocacy;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.*;
import static java.lang.annotation.RetentionPolicy.*;

import javax.inject.Qualifier;


@Qualifier @Retention(RUNTIME) @Target({TYPE, METHOD, FIELD, PARAMETER})
public @interface Transport {
    TransportType type() default TransportType.STANDARD;
    int priorityLevel() default -1;
}

Now CDI is going to use both of the members to discriminate for injection.

If we had a transport like so:

Code Listing: AutomatedTellerMachineImpl using two qualifier members to discriminate

public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    @Inject @Transport(type=TransportType.STANDARD, priorityLevel=1)
    private ATMTransport transport;

Then we get this:

Output

deposit called
communicating with bank via the Super Fast transport

You can match using any type supported by annotations, e.g., Strings, classes, enums, ints, etc.

Exercise: Add a member String to the qualifier annotation. Change the injection point to discriminate using this new string member. Why do you think this is counter to what CDI stands for? Send me your solution on the CDI group mailing list. The first one to send gets put on the CDI wall of fame. (All others get honorable mentions.)

 

Conclusion

 

Dependency Injection (DI) refers to the process of supplying an external dependency to a software component.

CDI is the Java standard for dependency injection and interception (AOP). It is evident from the popularity of DI and AOP that Java needs to address DI and AOP so that it can build other standards on top of it. DI and AOP are the foundation of many Java frameworks. I hope you share my vision of CDI as a basis for other JSRs, Java frameworks and standards.

This article discussed CDI dependency injection in a tutorial format. It covered some of the features of CDI, such as type-safe annotation configuration and alternatives, at both an introductory and an advanced level.

CDI is a foundational aspect of Java EE 6. It is, or will shortly be, supported by Caucho's Resin, IBM's WebSphere, Oracle's Glassfish, Red Hat's JBoss and many more application servers. CDI is similar to the core Spring and Guice frameworks. However, CDI is a general purpose framework that can be used outside of JEE 6.

CDI is a rethink on how to do dependency injection and AOP (interception really). It simplifies it. It reduces it. It gets rid of legacy, outdated ideas.

CDI is to Spring and Guice what JPA is to Hibernate and TopLink. CDI will co-exist with Spring and Guice. There are plugins to make them interoperate nicely, and more integration options are on the way.

This is just a brief taste. There is more to come.

Resources

 

 

About the Author

This article was written with CDI advocacy in mind by Rick Hightower with some collaboration from others. Rick Hightower has worked as a CTO, Director of Development and a Developer for the last 20 years. He has been involved with J2EE since its inception. He worked at an EJB container company in 1999. He has been working with Java since 1996, and writing code professionally since 1990. Rick was an early Spring enthusiast. Rick enjoys bouncing back and forth between C, Python, Groovy and Java development. Although not a fan of EJB 3, Rick is a big fan of the potential of CDI and thinks that EJB 3.1 has come a lot closer to the mark.

Rick Hightower is CTO of Mammatus and is an expert on Java and Cloud Computing. Rick is involved in Java CDI advocacy and Java EE. CDI implementations: Resin CanDI, Seam Weld, Apache OpenWebBeans.


By default, CDI would look for a class that implements the ATMTransport interface; once it finds one, it creates an instance and injects it using the setter method setTransport. If we only had one possible instance of ATMTransport on our classpath, we would not need to annotate any of the ATMTransport implementations. Since we have three, namely StandardAtmTransport, SoapAtmTransport, and JsonAtmTransport, we need to mark two of them as @Alternative and one as @Default.

Step 3: Use the @Default annotation to annotate the StandardAtmTransport

At this stage of the example, we would like our default transport to be StandardAtmTransport; thus, we mark it as @Default as follows:

Code Listing: StandardAtmTransport using @Default

package org.cdi.advocacy;

import javax.enterprise.inject.Default;

@Default
public class StandardAtmTransport implements ATMTransport {
...

It should be noted that a class is @Default by default, so marking it explicitly is redundant (redundantly redundant, in fact).

Step 4: Use the @Alternative to annotate the SoapAtmTransport, and JsonRestAtmTransport.

If we don’t mark the others as @Alternative, then as far as CDI is concerned they are marked @Default too. Let’s mark JsonRestAtmTransport and SoapAtmTransport as @Alternative so CDI does not get confused.

Code Listing: JsonRestAtmTransport using @Alternative

package org.cdi.advocacy;

import javax.enterprise.inject.Alternative;

@Alternative
public class JsonRestAtmTransport implements ATMTransport {

...
}

Code Listing: SoapAtmTransport using @Alternative

package org.cdi.advocacy;

import javax.enterprise.inject.Alternative;

@Alternative
public class SoapAtmTransport implements ATMTransport {
...
}

Step 5: Use the @Named annotation to make the AutomatedTellerMachineImpl easy to look up; give it the name “atm”

Since we are not using AutomatedTellerMachineImpl from a Java EE 6 application, let’s just use the beanContainer to look it up, and give it an easy logical name like “atm”. To give it a name, use the @Named annotation. The @Named annotation is also used by JEE 6 applications to make a bean accessible via the Unified EL (EL stands for Expression Language, and it is used by JSPs and JSF components).

Here is an example of using @Named to give the AutomatedTellerMachineImpl the name “atm”, as follows:

Code Listing: AutomatedTellerMachineImpl using @Named

package org.cdi.advocacy;

import java.math.BigDecimal;

import javax.inject.Inject;
import javax.inject.Named;

@Named("atm")
public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {
...

}

It should be noted that if you use the @Named annotation without providing a name, then the name defaults to the class name with its first letter lower-cased, so this:

@Named
public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {
...

}

makes the name automatedTellerMachineImpl.

Step 6: Use the CDI beanContainer to look up the atm, and make some deposits and withdrawals.

Lastly we want to look up the atm using the beanContainer and make some deposits.

Code Listing: AtmMain looking up the atm by name

package org.cdi.advocacy;

...

public class AtmMain {

...
...

    public static void main(String[] args) throws Exception {
        AutomatedTellerMachine atm = (AutomatedTellerMachine) beanContainer
                .getBeanByName("atm");

        atm.deposit(new BigDecimal("1.00"));

    }

}

When you run it from the command line, you should get the following:

Output

deposit called
communicating with bank via Standard transport

You can also look the atm up by type and an optional list of annotations; the name mainly exists to support the Unified EL (JSPs, JSF, etc.).

Code Listing: AtmMain looking up the atm by type

package org.cdi.advocacy;

...

public class AtmMain {

...
...

    public static void main(String[] args) throws Exception {
        AutomatedTellerMachine atm = beanContainer.getBeanByType(AutomatedTellerMachine.class);
        atm.deposit(new BigDecimal("1.00"));
    }

}

Since a big part of CDI is its type safe injection, looking up things by name is probably discouraged. Notice we have one less cast due to Java Generics.

If you remove the @Default from StandardAtmTransport, you will get the same output. If you remove the @Alternative from both of the other transports, namely JsonRestAtmTransport and SoapAtmTransport, CDI will croak as follows:

Output

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: javax.enterprise.inject.AmbiguousResolutionException: org.cdi.advocacy.AutomatedTellerMachineImpl.setTransport:
Too many beans match, because they all have equal precedence.
See the @Stereotype and <enable> tags to choose a precedence.  Beans:
ManagedBeanImpl[JsonRestAtmTransport, {@Default(), @Any()}]
ManagedBeanImpl[SoapAtmTransport, {@Default(), @Any()}]
ManagedBeanImpl[StandardAtmTransport, {@javax.enterprise.inject.Default(), @Any()}]
...

CDI expects to find one and only one qualified injection. Later we will discuss how to use an alternative.

 

Using @Inject to inject via constructor args and fields

 

You can inject into fields, constructor arguments and setter methods (or any method, really).

Here is an example of field injections:

Code Listing: AutomatedTellerMachineImpl.transport using @Inject to do field injection.

...
public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    @Inject
    private ATMTransport transport;

Code Listing: AutomatedTellerMachineImpl.transport using @Inject to do constructor injection.

...
public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    @Inject
    public AutomatedTellerMachineImpl(ATMTransport transport) {
        this.transport = transport;
    }

This flexibility allows you to create classes that are easy to unit test.
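
For instance, a plain JUnit test can construct the bean directly with a hand-made dependency, no container required. This is only a sketch; the StubAtmTransport class is invented here for illustration:

// A stub transport invented for this test sketch.
class StubAtmTransport implements ATMTransport {
    boolean called = false;
    public void communicateWithBank(byte[] datapacket) {
        called = true;
    }
}

public class AutomatedTellerMachineImplTest {
    @org.junit.Test
    public void depositTalksToBank() {
        StubAtmTransport stub = new StubAtmTransport();
        // Constructor injection lets the test supply the dependency by hand.
        AutomatedTellerMachine atm = new AutomatedTellerMachineImpl(stub);
        atm.deposit(new java.math.BigDecimal("1.00"));
        org.junit.Assert.assertTrue(stub.called);
    }
}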

Using simple @Produces

 

There are times when the creation of an object may be complex. Instead of relying on a constructor, you can delegate to a factory class to create the instance. To do this with CDI, use the @Produces annotation in your factory class as follows:

Code Listing: TransportFactory.createTransport using @Produces to define a factory method

package org.cdi.advocacy;

import javax.enterprise.inject.Produces;

public class TransportFactory {

    @Produces ATMTransport createTransport() {
        System.out.println("ATMTransport created with producer");
        return new StandardAtmTransport();
    }

}

The factory method could use qualifiers just like a class declaration can; in this example, we chose not to (a qualified-producer sketch appears after the output below). The AutomatedTellerMachineImpl does not need to specify any special qualifiers. Here is the AutomatedTellerMachineImpl that receives the simple producer.

Code Listing: AutomatedTellerMachineImpl receives the simple producer

import javax.inject.Inject;
import javax.inject.Named;

@Named("atm")
public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    @Inject
    private ATMTransport transport;
...

Check your understanding by looking at the output of running this with AtmMain.

Output

ATMTransport created with producer
deposit called
communicating with bank via Standard transport

 

Using @Alternative to select an Alternative

 

Earlier, you may recall, we defined several alternative transports, namely JsonRestAtmTransport and SoapAtmTransport. Imagine that you are an installer of ATM machines and you need to configure certain transports at certain locations. Our previous injection points essentially inject the default, which is the StandardAtmTransport.

If I need to install a different transport, all I need to do is change the /META-INF/beans.xml file to use the right transport as follows:

Code Listing: {classpath}/META-INF/beans.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://java.sun.com/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://java.sun.com/xml/ns/javaee
       http://java.sun.com/xml/ns/javaee/beans_1_0.xsd">
  <alternatives>
    <class>org.cdi.advocacy.JsonRestAtmTransport</class>
  </alternatives>
</beans>

You can see from the output that the JSON REST transport is selected.

Output

deposit called
communicating with bank via JSON REST transport

Alternatives codifies and simplifies a very normal case in DI, namely, you have different injected objects based on different builds or environments. The great thing about objects is they can be replaced (Grady Booch said this). Alternatives allow you to mark objects that are replacements for other objects and then activate them when you need them.

If the DI container supports alternatives, mark them as alternatives. Think about it this way: you don’t have to document the alternatives as much, because the code is self-documenting. Anyone who knows CDI and its alternatives mechanism will not be surprised. Alternatives canonicalize the way you select an alternative implementation.

You can think of CDI as a canonicalization of many patterns that we have been using with more general purpose DI frameworks. The simplification and canonicalization is part of the evolution of DI.

 

Using @Qualifier to inject different types

 

All objects and producers in CDI have qualifiers. If you do not assign a qualifier, a bean by default has the qualifiers @Default and @Any. It is like a TV crime show in the U.S.: if you do not have money for a lawyer, you will be assigned one.

Qualifiers can be used to discriminate exactly what gets injected. You can write custom qualifiers.

Qualifiers work like Garanimals tags for kids' clothes: you match the qualifier on the injection target with the qualifier on the injection source, and that determines the type that will be injected.

If the tags (Qualifiers) match, then you have a match for injection.

You may decide that at times you want to inject Soap or Json or the Standard transport. You don’t want to list them as an alternative. You actually, for example, always want the Json implementation in a certain case.

Here is an example of defining a qualifier for Soap.

Code Listing: Soap runtime qualifier annotation

package org.cdi.advocacy;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.*;
import static java.lang.annotation.RetentionPolicy.*;

import javax.inject.Qualifier;


@Qualifier @Retention(RUNTIME) @Target({TYPE, METHOD, FIELD, PARAMETER})
public @interface Soap {

}

Notice that a qualifier is just a runtime annotation that is marked with the @Qualifier annotation. @Qualifier is an annotation that decorates a runtime annotation to make it a qualifier.

Then we would just mark the source of the injection point, namely, SoapAtmTransport with our new @Soap qualifier as follows:

Code Listing: SoapAtmTransport using new @Soap qualifier

package org.cdi.advocacy;

@Soap
public class SoapAtmTransport implements ATMTransport {

    @Override
    public void communicateWithBank(byte[] datapacket) {
        System.out.println("communicating with bank via Soap transport");
    }

}

Next, when you are ready to inject a Soap transport, you can do so by annotating the constructor argument as follows:

Code Listing: AutomatedTellerMachineImpl injecting SoapAtmTransport using new @Soap qualifier via constructor arg

public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    private ATMTransport transport;

    @Inject
    public AutomatedTellerMachineImpl(@Soap ATMTransport transport) {
        this.transport = transport;
    }

You could also choose to do this via the setter method for the property as follows:

Code Listing: AutomatedTellerMachineImpl injecting SoapAtmTransport using new @Soap qualifier via setter method arg

public class AutomatedTellerMachineImpl implements AutomatedTellerMachine {

    private ATMTransport transport;

    @Inject
    public void setTransport(@Soap ATMTransport transport) {
        this.transport = transport;
    }

Be sure to check out part II of this series as well: Part 2, plugins and annotation processing!

 

[repost ]Introduction to CDI (JSR 299): Contexts and Dependency Injection for the Java EE Platform

original:http://www.importnew.com/9597.html

The most exciting capability of CDI is that it lets anyone write powerful extensions to the Java EE platform, even changing its core behavior, and these extensions are fully portable to any environment that supports CDI. This article gives an overview of CDI's main features and uses a web application example to show how the framework operates.

There are currently three CDI implementations: JBoss Weld (the reference implementation), Caucho CanDI, and Apache OpenWebBeans. Several frameworks already provide CDI extensions, such as Apache DeltaSpike, JBoss Seam 3, and Apache MyFaces CODI.

A bit of history

The CDI specification was created to fill the gap between EJB on the back end and JSF in the view layer. The first draft targeted only Java EE, but while the specification was being written it became clear that most of its features apply to any Java environment, including Java SE.
Meanwhile, both the Guice and Spring communities had started pushing "JSR-330: Dependency Injection for Java" (code-named AtInject) as the basic injection API. The AtInject and CDI expert groups worked closely together to ensure a common solution across dependency injection frameworks. CDI therefore uses AtInject's annotations, which means that every CDI implementation is also fully compliant with AtInject, just like Guice and Spring. In the end, both CDI and AtInject were included in Java EE 6 (JSR-316) and in almost all Java EE servers.

Main features of CDI

Before looking at the sample code, let's take a quick tour of CDI's main features:

  • Type safety: CDI injects objects based on their Java types rather than by name. When a type alone cannot uniquely determine what to inject, a @Qualifier annotation can pin it down (see the sketch after this list). This lets the compiler catch errors earlier and makes refactoring easier.
  • POJOs: almost any Java object can be injected by CDI, including EJBs, JNDI resources, persistence objects, and anything created by a factory method.
  • Extensibility: any CDI container can easily be extended, and the extension will run on every CDI container and every vendor's Java EE 6 server. This is made possible by a carefully designed SPI (service provider interface) that is part of the JSR-299 specification.
  • Interceptors: it is easy to implement your own interceptors. Because JSR-299 defines a portable mechanism, they too now run on every CDI container and Java EE 6 server, by implementing part of the JSR-299 SPI.
  • Decorators: these allow dynamic extension of existing interface implementations with cross-cutting behavior.
  • Events: CDI specifies a loosely coupled, type-safe mechanism for firing and observing events (also illustrated in the sketch below).
  • Unified EL integration: EL 2.2 is powerful and highly flexible, and CDI provides pluggable support for it.
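
As a minimal sketch of the type-safety and event points above (the Reliable qualifier, Report event, Sender, and Auditor classes are invented here for illustration; MailService mirrors the interface used later in this article):

import javax.enterprise.event.Event;
import javax.enterprise.event.Observes;
import javax.inject.Inject;
import javax.inject.Qualifier;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// A custom qualifier: disambiguates between beans of the same type.
@Qualifier
@Retention(RetentionPolicy.RUNTIME)
@interface Reliable {}

interface MailService { void send(String from, String to, String body); }

class Report {}

class Sender {
    // Only a MailService implementation annotated @Reliable matches here.
    @Inject @Reliable MailService mailService;

    // Type-safe event firing: observers are selected by the event's Java type.
    @Inject Event<Report> reportEvent;

    void finish() {
        reportEvent.fire(new Report());
    }
}

class Auditor {
    // Called automatically whenever a Report event is fired.
    void onReport(@Observes Report report) {
        System.out.println("report received");
    }
}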

A CDI code example

Let's look at a simple web application that lets you send an email via a web form; deliberately simple. We only show code fragments, but they should be enough to highlight the important points of using CDI.

For our mail application we need an application-scoped MailService; it should behave as a singleton (so that the container hands out and injects the same instance every time). The replyTo value comes from a ConfigurationService, which is also application-scoped. Here we see the first injection: the configuration field is never set by application code; CDI injects it. The @Inject annotation tells CDI to take care of this.

Listing 1

@ApplicationScoped
public class MyMailService implements MailService {
  private @Inject ConfigurationService configuration;
  public void send(String from, String to, String body) {
    String replyTo = configuration.getReplyToAddress();
    ... // send the email
  }
}

Our application knows the current user (note: it does not attempt any authentication), scoped to the user's HTTP session. CDI supports the session scope out of the box: within the same session you always get the same instance (in a web application).

By default, CDI beans cannot be used in JSF expressions. To make them available to JSF and EL, we add the @Named annotation:

Listing 2

@SessionScoped
@Named
public class User {
  public String getName() {..}
  ..
}
// The web pages are implemented with JSF. We recommend using a controller class:
@RequestScoped
@Named
public class Mail {
  private @Inject MailService mailService;
  private @Inject User user;
  private String text; // + getter and setter
  private String recipient; // + getter and setter
  public String sendMail() {
    mailService.send(user.getName(), recipient, text);
    return "messageSent"; // forward to 'message sent' JSF2 page
  }
}

The only thing still missing is the JSF page, shown below:

sendMail.xhtml

<h:form>
  <h:outputLabel value="Username" for="username"/>
  <h:outputText id="username" value="#{user.name}"/>
  <h:outputLabel value="Recipient" for="recipient"/>
  <h:inputText id="recipient" value="#{mail.recipient}"/>
  <h:outputLabel value="Body" for="body"/>
  <h:inputText id="body" value="#{mail.text}"/>
  <h:commandButton value="Send" action="#{mail.sendMail}"/>
</h:form>

We now have a working application; let's keep exploring the CDI features it uses.


Original article: jaxenter. Translated by: ImportNew.com - 胡劲寒
Translation link: http://www.importnew.com/9597.html

[repost ]Running Batch jobs in J2SE applications

original:http://www.mastertheboss.com/javaee/batch-api/running-batch-jobs-in-j2se-applications

This tutorial shows how you can run the Java Batch API (JSR 352) as part of a J2SE application.
The Java Batch API (JSR 352) allows executing batch activities based on a Job Specification Language (JSL) using two main programming models: chunk steps or batchlets. I've already blogged some examples of chunk steps and batchlets designed to run on the WildFly application server.
The implementation of the Java Batch API (JSR 352) is provided by a project named JBeret, which also allows executing batch activities as part of Java Standard Edition applications. In this tutorial we will see a basic example of how to run a Batchlet from within a J2SE application.

Defining the Batch Job

The first step is defining the job via the Job Specification Language (JSL). Let's create this file, named simplebatchlet.xml, in the folder src\main\resources\META-INF\batch-jobs of a Maven project:

<job id="simplebatchlet" xmlns="http://xmlns.jcp.org/xml/ns/javaee"
    version="1.0">
    <step id="step1">
        <properties>
            <property name="source" value="/home/jboss/log.txt" />
            <property name="destination" value="/var/opt/log.txt" />
        </properties>
        <batchlet ref="sampleBatchlet" />
    </step>
</job>

In this simple JSL file we execute a Batchlet named “sampleBatchlet” as part of “step1”, which also carries two properties, source and destination. We will use these properties to copy a file from a source to a destination.

Defining the Batchlet

Here is the Batchlet, a CDI @Named bean that reads the properties from the StepContext and uses the Java 7 Files API to copy the file from the source to the destination:

package com.mastertheboss.jberet;

import javax.batch.api.AbstractBatchlet;
import javax.inject.Inject;
import javax.batch.runtime.context.*;
import javax.inject.Named;
import java.io.*;
import java.nio.file.Files;

@Named
public class SampleBatchlet extends AbstractBatchlet {
    @Inject StepContext stepContext;

    @Override
    public String process() {
        String source = stepContext.getProperties().getProperty("source");
        String destination = stepContext.getProperties().getProperty("destination");
        try {
            Files.copy(new File(source).toPath(), new File(destination).toPath());
            System.out.println("File copied!");
            return "COMPLETED";
        } catch (IOException e) {
            e.printStackTrace();
        }
        return "FAILED";
    }
}

You can trigger the execution of your job with a simple main Java class:

package com.mastertheboss.jberet;

import javax.batch.operations.JobOperator;
import javax.batch.operations.JobSecurityException;
import javax.batch.operations.JobStartException;
import javax.batch.runtime.BatchRuntime;

public class Main {
    public static void main(String[] args) {
        try {
            JobOperator jo = BatchRuntime.getJobOperator();
            long id = jo.start("simplebatchlet", null);
            System.out.println("Batchlet submitted: " + id);
            Thread.sleep(5000);
        } catch (Exception ex) {
            System.out.println("Error submitting Job! " + ex.getMessage());
            ex.printStackTrace();
        }
    }
}
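
The fixed Thread.sleep is only there to keep the JVM alive while the job runs on a JBeret thread. A more robust variant, sketched below using the standard javax.batch runtime API, polls the job's status instead:

// Sketch: wait for the job to finish instead of sleeping a fixed time.
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.BatchStatus;
import javax.batch.runtime.JobExecution;

public class WaitingMain {
    public static void main(String[] args) throws Exception {
        JobOperator jo = BatchRuntime.getJobOperator();
        long id = jo.start("simplebatchlet", null);
        JobExecution execution = jo.getJobExecution(id);
        // Poll until the execution reaches a terminal status.
        while (execution.getBatchStatus() == BatchStatus.STARTING
                || execution.getBatchStatus() == BatchStatus.STARTED) {
            Thread.sleep(100);
        }
        System.out.println("Job finished with status: " + execution.getBatchStatus());
    }
}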

Compiling the project

In order to run our project, we need to include in our pom.xml a set of dependencies which include javax.batch Batch API, JBeret core and its related dependencies, Weld container API and its related dependencies:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <groupId>com.mastertheboss.jberet</groupId>
   <artifactId>batch-job</artifactId>
   <packaging>jar</packaging>
   <version>1.0-SNAPSHOT</version>
   <name>batch-job</name>
   <url>http://maven.apache.org</url>
   <repositories>
      <repository>
         <id>jboss-public-repository-group</id>
         <name>JBoss Public Repository Group</name>
         <url>http://repository.jboss.org/nexus/content/groups/public/</url>
      </repository>
   </repositories>
   <dependencies>
      <dependency>
         <groupId>org.jboss.spec.javax.batch</groupId>
         <artifactId>jboss-batch-api_1.0_spec</artifactId>
         <version>1.0.0.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jberet</groupId>
         <artifactId>jberet-core</artifactId>
         <version>1.0.2.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jberet</groupId>
         <artifactId>jberet-support</artifactId>
         <version>1.0.2.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jboss.spec.javax.transaction</groupId>
         <artifactId>jboss-transaction-api_1.2_spec</artifactId>
         <version>1.0.0.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jboss.marshalling</groupId>
         <artifactId>jboss-marshalling</artifactId>
         <version>1.4.2.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jboss.weld</groupId>
         <artifactId>weld-core</artifactId>
         <version>2.1.1.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jboss.weld.se</groupId>
         <artifactId>weld-se</artifactId>
         <version>2.1.1.Final</version>
      </dependency>
      <dependency>
         <groupId>org.jberet</groupId>
         <artifactId>jberet-se</artifactId>
         <version>1.0.2.Final</version>
      </dependency>
   </dependencies>
   <build>
      <plugins>
         <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.2.1</version>
            <executions>
               <execution>
                  <goals>
                     <goal>java</goal>
                  </goals>
               </execution>
            </executions>
            <configuration>
               <mainClass>com.mastertheboss.jberet.Main</mainClass>
            </configuration>
         </plugin>
      </plugins>
   </build>
</project>

Optionally you can include also some other dependencies in case you need an XML processor, or the streaming JSON processor:

<dependency>
    <groupId>com.fasterxml</groupId>
    <artifactId>aalto-xml</artifactId>
    <version>0.9.9</version>
</dependency>
<dependency>
    <groupId>org.codehaus.woodstox</groupId>
    <artifactId>stax2-api</artifactId>
    <version>3.1.4</version>
</dependency>

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-xml</artifactId>
    <version>2.4.1</version>
</dependency>

You can execute your application with:

mvn clean install exec:java

After some INFO messages you should see on the console:

Batchlet submitted: 1
File copied!

Configuring JBeret engine

When using batch jobs within the WildFly container you can configure job persistence and thread pools via the batch subsystem. When running as a standalone application you can do it via a file named jberet.properties, which has to be placed in src\main\resources of your Maven project.
Here follows a sample jberet.properties file:

# Optional, valid values are jdbc (default), mongodb and in-memory
job-repository-type = jdbc
# Optional, default is jdbc:h2:~/jberet-repo for h2 database as the default job repository DBMS.
# For h2 in-memory database, db-url = jdbc:h2:mem:test;DB_CLOSE_DELAY=-1
# For mongodb, db-url includes all the parameters for MongoClientURI, including hosts, ports, username, password,
# Use the target directory to store the DB
db-url = jdbc:h2:./target/jberet-repo
db-user = sa
db-password = sa
db-properties =
# Configured: java.util.concurrent.ThreadPoolExecutor is created with thread-related properties as parameters.
thread-pool-type =
# New tasks are serviced first by creating core threads.
# Required for Configured type.
thread-pool-core-size =
# If all core threads are busy, new tasks are queued.
# int number indicating the size of the work queue. If 0 or negative, a java.util.concurrent.SynchronousQueue is used.
# Required for Configured type.
thread-pool-queue-capacity =
# If queue is full, additional non-core threads are created to service new tasks.
# int indicating the maximum size of the thread pool.
# Required for Configured type.
thread-pool-max-size =
# long number indicating the number of seconds a thread can stay idle.
# Required for Configured type.
thread-pool-keep-alive-time =
# Optional, valid values are true and false, defaults to false.
thread-pool-allow-core-thread-timeout =
# Optional, valid values are true and false, defaults to false.
thread-pool-prestart-all-core-threads =
# Optional, fully-qualified name of a class that implements java.util.concurrent.ThreadFactory.
# This property should not be needed in most cases.
thread-factory =
# Optional, fully-qualified name of a class that implements java.util.concurrent.RejectedExecutionHandler.
# This property should not be needed in most cases.
thread-pool-rejection-policy =

As you can see, this file largely relies on defaults for many settings such as the thread pool. We have, however, changed the job-repository-type so that jobs are persisted to a database (H2). In this case we need to add the H2 JDBC driver to our Maven project as follows:

      <dependency>
         <groupId>com.h2database</groupId>
         <artifactId>h2</artifactId>
         <version>1.4.178</version>
      </dependency>

That's all! Enjoy the Batch API using JBeret!

Acknowledgments: I'd like to express my gratitude to Cheng Fang (JBeret project lead) for providing useful insights for writing this article.