[event] WISE 2012 Challenge
Posted on March 10, 2012 at 10:07 pm in Arch-Webapp, Event, IR

Original: http://www.wise2012.cs.ucy.ac.cy/challenge.html

WISE 2012 Challenge

1. Introduction

The WISE 2012 Challenge is based on a dataset collected from one of the most popular micro-blog services (http://weibo.com). The challenge has two tracks: 1) the performance track, and 2) the mining track. Attendees may attend one or both tracks. Selected reports will be published in the conference proceedings after review.

 

Important dates:

  • Attendance registration deadline: 11th May 2012
  • Result/report submission deadline: 18th May 2012
  • Winners notified: 13th July 2012
  • Report camera-ready due: 27 July 2012

2. Submission guideline

Attendees may attend one or both tracks. Two separate submissions should be sent if both tracks are attended. Each submission should contain two parts:

1) Results:

Results should be submitted to wise2012challenge@gmail.com by 18th May, 2012, following the specification provided in the task description of the corresponding track.

2) Report:

The report should be submitted via the WISE 2012 submission system. Attendees should register their submission by 11th May, 2012, and submit the final report by 18th May, 2012. The report should follow the WISE 2012 research paper format requirements. It should describe in detail how the attendees completed the challenge tasks and summarize the results.

3. The dataset

The original data was crawled from Sina Weibo (http://weibo.com), a popular micro-blogging service in China, via the API provided. The dataset distributed in WISE 2012 Challenge is preprocessed as follows:

1) User IDs and message IDs are anonymized.

2) The content of tweets is removed, in accordance with Sina Weibo’s Terms of Service.

3) Some tweets are annotated with events. For each event, the terms used to identify the event and a link to the Wikipedia (http://wikipedia.org) page describing the event are given. Event information is given in the file events.txt.

 

The dataset to be used in both tracks contains two sets of files:

1) Tweets: It includes basic information about tweets (time, user ID, message ID, etc.), mentions (user IDs appearing in tweets), re-tweet paths, and whether the tweet contains links.

2) Followship network: It includes the follow relationships among users (based on user IDs).

In addition, a small testing dataset to be used in the mining track is provided. It contains one file, which has the same format as the tweets file introduced above. The testing file gives a small part of the re-tweeting activities of thirty-three tweets from six events.

It should be noted that the dataset is not complete; it is only a sample of the whole data in the micro-blogging service.

The details of the dataset format are given in Appendix 1: Data format.

4. The performance track (T1)

Attendees are required to build a system for evaluating queries over the dataset. Nineteen typical queries should be covered, and the corresponding interfaces in the BSMA performance testing tool should be implemented. The goal is to achieve low response time and high throughput, as reported by the BSMA performance testing tool.

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) Email title should be: [T1] xxx Part:y/z, in which ‘xxx’ denotes the paper ID assigned by the paper submission system at registration, ‘z’ is the total number of emails in the submission, and ‘y’ is the sequence number of this email within the submission.

3) All results should be submitted as email attachments. Each email should contain only one attachment, whose size should be no more than 20MB.

4) The attachment should be in tar.gz or zip format.

5) The attachment should contain all 1344 result files generated by the performance testing tool, without any modifications (including the file names), in the root directory of the compressed package.
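The title format and size limit above can be checked before sending. The following is a minimal sketch, not part of the official tooling: the function names and the paper ID "123" are hypothetical, and reading "20MB" as 20,000,000 bytes is a conservative assumption, since the announcement does not define the unit precisely.

```python
from typing import List

# Assumption: "20MB" is read conservatively as 20 * 10^6 bytes.
MAX_ATTACHMENT_BYTES = 20 * 1000 * 1000


def submission_subjects(track: str, paper_id: str, total_parts: int) -> List[str]:
    """Build one email title per part, in the form '[T1] xxx Part:y/z'."""
    if track not in ("T1", "T2"):
        raise ValueError("track must be 'T1' or 'T2'")
    if total_parts < 1:
        raise ValueError("total_parts must be at least 1")
    return [
        f"[{track}] {paper_id} Part:{y}/{total_parts}"
        for y in range(1, total_parts + 1)
    ]


def attachment_ok(size_bytes: int) -> bool:
    """Return True if a single attachment fits within the size limit."""
    return 0 < size_bytes <= MAX_ATTACHMENT_BYTES


if __name__ == "__main__":
    # "123" is a made-up paper ID used only for illustration.
    for subject in submission_subjects("T1", "123", 3):
        print(subject)
```

If the compressed result files exceed the limit, the package would need to be split across several emails, each with its own y/z title.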

The typical queries are introduced in Appendix 2: T1: Queries.

The BSMA performance testing tool manual is given in Appendix 3: T1: BSMA performance testing tool manual.

5. The mining track (T2)

In T2, attendees are required to predict the re-tweeting activities of thirty-three tweets from six events. For each of these six events, only tweets (and re-tweets) before a given timestamp are included in the Tweets file. The thirty-three tweets are given in the Tests file, each with the event it belongs to. As in Tweets, only re-tweeting information before the timestamp is given. Attendees are required to predict two measurements as of 30 days after the original tweet is published. These two measurements are:

1) M1: The number of times the original tweet is re-tweeted. If a user re-tweets (also called re-posts or forwards) a tweet twice at different timestamps, it is counted twice.

2) M2: The number of possible views of the original tweet. The number of possible views of one re-tweet action is defined as the number of followers of the user who performs the re-tweet. The number of possible views of a tweet is defined as the sum of the possible-view numbers of all its re-tweet actions.

It should be noted that all re-tweeting actions in a re-tweeting chain should be counted toward the root of the chain.
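To make the two measurements concrete, here is a small sketch that computes M1 and M2 from a list of observed re-tweet records. The record layout (message ID, parent message ID, user ID) and the follower-count dictionary are hypothetical simplifications; the real file layout is defined in Appendix 1 and is not reproduced here. The sketch only counts what is in its input and does not perform any prediction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (message_id, parent_message_id, user_id).
# parent_message_id is None for an original tweet.
Record = Tuple[str, Optional[str], str]


def retweet_measures(
    records: Iterable[Record], follower_count: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {root_message_id: (M1, M2)} for every root that was re-tweeted."""
    records = list(records)
    parent = {mid: pid for mid, pid, _ in records}

    def root_of(mid: str) -> str:
        # Follow parent links up the chain; stop at a message without a
        # known parent. The 'seen' set guards against malformed cycles.
        seen = set()
        while parent.get(mid) is not None and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    m1 = defaultdict(int)  # number of re-tweet actions per root
    m2 = defaultdict(int)  # sum of followers of the re-tweeting users
    for mid, pid, uid in records:
        if pid is None:
            continue  # an original tweet is not a re-tweet action
        root = root_of(mid)
        m1[root] += 1
        m2[root] += follower_count.get(uid, 0)
    return {root: (m1[root], m2[root]) for root in m1}


if __name__ == "__main__":
    records = [
        ("t0", None, "a"),  # original tweet by user a
        ("t1", "t0", "b"),  # b re-tweets t0
        ("t2", "t1", "c"),  # c re-tweets b's re-tweet (same chain)
        ("t3", "t0", "b"),  # b re-tweets t0 again at a later time
    ]
    followers = {"a": 10, "b": 5, "c": 2}
    print(retweet_measures(records, followers))  # {'t0': (3, 12)}
```

In the usage example, user b re-tweets twice and is counted twice in both M1 and M2, and the re-tweet by c is attributed to the root t0 rather than to the intermediate re-tweet t1.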

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) Email title should be: [T2] xxx Part:y/z, in which ‘xxx’ denotes the paper ID assigned by the paper submission system at registration, ‘z’ is the total number of emails in the submission, and ‘y’ is the sequence number of this email within the submission.

3) All results should be submitted as email attachments. Each email should contain only one attachment, whose size should be no more than 20MB.

4) The attachment should be in plain text format with thirty-three rows, in which each row contains three fields: the message ID of the original tweet, the predicted M1 value, and the predicted M2 value.
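A result file of this shape could be produced along the following lines. This is only a sketch: the announcement does not state the field separator, so the tab character used here is an assumption, and the function name and the sample message IDs are made up for illustration.

```python
from typing import List, Tuple

EXPECTED_ROWS = 33  # one row per test tweet, as required by the specification


def format_predictions(predictions: List[Tuple[str, int, int]]) -> str:
    """Render (message_id, M1, M2) triples as plain text, one row per tweet.

    Assumption: fields are separated by a tab; the announcement does not
    specify the delimiter.
    """
    if len(predictions) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} rows, got {len(predictions)}")
    ids = [mid for mid, _, _ in predictions]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate message IDs in predictions")
    lines = [f"{mid}\t{m1}\t{m2}" for mid, m1, m2 in predictions]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up IDs and values, for illustration only.
    demo = [(f"msg{i:02d}", 100 + i, 5000 + i) for i in range(EXPECTED_ROWS)]
    with open("t2_results.txt", "w", encoding="utf-8") as out:
        out.write(format_predictions(demo))
```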

6. Downloads:

Appendix 1: Data format
A1.txt http://d.yun.io/SL3nsw

Appendix 2: T1: Queries
A2.pdf http://d.yun.io/oOf4Jv

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://d.yun.io/_hEgMF

Tweets: in seven compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name            size (bytes)  md5 checksum                      download link
finalmicroblogs.z01  2147483648    5CB2A0FFB857CD5A5F6AFBDA63EFE496  http://d.yun.io/DEmikA
finalmicroblogs.z02  2147483648    B529EC8B46A2BC18ABAB5C2791A65631  http://d.yun.io/4lSHNp
finalmicroblogs.z03  2147483648    B6CEBD96A0F61C691DDB1CFFCC37F37E  http://d.yun.io/tOLTNn
finalmicroblogs.z04  2147483648    5BA1D9CEB36402F8A95BD6E04BE18185  http://d.yun.io/v!Fmmu
finalmicroblogs.z05  2147483648    9DEDB0CFD01D81B972966FDC765A962A  http://d.yun.io/rOJlaC
finalmicroblogs.z06  2147483648    9E210BDD83AC18911016D95E178391FE  http://d.yun.io/NBsmku
finalmicroblogs.zip  803261976     203C4765A7F390DE57888EE1C76E69B7  http://d.yun.io/!XJZvo
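Because the files are large, it may be worth checking each download against the listed size and md5 checksum before extracting. A minimal sketch using only the Python standard library follows; the function names are illustrative, and the local path is assumed to be wherever the file was saved.

```python
import hashlib
import os


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 of a file in 1 MiB chunks, so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # The checksums in the listing are upper-case hex.
    return digest.hexdigest().upper()


def verify_download(path: str, expected_size: int, expected_md5: str) -> bool:
    """Check the file size first (cheap), then the md5 checksum."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_hex(path) == expected_md5.upper()


if __name__ == "__main__":
    # Values for the last part of the tweets archive, taken from the listing.
    path = "finalmicroblogs.zip"
    if os.path.exists(path):
        ok = verify_download(path, 803261976, "203C4765A7F390DE57888EE1C76E69B7")
        print(path, "OK" if ok else "MISMATCH")
```

The same check applies to the other files by substituting the size and checksum given for each.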

Followships (Please note that the file is quite large, and may take quite a long time to download):

file name          size (bytes)  md5 checksum                      download link
socialnetwork.zip  3433604665    0EFA7F06628DF275F347570FD17BD131  http://d.yun.io/8B4fYB

Events:

events.txt http://d.yun.io/FeuSlE

Testing:

file name         size (bytes)  md5 checksum                      download link
eventForTest.zip  963244        6710C666B07DCFA04967ECB419D4F3A6  http://d.yun.io/17xCCy

BSMA performance testing tool

BSMA.zip http://d.yun.io/1JNFWn

7. Contact

wise2012challenge@gmail.com

newitfarmer
