original:http://www.wise2012.cs.ucy.ac.cy/challenge.html
WISE 2012 Challenge
1. Introduction
The WISE 2012 Challenge is based on a dataset collected from one of the most popular micro-blogging services (http://weibo.com). The challenge has two tracks: 1) the performance track, and 2) the mining track. Attendees may take part in one or both tracks. Selected reports will be published in the conference proceedings after review.
Important dates:
- Attendance registration deadline: 11th May 2012
- Result/report submission deadline: 18th May 2012
- Winners notified: 13th July 2012
- Report camera-ready due: 27th July 2012
2. Submission guideline
Attendees may take part in one or both tracks. Attendees who take part in both tracks should send two separate submissions. Each submission should contain two parts:
1) Results:
Results should be submitted to wise2012challenge@gmail.com by 18th May, 2012, following the specification provided in the description of the corresponding task.
2) Report:
The report should be submitted via the WISE 2012 submission system. Attendees should register their submission by 11th May, 2012, and submit the final report by 18th May, 2012. The report should follow the WISE 2012 research paper format requirements. It should describe in detail how the attendees completed the challenge tasks and summarize the results.
3. The dataset
The original data was crawled from Sina Weibo (http://weibo.com), a popular micro-blogging service in China, via the API it provides. The dataset distributed for the WISE 2012 Challenge has been preprocessed as follows:
1) User IDs and message IDs are anonymized.
2) The content of tweets is removed, in accordance with Sina Weibo’s Terms of Service.
3) Some tweets are annotated with events. For each event, the terms used to identify the event and a link to a Wikipedia (http://wikipedia.org) page describing the event are given. The event information is given in the file events.txt.
The dataset to be used in both tracks contains two sets of files:
1) Tweets: It includes basic information about tweets (time, user ID, message ID etc.), mentions (user IDs appearing in tweets), re-tweet paths, and whether the tweet contains links.
2) Followship network: It includes the follower network of users (based on user IDs).
In addition, a small testing dataset is provided for use in the mining track. It contains one file, which has the same format as the tweets file introduced above. The testing file gives a small part of the re-tweeting activity of thirty-three tweets belonging to six events.
It should be noted that the dataset is not complete; it is only a sample of the whole data in the micro-blogging service.
The details of the dataset format are given in Appendix 1: Data format.
4. The performance track (T1)
Attendees are required to build a system for evaluating queries over the dataset. The system should cover nineteen typical queries and implement the corresponding interfaces of the BSMA performance testing tool. The goal is to achieve low response time and high throughput, as reported by the BSMA performance testing tool.
Result submission specification:
1) Results should be submitted via email to wise2012challenge@gmail.com
2) The email title should be: [T1] xxx Part:y/z, where ‘xxx’ denotes the paper id assigned by the paper submission system at registration, ‘z’ is the total number of emails in the submission, and ‘y’ is the sequence number of this email within the submission.
3) All results should be submitted as email attachments. Each email should contain only one attachment, no larger than 20MB.
4) The attachment should be in tar.gz or zip format.
5) The attachment should contain all 1344 result files generated by the performance testing tool, without any modifications (including the file names), in the root directory of the compressed package.
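The T1 packaging rules above can be checked mechanically before sending. The following is a minimal sketch, not an official tool: the function names, the example paper id "123", and the reading of "20MB" as 20 × 1024 × 1024 bytes are all assumptions made for illustration. It builds the email title, checks that the result directory holds the expected number of files, and writes them unmodified into the root of a tar.gz archive.

```python
import tarfile
from pathlib import Path

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes.
MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024
EXPECTED_RESULT_FILES = 1344  # number of result files stated in the specification


def email_title(track, paper_id, part, total):
    """Build a title of the form '[T1] xxx Part:y/z'."""
    if not 1 <= part <= total:
        raise ValueError("part must be between 1 and total")
    return f"[{track}] {paper_id} Part:{part}/{total}"


def list_result_files(result_dir, expected=EXPECTED_RESULT_FILES):
    """Return the files in result_dir, checking that the count is as expected."""
    files = sorted(p for p in Path(result_dir).iterdir() if p.is_file())
    if len(files) != expected:
        raise ValueError(f"expected {expected} result files, found {len(files)}")
    return files


def pack_root_level(files, archive_path):
    """Write files into a tar.gz archive with no directory prefix.

    File names are kept unchanged. Returns the archive size in bytes.
    """
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for f in files:
            tar.add(str(f), arcname=Path(f).name)
    return Path(archive_path).stat().st_size


if __name__ == "__main__":
    # "123" is a made-up paper id used only for illustration.
    print(email_title("T1", "123", 1, 1))  # [T1] 123 Part:1/1
```

The size returned by `pack_root_level` can be compared against `MAX_ATTACHMENT_BYTES`. The specification does not say how the files should be divided when one archive exceeds the limit, so the sketch leaves that decision to the attendee.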
The typical queries are introduced in Appendix 2: T1: Queries.
The BSMA performance testing tool manual is given in Appendix 3: T1: BSMA performance testing tool manual.
5. The mining track (T2)
In T2, attendees are required to predict the re-tweeting activity of thirty-three tweets belonging to six events. For each of these six events, the Tweets file contains only the tweets (and re-tweets) posted before a given timestamp. The thirty-three tweets are given in the Tests file, each with the event it belongs to. As in the Tweets file, only re-tweeting information from before the timestamp is given. Attendees are required to predict two measurements as of 30 days after the original tweet is published. These two measurements are:
1) M1: The number of times the original tweet is re-tweeted. If a user re-tweets (also called re-posts or forwards) a tweet twice at different timestamps, both re-tweets are counted.
2) M2: The number of possible views of the original tweet. The number of possible views of one re-tweet action is defined as the number of followers of the user who performs the re-tweet. The number of possible views of a tweet is defined as the sum of the possible views over all of its re-tweet actions.
It should be noted that every re-tweet action in a re-tweeting chain should be counted toward the root of the chain.
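To make the two definitions concrete, here is a small sketch that computes M1 and M2 from a list of observed re-tweet actions. The in-memory layout is an assumption made for illustration and does not reflect the real file format in Appendix 1: `parent_of` maps each re-tweet's message ID to the message it re-posts, `actions` is a list of `(message_id, user_id)` pairs, and `follower_count` maps a user ID to that user's number of followers. This only counts what has been observed; it is not a prediction method.

```python
from collections import Counter


def find_root(message_id, parent_of):
    """Follow the re-tweet chain upward until a message with no parent is reached."""
    seen = set()
    while message_id in parent_of:
        if message_id in seen:
            raise ValueError("cycle in re-tweet chain")
        seen.add(message_id)
        message_id = parent_of[message_id]
    return message_id


def compute_m1_m2(actions, parent_of, follower_count):
    """Return (m1, m2) counters keyed by the message ID of the root tweet.

    Every re-tweet action is credited to the root of its chain. A user who
    re-tweets twice is counted twice, in both M1 and M2.
    """
    m1 = Counter()
    m2 = Counter()
    for message_id, user_id in actions:
        root = find_root(message_id, parent_of)
        m1[root] += 1
        # Users missing from the (sampled) follower data contribute 0 here;
        # that fallback is an assumption of this sketch.
        m2[root] += follower_count.get(user_id, 0)
    return m1, m2


if __name__ == "__main__":
    # Hypothetical toy data: t0 is the original tweet, r2 re-tweets r1.
    parent_of = {"r1": "t0", "r2": "r1", "r3": "t0"}
    actions = [("r1", "u1"), ("r2", "u2"), ("r3", "u1")]
    follower_count = {"u1": 10, "u2": 5}
    m1, m2 = compute_m1_m2(actions, parent_of, follower_count)
    print(m1["t0"], m2["t0"])  # 3 25
```

In the toy data, user u1 re-tweets twice, so u1's 10 followers are added twice: M2 = 10 + 5 + 10 = 25.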
Result submission specification:
1) Results should be submitted via email to wise2012challenge@gmail.com
2) The email title should be: [T2] xxx Part:y/z, where ‘xxx’ denotes the paper id assigned by the paper submission system at registration, ‘z’ is the total number of emails in the submission, and ‘y’ is the sequence number of this email within the submission.
3) All results should be submitted as email attachments. Each email should contain only one attachment, no larger than 20MB.
4) The attachment should be in plain text format with thirty-three rows, in which each row contains three fields: the message ID of the original tweet, the predicted M1 value, and the predicted M2 value.
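A T2 result file with the three fields described above could be produced as in the sketch below. The specification does not name a field separator, so the tab character used here is an assumption, as are the function name and the example message IDs.

```python
EXPECTED_ROWS = 33  # thirty-three test tweets, per the specification


def format_t2_results(predictions, expected_rows=EXPECTED_ROWS, sep="\t"):
    """Format predictions as plain text, one row per original tweet.

    predictions: list of (message_id, predicted_m1, predicted_m2) tuples.
    sep: field separator. The tab is an assumption; the specification
    does not state which separator to use.
    """
    if len(predictions) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} rows, got {len(predictions)}"
        )
    ids = [message_id for message_id, _, _ in predictions]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate message ID in predictions")
    rows = [f"{mid}{sep}{m1}{sep}{m2}" for mid, m1, m2 in predictions]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Two made-up rows, with expected_rows lowered for the demonstration.
    demo = [("msgA", 120, 45000), ("msgB", 7, 310)]
    print(format_t2_results(demo, expected_rows=2), end="")
```

Checking the row count and the uniqueness of message IDs before writing catches the most likely formatting mistakes early.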
6. Downloads
Appendix 1: Data format
A1.txt http://d.yun.io/SL3nsw
Appendix 2: T1: Queries
A2.pdf http://d.yun.io/oOf4Jv
Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://d.yun.io/_hEgMF
Tweets: in seven compressed files (Please note that these files are quite large, and may take quite a long time to download.):
| file name | size (bytes) | md5 checksum | download link |
| finalmicroblogs.z01 | 2147483648 | 5CB2A0FFB857CD5A5F6AFBDA63EFE496 | http://d.yun.io/DEmikA |
| finalmicroblogs.z02 | 2147483648 | B529EC8B46A2BC18ABAB5C2791A65631 | http://d.yun.io/4lSHNp |
| finalmicroblogs.z03 | 2147483648 | B6CEBD96A0F61C691DDB1CFFCC37F37E | http://d.yun.io/tOLTNn |
| finalmicroblogs.z04 | 2147483648 | 5BA1D9CEB36402F8A95BD6E04BE18185 | http://d.yun.io/v!Fmmu |
| finalmicroblogs.z05 | 2147483648 | 9DEDB0CFD01D81B972966FDC765A962A | http://d.yun.io/rOJlaC |
| finalmicroblogs.z06 | 2147483648 | 9E210BDD83AC18911016D95E178391FE | http://d.yun.io/NBsmku |
| finalmicroblogs.zip | 803261976 | 203C4765A7F390DE57888EE1C76E69B7 | http://d.yun.io/!XJZvo |
Followships (Please note that the file is quite large, and may take quite a long time to download):
| file name | size (bytes) | md5 checksum | download link |
| socialnetwork.zip | 3433604665 | 0EFA7F06628DF275F347570FD17BD131 | http://d.yun.io/8B4fYB |
Events:
events.txt http://d.yun.io/FeuSlE
Testing:
| file name | size (bytes) | md5 checksum | download link |
| eventForTest.zip | 963244 | 6710C666B07DCFA04967ECB419D4F3A6 | http://d.yun.io/17xCCy |
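Because the files are large, it may be worth checking each download against the size and md5 checksum listed in the tables above before unpacking. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that multi-gigabyte files do not need to fit in memory. The function names and the local file path are hypothetical.

```python
import hashlib
import os


def md5_hex(path, chunk_size=1 << 20):
    """Return the md5 digest of a file as uppercase hex, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_download(path, expected_size, expected_md5):
    """Check a downloaded file against its listed size (bytes) and md5 checksum."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_hex(path) == expected_md5.upper()


if __name__ == "__main__":
    # Size and checksum copied from the Testing table; the local path is hypothetical.
    ok = verify_download(
        "eventForTest.zip", 963244, "6710C666B07DCFA04967ECB419D4F3A6"
    )
    print("eventForTest.zip verified" if ok else "eventForTest.zip mismatch")
```

The size is compared first because it is cheap, so a truncated download is rejected without hashing the whole file.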
BSMA performance testing tool
BSMA.zip http://d.yun.io/1JNFWn
7. Contact