CAAI & Bytedance Inc., $20,000, 2618 teams, 2835 participants
Byte Cup 2018 International Machine Learning Contest
2018-08-17 - Launch
2018-12-06 - Team Merger Deadline
2018-12-06 - Close


Update 2018/12/05


The test dataset phase has started. Please download the test dataset and submit at

We have three clarifications:

1) Article no. 367 does not have any content. You can leave it empty or add any content; this item will not count towards the final score.

2) The previous evaluation had some problems. We will recalculate scores for all previous submissions on Dec. 5.

3) We have corrected the evaluation metric and extended the final deadline by 24 hours (now 23:59 Dec. 6, UTC).




Byte Cup 2018 International Machine Learning Contest is a global machine learning competition designed to promote academic research in machine learning and its applications. The Byte Cup was first held in 2016, attracting more than 1,000 teams from all over the world to take part in the task of “seeking potential answer contributors for Toutiao's expert-users”.


This year, Bytedance Inc., together with the Chinese Association of Artificial Intelligence and IEEE China, will organize Byte Cup 2018. Starting this August, participants will face a more challenging machine learning problem.


The topic of Byte Cup 2018 is the automatic generation of text captions (or titles). Since the birth of the Internet, the amount of textual information generated and consumed by human users has increased tremendously. The mobile Internet allows everyone to receive and create the latest information anytime, anywhere. This overload of content makes machine summarization very important. First, it enables quick and easy browsing. Second, according to data from Bytedance's products, the reading counts of articles follow a power law: a large amount of content is read by very few people. If titles and summaries for this part of the content can be generated automatically by machines, the cost can be greatly reduced. As a result, automatic summarization and title generation are also important research topics in the field of natural language processing.





TopBuzz is an all-in-one content discovery and recommendation mobile platform powered by machine learning algorithms. At TopBuzz, users can discover and enjoy a wide range of trending videos, articles, GIFs and more.


Participants may use only the data provided by the organizers to build models and generate titles for articles in the training dataset. During the 3-month competition phase, participants can freely submit result files and receive a score on the validation set. After that, participants will test their models on the final test dataset for 5 days. The best score on the test set will determine the final leaderboard.


All data in the training, validation and test sets come from TopBuzz, a Bytedance product, and other open sources. Multiple titles for every article in both the validation and test sets have been labeled by human editors.
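Because each validation and test article has several human-written titles, a generated title is naturally compared against all of them. The sketch below illustrates one way this could work; it is not the contest's official scorer. The metric (a word-level longest-common-subsequence F-measure), the choice of taking the best score over the references, and the names `lcs_f1`, `best_reference_score` and `mean_score` are all assumptions made for illustration. The `excluded_ids` parameter is likewise hypothetical and only mirrors the note that an article without content is left out of the final score.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_f1(candidate, reference):
    """LCS-based F-measure between two titles (assumed metric, not official)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_reference_score(candidate, references):
    """Score a generated title against several human titles; keep the best."""
    return max((lcs_f1(candidate, ref) for ref in references), default=0.0)


def mean_score(predictions, references, excluded_ids=()):
    """Average the per-article scores, skipping any excluded article ids.

    predictions: dict mapping article id -> generated title
    references:  dict mapping article id -> list of human-written titles
    """
    scored = [
        best_reference_score(predictions.get(article_id, ""), refs)
        for article_id, refs in references.items()
        if article_id not in excluded_ids
    ]
    return sum(scored) / len(scored) if scored else 0.0
```

Taking the maximum over references rewards a title that matches any one editor's phrasing; averaging over references would be an equally plausible design, and the official evaluation may differ from both.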


Discussion Board


All participants can discuss related topics on the discussion board[link], or email the organizers. We also have a WeChat group for this competition; to apply to join, please add WeChat ID shujujingsai and provide your real name and organization.


