MOTIVATION
The overarching goal of academic data mining is to deepen our understanding of the development, nature, and trends of science. It offers the potential to unlock enormous scientific, technological, and educational value. For example, deep mining of academic data can assist governments in formulating science policies, support companies in talent discovery, and help researchers acquire new knowledge more efficiently.
The landscape of academic data mining is rich with entity-centric applications, such as paper retrieval, expert finding, and venue recommendation. However, community efforts to advance academic graph mining have been severely limited by the lack of a suitable public benchmark. For KDD Cup 2024, we present the Open Academic Graph Challenge (OAG-Challenge), a collection of three realistic and challenging datasets for advancing the state of the art in academic graph mining.
Overview of OAG-Challenge
OAG-Challenge currently includes three tasks, each designed to evaluate a specific aspect of academic graph mining. Its guiding design principle is to include representative tasks that cover the life cycle of academic graph mining. First, we identify valuable and challenging tasks in the construction of academic graphs, such as author name disambiguation (AND). Then, building on the constructed academic graph, we include application tasks that go beyond the graph itself and study knowledge acquisition and cognitive impact, such as academic question answering (AQA) and paper source tracing (PST).
WhoIsWho-IND: Given the paper assignments of each author and paper metadata, the goal is to detect paper assignment errors for each author.
AQA: Given professional questions and a pool of candidate papers, the objective is to retrieve the papers most relevant to answering these questions (see the illustrative sketch after this list).
PST: Given the full texts of each paper, the goal is to automatically trace the most significant references that have inspired a given paper.
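To make the retrieval setting concrete, the snippet below is a minimal, hypothetical sketch of an AQA-style ranking baseline: represent the question and the candidate papers in a shared TF-IDF space and rank candidates by cosine similarity to the question. It is not the official baseline (see the baseline codes linked under UPDATES); it assumes scikit-learn is installed, and the toy question and candidate titles are invented for illustration.

# Minimal illustrative sketch of AQA-style retrieval (not the official baseline).
# Assumes scikit-learn; the question and candidate titles are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "How can graph neural networks be scaled to large academic graphs?"
candidate_papers = [
    "GraphSAGE: inductive representation learning on large graphs",
    "BERT: pre-training of deep bidirectional transformers for language understanding",
    "Cluster-GCN: an efficient algorithm for training deep and large graph convolutional networks",
]

# Embed the question and the candidate pool in one shared TF-IDF space,
# then rank candidates by cosine similarity to the question.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([question] + candidate_papers)
scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. score={scores[idx]:.3f}  {candidate_papers[idx]}")

Stronger solutions typically swap the TF-IDF space for dense embeddings from a pre-trained language model, but the rank-by-similarity structure stays the same.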
UPDATES
▪ April 29th, 2024: Slides for the official competition analyses are available at [Download Link].

▪ April 16th, 2024: Baseline code for all three tracks is available! [WhoIsWho-IND Codes] [AQA Codes] [PST Codes]

▪ April 3rd, 2024: GLM-4 API token recharge progress can be queried [here]. Should you encounter any issues, feel free to raise them on the Discussion Board of the respective competition's website.

▪ March 28th, 2024: GLM-4 API tokens are distributed every Thursday (no later than 23:59 AoE). We encourage you to submit your application as early as possible to ensure a smooth process. To be recharged successfully, please sign up at https://open.bigmodel.cn/ first.

▪ March 20th, 2024: The OAG-Challenge at KDD Cup 2024 has started!
TIMELINE
March 20th, 2024: Start of KDD Cup 2024
May 31st, 2024: Team Merge Deadline
May 31st, 2024: Test data released. All participants have 7 days to submit their results.
June 7th, 2024: All tracks end.
June 14th, 2024: Announcement of the KDD Cup winners.
RULES
Code and Report Submissions
For the winning solutions on the final leaderboard, we require public code submission through a GitHub repository. The repository should contain:
▪ All the code needed to reproduce your results (including data pre-processing and model training/inference) and to save the test submission.
▪ A README.md with all the instructions to run the code (from data pre-processing to model inference on test data).
In addition, we require a short technical report that describes your approach. The report can be either an arXiv link or a PDF uploaded to your GitHub repository.
Use of Large Language Models (LLMs) and APIs
For all tracks, the use of pre-trained models open-sourced before the end of the competition is allowed.
The WhoIsWho-IND track allows the use of APIs. After a valid submission to the validation set, participating teams can obtain a free quota of 1 million tokens for the GLM-4 API [How To Get].
Since the AQA dataset was collected from QA platforms, the AQA track does not allow the use of APIs.
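For reference, below is a minimal sketch of consuming the GLM-4 quota through the zhipuai Python SDK (v2). The API key and the prompt are placeholders, and the exact client interface may differ across SDK versions; the [How To Get] instructions and the official documentation at https://open.bigmodel.cn/ are authoritative.

# Minimal sketch of a GLM-4 chat call via the zhipuai Python SDK (v2).
# "your-api-key" is a placeholder; register at https://open.bigmodel.cn/ for a real key.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

# Toy prompt in the spirit of the WhoIsWho-IND track (invented for illustration).
response = client.chat.completions.create(
    model="glm-4",
    messages=[{
        "role": "user",
        "content": "Given an author's profile and a paper's metadata, "
                   "does the paper belong to this author? Answer yes or no.",
    }],
)
print(response.choices[0].message.content)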
AWARDS
A total of $10,000 in awards is allocated to each track.

▪ Gold Medal (1st Place): $3,000
▪ Silver Medal (2nd Place): $2,000
▪ Bronze Medal (3rd Place): $1,000
▪ Honorable Prizes (4th–11th Place): $500 per team
ORGANIZERS
Knowledge Engineering Group (KEG), Tsinghua University, and Zhipu AI
OAG-Challenge Team:
Fanjin Zhang (Tsinghua University), Shijie Shi (Zhipu AI), Kun Cao (Zhipu AI), Bo Chen (Tsinghua University)
Steering Committee (in alphabetical order):
Yuxiao Dong, Cho-Jui Hsieh, Steffen Staab, Yizhou Sun, Jie Tang
Reference
Details about our datasets and an initial baseline analysis are described in our OAG-Bench paper. If you use OAG-Challenge in your work, please cite our paper (BibTeX below).

@article{zhang2024oag,
  title={OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining},
  author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
  journal={arXiv preprint arXiv:2402.15810},
  year={2024}
}