The overarching goal of academic data mining is to deepen our understanding of the development, nature, and trends of science. It has the potential to unlock enormous scientific, technological, and educational value. For example, deep mining of academic data can assist governments in making science policy, support companies in talent discovery, and help researchers acquire new knowledge more efficiently.
The landscape of academic data mining is rich with entity-centric applications, such as paper retrieval, expert finding, and venue recommendation. However, community efforts to advance academic graph mining have been severely limited by the lack of a suitable public benchmark. For KDD Cup 2024, we present the Open Academic Graph Challenge (OAG-Challenge), a collection of three realistic and challenging datasets for advancing the state of the art in academic graph mining.
Overview of OAG-Challenge
OAG-Challenge currently includes three tasks, each designed to evaluate a specific aspect of academic graph mining. As a design principle, we aim to include representative tasks that cover the life cycle of academic graph mining. First, we identify valuable and challenging tasks in the construction of academic graphs, such as author name disambiguation (AND). Then, powered by the academic graph, academic applications explore tasks beyond the graph itself and study knowledge acquisition and cognitive impact, such as academic question answering (AQA) and paper source tracing (PST).
WhoIsWho-IND: Given the paper assignments of each author and paper metadata, the goal is to detect paper assignment errors for each author.
AQA: Given professional questions and a pool of candidate papers, the objective is to retrieve the most relevant papers to answer these questions.
PST: Given the full texts of each paper, the goal is to automatically trace the most significant references that have inspired a given paper.
Submission Guidelines
The objective of this workshop is to discuss the winning solutions of OAG-Challenge at KDD Cup 2024. Submissions are single-blind (author names and affiliations should be listed). All participants ranked in the Top 11 of the leaderboard are guaranteed an opportunity for an in-person oral or poster presentation. Other submissions will be evaluated by a committee based on their novelty and insights.
Important Dates:
Full Paper Submission Deadline: July 20, 2024 (Anywhere on Earth time).
Notification of Acceptance: August 1, 2024.

Please note that the KDD Cup workshop will not have formal proceedings. Authors retain full rights to submit or publish their papers at other venues.

Submission Website:
Submission Requirements
Format: Submissions must be in PDF format.
▪ Submissions for each task: Maximum of 4 pages (including all content and references).
Note: Teams winning in multiple tracks are required to submit a separate report for each track.

Templates: Please use the ACM Conference templates (two-column format). One recommended setting for LaTeX files is: \documentclass[sigconf, review]{acmart}. Template guidelines are here:

Reproducibility Supplement: Authors may include an optional one-page supplement focused on reproducibility at the end of their submitted paper. This page must be part of the same PDF file.
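
As an illustration of the template and supplement requirements above, a report skeleton might look like the following minimal sketch. Only the \documentclass line comes from the guidelines; the title, author details, section names, and the references.bib file name are placeholders chosen for this example, not requirements from the organizers.

```latex
% Minimal sketch of a report skeleton. Everything except the
% \documentclass line is a placeholder (assumption), not an official template.
\documentclass[sigconf, review]{acmart}

\begin{document}

\title{Placeholder Title: Our Solution to an OAG-Challenge Track}

% Single-blind submission: author names and affiliations are listed.
\author{First Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}
\email{first.author@example.org}

% In acmart, the abstract is declared before \maketitle.
\begin{abstract}
One-paragraph summary of the approach.
\end{abstract}

\maketitle

\section{Method}
Main content (at most 4 pages including references).

\bibliographystyle{ACM-Reference-Format}
\bibliography{references} % assumes a references.bib file next to the .tex file

% Optional one-page reproducibility supplement, kept in the same PDF.
\appendix
\section{Reproducibility}
Environment, commands, and hyperparameters needed to reproduce the results.

\end{document}
```

Keeping the supplement as an appendix in the same source file is one simple way to make sure it ends up in the same PDF as the main report.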

Author Information: After the submission deadline, the names and order of authors cannot be changed.

We would appreciate it if you could cite our dataset paper, available on arXiv.

  title={OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining},
  author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},

Contact Information: For any questions, please contact: .
▪ To be eligible for possible awards, winners who place in the Top 15 on the TEST leaderboard must open-source their solutions. The public GitHub repository must include all code necessary to reproduce the results. Placeholder repositories will not be accepted. The submission deadline is 23:59 June 12, 2024, AOE Time. Please reply with your (Track, Team Name, Rank, GitHub URL) in these threads.
The authors are responsible for addressing any inquiry about their code.
Please add a README that provides sufficient information, including:
  Instructions: Detailed steps and exact commands required to reproduce the submitted result.
  Method Introduction: A brief overview of your methods.
  Good examples are [WhoIsWho-IND Codes] [AQA Codes] [PST Codes].

▪ May 30th, 2024
  1. Please make sure to accurately fill in the parameter count and GPU memory of your submissions for the validation and test set leaderboards; otherwise, your team's final ranking will be affected.
  2. The deadline for applying for the GLM-4 API Token is 23:59 June 5, 2024, AOE Time. Teams needing to apply should submit their application through

▪ April 29th, 2024. The slides of official competition analyses are available at [Download Link].

▪ April 16th, 2024. Baseline code for all three tracks is available! [WhoIsWho-IND Codes] [AQA Codes] [PST Codes]

▪ April 3rd, 2024: The GLM-4 API token recharge progress can be queried [here]. Should you encounter any issues, feel free to raise them on the Discussion Board of the respective competition's website.

▪ March 28th, 2024: GLM-4 API tokens are distributed every Thursday (no later than 23:59 AOE Time). We encourage you to submit your application as early as possible to ensure a smooth process. To recharge successfully, please sign up first.

▪ March 20th, 2024: OAG-Challenge at KDD Cup 2024 started!
March 20th, 2024: Start of KDD Cup 2024
May 31st, 2024: Team Merge Deadline
May 31st, 2024: Release test data. All participants have 7 days to submit their results.
June 7th, 2024: All tracks end.
June 14th, 2024: Announcement of the KDD Cup winner.
Code and Report Submissions
For the winning solutions on the final leaderboard, we require public code submission through a GitHub repository. The repository should contain:
▪ All the code needed to reproduce your results (including data pre-processing and model training/inference) and to save the test submission.
▪ A README that contains all the instructions to run the code (from data pre-processing to model inference on test data).
In addition, we require a short technical report that describes your approach. The report link can point either to arXiv or to a PDF uploaded to your GitHub repository.
Use of Large Language Models (LLMs) and API
For all tracks, pre-trained models that were open-sourced before the end of the competition may be used.
WhoIsWho-IND and PST allow the use of APIs. After a valid submission to the validation set, participating teams can obtain a free quota of 1 million tokens for the GLM-4 API [How To Get].
Since the AQA dataset was collected from QA platforms, the AQA task does not allow the use of APIs.
Each track is allocated $10,000 in awards.

▪ Gold Medal (1st Place): $3,000
▪ Silver Medal (2nd Place): $2,000
▪ Bronze Medal (3rd Place): $1,000
▪ Honorable Prizes (4th – 11th Place): $500 per team
Tsinghua University, Knowledge Engineering Group (KEG) and Zhipu AI
OAG-Challenge Team:
Fanjin Zhang (Tsinghua University), Shijie Shi (Zhipu AI), Kun Cao (Zhipu AI), Bo Chen (Tsinghua University)
Steering Committee (in alphabetical order):
Yuxiao Dong, Cho-Jui Hsieh, Jie Tang, Steffen Staab, Yizhou Sun
Details about our datasets and initial baseline analysis are described in our OAG-Bench paper. If you use OAG-Challenge in your work, please cite our paper. (Bibtex below)

    title={OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining},
    author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
    journal={arXiv preprint arXiv:2402.15810},