rllm.datasets

Graph Datasets

Heterogeneous Graph Datasets

DBLP

DBLP is a heterogeneous graph containing four types of entities, as collected in the MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding paper.

IMDB

IMDB is a heterogeneous graph containing three types of entities, as collected in the MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding paper.
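
To make "heterogeneous graph" concrete, the following is a minimal sketch of a graph with three node types, stored with the standard library only. It does not use the rllm API. The node names, edge type names, and the `metapath_neighbors` helper are all hypothetical and chosen for illustration. The helper pairs up nodes that share a neighbor of another type, which is the idea behind metapaths such as movie-director-movie.

```python
from collections import defaultdict

# Toy heterogeneous graph. All node and edge type names are illustrative
# assumptions, not the schema used by the rllm datasets.
nodes = {
    "movie": ["m0", "m1", "m2"],
    "director": ["d0", "d1"],
    "actor": ["a0", "a1"],
}

# Edges are grouped by (source type, relation, target type).
edges = {
    ("movie", "directed_by", "director"): [("m0", "d0"), ("m1", "d0"), ("m2", "d1")],
    ("movie", "features", "actor"): [("m0", "a0"), ("m1", "a1"), ("m2", "a1")],
}


def metapath_neighbors(edge_list):
    """Return sorted pairs of source nodes that share a target node."""
    by_target = defaultdict(list)
    for src, dst in edge_list:
        by_target[dst].append(src)
    pairs = set()
    for sources in by_target.values():
        for i, u in enumerate(sources):
            for v in sources[i + 1:]:
                pairs.add((u, v))
    return sorted(pairs)


# Movies connected through a shared director (movie-director-movie).
mdm = metapath_neighbors(edges[("movie", "directed_by", "director")])
# Movies connected through a shared actor (movie-actor-movie).
mam = metapath_neighbors(edges[("movie", "features", "actor")])
print(mdm, mam)
```

Keying edges by a type triple keeps each relation separate, so a model can treat each relation differently instead of collapsing everything into one adjacency list.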

Homogeneous Graph Datasets

PlanetoidDataset

The citation network datasets from the Revisiting Semi-Supervised Learning with Graph Embeddings paper, which include "Cora", "CiteSeer" and "PubMed".

TAPEDataset

The citation network datasets "cora" and "pubmed", as collected in the Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning paper.

TAGDataset

Three text-attributed graph datasets: cora, from the Automating the Construction of Internet Portals paper; pubmed, from the Collective Classification in Network Data paper; and citeseer, from the CiteSeer: an automatic citation indexing system paper.

Table Datasets

Single Table Datasets

Titanic

The Titanic dataset is a widely used dataset for machine learning and statistical analysis, as featured in the Titanic: Machine Learning from Disaster competition on Kaggle.

Adult

The Adult dataset comes from a classic data mining project and was extracted from the 1994 Census database.

BankMarketing

The Bank Marketing dataset is related to direct marketing campaigns of a Portuguese banking institution.

ChurnModelling

The Churn Modelling dataset is used to predict which customers are likely to churn from an organization, based on various customer attributes, with machine learning and deep learning techniques.

Multi-Table Datasets

TACM12KDataset

TACM12KDataset is a multi-table relational dataset containing 4 tables, as collected in the rLLM: Relational Table Learning with LLMs paper.

TLF2KDataset

TLF2KDataset is a multi-table relational dataset containing 3 tables, as collected in the rLLM: Relational Table Learning with LLMs paper.

TML1MDataset

TML1MDataset is a multi-table relational dataset containing 3 tables, as collected in the rLLM: Relational Table Learning with LLMs paper.
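
As a rough illustration of what "multi-table relational dataset" means, the sketch below builds three toy tables linked by foreign keys and joins them with the standard library's `sqlite3` module. It does not use the rllm API. The table names, column names, and values are hypothetical and are not taken from the datasets listed above.

```python
import sqlite3

# In-memory database holding three toy tables. The schema is an
# illustrative assumption, not the schema of any rllm dataset.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (
        user_id INTEGER REFERENCES users(user_id),
        movie_id INTEGER REFERENCES movies(movie_id),
        rating INTEGER
    );
    """
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 25), (2, 40)])
conn.executemany("INSERT INTO movies VALUES (?, ?)", [(10, "A"), (11, "B")])
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 10, 5), (2, 10, 3), (2, 11, 4)],
)

# Follow the foreign key from ratings to movies and aggregate per movie.
rows = conn.execute(
    """
    SELECT m.title, AVG(r.rating), COUNT(*)
    FROM ratings AS r
    JOIN movies AS m ON r.movie_id = m.movie_id
    GROUP BY m.title
    ORDER BY m.title
    """
).fetchall()
print(rows)
conn.close()
```

The relation table (`ratings` here) carries the foreign keys, so information about a row in one table can be enriched by following those keys into the others.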

RelBenchDataset

Override methods for RelBench datasets.

RelF1Dataset

A wrapper for the rel-f1 dataset in the RelBench benchmark, from the RelBench: A Benchmark for Deep Learning on Relational Databases paper. It contains Formula 1 racing data with 9 tables and 3 tasks.

RelBenchTask

RelBenchTaskType

An enumeration.
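
For readers unfamiliar with Python enumerations, here is a minimal sketch of how a task-type enumeration is typically defined and used. The class `ToyTaskType`, its members, and the `is_classification` helper are hypothetical; the actual members of `RelBenchTaskType` are not listed on this page and may differ.

```python
from enum import Enum


class ToyTaskType(Enum):
    """Hypothetical task types; not the members of RelBenchTaskType."""

    REGRESSION = "regression"
    BINARY_CLASSIFICATION = "binary_classification"


def is_classification(task_type):
    """Branch on the enum member instead of comparing raw strings."""
    return task_type is ToyTaskType.BINARY_CLASSIFICATION


# Members can be looked up by value and compared by identity.
parsed = ToyTaskType("regression")
print(parsed, is_classification(parsed))
```

Using an enumeration rather than free-form strings means a misspelled task type fails at lookup time with a `ValueError` instead of silently taking the wrong branch.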

RelBenchTableMeta