rllm.datasets.RelBenchDataset

class rllm.datasets.RelBenchDataset(root: str, force_reload: bool | None = False)

Bases: Dataset

Overrides Dataset methods for RelBench datasets.

Subclasses need to assign the following properties after processing:

self._task_dict: Dict[str, RelBenchTask]
self._table_dict: Dict[str, TableData]
self._hdata: HeteroGraphData
self._tabledata_stats_dict: Dict[str, Any]
self._table_meta_dict: Dict[str, RelBenchTableMeta]

download()

Download and unzip raw files.

property has_download

Whether the raw data has been downloaded.

property has_process

Whether the data has been processed.
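Flags like these are often implemented by checking that every expected file exists in the corresponding directory. The sketch below illustrates that pattern only; it is not the rllm implementation. The class `CachedDataset`, the helper `_all_exist`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


class CachedDataset:
    """Hypothetical stand-in used for illustration; not the rllm class."""

    # Assumed file names, chosen only for this example.
    raw_filenames = ["users.csv", "orders.csv"]
    processed_filenames = ["graph.pkl"]

    def __init__(self, root: str):
        self.raw_dir = Path(root) / "raw"
        self.processed_dir = Path(root) / "processed"

    @staticmethod
    def _all_exist(directory: Path, names: list) -> bool:
        # True only if every expected file is present in the directory.
        return all((directory / name).is_file() for name in names)

    @property
    def has_download(self) -> bool:
        return self._all_exist(self.raw_dir, self.raw_filenames)

    @property
    def has_process(self) -> bool:
        return self._all_exist(self.processed_dir, self.processed_filenames)


def demo_flags() -> tuple:
    """Return the (has_download, has_process) flags before and after 'downloading'."""
    with tempfile.TemporaryDirectory() as root:
        ds = CachedDataset(root)
        before = (ds.has_download, ds.has_process)
        # Simulate a download by creating the expected raw files.
        ds.raw_dir.mkdir(parents=True)
        for name in ds.raw_filenames:
            (ds.raw_dir / name).write_text("id\n")
        after = (ds.has_download, ds.has_process)
    return before, after
```

Under this pattern, a dataset can skip download() or process() whenever the matching flag is already true.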

load_all()

Force load all cached properties.
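To illustrate what forcing cached properties means, here is a minimal sketch built on `functools.cached_property` from the standard library. It assumes the lazy attributes are declared as cached properties on the class; whether rllm does this is not stated here. `LazyDataset` and its attributes are hypothetical.

```python
from functools import cached_property


class LazyDataset:
    """Hypothetical stand-in used for illustration; not the rllm class."""

    def __init__(self):
        self.load_count = 0  # counts how many times a property body runs

    @cached_property
    def table_dict(self) -> dict:
        self.load_count += 1
        return {"users": [{"id": 1}]}

    @cached_property
    def task_dict(self) -> dict:
        self.load_count += 1
        return {"churn": "binary classification"}

    def load_all(self) -> None:
        # Touch every cached_property declared directly on this class so
        # that its value is computed and stored on the instance.
        for name, attr in vars(type(self)).items():
            if isinstance(attr, cached_property):
                getattr(self, name)


ds = LazyDataset()
ds.load_all()  # both properties are computed here
ds.load_all()  # values are already cached, so nothing is recomputed
```

Eager loading of this kind trades start-up time for predictable access cost later, which can be useful before timing or multiprocessing code runs.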

make_pkey_fkey_graph() → Tuple[HeteroGraphData, Dict]

Build the primary key-foreign key (pkey-fkey) graph for the dataset.

This method lazily materializes each TableData, saves it to processed_dir, and constructs the HeteroGraphData from the pkey-fkey relations.

Returns:

A tuple of the heterogeneous graph data and a dict mapping table_name -> TableData.metadata.

Return type:

Tuple[HeteroGraphData, Dict]
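The idea behind a pkey-fkey graph can be sketched without any library: each row becomes a node, and each foreign-key value that matches a primary key in the referenced table becomes an edge. The code below is a simplified illustration using plain dicts, not the rllm implementation; the table names, columns, and the function `build_pkey_fkey_edges` are made up for the example.

```python
from collections import defaultdict

# Toy tables: each row is a dict. Names and columns are illustrative only.
tables = {
    "users": [{"user_id": 10}, {"user_id": 11}],
    "orders": [
        {"order_id": 1, "user_id": 10},
        {"order_id": 2, "user_id": 11},
        {"order_id": 3, "user_id": 10},
    ],
}
pkeys = {"users": "user_id", "orders": "order_id"}
# (child table, fkey column) -> parent table
fkeys = {("orders", "user_id"): "users"}


def build_pkey_fkey_edges(tables, pkeys, fkeys):
    """Return {(child, fkey_col, parent): [(child_row_idx, parent_row_idx), ...]}."""
    # Map each primary-key value to its row index (the node id) per table.
    pkey_index = {
        name: {row[pkeys[name]]: i for i, row in enumerate(rows)}
        for name, rows in tables.items()
    }
    edges = defaultdict(list)
    for (child, col), parent in fkeys.items():
        for i, row in enumerate(tables[child]):
            j = pkey_index[parent].get(row[col])
            if j is not None:  # skip references with no matching parent row
                edges[(child, col, parent)].append((i, j))
    return dict(edges)


edges = build_pkey_fkey_edges(tables, pkeys, fkeys)
```

Each key of `edges` plays the role of one edge type in a heterogeneous graph, and the index pairs are the edge list for that type.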

process()

Process the dataset and save the results to self.processed_dir.

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.

validate_dataset()

Validate the integrity of downloaded files:

1. Validate primary keys.
2. Validate foreign keys (correcting them if necessary).
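The two checks can be sketched on plain Python rows as follows. This is an illustration under stated assumptions, not the rllm implementation: primary keys are assumed valid when they are non-null and unique, and the "correction" for a foreign key with no matching parent row is assumed to be setting it to None. The function names and sample data are hypothetical.

```python
def validate_primary_keys(rows, pkey):
    """Return a list of problems: null or duplicate primary-key values."""
    problems, seen = [], set()
    for i, row in enumerate(rows):
        value = row.get(pkey)
        if value is None:
            problems.append(f"row {i}: null {pkey}")
        elif value in seen:
            problems.append(f"row {i}: duplicate {pkey}={value}")
        else:
            seen.add(value)
    return problems


def fix_foreign_keys(child_rows, fkey, parent_rows, parent_pkey):
    """Null out fkey values that match no parent row; return how many were fixed.

    Nulling is an assumed correction policy chosen for this sketch.
    """
    valid = {row[parent_pkey] for row in parent_rows}
    fixed = 0
    for row in child_rows:
        if row[fkey] is not None and row[fkey] not in valid:
            row[fkey] = None
            fixed += 1
    return fixed


# Sample data: one duplicate primary key and one dangling foreign key.
users = [{"user_id": 10}, {"user_id": 11}, {"user_id": 10}]
orders = [{"order_id": 1, "user_id": 10}, {"order_id": 2, "user_id": 99}]

pkey_problems = validate_primary_keys(users, "user_id")
n_fixed = fix_foreign_keys(orders, "user_id", users, "user_id")
```

Running the foreign-key check before graph construction means dangling references are resolved once, up front, instead of being handled at every edge lookup.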