rllm.datasets.TAGDataset

class rllm.datasets.TAGDataset(cached_dir: str, file_name: str, transform: Callable | None = None, use_cache: bool = True, force_reload: bool | None = False)

Bases: Dataset

Three text-attributed graph datasets: cora, from the paper Automating the Construction of Internet Portals; pubmed, from Collective Classification in Network Data; and citeseer, from CiteSeer: An Automatic Citation Indexing System. Each dataset also includes cached LLM predictions and confidences provided by the paper Label-free Node Classification on Graphs with Large Language Models (LLMs).

Parameters:

cached_dir (str) – Root directory where the dataset is saved.
file_name (str) – The name of the dataset, e.g., cora or pubmed.
transform (callable, optional) – A function/transform that takes in a GraphData object and returns a transformed version. The data object will be transformed before every access. (default: None)
use_cache (bool) – If set to False, the cached pseudo-labels annotated by GPT will not be loaded. (default: True)
force_reload (bool, optional) – If set to True, the dataset will be reprocessed. (default: False)
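The transform contract (a callable from GraphData to GraphData, applied on every access) can be illustrated without rllm itself. The sketch below is hypothetical: ToyGraph stands in for GraphData, ToyDataset stands in for the dataset's item access, and row_normalize is an example transform; none of them are part of rllm. It shows that the stored object is left unchanged and the transform runs each time an item is read.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyGraph:
    """Hypothetical stand-in for rllm's GraphData; holds only node features."""
    x: List[List[float]]


def row_normalize(g: ToyGraph) -> ToyGraph:
    """Example transform: scale each feature row to sum to 1 (all-zero rows are kept)."""
    rows = []
    for row in g.x:
        total = sum(row)
        rows.append([v / total for v in row] if total else list(row))
    # Return a new object instead of mutating the stored one.
    return replace(g, x=rows)


class ToyDataset:
    """Hypothetical wrapper illustrating 'transformed before every access'."""

    def __init__(self, graphs: List[ToyGraph],
                 transform: Optional[Callable[[ToyGraph], ToyGraph]] = None):
        self._graphs = list(graphs)
        self.transform = transform

    def __len__(self) -> int:
        return len(self._graphs)

    def __getitem__(self, idx: int) -> ToyGraph:
        g = self._graphs[idx]
        # The transform is applied lazily, on each access.
        return g if self.transform is None else self.transform(g)


ds = ToyDataset([ToyGraph(x=[[1.0, 3.0], [0.0, 0.0]])], transform=row_normalize)
print(ds[0].x)  # [[0.25, 0.75], [0.0, 0.0]]
```

Because the transform runs on access rather than at load time, swapping or removing it does not require reprocessing the cached data.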

download(): Downloads the data from the URL to ‘./cached_dir/{dataset}/raw/’.

process(): Processes the raw data and saves it to ‘./cached_dir/{dataset}/processed/’.
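Assuming the paths above are formed by joining cached_dir, the dataset name, and raw/ or processed/, the on-disk layout can be computed as in the sketch below. dataset_dirs is a hypothetical helper for illustration, not an rllm function.

```python
from pathlib import Path
from typing import Tuple


def dataset_dirs(cached_dir: str, dataset: str) -> Tuple[Path, Path]:
    """Hypothetical helper: return (raw_dir, processed_dir) for one dataset."""
    root = Path(cached_dir) / dataset
    return root / "raw", root / "processed"


raw_dir, processed_dir = dataset_dirs("./cached_dir", "cora")
print(raw_dir.as_posix())        # cached_dir/cora/raw
print(processed_dir.as_posix())  # cached_dir/cora/processed
```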

property processed_filenames: The file names in self.processed_dir.

property raw_filenames: The file names in self.raw_dir.
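Many dataset base classes use the processed file names to decide whether process() has to run. Assuming TAGDataset follows that convention, the check can be sketched as below; needs_processing and the file name data.pt are hypothetical, and the force_reload argument mirrors the constructor parameter of the same name.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def needs_processing(processed_dir: Path, processed_filenames: Iterable[str],
                     force_reload: bool = False) -> bool:
    """Hypothetical check: reprocess when forced or when any processed file is missing."""
    if force_reload:
        return True
    return not all((processed_dir / name).exists() for name in processed_filenames)


with tempfile.TemporaryDirectory() as tmp:
    processed_dir = Path(tmp)
    print(needs_processing(processed_dir, ["data.pt"]))  # True: file is missing
    (processed_dir / "data.pt").touch()
    print(needs_processing(processed_dir, ["data.pt"]))  # False: file is present
    print(needs_processing(processed_dir, ["data.pt"], force_reload=True))  # True: forced
```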