rllm.datasets.TAPEDataset

class rllm.datasets.TAPEDataset(cached_dir: str, file_name: str, transform: Callable | None = None, use_text: bool | None = True, use_gpt: bool | None = True, use_preds: bool | None = True, topk: int | None = 5, force_reload: bool | None = False)

Bases: Dataset

Citation network datasets, including Cora and PubMed, collected from the paper "Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning".

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • file_name (str) – The name of the dataset, e.g., cora or pubmed.

  • transform (callable, optional) – A function/transform that takes in a GraphData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • use_text (bool) – If set to False, the original text will not be loaded. (default: True)

  • use_gpt (bool) – If set to False, GPT explanations will not be loaded. (default: True)

  • use_preds (bool) – If set to False, pseudo-labels annotated by GPT will not be loaded. (default: True)

  • topk (int) – The number k of top-ranked pseudo-labels to load. (default: 5)

  • force_reload (bool) – If set to True, the dataset will be re-processed. (default: False)
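The `transform` contract ("transformed before every access") can be illustrated with a small, library-independent sketch. `GraphData` below is a hypothetical stand-in (a plain frozen dataclass), not rllm's actual class, and `LazyTransformDataset` and `row_normalize` are hypothetical names showing one way a dataset might apply a transform on each access; this is not TAPEDataset's implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class GraphData:
    """Hypothetical stand-in for rllm's GraphData: node features and labels."""
    x: List[List[float]]
    y: List[int]


def row_normalize(data: GraphData) -> GraphData:
    """Example transform: scale each feature row to sum to 1 (all-zero rows unchanged)."""
    rows = []
    for row in data.x:
        total = sum(row)
        rows.append([v / total for v in row] if total else list(row))
    # Return a new object; the stored data is left untouched.
    return replace(data, x=rows)


class LazyTransformDataset:
    """Hypothetical dataset that stores raw data and applies `transform` on every access."""

    def __init__(self, data: GraphData,
                 transform: Optional[Callable[[GraphData], GraphData]] = None):
        self._data = data
        self.transform = transform

    def __getitem__(self, index: int) -> GraphData:
        if index != 0:
            raise IndexError("this sketch holds a single graph")
        # The transform runs on each access instead of once at load time.
        return self._data if self.transform is None else self.transform(self._data)


ds = LazyTransformDataset(GraphData(x=[[1.0, 3.0], [0.0, 0.0]], y=[0, 1]),
                          transform=row_normalize)
print(ds[0].x)  # [[0.25, 0.75], [0.0, 0.0]]
```

Applying the transform lazily keeps the cached data in its original form, so the same files on disk can serve different preprocessing choices.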

download()

Downloads the data from its source URL to ‘./cached_dir/{dataset}/raw/’.
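A minimal sketch of the check that typically precedes a download, assuming the common pattern in which files are fetched only when some expected raw file is missing, and assuming `cached_dir` is the root passed to the constructor. `raw_dir_for` and `missing_files` are hypothetical helpers, the file names are made up, and no network access is performed.

```python
import tempfile
from pathlib import Path
from typing import List


def raw_dir_for(cached_dir: str, dataset: str) -> Path:
    """Hypothetical helper: build the '<cached_dir>/<dataset>/raw' directory path."""
    return Path(cached_dir) / dataset / "raw"


def missing_files(directory: Path, filenames: List[str]) -> List[str]:
    """Return the expected file names that are not present in `directory`."""
    return [name for name in filenames if not (directory / name).exists()]


# Usage: only the files reported as missing would need to be downloaded.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = raw_dir_for(tmp, "cora")
    raw_dir.mkdir(parents=True)
    (raw_dir / "cora_a.pt").touch()  # hypothetical raw file name
    print(missing_files(raw_dir, ["cora_a.pt", "cora_b.pt"]))  # ['cora_b.pt']
```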

process()

Processes the data and saves it to ‘./cached_dir/{dataset}/processed/’.
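How `force_reload` might interact with the processed cache can be sketched as follows, assuming processing runs only when forced or when an expected processed file is absent. `should_process` and the file name `data.pt` are hypothetical; this illustrates the idea, not rllm's code.

```python
import tempfile
from pathlib import Path
from typing import List


def should_process(processed_dir: Path, processed_filenames: List[str],
                   force_reload: bool = False) -> bool:
    """Decide whether the processing step has to run.

    Processing runs when `force_reload` is True or when any expected
    processed file is absent from `processed_dir`.
    """
    if force_reload:
        return True
    return any(not (processed_dir / name).exists() for name in processed_filenames)


with tempfile.TemporaryDirectory() as tmp:
    processed_dir = Path(tmp) / "cora" / "processed"
    processed_dir.mkdir(parents=True)
    print(should_process(processed_dir, ["data.pt"]))  # True: file missing
    (processed_dir / "data.pt").touch()  # hypothetical processed file
    print(should_process(processed_dir, ["data.pt"]))  # False: cache is reused
    print(should_process(processed_dir, ["data.pt"], force_reload=True))  # True
```

Checking only for file existence keeps the check cheap; under this assumption, `force_reload=True` is the way to rebuild a stale cache.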

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.