rllm.datasets.TACM12KDataset
- class rllm.datasets.TACM12KDataset(cached_dir: str, force_reload: bool | None = False)
Bases: Dataset
TACM12KDataset is a multi-table relational dataset, as collected in the rLLM: Relational Table Learning with LLMs paper.
It includes four tables: papers, authors, citations, and writings. The papers table holds publication information for each paper. The authors table holds author information. The citations table records citation links between papers as <paper, paper> pairs. The writings table records <author, write, paper> relationships between authors and papers. The default task is to predict the conference of each paper.
- Parameters:
cached_dir (str) – Root directory where the dataset should be saved.
force_reload (bool, optional) – If set to True, the dataset is re-processed. Defaults to False.
Table1: papers
---------------
Statistics:
Name   Papers   Features
Size   12,499   5

Table2: authors
------------------
Statistics:
Name   Authors   Features
Size   17,431    3

Table3: citations
------------------
Statistics:
Name    Citations   Features
edges   30,789      2

Table4: writings
------------------
Statistics:
Name    Writings   Features
edges   37,055     2
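To make the relationship between the four tables concrete, here is a minimal pure-Python sketch that does not use rLLM. The rows, IDs, and field names (such as `conference` and `name`) are invented for illustration and are not the dataset's actual columns. The direction of each citation pair (citing paper first) is also an assumption.

```python
from collections import Counter

# Toy stand-ins for the four tables. All IDs and values are made up.
papers = {
    1: {"conference": "KDD"},
    2: {"conference": "SIGMOD"},
    3: {"conference": "KDD"},
}
authors = {
    10: {"name": "Author A"},
    11: {"name": "Author B"},
}
# citations: (citing paper, cited paper) pairs; direction is assumed here.
citations = [(1, 2), (3, 1), (3, 2)]
# writings: (author, paper) pairs linking the authors and papers tables.
writings = [(10, 1), (10, 3), (11, 2)]


def papers_by_author(author_id):
    """Return the sorted paper IDs an author wrote, via the writings table."""
    return sorted(p for a, p in writings if a == author_id)


def citation_counts():
    """Count how often each paper is cited, via the citations table."""
    return Counter(cited for _, cited in citations)


def author_conferences(author_id):
    """Count the conferences (the prediction target) of an author's papers."""
    return Counter(papers[p]["conference"] for p in papers_by_author(author_id))
```

In this toy setup, `papers_by_author(10)` follows the writings pairs to reach papers 1 and 3, and `author_conferences(10)` then reads the conference label of each. This is the kind of cross-table signal a model could draw on for the default task.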
- property processed_filenames
File names in self.processed_dir.
- property raw_filenames
File names in self.raw_dir.
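A minimal usage sketch, assuming the `rllm` package is installed and the dataset files can be obtained on first use. The helper name `load_tacm12k` and the `./data` directory are hypothetical. Only the constructor arguments and the two properties documented above are used.

```python
def load_tacm12k(cached_dir="./data", force_reload=False):
    """Build the dataset and return it with its raw and processed file names.

    `load_tacm12k` is a hypothetical helper, not part of rLLM.
    """
    # Imported inside the function so the snippet loads without rllm installed.
    from rllm.datasets import TACM12KDataset

    dataset = TACM12KDataset(cached_dir=cached_dir, force_reload=force_reload)
    # Both properties are documented on this page.
    return dataset, dataset.raw_filenames, dataset.processed_filenames
```

Calling `load_tacm12k("./data")` returns the dataset along with its file-name properties. Passing `force_reload=True` asks for the data to be processed again instead of reusing what is already under `cached_dir`.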