rllm.datasets.TACM12KDataset

class rllm.datasets.TACM12KDataset(cached_dir: str, force_reload: bool | None = False)[source]

Bases: Dataset

TACM12KDataset is a multi-table relational dataset containing four tables, as collected in the paper "rLLM: Relational Table Learning with LLMs".

The tables are papers, authors, citations, and writings. The papers table holds publication information for each paper. The authors table holds author information. The citations table records citation relationships between papers, i.e., <paper, paper> pairs. The writings table records <author, write, paper> relationships between authors and papers. The default task is to predict the conference of each paper.
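
To make the relational structure concrete, here is a minimal sketch using plain Python containers. The rows, the column names (`title`, `conference`, `name`), and the direction of each citation pair are illustrative assumptions, not the dataset's actual schema.

```python
# Toy rows only; the column names below are hypothetical, not the real schema.
papers = {
    0: {"title": "Paper A", "conference": "conf_x"},
    1: {"title": "Paper B", "conference": "conf_y"},
    2: {"title": "Paper C", "conference": "conf_x"},
}
authors = {
    10: {"name": "Author P"},
    11: {"name": "Author Q"},
}
# citations: <paper, paper> pairs, assumed here to mean (citing id, cited id)
citations = [(1, 0), (2, 0), (2, 1)]
# writings: <author, write, paper> links, stored here as (author id, paper id)
writings = [(10, 0), (10, 2), (11, 1), (11, 2)]


def papers_by_author(author_id):
    """Follow the writings relation from one author to their papers."""
    return sorted(p for a, p in writings if a == author_id)


def cited_by(paper_id):
    """Follow the citations relation to find papers citing paper_id."""
    return sorted(src for src, dst in citations if dst == paper_id)


# The default task corresponds to predicting this value for each paper.
labels = {pid: row["conference"] for pid, row in papers.items()}
```

The two relation tables carry no features of their own in this sketch; they only link rows of the papers and authors tables, which is what makes the dataset multi-table rather than a single flat table.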

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • force_reload (bool, optional) – If set to True, the dataset is processed again. Defaults to False.
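
A minimal construction sketch based on the signature shown above. The helper name `load_tacm12k` and the `./data` directory are hypothetical; the import is done lazily so the helper can be defined even where rllm is not installed.

```python
def load_tacm12k(cached_dir="./data", force_reload=False):
    """Build the dataset using the two documented constructor arguments.

    `cached_dir` defaults to a hypothetical "./data" directory.
    """
    try:
        from rllm.datasets import TACM12KDataset
    except ImportError as exc:
        raise RuntimeError("rllm must be installed to load TACM12KDataset") from exc
    return TACM12KDataset(cached_dir=cached_dir, force_reload=force_reload)


# Example call; the first use may download and process the data:
# dataset = load_tacm12k("./data")
# Pass force_reload=True to process the data again:
# dataset = load_tacm12k("./data", force_reload=True)
```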

Table 1: papers
---------------
    Statistics:
    Name        Papers      Features
    Size        12,499      5

Table 2: authors
----------------
    Statistics:
    Name        Authors     Features
    Size        17,431      3

Table 3: citations
------------------
    Statistics:
    Name        Citations   Features
    Edges       30,789      2

Table 4: writings
-----------------
    Statistics:
    Name        Writings    Features
    Edges       37,055      2
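
The counts in the table statistics above can be combined into simple density figures. This is a small arithmetic sketch; the names `TABLE_ROWS` and `per_paper` are illustrative and not part of the library.

```python
# Row counts copied from the table statistics above.
TABLE_ROWS = {
    "papers": 12_499,
    "authors": 17_431,
    "citations": 30_789,
    "writings": 37_055,
}


def per_paper(relation):
    """Average number of rows in `relation` per paper."""
    return TABLE_ROWS[relation] / TABLE_ROWS["papers"]


avg_citations = per_paper("citations")  # citation edges per paper
avg_writings = per_paper("writings")    # author-paper links per paper
print(f"citations per paper: {avg_citations:.2f}")
print(f"writings per paper:  {avg_writings:.2f}")
```

On these counts the averages come out to roughly 2.46 citation edges and 2.96 author-paper links per paper.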

download()[source]

Download the dataset to self.raw_dir.

process()[source]

Process the data and save it to './cached_dir/{dataset}/processed/'.

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.