rllm.datasets.TML1MDataset

class rllm.datasets.TML1MDataset(cached_dir: str, force_reload: bool | None = False, transform=None)[source]

Bases: Dataset

TML1MDataset is a multi-table relational dataset containing three tables, as collected in the paper "rLLM: Relational Table Learning with LLMs".

The three tables are users, movies, and ratings. The users table holds information about users, such as gender and occupation. The movies table holds information about movies, such as duration and plot. The ratings table records the interactions between users and movies. Embeddings of the movies table, computed with the all-MiniLM-L6-v2 model, are also provided. The default task of this dataset is to predict each user's age.
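To make the relationship between the three tables concrete, here is a minimal sketch in plain Python. It does not use rllm, and every column name and row value in it is made up for illustration; the real tables have their own schemas. It shows how the ratings table links users to movies and how the users table supplies the label for the default age-prediction task.

```python
# Toy rows only: column names and values are hypothetical and are NOT taken
# from the real TML1M files.
users = {
    1: {"gender": "F", "occupation": "engineer", "age": 25},
    2: {"gender": "M", "occupation": "artist", "age": 35},
}
movies = {
    10: {"title": "Movie A"},
    11: {"title": "Movie B"},
}
# Each rating row links exactly one user to one movie.
ratings = [
    {"user_id": 1, "movie_id": 10, "rating": 5},
    {"user_id": 1, "movie_id": 11, "rating": 3},
    {"user_id": 2, "movie_id": 10, "rating": 4},
]


def rated_titles(user_id):
    """Follow the ratings table from a user to the movies they rated."""
    return [
        (movies[r["movie_id"]]["title"], r["rating"])
        for r in ratings
        if r["user_id"] == user_id
    ]


def age_examples():
    """Build (features, label) pairs where the label is the user's age."""
    examples = []
    for user_id, row in users.items():
        features = {
            "gender": row["gender"],
            "occupation": row["occupation"],
            "rated": rated_titles(user_id),
        }
        examples.append((features, row["age"]))
    return examples
```

In this sketch, the age column is held out of the features and used only as the target, which mirrors how a prediction task on the users table would typically be set up.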

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • force_reload (bool) – If set to True, the dataset will be processed again.

Table1: users
---------------
    Statistics:
    Name        Users       Features
    Size        6,040       5

Table2: movies
------------------
    Statistics:
    Name        Movies      Features
    Size        3,883       11

Table3: ratings
------------------
    Statistics:
    Name        Ratings     Features
    Size        1,000,209   4

download()[source]

Download the dataset files to self.raw_dir.

process()[source]

Process the data and save the result to ‘./cached_dir/{dataset}/processed/’.
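The interaction between the processed directory and the force_reload parameter can be sketched with the standard library alone. This is not the rllm implementation: the function name load_or_process, the file name data.json, and the placeholder payload are all assumptions made for illustration. Only the ‘{cached_dir}/{dataset}/processed/’ layout and the meaning of force_reload come from the description above.

```python
import json
import tempfile
from pathlib import Path


def load_or_process(cached_dir, dataset_name, force_reload=False):
    """Return (data, reprocessed) using a processed/ cache under cached_dir.

    Hypothetical helper: it illustrates the caching pattern only and is not
    part of rllm.
    """
    processed_dir = Path(cached_dir) / dataset_name / "processed"
    processed_file = processed_dir / "data.json"  # hypothetical file name

    # Reuse the cached result unless the caller asks for reprocessing.
    if processed_file.exists() and not force_reload:
        return json.loads(processed_file.read_text()), False

    # Stand-in for the real processing step.
    data = {"tables": ["users", "movies", "ratings"]}
    processed_dir.mkdir(parents=True, exist_ok=True)
    processed_file.write_text(json.dumps(data))
    return data, True


def demo():
    """Run three loads against a temporary cache and report which reprocessed."""
    with tempfile.TemporaryDirectory() as tmp:
        flags = []
        flags.append(load_or_process(tmp, "tml1m")[1])
        flags.append(load_or_process(tmp, "tml1m")[1])
        flags.append(load_or_process(tmp, "tml1m", force_reload=True)[1])
        return flags
```

In this sketch the first call processes and writes the cache, the second call reads it back, and the third call reprocesses because force_reload is set.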

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.