rllm.datasets.TML1MDataset
- class rllm.datasets.TML1MDataset(cached_dir: str, force_reload: bool | None = False, transform=None)
Bases: Dataset
TML1MDataset is a multi-table relational dataset collected in the paper "rLLM: Relational Table Learning with LLMs".
It consists of three tables: users, movies and ratings. The users table holds user attributes such as gender and occupation. The movies table holds movie attributes such as duration and plot. The ratings table records the interactions between users and movies, linking the other two tables. Embeddings of the movies table, computed with the all-MiniLM-L6-v2 model, are also provided. The default task is to predict each user's age.
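The sketch below uses hypothetical toy rows (not the real dataset) to show how the ratings table links users to movies, and how the age target could be separated from the user features. The column names, the helpers movies_rated_by and split_age_target, and the assumption that age is stored as a column of the users table are all illustrative, not part of the rLLM API.

```python
from collections import defaultdict

# Hypothetical toy rows: column names and values are illustrative only,
# not TML1MDataset's actual schema or data.
users = [
    {"user_id": 1, "gender": "F", "occupation": "artist", "age": 25},
    {"user_id": 2, "gender": "M", "occupation": "engineer", "age": 35},
]
movies = [
    {"movie_id": 10, "title": "Movie A", "duration": 120},
    {"movie_id": 20, "title": "Movie B", "duration": 95},
]
# Each ratings row links one user to one movie.
ratings = [
    {"user_id": 1, "movie_id": 10, "rating": 5},
    {"user_id": 1, "movie_id": 20, "rating": 3},
    {"user_id": 2, "movie_id": 20, "rating": 4},
]


def movies_rated_by(users, movies, ratings):
    """Follow the ratings table from each user to the movies they rated."""
    movie_by_id = {m["movie_id"]: m for m in movies}
    rated = defaultdict(list)
    for r in ratings:
        movie = movie_by_id[r["movie_id"]]
        rated[r["user_id"]].append((movie["title"], r["rating"]))
    # Users with no ratings map to an empty list.
    return {u["user_id"]: rated[u["user_id"]] for u in users}


def split_age_target(users):
    """Separate user features from the age label (assumed to be a users column)."""
    features = [{k: v for k, v in u.items() if k != "age"} for u in users]
    labels = [u["age"] for u in users]
    return features, labels


links = movies_rated_by(users, movies, ratings)
features, labels = split_age_target(users)
```

In a relational learning setup, the rated movies (and their embeddings) reachable through the ratings table can serve as extra context for each user when predicting age.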
- Parameters:
cached_dir (str) – Root directory where the dataset should be saved.
force_reload (bool) – If set to True, the dataset is processed again.
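As a rough illustration of what a flag like force_reload typically controls, here is a minimal cache-or-reprocess sketch. The function load_or_process, the file name processed.json and the JSON format are hypothetical; this shows the general pattern, not rLLM's implementation.

```python
import json
import os
import tempfile


def load_or_process(cached_dir, process_fn, force_reload=False,
                    filename="processed.json"):
    """Return processed data, re-running process_fn only when needed.

    Hypothetical helper: filename and the JSON format are placeholders.
    """
    path = os.path.join(cached_dir, filename)
    if os.path.exists(path) and not force_reload:
        with open(path) as f:  # reuse the cached result
            return json.load(f)
    data = process_fn()  # process (or re-process) the raw data
    os.makedirs(cached_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return data


calls = []


def process():
    calls.append(1)  # count how often processing actually runs
    return {"rows": 6040}


with tempfile.TemporaryDirectory() as d:
    load_or_process(d, process)                     # no cache yet: processes
    load_or_process(d, process)                     # cache hit: no processing
    load_or_process(d, process, force_reload=True)  # forced re-processing
print(len(calls))  # 2
```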
Table statistics:
Table     Rows        Features
users     6,040       5
movies    3,883       11
ratings   1,000,209   4
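The row counts above also give a sense of how sparse the user-movie interactions are. The short computation below derives the density of the rating matrix and the mean number of ratings per user and per movie, assuming each ratings row is a distinct (user, movie) pair; the helper name interaction_stats is illustrative.

```python
# Row counts taken from the table statistics.
NUM_USERS = 6_040
NUM_MOVIES = 3_883
NUM_RATINGS = 1_000_209


def interaction_stats(num_users, num_movies, num_ratings):
    """Density of the user-movie rating matrix and mean ratings per user/movie.

    Assumes every rating is a distinct (user, movie) pair.
    """
    density = num_ratings / (num_users * num_movies)
    return {
        "density": density,
        "ratings_per_user": num_ratings / num_users,
        "ratings_per_movie": num_ratings / num_movies,
    }


stats = interaction_stats(NUM_USERS, NUM_MOVIES, NUM_RATINGS)
print(f"{stats['density']:.2%}")  # 4.26%
```

So only about 4.26% of all possible user-movie pairs carry a rating, with roughly 165.6 ratings per user and 257.6 per movie on average.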
- property processed_filenames
File names in self.processed_dir.
- property raw_filenames
File names in self.raw_dir.
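Because these properties return bare file names rather than full paths, they are meant to be joined with the matching directory. The class below is a hypothetical stand-in showing that pairing; the sub-directory names, the file names and the processed_paths and is_processed helpers are placeholders, not the values or methods of TML1MDataset.

```python
import os


class CachedDatasetSketch:
    """Hypothetical stand-in: *_filenames properties paired with directories."""

    def __init__(self, cached_dir):
        self.cached_dir = cached_dir

    @property
    def raw_dir(self):
        return os.path.join(self.cached_dir, "raw")  # placeholder layout

    @property
    def processed_dir(self):
        return os.path.join(self.cached_dir, "processed")  # placeholder layout

    @property
    def raw_filenames(self):
        return ["users.csv", "movies.csv", "ratings.csv"]  # placeholder names

    @property
    def processed_filenames(self):
        return ["data.pkl"]  # placeholder name

    @property
    def processed_paths(self):
        # Bare file names become full paths only after joining with the directory.
        return [os.path.join(self.processed_dir, f) for f in self.processed_filenames]

    def is_processed(self):
        # Processing can be skipped when every expected processed file exists.
        return all(os.path.exists(p) for p in self.processed_paths)
```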