rllm.datasets.TLF2KDataset

class rllm.datasets.TLF2KDataset(cached_dir: str, force_reload: bool | None = False)

Bases: Dataset

TLF2KDataset is a multi-table relational dataset containing 3 tables, as collected in the rLLM: Relational Table Learning with LLMs paper.

The three tables are artists, user_artists and user_friends. The artists table includes information about artists, such as location and genre. The user_artists table records user-artist interactions in the format [user, artist, listening_count]. The user_friends table represents bi-directional friendships between users. The default task of this dataset is to predict artists' genres.
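To make the relational structure concrete, here is a minimal Python sketch over hypothetical toy rows (not taken from TLF2K). It assumes interaction rows are tuples in the [user, artist, listening_count] order described above and that friendship rows are (user, friend) pairs. The helper names `artist_listeners` and `friendship_adjacency` are illustrative and are not part of rLLM.

```python
from collections import defaultdict

# Hypothetical toy rows; not real TLF2K data.
user_artists = [  # (user, artist, listening_count)
    (1, 10, 120),
    (1, 11, 30),
    (2, 10, 55),
]
user_friends = [  # (user, friend); each pair is one friendship
    (1, 2),
    (2, 3),
]


def artist_listeners(rows):
    """Map each artist to {user: listening_count}."""
    listeners = defaultdict(dict)
    for user, artist, count in rows:
        listeners[artist][user] = count
    return dict(listeners)


def friendship_adjacency(pairs):
    """Build a symmetric adjacency list, since friendship is bi-directional."""
    adj = defaultdict(set)
    for u, v in pairs:
        adj[u].add(v)  # u -> v
        adj[v].add(u)  # v -> u
    return {u: sorted(vs) for u, vs in adj.items()}


print(artist_listeners(user_artists))      # {10: {1: 120, 2: 55}, 11: {1: 30}}
print(friendship_adjacency(user_friends))  # {1: [2], 2: [1, 3], 3: [2]}
```

Because friendship is bi-directional, `friendship_adjacency` adds each pair in both directions; if the input already lists a pair in both orders, the set removes the duplicate.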

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • force_reload (bool, optional) – If set to True, the dataset is re-processed even if processed files already exist. Default: False.

Table1: artists
---------------
    Statistics:
    Name            Rows       Features
    artists         9,047      10

Table2: user_artists
--------------------
    Statistics:
    Name            Rows       Features
    user_artists    80,009     3

Table3: user_friends
--------------------
    Statistics:
    Name            Rows       Features
    user_friends    12,717     2

download()

Download the dataset files to self.raw_dir.

process()

Process the raw data and save the result to ‘./cached_dir/{dataset}/processed/’.

property processed_filenames

File names expected in self.processed_dir.

property raw_filenames

File names expected in self.raw_dir.
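download(), process(), raw_filenames and processed_filenames fit a common cache-then-load pattern that resembles PyTorch Geometric's Dataset. The sketch below shows one way such a lifecycle can work, including the effect of force_reload. It is not rLLM's implementation: the class name `CachedDataset`, the `raw/` subdirectory, the file names and the `calls` log are all hypothetical, and the download and processing steps are local stand-ins.

```python
import os
import tempfile


class CachedDataset:
    """Hypothetical sketch of a download/process caching lifecycle (not rLLM's code)."""

    def __init__(self, cached_dir: str, name: str, force_reload: bool = False):
        # Assumed layout: {cached_dir}/{name}/raw and {cached_dir}/{name}/processed
        self.raw_dir = os.path.join(cached_dir, name, "raw")
        self.processed_dir = os.path.join(cached_dir, name, "processed")
        self.calls = []  # records which steps ran, for illustration only

        # Download only when some raw file is missing.
        if not self._all_exist(self.raw_dir, self.raw_filenames):
            os.makedirs(self.raw_dir, exist_ok=True)
            self.download()

        # Process when forced, or when some processed file is missing.
        if force_reload or not self._all_exist(self.processed_dir, self.processed_filenames):
            os.makedirs(self.processed_dir, exist_ok=True)
            self.process()

    @property
    def raw_filenames(self):
        return ["interactions.csv"]  # hypothetical file name

    @property
    def processed_filenames(self):
        return ["row_count.txt"]  # hypothetical file name

    @staticmethod
    def _all_exist(directory, filenames):
        return all(os.path.exists(os.path.join(directory, f)) for f in filenames)

    def download(self):
        # Stand-in for a network download: write a tiny CSV into raw_dir.
        self.calls.append("download")
        with open(os.path.join(self.raw_dir, "interactions.csv"), "w") as f:
            f.write("user,artist,listening_count\n1,10,120\n2,10,55\n")

    def process(self):
        # Stand-in for real processing: count data rows and cache the result.
        self.calls.append("process")
        with open(os.path.join(self.raw_dir, "interactions.csv")) as src:
            n_rows = len(src.read().splitlines()) - 1  # exclude the header line
        with open(os.path.join(self.processed_dir, "row_count.txt"), "w") as dst:
            dst.write(str(n_rows))


with tempfile.TemporaryDirectory() as root:
    print(CachedDataset(root, "tlf2k").calls)                     # ['download', 'process']
    print(CachedDataset(root, "tlf2k").calls)                     # []
    print(CachedDataset(root, "tlf2k", force_reload=True).calls)  # ['process']
```

Each step is skipped when its outputs already exist, so constructing the dataset a second time is cheap. In this sketch force_reload bypasses only the processing check, which is an assumption about how the flag behaves.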