rllm.datasets.IMDB

class rllm.datasets.IMDB(cached_dir: str, transform: Callable | None = None, force_reload: bool | None = False)

Bases: Dataset

IMDB is a heterogeneous movie graph with three node types (movies, actors, and directors), as collected in the paper "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding".

The movies are divided into three classes (action, comedy, and drama) according to their genre. Each movie's features are the elements of a bag-of-words representation of its plot keywords.
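To make "bag-of-words representation of plot keywords" concrete, the sketch below builds binary (presence/absence) feature vectors from keyword lists. The keywords and vocabulary are made-up examples, not IMDB data, and the exact encoding used by the dataset (binary vs. counts, vocabulary order) is an assumption here.

```python
# Minimal sketch: binary bag-of-words features from plot keywords.
# The keyword lists are hypothetical; the real dataset's encoding may differ.
from typing import Dict, List


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct keyword to a column index (sorted for determinism)."""
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words)}


def bag_of_words(docs: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Return one 0/1 row per movie; column j is 1 if vocab word j occurs."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for w in doc:
            if w in vocab:  # keywords outside the vocabulary are ignored
                row[vocab[w]] = 1
        rows.append(row)
    return rows


plot_keywords = [
    ["explosion", "car chase", "hero"],  # hypothetical action movie
    ["wedding", "misunderstanding"],     # hypothetical comedy
]
vocab = build_vocab(plot_keywords)
features = bag_of_words(plot_keywords, vocab)
print(features)  # [[1, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
```

Each movie becomes one row of a sparse movie-by-keyword matrix; the graph structure (actor and director links) is stored separately from these node features.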

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a HeteroGraphData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • force_reload (bool, optional) – If set to True, the dataset will be re-processed. (default: False)
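The `transform` parameter follows the "transform on every access" convention: the stored object stays untouched and the callable is applied each time an item is read. The sketch below illustrates only those semantics. It is hypothetical: a plain dict stands in for HeteroGraphData, and `ToyDataset` and `row_normalize` are illustrative names, not rllm APIs.

```python
# Hypothetical sketch: a plain dict stands in for HeteroGraphData, and
# ToyDataset / row_normalize are illustrative names, not rllm APIs.
from typing import Any, Callable, Dict, List, Optional

Data = Dict[str, Any]


class ToyDataset:
    """Holds one graph and applies an optional transform on every access."""

    def __init__(self, data: Data,
                 transform: Optional[Callable[[Data], Data]] = None) -> None:
        self._data = data  # the stored, untransformed object
        self.transform = transform

    def __len__(self) -> int:
        return 1

    def __getitem__(self, idx: int) -> Data:
        if idx != 0:
            raise IndexError("ToyDataset holds a single graph")
        # The transform runs each time an item is read; self._data is not modified.
        return self.transform(self._data) if self.transform else self._data


def row_normalize(data: Data) -> Data:
    """Return a copy of `data` whose nonzero movie feature rows sum to 1."""
    normed: List[List[float]] = []
    for row in data["movie_x"]:
        s = sum(row)
        normed.append([v / s if s else 0.0 for v in row])
    return {**data, "movie_x": normed}


ds = ToyDataset({"movie_x": [[1, 1, 0], [0, 0, 0]]}, transform=row_normalize)
print(ds[0]["movie_x"])  # [[0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]
```

Because the transform returns a new object, the cached data on disk and in memory stays reusable with a different transform later.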

Statistics:
Node type   movie   actor   director
#Nodes      4,278   5,257   2,081
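The counts in the table add up to 4,278 + 5,257 + 2,081 = 11,616 nodes. The snippet below recomputes the total and each type's share, and includes a hypothetical `check_counts` helper for comparing against counts you extract from a loaded graph yourself (how to read per-type node counts from HeteroGraphData is not documented on this page, so building `observed` is left as an assumption).

```python
# Node counts copied from the statistics table above.
node_counts = {"movie": 4278, "actor": 5257, "director": 2081}

total = sum(node_counts.values())
shares = {ntype: n / total for ntype, n in node_counts.items()}
print(total)  # 11616
for ntype, share in shares.items():
    print(f"{ntype:<9}{node_counts[ntype]:>6}  {share:.1%}")


def check_counts(observed: dict) -> list:
    """Hypothetical helper: return node types whose observed count differs from the table."""
    return [t for t, n in node_counts.items() if observed.get(t) != n]
```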
download()

Download the raw data from its source URL to ‘./cached_dir/{dataset}/raw/’.

process()

Process the raw data and save the result to ‘./cached_dir/{dataset}/processed/’.
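The `download()` / `process()` pair usually forms a cached pipeline: raw files are fetched only if missing, and processing is skipped when a processed file already exists unless `force_reload` is set. The sketch below is a hypothetical illustration of that pattern, not rllm's internal code. It assumes `cached_dir` is the root under which `{dataset}/raw` and `{dataset}/processed` live; `prepare`, `fetch`, `process_fn`, `data.pkl`, and the raw file names are all illustrative, and a fake fetch function replaces real network access.

```python
# Hypothetical sketch of a cached download/process pipeline.
# `prepare`, `fetch`, `process_fn`, and all file names are illustrative assumptions.
import os
import pickle
import tempfile
from typing import Any, Callable, List


def prepare(cached_dir: str, name: str, raw_files: List[str],
            fetch: Callable[[str, str], None],
            process_fn: Callable[[List[str]], Any],
            force_reload: bool = False) -> Any:
    raw_dir = os.path.join(cached_dir, name, "raw")
    processed_dir = os.path.join(cached_dir, name, "processed")
    processed_path = os.path.join(processed_dir, "data.pkl")
    os.makedirs(raw_dir, exist_ok=True)
    os.makedirs(processed_dir, exist_ok=True)

    # download(): fetch only the raw files that are not on disk yet.
    raw_paths = [os.path.join(raw_dir, f) for f in raw_files]
    for fname, path in zip(raw_files, raw_paths):
        if not os.path.exists(path):
            fetch(fname, path)

    # process(): reuse the cached result unless force_reload is set.
    if force_reload or not os.path.exists(processed_path):
        data = process_fn(raw_paths)
        with open(processed_path, "wb") as f:
            pickle.dump(data, f)
    with open(processed_path, "rb") as f:
        return pickle.load(f)


calls = {"fetch": 0, "process": 0}


def fake_fetch(fname: str, path: str) -> None:
    calls["fetch"] += 1  # stands in for an HTTP download
    with open(path, "w") as f:
        f.write(f"contents of {fname}")


def fake_process(paths: List[str]) -> List[str]:
    calls["process"] += 1
    out = []
    for p in paths:
        with open(p) as f:
            out.append(f.read())
    return out


with tempfile.TemporaryDirectory() as root:
    files = ["nodes.txt", "edges.txt"]  # hypothetical raw file names
    first = prepare(root, "IMDB", files, fake_fetch, fake_process)
    second = prepare(root, "IMDB", files, fake_fetch, fake_process)  # cache hit
    third = prepare(root, "IMDB", files, fake_fetch, fake_process, force_reload=True)

print(calls)  # {'fetch': 2, 'process': 2}
```

In this sketch `force_reload=True` re-runs processing but does not re-download raw files that are already present, which matches the parameter description's focus on re-processing.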

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.
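The two properties list bare file names; a base class typically joins them with `raw_dir` / `processed_dir` to get full paths. The sketch below shows that pattern under stated assumptions: `DatasetSketch`, `ToyIMDB`, `raw_paths`, `processed_paths`, and the file names `imdb_raw.zip` and `data.pt` are hypothetical and do not describe rllm's actual `Dataset` base class or the files IMDB uses.

```python
# Hypothetical sketch: how raw_filenames / processed_filenames can combine with
# raw_dir / processed_dir. Class names, helper properties, and file names are illustrative.
import os
from typing import List


class DatasetSketch:
    def __init__(self, cached_dir: str, name: str) -> None:
        self.cached_dir = cached_dir
        self.name = name

    @property
    def raw_dir(self) -> str:
        return os.path.join(self.cached_dir, self.name, "raw")

    @property
    def processed_dir(self) -> str:
        return os.path.join(self.cached_dir, self.name, "processed")

    @property
    def raw_filenames(self) -> List[str]:
        raise NotImplementedError  # each concrete dataset declares its files

    @property
    def processed_filenames(self) -> List[str]:
        raise NotImplementedError

    @property
    def raw_paths(self) -> List[str]:
        return [os.path.join(self.raw_dir, f) for f in self.raw_filenames]

    @property
    def processed_paths(self) -> List[str]:
        return [os.path.join(self.processed_dir, f) for f in self.processed_filenames]


class ToyIMDB(DatasetSketch):
    @property
    def raw_filenames(self) -> List[str]:
        return ["imdb_raw.zip"]  # hypothetical file name

    @property
    def processed_filenames(self) -> List[str]:
        return ["data.pt"]  # hypothetical file name


ds = ToyIMDB("cache", "IMDB")
print(ds.raw_paths)
print(ds.processed_paths)
```

Declaring file names as properties lets a shared base class decide whether `download()` or `process()` needs to run by checking whether these paths exist.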