rllm.datasets.DBLP

class rllm.datasets.DBLP(cached_dir: str, transform: Callable | None = None, forced_reload: bool | None = False)[source]

Bases: Dataset

DBLP is a heterogeneous graph containing four types of entities (authors, papers, terms, and conferences), as collected in the paper "MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding".

The authors are divided into four research areas (database, data mining, artificial intelligence, information retrieval). Each author is described by a bag-of-words representation of their paper keywords.

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a HeteroGraphData object and returns a transformed version. The data object will be transformed before every access.

  • forced_reload (bool, optional) – If set to True, the dataset will be processed again even if processed files already exist. Defaults to False.
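As a rough illustration of the parameters above, the following sketch constructs the dataset using only the documented constructor arguments. The names `identity_transform` and `load_dblp` and the `./data` directory are hypothetical and exist only for this example; the import is placed inside the function so the snippet can be read without rllm installed.

```python
from typing import Any, Callable, Optional


def identity_transform(data: Any) -> Any:
    """Hypothetical transform: receives the graph object and returns it unchanged."""
    return data


def load_dblp(
    cached_dir: str = "./data",  # assumed location; any writable directory should do
    transform: Optional[Callable] = identity_transform,
    forced_reload: bool = False,
):
    """Build the DBLP dataset with the three documented constructor arguments."""
    # Lazy import: rllm is only required when the function is actually called.
    from rllm.datasets import DBLP

    return DBLP(
        cached_dir=cached_dir,
        transform=transform,
        forced_reload=forced_reload,
    )
```

Calling `load_dblp(forced_reload=True)` would, per the parameter description, process the dataset again instead of reusing previously processed files.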

Statistics:
Name    authors     papers      terms       conferences
nodes   4,057       14,328      7,723       20

download()[source]

Download the data from its URL to ‘./cached_dir/{dataset}/raw/’.

process()[source]

Process the data and save it to ‘./cached_dir/{dataset}/processed/’.
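The directory layout used by download() and process() can be sketched with pathlib. This is only an illustration of the path pattern quoted above, not the library's own code: the helper `dataset_dirs` is hypothetical, and the folder name `dblp` stands in for the `{dataset}` placeholder, whose actual value is not stated here.

```python
from pathlib import Path


def dataset_dirs(cached_dir: str, dataset: str) -> dict:
    """Return the raw and processed directories for a dataset folder name."""
    root = Path(cached_dir) / dataset
    return {"raw": root / "raw", "processed": root / "processed"}


# "dblp" is an assumed value for the {dataset} placeholder.
dirs = dataset_dirs("./cached_dir", "dblp")
print(dirs["raw"].as_posix())
print(dirs["processed"].as_posix())
```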

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.