rllm.datasets.PlanetoidDataset

class rllm.datasets.PlanetoidDataset(cached_dir: str, file_name: str, transform: Callable | None = None, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, force_reload: bool | None = False)

Bases: Dataset

The citation network datasets from the Revisiting Semi-Supervised Learning with Graph Embeddings paper, which include "Cora", "CiteSeer" and "PubMed". Nodes represent documents and edges represent citation links.

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • file_name (str) – The name of the dataset: cora, citeseer, or pubmed.

  • transform (callable, optional) – A function/transform that takes in a GraphData object and returns a transformed version. The data object is transformed before every access. (default: None)

  • split (str, optional) – The type of dataset split (public, full, geom-gcn, random). If set to public, the split will be the public fixed split from the Revisiting Semi-Supervised Learning with Graph Embeddings paper. If set to full, all nodes except those in the validation and test sets will be used for training (as in the FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling paper). If set to geom-gcn, the 10 public fixed splits from the Geom-GCN: Geometric Graph Convolutional Networks paper are given. If set to random, train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: public)

  • num_train_per_class (int, optional) – The number of training samples per class in case of random split. (default: 20)

  • num_val (int, optional) – The number of validation samples in case of random split. (default: 500)

  • num_test (int, optional) – The number of test samples in case of random split. (default: 1000)

  • force_reload (bool, optional) – If set to True, the dataset is re-processed even if processed files already exist. (default: False)
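
The full and random splits described above can be sketched independently of the library. The helper names random_split and full_split below are hypothetical, and this is not rLLM's own implementation: it assumes that a random split samples num_train_per_class nodes from each class and then draws num_val validation and num_test test nodes from the remaining nodes, and that a full split trains on every node outside the given validation and test sets.

```python
import random
from collections import defaultdict


def random_split(labels, num_train_per_class=20, num_val=500, num_test=1000, seed=0):
    """Hypothetical helper: return (train, val, test) lists of node indices."""
    rng = random.Random(seed)

    # Group node indices by their class label.
    by_class = defaultdict(list)
    for node, y in enumerate(labels):
        by_class[y].append(node)

    # Sample up to num_train_per_class training nodes from every class.
    train = []
    for y in sorted(by_class):
        nodes = by_class[y]
        train.extend(rng.sample(nodes, min(num_train_per_class, len(nodes))))

    # Shuffle the remaining nodes and cut validation and test sets from them.
    train_set = set(train)
    rest = [n for n in range(len(labels)) if n not in train_set]
    rng.shuffle(rest)
    val = rest[:num_val]
    test = rest[num_val:num_val + num_test]
    return sorted(train), sorted(val), sorted(test)


def full_split(num_nodes, val_idx, test_idx):
    """Hypothetical helper: every node not in validation or test is used for training."""
    held_out = set(val_idx) | set(test_idx)
    return [n for n in range(num_nodes) if n not in held_out]


# Toy example: 100 nodes with 3 classes.
labels = [i % 3 for i in range(100)]
train, val, test = random_split(labels, num_train_per_class=5, num_val=20, num_test=30)
print(len(train), len(val), len(test))
print(len(full_split(len(labels), val, test)))
```

Sampling per class keeps the training set balanced across labels, which is why the training size depends on num_train_per_class while the validation and test sizes are fixed counts.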

Statistics:
Dataset     Cora    CiteSeer    PubMed
nodes       2708    3327        19717
edges       10556   9104        88648
features    1433    3703        500
classes     7       6           3
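
The figures in the table can be combined into a few derived quantities. The snippet below computes the training-set size when 20 labeled nodes are taken per class (the public Planetoid split, and the random split with its default num_train_per_class=20), the fraction of labeled nodes, and the mean node degree. The mean-degree figure assumes that "edges" counts each undirected citation link twice (once per direction); the STATS dictionary and summarize function are illustrative names only.

```python
# Figures copied from the statistics table above. "edges" is ASSUMED to count
# each undirected citation link twice (once per direction).
STATS = {
    "Cora": {"nodes": 2708, "edges": 10556, "classes": 7},
    "CiteSeer": {"nodes": 3327, "edges": 9104, "classes": 6},
    "PubMed": {"nodes": 19717, "edges": 88648, "classes": 3},
}


def summarize(stats, num_train_per_class=20):
    """Return training-set size, labeled fraction and mean degree per dataset."""
    out = {}
    for name, s in stats.items():
        num_train = num_train_per_class * s["classes"]
        out[name] = {
            "num_train": num_train,
            "label_rate": num_train / s["nodes"],
            # Directed edge count divided by node count (under the assumption above).
            "mean_degree": s["edges"] / s["nodes"],
        }
    return out


for name, row in summarize(STATS).items():
    print(f"{name}: {row['num_train']} training nodes, "
          f"{row['label_rate']:.1%} labeled, mean degree {row['mean_degree']:.2f}")
```

With 20 nodes per class, Cora, CiteSeer and PubMed get 140, 120 and 60 training nodes respectively, so only a small fraction of each graph is labeled.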

download()

Download the raw data files from the URL to ‘./cached_dir/{dataset}/raw/’.

process()

Process the raw data and save the result to ‘./cached_dir/{dataset}/processed/’.

property processed_filenames

The names of the files stored in self.processed_dir.

property raw_filenames

The names of the files stored in self.raw_dir.
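
download(), process(), raw_filenames and processed_filenames typically work together as a cache: raw files are downloaded only when missing, and processing runs only when processed files are missing or force_reload is True. The sketch below illustrates that pattern with a hypothetical ensure_dataset helper and caller-supplied download/process callables. It is an assumption about the general design, not the code of rLLM's Dataset base class.

```python
from pathlib import Path
from typing import Callable, Sequence


def ensure_dataset(
    cached_dir: str,
    name: str,
    raw_filenames: Sequence[str],
    processed_filenames: Sequence[str],
    download: Callable[[Path], None],
    process: Callable[[Path, Path], None],
    force_reload: bool = False,
) -> Path:
    """Hypothetical helper: download and process only when the cache is incomplete."""
    raw_dir = Path(cached_dir) / name / "raw"
    processed_dir = Path(cached_dir) / name / "processed"

    # Download only if at least one expected raw file is missing.
    if not all((raw_dir / f).exists() for f in raw_filenames):
        raw_dir.mkdir(parents=True, exist_ok=True)
        download(raw_dir)

    # Re-process when forced, or when any expected processed file is missing.
    if force_reload or not all((processed_dir / f).exists() for f in processed_filenames):
        processed_dir.mkdir(parents=True, exist_ok=True)
        process(raw_dir, processed_dir)
    return processed_dir
```

In this sketch force_reload only repeats the processing step; raw files that are already on disk are reused.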