rllm.datasets.PlanetoidDataset
- class rllm.datasets.PlanetoidDataset(cached_dir: str, file_name: str, transform: Callable | None = None, split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, force_reload: bool | None = False)
Bases: Dataset
The citation network datasets from the Revisiting Semi-Supervised Learning with Graph Embeddings paper, which include "Cora", "CiteSeer" and "PubMed". Nodes represent documents and edges represent citation links.
- Parameters:
cached_dir (str) – Root directory where dataset should be saved.
file_name (str) – The name of dataset, e.g., cora, citeseer and pubmed.
transform (callable, optional) – A function/transform that takes in a GraphData object and returns a transformed version. The data object will be transformed before every access. (default: None)
split (str, optional) – The type of dataset split (public, full, geom-gcn, random). If set to public, the split will be the public fixed split from the Revisiting Semi-Supervised Learning with Graph Embeddings paper. If set to full, all nodes except those in the validation and test sets will be used for training (as in the FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling paper). If set to geom-gcn, the 10 public fixed splits from the Geom-GCN: Geometric Graph Convolutional Networks paper are given. If set to random, train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: public)
num_train_per_class (int, optional) – The number of training samples per class in case of random split. (default: 20)
num_val (int, optional) – The number of validation samples in case of random split. (default: 500)
num_test (int, optional) – The number of test samples in case of random split. (default: 1000)
force_reload (bool, optional) – If set to True, the dataset will be re-processed. (default: False)
Statistics:
Name      Cora   CiteSeer  PubMed
nodes     2708   3327      19717
edges     10556  9104      88648
features  1433   3703      500
classes   7      6         3
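The random split described under the split parameter can be illustrated with a small standalone sketch. This is not rllm's implementation: the function name random_planetoid_split, the seed argument, and the exact sampling order (training nodes drawn per class first, validation and test nodes drawn from the remainder) are assumptions made for illustration. The labels are synthetic and only match Cora's node and class counts.

```python
import random
from collections import defaultdict


def random_planetoid_split(labels, num_train_per_class=20, num_val=500,
                           num_test=1000, seed=0):
    """Hypothetical helper: return (train, val, test) lists of node indices."""
    rng = random.Random(seed)

    # Group node indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw num_train_per_class training nodes from every class.
    train = []
    for label in sorted(by_class):
        nodes = by_class[label]
        rng.shuffle(nodes)
        train.extend(nodes[:num_train_per_class])

    # Draw validation and test nodes from the nodes not used for training.
    train_set = set(train)
    remaining = [i for i in range(len(labels)) if i not in train_set]
    rng.shuffle(remaining)
    val = remaining[:num_val]
    test = remaining[num_val:num_val + num_test]
    return train, val, test


# Synthetic labels with Cora's size: 2708 nodes spread over 7 classes.
labels = [i % 7 for i in range(2708)]
train_idx, val_idx, test_idx = random_planetoid_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 140 500 1000
```

With the default arguments, a dataset with 7 classes gets 7 × 20 = 140 training nodes, which is roughly 5% of Cora's 2708 nodes. The three index sets are disjoint in this sketch, and any nodes left over after the test set is drawn are unused.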
- property processed_filenames
The names of the files in self.processed_dir.
- property raw_filenames
The names of the files in self.raw_dir.