rllm.datasets.Titanic

class rllm.datasets.Titanic(cached_dir: str, forced_reload: bool | None = False, transform=None, tokenizer_config=None)

Bases: Dataset

The Titanic dataset is widely used for machine learning and statistical analysis, and is featured in the Kaggle competition "Titanic: Machine Learning from Disaster".

The dataset contains various features related to the passengers aboard the Titanic, and the task is to predict whether a passenger survived.

Parameters:

cached_dir (str) – Root directory where the dataset should be saved.
forced_reload (bool, optional) – If set to True, the dataset will be re-processed. Defaults to False.
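
The effect of a flag like forced_reload can be illustrated with a small, library-independent sketch. It assumes the common pattern in which processing is skipped when a processed file already exists and the flag is False; the helper load_or_process and the file name data.pt are hypothetical and are not part of rllm.

```python
import tempfile
from pathlib import Path


def load_or_process(processed_file: Path, forced_reload: bool = False) -> str:
    """Hypothetical helper (not an rllm API).

    Returns "cached" when an existing processed file is reused,
    and "processed" when the file is (re)written.
    """
    if processed_file.exists() and not forced_reload:
        return "cached"
    # Stand-in for real processing: write a placeholder file.
    processed_file.parent.mkdir(parents=True, exist_ok=True)
    processed_file.write_text("processed placeholder")
    return "processed"


with tempfile.TemporaryDirectory() as tmp:
    # "data.pt" is an assumed file name, used only for illustration.
    target = Path(tmp) / "Titanic" / "processed" / "data.pt"
    first = load_or_process(target)                       # nothing cached yet
    second = load_or_process(target)                      # reuses the file
    third = load_or_process(target, forced_reload=True)   # forces a rebuild
    print(first, second, third)
```

Under these assumptions, the first and third calls do the work and the second call reuses the cached file.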

Statistics:
Name   Passengers  Features
Size   891         12

download(): Download the dataset to self.raw_dir.

process(): Process the data and save it to ‘./cached_dir/{dataset}/processed/’.

property processed_filenames: File names in self.processed_dir.

property raw_filenames: File names in self.raw_dir.
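
The directory layout implied by these members can be sketched without the library. This is only an illustration: it assumes that {dataset} expands to the dataset name ("Titanic") and that self.raw_dir is a "raw" subdirectory next to "processed"; neither is confirmed by this page, and dataset_dirs is a hypothetical helper.

```python
from pathlib import Path


def dataset_dirs(cached_dir: str, dataset_name: str = "Titanic"):
    """Hypothetical helper (not an rllm API): derive (raw_dir, processed_dir).

    Assumes the layout <cached_dir>/<dataset_name>/{raw,processed}.
    The "raw" folder name is an assumption; only "processed" appears
    in the documented path.
    """
    root = Path(cached_dir) / dataset_name
    return root / "raw", root / "processed"


raw_dir, processed_dir = dataset_dirs("./cached_dir")
print(raw_dir.as_posix())
print(processed_dir.as_posix())
```

In practice, the values of raw_filenames and processed_filenames reported by the dataset object itself are the authoritative source for what is stored in each directory.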