rllm.datasets.Titanic

class rllm.datasets.Titanic(cached_dir: str, forced_reload: bool | None = False, transform=None, tokenizer_config=None)[source]

Bases: Dataset

The Titanic dataset is a widely-used dataset for machine learning and statistical analysis, as featured in the Titanic: Machine Learning from Disaster competition on Kaggle.

The dataset contains various features related to the passengers aboard the Titanic, and the task is to predict whether a passenger survived.

Parameters:
  • cached_dir (str) – Root directory where the dataset should be saved.

  • forced_reload (bool) – If set to True, the dataset will be re-processed. Defaults to False.
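
A minimal construction sketch, using only the two parameters documented above. The helper name load_titanic and the "./data" path are illustrative assumptions, not part of rllm; the import is guarded so the snippet still loads where rllm is not installed.

```python
# Guarded import: the sketch stays importable even without rllm.
try:
    from rllm.datasets import Titanic
except ImportError:
    Titanic = None


def load_titanic(cached_dir="./data", forced_reload=False):
    """Hypothetical helper: build the dataset with the documented arguments."""
    if Titanic is None:
        raise RuntimeError("rllm is not installed; install it to load Titanic")
    # cached_dir: root directory where the dataset is saved.
    # forced_reload: True asks for the dataset to be re-processed.
    return Titanic(cached_dir=cached_dir, forced_reload=forced_reload)
```

Calling load_titanic("./data") returns a Titanic instance; passing forced_reload=True requests re-processing as described in the parameter list.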

Statistics:
Name   Passengers  Features
Size   891         12

download()[source]

Download the dataset to self.raw_dir.

process()[source]

Process the data and save it to ‘./cached_dir/{dataset}/processed/’.

property processed_filenames

File names in self.processed_dir.

property raw_filenames

File names in self.raw_dir.
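
The two properties above can be read together when checking what a dataset keeps on disk. The sketch below is one way to do that; list_dataset_files is a hypothetical helper, and since the return type of the properties is not stated here, it accepts either a single name or an iterable of names.

```python
def list_dataset_files(dataset):
    """Hypothetical helper: collect both file-name properties into one dict."""

    def as_list(names):
        # Accept a single file name or any iterable of file names.
        return [names] if isinstance(names, str) else list(names)

    return {
        "raw": as_list(dataset.raw_filenames),
        "processed": as_list(dataset.processed_filenames),
    }
```

It works with any object that exposes raw_filenames and processed_filenames, so it should also apply to a constructed Titanic instance.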