rllm.preprocessing

DataFrame to Tensor

Convert a typed DataFrame into model-ready tensor features by dispatching each column according to its ColType.
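
The dispatch idea can be illustrated with a minimal sketch. It is not rllm's implementation: the `ColType` enum below is a hypothetical two-member stand-in, the encoder functions and `table_to_features` are invented for illustration, and plain Python lists take the place of a DataFrame and of tensors.

```python
from enum import Enum


class ColType(Enum):
    """Hypothetical stand-in for a column-type enum; not rllm's actual class."""
    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"


def encode_numerical(values):
    # Numerical columns become a list of floats.
    return [float(v) for v in values]


def encode_categorical(values):
    # Categorical columns become integer codes, assigned in sorted order
    # of the distinct values.
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# One encoder per column type; the dispatch is a plain dictionary lookup.
ENCODERS = {
    ColType.NUMERICAL: encode_numerical,
    ColType.CATEGORICAL: encode_categorical,
}


def table_to_features(columns, col_types):
    """Return {ColType: (column names, row-major feature rows)}."""
    grouped = {}
    for name, col_type in col_types.items():
        encoded = ENCODERS[col_type](columns[name])
        names, feats = grouped.setdefault(col_type, ([], []))
        names.append(name)
        feats.append(encoded)
    # Transpose column-major lists so each group has shape [n_rows, n_cols].
    return {
        col_type: (names, [list(row) for row in zip(*feats)])
        for col_type, (names, feats) in grouped.items()
    }


columns = {
    "age": [22, 38, 26],
    "fare": [7.25, 71.28, 7.92],
    "sex": ["male", "female", "female"],
}
col_types = {
    "age": ColType.NUMERICAL,
    "fare": ColType.NUMERICAL,
    "sex": ColType.CATEGORICAL,
}
features = table_to_features(columns, col_types)
```

Grouping the encoded columns by type keeps each group rectangular, so in a real pipeline each group could be stacked into a single tensor per column type.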

Configuration for text tokenization across preprocessing utilities.

`process_tokenized_column`	Tokenize a single text column into ids and attention masks.
`tokenize_strings`	Tokenize a list of strings and build batched model inputs.
`standardize_tokenizer_output`	Standardize tokenizer outputs into `(input_ids, attention_mask)`.
`tokenize_merged_cols`	Merge all text columns per row and then tokenize.
`save_column_name_tokens`	Tokenize all column names once and cache their token tensors.
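
As a rough illustration of what "ids and attention masks" means for a batch of strings, the sketch below pads a batch to a common length and normalizes the result to an `(input_ids, attention_mask)` pair. The whitespace tokenizer, the vocabulary, and the names `tokenize_batch` and `to_ids_and_mask` are all hypothetical; they do not reproduce the signatures of the functions listed above, which would normally wrap a pretrained tokenizer.

```python
def tokenize_batch(texts, vocab, pad_id=0, unk_id=1):
    """Toy whitespace tokenizer returning padded ids and attention masks."""
    ids = [[vocab.get(tok, unk_id) for tok in text.lower().split()] for text in texts]
    max_len = max((len(row) for row in ids), default=0)
    # Pad every row to the longest sequence in the batch.
    input_ids = [row + [pad_id] * (max_len - len(row)) for row in ids]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(row) + [0] * (max_len - len(row)) for row in ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def to_ids_and_mask(output):
    """Normalize dict-style or tuple-style output to (input_ids, attention_mask)."""
    if isinstance(output, dict):
        return output["input_ids"], output["attention_mask"]
    input_ids, attention_mask = output
    return input_ids, attention_mask


# Assumed toy vocabulary; ids 0 and 1 are reserved for padding and unknowns.
vocab = {"[PAD]": 0, "[UNK]": 1, "red": 2, "wine": 3, "dry": 4}
texts = ["Red wine", "dry red wine", "merlot"]
input_ids, attention_mask = to_ids_and_mask(tokenize_batch(texts, vocab))
```

The attention mask lets a downstream model ignore padded positions, which is why the two outputs are kept together as a pair.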

Configuration for text embedding in preprocessing pipelines.

Embed a text column into dense vector representations.

Convert a timestamp column into structured time-component tensors.
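
A minimal sketch of the decomposition, under stated assumptions: the choice of seven components (year, month, day, weekday, hour, minute, second), the input format string, and the name `timestamp_components` are illustrative and not taken from rllm. Nested lists stand in for the output tensor.

```python
from datetime import datetime

# Assumed component order; the library's actual set of components may differ.
TIME_FIELDS = ("year", "month", "day", "weekday", "hour", "minute", "second")


def timestamp_components(values, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse timestamp strings and return one row of integer components each."""
    rows = []
    for value in values:
        ts = datetime.strptime(value, fmt)
        # weekday() counts Monday as 0 and Sunday as 6.
        rows.append([ts.year, ts.month, ts.day, ts.weekday(),
                     ts.hour, ts.minute, ts.second])
    return rows


rows = timestamp_components(["2024-03-15 08:30:00", "2023-12-31 23:59:59"])
```

Splitting a timestamp this way exposes periodic structure (day of week, hour of day) that a single epoch number would hide from the model.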

Configuration for missing-value imputation by column type.

Fill missing values based on column type.
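
Type-dependent imputation can be sketched as follows. The strategies (mean for numerical columns, most frequent value for categorical columns), the use of `None` as the missing-value marker, and the names `fill_numerical`, `fill_categorical`, and `impute` are assumptions made for illustration; the library's configuration may offer different strategies.

```python
from collections import Counter
from statistics import mean


def fill_numerical(values):
    # Replace missing entries with the mean of the observed values.
    present = [v for v in values if v is not None]
    fill = mean(present) if present else 0.0
    return [fill if v is None else v for v in values]


def fill_categorical(values):
    # Replace missing entries with the most frequent observed value.
    present = [v for v in values if v is not None]
    fill = Counter(present).most_common(1)[0][0] if present else "missing"
    return [fill if v is None else v for v in values]


# Assumed mapping from a column-type label to its fill strategy.
FILLERS = {"numerical": fill_numerical, "categorical": fill_categorical}


def impute(columns, col_types):
    """Fill each column using the strategy registered for its type."""
    return {name: FILLERS[col_types[name]](values) for name, values in columns.items()}


filled = impute(
    {"age": [20.0, None, 40.0], "city": ["Oslo", "Oslo", None]},
    {"age": "numerical", "city": "categorical"},
)
```

Keeping the strategy table separate from the fill loop mirrors the split between a configuration object and the function that applies it.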