rllm.preprocessing

DataFrame to Tensor

df_to_tensor

Convert a typed DataFrame into model-ready tensor features by dispatching each column according to its ColType.
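The dispatch idea can be illustrated with a minimal, library-free sketch. Everything here is an assumption for illustration: the two-member ColType enum is a stand-in rather than rllm's own class, encode_table is a hypothetical helper, and plain Python lists take the place of tensors.

```python
from enum import Enum


class ColType(Enum):
    # Hypothetical stand-in for illustration; not rllm's ColType.
    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"


def encode_numerical(values):
    # Cast every value to float.
    return [float(v) for v in values]


def encode_categorical(values):
    # Map each distinct category to an integer index in first-seen order.
    index = {}
    return [index.setdefault(v, len(index)) for v in values]


# One encoder per column type; the dispatch is a dictionary lookup.
ENCODERS = {
    ColType.NUMERICAL: encode_numerical,
    ColType.CATEGORICAL: encode_categorical,
}


def encode_table(columns, col_types):
    """columns: name -> list of values; col_types: name -> ColType."""
    grouped = {}
    for name, col_type in col_types.items():
        encoded = ENCODERS[col_type](columns[name])
        grouped.setdefault(col_type, {})[name] = encoded
    return grouped


table = {"age": [31, 45, 28], "city": ["Oslo", "Lima", "Oslo"]}
types = {"age": ColType.NUMERICAL, "city": ColType.CATEGORICAL}
features = encode_table(table, types)
```

Grouping the encoded columns by type keeps each group homogeneous, which is what would allow stacking them into one tensor per type in a real pipeline.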

Text Tokenization

TokenizerConfig

Configuration for text tokenization across preprocessing utilities.

process_tokenized_column

Tokenize a single text column into ids and attention masks.

tokenize_strings

Tokenize a list of strings and build batched model inputs.
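As a rough sketch of what "batched model inputs" usually means, the example below pads token id sequences to a common length and builds a matching attention mask. It assumes a toy whitespace tokenizer and a hand-written vocabulary; tokenize_batch, pad_id, and unk_id are hypothetical names, not part of rllm.

```python
def tokenize_batch(texts, vocab, pad_id=0, unk_id=1):
    # Toy tokenizer: lowercase, split on whitespace, look up ids.
    token_rows = [
        [vocab.get(tok, unk_id) for tok in text.lower().split()]
        for text in texts
    ]
    max_len = max((len(row) for row in token_rows), default=0)

    # Right-pad every row to max_len; the mask marks real tokens with 1.
    input_ids = [row + [pad_id] * (max_len - len(row)) for row in token_rows]
    attention_mask = [
        [1] * len(row) + [0] * (max_len - len(row)) for row in token_rows
    ]
    return input_ids, attention_mask


vocab = {"red": 2, "car": 3, "fast": 4}
ids, mask = tokenize_batch(["Red car", "fast red blue car"], vocab)
# ids  -> [[2, 3, 0, 0], [4, 2, 1, 3]]
# mask -> [[1, 1, 0, 0], [1, 1, 1, 1]]
```

The word "blue" is not in the toy vocabulary, so it maps to unk_id.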

standardize_tokenizer_output

Standardize tokenizer outputs into (input_ids, attention_mask).
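One way such a normalization step might look is sketched below. The accepted input shapes (a dict, a 2-tuple, or bare ids) and the rule for deriving a missing mask from a padding id are assumptions made for this example; to_ids_and_mask is a hypothetical helper, not the library function.

```python
def to_ids_and_mask(output, pad_id=0):
    # Case 1: dict-like output with an optional "attention_mask" key.
    if isinstance(output, dict):
        input_ids = output["input_ids"]
        attention_mask = output.get("attention_mask")
    # Case 2: already an (input_ids, attention_mask) pair.
    elif isinstance(output, tuple) and len(output) == 2:
        input_ids, attention_mask = output
    # Case 3: bare token ids with no mask.
    else:
        input_ids, attention_mask = output, None

    # Assumption: when no mask is supplied, non-padding positions count as real.
    if attention_mask is None:
        attention_mask = [[int(t != pad_id) for t in row] for row in input_ids]
    return input_ids, attention_mask


from_dict = to_ids_and_mask({"input_ids": [[5, 6, 0]]})
from_pair = to_ids_and_mask(([[5, 6]], [[1, 1]]))
from_bare = to_ids_and_mask([[7, 0]])
```

Returning the same pair in every case lets downstream code ignore which tokenizer produced the output.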

tokenize_merged_cols

Merge all text columns per row and then tokenize.
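The merge step can be pictured as joining each row's text fields into one string before tokenizing. In the sketch below, the "name: value" layout, the separator, and the choice to skip empty cells are all assumptions for illustration; merge_text_columns is a hypothetical helper.

```python
def merge_text_columns(rows, text_cols, sep=" | "):
    merged = []
    for row in rows:
        # Keep only non-empty cells, prefixed by their column name.
        parts = [f"{col}: {row[col]}" for col in text_cols if row.get(col)]
        merged.append(sep.join(parts))
    return merged


rows = [
    {"title": "Blue kettle", "review": "Boils fast"},
    {"title": "Desk lamp", "review": None},
]
merged = merge_text_columns(rows, ["title", "review"])
# merged[0] -> "title: Blue kettle | review: Boils fast"
# merged[1] -> "title: Desk lamp"
```

Each merged string would then be tokenized once, which yields one sequence per row rather than one per cell.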

save_column_name_tokens

Tokenize all column names once and cache their token tensors.
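The "tokenize once and cache" pattern can be sketched with a small dictionary-backed cache. The class name, its get method, and the toy tokenizer that splits on underscores are hypothetical, and tuples stand in for token tensors.

```python
class ColumnNameTokenCache:
    """Hypothetical helper: tokenizes each column name at most once."""

    def __init__(self, tokenize):
        self._tokenize = tokenize
        self._cache = {}

    def get(self, name):
        # Only call the tokenizer on a cache miss.
        if name not in self._cache:
            self._cache[name] = self._tokenize(name)
        return self._cache[name]


calls = []


def toy_tokenize(name):
    # Record every call so the caching behavior can be observed.
    calls.append(name)
    return tuple(name.lower().split("_"))


cache = ColumnNameTokenCache(toy_tokenize)
tokens = [cache.get(name) for name in ["user_id", "signup_date", "user_id"]]
# calls -> ["user_id", "signup_date"]; the repeated name hits the cache
```

Column names are the same for every row, so caching their tokens avoids repeating identical work per row or per batch.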

Word Embedding

TextEmbedderConfig

Configuration for text embedding in preprocessing pipelines.

embed_text_column

Embed a text column into dense vector representations.

Timestamp

TimestampPreprocessor

Convert a timestamp column into structured time-component tensors.
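As an illustration of what "structured time components" could look like, the sketch below splits ISO-formatted timestamps into integer fields using only the standard library. The chosen fields and their order are assumptions, timestamp_components is a hypothetical helper, and nested lists stand in for tensors.

```python
from datetime import datetime

# Assumed component order for this sketch.
FIELDS = ["year", "month", "day", "weekday", "hour", "minute", "second"]


def timestamp_components(values):
    rows = []
    for value in values:
        ts = datetime.fromisoformat(value)
        # weekday() counts from Monday = 0 to Sunday = 6.
        rows.append(
            [ts.year, ts.month, ts.day, ts.weekday(), ts.hour, ts.minute, ts.second]
        )
    return rows


components = timestamp_components(["2024-03-15 08:30:00", "2023-12-31 23:59:59"])
# components[0] -> [2024, 3, 15, 4, 8, 30, 0]   (a Friday)
# components[1] -> [2023, 12, 31, 6, 23, 59, 59] (a Sunday)
```

Splitting a timestamp this way exposes periodic structure, such as day of week or hour of day, that a single raw epoch value hides from a model.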

Fillna

FillNAConfig

Configuration for missing-value imputation by column type.

fillna_by_coltype

Fill missing values based on column type.
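A minimal sketch of type-dependent imputation follows. The strategies used here (mean for numerical columns, most frequent value for categorical ones) and the fallback values for fully empty columns are assumptions for illustration, not the library's documented defaults; fill_missing and the string type labels are hypothetical.

```python
from collections import Counter
from statistics import mean


def fill_missing(columns, col_types):
    """columns: name -> list with None for missing; col_types: name -> str."""
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if col_types[name] == "numerical":
            # Assumed strategy: column mean, or 0.0 if nothing is present.
            fill = mean(present) if present else 0.0
        else:
            # Assumed strategy: most frequent category, or a sentinel string.
            fill = Counter(present).most_common(1)[0][0] if present else "missing"
        filled[name] = [fill if v is None else v for v in values]
    return filled


table = {
    "price": [10.0, None, 20.0, 30.0],
    "color": ["red", None, "red", "blue"],
}
types = {"price": "numerical", "color": "categorical"}
filled = fill_missing(table, types)
# filled["price"] -> [10.0, 20.0, 20.0, 30.0]
# filled["color"] -> ["red", "red", "red", "blue"]
```

Choosing the fill value per column type avoids nonsensical results such as averaging category labels.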