rllm.preprocessing
DataFrame to Tensor
Convert a typed DataFrame into model-ready tensor features by dispatching each column with
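A minimal sketch of per-column dispatch, assuming two column types ("numerical", "categorical") and plain Python lists in place of tensors. The type labels, handler functions, and `dataframe_to_features` are hypothetical and do not reproduce rllm's API. Each column is routed to a handler for its type, and the outputs are concatenated per type.

```python
from typing import Callable, Dict, List

# Hypothetical column-type labels; rllm's own type enum may differ.
NUMERICAL = "numerical"
CATEGORICAL = "categorical"


def encode_numerical(values: List[float]) -> List[List[float]]:
    # One feature per row: the value cast to float.
    return [[float(v)] for v in values]


def encode_categorical(values: List[str]) -> List[List[int]]:
    # Map each distinct category to an integer index in first-seen order.
    index: Dict[str, int] = {}
    return [[index.setdefault(v, len(index))] for v in values]


HANDLERS: Dict[str, Callable[[list], list]] = {
    NUMERICAL: encode_numerical,
    CATEGORICAL: encode_categorical,
}


def dataframe_to_features(columns: Dict[str, list], col_types: Dict[str, str]) -> Dict[str, list]:
    """Encode each column with its type's handler and concatenate features per type."""
    features: Dict[str, list] = {}
    for name, values in columns.items():
        ctype = col_types[name]
        encoded = HANDLERS[ctype](values)
        if ctype not in features:
            features[ctype] = encoded
        else:
            # Append this column's features to each row of the same type.
            features[ctype] = [a + b for a, b in zip(features[ctype], encoded)]
    return features


columns = {"age": [31, 45], "city": ["Paris", "Oslo"], "income": [1.5, 2.0]}
types = {"age": NUMERICAL, "city": CATEGORICAL, "income": NUMERICAL}
feats = dataframe_to_features(columns, types)
# feats[NUMERICAL] == [[31.0, 1.5], [45.0, 2.0]]; feats[CATEGORICAL] == [[0], [1]]
```

Grouping by type keeps all numerical features in one dense block and categorical indices in another, which suits models that embed each type separately.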
Text Tokenize
Configuration for text tokenization across preprocessing utilities.
Tokenize a single text column into ids and attention masks.
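A minimal sketch of a tokenization config plus single-column tokenization. It assumes a whitespace tokenizer, a fixed vocabulary dict, and fixed-length padding. `TokenizeConfig` and `tokenize_column` are hypothetical names, and a real pipeline would typically use a subword tokenizer instead. Each cell becomes a row of token ids truncated or padded to `max_length`, and the attention mask marks real tokens with 1 and padding with 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TokenizeConfig:
    # Hypothetical fields; rllm's actual config may expose different options.
    max_length: int = 8
    lowercase: bool = True
    pad_id: int = 0
    unk_id: int = 1


def tokenize_column(
    texts: List[str], vocab: Dict[str, int], cfg: TokenizeConfig
) -> Tuple[List[List[int]], List[List[int]]]:
    """Whitespace-tokenize each cell, truncate/pad to cfg.max_length, and build masks."""
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for text in texts:
        words = text.lower().split() if cfg.lowercase else text.split()
        # Unknown words map to unk_id; sequences longer than max_length are cut.
        ids = [vocab.get(w, cfg.unk_id) for w in words][: cfg.max_length]
        pad = cfg.max_length - len(ids)
        input_ids.append(ids + [cfg.pad_id] * pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * pad)
    return input_ids, attention_mask


vocab = {"good": 2, "movie": 3}
ids, mask = tokenize_column(["Good movie", "bad movie ever"], vocab, TokenizeConfig(max_length=3))
# ids == [[2, 3, 0], [1, 3, 1]]; mask == [[1, 1, 0], [1, 1, 1]]
```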
Tokenize a list of strings and build batched model inputs.
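A minimal sketch of batching, assuming the tokenizer is any callable that maps a string to a list of ids. `batch_encode` and the character-level `char_tokenize` are hypothetical stand-ins. Unlike fixed-length padding, this pads every sequence only to the longest one in the batch and returns the ids and masks in a dict.

```python
from typing import Callable, Dict, List


def char_tokenize(text: str) -> List[int]:
    # Toy tokenizer: one id per character (its Unicode code point).
    return [ord(c) for c in text]


def batch_encode(
    texts: List[str], tokenize: Callable[[str], List[int]], pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Tokenize each string, then pad all sequences to the batch's longest length."""
    seqs = [tokenize(t) for t in texts]
    longest = max((len(s) for s in seqs), default=0)
    return {
        "input_ids": [s + [pad_id] * (longest - len(s)) for s in seqs],
        "attention_mask": [[1] * len(s) + [0] * (longest - len(s)) for s in seqs],
    }


batch = batch_encode(["hi", "hey"], char_tokenize)
# batch["input_ids"] == [[104, 105, 0], [104, 101, 121]]
```

Padding to the batch maximum wastes less compute than a global `max_length` when most strings are short. The trade-off is that tensor shapes vary from batch to batch.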
Standardize tokenizer outputs into
Merge all text columns per row and then tokenize.
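A minimal sketch of merging text columns row by row before tokenizing. The `" [SEP] "` separator, the skipping of empty cells, and the whitespace split standing in for a real tokenizer are all assumptions. `merge_and_tokenize` is a hypothetical name.

```python
from typing import Dict, List, Optional


def merge_and_tokenize(
    columns: Dict[str, List[Optional[str]]], sep: str = " [SEP] "
) -> List[List[str]]:
    """Join each row's text cells with a separator, then split the result into tokens."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    tokens: List[List[str]] = []
    for i in range(n_rows):
        # Skip missing (None) or empty cells so they contribute no tokens.
        parts = [columns[name][i] for name in names if columns[name][i]]
        merged = sep.join(parts)
        # Whitespace split stands in for a real subword tokenizer.
        tokens.append(merged.split())
    return tokens


rows = merge_and_tokenize({"title": ["Heat", "Up"], "genre": ["crime drama", None]})
# rows == [["Heat", "[SEP]", "crime", "drama"], ["Up"]]
```

The separator token keeps column boundaries visible to the model after merging. Tokenizing once per row costs fewer tokenizer calls than tokenizing each column separately.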
Tokenize all column names once and cache their token tensors.
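A minimal sketch of column-name caching, assuming a dict-backed cache and a toy snake_case splitter. `ColumnNameTokenCache` and `split_name` are hypothetical. The tokenizer runs at most once per distinct name, and later lookups return the stored result.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class ColumnNameTokenCache:
    """Tokenize each column name at most once and reuse the result."""

    def __init__(self, tokenize: Callable[[str], List[str]]):
        self._tokenize = tokenize
        self._cache: Dict[str, Tuple[str, ...]] = {}
        self.misses = 0  # number of times the tokenizer actually ran

    def warm(self, names: Iterable[str]) -> None:
        # Pre-tokenize every column name up front, e.g. once per table schema.
        for name in names:
            self.get(name)

    def get(self, name: str) -> Tuple[str, ...]:
        if name not in self._cache:
            self.misses += 1
            # Store a tuple so callers cannot mutate the shared cached value.
            self._cache[name] = tuple(self._tokenize(name))
        return self._cache[name]


def split_name(name: str) -> List[str]:
    # Toy tokenizer: snake_case column names become lowercase words.
    return name.lower().replace("_", " ").split()


cache = ColumnNameTokenCache(split_name)
cache.warm(["User_Age", "movie_title"])
for _ in range(3):
    cache.get("User_Age")
# cache.misses == 2: each name was tokenized exactly once.
```

Column names repeat on every row of a table, so caching them turns a per-row cost into a per-schema cost.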
Word Embedding
Configuration for text embedding in preprocessing pipelines.
Embed a text column into dense vector representations.
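A minimal sketch of embedding a text column. This swaps in mean-pooled word vectors from a small lookup table in place of whatever encoder rllm actually uses. The vectors and `embed_column` are hypothetical. Each cell becomes one fixed-size vector, and cells with no known words map to the zero vector.

```python
from typing import Dict, List


def embed_column(
    texts: List[str], word_vectors: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Average the vectors of known words in each cell; unknown-only cells get zeros."""
    out: List[List[float]] = []
    for text in texts:
        vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
        if not vecs:
            out.append([0.0] * dim)
        else:
            # zip(*vecs) walks dimension by dimension across all word vectors.
            out.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return out


wv = {"good": [1.0, 0.0], "movie": [0.0, 2.0]}
emb = embed_column(["Good movie", "unknown"], wv, dim=2)
# emb == [[0.5, 1.0], [0.0, 0.0]]
```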
Timestamp
Convert a timestamp column into structured time-component tensors.
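A minimal sketch of timestamp decomposition, assuming ISO 8601 strings and this particular component order (year, month, day, weekday, hour, minute, second). The order and the `timestamp_components` name are hypothetical. It relies on the standard library's `datetime.fromisoformat` and `datetime.weekday` (Monday is 0).

```python
from datetime import datetime
from typing import List


def timestamp_components(values: List[str]) -> List[List[int]]:
    """Split each ISO timestamp into integer calendar and clock components."""
    rows: List[List[int]] = []
    for v in values:
        dt = datetime.fromisoformat(v)
        rows.append([dt.year, dt.month, dt.day, dt.weekday(), dt.hour, dt.minute, dt.second])
    return rows


parts = timestamp_components(["2024-03-15T08:30:00"])
# parts == [[2024, 3, 15, 4, 8, 30, 0]]  (15 March 2024 was a Friday)
```

Separate components let a model learn periodic effects (weekday, hour) that a single raw epoch value hides.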
Fillna
Configuration for missing-value imputation by column type.
Fill missing values based on column type.
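A minimal sketch of type-dependent imputation. It assumes a strategy table mapping "numerical" to the column mean and "categorical" to the most frequent value, and it treats both `None` and float NaN as missing. `DEFAULT_STRATEGY` and `fill_column` are hypothetical names, not rllm's configuration.

```python
import math
from collections import Counter
from statistics import mean
from typing import Any, Dict, List

# Hypothetical per-type strategy table.
DEFAULT_STRATEGY: Dict[str, str] = {"numerical": "mean", "categorical": "most_frequent"}


def is_missing(v: Any) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def fill_column(
    values: List[Any], col_type: str, strategy: Dict[str, str] = DEFAULT_STRATEGY
) -> List[Any]:
    """Replace missing entries using the rule configured for this column type."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values)  # nothing to impute from; leave the column unchanged
    rule = strategy[col_type]
    if rule == "mean":
        fill = mean(present)
    elif rule == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown fill rule: {rule}")
    return [fill if is_missing(v) else v for v in values]


nums = fill_column([1.0, float("nan"), 3.0], "numerical")
cats = fill_column(["a", None, "b", "a"], "categorical")
# nums == [1.0, 2.0, 3.0]; cats == ["a", "a", "b", "a"]
```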