rllm.preprocessing.tokenize_merged_cols
- rllm.preprocessing.tokenize_merged_cols(df: DataFrame, col_types: dict, tokenizer_config: TokenizerConfig, target_col: str | None = None) → tuple | None
Merge all text columns per row and then tokenize. Depending on configuration, each text segment may include its column name as a prefix before row-wise concatenation. If no eligible text column exists, the function returns None.
- Parameters:
df (DataFrame) – Input table.
col_types (dict) – Mapping of column name to ColType.
tokenizer_config (TokenizerConfig) – Tokenizer configuration.
target_col (Optional[str]) – Target column excluded from text merge.
- Returns:
(input_ids, attention_mask) with shape \((B, L)\) if text columns exist; otherwise None.
- Return type:
Optional[tuple]
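
The merge-then-tokenize behaviour described above can be sketched in plain Python. This is an illustrative stand-in only, not rllm's implementation: the helpers `merge_text_columns` and `toy_tokenize`, the `"name: value"` prefix format, the space separator, and the whitespace tokenizer are all assumptions made for the example. The real function takes a DataFrame, a ColType mapping, and a TokenizerConfig.

```python
def merge_text_columns(rows, text_cols, target_col=None, include_col_name=True):
    """Concatenate the text columns of each row into one string.

    Returns None when no eligible text column remains, mirroring the
    documented None return. (Hypothetical helper, not part of rllm.)
    """
    # The target column is excluded from the merge.
    cols = [c for c in text_cols if c != target_col]
    if not cols:
        return None
    merged = []
    for row in rows:
        parts = []
        for c in cols:
            value = str(row[c])
            # Assumed prefix format: "<column name>: <value>".
            parts.append(f"{c}: {value}" if include_col_name else value)
        merged.append(" ".join(parts))
    return merged


def toy_tokenize(texts, pad_id=0):
    """Whitespace tokenizer that pads every row to the longest one.

    Returns (input_ids, attention_mask) as nested lists of shape (B, L).
    (Toy stand-in for a configured tokenizer.)
    """
    vocab = {}
    ids = [[vocab.setdefault(tok, len(vocab) + 1) for tok in t.split()]
           for t in texts]
    max_len = max(len(seq) for seq in ids)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in ids]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq))
                      for seq in ids]
    return input_ids, attention_mask


rows = [
    {"title": "Heat", "plot": "a heist story", "year": 1995},
    {"title": "Up", "plot": "balloons", "year": 2009},
]
merged = merge_text_columns(rows, text_cols=["title", "plot"])
input_ids, attention_mask = toy_tokenize(merged)

# With "plot" as the only text column and also the target, nothing is left to merge.
no_text = merge_text_columns(rows, text_cols=["plot"], target_col="plot")
```

Here `input_ids` and `attention_mask` both have two rows (B = 2) padded to the length of the longest merged row, and `no_text` is None, which corresponds to the case where the documented function returns None.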