rllm.preprocessing.tokenize_merged_cols

rllm.preprocessing.tokenize_merged_cols(df: DataFrame, col_types: dict, tokenizer_config: TokenizerConfig, target_col: str | None = None) → tuple | None

Merge all text columns per row and then tokenize. Depending on configuration, each text segment may include its column name as a prefix before row-wise concatenation. If no eligible text column exists, the function returns None.
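The merge step described above can be sketched in plain Python. This is an illustration only, not rllm's implementation: `merge_text_cols`, the `include_colname` flag, the `"col: value"` prefix format, and the single-space separator are all hypothetical, and a list of dicts with string type labels stands in for the DataFrame and `ColType`.

```python
# Hypothetical stand-ins: plain strings instead of rllm's ColType,
# and a list of dicts instead of a pandas DataFrame.
TEXT = "text"


def merge_text_cols(rows, col_types, include_colname=True, target_col=None):
    """Concatenate the text columns of each row into one string, or return None."""
    # Eligible columns: typed as text and not the target column.
    text_cols = [c for c, t in col_types.items() if t == TEXT and c != target_col]
    if not text_cols:
        return None  # mirrors the documented "no eligible text column" case

    merged = []
    for row in rows:
        parts = []
        for c in text_cols:
            value = str(row[c])
            # Assumed prefix format; the real format depends on the tokenizer config.
            parts.append(f"{c}: {value}" if include_colname else value)
        merged.append(" ".join(parts))
    return merged


rows = [
    {"title": "Alien", "plot": "A crew meets a creature.", "year": 1979},
    {"title": "Heat", "plot": "A heist goes wrong.", "year": 1995},
]
col_types = {"title": TEXT, "plot": TEXT, "year": "numerical"}
merged = merge_text_cols(rows, col_types)
```

Passing `target_col="plot"` in this sketch leaves only `title` in the merged string, which is the behaviour the `target_col` parameter describes.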

Parameters:

df (DataFrame) – Input table.
col_types (dict) – Mapping of column name to ColType.
tokenizer_config (TokenizerConfig) – Tokenizer configuration.
target_col (Optional[str]) – Target column to exclude from the text merge.

Returns:

(input_ids, attention_mask), each with shape \((B, L)\), where \(B\) is the number of rows and \(L\) is the token sequence length, if text columns exist; otherwise None.

Return type:

Optional[tuple]
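To make the \((B, L)\) return shape concrete, here is a toy sketch of how a batch of merged strings can become rectangular `input_ids` and `attention_mask` arrays. It assumes right-padding to the longest sequence and a whitespace tokenizer; `toy_tokenize` and `pad_id` are hypothetical, and the real function most likely returns tensors produced by the tokenizer named in `tokenizer_config` rather than Python lists.

```python
def toy_tokenize(texts, pad_id=0):
    """Whitespace-tokenize each text and right-pad to a common length."""
    vocab = {}  # word -> integer id, built on the fly; id 0 is reserved for padding
    encoded = []
    for text in texts:
        ids = [vocab.setdefault(word, len(vocab) + 1) for word in text.lower().split()]
        encoded.append(ids)

    max_len = max(len(ids) for ids in encoded)  # this is L
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in encoded]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in encoded]
    return input_ids, attention_mask


input_ids, attention_mask = toy_tokenize(["title: alien plot: a crew", "title: heat"])
# Both results have B = 2 rows and L = 5 columns.
```

Callers should still check for a None return before unpacking, since the function returns None when no text column is eligible.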