Calculates association metrics (PMI, Dice's Coefficient, G-score) for bigrams in a given corpus. The input data frame must contain document and token indices, as well as a 'type' variable representing the tokens.
Usage
calc_assoc_metrics(
data,
doc_index,
token_index,
type,
association = "all",
verbose = FALSE
)
Arguments
- data
A data frame containing the corpus.
- doc_index
A string giving the name of the column in 'data' that holds the document index.
- token_index
A string giving the name of the column in 'data' that holds the token index.
- type
A string giving the name of the column in 'data' that holds the tokens or terms.
- association
A character vector specifying which metrics to calculate. Can be any combination of 'pmi' (Pointwise Mutual Information), 'dice_coeff' (Dice's Coefficient), 'g_score' (G-score), or 'all' (calculate all metrics). Default is 'all'.
- verbose
A logical value indicating whether to keep the intermediate probability columns ('p_xy', 'p_x', 'p_y') in the result. Default is FALSE.
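For orientation, PMI and Dice's Coefficient are conventionally defined in terms of the intermediate probabilities named above ('p_xy', 'p_x', 'p_y'). The formulas below are the textbook definitions, not taken from this function's source: the logarithm base (2 here) and the way the probabilities are estimated are assumptions. The G-score is left out because this page does not state which variant is computed.

```latex
\begin{align*}
\mathrm{PMI}(x, y)  &= \log_2 \frac{p_{xy}}{p_x \, p_y} \\
\mathrm{Dice}(x, y) &= \frac{2 \, p_{xy}}{p_x + p_y}
\end{align*}
```

Here p_xy is read as the probability of the bigram (x, y), and p_x and p_y as the probabilities of its first and second token. Under these definitions PMI is 0 when the two tokens co-occur exactly as often as independence would predict, and positive when they co-occur more often than that.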