This function takes a data frame and creates a data dictionary. The data dictionary includes the variable name, a human-readable name, the variable type, and a description. If a model is specified, the function uses OpenAI's API to generate the information based on the characteristics of the data frame.

Usage

create_data_dictionary(
  data,
  file_path,
  model = NULL,
  sample_n = 5,
  grouping = NULL,
  force = FALSE
)

Arguments

data

A data frame to create a data dictionary for.

file_path

The file path to save the data dictionary to.

model

The ID of the OpenAI chat completion model to use for generating descriptions (see openai::list_models()). If NULL (default), only a scaffold of the data dictionary is created.

sample_n

The number of rows to sample from the data frame to use as input for the model. Default 5.

grouping

A character vector of column names to group by when sampling rows from the data frame for the model. Default NULL.
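How sample_n and grouping interact is not spelled out here. One plausible reading, sketched below in base R, is that up to sample_n rows are drawn from each group. The helper sample_rows() is hypothetical and is not part of this package; it only illustrates the idea.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# draw up to `sample_n` rows overall, or per group when `grouping` is given.
sample_rows <- function(data, sample_n = 5, grouping = NULL) {
  stopifnot(is.data.frame(data))
  if (is.null(grouping)) {
    idx <- sample.int(nrow(data), min(sample_n, nrow(data)))
    return(data[idx, , drop = FALSE])
  }
  # Split row indices by the grouping columns, dropping empty combinations.
  groups <- split(seq_len(nrow(data)), data[grouping], drop = TRUE)
  idx <- unlist(lapply(groups, function(rows) {
    rows[sample.int(length(rows), min(sample_n, length(rows)))]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

set.seed(1)
grouped <- sample_rows(iris, sample_n = 2, grouping = "Species")
ungrouped <- sample_rows(iris, sample_n = 5)
```

Under this reading, sampling within groups keeps rare categories visible in the rows sent to the model, at the cost of a larger prompt when there are many groups.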

force

If TRUE, overwrite the file at file_path if it already exists. Default FALSE.
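The overwrite guard that force describes can be sketched as follows. This is an illustration only: write_guarded() is a hypothetical helper, and both the error raised when the file exists and the CSV output format are assumptions that this page does not confirm.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# refuse to overwrite an existing file unless `force = TRUE`.
write_guarded <- function(dict, file_path, force = FALSE) {
  if (file.exists(file_path) && !force) {
    stop("File already exists: ", file_path, ". Use force = TRUE to overwrite.")
  }
  write.csv(dict, file_path, row.names = FALSE)  # CSV format is assumed
  invisible(file_path)
}

path <- tempfile(fileext = ".csv")
dict <- data.frame(variable_name = "x", description = NA_character_)

write_guarded(dict, path)                        # first write succeeds
blocked <- inherits(try(write_guarded(dict, path), silent = TRUE), "try-error")
write_guarded(dict, path, force = TRUE)          # explicit overwrite succeeds
```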

Value

A data frame containing the variable name, human-readable name, variable type, and description for each variable in the input data frame.
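As a rough illustration of the shape of the returned data frame when model = NULL, the base R sketch below builds a scaffold with one row per input column. The helper scaffold_dictionary() and the output column names are assumptions made for this example; the actual column names produced by create_data_dictionary() may differ.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# one row per column of `data`, with descriptions left blank to fill in.
scaffold_dictionary <- function(data) {
  stopifnot(is.data.frame(data))
  data.frame(
    variable_name = names(data),
    # Replace dots and underscores with spaces for a readable label.
    human_readable_name = gsub("[._]+", " ", names(data)),
    # Record the first class of each column as its type.
    variable_type = vapply(data, function(col) class(col)[1], character(1)),
    description = rep(NA_character_, ncol(data)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

dict <- scaffold_dictionary(iris)
```

Here the description column is left as NA, which matches the idea of a scaffold to be completed by hand.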