Data Processing Methods

Algorithmic transformation

Definition: A finite set of unambiguous instructions performed in a prescribed sequence to achieve a goal, especially a mathematical rule or procedure used to compute a desired result.

Application: To harmonize the same measures (continuous, categorical, or both) that have different but combinable ranges or categories.

Example: Processing ‘number of cigarettes per day’ in numerical form into ‘number of cigarettes per day’ in categorical form (1: <10 cigarettes, 2: 10-20 cigarettes, 3: 21+ cigarettes).
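A minimal sketch of this transformation in Python; the cut-offs are the ones from the example above, and the function name is illustrative:

```python
def categorize_cigarettes(n_per_day):
    """Recode a numeric 'cigarettes per day' value into the
    3-level categorical format from the example above."""
    if n_per_day < 10:
        return 1  # fewer than 10 cigarettes
    if n_per_day <= 20:
        return 2  # 10-20 cigarettes
    return 3      # 21 or more cigarettes
```

Applied over a whole column, this yields the harmonized categorical variable.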

Simple calibration model

Definition: A mathematical model that transforms one continuous measure into another continuous measure (or vice versa) so that both operate in the same unit.

Application: To harmonize same-metric measures that use different scales with a calibration model. The calibration model can either be known or estimated from data when the multiple measures are taken on a representative set of participants.

Example: Processing ‘height’ measured in cm into ‘height’ in inches.
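A known calibration model can be as simple as a fixed conversion constant; when the model must be estimated, an ordinary least-squares line fitted on participants measured on both scales serves the same purpose. A sketch of both cases (function names and data are illustrative):

```python
CM_PER_INCH = 2.54  # known calibration constant

def cm_to_inches(height_cm):
    """Known calibration: exact unit conversion."""
    return height_cm / CM_PER_INCH

def fit_linear_calibration(xs, ys):
    """Estimated calibration: least-squares slope and intercept from
    participants measured on both scales (xs and ys are paired)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```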

Standardization model

Definition: A model that centers, standardizes, or normalizes data to bring all variables into proportion with one another, with or without stratification by, or regression on, other variables.

Application: To harmonize the same constructs measured using different scales when no calibration method or bridging items are available. Normalization is typically used for discrete interval measures.

Example: Processing memory measures with no common items into a standardized measure such as C-scores or T-scores.
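The standardization step itself is straightforward. A sketch of the T-score convention (mean 50, SD 10), applied within each study so that the studies' scores become comparable:

```python
from statistics import mean, stdev

def to_t_scores(scores):
    """Rescale raw scores to T-scores (mean 50, SD 10), a common
    standardization convention; z-scores are the same idea with
    mean 0 and SD 1."""
    m, s = mean(scores), stdev(scores)
    return [50 + 10 * (x - m) / s for x in scores]
```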

Latent variable model

Definition: A statistical model that relates a set of manifest variables to a set of latent variables. Latent variables are constructs that are not directly observed but are instead inferred (through a mathematical model) from manifest variables that are observed (directly measured), e.g. factor analysis and latent trait analysis (IRT, item response theory).

Application: To harmonize the same constructs measured using different scales when no calibration method is known but bridging items are present.

Example: Processing memory measures that share common items with a latent variable model.
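Fitting a full latent variable model (factor analysis or IRT linking) normally requires dedicated software, but the role of bridging items can be illustrated with a much cruder mean-equating sketch: the shared items estimate the offset between two studies, which is then used to place one study's scores on the other's scale. All names and data below are illustrative, and this shift is a stand-in for formal latent-variable linking, not the method itself:

```python
from statistics import mean

def bridging_shift(study_a, study_b, bridge_items):
    """Estimate the offset between two studies from the items they
    share. Each study is a list of {item_name: score} dicts."""
    a = mean(r[i] for r in study_a for i in bridge_items)
    b = mean(r[i] for r in study_b for i in bridge_items)
    return a - b

def harmonize(study_b, shift):
    """Place study B's mean item scores on study A's scale."""
    return [mean(r.values()) + shift for r in study_b]
```

Example use: if study B scores one point lower than study A on the shared item, the shift of +1 is applied to B's scores before pooling.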

Multiple imputation models

Definition: A statistical technique that replaces missing values with a set of plausible values representing the uncertainty about the correct value to impute.

Application: To harmonize datasets (and not variables) that do not share the exact same set of variables, using bridging variables.

Example: Processing activities of daily living measures by imputing missing items.
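A minimal sketch of the mechanics, assuming the simplest possible imputation model (random draws from the observed values); real multiple imputation would draw from a fitted model conditioned on the bridging variables:

```python
import random

def multiple_impute(values, m=5, seed=0):
    """Produce m completed copies of `values`, replacing each None
    with a random draw from the observed values. The variation
    across the m copies carries the uncertainty about the missing
    entries; a crude stand-in for model-based imputation."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [[v if v is not None else rng.choice(observed) for v in values]
            for _ in range(m)]
```

The analysis of interest is then run on each of the m completed datasets and the results pooled (e.g. with Rubin's rules).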