The study introduces G-AUDIT (Generalized Attribute Utility and Detectability-Induced bias Testing) as a modality-agnostic auditing tool, meaning it can be applied across different types of medical datasets, including images, text-based electronic health records (EHRs), and structured tabular data. G-AUDIT systematically examines how dataset attributes – such as demographic information, clinical site variations, and imaging protocols – affect AI decision-making. Unlike previous methods that focus solely on social or ethical biases, this framework identifies conditions for shortcut learning, where a model relies on unintended correlations rather than clinically relevant features. By quantifying both utility (the strength of a feature's association with task outcomes) and detectability (how easily an AI model can infer this feature from the data itself), G-AUDIT provides actionable insights into dataset biases before model training begins.
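The utility/detectability idea above can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes, for illustration only, that utility is proxied by the mutual information between an attribute and the task label, and detectability by the cross-validated accuracy of a simple classifier recovering the attribute from the input features. The `site` attribute and the toy features are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000

# Hypothetical attribute: clinical site (0/1), correlated with the label.
site = rng.integers(0, 2, n)
label = (rng.random(n) < 0.3 + 0.4 * site).astype(int)

# Toy input features: one leaks the site (a potential shortcut),
# one carries a weak, noisy clinical signal for the label.
X = np.column_stack([
    site + rng.normal(0.0, 0.5, n),   # site-dependent feature
    label + rng.normal(0.0, 2.0, n),  # weak task-relevant feature
])

# Utility proxy: association between the attribute and the task label.
utility = mutual_info_classif(
    site.reshape(-1, 1), label, discrete_features=True
)[0]

# Detectability proxy: how well a simple model recovers the attribute
# from the data itself, estimated by 5-fold cross-validation.
detectability = cross_val_score(LogisticRegression(), X, site, cv=5).mean()

print(f"utility (MI, attribute vs. label): {utility:.3f}")
print(f"detectability (attribute-from-data accuracy): {detectability:.3f}")
```

When both proxies are high, as in this construction, a model trained on `X` can exploit the site attribute as a shortcut, which is exactly the risk such an audit is meant to surface before training.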
