Large language models (LLMs) are integrated into many modern technologies, from auto-complete suggestions to tools like Gemini, Copilot, ChatGPT, and DALL-E. These models are trained on vast datasets of text and images collected from the internet and private sources. However, new research from David Evans at the University of Virginia School of Engineering and Applied Science suggests that a widely used method for assessing whether an LLM’s training data is at risk of exposure may not be as reliable as once thought.
In a study presented at the Conference on Language Modeling (COLM), Evans and his team showed that membership inference attacks (MIAs), a standard method for testing privacy risks, "barely outperform random guessing for most settings across varying LLM sizes and domains."
LLMs are built using a "vacuum cleaner" approach, gathering massive amounts of text and images from sources such as internet archives, private repositories, emails, and more. This raises privacy concerns for both content creators and those training the models. MIAs are commonly used to audit how much of its training data an LLM might be leaking. For example, if an AI-generated image in "the style of" Monet appears to draw directly from Monet's actual paintings, it could indicate that the model memorized parts of its training data. Similarly, MIAs can help determine whether a specific piece of text was part of a model's training data, such as when outputs reproduce verbatim excerpts.
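To make the idea concrete, one common baseline treats the model's loss on a candidate text as a membership signal: text the model saw during training tends to receive a lower loss. The sketch below is illustrative only, using GPT-2 as a stand-in model rather than any model from the study, and is not the team's method.

```python
# Minimal sketch of a loss-based membership inference score, assuming a
# Hugging Face causal language model. GPT-2 is used here purely as an
# illustrative stand-in; lower loss (higher likelihood) on a candidate text
# is taken as weak evidence that the text was seen during training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice, not a model from the study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def loss_score(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()

# Comparing scores: a markedly lower loss on one candidate than on otherwise
# similar text is (weak) evidence the model is more "familiar" with it.
print(loss_score("The quick brown fox jumps over the lazy dog."))
```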
A Closer Look at the Research
The UVA team conducted a large-scale evaluation of five commonly used MIAs, testing them against language models trained on "The Pile," an open-source dataset released by EleutherAI in December 2020. The Pile draws on 22 sources, including Wikipedia, PubMed abstracts, U.S. Patent and Trademark Office archives, YouTube subtitles, and DeepMind Mathematics.
However, the study found that current MIA methods struggle to measure membership inference in LLMs reliably. One challenge is assembling a representative set of non-member candidates for experiments. Unlike traditional settings, where each record is discrete and membership is straightforward to define, language data is more fluid: minor word-choice differences or contextual shifts can make it ambiguous whether a specific sentence belongs to the training data.
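One simple way to illustrate that ambiguity is to measure how many of a candidate "non-member" text's n-grams also appear in the training corpus, and set aside candidates with high overlap. The sketch below is a hedged illustration under assumed settings; the n-gram size, threshold, and toy corpus are placeholders, not the study's configuration.

```python
# Sketch of vetting non-member candidates by n-gram overlap with the training
# corpus. Candidates that share many n-grams with training data are ambiguous
# "non-members" and can be filtered out. Values of n and the threshold below
# are illustrative assumptions.
from typing import Iterable, Set, Tuple

def ngrams(tokens: Iterable[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    toks = list(tokens)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_fraction(candidate: str, corpus_ngrams: Set[Tuple[str, ...]], n: int) -> float:
    """Fraction of the candidate's n-grams that also occur in the corpus."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    return len(cand & corpus_ngrams) / len(cand)

# Toy example: build the reference set from (a sample of) the training text,
# then keep only candidates whose overlap stays below a chosen threshold.
corpus_ngrams = ngrams("example training text goes here".split(), n=3)
candidates = ["example training text goes here today", "a completely unrelated sentence"]
non_members = [c for c in candidates if overlap_fraction(c, corpus_ngrams, n=3) < 0.2]
print(non_members)  # only the genuinely unrelated sentence survives
```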
The team noted that prior research showing MIAs to be effective may instead have been demonstrating distribution inference, which detects broad statistical differences between candidate sets rather than the membership of specific records.
Advancing MIA Research
To address these limitations, the team developed MIMIR, a Python-based, open-source project designed to enable other researchers to conduct more accurate membership inference tests. MIMIR aims to reveal new insights into how LLMs manage training data privacy and inspire more effective privacy auditing tools.
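Audits of this kind are typically scored by computing an attack score for known members and known non-members and then measuring the ROC AUC, where 0.5 corresponds to random guessing. The sketch below is not MIMIR's actual interface; it uses synthetic, hypothetical scores purely to show how a near-0.5 AUC reflects the "barely outperform random guessing" finding.

```python
# Illustrative scoring of a membership inference audit (not MIMIR's API).
# Attack scores are synthetic: members and non-members are drawn from nearly
# identical distributions, so the resulting AUC lands close to 0.5.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
member_scores = rng.normal(loc=0.02, scale=1.0, size=1000)      # texts in training data
non_member_scores = rng.normal(loc=0.00, scale=1.0, size=1000)  # held-out texts

labels = np.concatenate([np.ones(1000), np.zeros(1000)])  # 1 = member, 0 = non-member
scores = np.concatenate([member_scores, non_member_scores])

auc = roc_auc_score(labels, scores)
print(f"ROC AUC: {auc:.3f}  (0.5 = random guessing)")
```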
David Evans and his team hope that MIMIR will contribute to refining LLM training methods, ensuring that these powerful technologies respect both ethical considerations and user privacy.