AI Detects Personal Data from Social Media Posts

Sarah Ruivivar

Nov 7, 2023 · 2 min read

Ever wondered how much your social media posts reveal about you?

According to a recent study, large language models (LLMs) such as GPT-4 can infer personal details such as age, location, gender, and income with up to 85% accuracy, just by analysing social media posts.

Researchers Robin Staab and Mark Vero at ETH Zurich in Switzerland conducted an intriguing experiment. They tasked nine LLMs with sifting through a database of Reddit posts to pick up identifying information from the way users write.

The researchers randomly selected 1,500 profiles of users who were active on the platform. They then narrowed these down to 520 users for whom they could confidently identify attributes like place of birth, income bracket, gender, and location, either from their profiles or posts.

When presented with the posting history of these users, some of the LLMs were able to identify many of these attributes with a high degree of accuracy. GPT-4 achieved the highest overall accuracy at 85%, while Llama-2-7b, a comparatively low-powered LLM, was the least accurate model at 51%.

"It tells us that we give a lot of our personal information away on the internet without thinking about it," says Staab. "Many people would not assume that you can directly infer their age or their location from how they write, but LLMs are quite capable."

Sometimes, personal details were explicitly stated in the posts. For example, some users post their income in forums about financial advice. But the AIs also picked up on subtler cues, like location-specific slang, and could estimate a salary range from a user’s profession and location. Some characteristics were easier for the AIs to discern than others. GPT-4 was 97.8% accurate at guessing gender, but only 62.5% accurate on income.
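
For readers curious how figures like these are typically tallied, here is a minimal sketch of per-attribute and overall accuracy: each model guess is compared against a known label and the share of matches is reported. The records, labels, and function names below are invented for illustration and are not taken from the study, whose actual scoring procedure may have differed.

```python
from collections import defaultdict

# Hypothetical toy records: (attribute, known label, model guess).
# These values are made up for illustration; they are not study data.
records = [
    ("gender", "female", "female"),
    ("gender", "male", "male"),
    ("location", "Zurich", "Zurich"),
    ("location", "Melbourne", "Sydney"),
    ("income", "medium", "high"),
    ("income", "low", "low"),
]


def is_match(truth, guess):
    """Exact match after trimming whitespace and ignoring case."""
    return truth.strip().lower() == guess.strip().lower()


def accuracy_by_attribute(rows):
    """Return {attribute: fraction of guesses that match the label}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for attribute, truth, guess in rows:
        totals[attribute] += 1
        if is_match(truth, guess):
            hits[attribute] += 1
    return {attr: hits[attr] / totals[attr] for attr in totals}


def overall_accuracy(rows):
    """Return the fraction of all guesses that match their label."""
    correct = sum(is_match(truth, guess) for _, truth, guess in rows)
    return correct / len(rows)


per_attribute = accuracy_by_attribute(records)
overall = overall_accuracy(records)
print(per_attribute)
print(f"overall: {overall:.1%}")
```

Exact string matching is the simplest possible scoring rule. A real evaluation would likely need something more forgiving for attributes such as location, where "Zurich" and "Zürich, Switzerland" should arguably count as the same answer.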

"We’re only just beginning to understand how privacy might be affected by the use of LLMs," says Alan Woodward at the University of Surrey, UK. The research highlights the importance of being aware of the digital footprint we leave behind and its potential implications for privacy.

Made with TRUST_AI - see the Charter: https://www.modelprop.co.uk/trust-ai

ModelProp
