OpenAI’s Latest Model Boosts Safety

Sarah Ruivivar

Jul 24, 20242 min read

Image credits: ThisisEngineering / Unsplash

Have you seen those hilarious memes where someone tells a bot to “ignore all previous instructions” and then breaks it in the most amusing ways?

Well, OpenAI is putting an end to that with its latest model, GPT-4o Mini.

The trick works like this: imagine you have an AI bot designed to direct you to The Verge’s stellar reporting. If you tell it to “forget all previous instructions,” it would abandon its original mission and start doing whatever you ask, like writing a poem about printers. Funny, right? But not very practical.

To tackle this, OpenAI researchers have developed a technique called “instruction hierarchy.” This method prioritises the developer’s original instructions over any mischievous prompts you might inject. So, no more tricking the bot into doing your bidding!

Want to learn more about AI's impact on the world in general and property in particular? Join us on our next Webinar! Click here to register

Olivier Godement from OpenAI explains that this new technique ensures the model follows the system message first, making it much harder to misuse. This is particularly important as OpenAI aims to create fully automated agents to manage your digital life. Imagine an email-writing agent being tricked into sending your inbox contents to a third party—yikes!

Existing large language models (LLMs) can’t differentiate between user prompts and developer instructions. But with instruction hierarchy, system instructions get top priority, while misaligned prompts are treated with the disdain they deserve. The model is trained to detect and ignore bad prompts, making it safer than ever.

This safety update is crucial for OpenAI, especially given recent concerns about their safety and transparency practices. Trust in OpenAI has been shaky, but with GPT-4o Mini, they’re taking a significant step towards regaining it.

So, if you’re planning to be a rascal with AI bots, think again. GPT-4o Mini is here to keep things on track!

Want to learn more about AI's impact on the world in general and property in particular? Join us on our next Webinar! Click here to register

Made with TRUST_AI - see the Charter: https://www.modelprop.co.uk/trust-ai

ModelProp

OpenAI’s Latest Model Boosts Safety

Recent Posts

Comments