OpenAI says it has deployed a new system to monitor its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. The system aims to prevent the models from offering advice that could instruct someone on carrying out potentially harmful attacks, according to OpenAI’s safety report.
O3 and o4-mini represent a meaningful capability increase over OpenAI’s previous models, the company says, and therefore pose new risks in the hands of bad actors. According to OpenAI’s internal benchmarks, o3 in particular is more skilled at answering questions about creating certain types of biological threats. For this reason, and to mitigate other risks, OpenAI created the new monitoring system, which the company describes as a “safety-focused reasoning monitor.”
The monitor, custom-trained to reason about OpenAI’s content policies, runs on top of o3 and o4-mini. It’s designed to identify prompts related to biological and chemical risk and instruct the models to refuse to offer advice on those topics.
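As a rough illustration of the general pattern only (a minimal sketch assuming hypothetical classify_risk, answer, and guarded_answer functions, not anything OpenAI has published), a pre-generation monitor can be thought of as a screening step that runs on each prompt before the underlying model is allowed to respond:

```python
# Illustrative sketch of a monitor that sits "on top of" a model.
# classify_risk, answer, and the refusal text are hypothetical stand-ins,
# not OpenAI's implementation.

REFUSAL_MESSAGE = "I can't help with that request."

def classify_risk(prompt: str) -> str:
    """Toy stand-in for a safety monitor: returns 'block' or 'allow'."""
    flagged_terms = ("pathogen synthesis", "nerve agent")  # crude keyword check
    return "block" if any(term in prompt.lower() for term in flagged_terms) else "allow"

def answer(prompt: str) -> str:
    """Toy stand-in for the underlying reasoning model."""
    return f"Model response to: {prompt}"

def guarded_answer(prompt: str) -> str:
    """Screen the prompt with the monitor before letting the model respond."""
    if classify_risk(prompt) == "block":
        return REFUSAL_MESSAGE
    return answer(prompt)

print(guarded_answer("Explain how mRNA vaccines work."))  # passes the monitor
```

The real monitor reportedly reasons over OpenAI’s content policies rather than matching keywords; the keyword check above only marks where that judgment would sit in the pipeline.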
To establish a baseline, OpenAI had red teamers spend around 1,000 hours flagging “unsafe” biorisk-related conversations from o3 and o4-mini. During a test in which OpenAI simulated the “blocking logic” of its safety monitor, the models declined to respond to risky prompts 98.7% of the time, according to OpenAI.
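A figure like that refusal rate is, in essence, the share of flagged prompts that come back as refusals when replayed through the monitored model. The snippet below is a sketch under assumed details (placeholder prompts, a placeholder refusal check, and the hypothetical guarded_answer helper from the sketch above), not OpenAI’s evaluation code:

```python
# Sketch of computing a refusal rate over red-team-flagged prompts.
# The prompt list and refusal check are placeholders, and guarded_answer
# is the hypothetical helper from the previous sketch.

flagged_prompts = [
    "Explain how mRNA vaccines work.",    # placeholder entries standing in
    "Outline pathogen synthesis steps.",  # for red-team-flagged conversations
]

def is_refusal(response: str) -> bool:
    return response.startswith("I can't help")

refusals = sum(is_refusal(guarded_answer(p)) for p in flagged_prompts)
print(f"Refusal rate: {refusals / len(flagged_prompts):.1%}")
```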
OpenAI acknowledges that its test didn’t account for people who might try new prompts after getting blocked by the monitor, which is why the company says it will continue to rely in part on human monitoring.
O3 and o4-mini don’t cross OpenAI’s “high risk” threshold for biorisks, according to the company. However, OpenAI says that early versions of o3 and o4-mini proved more helpful than o1 and GPT-4 at answering questions about developing biological weapons.

The company is actively tracking how its models could make it easier for malicious users to develop chemical and biological threats, according to OpenAI’s recently updated Preparedness Framework.
OpenAI is increasingly relying on automated systems to mitigate the risks from its models. For example, to prevent GPT-4o’s native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one the company deployed for o3 and o4-mini.
Yet several researchers have raised concerns that OpenAI isn’t prioritizing safety as much as it should. One of the company’s red-teaming partners, Metr, said it had relatively little time to test o3 on a benchmark for deceptive behavior. Meanwhile, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.