OpenAI Unveils GDPval to Compare AI and Human Expertise

By Newsroom, published 27 September 2025 at 13:48, updated on 27 September 2025 at 13:48.

Tech

OpenAI / PR-ADN

OpenAI has introduced GDPval, a new metric designed to evaluate artificial intelligence by directly comparing its performance with human expertise. This initiative aims to provide clearer benchmarks for AI capabilities in real-world, expert-level tasks.

TL;DR

OpenAI launches GDPval to test GPT-5 on real jobs.
1,320 tasks, crafted by seasoned professionals, span 44 careers.
Benchmark aims to assess AI’s ability to rival humans at work.

A Benchmark Unlike Any Other

When it comes to pushing the boundaries of artificial intelligence, few companies have the ambitions, or the audacity, of OpenAI. With its latest move, the creator of ChatGPT is entering uncharted territory. The newly announced benchmark, dubbed GDPval, seeks to put GPT-5 through its paces not with theoretical puzzles or academic exams, but with tasks mirroring those performed daily by working professionals across a broad swath of industries.

The Rigorous Test: Simulating Real-World Expertise

Rather than relying on standard tests or simple textual prompts, GDPval comprises a collection of 1,320 highly specialized assignments. Each one was developed and reviewed by experts averaging more than 14 years of experience in their respective fields. The range is striking: from legal briefs and nursing care plans to financial spreadsheets and technical diagrams. This design is intended to:

Create realistic workplace conditions with complex scenarios and multimedia files.
Reflect industry-specific standards rather than artificial benchmarks.
Gauge whether an AI can replicate outputs requiring nuanced judgment.

Unlike the tasks in previous benchmarks, each GDPval task comes with contextual materials, such as reference documents and multimedia elements, designed to replicate genuine on-the-job requirements.

A Cross-Sector Challenge for AI

The scope of this benchmark stretches across 44 professions distributed over nine economic sectors. It’s not just about coders or financial analysts; the list includes everything from concierges and nurses to lawyers, engineers, and journalists. Through GDPval, OpenAI hopes to answer a fundamental question: Can a system like GPT-5, or any so-called artificial general intelligence (AGI), produce work that truly matches human expertise?

Navigating an Uncertain Future for Work

Such ambitions inevitably rekindle debates about the future of employment. Could advances in AI make certain jobs obsolete? Even OpenAI concedes that GDPval represents only an early step; it cannot yet capture the entire complexity of real-world economic activity. Isolated scenarios are tested without accounting for evolving workplace contexts or long-term project development. Nevertheless, the writing is on the wall: as technology advances, professional life will be profoundly transformed. The pace and nature of this transformation remain uncertain—and there’s still a role for human adaptability in shaping what comes next.
