PERIÓDICOS ALIAZON: GPT-5 is alive

viernes, 8 de agosto de 2025

GPT-5 is alive

Platformer

GPT-5 is alive

It's big news for free users of ChatGPT, but AGI remains elusive. PLUS: one AI safety idea other labs should borrow

By Casey Newton • 7 Aug 2025

View in browser

A grid of images showing GPT-5 at work, including coding and editing.

(OpenAI)

Here's this week's free edition of Platformer: a hands-on look at GPT-5, focused on what it means for the average person, for the state of competition, and for AI safety.

Want to kick in a few bucks to support our work? Consider upgrading your subscription today. We'll email you all our scoops first, like our recent one about Grok's 12+ rating in the App Store. Plus you'll be able to discuss each today's edition with us in our chatty Discord server, and we’ll send you a link to read subscriber-only columns in the RSS reader of your choice.

Upgrade

This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.

On Thursday, OpenAI released GPT-5. Today, let's talk about folks' early impressions, how it may change the competitive landscape, and a novel approach the company is taking to improve the safety of its models without making them too annoying to use.

GPT-5 arrives more than two years after GPT-4 — and in some ways, a completely different world. (Two of the people in the photo for a New York Times story about the launch later left OpenAI and now run their own AI companies with multibillion-dollar valuations.) OpenAI has consistently released notable new models in the time since, but has reserved the "5" designation in hopes of hitting a significant new milestone on the road to artificial intelligence.

As The Information chronicled last week, it has been a difficult road. A model known internally as Orion failed to meet the company's expectations, and the confusingly named GPT-4.5, released in February, failed to make much of a splash. "As recently as June, the technical problems meant none of OpenAI’s models under development seemed good enough to be labeled GPT-5, according to a person who has worked on it," reported Stephanie Palazzolo, Erin Woo, and Amir Efrati.

What the company called GPT-5 today actually represents a combination of models, the company said in a briefing with reporters that I attended this week. It routes easier questions to models that can answer them much more quickly, and pauses to "think" harder on questions that benefit from more consideration. Combined with other improvements, the result is a model that is noticeably faster than its predecessor. (OpenAI CEO Sam Altman said the model now answers questions so quickly that he worries it must have missed something.)

GPT-5 is now the default model for all ChatGPT users, including the free ones. That means this will likely be the first exposure that tens of millions of people have to OpenAI's reasoning models, which excel at more complicated tasks. (I regularly use the company's o3 model to look for spelling, grammatical, or factual errors in my columns before publication, and it reliably finds things I had missed.)

For them, I imagine GPT-5 will feel a bit like buying a new iPhone after a few years. On one hand, it's meaningfully improved across lots of dimensions. On the other, it's still just an iPhone.

Pro users also get access to two other models, GPT-5 Thinking ("Get more thorough answers") and GPT-5 Pro ("Research grade intelligence"), which let its best paying customers insist that model burns lots of tokens before returning a response. Free users can get a version of this by telling the model to "think hard" when entering a query, an employee said today during a live stream announcing the release. But the model picker that previously let users switch between modes has disappeared, for better and for worse.

In a conversation with reporters, Altman struck a balance between playing up the model's strengths and acknowledging its limitations.

"GPT-5 is a major upgrade," he said. "It’s a significant step along the path to AGI."

At the same time, he said, GPT-5 makes no claims to being AGI.

"This is not a model that continuously learns as it’s deployed from the new things it finds, which is something that, to me, feels like it should be part of AGI,” Altman said.

Other people's impressions

As usual, OpenAI gave a handful of reviewers early access to the models. All of them report being pleased with it, even if none quite says that it has totally changed their workflows.

Simon Willison, whose ongoing efforts to get an LLM to produce line-based drawings of a pelican riding a bicycle have made him a national treasure (to me), calls GPT-5 "my new favorite model. It’s still an LLM — it’s not a dramatic departure from what we’ve had before — but it rarely screws up and generally feels competent or occasionally impressive at the kinds of things I like to use models for." I'd rate the pelican drawing it makes for him the best I've seen.

Like others, Willison is also pleasantly surprised at GPT-5's API pricing, which matches Google's price for Gemini 2.5 Pro ($1.25 per million input tokens) and radically undercuts Claude Opus 4.1 ($15 per million). He also largely backs OpenAI's claim that GPT-5 is more accurate, and less prone to hallucinations, than its predecessor.

The team over at Every is slightly less impressed. When GPT-5 does not employ reasoning in its responses, it often hallucinates, CEO Dan Shipper writes. "For example, if I take a picture of a passage in a novel and ask it to explain what's happening, GPT-5 will sometimes confidently make things up," he says. "If I ask it to 'think longer,' it will deliver an accurate answer."

Underscoring this point is the fact that some of the charts OpenAI showed off in its presentation today make no sense. "For 'coding deception,' for example, GPT-5 apparently gets a 50.0 percent deception rate, but that’s compared to OpenAI’s smaller 47.4 percent o3 score which somehow has a larger bar," Jay Peter noted at The Verge in a story headlined OpenAI gets caught vibe graphing. (The company fixed the charts and apologized for the "chart crime.")

Over in the prediction markets, GPT-5's release saw some disappointment: Google has now overtaken OpenAI in bets about which company will have the best model at the end of August.

All that said, the worst day to review a new model is on the day it comes out. It took me months of using models from three different companies to determine that OpenAI's o3 was the best model for most things for me; I had a variety of good experiences today with GPT-5, but really I just want to use it more.

Competition

ChatGPT is already the clear winner in chat-based consumer AI, quadrupling its user base to 700 million weekly users over the past year. (Google's Gemini app has a respectable 450 million monthly users, and no one else has a standalone app in the same ballpark.) OpenAI also has 5 million businesses as customers, up from 3 million in June. It is reportedly on track for more than $12 billion in revenue.

In the consumer business, it's possible to imagine a rival like Google or Meta catching up. But doing so will likely require that AI models enable entirely new workflows and creative possibilities — something GPT-5 doesn't really do. (And something that Google and Meta have not really yet delivered, either.)

On the business side, OpenAI clearly has designs on accelerating its enterprise growth even faster. The company spent a lot of time today playing up GPT-5's capabilities in coding and highlighting its performance on top industry benchmarks. In one demonstration, an employee quickly coded up a tool to teach French, complete with virtual flashcards and a video game based on the old graphing calculator classic Snake.

Shipper calls it a "very good programmer" but a step back from Anthropic's Claude Code. Mckay Wrigley, a respected AI developer, echoed that sentiment. "Claude Code with Opus is still king and frankly it’s not close," he said.

On the other hand, Cursor CEO Michael Truell said on the live stream that GPT-5 is the smartest model the team has yet seen.

I imagine many developers will be drawn to GPT-5 for its strong baseline capabilities and its low price relative to Claude. But the fact that GPT-5 doesn't seem to have leapfrogged the field in coding — or any other domain, really — is surely causing sighs of relief at Google and Anthropic.

It will surely also inspire a fresh round of AI-is-hitting-a-wall discourse — something Altman tried to bat down in response to a reporter question this week.

In reference to the scaling laws, he said, "they absolutely still hold — and we keep finding new dimensions to scale on. ... This idea that we can use more compute, higher quality data, and better environments to make smarter and smarter models — we see orders of magnitude more gains in front of us. Obviously, we have to invest in compute at an eye-watering rate to get that, but we intend to keep doing it."

Safety

I haven't yet had time to read through the entire safety report on GPT-5. But I did want to highlight one useful contribution that OpenAI has made to the field: "safe completions."

One of the more frustrating aspects of chatbots is that they will sometimes refuse to answer a question, even if seems completely innocuous. These "refusals," as they are known, are one of the primary ways that AI labs attempt to prevent their models from being misused.

They also raise many of the same issues that content policies on social networks do. You also have likely encountered the social network equivalent of a refusal: your post being removed.

In the earlier part of this decade, trust and safety teams and academics began to advocate for a more nuanced approach to moderation. Instead of a binary decision — take a post down, or leave it up — they pushed for more nuanced approaches. Twitter, for example, once encouraged users to actually read stories before retweeting them. Other platforms gave individual users tools to bulk-delete comments on their own posts, or to set their own limits for how much offensive content they wished to see.

"Solving the problems in our online information ecosystem will require much bigger imaginations than 'just delete the bad stuff,'" Stanford Law School's Evelyn Douek wrote in 2021. "Even if we could agree on what the 'bad stuff' is — and we never will — deleting it is putting a bandaid on a deep wound."

I thought of this debate while reading about OpenAI's new safe completions. They represent the first effort I'm aware of in AI to move past that take-it-down/leave-it-up binary. Instead of refusing to answer a user's unsafe query altogether, GPT-5 will now seek to give the user the fullest possible answer it can within its own safety guidelines.

In one example, a user whose query previously would not have triggered a refusal — it involved how to set off fireworks — now gets gently redirected. "Sorry—I can’t help with detailed, actionable instructions for igniting pyrotechnic compositions," the model responds, "or with misfire-probability calculations. That kind of guidance could enable harmful or unsafe use."

But the model goes on to explain what kinds of queries it can help with, hopefully leaving the user less frustrated.

"It can be easy to trade off helpfulness for safety – a model can be safe if it refuses everything," the company wrote in a blog post about the future. "But we want our models to be both safe and helpful. A core research challenge is how to improve both of these goals together."

We'll need some time to understand how safe completions work in practice. But in theory, at least, they seem like a meaningful step forward in an area where OpenAI's rivals haven't yet shown much initiative.

On the podcast this week: Kevin and I discuss our first impressions of GPT-5. Then, we get early access to Alexa+, and speak with Alexa and Echo chief Daniel Rausch about the company's efforts to marry LLMs with hardware.

Sponsored

Fly.io lets you spin up hardware-virtualized containers (Fly Machines) that boot in milliseconds, run any Docker image, and scale to zero automatically when idle. Whether your workloads are driven by humans or autonomous AI agents, Fly Machines provide infrastructure that's built to handle it:

Instant Boot Times: Machines start in milliseconds, ideal for dynamic and unpredictable demands.
Zero-Cost When Idle: Automatically scale down when not in use, so you're only billed for active usage.
Persistent Storage: Dedicated storage for every user or agent with Fly Volumes, Fly Managed Postgres, and S3-compatible storage from Tigris Data.
Dynamic Routing: Seamlessly route each user (or robot) to their own sandbox with Fly Proxy and fly-replay.

If your infrastructure can't handle today's dynamic and automated workloads, it's time for an upgrade.

Talk to us

Send us tips, comments, questions, and your custom GPT-5 benchmarks: casey@platformer.news. Read our ethics policy here.

Sponsor a Newsletter

Comment

Páginas

viernes, 8 de agosto de 2025

GPT-5 is alive

Governing

Industry

Those good posts

Talk to us

No hay comentarios:

Publicar un comentario

ARTÍCULOS

ROPA Y COMPLEMENTOS ALIAZON

ABRIGOS DE MUJER

Tu selección está aquí...

Última oportunidad: 10 % de descuento en REBAJAS

OPINIÓN

Opinión y análisis // Diariocrítico.com

Artículos de Opinión | El Independiente

RSS de noticias de opinion

OPINION EL CONFIDENCIAL

Estrella Digital :: Últimas opiniones

Nuevatribuna :: Últimas opiniones

OPINIÓN-KHAMENEI

ÚLTIMAS NOTICIAS

ÚLTIMAS NOTICIAS

Últimas noticias // Diariocrítico.com

Estrella Digital :: Últimas noticias

Comentarios en: Últimas noticias

RSS de noticias de ultima-hora

PORTADAS

RSS de noticias de portada

NACIONAL ESPAÑA

Noticias nacionales | Diariocritico // Diariocrítico.com

MUNDO-KHAMENEI

CULTURA

ARTE

Arte y Cultura // Diariocrítico.com

TEATRO

Salud y vida saludable // Diariocrítico.com

SEXUALIDAD

Sexualidad y salud sexual y para disfrutar de las relaciones en pareja // Diariocrítico.com

SALUD

HISTORIA

Canal Historia // Diariocrítico.com

TURISMO

SOCIEDAD

Sociedad EL CONFIDENCIAL

LIFESTYLE

Estilo Hombre

MODA

CRÓNICA ROSA

Noticias del Corazón // Diariocrítico.com

LO MÁS LEÍDO

Lo más leido de la semana // Diariocrítico.com

CIENCIA

LIBROS

Noticias de libros, editoriales, autores y premios literarios // Diariocrítico.com

ECONOMÍA

Economía-EL CONFIDENCIAL

La actualidad económica en vivo - France 24

RSS de noticias de economia

COMENTARIOS DE ECONOMÍA

Comentarios de la Economía // Diariocrítico.com

Noticias economicas | Diariocritico // Diariocrítico.com

MERCADOS

Mercados - EL CONFIDENCIAL

EMPRESAS

Empresas - EL CONFIDENCIAL

FINANZAS

ACTUALIDAD FINANCIERA

Actualidad financiera // Diariocrítico.com

Finanzas personales- EL CONFIDENCIAL

BOLSAS

CRIPTOMONEDAS

TOROS

Toros, toda la información taurina // Diariocrítico.com

SEGUROS

VIDEOJUEGOS

Videojuegos // Diariocrítico.com

EDUCACIÓN

Educación // Diariocrítico.com