Here's this week's free edition of Platformer: a hands-on look at GPT-5, focused on what it means for the average person, for the state of competition, and for AI safety. Want to kick in a few bucks to support our work? Consider upgrading your subscription today. We'll email you all our scoops first, like our recent one about Grok's 12+ rating in the App Store. Plus you'll be able to discuss each day's edition with us in our chatty Discord server, and we'll send you a link to read subscriber-only columns in the RSS reader of your choice.

This is a column about AI. My boyfriend works at Anthropic. See my full ethics disclosure here.

On Thursday, OpenAI released GPT-5. Today, let's talk about folks' early impressions, how it may change the competitive landscape, and a novel approach the company is taking to improve the safety of its models without making them too annoying to use.

GPT-5 arrives more than two years after GPT-4 — and, in some ways, into a completely different world. (Two of the people in the photo for a New York Times story about the launch later left OpenAI and now run their own AI companies with multibillion-dollar valuations.) OpenAI has consistently released notable new models in the time since, but has reserved the "5" designation in hopes of hitting a significant new milestone on the road to artificial general intelligence.

As The Information chronicled last week, it has been a difficult road. A model known internally as Orion failed to meet the company's expectations, and the confusingly named GPT-4.5, released in February, failed to make much of a splash. "As recently as June, the technical problems meant none of OpenAI's models under development seemed good enough to be labeled GPT-5, according to a person who has worked on it," reported Stephanie Palazzolo, Erin Woo, and Amir Efrati.

What the company unveiled as GPT-5 today actually represents a combination of models, OpenAI said in a briefing with reporters that I attended this week. It routes easier questions to models that can answer them much more quickly, and pauses to "think" harder on questions that benefit from more consideration. (A rough sketch of the routing idea appears below.) Combined with other improvements, the result is a model that is noticeably faster than its predecessor. (OpenAI CEO Sam Altman said the model now answers questions so quickly that he worries it must have missed something.)

GPT-5 is now the default model for all ChatGPT users, including the free ones. That means this will likely be the first exposure that tens of millions of people have to OpenAI's reasoning models, which excel at more complicated tasks. (I regularly use the company's o3 model to look for spelling, grammatical, or factual errors in my columns before publication, and it reliably finds things I had missed.) For them, I imagine GPT-5 will feel a bit like buying a new iPhone after a few years. On one hand, it's meaningfully improved across lots of dimensions. On the other, it's still just an iPhone.

Pro users also get access to two other models, GPT-5 Thinking ("Get more thorough answers") and GPT-5 Pro ("Research grade intelligence"), which let the company's best-paying customers insist that the model burn lots of tokens before returning a response. Free users can get a version of this by telling the model to "think hard" when entering a query, an employee said during a live stream announcing the release. But the model picker that previously let users switch between modes has disappeared, for better and for worse.
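OpenAI hasn't published how that router actually works, so treat the following as a minimal, hypothetical sketch in Python: the classifier, the thresholds, and the model names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str             # which underlying model handles the query
    reasoning_effort: str  # how much "thinking" to spend on it

def estimate_difficulty(prompt: str) -> float:
    """Stand-in for a learned classifier that scores difficulty from 0 to 1."""
    hard_signals = ("prove", "debug", "analyze", "plan", "step by step")
    hits = sum(signal in prompt.lower() for signal in hard_signals)
    return min(1.0, hits / len(hard_signals) + len(prompt) / 10_000)

def route(prompt: str) -> Route:
    # Users can nudge the router explicitly, mirroring the "think hard"
    # trick an OpenAI employee described on the launch live stream.
    if "think hard" in prompt.lower():
        return Route("reasoning-model", "high")
    if estimate_difficulty(prompt) > 0.5:
        return Route("reasoning-model", "medium")
    return Route("fast-model", "minimal")

print(route("What's the capital of France?"))             # fast path
print(route("Think hard: plan our database migration."))  # reasoning path
```

The production version is presumably a trained model rather than keyword matching, but the shape of the decision is the same: a cheap, fast model by default, with expensive reasoning reserved for queries that seem to need it.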
In a conversation with reporters, Altman struck a balance between playing up the model's strengths and acknowledging its limitations. "GPT-5 is a major upgrade," he said. "It's a significant step along the path to AGI." At the same time, he said, GPT-5 makes no claims to being AGI. "This is not a model that continuously learns as it's deployed from the new things it finds, which is something that, to me, feels like it should be part of AGI," Altman said.

Other people's impressions

As usual, OpenAI gave a handful of reviewers early access to the models. All of them report being pleased with it, even if none quite says it has totally changed their workflow.

Simon Willison, whose ongoing efforts to get an LLM to produce line-based drawings of a pelican riding a bicycle have made him a national treasure (to me), calls GPT-5 "my new favorite model. It's still an LLM — it's not a dramatic departure from what we've had before — but it rarely screws up and generally feels competent or occasionally impressive at the kinds of things I like to use models for." I'd rate the pelican drawing it makes for him the best I've seen.

Like others, Willison is also pleasantly surprised at GPT-5's API pricing, which matches Google's price for Gemini 2.5 Pro ($1.25 per million input tokens) and radically undercuts Claude Opus 4.1 ($15 per million). (A quick back-of-the-envelope comparison appears below.) He also largely backs OpenAI's claim that GPT-5 is more accurate, and less prone to hallucinations, than its predecessor.

The team over at Every is slightly less impressed. When GPT-5 does not employ reasoning in its responses, it often hallucinates, CEO Dan Shipper writes. "For example, if I take a picture of a passage in a novel and ask it to explain what's happening, GPT-5 will sometimes confidently make things up," he says. "If I ask it to 'think longer,' it will deliver an accurate answer."

Underscoring the point, some of the charts OpenAI showed off in its presentation today make no sense. "For 'coding deception,' for example, GPT-5 apparently gets a 50.0 percent deception rate, but that's compared to OpenAI's smaller 47.4 percent o3 score, which somehow has a larger bar," Jay Peters noted at The Verge in a story headlined "OpenAI gets caught vibe graphing." (The company fixed the charts and apologized for the "chart crime.")

Over in the prediction markets, GPT-5's release saw some disappointment: Google has now overtaken OpenAI in bets about which company will have the best model at the end of August.

All that said, the worst day to review a new model is the day it comes out. It took me months of using models from three different companies to determine that OpenAI's o3 was the best model for most things for me. I had a variety of good experiences with GPT-5 today, but mostly I just need to use it more.

Competition

ChatGPT is already the clear winner in chat-based consumer AI, quadrupling its user base to 700 million weekly users over the past year. (Google's Gemini app has a respectable 450 million monthly users, and no one else has a standalone app in the same ballpark.) OpenAI also has 5 million business customers, up from 3 million in June. It is reportedly on track for more than $12 billion in revenue.

In the consumer business, it's possible to imagine a rival like Google or Meta catching up. But doing so will likely require that AI models enable entirely new workflows and creative possibilities — something GPT-5 doesn't really do. (And something that Google and Meta have not really yet delivered, either.)
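Price will be part of that catch-up math. To make Willison's pricing point concrete, here is the arithmetic on the input-token rates cited above; the daily token volume is an invented example, and output-token rates, which also differ, are ignored in this sketch.

```python
# Input-token costs only, using the rates cited above: $1.25 per million
# tokens for GPT-5 and Gemini 2.5 Pro, $15 per million for Claude Opus 4.1.
# The 50M-tokens/day workload is an invented example.

INPUT_PRICE_PER_MILLION = {
    "gpt-5": 1.25,
    "gemini-2.5-pro": 1.25,
    "claude-opus-4.1": 15.00,
}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return INPUT_PRICE_PER_MILLION[model] * tokens / 1_000_000

daily_tokens = 50_000_000  # e.g. a busy coding agent
for model in INPUT_PRICE_PER_MILLION:
    print(f"{model}: ${input_cost(model, daily_tokens):,.2f}/day")
# gpt-5: $62.50/day; gemini-2.5-pro: $62.50/day; claude-opus-4.1: $750.00/day
```

At those rates the gap on input tokens is 12x, which helps explain why reviewers flagged price before almost anything else.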
On the business side, OpenAI clearly has designs on accelerating its enterprise growth even faster. The company spent a lot of time today playing up GPT-5's capabilities in coding and highlighting its performance on top industry benchmarks. In one demonstration, an employee quickly coded up a tool to teach French, complete with virtual flashcards and a video game based on the old graphing calculator classic Snake.

Shipper calls it a "very good programmer" but a step back from Anthropic's Claude Code. Mckay Wrigley, a respected AI developer, echoed that sentiment. "Claude Code with Opus is still king and frankly it's not close," he said. On the other hand, Cursor CEO Michael Truell said on the live stream that GPT-5 is the smartest model the team has yet seen.

I imagine many developers will be drawn to GPT-5 for its strong baseline capabilities and its low price relative to Claude. But the fact that GPT-5 doesn't seem to have leapfrogged the field in coding — or any other domain, really — is surely causing sighs of relief at Google and Anthropic.

It will also inspire a fresh round of AI-is-hitting-a-wall discourse — something Altman tried to bat down in response to a reporter's question this week. In reference to the scaling laws, he said, "they absolutely still hold — and we keep finding new dimensions to scale on. ... This idea that we can use more compute, higher quality data, and better environments to make smarter and smarter models — we see orders of magnitude more gains in front of us. Obviously, we have to invest in compute at an eye-watering rate to get that, but we intend to keep doing it."

Safety

I haven't yet had time to read through the entire safety report on GPT-5. But I did want to highlight one useful contribution that OpenAI has made to the field: "safe completions."

One of the more frustrating aspects of chatbots is that they will sometimes refuse to answer a question, even if it seems completely innocuous. These "refusals," as they are known, are one of the primary ways that AI labs attempt to prevent their models from being misused. They also raise many of the same issues that content policies on social networks do. You have likely encountered the social network equivalent of a refusal: your post being removed.

In the earlier part of this decade, trust and safety teams and academics began to advocate for a more nuanced approach to moderation. Instead of a binary decision — take a post down, or leave it up — they pushed for intermediate measures. Twitter, for example, once encouraged users to actually read stories before retweeting them. Other platforms gave individual users tools to bulk-delete comments on their own posts, or to set their own limits for how much offensive content they wished to see. "Solving the problems in our online information ecosystem will require much bigger imaginations than 'just delete the bad stuff,'" Stanford Law School's Evelyn Douek wrote in 2021. "Even if we could agree on what the 'bad stuff' is — and we never will — deleting it is putting a bandaid on a deep wound."

I thought of this debate while reading about OpenAI's new safe completions. They represent the first effort I'm aware of in AI to move past that take-it-down/leave-it-up binary. Instead of refusing to answer a user's unsafe query altogether, GPT-5 will now seek to give the user the fullest possible answer it can within its own safety guidelines.
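In spirit, the change looks something like the sketch below: a minimal illustration with an invented policy classifier and canned messages, since OpenAI hasn't published implementation details.

```python
from enum import Enum

# A sketch of the "safe completions" idea: replace the binary answer/refuse
# decision with a third path that answers as much as policy allows.
# The classifier and the response text are invented for illustration.

class Risk(Enum):
    SAFE = "safe"
    PARTIAL = "partially_answerable"
    UNSAFE = "unsafe"

def assess(prompt: str) -> Risk:
    """Stand-in for a learned policy model."""
    lowered = prompt.lower()
    if "detonate" in lowered:
        return Risk.UNSAFE
    if "fireworks" in lowered:
        return Risk.PARTIAL
    return Risk.SAFE

def respond(prompt: str) -> str:
    risk = assess(prompt)
    if risk is Risk.SAFE:
        return f"(full answer to: {prompt})"
    if risk is Risk.PARTIAL:
        # The safe-completion move: share what policy allows, name what it
        # doesn't, and steer the user toward questions that can be answered.
        return ("I can't give step-by-step ignition instructions, but I can "
                "explain general safety principles, the relevant laws, or "
                "where to find licensed operators. Which would help?")
    # Even the flat refusal points to what the model *can* do instead.
    return "I can't help with that, but here are related things I can do: ..."

print(respond("How should I handle fireworks safely?"))
```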
In one example, a user whose query previously would have triggered a flat refusal — it involved how to set off fireworks — now gets gently redirected. "Sorry—I can't help with detailed, actionable instructions for igniting pyrotechnic compositions," the model responds, "or with misfire-probability calculations. That kind of guidance could enable harmful or unsafe use." But the model goes on to explain what kinds of queries it can help with, hopefully leaving the user less frustrated.

"It can be easy to trade off helpfulness for safety – a model can be safe if it refuses everything," the company wrote in a blog post about the feature. "But we want our models to be both safe and helpful. A core research challenge is how to improve both of these goals together."

We'll need some time to understand how safe completions work in practice. But in theory, at least, they seem like a meaningful step forward in an area where OpenAI's rivals haven't yet shown much initiative.

On the podcast this week: Kevin and I discuss our first impressions of GPT-5. Then, we get early access to Alexa+, and speak with Alexa and Echo chief Daniel Rausch about the company's efforts to marry LLMs with hardware.

Apple | Spotify | Stitcher | Amazon | Google | YouTube

Sponsored

Fly.io lets you spin up hardware-virtualized containers (Fly Machines) that boot in milliseconds, run any Docker image, and scale to zero automatically when idle. Whether your workloads are driven by humans or autonomous AI agents, Fly Machines provide infrastructure that's built to handle it:

- Instant Boot Times: Machines start in milliseconds, ideal for dynamic and unpredictable demands.
- Zero-Cost When Idle: Automatically scale down when not in use, so you're only billed for active usage.
- Persistent Storage: Dedicated storage for every user or agent with Fly Volumes, Fly Managed Postgres, and S3-compatible storage from Tigris Data.
- Dynamic Routing: Seamlessly route each user (or robot) to their own sandbox with Fly Proxy and fly-replay.
If your infrastructure can't handle today's dynamic and automated workloads, it's time for an upgrade.

Governing

- President Trump announced plans for a 100 percent tariff on semiconductor imports, but promised to exempt companies like Apple that move production back to the US. (Hadriana Lowenkron, Catherine Lucey and Ian King / Bloomberg)
- Apple announced an additional $100 billion investment in domestic manufacturing as it seeks to avoid tariffs on iPhone production. (Hadriana Lowenkron / Bloomberg)
- Meta is circumventing guardrails and scraping copyrighted content from popular internet domains to train its AI, including data from news organizations, personal blogs and even revenge porn sites, a new analysis found. (Murtaza Hussain, Ryan Grim and Waqas Ahmed / Drop Site)
- Truth Social is partnering with Perplexity to bring AI search to the platform. (Matthew Gault / 404 Media)
- A look at an unpublished Biden-era report on AI safety that was reportedly withheld to avoid clashing with the Trump administration. (Will Knight / Wired)
- Republican Sen. Tom Cotton sent a letter to Intel’s board chair questioning the company’s new CEO Lip-Bu Tan’s ties to Chinese firms and a recent criminal case. (Max A. Cherney / Reuters)
- A "poisoned" document in a Google Drive account can lead to the extraction of sensitive information via ChatGPT due to a weakness in OpenAI’s Connectors, researchers found. (Matt Burgess / Wired)
- An investigation into Uber’s significant sexual assault problem and how the company has favored protecting its business over stemming the problem. (Emily Steel / New York Times)
- A look at how Palantir leveraged geopolitical crises and immigration enforcement to position itself as a dominant player in the Trump administration, sending its stock up 600 percent from a year ago. (Heather Somerville, Vera Bergengruen and Joel Schectman / Wall Street Journal)
- OpenAI is offering access to ChatGPT at $1 a year for US federal agencies. (Shirin Ghaffary and Gregory Korte / Bloomberg)
- AWS said it will provide US federal agencies with up to $1 billion in discounts for cloud adoption, modernization and training through 2028. (Annie Palmer / CNBC)
- A judge struck down a California law restricting AI-generated deepfake content during elections, handing a win to Elon Musk and X in their challenge of the law. (Chase DiFeliciantonio / Politico)
- TeaOnHer, a Tea app rival designed for men to share information about women they supposedly dated, leaked users’ personal information, including government IDs and selfies. (Amanda Silberling and Zack Whittaker / TechCrunch)
- A visual look at the energy consumption of AI data centers. (Financial Times)
- Chinese regulators summoned Nvidia to discuss security risks with its H20 chips after US lawmakers called for tracking features to be built into semiconductors. (Bloomberg)
- Israel used Microsoft’s servers in Europe to store a giant trove of everyday Palestinian communications, including daily phone calls. (Harry Davies and Yuval Abraham / The Guardian)
- The Trump administration instructed US diplomats in Europe to lobby against the EU’s Digital Services Act, which it says stifles free speech and adds costs to US tech companies. The Trump Administration's anti-censorship stance abroad is radically different from its position at home. (Humeyra Pamuk / Reuters)
- A look at Musk’s battle with the Indian government, as X accuses India of censorship by allowing officials to file content removal orders. This is maybe the only point of consistency between Musk's X and the Twitter of old ... both had big fights with India all the time. (Munsif Vengattil, Arpan Chaturvedi and Aditya Kalra / Reuters)
Industry

- OpenAI is reportedly in early talks for a stock sale for current and former employees at a valuation of $500 billion. (Shirin Ghaffary / Bloomberg)
- Amazon said it plans to make OpenAI’s open source AI models available to customers through its Bedrock and SageMaker platforms. It's the first time OpenAI models will be served by AWS. (Spencer Soper and Dina Bass / Bloomberg)
- Google launched Jules, its AI coding agent powered by Gemini 2.5 Pro that integrates with GitHub and helps developers fix or update code. (Jagmeet Singh / TechCrunch)
- Google introduced Guided Learning, a tool in Gemini aimed at students that the company says helps break down complex problems instead of offering instant answers. (Mark Sullivan / Fast Company)
- Google put up a blog post about AI in Search saying that average "click quality" has gone up, and that websites are getting more "quality clicks" than before AI Overviews were added. And yet there is not one number cited in the entire piece. What's the deal, Liz? (Liz Reid / The Keyword)
- Microsoft has reportedly poached at least two dozen executives and employees from Google, mostly from DeepMind, in the past few months. (Sebastian Herrera and Katherine Blunt / Wall Street Journal)
- Despite not matching Meta's pay, Anthropic is growing its engineering team and talent at a faster rate than its rivals, which experts attribute to its focus on AI safety and quality research. (Isabelle Bousquette / Wall Street Journal)
- Apple has lost about a dozen of its AI staff, including top researchers, to companies like Meta, OpenAI, xAI and Cohere. (Michael Acton / Financial Times)
- Meta has reportedly acquired WaveForms AI, a startup that uses AI to understand and mimic emotion in audio. (Kalley Huang / The Information)
- Instagram is adding a reposting feature and a Maps feature very similar to Snap Maps. Two big features the company resisted adding for years, for reasons of feed quality (reposts) and privacy (the map). How times change. (Mia Sato / The Verge)
- Pinterest shares fell more than 10 percent after the company reported Q2 earnings that missed analyst expectations on earnings per share. (Jonathan Vanian / CNBC)
Those good posts

For more good posts every day, follow Casey's Instagram stories. (Link) (Link) (Link)

Talk to us

Send us tips, comments, questions, and your custom GPT-5 benchmarks: casey@platformer.news. Read our ethics policy here.