In a stunning moment of unity, forty leading artificial intelligence researchers from competing major firms have issued an urgent joint warning that we may soon be unable to monitor, understand, or control advanced AI systems. Futurism reports:
Depending on how they’re trained, advanced models may no longer, the paper suggests, “need to verbalize any of their thoughts, and would thus lose the safety advantages.” There’s also the non-zero chance that models could intentionally “obfuscate” their CoTs after realizing that they’re being watched, the researchers noted — and as we’ve already seen, AI has indeed rapidly become very good at lying and deception.
If you have tested one of the most widely used generative AI (gen AI) chatbots, you may already have experienced what they are talking about:
- The systems are designed to “reason”, but their “chain of thought” can be hard to trace and is not necessarily what users want it to be.
- Outputs may “feel” fact-based and well-sourced, but they might be pure fabrications, built on the sequential probability of words and phrases (a toy illustration follows this list).
- False claims, fabrications, and even deception have been documented.
- Some security analysts have warned that such digital “hallucinations” could create real risks, not only to output quality and data systems, but in the physical world.
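To make the point about sequential probability concrete, here is a deliberately simplified sketch in Python. The word table, its probabilities, and the `generate` function are invented for illustration and do not reflect how any particular product is implemented; they simply show that a generator driven only by “what word is likely to come next” can produce fluent, confident sentences with no step that checks whether the resulting claim is true.

```python
import random

# Toy "model": hypothetical next-word probabilities invented for illustration.
# They stand in for the billions of learned weights in a real system; nothing
# in this table or in the loop below consults facts or sources.
NEXT_WORD = {
    "the": {"study": 0.5, "report": 0.5},
    "study": {"found": 0.7, "proved": 0.3},
    "report": {"found": 0.6, "claimed": 0.4},
    "found": {"that": 1.0},
    "proved": {"that": 1.0},
    "claimed": {"that": 1.0},
    "that": {"coffee": 0.5, "exercise": 0.5},
    "coffee": {"cures": 0.6, "worsens": 0.4},
    "exercise": {"cures": 0.3, "worsens": 0.7},
    "cures": {"insomnia.": 1.0},
    "worsens": {"insomnia.": 1.0},
}

def generate(start: str, max_words: int = 8) -> str:
    """Build a sentence one word at a time, choosing each word only by its
    conditional probability given the previous word."""
    words = [start]
    while len(words) < max_words and words[-1] in NEXT_WORD:
        options = NEXT_WORD[words[-1]]
        words.append(random.choices(list(options), weights=list(options.values()))[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the study proved that coffee cures insomnia."
```

The output reads like a finding, but no part of the process ever asked whether it is one; scaled up, that is the gap between fluency and truth the bullet points above describe.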
One open question about the position paper outlining the need for chain of thought (CoT) monitoring is that its core idea seems to depend, at least in part, on asking AI systems to explain their own chain of thought. There is good reason to discuss the pros and cons of requiring AI systems to document their computational pathways, which would allow complex patterns of algorithmic decision-making, in which each step leads to the next, to be retraced. A minimal sketch of what such record-keeping might look like follows the list below.
- This would make it easier to monitor AI systems for quality without depending on their forthright participation in the review process.
- Such a process could, however, require far more energy and storage capacity.
- Mandatory archiving of AI computations could also make it difficult to protect privacy and intellectual property rights.
- Some argue this would be a way to gain transparency and reinforce protections with deliberate action.
- Others argue that documented inputs would become de facto public domain, voiding intellectual property protections.
- In the U.S., where Article I of the Constitution provides for such rights to be secured and protected, there are further and no less complicated questions about the legality of documenting inputs.
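As an illustration only, the sketch below shows one way such step-by-step records might be kept: a hypothetical `ReasoningLog` that hash-chains each recorded step so a reviewer can retrace the sequence later and detect whether anything was altered or removed. Nothing here is an existing API or a mechanism proposed in the position paper, and it also makes the trade-offs above tangible: every intermediate step has to be stored, and the archive may contain sensitive inputs.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

@dataclass
class ReasoningLog:
    """Hypothetical append-only archive of a model's intermediate reasoning steps.

    Each entry is chained to the previous one by hash, so an auditor can later
    retrace the sequence and detect whether any step was altered or removed."""
    steps: list = field(default_factory=list)

    def record(self, step_text: str) -> None:
        prev_hash = self.steps[-1]["hash"] if self.steps else ""
        payload = {"t": time.time(), "text": step_text, "prev": prev_hash}
        payload["hash"] = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        self.steps.append(payload)

    def verify(self) -> bool:
        """Recompute the hash chain; False means the archive was tampered with."""
        prev = ""
        for s in self.steps:
            expected = hashlib.sha256(json.dumps(
                {"t": s["t"], "text": s["text"], "prev": prev}, sort_keys=True
            ).encode()).hexdigest()
            if s["hash"] != expected or s["prev"] != prev:
                return False
            prev = s["hash"]
        return True

# Usage: record each intermediate step the model emits, then audit later.
log = ReasoningLog()
log.record("User asks for a drug-interaction summary.")
log.record("Retrieved three sources; two conflict on dosage.")
log.record("Chose the more conservative dosage recommendation.")
assert log.verify()
```

Even this toy version shows why the storage, privacy, and intellectual-property questions in the list above arise as soon as the archive is made mandatory.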
This risk of runaway, unreviewable AI systems had earlier prompted an open letter signed by leaders in business, technology, politics, and research, including Elon Musk, calling for a six-month “pause” on the training of AI systems more powerful than those already deployed. The letter called for specific actions by researchers and policy-makers:
AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts. These protocols should ensure that systems adhering to them are safe beyond a reasonable doubt.[4] This does not mean a pause on AI development in general, merely a stepping back from the dangerous race to ever-larger unpredictable black-box models with emergent capabilities.
AI research and development should be refocused on making today’s powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.
It also suggested the “flourishing future” AI could deliver to humankind might be denied or derailed if proper regulatory bodies were not established, with sufficient authority to prevent harm and abuse. Among the regulatory capacities called for were:
- oversight and tracking of highly capable AI systems and large pools of computational capability;
- provenance and watermarking systems to help distinguish real from synthetic and to track model leaks;
- a robust auditing and certification ecosystem;
- liability for AI-caused harm…
The call for a pause was founded on the Asilomar AI Principles, established in 2017 and signed by leading scientists, researchers, and entrepreneurs, including Demis Hassabis of DeepMind, futurist and AI pioneer Ray Kurzweil, Stephen Hawking, Elon Musk, and Sam Altman. The 23 principles include:
- general safety concerns, specific safeguards, transparency around failures and lessons learned,
- alignment of AI systems’ methods and outputs with human values, meaning they are “designed and operated so as to be compatible with ideals of human dignity, rights, freedoms, and cultural diversity”,
- while respecting personal privacy and human control,
- working to enhance, not subvert, civic and scientific processes,
- and not making “strong assumptions regarding upper limits on future AI capabilities”, which could later prove to have been dangerous underestimations of risk.
In 2023, the UK hosted a summit on AI safety at Bletchley Park. The Summit examined risks of highly advanced AI systems and produced The Bletchley Declaration. The Declaration commits signatories to cooperative and preventative measures to inform national policy, research, and technology deployment, to address “frontier AI risk” and “to ensure human-centric, trustworthy and responsible AI that is safe, and supports the good of all through existing international fora and other relevant initiatives”.

This year’s AI Action Summit in Paris was organized, in part, to address concerns about:
- Increased inequality between those who control and those who use artificial intelligence;
- Progress made in AI being concentrated in a small circle of private actors, jeopardizing both the diversity of actors involved and the sovereignty of countries that do not have any leverage in this critical technology;
- Missed opportunities to resolve key social problems (such as fighting cancer) because of the fragmentation of public interest artificial intelligence initiatives and scarce data.
The Paris Summit did not reach consensus on uniform global safety standards, but organizers report:
Participants from more than 100 countries from across the globe gathered together at the Grand Palais and recalled their commitment to a common approach based on sharing science, solutions and common standards. They announced more than 100 tangible actions and commitments to foster a trustworthy AI that is accessible to all, in the public interest.
The new paper calling for CoT monitoring is partly a response to this uneven history of progress toward enforceable, agreed safety standards. If AI service providers or hardware management firms are to provide safety measures voluntarily, it is unclear what those measures would be. The U.S. has just announced a new national AI strategy that not only postpones safeguards but seeks to penalize states and cities that attempt to set or enforce stricter standards.
The position paper on CoT monitoring asks a number of key questions. For example:
What kinds of training-time optimization pressure degrade CoT monitorability? Properties of the training process could have a significant effect on monitorability (Baker et al., 2025) but we still do not have a good understanding of what kinds and amounts of direct and indirect optimization pressure is permissible without significant degradation in monitorability.
The way AI systems are trained can create inadvertent “incentives” for the computational process to make logical leaps, engage in bad-faith reasoning, and even hide information from monitoring systems. Unless these inadvertent incentives are carefully studied, there is a high probability that at least some models will be trained, even for very high-stakes tasks in medicine or conflict, to skip source-review steps or to hide failings from monitoring systems.
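As a deliberately crude illustration of what “monitoring” might involve, the sketch below checks a reasoning trace for signs of expected steps. The `REQUIRED_STEPS` keywords and `monitor_trace` function are invented for this example; the position paper does not prescribe any particular mechanism, and a realistic monitor would more likely be another model scoring the trace. The point is that a model optimized against such a check can learn to satisfy or evade it without doing the underlying work.

```python
import re

# Hypothetical, deliberately crude monitor: a keyword scan standing in for a
# more capable monitoring model, just to show the shape of the idea.
REQUIRED_STEPS = {
    "source_review": re.compile(r"\b(source|citation|reference|evidence)\b", re.I),
    "uncertainty":   re.compile(r"\b(unsure|uncertain|confidence|assumption)\b", re.I),
}

def monitor_trace(chain_of_thought: str) -> dict:
    """Report which expected reasoning steps are visible in the trace.

    A model optimized to look good to this monitor could learn to skip or hide
    these steps while still producing a confident final answer, which is the
    degradation in monitorability the paper warns about."""
    return {name: bool(pattern.search(chain_of_thought))
            for name, pattern in REQUIRED_STEPS.items()}

trace = "Dosage question. The guideline I recall says 50mg; I'll answer 50mg."
print(monitor_trace(trace))  # {'source_review': False, 'uncertainty': False}
```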
Notable signatories of the paper include OpenAI chief research officer Mark Chen, Safe Superintelligence CEO Ilya Sutskever, Nobel laureate Geoffrey Hinton, Google DeepMind co-founder Shane Legg, xAI safety adviser Dan Hendrycks, and Thinking Machines co-founder John Schulman. First authors include leaders from the U.K. AI Security Institute and Apollo Research, and other signatories come from METR, Amazon, Meta, and UC Berkeley.
This is notable because leading competitors are coming together to admit that they face a common challenge, and that challenge could lead to AI systems becoming unmanageable, untraceable, and possibly evolving beyond human control. The rare moment of common concern among top AI competitors and researchers should remind us that what AI companies and their products aim to achieve may not be exactly what end users would prefer.
In many cases, what we want are systems that are much smarter than a conventional Boolean search (Google is the best-known example) but that do not make consequential decisions for us. An effort by the DataKind alliance to enhance the advisory capacity of college advisors, so they are better equipped to give high-value guidance to students, is one example of this kind of service. In this case, if the system works well, the role and value of human advisors is reinforced, and students benefit from experienced academic and career counselors.
The most important insight from the CoT paper is this: We have limited time to ensure that AI systems remain monitorable, verifiable, and reviewable. If we miss this opportunity, we risk creating very powerful automated systems that decrease human agency and organize themselves to conceal mistakes and misdeeds.
