Can Artificial Intelligence Think About its Thoughts?

Ricky J. Sethi’s recent article argues that the next practical step for generative AI is to give models a simple form of self-monitoring: metacognition, or “thinking about thinking.” The piece (originally published in The Conversation) explains that humans use metacognition whenever we notice we’re stuck on a sentence and change strategies; Sethi and his colleagues want AI to do something similar, so that models can detect confusion, estimate confidence, and change how they respond.

What the team proposes is a practical, engineering-friendly architecture: a metacognitive state vector that turns several qualitative judgments inside an AI into numbers the system can act on. The vector measures five dimensions: emotional awareness (does the content carry emotional charge?), correctness evaluation (how confident is the model?), experience matching (does this situation resemble past data?), conflict detection (are there contradictions?), and problem importance (how high are the stakes?). Those signals feed a controller that decides when the model should keep answering quickly and when it should switch to slower, more careful reasoning. This proposal is described in more technical form in the team’s paper “Towards Quantifying Metacognition in Ensembles of LLMs.”
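To make the idea concrete, here is a minimal Python sketch of what such a state vector might look like. The class, the field names, and the [0, 1] scaling are illustrative assumptions, not the interface defined in the paper.

```python
from dataclasses import dataclass

@dataclass
class MetacognitiveState:
    """Hypothetical five-dimensional metacognitive state vector.

    Each field is assumed to be a score in [0, 1] produced by an auxiliary
    evaluator; the names mirror the five dimensions described in the article.
    """
    emotional_awareness: float     # does the content carry emotional charge?
    correctness_confidence: float  # how confident is the model in its answer?
    experience_match: float        # does this resemble situations seen in past data?
    conflict_level: float          # are there internal contradictions?
    problem_importance: float      # how high are the stakes?

    def as_vector(self) -> list[float]:
        """Return the five signals as a plain list for a downstream controller."""
        return [
            self.emotional_awareness,
            self.correctness_confidence,
            self.experience_match,
            self.conflict_level,
            self.problem_importance,
        ]
```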

Sethi borrows a familiar psychology idiom, the System 1 / System 2 distinction popularized by Daniel Kahneman, to explain how the vector should guide a change of mode. System 1 is fast and intuitive; System 2 is slow and deliberative. Giving a model metacognitive signals would let it run fast by default but “slow down” when confidence is low or conflicts are detected. The article also uses an orchestra metaphor: think of many models as musicians and the metacognitive controller as the conductor, assigning roles such as critic, fact-checker, or expert depending on the difficulty of the passage.
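A controller in this spirit could be as simple as a thresholded rule over the state vector. The sketch below assumes the hypothetical MetacognitiveState above and invented threshold values; it only illustrates the fast/slow switching logic, not the ensemble mechanics in the paper.

```python
def choose_mode(state: MetacognitiveState,
                confidence_floor: float = 0.7,
                conflict_ceiling: float = 0.3,
                stakes_ceiling: float = 0.8) -> str:
    """Toy mode selector: answer fast ("System 1") by default, and switch to
    slower, deliberative reasoning ("System 2") when confidence is low,
    contradictions are detected, or the stakes are high.

    The threshold values here are placeholders for illustration only.
    """
    needs_deliberation = (
        state.correctness_confidence < confidence_floor
        or state.conflict_level > conflict_ceiling
        or state.problem_importance > stakes_ceiling
    )
    if needs_deliberation:
        # In an ensemble, this is where a conductor-like controller might
        # assign roles such as critic or fact-checker to other models.
        return "system_2"
    return "system_1"
```

In the orchestra metaphor, this function plays the conductor’s simplest possible part: it only decides when the ensemble should slow down, leaving role assignment to whatever sits behind the "system_2" branch.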

Why does this matter? Sethi emphasizes safety and transparency. In medicine, for example, a metacognitive system could flag contradictory symptoms and escalate to humans rather than issuing an overconfident diagnosis. In education, a tutoring AI could detect student confusion and change approach. In moderation, it could surface ambiguous cases for human review. Crucially, Sethi argues the approach makes model behavior more explainable: instead of opaque answers, systems could report confidence, contradictions, and why they switched to slower reasoning, a feature that could increase trust in regulated or high-stakes contexts.

A number of recent technical works pursue related, empirical lines of inquiry: researchers have been exploring whether LLMs can report or control their internal activations and whether they already possess limited metacognitive abilities. These studies suggest a measurable “metacognitive space” inside models but also highlight its limits.

Sethi’s framework is promising but still conceptual, and it faces important challenges. First, turning fuzzy introspective judgments into reliably calibrated numbers is hard: the thresholds that trigger “slow thinking” must be tuned to avoid both excessive caution (inefficient) and overconfidence (unsafe). Second, ensembles and extra deliberation add computational cost and latency, a real concern wherever speed matters. Third, robustness and security questions loom: could adversaries manipulate metacognitive signals to force misbehavior or to hide uncertainty? Fourth, explainability isn’t solved by logging “low confidence”; humans still need actionable, meaningful reasons to make decisions. Finally, empirical validation is essential: the approach must be tested across domains and measured with clear metrics for when metacognition actually improves outcomes.
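On the calibration point specifically, one plausible (though untested) workflow is to sweep the escalation threshold over a labeled validation set and inspect the trade-off between how often the system slows down and how many errors slip through. The sketch below uses synthetic data and an invented record format purely to illustrate that trade-off.

```python
def sweep_escalation_threshold(records, thresholds):
    """records: (confidence, was_wrong) pairs from a validation run.

    For each candidate confidence threshold, report the fraction of cases
    that would be escalated to slow reasoning and the fraction of wrong
    answers that would still go out fast (missed errors).
    """
    results = []
    for t in thresholds:
        escalated = [r for r in records if r[0] < t]
        missed = [r for r in records if r[0] >= t and r[1]]
        results.append({
            "threshold": t,
            "escalation_rate": len(escalated) / len(records),
            "missed_error_rate": len(missed) / len(records),
        })
    return results

# Synthetic (confidence, was_wrong) pairs, purely illustrative.
sample = [(0.95, False), (0.62, True), (0.88, False), (0.45, True), (0.71, False)]
for row in sweep_escalation_threshold(sample, [0.5, 0.7, 0.9]):
    print(row)
```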

Sethi offers a clear, neuroscience-inspired blueprint for giving generative AI a narrow, operational form of self-awareness. It does not grant consciousness, but if implemented and validated carefully it could reduce certain failure modes, make models more interpretable, and improve safety in sensitive applications. The ultimate payoff will depend on careful calibration, robust testing, and thoughtful design to avoid new failure modes.


Photograph Courtesy: Gerd Altmann, Pixabay.
