Fit for the runner

No one left to repair

Part I: The Efficiency Trap

Artificial intelligence has entered organizations through an accelerated sequence in which investment and adoption have advanced faster than settled evidence of enterprise-wide value. In 2024, global corporate AI investment reached $252.3 billion, while private investment in generative AI reached $33.9 billion. In the same period, reported organizational use rose sharply: 78% of survey respondents said their organizations used AI, and 71% said they used generative AI in at least one business function (Stanford HAI, AI Index 2025 – Economy). Yet the value story remains more uneven than the investment story. McKinsey’s The State of AI in 2025 reports that most organizations are still early in scaling AI across the enterprise, that only 39% report any enterprise-level EBIT impact, and that most of those respondents attribute less than 5% of EBIT to AI.

That matters because, under these conditions, organizations are not responding only to demonstrated utility. They are responding to a climate in which hesitation begins to look imprudent. The question shifts from “does this technology already justify itself?” to “can we afford to be seen as moving too slowly?” In that atmosphere, boards, executives, vendors, workers, and clients do not simply evaluate AI one use case at a time; they begin adapting to a wider expectation that AI adoption signals seriousness, competitiveness, and institutional realism. The point is not that value is absent. It is that the social pressure to move is running ahead of any stable consensus on what value should count.

There were, of course, governance efforts. But they did not arrive with the same force as the investment wave. NIST released the AI Risk Management Framework on January 26, 2023, and explicitly describes it as intended for voluntary use. The European Commission announced that the EU AI Act entered into force on August 1, 2024, after the deployment wave was already well underway. The issue, then, is not that governance was nonexistent. It is that, in practice, the disciplines of return, competition, and visible movement have so far been stronger and faster organizational drivers than the disciplines of safety and governance.

Under those conditions, the first proof of value becomes predictable. When transformative gains remain uncertain, organizations reach first for what can be shown quickly. Labor is one of the most legible places to look. Hiring can be slowed, support functions tightened, administrative work compressed, and output per employee made to appear more efficient even while the deeper organizational consequences remain unclear. McKinsey reports that 80% of respondents say their companies set efficiency as an objective of their AI initiatives. Reuters reported in March 2026, citing Goldman Sachs economists, that AI contributed to 5,000 to 10,000 monthly net job losses across the most exposed U.S. industries during the previous year, and that Challenger, Gray & Christmas attributed 7% of planned U.S. layoffs in January to AI. This does not prove that AI is the single cause of workforce reduction. It does show that, under uncertainty, AI provides a more legible vocabulary for labor rationalization than for slower forms of organizational development.

The same narrowing appears in the metrics through which AI’s effects are publicly described. The dominant public measures increasingly track what is easiest to observe inside AI-mediated systems: task complexity, automation versus augmentation, autonomy, success, skills, and use cases. Anthropic’s January 2026 Economic Index report does exactly this using anonymized Claude.ai and first-party API transcripts, and says its “economic primitives” are generated by asking Claude specific questions about those transcripts. The report is useful. But it is not a neutral public observatory. It is a vendor-situated measurement regime built from within one company’s product boundary, using categories that are especially legible in transcript data. That does not invalidate the work. It does mean that its categories should not be mistaken for the whole phenomenon.

The deeper limitation is structural. Task-based measurement can tell us where AI appears, how it is used, and whether it seems to automate or augment visible units of work. What it sees much less well is what those tasks were doing before they were delegated. A task is not only an output unit. It can also be part of how workers acquire context, learn dependencies, develop judgment, and move from execution toward interpretation. Recent NBER work helps sharpen this point. One January 2026 paper, Enhancing Worker Productivity Without Automating Tasks, argues that the dominant task-based approach to AI and labor typically emphasizes replacement, while alternative models show that AI can raise productivity without automating tasks, with human capital mediating the effects.

A February 2026 paper, Chaining Tasks, Redefining Work: A Theory of AI Automation, goes further, arguing that production is a sequence of steps that firms bundle into tasks and then jobs, trading off specialization against coordination costs. Visible tasks, in other words, are not the whole organizational unit that matters.

This is where the real risk begins to come into view. If a delegated task was also transmitting tacit maps of the organization, exposing someone to bottlenecks, or keeping a worker inside the chain through which more complex understanding is formed, then its removal may not show up immediately as loss. Output may remain smooth. Dashboards may show efficiency. But something slower may already be weakening: the formation of judgment, the shared intelligibility of the organization, and the practical depth required to diagnose and repair failure when routines break. The epistemic trap follows directly. What is easiest to count begins to stand in for what is most important to preserve. Institutions then optimize for visible throughput and infer that what is not well measured is somehow secondary. But the capacities falling from view are not marginal. They are often the ones that make interpretation, coordination, and recovery possible in the first place.

That is why the first proof of AI’s value is so often a misleading one. It is not necessarily that AI has already made organizations more intelligent, more resilient, or more capable of transformation. It is that, in an environment shaped by investment pressure and adoption urgency, AI gives institutions a legible way to display movement: tighter staffing, faster throughput, visible efficiency, and a language of modernization that is easy to narrate upward. The harder question is what disappears behind that display.

Part II: Governing Meaning

If the first problem is that organizations are learning to recognize AI through the returns that are easiest to narrate, the second is deeper. AI does not only alter how work is executed. It also begins to alter how work is described, classified, prioritized, and made intelligible to the people inside the institution.

At that point, the issue is no longer only efficiency. It is the production of organizational meaning. This is where conventional AI governance starts to look too narrow. Most governance frameworks ask whether systems are accurate, secure, fair, transparent, or explainable. These are necessary questions. But they still tend to assume that the main thing to govern is the technology and its outputs. What they do not fully ask is whether the organization is preserving the conditions under which its own concepts remain shared, contestable, and connected to situated practice.

The OECD’s 2025 report on governing with AI is useful precisely because it links lack of transparency to accountability problems and warns that overreliance on AI can reduce human scrutiny and judgment. It also notes that many people perceive AI systems as neutral and impartial, which makes them more likely to accept outputs without sufficient interrogation. For the broader organizational background, see Weick, Sutcliffe, and Obstfeld’s classic account of sensemaking in organizations, Organizing and the Process of Sensemaking, and the more recent synthesis by Maitlis and Christianson, Organizational sensemaking: A systematic review and a co-evolutionary model.

The problem is not that AI has intentions in the human sense. It is that it becomes a recurrent mediator of linguistic selection. It amplifies some formulations, suppresses others, standardizes tone, stabilizes certain categories through repetition, and makes some distinctions easier to express than others. Once summaries, drafts, classifications, recommendations, and client-facing exchanges pass repeatedly through the same statistical machinery, the organization is no longer only using a tool. It is allowing an externalized system of pattern-production to enter the interpretive circuit of the institution. A recent bridge between sensemaking and AI appears in Tina Comes, Sensemaking AI: Introducing a research and design agenda for human–AI networks.

Before AI, meaning inside organizations was already unstable and contested. A category introduced by one function could be resisted by another. A memo could be challenged. A decision could be reinterpreted in light of local knowledge, tacit constraints, or changing conditions. That friction was not merely inefficiency. It was one of the mechanisms by which shared understanding became inhabited. Meaning did not stay alive because it was perfectly defined. It stayed alive because it was continually defended, revised, and situated by the people who had to use it. This is very much in line with the sensemaking tradition developed by Weick and later expanded in the organizational literature.

AI changes that dynamic because it is prolific, diffuse, and weakly accountable. Human meaning-makers can be asked to explain what they mean, defend a distinction, retract a category, or modify a judgment. AI-generated language is harder to contest in practice even when it is formally reviewable. It often arrives already smoothed, already plausible, already optimized for fluency and speed.

That is why the OECD’s discussion of automation bias matters beyond technical error: once AI outputs are perceived as neutral, efficient, or objective, organizations can begin to narrow the role of human judgment without ever explicitly deciding to do so.

This is why semantic drift matters.

A firm may believe it is simply accelerating routine communication or standardizing internal workflow. But if AI-mediated language gradually privileges speed over specificity, templates over situated judgment, and statistical typicality over interpretive nuance, the organization’s operative semantics of value can begin to shift. A business that once signaled depth, discretion, and tailored understanding may slowly begin signaling responsiveness, smoothness, and availability instead. Clients adapt to what is repeatedly offered to them. Managers respond to those new patterns as if they were external demands. What appears as neutral adaptation may, in part, be an internally induced semantic environment. For a technical literature adjacent to this concern, see Detecting hallucinations in large language models using semantic entropy, which works at the level of meaning rather than sequence, and recent work on semantic competence in LLMs such as Large language models without grounding recover non-sensorimotor but not sensorimotor features of human concepts.

This kind of change may not appear in conventional performance indicators. Satisfaction scores can remain high. Response times can improve. Administrative throughput can look better. What disappears is harder to see: the organization may no longer know with precision what it is promising, what distinctions it has ceased to make, or what forms of judgment its members are no longer being trained to exercise. It can remain operational while becoming interpretively thinner.

At this point, the governance question changes. The issue is no longer only whether AI outputs are accurate enough to use. It is whether the organization is still authoring the categories through which it understands itself.

Semantic governance begins from that recognition. It is not the governance of “meaning” in an abstract or mystical sense. It is the governance of the organizational conditions under which concepts remain shared, contestable, revisable, and linked to practice.

In practical terms, that means at least four things: identifying which concepts are structurally important enough to protect; watching whether AI-mediated language is flattening or drifting those concepts over time; preserving spaces in which human interpretation still occurs without immediate delegation; and refusing to treat visible output quality as sufficient evidence that the underlying interpretive conditions remain sound. Seen in this light, governing meaning is not an ornamental concern added after the technical work is done. It is part of governing how organizations continue to perceive, coordinate, and judge under AI conditions.
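A minimal sketch can make the second of those practices concrete. Suppose an organization has designated a protected concept, curated a small set of human-authored reference passages for it, and has some embedding backend available. The sketch below, with its injected embed function, its centroid comparison, and its 0.15 threshold, is an illustrative assumption about what drift-watching could look like, not a standard or a complete method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np

# An embedder maps a batch of text passages to an (n, d) array of vectors.
# Any backend could sit behind this interface; none is assumed here.
Embedder = Callable[[Sequence[str]], np.ndarray]


@dataclass
class ConceptBaseline:
    concept: str
    centroid: np.ndarray  # mean embedding of human-authored reference passages


def build_baseline(concept: str, reference_passages: Sequence[str],
                   embed: Embedder) -> ConceptBaseline:
    """Summarize how the organization itself has historically used a concept."""
    vectors = embed(reference_passages)
    return ConceptBaseline(concept=concept, centroid=vectors.mean(axis=0))


def drift_score(baseline: ConceptBaseline, current_passages: Sequence[str],
                embed: Embedder) -> float:
    """1 - cosine similarity between the baseline centroid and the centroid of
    current (often AI-mediated) usage; higher means the concept has moved
    further from its human-authored reference."""
    current = embed(current_passages).mean(axis=0)
    cosine = float(np.dot(baseline.centroid, current) /
                   (np.linalg.norm(baseline.centroid) * np.linalg.norm(current) + 1e-12))
    return 1.0 - cosine


def needs_human_review(baseline: ConceptBaseline, current_passages: Sequence[str],
                       embed: Embedder, threshold: float = 0.15) -> bool:
    """A score above the (illustrative) threshold routes the concept back to its
    human owners for interpretation; nothing is corrected automatically."""
    return drift_score(baseline, current_passages, embed) > threshold
```

Even so, a score like this speaks only to the second practice. The other three are organizational rather than computational, and a drift signal is useful only if there are still people, and protected spaces, positioned to interpret it.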

If that governance is absent, institutions may discover too late that they have not merely automated tasks. They have allowed the language through which they recognize problems, assign significance, and organize collective action to become thinner, smoother, and less their own.

Once AI begins to mediate categories, summaries, and recommendations at scale, governance must move beyond output quality and ask whether the organization still controls the interpretive conditions through which it understands itself.

This perspective also sits adjacent to emerging work in quantum-inspired computational semantics and quantum-like analyses of meaning, which likewise treat meaning as structured, contextual, and irreducible to isolated tokens.

Part III: From Explanation to Provenance

One response to the opacity of large language models has been to demand that they show their reasoning. This seems sensible. If organizations are going to rely on AI systems to summarize, recommend, classify, and interpret, then some account of how those outputs were produced appears necessary. But this demand rests on a category mistake. What appears as visible reasoning is not the same thing as a faithful record of the path through which an output was produced. A chain of thought is still language: selective, stylized, and potentially post hoc. Anthropic’s 2025 research on chain-of-thought faithfulness states that there is no specific reason to assume a reported chain of thought accurately reflects a model’s true reasoning process, and OpenAI’s reasoning documentation says that raw reasoning tokens are not exposed and that only summaries are available.

This matters because institutions are beginning to treat explanation as if it were provenance. They inspect prompts, outputs, policy documents, and sometimes visible reasoning, and infer that the system has therefore become governable. But an explanation is not the same as a record of passage. It does not tell us, with sufficient fidelity, through what stages an output was produced, what internal route became decisive, or under what conditions the same result would emerge again. It offers a narrative at the surface of the system, not a durable map of the process that generated the result. For a technical framing of this distinction, see Anthropic’s discussion of “faithfulness” in chain-of-thought monitoring.

The distinction is not merely technical. It bears directly on institutional judgment. If the object being governed is only the final answer and its accompanying explanation, then governance remains attached to the point where the system narrates itself. Yet the real governance problem lies earlier: in the transformations that occur between input and output, and in the fact that those transformations are only partially visible to the people asked to trust them. What is missing is not more eloquent transparency, but process provenance: a more serious account of the path by which a recommendation, categorization, summary, or decision was produced. OpenAI’s reasoning guide is useful here precisely because it makes clear that reasoning tokens exist, are consumed internally, and are not themselves exposed as a full public trace.

This is why reproducibility at the surface is not enough. Even when prompts are repeated and context appears unchanged, outputs may still vary, and the visible reasoning may vary with them. That is not a minor inconvenience. It means the user-facing surface is not the full causal object. If the same apparent setup can yield different outcomes, governance cannot stop at what the user sees. It must ask what route the system actually took: what contextual material entered, what transformations were applied, what filtering or routing occurred, what tools were invoked, what model version or system state was in play, and which stages were stable enough to be meaningfully reconstructed later. OpenAI’s reasoning documentation notes that reasoning tokens are generated internally and discarded from context after the response, while only usage details and optional summaries are exposed.
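What such a record of passage could contain can be sketched directly. The structure below is an illustration, not a proposal: the field names, the stage vocabulary, and the hashing choices are assumptions introduced for the example, and the record deliberately captures only the application-level path, because the model’s inner transformations are not available to be logged this way.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from hashlib import sha256
from typing import Any


def digest(text: str) -> str:
    """Stable fingerprint of a prompt, a context bundle, or an output."""
    return sha256(text.encode("utf-8")).hexdigest()


@dataclass
class StageEvent:
    stage: str               # e.g. "retrieval", "routing", "tool_call", "inference", "filtering"
    detail: dict[str, Any]   # what entered the stage, what it applied, what it produced
    timestamp: str


@dataclass
class ProvenanceRecord:
    request_id: str
    model_version: str                       # exact model and system state in play
    input_digest: str                        # fingerprint of the prompt and attached context
    stages: list[StageEvent] = field(default_factory=list)
    output_digest: str = ""

    def log(self, stage: str, **detail: Any) -> None:
        """Append a stage as it happens, not as it is narrated afterwards."""
        self.stages.append(StageEvent(
            stage=stage,
            detail=detail,
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))

    def seal(self, output_text: str) -> None:
        """Bind the record to the output it actually produced."""
        self.output_digest = digest(output_text)


# Illustrative use around one request (all names are hypothetical):
record = ProvenanceRecord(request_id="r-001",
                          model_version="assistant-2026-01",
                          input_digest=digest("prompt plus retrieved context"))
record.log("retrieval", source="policy_store", document_count=3)
record.log("tool_call", tool="tariff_lookup", status="ok")
record.seal("final answer text")
```

Even this toy version makes the governance object explicit: what can be compared across cases, or challenged later, is the sequence of logged stages, not the model’s account of itself.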

Here the weakness of chain-of-thought governance becomes clear. It offers the comfort of legibility without guaranteeing fidelity. It can help a reader follow an answer, but it cannot be assumed to disclose the actual path that mattered. Worse, it may create the impression that the system is being supervised more deeply than it is. Anthropic’s own findings point in that direction: visible reasoning can omit precisely the hidden influences a governance system would most want to catch. See Anthropic’s examples of hint use and reward hacking remaining undisclosed in visible reasoning.

What is needed instead is a shift from explanation to provenance. The relevant question is not simply, “What did the model say about how it reasoned?” It is, “Through what path was this result produced, and can that path be logged, reconstructed, challenged, and compared across cases?” In high-stakes settings, that difference is decisive. A system that can explain itself fluently but cannot provide a sufficiently faithful record of its passage remains only partially governable.

This is not an argument for full inner transparency in any naïve sense. It is an argument for naming the governance gap correctly. The missing object is not better self-description by the model. It is a more serious form of process provenance: a layered record of transformation that is closer to the actual route of production than to the story told afterward.

There are already real observability efforts, but they mostly operate at the workflow layer. OpenTelemetry’s generative-AI semantic conventions define spans for inference, embeddings, retrievals, and tool execution, and MLflow Tracing says it captures inputs, outputs, and metadata associated with each intermediate step of a request. These are important advances. But they mainly trace the application path around the model, not a faithful inner causal path through the model itself. That distinction matters, because workflow observability is not the same thing as mechanistic provenance.
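The shape of that workflow-layer tracing is easy to show. The sketch below uses OpenTelemetry’s Python tracing API; the gen_ai.* attribute keys follow the generative-AI semantic conventions, which are still evolving and may differ across versions, and the span names, the retrieval attribute, and the placeholder functions are assumptions made for illustration rather than any standard instrumentation.

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.genai.instrumentation")


def retrieve(question: str) -> list[str]:
    # Placeholder for whatever retrieval step the application actually runs.
    return ["doc-17", "doc-42"]


def call_model(question: str, context: list[str]) -> str:
    # Placeholder for the real model call.
    return "illustrative answer"


def answer_with_tracing(question: str) -> str:
    # Parent span for the whole AI-mediated request.
    with tracer.start_as_current_span("chat example-model") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", "example-model")

        # Child span for the retrieval step that feeds the model.
        with tracer.start_as_current_span("retrieve policy_store") as retrieval_span:
            context = retrieve(question)
            retrieval_span.set_attribute("retrieval.document_count", len(context))  # illustrative key

        answer = call_model(question, context)
        span.set_attribute("gen_ai.usage.output_tokens", len(answer.split()))  # placeholder accounting
        return answer
```

Every span here wraps a step the application already controls. Nothing inside the model call becomes more traceable because it has been wrapped, which is exactly the boundary between workflow observability and mechanistic provenance.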

The social consequence is larger than technical uncertainty. Once AI systems begin mediating summaries, recommendations, and categories at scale, weak provenance does not remain a laboratory problem. It becomes an institutional one. Organizations may start to accept outputs because they are fluent, plausible, and accompanied by an explanation-like trace, while lacking the deeper record needed to determine how those outputs were formed and how stable they would remain under repetition or review. In that situation, governance lags behind dependence. The system appears legible enough to trust, but not traceable enough to truly govern. The gap between visible explanation and faithful traceability is exactly what both Anthropic’s chain-of-thought work and current observability tooling reveal from different angles.

The point, then, is not that reasoning should never be shown. It is that visible reasoning should not be mistaken for the thing institutions most need. If AI is to become part of organizational judgment, governance must move beyond the chain of thought and toward the chain of passage. What matters is not only whether the system can tell a coherent story about its answer, but whether the path that produced that answer can be treated as a serious object of oversight.

The deeper governance problem is not simply opacity, but the substitution of explanation for provenance; once that substitution is accepted, organizations risk governing only the surface at which AI narrates itself while the real transformation path remains insufficiently traceable.