Intelligence per Token
Why ogram's real product is not access to a model, but the compression layer between frontier tokens and expert work
By Elliot Vaucher
Co-Founder & CEO, ogram
The lesson of the Fable and Mythos restriction is not that one provider became politically complicated. It is that access to frontier intelligence is now contingent. Governments can intervene. Providers can change terms. Models can disappear from a workflow overnight. Any serious organization that builds its intelligence strategy on direct dependence on one model provider has confused capability with sovereignty.
On June 12, 2026, Anthropic announced that it had to disable access to Claude Fable 5 and Claude Mythos 5 for all customers after a U.S. government directive restricting access by foreign nationals. Three days earlier, the company had described Fable 5 as the most capable model it had ever made generally available. Then it was gone.
The point is not to litigate that decision. Anthropic can be right about safety. The government can be right about national security. Customers can be right to feel exposed. All three things can be true at once. What matters is the architectural lesson: frontier model access is no longer a stable primitive. It is a supply chain. And supply chains break.
The provider is not the product
The AI industry still talks as if the model is the product. Better benchmark, larger context window, lower price per million tokens, stronger coding score, stronger finance score. These things matter. A weak model gives you weak raw material. But raw material is not the final object.
A token provider sells access to general intelligence. OpenAI, Anthropic, Google, Mistral, xAI, whoever comes next. They sell a stream of probabilistic computation that can be pointed at many things. What a client needs is not that. A client needs intelligence inside a specific working context: their jargon, their documents, their deal standards, their approval logic, their risk tolerance, their theory of how their market works.
Between those two things there is a gap. It is the gap between generic capability and usable expertise. Most organizations fill that gap with prompts, meetings, trial and error, and the quiet labor of people correcting a system that almost understood the assignment. That correction work is not incidental. It is the place where value is created or destroyed.
The model is the engine. The client context is the terrain. The value lives in the transmission between them.
The metric that matters
At ogram, we optimize for a simple quantity: intelligence per token. Not tokens per dollar. Not output length. Not how impressive the model sounds in isolation. The question is narrower and more consequential: how much useful, verified, domain-specific intelligence reaches the client for every token consumed?
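To make the quantity concrete, here is a minimal sketch of one possible proxy, not ogram's actual formula: the number of claims in an output that were verified and judged useful, divided by the tokens consumed to produce them. The names `RunResult` and `intelligence_per_token`, the choice of numerator, and the figures below are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical record of one model run (illustrative, not an ogram type)."""
    verified_claims: int  # claims checked against evidence and judged useful
    input_tokens: int
    output_tokens: int


def intelligence_per_token(run: RunResult) -> float:
    """Verified, useful claims per token consumed (an illustrative proxy only)."""
    total_tokens = run.input_tokens + run.output_tokens
    if total_tokens == 0:
        return 0.0
    return run.verified_claims / total_tokens


# Invented numbers: both runs deliver the same verified content, but the
# naive run spends far more tokens rediscovering context.
naive = RunResult(verified_claims=12, input_tokens=40_000, output_tokens=8_000)
high_yield = RunResult(verified_claims=12, input_tokens=9_000, output_tokens=3_000)

print(intelligence_per_token(naive))       # 12 / 48000
print(intelligence_per_token(high_yield))  # 12 / 12000
```

Under these assumed numbers the second run carries four times as much verified content per token, even though the output is the same.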
This metric changes how you build. A naive system consumes tokens to rediscover context that already exists. It asks the model to infer the client's definitions, reconstruct the domain, guess which sources matter, and relearn the same preferences on every run. It pays repeatedly for ignorance.
A high-yield system does the opposite. It makes the client's knowledge computational before the model starts spending serious reasoning budget. It compresses the relevant domain into forms the model can use. It routes work to the provider that currently gives the best capability for the task. It prevents the model from wasting expensive reasoning on what the organization already knows.
The compression layer
This is what ogram is building: the compression layer between token providers and expert work. Compression here does not mean making things shorter. It means preserving the structure that matters while removing everything that does not. It means turning a client's expertise into a representation that a model can actually operate on.
Every serious organization has its own dialect. The same term means something different in a law firm, a wealth manager, a real estate capital markets team, and a family-owned industrial company. The same source can be authoritative in one context and irrelevant in another. The same risk can be fatal for one client and acceptable for another. Generic models do not know this. They approximate it from public language. That is not enough.
Our work is to build compression algorithms that account for the client's domain, jargon, source hierarchy, tacit standards, and professional judgment. The goal is not to make the model generally smarter. The goal is to make each token carry more of the client's actual world.
The durable advantage is not owning a model. It is owning the layer that makes any model think in your context.
The machinery underneath
Some of the machinery is familiar. Skills. Plug-ins. Connectors. Retrieval. Tool calls. These are useful, but they are not the architecture. They are the visible handles. The harder part is deciding what the handles are allowed to do, what context they receive, what evidence they must preserve, and how the system knows whether it is still solving the original problem.
That is where harness engineering begins. A serious agentic workflow is not a chat with tools attached. It is a controlled environment for reasoning under constraints. The harness defines the task boundaries, exposes the right operations, records state, watches for drift, and decides when another pass of reasoning is worth its cost.
Our own IP sits in that layer. ogram streams are custom task contracts for long-running work. They resemble a goal, but they are stricter: they specify the objective, the accepted evidence, the unresolved questions, the verification steps, the recovery state, and the conditions under which the task is allowed to conclude. They do not make a base model magically incapable of hallucinating. They make hallucination structurally harder to miss, cheaper to correct, and less likely to survive into the final artifact.
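The elements of such a contract can be sketched as a small data structure. This is an illustrative guess at the shape, not ogram's implementation: the class name `TaskContract`, its field names, and the conclusion rule in `may_conclude` are assumptions derived only from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskContract:
    """Hypothetical task contract; every field name here is illustrative."""
    objective: str
    accepted_evidence: List[str]  # kinds of sources that count as proof
    unresolved_questions: List[str] = field(default_factory=list)
    verification_steps: Dict[str, bool] = field(default_factory=dict)  # step -> passed?
    recovery_state: Dict[str, str] = field(default_factory=dict)       # checkpoint data

    def may_conclude(self) -> bool:
        """Allow conclusion only if no question is open and every check passed."""
        no_open_questions = not self.unresolved_questions
        all_verified = bool(self.verification_steps) and all(
            self.verification_steps.values()
        )
        return no_open_questions and all_verified


contract = TaskContract(
    objective="Summarize covenant terms in the loan agreement",
    accepted_evidence=["signed agreement", "amendment letters"],
    unresolved_questions=["Which amendment is current?"],
    verification_steps={"every claim cites a clause": True},
)
print(contract.may_conclude())  # False: one question is still open

contract.unresolved_questions.clear()
print(contract.may_conclude())  # True: nothing open, all checks passed
```

The design choice worth noting is that the task cannot declare itself finished: conclusion is a property of the recorded state, so an unsupported answer shows up as an open question or a failed check rather than as a confident final paragraph.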
Alongside those streams, we build parsing and extraction services tuned for the documents that matter in a given domain. We build source-grounding loops that keep claims attached to evidence. We build compression routines that preserve firm-specific meaning through context changes and long-running sessions. We build memory structures that keep the system from paying again and again for knowledge it already earned.
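One pass of a source-grounding loop might look like the sketch below. It is a simplified assumption, not a description of ogram's services: `Claim`, `ground_claims`, and the idea of matching a claim to a known source identifier are hypothetical stand-ins for whatever evidence check a real system would run.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Claim:
    text: str
    source_id: Optional[str]  # None means the model offered no citation


def ground_claims(
    claims: List[Claim], known_sources: Set[str]
) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into grounded (cites a known source) and flagged (all others)."""
    grounded: List[Claim] = []
    flagged: List[Claim] = []
    for claim in claims:
        if claim.source_id is not None and claim.source_id in known_sources:
            grounded.append(claim)
        else:
            flagged.append(claim)
    return grounded, flagged


claims = [
    Claim("Rent escalates 2% per year", "lease-2021"),
    Claim("The tenant may sublet freely", None),
    Claim("The guarantee expires in 2030", "unknown-memo"),
]
grounded, flagged = ground_claims(claims, known_sources={"lease-2021"})

print([c.text for c in grounded])  # only the claim tied to a known source
print(len(flagged))                # claims to re-check or drop before delivery
```

In a full loop, the flagged claims would go back for another retrieval or reasoning pass, and anything still unsupported would be kept out of the final artifact.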
Portability is not optional
Once intelligence is delivered through a supply chain, model portability stops being a procurement preference and becomes an operating requirement. A client should not have to rewrite its expertise because one provider changes access, pricing, policy, latency, geography, or safety behavior. The client owns the expertise. The provider supplies compute.
This is why ogram is provider-portable by design. If OpenAI is best for a task, use OpenAI. If Anthropic is best, use Anthropic. If a European or open-weight model becomes preferable for sovereignty, cost, or regulatory reasons, route there. The client-specific compression layer remains intact. The firm's know-how does not get trapped inside the affordances of a single model family.
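The routing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ogram's router: the `route` function, the preference table, and the stub providers are hypothetical, and real provider SDK calls would sit behind the callables.

```python
from typing import Callable, Dict, List

# A provider is anything that maps a prompt to text. The stubs below are
# placeholders for illustration, not real provider APIs.
Provider = Callable[[str], str]


def route(
    task_type: str,
    prompt: str,
    client_context: str,
    providers: Dict[str, Provider],
    preferences: Dict[str, List[str]],
) -> str:
    """Send the prompt to the first available provider preferred for this task."""
    for name in preferences.get(task_type, []):
        provider = providers.get(name)
        if provider is not None:
            # The client context travels with the request, not with the provider.
            return provider(f"{client_context}\n\n{prompt}")
    raise LookupError(f"no available provider for task type {task_type!r}")


providers: Dict[str, Provider] = {
    "provider_a": lambda p: "A answered",
    "provider_b": lambda p: "B answered",
}
preferences = {"drafting": ["provider_a", "provider_b"]}
context = "Glossary: 'NOI' means net operating income."

print(route("drafting", "Draft the memo.", context, providers, preferences))

# If one provider becomes unavailable, the same context routes to the next.
del providers["provider_a"]
print(route("drafting", "Draft the memo.", context, providers, preferences))
```

The point of the sketch is the separation: the preference table and the provider list can change without touching `client_context`, which is the part the client owns.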
Sovereignty is usually discussed as a data-residency problem. That matters, especially in Switzerland and Europe. But there is another sovereignty problem hiding underneath it: cognitive sovereignty. Who controls the representation of your expertise? Who decides how your domain is compressed, retrieved, reasoned over, and remembered? If the answer is a model provider, you have not adopted AI. You have outsourced part of your institutional mind.
Data sovereignty asks where your information lives. Cognitive sovereignty asks who controls the form in which your expertise becomes computable.
What the client actually buys
The client does not buy a wrapper around an API. They do not buy a dashboard. They do not even buy access to the strongest current model. They buy a higher conversion rate from tokens to useful judgment. They buy less reasoning waste. They buy a system that understands which parts of the context are expensive to lose.
That is why this problem is economically important. Frontier intelligence will keep getting cheaper in some ways and more expensive in others. Input tokens may fall in price. High-effort reasoning may remain costly. The models will change. The best provider will rotate. But the organization that has encoded its expertise into a portable compression layer will benefit from every improvement without rebuilding its operating logic each time.
First comes the machine, then the adaptation. The frontier labs build increasingly powerful engines. ogram adapts that power to the client's terrain, standards, and institutional knowledge. The result is not generic AI used by a serious organization. It is the organization's own expertise running on the best intelligence supply available at that moment.
The layer that compounds
A model release is an event. A client-specific intelligence layer is an asset. It improves as the organization uses it. It learns which sources matter. It captures which corrections repeat. It preserves the logic of completed work. It accumulates the difference between what a generic model would have said and what the client's best expert would have meant.
That accumulation is the point. If all you have is provider access, every model transition is a migration. If you own the compression layer, every model transition is leverage. The provider changes. The client's intelligence remains portable, sovereign, and increasingly dense.
This is ogram's core proposition. We maximize intelligence per token for the final client, whichever provider supplies the model. We sit between frontier token supply and expert work, and we make that interface specific, reliable, portable, and worth owning.
ogram builds the provider-portable intelligence layer for organizations whose expertise is too valuable to leave trapped in prompts, provider defaults, or human memory alone. The model changes. Your algorithm should keep running.