BankerToolBench
Complete investment-banking workflows: task framing, data gathering, modelling, and review-ready output, scored against practitioner-written rubrics.
Claude and Codex provide the platform. ogram sits on top as a tailored machine for the work your firm actually does.
It is shaped around your documents, your standards, and your review process. Long-running, auditable, decision-grade.
Not another chat window. Not a generic assistant.
ogram edge is calibrated against public benchmarks built by independent practitioners: one for investment banking, one for law. They test complete work, from task framing to reviewable deliverable, under expert rubrics and institutional-grade standards.
Full legal matters scored under an all-pass standard. A task passes only when every rubric criterion passes: facts, conclusions, citations, structure, and analysis.
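The all-pass rule can be made concrete with a minimal sketch. This is not Harvey's scoring code: the `TaskResult` type, the function names, and the sample results are hypothetical, and the criterion names are simply taken from the list above. It shows why an all-pass score sits well below the share of individual criteria met.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical rubric outcome for one task: criterion name -> passed?"""
    task_id: str
    criteria: dict[str, bool]


def task_passes(result: TaskResult) -> bool:
    # All-pass standard: one failed criterion fails the whole task.
    # Assumption: an empty rubric counts as a failure, not a vacuous pass.
    return bool(result.criteria) and all(result.criteria.values())


def all_pass_rate(results: list[TaskResult]) -> float:
    """Share of tasks in which every rubric criterion passed."""
    if not results:
        return 0.0
    return sum(task_passes(r) for r in results) / len(results)


def criterion_pass_rate(results: list[TaskResult]) -> float:
    """Share of individual criteria passed, pooled across all tasks."""
    flags = [ok for r in results for ok in r.criteria.values()]
    return sum(flags) / len(flags) if flags else 0.0


# Illustrative data only, not benchmark results.
NAMES = ["facts", "conclusions", "citations", "structure", "analysis"]
results = [
    TaskResult("task-1", {name: True for name in NAMES}),
    TaskResult("task-2", {name: name != "citations" for name in NAMES}),
]

print(all_pass_rate(results))        # 0.5
print(criterion_pass_rate(results))  # 0.9
```

In this made-up pair, nine of ten criteria pass, yet only one of two tasks counts: a single faulty citation fails the second task outright.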
10.4%: the current top LAB all-pass score for end-to-end legal tasks under Harvey's strict scoring standard.
Even the newest frontier model completes roughly one task in ten. Professional work is far from saturated.
Harvey's initial LAB results reported frontier models on a holdout set reflecting LAB's task distribution. The May 28 update adds Opus 4.8 at 10.4%, the first score above 10% under the all-pass standard.
ogram edge places reliability infrastructure between frontier models and the deliverable: bounded search, verified execution, controlled review, and traceability back to the source paragraph. Against these benchmarks, performance is measured. It is not declared.
Hugging Face and Harvey are marks of their respective holders, cited here as references to public benchmarks. LAB ranking data: Harvey, Initial Results on Legal Agent Benchmark, May 26, 2026, and Opus 4.8 update, May 28, 2026.
The partner keeps working in Claude or Codex. When the question needs specialised depth, ogram runs the dedicated machine and returns the answer.
The partner works inside Claude or Codex, in the language and rhythm they already use.
When the task needs depth, the platform sends a bounded call to the ogram machine built for that firm and workflow.
The dedicated machine runs the specialised work: reads the authorised sources, checks the reasoning, and keeps the task recoverable.
The answer comes back into the partner's working surface with the reasoning, sources, and limits intact.
Hallucination, memory loss, context rot, agent drift. These are not edge cases. They are structural failure modes that make current harnesses unsuitable for investment-grade work. ogram addresses each one architecturally.
Structural verification and source-grounding at every inferential step. Every number, every citation, traceable to its origin.
Persistent state management across extended agent sessions. Nothing is forgotten between the start of the diligence and the final memo.
Active monitoring and remediation of context degradation over long horizons. The agent that finishes is as sharp as the agent that began.
Preservation of critical information when context windows compress. The facts that matter survive. The noise does not.
Continuous alignment between agent behaviour and the original objective. No wandering. No scope creep. No polite deviation.
Checkpoint, recovery, and resumption of multi-hour workflows. A crashed runtime does not mean a lost afternoon.
Coherent coordination of specialised sub-agents working parallel workstreams. One plan, many hands, one output.
Source citations. Long-context traces. Checkpoint logs. The scaffolding produces documents the way a reliable analyst does, because investment-grade work must survive a review.
The Acquisition Committee notes that consolidated net revenue for the fiscal year ending 31 December 2023 reached CHF 2,341.8 million, representing a year-on-year increase of 7.4% against the prior period. Adjustments under Schedule 4.7(b) reduce normalised EBITDA to CHF 412.6 million.
Investment-grade AI is not a single generation problem. It is a control, memory, and verification problem carried across the full life of the case.