Agents are the New Software
We don't need any other software, really. If we just let an AI agent run, it can operate for 16 hours without mistakes. GPT-3.5 could manage that for about 30 seconds. That is the pace at which things are moving.
Gramener All Hands · May 2026
How AI shifts bottlenecks, why verifiability and context are the new scarce assets, and what each role should do next.
Recently, a financial research client said: "Can we create a CIM?" — a Confidential Information Memorandum, the kind of heavyweight document banks and investors use to evaluate acquisitions.
We took an earlier CIM as a reference and gave the AI a single prompt.
The entire slide deck was researched and created in approximately five minutes, and there did not appear to be a single factual error in it. A task that once required days of analyst work.
Or consider teacher-coaching. A client said: "We have teachers conducting classes. We want to give them feedback. What should we tell the coach?" In a few hours, we processed hundreds of videos and told coaches things like: this teacher frequently reminded students about upcoming assessments out of urgency, and should instead let them come to their own conclusions.
Another client — Cengage — said: "From textbooks, we want to build test banks, instruction manuals, study guides, explainers, and so on." We put it into an agent pipeline and said: "For each chapter, create a test bank." It builds them one after another. All it takes is one instruction.
In other words, generation has become easy. But if generation has become easy, something else downstream becomes hard.
When research becomes easy, the new question is: what about the private data that specific teams hold? What does it take to integrate it and build connectors to it? Data stories have become easy to generate, but who is going to verify them?
Times of India reached out and said: "Can you create data stories for one of our features called Statoistics?" I said, "Okay, here are 30 data stories", and that took half an hour. But verifying those 30 stories is a nightmare for them. Or take dashboards: a minute is all it takes to produce one, but each one has to be integrated both upstream and downstream. Pipelines can be created extremely easily, but how do they fit into the production process?
The bottlenecks are constantly shifting.
| Hard before AI | Hard now |
|---|---|
| Research | Private data access |
| Data stories | Verification |
| Dashboards | Integration |
| Pipelines | Workflows |
Use AI for everything. That will help you find the new bottleneck. And where there are new bottlenecks, that's where there is scarcity — build your assets there.
This is one of the most common questions I get. Everywhere I go, people are asking versions of the same thing: how do we know the AI's output can be trusted?
Ankor Rai and I delivered a talk at Prudential on verifiability. There are different ways to verify AI output: an LLM as judge, a human in the loop, citations back to sources, and executable code or rules.
When you tell an LLM to calculate a bunch of numbers directly, it might make a mistake. When you tell it to write code to solve the same problem, it almost never does. First, LLMs are very good at code. Second, code either works or it doesn't, and if it works, it has almost certainly done the job right.
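The pattern is simple: run the model's code instead of trusting its arithmetic. A minimal Python sketch, where `generated_code` stands in for what an LLM might return (in practice you would request code from the model and execute it in a sandbox):

```python
# Sketch of the "ask for code, not answers" verification pattern.
# `generated_code` stands in for an LLM's response.

def run_generated_code(code: str, func_name: str, *args):
    """Execute model-written code in a scratch namespace and call the named function."""
    namespace: dict = {}
    exec(code, namespace)  # untrusted code: sandbox this in production
    return namespace[func_name](*args)

# Imagine the model was asked: "Write a function `total` that sums the
# squares of 1..n." Instead of trusting its arithmetic, we run its code.
generated_code = """
def total(n):
    return sum(i * i for i in range(1, n + 1))
"""

result = run_generated_code(generated_code, "total", 10)  # 385
```

If the code runs and passes a spot-check, the answer is deterministic rather than hallucinated.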
This rule-based approach can extend to many domains. For instance, there is a language called InsurLE. You can take an insurance claim and convert it into a set of rules. Then you have two pieces of code: "Here is my policy" and "Here is the claim." The question — Does this claim match the policy? — gets a verifiable yes or no. No hallucination. No ambiguity.
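The talk doesn't reproduce the InsurLE rules themselves; here is a hypothetical Python sketch of the same idea, with illustrative field names. Policy and claim are plain data, and the match is deterministic code, so the yes/no answer cannot be hallucinated:

```python
# Hypothetical sketch of rule-based claim verification (not actual InsurLE).
from dataclasses import dataclass

@dataclass
class Policy:
    covered_events: set   # e.g. {"fire", "flood"}
    max_payout: int
    deductible: int

@dataclass
class Claim:
    event: str
    amount: int

def claim_matches(policy: Policy, claim: Claim) -> bool:
    """Deterministic check: covered event, payable amount within the cap."""
    if claim.event not in policy.covered_events:
        return False
    payable = claim.amount - policy.deductible
    return 0 < payable <= policy.max_payout

policy = Policy(covered_events={"fire", "flood"}, max_payout=50_000, deductible=500)
ok = claim_matches(policy, Claim(event="fire", amount=12_000))    # covered
no = claim_matches(policy, Claim(event="theft", amount=12_000))   # not covered
```

The LLM's job shrinks to translating documents into this structured form; the verdict itself comes from code.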
When AI starts generating content at scale, people will generate ten times the volume. You can be the person who comes in and verifies it. Knowing the different ways to check whether AI output is good is power. Building that skill is an asset.
Save all your prompts. My prompts are in a public repository. Convert them into what are called skills — instructions that the agent can use when needed.
For instance, everything I've learned over the last 15 years about data storytelling is in one "Narrative Data Story" skill file. It's a long document capturing everything I know. What that means is that we have encapsulated intelligence.
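A skill file is just a document the agent loads on demand. A hypothetical sketch of what such a file might contain; the headings and rules here are illustrative, not a prescribed schema:

```markdown
# Skill: Narrative Data Story

## When to use
The user asks for a data story, narrative report, or insight summary.

## Steps
1. Lead with the single most surprising number, not the methodology.
2. Structure as: headline insight, supporting evidence, caveat, action.
3. Quantify every claim and link each number to its source table.

## Exclusions
- Never report a trend from fewer than three data points.
- Flag data-quality gaps; don't hide them.
```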
Every conversation you have with your team in a meeting, every process you execute, every operational method you know — these are the kinds of things you can extract and save, and start using.
Here is an example. I asked the AI who in my network I should reconnect with.
It came back with a prioritised list: IAS officers from a recent training session, leaders from government think tanks, clients from past engagements. A series of people I should be in touch with. Imagine using this to set up client meetings, to identify who in your network you should be talking to, or to prepare for those meetings.
This is the prompt I use to prepare every day for meetings. I give it a standard prompt with historical context and live context — today's agenda, recent updates. This morning, for instance, it said in one sentence exactly what frame of mind I needed to go into a particular meeting with, what the counterpart needed to know, and what the agreed next steps were.
It's a small thing — one sentence of prep. But even for personal productivity, it makes a huge difference when you're moving between six meetings a day.
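The prompt itself isn't reproduced in the talk. A minimal Python sketch of the pattern it describes, combining a fixed instruction with historical and live context (the field names and wording are illustrative):

```python
# Hypothetical sketch: assemble a daily meeting-prep prompt from a fixed
# instruction plus stored and live context. All names are illustrative.

STANDARD_PROMPT = (
    "Summarise, in one sentence each: the frame of mind I should bring, "
    "what the counterpart needs to know, and the agreed next steps."
)

def build_prep_prompt(history: str, agenda: str, updates: str) -> str:
    """Combine the fixed instruction with historical and live context."""
    return "\n\n".join([
        STANDARD_PROMPT,
        f"Historical context:\n{history}",
        f"Today's agenda:\n{agenda}",
        f"Recent updates:\n{updates}",
    ])

prompt = build_prep_prompt(
    history="Last call: client asked for a verification plan.",
    agenda="Review the draft verification plan.",
    updates="Draft shared yesterday; no comments so far.",
)
```

The value is in the saved context, not the glue code: the same standard prompt runs every morning with fresh inputs.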
You can also create several derived formats from a single piece of content. For a consumer goods client, we took a dry business report and converted it into a rap song. A very different format, very engaging, and it drew a fair bit of interest. The same content can take multiple formats: video, sketchnote, report, data story, dashboard. Stop shipping one thing.
Whatever context you can save, save it. Any kind of text, any process, any operational method, any knowledge: put it into a document or a text file, and you will start building assets that are reusable in a variety of ways.
So, what does that mean for you practically?
Your role is changing. Here's how.
Stop delivering models & notebooks. Build agent-verifiable analytical workflows. Your value is no longer the model. It's knowing what to verify and how.
Learn agent harness engineering — Git checkpoints, Docker isolation, MCP connectors, CLI-friendly tools, AGENTS.md. The agent writes the code. You own the architecture it runs inside.
Become excellent at examples, exclusions, acceptance criteria, and failure cases. These are what agents need to do your work well. You're writing their job description now.
Stop selling AI features. Sell verified outcomes. "We built it" is table stakes. "We can prove it works, and we'll own it if it doesn't" is the pitch.
Treat prompts, rubrics, validation rules, and postmortems as project assets. Version-controlled, reusable, owned. Not email threads.
Shift from case studies to living demos. Every document you publish can now spawn a podcast, a sketchnote, a video, a quiz. Stop shipping one thing.
Train through games and agent-native challenges, not slide decks. If someone can't complete a task with an agent within a time limit, that's the signal. Not whether they passed a module.
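The "agent harness" in the developer pivot above is mostly configuration. A hypothetical sketch of an `AGENTS.md` (the file agents read for repo-specific instructions; the contents here are illustrative, not a fixed schema):

```markdown
# AGENTS.md

## Environment
- Run everything inside the provided Docker container; never install globally.
- Commit a Git checkpoint after each passing test run.

## Verification
- Run `make test` before declaring a task done.
- If a check fails twice, stop and report instead of retrying blindly.

## Boundaries
- Do not touch files under `deploy/`.
- Ask before changing any public API.
```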
"Use AI for everything. You will find the next bottleneck. The bottleneck is where scarcity lies, and that is where you need to build assets."
— Anand S, Gramener All Hands, May 2026

AI agents can now run for hours autonomously. What GPT-3.5 managed for 30 seconds, frontier models can sustain for 16 hours. The pace of capability improvement is non-linear. What's impossible today will be trivial next quarter.
Generation is no longer the bottleneck. Research, data stories, dashboards, and pipelines are all easy to generate. The bottleneck has shifted to private data access, verification, integration, and workflows. That's where the scarce value now lives.
Verifiability is a competitive asset. The most common client concern is not capability — it's trust. LLM-as-judge, human-in-the-loop, citations, and executable code/rules are the four verification strategies. The ability to pick the right one is rare and valuable.
Code beats arithmetic. When an LLM writes code to solve a numerical problem rather than computing directly, accuracy is dramatically higher — because code either works or it doesn't.
Context is the new moat. Save your prompts, processes, and domain expertise in reusable documents and skills. Every conversation, postmortem, and meeting transcript is raw material for an asset. The people who systematically capture and reuse context will compound their advantage.
One piece of content, ten formats. Documents, reports, and data can now spawn podcasts, sketchnotes, videos, quizzes, and dashboards automatically. The teams that publish multiple derived formats from a single source will vastly out-reach those who ship one thing.
Every role has a specific pivot. Data scientists: shift from models to verifiable workflows. Developers: own the architecture agents run inside. Business analysts: master examples and acceptance criteria. Sales: sell verified outcomes, not AI features. PM: version-control your prompts and rubrics.
The one-sentence summary: Use AI for everything. Find the new bottleneck. Build assets there.