Agents as Code

14 June 2026

As agents break out of general-purpose apps and into specific, purpose-built workflows, the right way to build one is no longer as an app, but as code.

Context

Early agents were the code. Before Claude Code and Codex, you’d build an agent by manually constructing the agent loop and providing the tools. Those agents were primitive, but still programmable.
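
For readers who never wrote one, a hand-rolled agent loop from that era might look roughly like the sketch below. The `ChatMessage`, `ToolCall` and `Model` shapes and the stubbed `fakeModel` are assumptions made up for this example, not any vendor's API; a real loop would call a model provider over the network.

```typescript
// Hypothetical shapes for this sketch; not taken from any real SDK.
type ToolCall = { tool: string; input: string };
type ModelReply = { text: string; toolCall?: ToolCall };
type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: ChatMessage[]) => ModelReply;
type ToolFn = (input: string) => string;

// The loop: ask the model, run the tool it requests, feed the result back,
// and stop once the model answers without requesting a tool.
function runAgent(model: Model, tools: Map<string, ToolFn>, prompt: string, maxSteps = 10): string {
  const history: ChatMessage[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    history.push({ role: "assistant", content: reply.text });
    if (!reply.toolCall) return reply.text;
    const tool = tools.get(reply.toolCall.tool);
    const result = tool ? tool(reply.toolCall.input) : `Unknown tool: ${reply.toolCall.tool}`;
    history.push({ role: "tool", content: result });
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// A stub standing in for a real model: it requests one tool call, then answers.
const fakeModel: Model = (history) => {
  const last = history[history.length - 1];
  return last?.role === "tool"
    ? { text: `The result is ${last.content}` }
    : { text: "Let me compute that.", toolCall: { tool: "add", input: "2,3" } };
};

const tools = new Map<string, ToolFn>([
  ["add", (input) => String(input.split(",").map(Number).reduce((a, b) => a + b, 0))],
]);

const answer = runAgent(fakeModel, tools, "What is 2 + 3?");
```

Everything that follows in this post is, in one way or another, a more structured version of this loop.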

The real agentic era came with harnesses from Anthropic and OpenAI. Those agents were single-purpose (coding), and the models were fine-tuned to work efficiently with those harnesses (something that has since become much less important). They were also apps, with few options to tweak them.

As the next logical step, the harnesses became self-configurable. The mainstream saw this first with the introduction of OpenClaw, but that particular feature came from pi, the harness powering it. The agents became platforms, yet still not fully programmable.

Next, we will see the era of agentic workflows defined as code.

Insights

A few interesting trends became apparent during this development:

The harnesses became less and less dependent on specific models. You could easily use a GPT model in Claude Code, and vice versa. More interestingly, you can make a custom harness, and Opus or Codex would happily run inside it. The major success of Devin, Amp, and many others is the best proof of that.
Agents need surprisingly few tools to operate. All you need for a functional coding/computer-use agent is a bash tool; everything else is a nice-to-have. pi has four default tools, yet it is perfectly capable of writing code and handling non-coding use cases.
There’s nothing sacred in the harness. The system prompts are mostly a role definition and a list of tool descriptions. All the magic is in the models.
Skills are interesting. On one hand, the most popular skills are generic (frontend design, code refactoring, image generation) and will most likely become part of RL for future models, and therefore redundant over time. On the other hand, we will continue to see the proliferation of ultra-niche, domain-specific skills that are tailored to specific products and teams. Those will mostly be private and written from scratch.
Background agents are emerging. It’s becoming increasingly common to make agents run in the background. Often, those runs can even be triggered asynchronously, without relying on human input to start.
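
To make the "surprisingly few tools" point concrete, here is a rough sketch of what a single bash tool could look like on Node.js. The `runBash` name, the result shape and the timeout default are illustrative assumptions; `execFileSync` is the standard `node:child_process` call. A production tool would also need sandboxing and output limits.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical result shape for this sketch.
type BashResult = { stdout: string; stderr: string; exitCode: number };

// Runs a command string through bash and captures its output.
// Non-zero exits are returned as data so the model can read the error and retry.
// Note: in this sketch stderr is only captured when the command fails.
function runBash(command: string, timeoutMs = 30000): BashResult {
  try {
    const stdout = execFileSync("bash", ["-c", command], {
      encoding: "utf8",
      timeout: timeoutMs,
      stdio: ["ignore", "pipe", "pipe"],
    });
    return { stdout, stderr: "", exitCode: 0 };
  } catch (err) {
    // execFileSync throws on a non-zero exit; the error carries the captured streams.
    const e = err as { stdout?: string; stderr?: string; status?: number | null };
    return { stdout: e.stdout ?? "", stderr: e.stderr ?? String(err), exitCode: e.status ?? 1 };
  }
}
```

With this one function exposed as a tool, a model can already read files (`cat`), search (`grep`), edit and run tests; dedicated read/write/edit tools mostly make those operations cheaper and safer.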

We’re very likely to see a wave of experimentation with products that look nothing like existing coding agent apps and platforms. Those new products require new tools.

Building blocks

Agents are made of four components: model, harness, context, and I/O.

Model is whatever powers the agent. Think of it as fuel: it’s the raw intelligence. Different models have different capabilities, and some models are certainly better than others, but they are also something of a commodity.

Harness is whatever controls it. It’s like an engine, operating the entire system in a specific way. The harness lets you define how exactly the system should operate. It consists of the system prompt and the tools.

Context describes the conditions. It’s an environment in which the system operates. Context is defined by skills, MCPs, and user prompts.

Input and output let you steer and observe the system. Think of them as the controls and dashboard. We have triggers on the input side, traces and logs on the output side, and gateways as the bidirectional link.

Codifying the concepts

Now that we’ve dissected the parts of the agent, let’s see how we could implement them in code.

Defining a model is trivial; there’s already a clear pattern for that:

const agent = new Agent({
  model: OpenAI('gpt-5-5-codex'),
});

Defining the harness is also simple. Let the Agent take both tools and a system prompt:

const read = new Tool({
  title: 'read',
  description: '…',
  inputSchema: …,
  execute: (args) => {
    // Read file
  },
});

const agent = new Agent({
  system: 'You are a coding agent. You have access to the following tools: …',
  tools: [read, write, edit, bash],
});
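
Since the system prompt is mostly a role definition plus a list of tool descriptions, it can be derived from the tools themselves. The helper below is a minimal sketch of that idea; `buildSystemPrompt`, the `ToolInfo` shape and the sample descriptions are placeholders I made up for illustration, not part of any real harness.

```typescript
// Hypothetical minimal tool description; only the fields needed for the prompt.
type ToolInfo = { title: string; description: string };

// Joins a role definition with one line per tool.
function buildSystemPrompt(role: string, tools: ToolInfo[]): string {
  const toolLines = tools.map((t) => `- ${t.title}: ${t.description}`);
  return [role, "You have access to the following tools:", ...toolLines].join("\n");
}

// Placeholder descriptions, purely for illustration.
const systemPrompt = buildSystemPrompt("You are a coding agent.", [
  { title: "read", description: "Read a file from disk" },
  { title: "bash", description: "Run a shell command" },
]);
```

Generating the prompt this way keeps the tool list and its description in the prompt from drifting apart.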

Context is also easy to define. Skills are mostly markdown files, and MCP servers are just a bag of tools with some auth.

// Simplified
const grafanaMcp = new MCP({
  url: process.env.GRAFANA_MCP_URL,
  accessToken: process.env.GRAFANA_MCP_TOKEN,
  tools: [getDatasources, query, searchDashboards],
});

// Simplified
const triage = new Skill({
  name: 'triage',
  description: 'Helps debugging production issues',
  content: '…',
});

const agent = new Agent({
  mcp: [grafanaMcp],
  skills: [triage],
});
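
Because skills are mostly markdown files, loading one can be as small as splitting a front matter block from the body. The sketch below assumes a file layout with flat `name` and `description` fields between `---` markers; that layout, the `parseSkill` helper and the sample body are assumptions for illustration, and the parser is deliberately not a full YAML implementation.

```typescript
// Hypothetical parsed skill; mirrors the name/description/content fields used above.
type SkillFile = { name: string; description: string; content: string };

// Parses a markdown string with a simple "key: value" front matter block.
// Only flat, single-line string fields are supported.
function parseSkill(markdown: string): SkillFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("Missing front matter block");
  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  const description = fields.get("description");
  if (!name || !description) throw new Error("Skill needs a name and a description");
  return { name, description, content: (match[2] ?? "").trim() };
}

// Sample file; the body is placeholder text.
const triageSkill = parseSkill(
  [
    "---",
    "name: triage",
    "description: Helps debugging production issues",
    "---",
    "# Triage",
    "",
    "(instructions go here)",
  ].join("\n"),
);
```

The parsed object has the same three fields as the `Skill` definition above, so a directory of such files could be turned into an agent's skills with a loop.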

Finally, the I/O, arguably the most opinionated part of the stack.

The easiest choice is to not provide any abstractions here and let the application define it.

But looking at the agentic use cases in the wild, I’d argue there are two separate APIs that would be valuable to the agent developer.

Generally, we have assistants operating through a shared medium (e.g. a Telegram or Slack bot), as well as fully automated workflows triggered by external events (e.g. a GitHub webhook).

Triggers could abstract the pull- and push-based receivers into a unified Agent.on API:

const reviewAgent = new Agent({
  // …
});

const prTrigger = ghReceiver.pullRequest({
  actions: ['opened', 'synchronize', 'reopened', 'ready_for_review'],
});

reviewAgent.on(prTrigger, async (event) => {
  const session = reviewAgent.createSession();
  // Reply with findings on GitHub
});
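
One way such a unified API could work under the hood is sketched below. Both kinds of receiver implement the same tiny `Trigger` contract, so the agent never needs to know whether events were pushed or pulled. All names here (`Trigger`, `PushTrigger`, `PullTrigger`, `MiniAgent`) are hypothetical, and the polling is driven by hand rather than by a timer to keep the example short.

```typescript
// Hypothetical trigger contract: anything that can hand events to a listener.
type Listener<E> = (event: E) => void;
interface Trigger<E> {
  subscribe(listener: Listener<E>): () => void; // returns an unsubscribe function
}

class BaseTrigger<E> implements Trigger<E> {
  private listeners: Listener<E>[] = [];
  subscribe(listener: Listener<E>): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
  protected emit(event: E): void {
    for (const listener of this.listeners) listener(event);
  }
}

// Push-based source: a webhook route would call deliver() for each request.
class PushTrigger<E> extends BaseTrigger<E> {
  deliver(event: E): void {
    this.emit(event);
  }
}

// Pull-based source: poll() fetches new items and forwards them as events.
// A real receiver would call poll() on a timer; here it is called by hand.
class PullTrigger<E> extends BaseTrigger<E> {
  private fetchNew: () => E[];
  constructor(fetchNew: () => E[]) {
    super();
    this.fetchNew = fetchNew;
  }
  poll(): void {
    for (const event of this.fetchNew()) this.emit(event);
  }
}

// Stand-in for Agent.on: the agent does not care which kind of trigger it gets.
class MiniAgent {
  on<E>(trigger: Trigger<E>, handler: Listener<E>): () => void {
    return trigger.subscribe(handler);
  }
}

const seen: string[] = [];
const miniAgent = new MiniAgent();
const webhook = new PushTrigger<{ url: string }>();
const queue = [{ url: "pr/2" }];
const poller = new PullTrigger<{ url: string }>(() => queue.splice(0));

miniAgent.on(webhook, (event) => seen.push(`push:${event.url}`));
miniAgent.on(poller, (event) => seen.push(`pull:${event.url}`));
webhook.deliver({ url: "pr/1" });
poller.poll();
```

After these calls, `seen` holds one entry from each source, in delivery order.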

For gateways, we could offer something even more tightly coupled:

const agent = new Agent({
  // …
});

// Long polling or webhook-based
serve(agent, telegramGateway({ token: process.env.TELEGRAM_TOKEN }));
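
To show what that coupling buys, here is a minimal sketch of what a `serve` function might do internally: route each incoming message to a per-chat session and send the reply back through the same gateway. The `Gateway`, `ChatAgent` and `ChatSession` interfaces and the in-memory gateway are assumptions for this example, and replies are synchronous only to keep it short; a real agent would reply asynchronously.

```typescript
// Hypothetical contracts for this sketch.
type Incoming = { chatId: string; text: string };
interface Gateway {
  onMessage(handler: (message: Incoming) => void): void;
  send(chatId: string, text: string): void;
}
interface ChatSession {
  reply(text: string): string;
}
interface ChatAgent {
  createSession(): ChatSession;
}

// Wires a gateway to an agent, keeping one session per chat so context persists.
function serve(agent: ChatAgent, gateway: Gateway): void {
  const sessions = new Map<string, ChatSession>();
  gateway.onMessage(({ chatId, text }) => {
    let session = sessions.get(chatId);
    if (!session) {
      session = agent.createSession();
      sessions.set(chatId, session);
    }
    gateway.send(chatId, session.reply(text));
  });
}

// In-memory gateway standing in for Telegram or Slack.
class MemoryGateway implements Gateway {
  sent: Incoming[] = [];
  private handler: (message: Incoming) => void = () => {};
  onMessage(handler: (message: Incoming) => void): void {
    this.handler = handler;
  }
  send(chatId: string, text: string): void {
    this.sent.push({ chatId, text });
  }
  receive(chatId: string, text: string): void {
    this.handler({ chatId, text });
  }
}

// Toy agent whose sessions count the messages they have seen.
const countingAgent: ChatAgent = {
  createSession(): ChatSession {
    let turns = 0;
    return { reply: (text: string) => `#${++turns}: ${text}` };
  },
};

const memoryGateway = new MemoryGateway();
serve(countingAgent, memoryGateway);
memoryGateway.receive("alice", "hi");
memoryGateway.receive("alice", "again");
memoryGateway.receive("bob", "hello");
```

The second message from the same chat reuses its session, while a different chat gets a fresh one. That session bookkeeping is the part a gateway abstraction can take off the developer's hands.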

In action

Once we have all the pieces, it’s easy to see how those fragments shape the agent.

A general-purpose coding agent could be defined like this:

const agent = new Agent({
  model: OpenAI('gpt-5-5-codex'),
  system: codex.system,
  tools: codex.tools,
});

A PR review bot could be implemented like this:

const reviewAgent = new Agent({
  model: OpenAI('gpt-5-5-codex'),
  system: pi.system,
  tools: pi.tools,
  skills: [prReview],
});

reviewAgent.on(prTrigger, (event) => {
  reviewAgent.createSession({
    prompt: `Review this pull request: ${event.url}`,
  });
});

A triage agent could be as simple as:

const triageAgent = new Agent({
  model: Anthropic('opus-4-8'),
  system: claudeCode.system,
  tools: claudeCode.tools,
  mcp: [grafanaMcp, linearMcp],
  skills: [triage],
});

triageAgent.on(grafanaAlertTrigger, (event) => {
  triageAgent.createSession({
    prompt: `Triage this incident and make a Linear ticket with findings: ${event.id}`,
  });
});

A general-purpose assistant could be made like this:

const assistantAgent = new Agent({
  model: Moonshot('kimi-2-6'),
  system: pi.system,
  tools: [...pi.tools, webSearch, webFetch],
});

serve(assistantAgent, telegramGateway({ token: process.env.TELEGRAM_TOKEN }));

Benefits

By defining our agents in code, we get everything that’s great about software development for free:

Version control: every change can be traced back to a specific commit
Deployment: code-based agents are trivial to roll out and scale
Static analysis: most mistakes are caught during development, not execution
Auditability: it’s easy to see when and why something changed
Agent-friendliness: agents are great at writing code, so it’s natural for them to build other agents
Tooling: instrumentation is much easier when your agent is defined as code

Roboport

I have implemented the concepts I’ve shared above in a TypeScript library called roboport.

My idea for it is simple: everything that can be abstracted should be abstracted, but nothing more. Roboport gives you all the tools to make your agents, while also coming with some nice defaults. It lets you start simple with built-in harnesses, skills, MCPs, and gateways, while leaving you the option to extend or override any component with your own logic.

If you’re building an agentic workflow, try using Roboport and let me know how it goes.