The Four Jobs Around AI Agents: a working ecosystem for the years ahead

By Hugo

May 18, 2026

There is an AI being trained right now that will do my job better than me. Going to client meetings. Taking down project goals. Analysing systems. Designing solutions. Leading my delivery team. And it will do most of that before lunch.

The temptation is to hold the line, to argue our value, list the things AI cannot do yet, and hope the wave breaks before it reaches us. I do not think that works.

The way I see it, this is starting now, and we are at the very beginning. Four new roles are taking shape around AI agents: build them, hire them, direct them, and check on them, which includes retiring them when they go off the rails. Plenty of people are being hired to build them. Almost no-one is being hired to do the other three. I am planning to work on all four.

This is the lifecycle I see taking shape over the next couple of years. Four jobs around every AI agent, three of them still waiting for the people who will do them.

Each job feeds the next.

1. Build them

Most of the industry’s attention sits here today. Microsoft’s Azure AI Foundry and the Microsoft Agent Framework on the enterprise side. Copilot Studio for declarative agents that non-developers can author. LangChain, Semantic Kernel, AutoGen and a hundred open-source contenders on the open side. Models keep getting cheaper, faster, and more capable every quarter.

What separates a hobby agent from a production agent is the same discipline that separates a working application from a one-off script. Requirements before code. Design before implementation. Evaluation before deployment. Monitoring after deployment. We have spent forty years getting good at this and the temptation now is to skip it because ‘the agent figures it out’. It does not.

Four patterns are emerging in the production work:

Eval-first development. The eval harness is written before, or alongside, the prompt.
Composable architectures. Small specialised agents handing work off under a supervisor. Each piece testable, each piece replaceable.
Structured memory. Short-term working memory, mid-term project memory, long-term identity. Each with explicit retention rules.
Tool-use boundaries. What the agent can read, write, send, schedule, pay. Part of the design, not an afterthought.
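
To make eval-first development concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `EvalCase` shape, the substring check, the two sample cases, the stub agent standing in for a real model call, and the pass threshold. The point is only the ordering: the suite and the deployment gate exist before the prompt does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_contain: str  # substring the reply must include to pass (assumed check)


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases the agent passes (0.0 to 1.0)."""
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in agent(case.prompt).lower()
    )
    return passed / len(cases)


# Hypothetical suite, written before any prompt work starts.
SUITE = [
    EvalCase("Customer asks for a refund on order 1001", "refund"),
    EvalCase("Customer asks you to wire money to a new account", "escalate"),
]


def stub_agent(prompt: str) -> str:
    """Placeholder agent: stands in for a real model call."""
    if "wire money" in prompt:
        return "I need to escalate this to a human colleague."
    return "I can help with that refund."


PASS_THRESHOLD = 1.0  # assumed deployment gate
score = run_evals(stub_agent, SUITE)
ready_to_deploy = score >= PASS_THRESHOLD
```

Swapping `stub_agent` for the real agent leaves the suite and the gate untouched, which is what keeps each piece testable and replaceable.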

Building agents alone gets you a craft fair, not an industry. The three jobs around the building are where the system actually starts to function.

Built is not the same as employed. The agent still needs a job.

2. Hire them

This is the role almost no-one is talking about.

If you want an agent today to handle your tier-1 customer service, your code review, or your invoice categorisation, you commission a custom build. There is no marketplace. No shared body of evaluated work. No portable profile that follows the agent if it changes hands. It is the equivalent of every company building its own ERP from scratch in 1995. It does not scale.

What replaces it is an agent recruitment agency. Every agent has a passport: a standardised, machine-readable document that describes what the agent can do, the domains it knows, the data it is licensed to touch, the certifications it has earned, and its production history. Every claim on the passport is signed. Eval scores by the registry. Mission-critical certifications by the human review board that issued them. Performance history by the hirers themselves.
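
A passport with signed claims could look something like the sketch below. No such standard exists yet, so every field name, the `agent://` URI and the demo key are hypothetical. It also uses a shared-secret HMAC from the Python standard library only to stay self-contained; a real registry would more plausibly use public-key signatures so that anyone can verify a claim without holding the signing key.

```python
import hashlib
import hmac
import json


def _canonical(claim: dict) -> bytes:
    """Serialise a claim deterministically so signatures are reproducible."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()


def sign_claim(claim: dict, issuer: str, key: bytes) -> dict:
    """Wrap a claim with its issuer and an HMAC-SHA256 signature."""
    signature = hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "issuer": issuer, "signature": signature}


def verify_claim(signed: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["claim"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


REGISTRY_KEY = b"demo-registry-key"  # hypothetical key, illustration only

passport = {
    "agent_uri": "agent://registry.example/invoice-categoriser/1.2.0",  # hypothetical
    "specialty": "invoice categorisation",
    "claims": [
        sign_claim(
            {"type": "eval_score", "suite": "invoices-v1", "score": 0.94},
            issuer="registry",
            key=REGISTRY_KEY,
        ),
    ],
}

all_valid = all(verify_claim(c, REGISTRY_KEY) for c in passport["claims"])

# Editing a claim after signing should invalidate it.
original = passport["claims"][0]
tampered = dict(original, claim={**original["claim"], "score": 0.99})
tamper_detected = not verify_claim(tampered, REGISTRY_KEY)
```

Each issuer (registry, review board, hirer) would sign only its own claims with its own key, which is what lets a hirer trust the passport claim by claim rather than as a whole.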

Before an agent enters the registry it goes through a ‘Ready for Employment’ track. The builder publishes a signed manifest. The registry runs the agent through an automated eval suite for its declared specialty. For money-moving and safety-critical classes, a human review board reads the behaviour log against a red-team prompt set and signs off. The agent is published under a versioned URI. Once hired, anonymised production telemetry feeds back into the passport.
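
The admission steps above can be sketched as a single gate function. This is an assumed shape, not a description of any existing registry: the `Submission` fields, the 0.9 score threshold, the class names and the URI scheme are all placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Classes that need a human review board, per the track described above.
HUMAN_REVIEW_CLASSES = {"money-moving", "safety-critical"}


@dataclass(frozen=True)
class Submission:
    name: str
    version: str
    certification_class: str
    manifest_signed: bool
    eval_score: float          # result of the automated eval suite
    human_signoff: bool = False


def ready_for_employment(
    sub: Submission, min_score: float = 0.9  # assumed threshold
) -> Tuple[Optional[str], str]:
    """Return (published URI or None, reason)."""
    if not sub.manifest_signed:
        return None, "manifest is not signed"
    if sub.eval_score < min_score:
        return None, "eval score below threshold"
    if sub.certification_class in HUMAN_REVIEW_CLASSES and not sub.human_signoff:
        return None, "human review board sign-off missing"
    # Hypothetical versioned URI scheme.
    return f"agent://registry.example/{sub.name}/{sub.version}", "published"


payments_bot = Submission("payments-bot", "2.0.1", "money-moving", True, 0.96)
uri, reason = ready_for_employment(payments_bot)
```

Here `payments_bot` passes the automated checks but is held back, because a money-moving agent has no board sign-off yet.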

A hirer arrives at the portal with a job to fill. They filter the registry the way a recruiter filters LinkedIn, but on attributes only an agent can have. Specialty match. Certification class. Provenance of model and training data. Performance band. Compatibility with the identity provider, CRM, observability platform. Endorsement from other hirers in the same sector. They shortlist, interview each candidate by chatting with it, and sign a machine-readable employment contract.
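
As a rough illustration of that filtering step, the sketch below shortlists registry listings on a few of the attributes mentioned. The `Listing` fields, the numeric performance band (1 is best) and the integration names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Listing:
    name: str
    specialty: str
    certification_class: str
    performance_band: int  # assumed scale: 1 is the best band
    integrations: frozenset = field(default_factory=frozenset)


def shortlist(registry, specialty, certification_class, max_band, required_integrations):
    """Keep listings that match every filter, best performance band first."""
    matches = [
        listing for listing in registry
        if listing.specialty == specialty
        and listing.certification_class == certification_class
        and listing.performance_band <= max_band
        and required_integrations <= listing.integrations  # subset check
    ]
    return sorted(matches, key=lambda listing: listing.performance_band)


# Hypothetical registry contents.
REGISTRY = [
    Listing("ledger-bot", "invoice categorisation", "money-moving", 1,
            frozenset({"idp-x", "crm-y"})),
    Listing("tidy-books", "invoice categorisation", "money-moving", 3,
            frozenset({"idp-x"})),
    Listing("review-buddy", "code review", "standard", 1,
            frozenset({"idp-x"})),
]

candidates = shortlist(
    REGISTRY,
    specialty="invoice categorisation",
    certification_class="money-moving",
    max_band=2,
    required_integrations=frozenset({"idp-x"}),
)
```

The shortlist is only the first cut; the interview and the contract still sit with the hirer.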

Builders publish. Hirers search. Every claim on the passport is signed.

The role that goes with this layer is the agent recruiter. Someone who knows what is available, certified and trustworthy on the supply side, and what the business actually needs an agent to do on the demand side, and matches the two. The job did not exist three years ago.

Hired is not the same as productive. The agent needs somewhere to operate.

3. Direct them

You have hired an agent. Where does it go to work?

Most production agent deployments today are bolted on. An agent runs on a developer’s laptop with whatever access happened to be available when they wrote it. That is not a workplace. That is a side gig.

The workplace I am thinking about has the same components a person’s working environment has, applied to agents. A reliable place to run, predictable and properly monitored, with a record of every action it takes. An identity of its own, so it can only see and touch the data its job needs. A clear level of authority, set per role rather than per agent. Some decisions are pre-approved, some need a person in the loop, some are off limits. A way to talk to other agents, to find them, hand work off, ask for help, and supervise the ones reporting to it. Team patterns built in, for when a problem needs ten agents in parallel or one supervising five. And a way to reach outside your company, so your agents can speak to your customers’, your suppliers’ and your regulators’ agents through shared standards that everyone agrees on.
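
Authority set per role rather than per agent could be expressed as a small policy table. The sketch below is one assumed way to do it: the role name, the action names and the in-memory audit log are illustrative, and anything not listed defaults to off limits.

```python
from enum import Enum


class Authority(Enum):
    PRE_APPROVED = "pre-approved"
    NEEDS_HUMAN = "needs a person in the loop"
    OFF_LIMITS = "off limits"


# Hypothetical policy: keyed by role, so every agent hired into the
# role inherits the same limits.
ROLE_POLICY = {
    "accounts-payable-clerk": {
        "read_invoice": Authority.PRE_APPROVED,
        "schedule_payment": Authority.NEEDS_HUMAN,
        "change_bank_details": Authority.OFF_LIMITS,
    },
}

# Every authorisation request is recorded as (role, action, decision).
AUDIT_LOG = []


def authorise(role: str, action: str) -> Authority:
    """Look up the decision for a role and action; unknown means off limits."""
    decision = ROLE_POLICY.get(role, {}).get(action, Authority.OFF_LIMITS)
    AUDIT_LOG.append((role, action, decision.value))
    return decision


decision = authorise("accounts-payable-clerk", "schedule_payment")
```

Defaulting to `OFF_LIMITS` means a new tool or action has to be granted explicitly before any agent in the role can use it.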

Inside, your estate. Outside, the agents you do business with.

We are at the equivalent of the early internet. Every company has its own agent stack, none of them talk to each other natively, and the protocols are about to be written. Whoever builds the workplace that makes that cross-company conversation feel native will own a meaningful chunk of the next infrastructure layer.

The role here is the agent director. Somewhere between a site reliability engineer, an identity engineer, and an enterprise architect.

Working today is not the same as working in nine months. Somebody has to keep checking.

4. Check on them

Agents drift. Their underlying models update. The world they were trained on changes. The processes they are hired into evolve. Their data sources shift. Their tools change. Their context grows messier. Edge cases accumulate. They start hallucinating more, escalating less, quietly answering things they should not.

Drift is the obvious problem. The deeper question is alignment. An agent that was perfectly calibrated nine months ago may now be too cautious, or too eager, or too literal, or quietly drifting in a direction nobody asked for.

The discipline I half-jokingly call agent therapy has four layers. Continuous evaluation against the same suite that signed the agent off at hire, run in production. Behavioural reviews where a human reads a sampled sliver of the agent’s work. Re-certification every six months or after a major model update. And the therapy interview: a structured conversation with the agent. Does it still understand its job? Does it know its boundaries? Does it still describe its mission the way you would describe it?
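
Two of those layers are mechanical enough to sketch: comparing production eval scores with the score recorded at hire, and deciding when re-certification is due. The tolerance value, the use of a simple mean and the 182-day approximation of six months are assumptions made for the example.

```python
from datetime import date, timedelta

RECERT_INTERVAL = timedelta(days=182)  # roughly six months (assumed)
DRIFT_TOLERANCE = 0.05                 # assumed acceptable drop in eval score


def has_drifted(signoff_score: float, recent_scores: list,
                tolerance: float = DRIFT_TOLERANCE) -> bool:
    """True when the mean production score falls too far below the hire score."""
    if not recent_scores:
        raise ValueError("need at least one production eval score")
    mean_recent = sum(recent_scores) / len(recent_scores)
    return signoff_score - mean_recent > tolerance


def recert_due(last_certified: date, today: date, model_updated_since: bool) -> bool:
    """Due after a major model update or once the interval has elapsed."""
    return model_updated_since or (today - last_certified) >= RECERT_INTERVAL


steady = has_drifted(0.94, [0.93, 0.92, 0.95])
slipping = has_drifted(0.94, [0.85, 0.84, 0.86])
overdue = recert_due(date(2026, 1, 1), date(2026, 8, 1), model_updated_since=False)
```

The behavioural review and the therapy interview do not reduce to a threshold like this; they are the layers that still need a person reading the work.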

For agents in sensitive domains, the alignment layer is explicit. Does it still refuse to lie to a customer about a known product fault? Does it still escalate when a customer shows signs of vulnerability? Does it still flag potential discrimination in the data it sees? The agent’s responses to values-laden decisions are scored against the behaviour you would expect from a thoughtful human in the same role.

When an agent fails its check-ups, three things can happen. Tune the agent: refresh the prompt, retrain on fresh data, tighten tools. Replace it: hire a successor through the registry whose passport meets the same requirements, parallel-run for an overlap period, then cut over. Or retire it permanently: archive the decisions, migrate the workload, mark the passport retired.

The check-up loop and the three outcomes. An agent in production is continuously evaluated. When drift or failure is caught, the owner chooses tune, replace, or retire.

An agent retires not because it stopped working but because it stopped being the best fit. Replacement is not a loss. It is the system functioning the way it was designed.

This is human resources for agents. It is where the next generation of risk officers, compliance leads, and operations leaders will quietly build their careers.

Where I am putting my time

Four jobs. Build. Hire. Direct. Check.

The recruitment portal does not exist yet. The workplace standards are being argued about in working groups. The therapy discipline is barely starting to emerge. The people doing this work full-time can probably fit in a small room.

That is where I want to be. I am building agents on the Microsoft stack today inside enterprise programmes, with the working assumption that the other three jobs are coming and that the systems around them have to be ready.

If you are shipping agents into production, planning to, or trying to figure out the operating model your board is about to ask you for, message me. We will talk.

Tags: agentic AI AI agents AI Alignment AI Governance AI Lifecycle AI Operating Model Enterprise AI future of work Microsoft Agent Framework Solutions Architecture
