AI agent orchestration patterns for production
The patterns that actually work when agents go live
Getting an AI agent working in a demo is one thing. Getting it to work reliably in production with real users, real data and real consequences is a different challenge entirely.
Over the past year I have deployed several agentic systems for enterprise clients, and a handful of patterns keep proving themselves reliable. I want to share the ones that have worked, along with some of the mistakes I made along the way.
The orchestrator-worker pattern
The most reliable pattern I have used is separating the orchestrator from the workers. The orchestrator is responsible for breaking down the task and deciding which worker agent to call next. The worker agents are focused and only do one thing well.
```python
from openai import AzureOpenAI
import json
import os

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-12-01-preview",
)

# Worker agents - each focused on one task
def classify_document(content: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Classify this document as one of: Invoice, Contract, Report, Email, Other. Respond with just the category."},
            {"role": "user", "content": content},
        ],
        max_tokens=10,
    )
    return response.choices[0].message.content.strip()

def extract_key_fields(content: str, doc_type: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Extract the key fields from this {doc_type}. Return as JSON."},
            {"role": "user", "content": content},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def route_for_approval(doc_type: str, fields: dict) -> str:
    # Your business logic for routing
    if doc_type == "Invoice" and float(fields.get("amount", 0)) > 10000:
        return "finance_director"
    elif doc_type == "Contract":
        return "legal_team"
    else:
        return "standard_queue"

# Orchestrator - coordinates the workers
def process_document(content: str) -> dict:
    doc_type = classify_document(content)
    fields = extract_key_fields(content, doc_type)
    approver = route_for_approval(doc_type, fields)
    return {
        "type": doc_type,
        "fields": fields,
        "routed_to": approver,
    }
```

Why this works: each worker is small, testable and replaceable. If your classification accuracy drops, you fix that one worker without touching the rest of the pipeline.
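Testability is concrete here: `route_for_approval` is plain business logic with no LLM call, so you can unit test it directly with no mocking. A minimal sketch (the function is repeated so the snippet stands alone; the thresholds and queue names mirror the example above):

```python
# Repeated from the pipeline so this snippet runs on its own
def route_for_approval(doc_type: str, fields: dict) -> str:
    if doc_type == "Invoice" and float(fields.get("amount", 0)) > 10000:
        return "finance_director"
    elif doc_type == "Contract":
        return "legal_team"
    else:
        return "standard_queue"

def test_routing():
    # High-value invoices escalate to the finance director
    assert route_for_approval("Invoice", {"amount": "15400"}) == "finance_director"
    # Low-value invoices follow the standard path
    assert route_for_approval("Invoice", {"amount": "500"}) == "standard_queue"
    # Contracts always go to legal
    assert route_for_approval("Contract", {}) == "legal_team"
    # A missing amount defaults to 0 rather than raising
    assert route_for_approval("Invoice", {}) == "standard_queue"

test_routing()
```

The LLM-backed workers get the same treatment with a stub in place of the API call, which keeps your test suite fast and deterministic.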
Always have a human checkpoint for consequential actions
This is the lesson I learned the hard way on an early project. We built an agent that could approve purchase orders under a certain threshold. It worked well in testing but in production it approved a duplicate order that should have been flagged.
Now I build in human checkpoints for any action that is difficult to reverse:
```python
from typing import Callable

def with_human_approval(action_description: str, action: Callable) -> bool:
    """
    Execute an action only after human approval.
    Returns True if the action was approved and executed.
    """
    print(f"\nAgent wants to: {action_description}")
    print("Approve? (yes/no): ", end="")
    response = input().strip().lower()
    if response == "yes":
        action()
        return True
    else:
        print("Action rejected by user")
        return False

# Usage
with_human_approval(
    "Send approval email to finance director for invoice INV-2024-0892 for $15,400",
    lambda: send_approval_email("finance_director@company.com", "INV-2024-0892"),
)
```

For a production system you would replace the console input with a proper approval workflow - a Teams message, an email with an approve link, or a task in your ITSM system.
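One way to build that workflow is to decouple the request from the decision: persist the pending action, send the notification, and execute only when the approval event arrives. A minimal in-memory sketch of the idea (the `ApprovalStore` name and its methods are illustrative, not from any specific framework; production would use a database table or queue instead of a dict):

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

@dataclass
class ApprovalStore:
    """In-memory store of actions awaiting human approval.
    In production this would be backed by a database or queue."""
    pending: Dict[str, Tuple[str, Callable]] = field(default_factory=dict)

    def request(self, description: str, action: Callable) -> str:
        """Register an action and return an ID to embed in the approve link."""
        approval_id = str(uuid.uuid4())
        self.pending[approval_id] = (description, action)
        # Here you would send the Teams message or approval email
        return approval_id

    def approve(self, approval_id: str) -> bool:
        """Called by the approval webhook: execute and clear the action."""
        entry = self.pending.pop(approval_id, None)
        if entry is None:
            return False  # unknown or already-handled approval
        _, action = entry
        action()
        return True

    def reject(self, approval_id: str) -> bool:
        """Discard a pending action without executing it."""
        return self.pending.pop(approval_id, None) is not None

# Usage
store = ApprovalStore()
results = []
approval_id = store.request(
    "Send approval email for invoice INV-2024-0892",
    lambda: results.append("email_sent"),
)
store.approve(approval_id)  # simulates the human clicking approve
```

Popping the entry on approval also gives you idempotency for free: a second click on the same link does nothing.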
Build for failure
Agents will fail. The LLM will return something unexpected, an API will time out, a document will be in an unexpected format. Your orchestration layer needs to handle these gracefully.
```python
import time
from typing import Any, Callable, Optional

def call_with_retry(func: Callable, max_retries: int = 3, delay: float = 1.0) -> Optional[Any]:
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if attempt == max_retries - 1:
                # Log the failure and route to a dead letter queue
                log_failed_task(str(e))
                return None
            time.sleep(delay * (attempt + 1))  # linear backoff between attempts
    return None
```

Every failed task should go somewhere - a dead letter queue, a human review inbox, somewhere that a person can look at it and decide what to do. The worst outcome is a task that silently disappears.
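The `log_failed_task` call above is left undefined; one simple way to fill it in is a JSONL dead letter file that a reviewer can inspect or replay later. A sketch, with an illustrative filename and record shape:

```python
import json
import time

DEAD_LETTER_PATH = "dead_letter_queue.jsonl"  # illustrative location

def log_failed_task(error: str, task_id: str = "unknown", payload: dict = None) -> None:
    """Append the failed task to a JSONL dead letter file so nothing
    silently disappears; a reviewer can replay or discard entries later."""
    record = {
        "task_id": task_id,
        "error": error,
        "payload": payload or {},
        "failed_at": time.time(),
    }
    with open(DEAD_LETTER_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_dead_letters(path: str = DEAD_LETTER_PATH) -> list:
    """Load all entries for human review."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

In a larger system this becomes a queue (Azure Service Bus, SQS) with a review UI on top, but the principle is the same: failures land somewhere visible.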
Monitor the agent, not just the output
Standard application monitoring looks at response times and error rates. For agents you also need to monitor:
- Tool calls per task - a spike here can mean the agent is stuck in a loop
- Token usage per task - useful for cost tracking and for spotting unusual behaviour
- Rejection rate by human reviewers - this tells you where the agent is getting things wrong
- End-to-end task completion rate - the percentage of tasks that complete successfully without human intervention
These metrics will tell you whether your agent is improving or degrading over time as your data and usage patterns evolve.
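These per-task metrics can be captured with a small counter object that the orchestrator updates as it runs. A sketch (the class and field names are my own, not from any monitoring library; in production you would emit these to Application Insights, Prometheus, or similar):

```python
from dataclasses import dataclass, field

@dataclass
class AgentTaskMetrics:
    """Counters for a single task, updated by the orchestrator as it runs."""
    tool_calls: int = 0
    tokens_used: int = 0
    completed_without_human: bool = False
    rejected_by_reviewer: bool = False

@dataclass
class AgentMonitor:
    tasks: list = field(default_factory=list)

    def record(self, m: AgentTaskMetrics) -> None:
        self.tasks.append(m)

    def completion_rate(self) -> float:
        """End-to-end completion rate without human intervention."""
        if not self.tasks:
            return 0.0
        done = sum(1 for t in self.tasks if t.completed_without_human)
        return done / len(self.tasks)

    def avg_tool_calls(self) -> float:
        """A rising average can mean the agent is looping."""
        if not self.tasks:
            return 0.0
        return sum(t.tool_calls for t in self.tasks) / len(self.tasks)

# Usage
monitor = AgentMonitor()
monitor.record(AgentTaskMetrics(tool_calls=3, tokens_used=1200, completed_without_human=True))
monitor.record(AgentTaskMetrics(tool_calls=14, tokens_used=9800, rejected_by_reviewer=True))
```

Trending these aggregates over weeks, not requests, is what tells you whether the agent is drifting as data and usage change.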
Production agents are not a set-and-forget deployment. They need the same care and attention as any other production system.