Facio's Task Dispatch: How Hundreds of AI Agents Get the Right Work at the Right Time
A single AI agent is a workflow. A fleet of AI agents is a system. The fleet needs dispatch — a way to route incoming work to the right agent at the right time, balance load across agents, prioritize critical work over routine work, and ensure no agent gets overwhelmed.
Naive fleets have no dispatch. Every agent polls for work independently. When work arrives, every agent sees it. Multiple agents pick it up. Some work gets done multiple times; some work gets lost. When a flood of work arrives (Monday morning, end of month, deploy day), all agents get overwhelmed simultaneously. The system fails at the moment it's needed most.
Facio's task dispatch is the structural discipline that turns agent fleets into systems. The dispatch handles routing, load balancing, priority queuing, fair scheduling, and observability. The agents receive work that's appropriate for them, when they can handle it, in the order the business needs.
Here's how the dispatch works, what patterns it uses, and why fleets without dispatch fail while fleets with dispatch scale.
The Fleet Reality
Production AI agent platforms run fleets. Not one agent handling one task at a time — dozens or hundreds of agents, each capable of handling a variety of tasks, running concurrently, serving multiple customers and workflows.
The fleet's work comes from many sources:
- User interactions. Real users chat with the agent, asking questions, requesting actions.
- Scheduled tasks. Cron jobs and heartbeat tasks fire at specific times.
- Webhook events. External systems notify the agent of changes that need processing.
- Sub-agent results. When one agent spawns another, the child agent's results come back to the parent.
- HITL responses. When human reviewers respond to approval requests, the responses need to be routed back to the originating agent.
Each source produces work that needs to be done. The work has priorities, deadlines, dependencies, and resource requirements. The fleet needs to handle all of this.
Without dispatch, the work piles up. The agents grab whatever they see first, with no regard for priority, dependencies, or balance. Some tasks get done quickly; others wait forever. Some agents are overloaded; others are idle. The system underperforms despite having enough capacity.
The Dispatch Architecture
Facio's dispatch architecture has three main components: the work queue, the dispatcher, and the agent pool.
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Work         │───▶│ Dispatcher   │───▶│ Agent        │
│ Sources      │    │              │    │ Pool         │
│ (users,      │    │ - Routing    │    │ (workers)    │
│  cron,       │    │ - Load       │    │              │
│  webhooks)   │    │   balancing  │    │              │
└──────────────┘    │ - Priority   │    │              │
       │            │   queuing    │    │              │
       │            │ - Fairness   │    │              │
       │            │ - Quotas     │    │              │
       │            └──────────────┘    └──────────────┘
       │                   │                   │
       └─────▶ Work Queue ◀┘                   │
     (priority queue, persistent)              │
                                               ▼
                                        Tool execution,
                                        external systems
The work queue holds incoming tasks, organized by priority and timing. The dispatcher pulls from the queue and assigns to agents. The agents process the work and report back.
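The queue-to-dispatcher-to-agent flow can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Facio's implementation: the names `Task`, `Dispatcher`, `submit`, `dispatch_once`, and `report_done` are hypothetical, lower numbers are assumed to mean higher urgency, and a real work queue would be persistent rather than a Python heap.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                     # assumption: lower number = more urgent
    seq: int                          # tie-breaker that preserves arrival order
    name: str = field(compare=False)  # not used when ordering tasks


class Dispatcher:
    """Hypothetical in-memory dispatcher: queue in, idle agent out."""

    def __init__(self, agent_ids):
        self._queue = []                  # heap of Task objects
        self._counter = itertools.count()
        self.idle = list(agent_ids)       # agents with free capacity
        self.assignments = {}             # agent_id -> task name

    def submit(self, name, priority):
        heapq.heappush(self._queue, Task(priority, next(self._counter), name))

    def dispatch_once(self):
        """Assign the most urgent queued task to an idle agent, if both exist."""
        if not self._queue or not self.idle:
            return None
        task = heapq.heappop(self._queue)
        agent_id = self.idle.pop(0)
        self.assignments[agent_id] = task.name
        return agent_id, task.name

    def report_done(self, agent_id):
        """The agent reports back and becomes available again."""
        self.assignments.pop(agent_id, None)
        self.idle.append(agent_id)


dispatcher = Dispatcher(["agent-001", "agent-002"])
dispatcher.submit("nightly-cleanup", priority=3)
dispatcher.submit("production-incident", priority=0)
first = dispatcher.dispatch_once()    # the incident goes out first
second = dispatcher.dispatch_once()   # then the cleanup job
third = dispatcher.dispatch_once()    # queue is empty: returns None
```

The point of the sketch is the single choke point: agents never poll the queue themselves, so a task can only be handed out once.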
The Dispatch Patterns
The dispatcher uses several patterns, each suited to different work characteristics.
Pattern 1: Priority Queues
Work is organized into priority queues. Higher-priority work is dispatched first:
from collections import deque

# Priority queues (FIFO within each level, so older tasks go first)
queue_critical = deque()  # Security incidents, production failures
queue_high = deque()      # Customer-facing requests, deadlines within hours
queue_normal = deque()    # Routine work, scheduled tasks
queue_low = deque()       # Background maintenance, non-urgent cleanup

# Dispatch order: critical → high → normal → low
next_task = (
    queue_critical.popleft() if queue_critical else
    queue_high.popleft() if queue_high else
    queue_normal.popleft() if queue_normal else
    queue_low.popleft() if queue_low else
    None
)
The priority queues prevent critical work from being blocked by a flood of routine work. A security incident at 3 AM gets handled before the routine cleanup jobs scheduled for 3 AM.
Priority is derived from attributes of the work: an explicit priority field, a deadline, the source type, the customer's SLA, or a business rule. The dispatcher's priority logic is configurable.
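One way such priority logic could look is sketched below. Everything here is an assumption for illustration: the `derive_priority` function, the `SOURCE_DEFAULTS` table, the source names, and the four-hour promotion window are hypothetical and not taken from Facio's configuration format.

```python
from datetime import datetime, timedelta, timezone

LEVELS = ["critical", "high", "normal", "low"]  # most to least urgent

# Hypothetical rule table: source type -> default priority level.
SOURCE_DEFAULTS = {
    "security_alert": "critical",
    "user": "high",
    "webhook": "normal",
    "cron": "normal",
    "maintenance": "low",
}


def derive_priority(task, now):
    """Return one of LEVELS for a task dict."""
    # 1. An explicit, valid priority field wins outright.
    if task.get("priority") in LEVELS:
        return task["priority"]
    # 2. Otherwise start from the default for the task's source type.
    level = SOURCE_DEFAULTS.get(task.get("source"), "normal")
    # 3. A deadline within a few hours promotes the task to at least "high".
    deadline = task.get("deadline")
    if deadline is not None and deadline - now <= timedelta(hours=4):
        level = min(level, "high", key=LEVELS.index)
    return level


now = datetime(2026, 7, 4, 14, 0, tzinfo=timezone.utc)
routine = derive_priority({"source": "cron"}, now)
urgent = derive_priority({"source": "cron", "deadline": now + timedelta(hours=1)}, now)
incident = derive_priority({"source": "security_alert", "deadline": now + timedelta(hours=1)}, now)
```

Keeping the rules in a table rather than in branching code is what makes the logic configurable: changing a default is a data change, not a code change.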
Pattern 2: Load Balancing
The dispatcher tracks which agents are busy and which have capacity. New work goes to available agents:
# Agent pool state
agents = {
    "agent-001": {"status": "busy", "current_task": "...", "started_at": "..."},
    "agent-002": {"status": "idle", "current_task": None, "idle_since": "..."},
    "agent-003": {"status": "busy", "current_task": "...", "started_at": "..."},
    "agent-004": {"status": "idle", "current_task": None, "idle_since": "..."},
}

# Dispatch decision
available_agents = [a for a in agents.values() if a["status"] == "idle"]
# Multiple idle agents: choose based on next pattern
The load balancing ensures work spreads across agents. No agent is overwhelmed while others sit idle.
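When several agents are idle and equally suitable, the dispatcher still needs a tie-break. The sketch below picks the agent that has been idle the longest, which spreads work evenly over time. This policy is an assumption chosen for illustration, as is the `pick_longest_idle` helper; the timestamps are made up.

```python
from datetime import datetime, timezone


def pick_longest_idle(agents):
    """Return the id of the idle agent with the oldest idle_since, or None."""
    idle = {aid: a for aid, a in agents.items() if a["status"] == "idle"}
    if not idle:
        return None
    return min(idle, key=lambda aid: idle[aid]["idle_since"])


def at(hour, minute):
    # Small helper for readable example timestamps (UTC).
    return datetime(2026, 7, 4, hour, minute, tzinfo=timezone.utc)


agents = {
    "agent-001": {"status": "busy", "started_at": at(13, 40)},
    "agent-002": {"status": "idle", "idle_since": at(13, 50)},
    "agent-003": {"status": "busy", "started_at": at(13, 55)},
    "agent-004": {"status": "idle", "idle_since": at(13, 20)},
}

chosen = pick_longest_idle(agents)  # agent-004 has waited longest
```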
Pattern 3: Capability-Based Routing
Work is routed to agents with the right capabilities. A database migration task goes to an agent with database credentials and migration tools; a customer support task goes to an agent with support tooling:
# Task requirements
task = {"type": "deploy", "target": "production", "requires": ["k8s-mcp", "deploy-token"]}

# Agent capabilities
agent_capabilities = {
    "agent-001": ["slack-mcp", "support-tools"],
    "agent-002": ["k8s-mcp", "deploy-token", "aws-mcp"],
    "agent-003": ["github-mcp", "code-review-tools"],
}

# Routing decision: task requires k8s-mcp and deploy-token
capable_agents = [
    agent_id
    for agent_id, capabilities in agent_capabilities.items()
    if all(req in capabilities for req in task["requires"])
]
# Result: ["agent-002"]
The capability-based routing ensures work goes to agents that can actually do it. The work isn't dispatched to an agent that lacks the necessary tools or credentials.
Pattern 4: Affinity Routing
Some work should stay with the agent that did related work. A follow-up task about a customer issue should go to the agent that handled the initial task:
# Task metadata
task = {
    "type": "follow-up",
    "session_ref": "session-customer-issue-12345",
    "initial_agent": "agent-002",
}

# Affinity check
if task.get("session_ref"):
    initial_agent = get_agent_for_session(task["session_ref"])
    if initial_agent and is_available(initial_agent):
        # Continue work with the same agent that started it
        assign_to(task, initial_agent)
The affinity routing preserves context. The agent that knows the customer's history, the project's state, the conversation's nuances continues the work. The user doesn't have to re-explain context to a new agent.
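Affinity needs a fallback for the case where the original agent is busy or gone. The sketch below prefers the session's original agent and otherwise falls back to any idle agent. Whether to fall back or to wait for the original agent is a policy choice the text above does not specify, so the fallback here is an assumption, as are the `route_with_affinity` function and the `session_owner` mapping.

```python
def route_with_affinity(task, session_owner, idle_agents):
    """Prefer the agent that owns the task's session; else any idle agent.

    session_owner: dict mapping session_ref -> agent_id (hypothetical store)
    idle_agents:   set of agent ids that currently have capacity
    """
    preferred = session_owner.get(task.get("session_ref"))
    if preferred in idle_agents:
        return preferred
    # Fallback (assumption): lowest agent id, or None if nobody is idle.
    return min(idle_agents, default=None)


session_owner = {"session-customer-issue-12345": "agent-002"}
follow_up = {"type": "follow-up", "session_ref": "session-customer-issue-12345"}

same_agent = route_with_affinity(follow_up, session_owner, {"agent-002", "agent-004"})
fallback = route_with_affinity(follow_up, session_owner, {"agent-001", "agent-004"})
nobody = route_with_affinity(follow_up, session_owner, set())
```

A fallback trades context for latency: the new agent starts without the session history, so it only makes sense when that history can be reloaded from shared storage.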
Pattern 5: Deadline-Aware Scheduling
Work with deadlines is scheduled to complete before the deadline. The dispatcher considers estimated work duration, current agent load, and deadline:
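from datetime import datetime

# Task with deadline
task = {"type": "report", "deadline": "2026-07-04T15:00:00Z", "estimated_duration_minutes": 30}
# Current time (fixed here for illustration)
now = datetime.fromisoformat("2026-07-04T14:00:00+00:00")
# Time until deadline, in minutes (60 in this example)
deadline = datetime.fromisoformat(task["deadline"].replace("Z", "+00:00"))
time_remaining = (deadline - now).total_seconds() / 60

# Decision (assign_to, prioritize, and the agent variables stand in for dispatcher internals)
if time_remaining > task["estimated_duration_minutes"] * 2:
    # Plenty of time: assign to any available agent
    assign_to(task, available_agent)
elif time_remaining > task["estimated_duration_minutes"]:
    # Tight but doable: assign to fastest available agent
    assign_to(task, fastest_available_agent)
else:
    # Critical: prioritize this task in the queue
    prioritize(task)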
The deadline-aware scheduling ensures time-sensitive work gets handled in time. The dispatcher calculates urgency based on remaining time and estimated duration.
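One common way to turn remaining time and estimated duration into a single urgency number is slack: the time left until the deadline minus the estimated duration. Ordering by least slack first is a standard scheduling heuristic, used here as an illustration rather than as a description of Facio's scheduler. The `slack_minutes` helper and the task names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def slack_minutes(task, now):
    """Minutes of spare time if the task started right now."""
    remaining = (task["deadline"] - now).total_seconds() / 60
    return remaining - task["estimated_duration_minutes"]


now = datetime(2026, 7, 4, 14, 0, tzinfo=timezone.utc)
tasks = [
    {"name": "report", "deadline": now + timedelta(minutes=60), "estimated_duration_minutes": 30},
    {"name": "invoice-run", "deadline": now + timedelta(minutes=45), "estimated_duration_minutes": 40},
    {"name": "digest", "deadline": now + timedelta(minutes=180), "estimated_duration_minutes": 20},
]

# Least slack first: the task closest to missing its deadline is dispatched first.
ordered = [t["name"] for t in sorted(tasks, key=lambda t: slack_minutes(t, now))]
```

A negative slack value means the deadline can no longer be met at the estimated duration, which is a useful signal to surface to operators.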
Pattern 6: Fair Scheduling
Over time, every customer and every workflow should get fair access to agents. The dispatcher tracks work distribution and ensures no customer monopolizes the fleet:
# Per-customer work distribution (tasks dispatched in the last hour)
customer_work = {
    "customer-a": 45,
    "customer-b": 12,
    "customer-c": 8,
}
# Fairness target: each customer gets ~20% of capacity
fairness_target = 0.20
total_work = sum(customer_work.values())
customer_share = customer_work.get(task["customer"], 0) / total_work

# Decision: customer-a's task is at the back of normal queue
# because customer-a is over their fair share
if customer_share > fairness_target * 1.5:
    # Customer is over their share: defer their work
    defer(task)
else:
    # Normal dispatch
    assign_to(task, available_agent)
The fair scheduling prevents one customer's flood of work from starving other customers. The system stays fair even under uneven load.
The Dispatch Observability
A fleet without dispatch observability is a fleet you can't operate. The dispatcher exposes:
- Queue depth. How many tasks are waiting, by priority. Growing queues indicate capacity issues.
- Agent utilization. How busy each agent is, how long they've been working, what they're working on. Hot spots indicate routing issues.
- Task wait time. How long tasks wait before being assigned. Long waits indicate capacity shortfalls.
- Task duration. How long tasks take from assignment to completion. Long durations indicate work complexity or agent performance issues.
- SLA compliance. Percentage of tasks completed within their deadline. The key business metric.
# Dispatch dashboard
{
  "queue_depth": {"critical": 0, "high": 3, "normal": 47, "low": 12},
  "agent_utilization": {"busy": 18, "idle": 4, "total": 22},
  "avg_wait_time_seconds": {"critical": 0, "high": 12, "normal": 87, "low": 245},
  "avg_task_duration_seconds": 340,
  "sla_compliance_pct": 98.7,
  "fairness_distribution": {"customer-a": 0.42, "customer-b": 0.18, "customer-c": 0.12, "other": 0.28}
}
The observability surfaces the dispatch health. Operators see bottlenecks, imbalances, and SLA risks before they become outages.
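Metrics like these can be derived from per-task timestamps. The sketch below assumes each task record carries `queued_at`, `assigned_at`, `completed_at`, and `deadline` as seconds on a shared clock. Those field names and the `summarize` function are hypothetical; only the output keys mirror the dashboard fields above.

```python
from statistics import mean


def summarize(tasks):
    """Compute wait time, duration, and SLA compliance from task records."""
    assigned = [t for t in tasks if t["assigned_at"] is not None]
    completed = [t for t in tasks if t["completed_at"] is not None]
    waits = [t["assigned_at"] - t["queued_at"] for t in assigned]
    durations = [t["completed_at"] - t["assigned_at"] for t in completed]
    on_time = [t for t in completed if t["completed_at"] <= t["deadline"]]
    return {
        "avg_wait_time_seconds": mean(waits) if waits else None,
        "avg_task_duration_seconds": mean(durations) if durations else None,
        "sla_compliance_pct": (
            round(100 * len(on_time) / len(completed), 1) if completed else None
        ),
    }


records = [
    # Finished on time: waited 10s, ran 100s.
    {"queued_at": 0, "assigned_at": 10, "completed_at": 110, "deadline": 200},
    # Finished late: waited 30s, ran 300s, deadline was 300.
    {"queued_at": 0, "assigned_at": 30, "completed_at": 330, "deadline": 300},
    # Still running: counts toward wait time only.
    {"queued_at": 5, "assigned_at": 25, "completed_at": None, "deadline": 500},
]

summary = summarize(records)
```

Note that in-flight tasks are excluded from SLA compliance here. A stricter definition would also count running tasks that have already passed their deadline.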
The Dispatch Resilience
The dispatcher itself is a system. It needs to be resilient:
- Persistent queues. Tasks in the queue survive dispatcher restart. The queue is backed by durable storage (database or persistent log).
- Multi-instance dispatcher. The dispatcher can run as multiple instances for redundancy. They coordinate through the shared queue (no two dispatchers pick the same task).
- Graceful degradation. When the dispatcher is overloaded, it drops low-priority work first. Critical work always gets through.
- Agent health monitoring. If an agent stops responding, the dispatcher reassigns its work to another agent. The stuck agent's task doesn't block the queue.
The resilience ensures the dispatch system is as reliable as the work it dispatches.
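The "no two dispatchers pick the same task" guarantee usually comes down to a conditional update on the shared queue: a claim succeeds only if the task is still unclaimed. The sketch below uses SQLite from Python's standard library as a stand-in for the durable store. The table layout, the `claim_next` function, and the dispatcher ids are assumptions, and the in-memory database is used only to keep the example self-contained; a file path would be needed for tasks to survive a restart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real durability
conn.execute(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        priority INTEGER NOT NULL,              -- lower = more urgent (assumption)
        status TEXT NOT NULL DEFAULT 'queued',
        claimed_by TEXT
    )
    """
)
conn.executemany("INSERT INTO tasks (priority) VALUES (?)", [(2,), (0,), (1,)])
conn.commit()


def claim_next(conn, dispatcher_id):
    """Claim the most urgent queued task; return its id, or None."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'queued' "
            "ORDER BY priority, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in WHERE makes the claim conditional: if another
        # dispatcher claimed the row first, rowcount is 0 and we back off.
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'queued'",
            (dispatcher_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


claims = [
    claim_next(conn, "dispatcher-a"),
    claim_next(conn, "dispatcher-b"),
    claim_next(conn, "dispatcher-a"),
    claim_next(conn, "dispatcher-b"),  # nothing left to claim
]
```

Because claim state lives in the store rather than in dispatcher memory, a restarted dispatcher sees exactly which tasks are still queued.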
What the Dispatcher Doesn't Do
Honest limitations:
- It doesn't make agents faster. Dispatch routing doesn't reduce task duration; it just routes tasks to available agents. If tasks are inherently slow, dispatch doesn't help.
- It doesn't create capacity. If the fleet has 10 agents and 100 concurrent tasks, dispatch spreads the load but can't add capacity. Scaling requires adding agents.
- It doesn't handle task dependencies automatically. If Task B depends on Task A's output, the dispatcher needs explicit dependency configuration. The dispatcher doesn't infer dependencies from task content.
- It can prioritize incorrectly. The priority logic depends on configuration. If priorities are wrong, the wrong work gets dispatched first. The dispatcher is only as good as its rules.
- It adds latency. The dispatch decision takes time. For very low-latency tasks, dispatch overhead may be unacceptable. Direct agent invocation is faster for latency-critical paths.
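On the dependency limitation: explicit dependency configuration can be as simple as a mapping from each task to the tasks it waits on, with the dispatcher holding a task back until all of them are complete. The sketch below is illustrative only. The `ready_tasks` function, the mapping format, and the task names are hypothetical, not Facio's configuration syntax.

```python
def ready_tasks(depends_on, completed):
    """Return tasks that are not done yet and whose dependencies are all done.

    depends_on: dict mapping task name -> list of prerequisite task names
    completed:  set of task names that have finished
    """
    return [
        name
        for name, prerequisites in depends_on.items()
        if name not in completed and all(p in completed for p in prerequisites)
    ]


# Explicit configuration: the dispatcher is told, not left to infer.
depends_on = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["extract"],
}

step_1 = ready_tasks(depends_on, set())
step_2 = ready_tasks(depends_on, {"extract"})
step_3 = ready_tasks(depends_on, {"extract", "transform", "notify"})
```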
The Compound Effect of Good Dispatch
A fleet with good dispatch compounds its value:
- Higher utilization. Work flows to where capacity exists. Idle agents get assigned work. Busy agents don't get overloaded.
- Better SLA compliance. Priority queues and deadline awareness ensure time-sensitive work is handled on time.
- Fair customer experience. Fair scheduling ensures every customer gets adequate service.
- Operational visibility. Dispatch observability surfaces issues before they cascade.
- Easier scaling. Adding agents to the pool increases capacity; the dispatcher routes work to the new agents automatically.
A fleet without dispatch has the opposite trajectory: underutilization at some agents, overload at others, missed SLAs, unfair distribution, and opaque operations. The fleet struggles despite having enough total capacity.
Bottom Line
A single agent is a workflow. A fleet is a system. The system needs dispatch.
Facio's task dispatch handles the routing, load balancing, priority queuing, fair scheduling, and observability that fleets need to operate at scale. The dispatch patterns (priority queues, load balancing, capability-based routing, affinity routing, deadline-aware scheduling, fair scheduling) work together to ensure work flows to where it can be done, when it should be done, by agents that can do it.
The fleet without dispatch is chaos. The fleet with dispatch is a system. The operators prefer the system. The customers trust the system.
Agent fleets are how AI agents operate at scale. Without dispatch, they fail at scale. With dispatch, they scale.
See the task dispatch documentation for routing rules, queue configuration, and fleet observability setup.