Human-in-the-loop · Jun 4, 2026

Synchronous vs. Asynchronous HITL: When to Block the Workflow and When to Let It Continue

The biggest architectural decision in HITL isn't which framework to use — it's whether your approval gates should block execution or let the workflow continue. Synchronous HITL halts everything. Asynchronous HITL keeps moving. Both are right. Neither is right everywhere. Here's how to decide.

HITL · Async Workflows · Agent Architecture · Workflow Design · Human Oversight

The question comes up in every HITL architecture conversation, usually after the third coffee:

"Should we block the agent until the human responds, or let it keep working?"

It's the wrong question — but it's the right instinct. Synchronous HITL (blocking, the agent waits) and asynchronous HITL (non-blocking, the agent continues) are both essential patterns. The mistake isn't choosing one over the other. The mistake is applying only one to every action type.

Here's when each is right, how to implement both in the same system, and why most production architectures land on a hybrid model.


Synchronous HITL: Block Until Approved

In synchronous HITL, the agent pauses execution at the approval gate and waits. The human receives the request, reviews the context, and approves, modifies, or rejects. Only after the decision does the agent resume — and only if the decision was positive.

Agent proposes → Gate closes → Human reviews → Gate opens → Agent continues
                                              ↓
                                           Rejected → Agent backtracks

When synchronous HITL is the right choice

1. The action is irreversible. A production deployment, a database deletion, an external communication to thousands of users — if the action cannot be undone once executed, block until you're certain. A few minutes of latency is cheaper than a weekend of recovery.

2. Downstream actions depend on the outcome. If the agent's next four steps all depend on whether step three was approved, there's no value in executing steps four through seven speculatively. Block, get the decision, then proceed with clarity.

3. The workflow is linear. Single-path workflows with well-defined decision points don't benefit from speculative execution. The agent reaches a gate, waits for clearance, and continues down the one path — with no branching that could have been pre-computed.

4. Latency is acceptable. If the human reviewer can respond in minutes (not hours), blocking is fine. A Slack notification to an on-call engineer who typically responds in 90 seconds doesn't need an async architecture.

Implementation pattern

The key requirement: state serialization before the gate. The agent must persist its complete execution state — context window, tool call history, in-progress reasoning — before pausing. The runtime process may crash, restart, or scale to zero while the approval is pending. When the reviewer responds, the agent must resume from the exact same state.

from __future__ import annotations  # ToolCall and ActionResult are application-defined types


class SyncHITLAgent:
    async def execute_with_gate(self, action: ToolCall) -> ActionResult:
        # 1. Persist full state before pausing, so a restarted process
        #    can reload it and resume from the same point
        self.checkpoint_store.save({
            "workflow_id": self.workflow_id,
            "step": self.current_step,
            "context": self.context,
            "proposed_action": action,
            "status": "awaiting_approval",
        })

        # 2. Route to the human and block until a decision arrives
        decision = await self.approval_service.request_and_wait(
            action=action,
            context=self.context,
            timeout_minutes=15,
        )

        # 3. Act on the decision: execute only if approved
        if decision.approved:
            return await self.execute(action)
        return ActionResult(status="rejected", reason=decision.reason)
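
The save-then-resume requirement can be sketched with a small file-backed store. This is a minimal sketch under stated assumptions, not a specific library's API: `FileCheckpointStore`, its `save` and `load` methods, and the one-JSON-file-per-workflow layout are hypothetical, and it assumes the agent state is JSON-serializable.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


class FileCheckpointStore:
    """Hypothetical checkpoint store: one JSON file per workflow."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, workflow_id: str) -> Path:
        return self.directory / f"{workflow_id}.json"

    def save(self, state: dict[str, Any]) -> str:
        """Write the state and return the workflow id as the checkpoint key."""
        workflow_id = state["workflow_id"]
        # Write to a temp file first, then rename it into place, so a crash
        # mid-write never leaves a half-written checkpoint behind.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.directory), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_name, self._path(workflow_id))
        return workflow_id

    def load(self, workflow_id: str) -> dict[str, Any]:
        """Read the last saved state, for example after a process restart."""
        with open(self._path(workflow_id), encoding="utf-8") as handle:
            return json.load(handle)
```

After a restart, a new process can build the store over the same directory and call `load` with the workflow id to pick up the pending step. A shared database or object store would play the same role when the runtime scales across machines.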

Asynchronous HITL: Fire and Continue

In asynchronous HITL, the agent requests human approval but doesn't stop. It continues executing actions that are independent of the pending decision. The human reviews in parallel. If the decision is negative, the agent must roll back or compensate for the actions it continued to take — which only works if those actions were reversible.

Agent proposes → Gate fires notification → Agent continues with independent work
                      ↓
              Human reviews (in parallel)
                      ↓
              Approved → Already-executed actions were correct
              Rejected → Roll back / compensate

When asynchronous HITL is the right choice

1. The actions being gated are reversible. Provisioning cloud resources (can be torn down), creating draft documents (can be discarded), sending internal messages (can be clarified). If you can undo the action at a cost lower than the latency of waiting, don't wait.

2. The agent has independent work to do. The pending approval is for step 3 of 10, and steps 4–7 don't depend on step 3's outcome. Blocking the agent means wasting the time it could spend advancing those four steps, 40% of the workflow. Let it continue, and only undo step 3's dependencies if rejected.

3. Human response time is unpredictable. The reviewer might respond in 10 minutes or 10 hours. Blocking for an unpredictable duration creates unpredictable throughput. If the workflow can make progress in parallel, let it.

4. Throughput matters more than perfect consistency. This fits batch processing workflows where a small percentage of rejections is acceptable and can be compensated post hoc: customer outreach campaigns, internal audit tasks, data enrichment pipelines. The occasional rollback is cheaper than the constant blocking.

Implementation pattern

The key requirement: compensation logic. Every action executed after the approval gate must be reversible, and the system must know how to reverse it. This means:

  • Every action has a corresponding "undo" operation
  • The undo chain is tracked alongside the execution chain
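  • On rejection, the undo chain fires in reverse order

from __future__ import annotations  # ToolCall is an application-defined type


class AsyncHITLAgent:
    def __init__(self):
        self.pending_decisions = {}
        # Steps run while a decision is pending, each paired with its undo
        self.executed_since_gate = []

    async def execute_with_async_gate(self, action: ToolCall):
        # 1. Fire approval request, don't wait for the decision
        approval_id = await self.approval_service.request(
            action=action,
            on_approved=self.handle_approval,
            on_rejected=self.handle_rejection,
        )
        self.pending_decisions[approval_id] = action

        # 2. Continue with independent work
        await self.continue_workflow()

    async def continue_workflow(self):
        for step in self.remaining_independent_steps():
            result = await self.execute(step)
            self.executed_since_gate.append({
                "action": step,
                "undo": step.undo_operation,
                "result": result,
            })

    async def handle_approval(self, approval_id: str):
        # 3a. Approved: the speculative work stands, so drop the undo chain
        self.pending_decisions.pop(approval_id, None)
        self.executed_since_gate = []

    async def handle_rejection(self, approval_id: str, reason: str):
        # 3b. Rejected: roll back in reverse order, newest step first
        self.pending_decisions.pop(approval_id, None)
        while self.executed_since_gate:
            record = self.executed_since_gate[-1]
            await self.execute(record["undo"])
            # Pop only after the undo succeeds, so a failure leaves the rest for retry
            self.executed_since_gate.pop()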
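
The undo-chain bookkeeping can be tried in isolation with a small, runnable sketch. `CompensationLog` and the toy `resources` set are hypothetical names invented for illustration; real undo operations would call external systems rather than mutate an in-memory set.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CompensationLog:
    """Hypothetical helper that pairs each executed step with its undo."""

    _undos: list[tuple[str, Callable[[], None]]] = field(default_factory=list)

    def record(self, name: str, undo: Callable[[], None]) -> None:
        self._undos.append((name, undo))

    def rollback(self) -> list[str]:
        """Run undo operations newest-first; return the names in the order they ran."""
        undone = []
        while self._undos:
            name, undo = self._undos[-1]
            undo()  # if this raises, the entry stays in the log for a retry
            self._undos.pop()
            undone.append(name)
        return undone


# Toy state standing in for real side effects.
resources: set[str] = set()
log = CompensationLog()

for name in ("vm", "disk", "dns"):
    resources.add(name)
    # Bind `name` as a default argument so each undo removes its own resource.
    log.record(name, lambda n=name: resources.discard(n))
```

Calling `log.rollback()` here undoes `dns`, then `disk`, then `vm`, which mirrors how dependent resources usually have to be torn down in the opposite order to their creation.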

The Hybrid Model: Different Gates, Different Modes

Few production systems use purely sync or purely async HITL. The right architecture uses both: per action type, per workflow, sometimes per execution context.

Action Type                             Mode   Why
Delete production database              Sync   Irreversible, catastrophic if wrong
Send customer newsletter                Sync   Irreversible once sent, high blast radius
Provision test environment              Async  Reversible (tear down), low blast radius
Create draft report                     Async  Reversible (discard), latency-tolerant
Execute financial transaction < $50     Async  Reversible (refund), low value
Execute financial transaction > $5,000  Sync   Material financial impact
Update internal wiki                    Async  Reversible, low stakes
The action manifest — the same version-controlled policy configuration that defines approval requirements — should also define the blocking mode:

actions:
  delete_production_data:
    approval_required: true
    blocking: true            # Sync — agent waits
    timeout_minutes: 15
    undo_operation: null       # Can't undo, so must block

  provision_test_environment:
    approval_required: true
    blocking: false           # Async — agent continues
    timeout_minutes: 60
    undo_operation: "destroy_environment"

  update_crm_record:
    approval_required: false
    blocking: false           # Autonomous
    audit_log: true

The manifest is the single source of truth for both whether approval is needed and how the approval gate should behave.
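
One way a runtime might consume such a manifest is sketched below. It assumes the YAML has already been parsed into a plain dict by whatever loader the project uses. `GateMode`, `gate_mode`, and `validate` are hypothetical names, and treating a missing `blocking` key as blocking (the safer default) is an assumption, not something the manifest format specifies.

```python
from enum import Enum
from typing import Any


class GateMode(Enum):
    AUTONOMOUS = "autonomous"  # no approval needed
    SYNC = "sync"              # agent waits for the decision
    ASYNC = "async"            # agent continues while the decision is pending


def gate_mode(policy: dict[str, Any]) -> GateMode:
    """Map one manifest entry to a gate mode."""
    if not policy.get("approval_required", False):
        return GateMode.AUTONOMOUS
    # Assumption: a missing `blocking` key falls back to the safer blocking mode.
    return GateMode.SYNC if policy.get("blocking", True) else GateMode.ASYNC


def validate(manifest: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the manifest is consistent."""
    problems = []
    for name, policy in manifest["actions"].items():
        mode = gate_mode(policy)
        if mode is GateMode.ASYNC and not policy.get("undo_operation"):
            problems.append(f"{name}: non-blocking gate needs an undo_operation")
        if mode is GateMode.SYNC and "timeout_minutes" not in policy:
            problems.append(f"{name}: blocking gate needs timeout_minutes")
    return problems


# The manifest from the text, as a YAML loader would return it.
MANIFEST = {
    "actions": {
        "delete_production_data": {
            "approval_required": True, "blocking": True,
            "timeout_minutes": 15, "undo_operation": None,
        },
        "provision_test_environment": {
            "approval_required": True, "blocking": False,
            "timeout_minutes": 60, "undo_operation": "destroy_environment",
        },
        "update_crm_record": {
            "approval_required": False, "blocking": False, "audit_log": True,
        },
    }
}
```

Running `validate` in CI on every manifest change would catch an action marked non-blocking without an undo operation before it reaches production.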


The Danger Zones

Both modes have failure patterns worth designing for:

Sync danger: The Stalled Workflow

A blocking gate with no timeout stalls the agent indefinitely. Every blocking gate needs a timeout and an escalation path, as covered in the HITL timeout post. Sync blocking without a timeout is a production outage waiting to happen.
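
A timeout with an escalation chain can be sketched with `asyncio.wait_for` from the standard library. The function name `wait_with_escalation`, the `ask` callback, and the reviewer list are hypothetical; how a request actually reaches a reviewer is left to the approval service.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_with_escalation(
    ask: Callable[[str], Awaitable[bool]],
    reviewers: list[str],
    timeout_seconds: float,
) -> Optional[bool]:
    """Ask each reviewer in turn; move to the next one when a request times out.

    Returns the first decision received, or None if every reviewer timed out.
    """
    for reviewer in reviewers:
        try:
            return await asyncio.wait_for(ask(reviewer), timeout=timeout_seconds)
        except asyncio.TimeoutError:
            continue  # escalate to the next reviewer in the chain
    return None  # nobody answered; the caller decides how to fail
```

Treating a `None` result as "not approved" keeps the gate fail-closed: an unanswered request never turns into an implicit approval.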

Async danger: The Unwound Stack

If the agent continues for 12 steps after an async gate and the gate is rejected, all 12 reversals in the compensation chain must succeed. If any reversal fails partway, the system is left in an inconsistent state, with some steps undone and others not. Mitigation: limit the maximum async depth. After N steps without approval resolution, force a blocking wait.
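
The depth limit can be sketched as a loop that runs steps speculatively until the limit is hit, then degrades to a blocking wait. `run_with_depth_limit` and the use of an `asyncio.Future` to carry the decision are assumptions made for illustration; a real runtime would wire the decision in through its approval service.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[], Awaitable[None]]


async def run_with_depth_limit(
    steps: list[Step],
    decision: "asyncio.Future[bool]",
    max_depth: int,
) -> tuple[int, bool]:
    """Run steps speculatively, but never more than `max_depth` of them
    while `decision` is unresolved.

    Returns (steps_executed, approved). On rejection the caller is expected
    to compensate the steps that already ran.
    """
    executed = 0
    for step in steps:
        if not decision.done() and executed >= max_depth:
            await decision  # depth limit reached: degrade to a blocking wait
        if decision.done() and not decision.result():
            return executed, False  # rejected: stop and let the caller roll back
        await step()
        executed += 1
    # All steps ran; wait for the decision if it is still pending.
    return executed, await decision
```

With `max_depth=2`, a rejection can never require more than two reversals, which bounds the size of the compensation chain no matter how slowly the reviewer responds.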

The confidence trap

The temptation with async HITL is to make everything async because "blocking slows us down." But async only works if you've correctly classified which actions are reversible — and if your reversal logic is tested. An action that you think is reversible but actually isn't becomes a production incident when the rejection arrives 30 minutes later and the undo path fails.
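
One way to test a reversibility claim is a round-trip check: apply the action, apply its undo, and compare the state with a snapshot. The sketch below is a hypothetical helper (`is_reversible` and the toy actions are invented here) and only covers state that can be snapshotted in a test; real external side effects need integration tests against the actual systems.

```python
import copy
from typing import Any, Callable

State = dict[str, Any]


def is_reversible(
    state: State,
    do: Callable[[State], None],
    undo: Callable[[State], None],
) -> bool:
    """Apply `do` then `undo` to a copy of `state` and check it is unchanged."""
    before = copy.deepcopy(state)
    scratch = copy.deepcopy(state)
    do(scratch)
    undo(scratch)
    return scratch == before


# Toy actions: a draft can be discarded, a sent email cannot be unsent.
def add_draft(state: State) -> None:
    state["drafts"].append("report")


def remove_draft(state: State) -> None:
    state["drafts"].remove("report")


def send_email(state: State) -> None:
    state["emails_sent"] += 1


def unsend_email(state: State) -> None:
    pass  # no real undo exists, so the round trip fails
```

Here `is_reversible` returns `True` for the draft pair and `False` for the email pair, which is the signal that the email action belongs behind a blocking gate.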


Where Facio Fits

Facio's HITL primitives support both sync and async approval modes — configured per action in the policy manifest, enforced at the runtime level. Blocking gates serialize state to the checkpoint store and wait. Non-blocking gates fire the notification and let the agent continue, tracking the pending decision alongside the execution chain.

Placet.io handles the human side regardless of mode. The reviewer receives the same structured approval request whether the agent is blocked waiting for their answer or continuing in parallel. The mode is an agent runtime concern — the human experience is consistent.

The combination means you can design workflows with the right gate behavior per action without building two separate HITL systems. One manifest, two modes, consistent human experience.


Key Takeaways

  • Sync HITL blocks the agent until approval — right for irreversible, high-stakes actions where latency is acceptable
  • Async HITL lets the agent continue — right for reversible actions where throughput matters and response time is unpredictable
  • Production systems use both — per action, per workflow, defined in the same policy manifest that governs approval requirements
  • Every blocking gate needs a timeout and escalation path — no timeout means stalled forever
  • Every async gate needs compensation logic — every action executed after the gate must be reversible, and the undo chain must be tested
  • Limit async depth — after N pending steps without approval resolution, force a blocking wait to prevent unwinding complexity
  • The manifest is the single source of truth — approval requirement, blocking mode, timeout, and undo operation all live in the same per-action configuration

Sources: The sync/async HITL taxonomy draws on architectural patterns from the Understanding Data HITL framework, Omnithium's production agent patterns, and workflow engine designs (Temporal, Prefect) adapted for agentic human-in-the-loop. The async depth limiting pattern mirrors circuit breaker designs from distributed systems engineering.