Product · Jun 10, 2026

Facio's exec Tool: How AI Agents Run Shell Commands Safely With Timeouts and Audit Trails

AI agents that can't execute code are research tools, not workers. Facio's exec tool gives agents controlled access to a real shell — with configurable timeouts, output truncation, dangerous-command blocking, and full audit trails. Here's how the safety architecture works and why it matters for production agents.

Shell Commands · Code Execution · Agent Tooling · Safety · Automation

An AI agent that can read files, write code, and search the web — but cannot execute what it has written — is a research tool, not a worker. The agent plans, drafts, and analyzes, but every execution step requires a human to copy-paste a command into a terminal. For research tasks and content creation, that's tolerable. For deployment pipelines, infrastructure automation, and DevOps workflows, it kills the productivity gain entirely.

Facio's exec tool gives agents controlled access to a real shell — with safety limits, configurable timeouts, output truncation, and full audit trails. Here's how the architecture works and why controlled shell access is the difference between an agent that helps and an agent that ships.

The Architecture: Real Shell, Constrained Surface

The exec tool runs shell commands directly on the host system — not in a sandboxed container, not in a restricted language subset. This is intentional. Agents working on real production codebases need access to the actual git, docker, psql, npm, and python binaries. Constraining the shell to a "safe" subset would make the tool useless for the workflows that need it most.

Instead of sandboxing, Facio uses a layered safety model:

Layer	What it does
Command allowlist/blocklist	Dangerous commands (`rm -rf`, `dd`, `shutdown`, `format`, `mkfs`) are blocked at runtime
Timeout enforcement	Every command has a maximum execution time (default 60s, configurable up to 10 minutes)
Output truncation	Command output is capped at 10,000 characters to prevent context overflow
Working directory	Commands can be scoped to a specific working directory
Workspace boundaries	Optional `restrictToWorkspace` config can limit file access to the workspace
Audit trail	Every command, every output, every timeout is logged permanently

The agent calls exec like a developer calls a terminal. The safety layers enforce discipline without the overhead of a sandbox.

exec(command="git status", working_dir="projects/api-server")
# Returns: clean exit, output of `git status`

exec(command="pytest tests/ -v", timeout=120)
# Returns: test results, truncated to 10,000 characters

exec(command="rm -rf /")
# Returns: "Command blocked: dangerous command 'rm -rf' is not allowed"

Timeouts: The Anti-Stuck-Agent Mechanism

Production agents that execute code inevitably hit slow operations. A long test suite, a network call to an unreachable service, a sleep command in a CI script. Without a timeout, the agent waits indefinitely — the session is stuck, the user can't get a response, the iteration budget burns.

Facio's exec tool enforces timeouts on every command:

Default timeout: 60 seconds. Suitable for most read-only commands, git operations, and quick scripts.
Configurable up to 10 minutes. Long builds, complex test suites, and data migrations need more.
Hard kill on timeout. The process is terminated, not just the request — no orphan processes.
Timeout response is informative. The agent receives: "Command exceeded timeout of 120s. Process terminated. Partial output: [first 10,000 chars]."

The agent learns to scope timeouts appropriately. A quick git log doesn't need 10 minutes. A full pytest run on a large codebase does. The timeout parameter is a first-class part of the tool's API, not a global config the agent has no control over.
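The hard-kill behavior described above can be sketched with Python's standard library. This is a minimal illustration, not Facio's implementation: the function name `run_with_timeout` and the shape of the returned dictionary are assumptions, and the process-group approach works on POSIX systems only.

```python
import os
import signal
import subprocess

def run_with_timeout(command: str, timeout: float = 60.0) -> dict:
    """Run a shell command and kill its whole process group on timeout."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the captured output
        text=True,
        start_new_session=True,     # own process group, so children can be killed too
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        try:
            # Hard kill: the shell and anything it spawned, so no orphans remain.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass                    # the process exited just before the kill
        output, _ = proc.communicate()  # reap the process, keep partial output
        timed_out = True
    return {"timed_out": timed_out, "exit_code": proc.returncode, "output": output}
```

Killing the process group rather than only the shell is what keeps a child such as a test runner from outliving the command that started it.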

Output Truncation: Protecting the Context Window

A single exec call can produce enormous output. A verbose npm install, a database dump, a multi-gigabyte log file. Without truncation, the agent's context window fills with output it can't process, and the entire session goes off the rails.

Facio's exec tool caps output at 10,000 characters:

exec(command="docker logs api-server --tail=1000")
# Returns: first 10,000 characters of output, plus a notice if truncation occurred

The agent can use the truncation notice to decide whether to:

grep the log file for specific patterns (using exec again with a focused command)
Write the full output to a file (exec(command="docker logs api-server > /workspace/tmp/api-logs.txt 2>&1"), which captures both stdout and stderr) and read_file the relevant sections
Run the command with a more targeted scope (docker logs --since=10m instead of --tail=1000)

The truncation is less a limitation than a forcing function: it pushes the agent to write efficient commands. An agent that scopes its commands well gets complete, usable output. An agent that runs unfiltered commands gets truncated output and a truncation notice that prompts it to narrow the command.
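A cap like this takes only a few lines to express. The sketch below assumes the first 10,000 characters are kept, as the example above describes. The function name and the wording of the notice are illustrative, not Facio's actual output.

```python
MAX_OUTPUT_CHARS = 10_000  # the cap stated in this article

def truncate_output(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the first `limit` characters and append a notice if anything was cut."""
    if len(output) <= limit:
        return output
    omitted = len(output) - limit
    # Hypothetical notice format; the real wording is not documented here.
    return output[:limit] + f"\n[output truncated: {omitted} more characters omitted]"
```

Reporting how much was omitted gives the agent a rough signal of how far it needs to narrow the next command.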

Dangerous Command Blocking

Facio's exec tool blocks the most destructive commands by default. The blocklist covers the obvious foot-guns:

rm -rf — recursive force delete
dd — direct disk writes
shutdown / reboot / halt — system power management
format / mkfs — filesystem creation
And other patterns that have no legitimate use case in agent-driven workflows

When a blocked command is attempted, the agent receives a clear error: "Command blocked: dangerous command 'rm -rf' is not allowed." The agent doesn't silently fail — it knows the command was rejected and can choose an alternative.

This is the same HITL-adjacent principle that gates destructive manage_mcp operations: the runtime itself, rather than a watching human, prevents the catastrophic mistake. For commands that carry elevated risk, a human can configure the allowlist to permit them, but the default is safe, and changing it requires explicit human action.
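To make the idea concrete, here is one way a blocklist check could be written. Facio's actual matching rules are not described in this article, so everything below is an assumption: the function name `find_blocked`, the tokenizing approach, and the treatment of `rm` flags. It only inspects the first word of each shell segment, so it is a sketch rather than a complete defense.

```python
import shlex
from typing import Optional

BLOCKED_PROGRAMS = {"dd", "shutdown", "reboot", "halt", "format"}
SEPARATOR_CHARS = set("();|&")  # tokens made only of these split one command from the next

def find_blocked(command: str) -> Optional[str]:
    """Return the name of the first blocked pattern in `command`, or None."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        return "unparseable command"  # unbalanced quotes: refuse rather than guess
    segments, current = [], []
    for token in tokens:
        if set(token) <= SEPARATOR_CHARS:
            segments.append(current)
            current = []
        else:
            current.append(token)
    segments.append(current)
    for segment in segments:
        if not segment:
            continue
        program = segment[0].rsplit("/", 1)[-1]  # /bin/rm -> rm
        if program in BLOCKED_PROGRAMS or program.startswith("mkfs"):
            return program
        if program == "rm":
            args = segment[1:]
            short = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
            recursive = "r" in short.lower() or "--recursive" in args
            force = "f" in short or "--force" in args
            if recursive and force:
                return "rm -rf"
    return None

def check_command(command: str) -> Optional[str]:
    """Return an error message in the style quoted above, or None if allowed."""
    name = find_blocked(command)
    if name is None:
        return None
    return f"Command blocked: dangerous command '{name}' is not allowed"
```

Tokenizing before matching means a quoted string such as `echo "rm -rf /"` is not flagged, while `ls && rm -fr build` is.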

Working Directory and Workspace Boundaries

By default, exec runs in the agent's current working directory. But the agent can scope commands to a specific directory:

exec(command="npm test", working_dir="projects/web-app")

For deployments and infrastructure work, this prevents the "wrong directory" problem. A git push to the wrong repo, a pytest run in the wrong project, a docker-compose up in the wrong stack — all prevented by explicit working directory.

For even tighter isolation, restrictToWorkspace config can limit file access to the workspace directory:

# Without restriction: agent can read/write any path on the host
# With restriction: agent is limited to /data/facio-1/.facio/workspace

This is the same pattern as Facio's Docker Quickstart — the agent's execution environment can be scoped to its intended boundaries.
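A workspace boundary ultimately comes down to a path containment check. How Facio enforces restrictToWorkspace is not covered here, so treat the following as a minimal sketch with a hypothetical helper name. It resolves `..` segments and symlinks before comparing, which is the step naive string-prefix checks miss.

```python
from pathlib import Path

def is_inside_workspace(path: str, workspace: str) -> bool:
    """True if `path` resolves to the workspace root or something beneath it."""
    root = Path(workspace).resolve()
    # Relative paths are interpreted relative to the workspace;
    # an absolute `path` replaces the root when joined.
    target = (root / path).resolve()
    return target == root or root in target.parents
```

For example, with the workspace path shown above, `projects/api-server` passes, while `../../etc/passwd` and `/etc/passwd` are rejected.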

Integration with the Audit Trail

Every exec call is logged:

The command (verbatim)
The working directory
The timestamp
The exit code
The output (truncated to 10,000 characters)
The duration

This is the operational audit trail that complements the higher-level audit trail of HITL decisions, MCP operations, and credential access. The compliance question "what did this agent actually do on the system?" is answerable by querying the exec log.

For regulated industries, this is the difference between "the agent touched the production database" being a vague suspicion and a documented fact. The audit trail says: at 2026-06-10 09:47:32 UTC, the agent ran psql -c "SELECT COUNT(*) FROM users" against the production database. Duration: 0.4 seconds. Output: 1,247,389.
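The logged fields map naturally onto a structured record. The sketch below stores one JSON object per line and filters by command text. The storage format, the class name `ExecAuditRecord`, and both helper functions are assumptions for illustration, since the article does not say how Facio stores or queries its exec log.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ExecAuditRecord:
    command: str        # verbatim command
    working_dir: str
    timestamp: str      # ISO 8601, UTC
    exit_code: int
    output: str         # already truncated to the 10,000-character cap
    duration_s: float

def append_record(log_path: str, record: ExecAuditRecord) -> None:
    """Append one record as a single JSON line (append-only log)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(record)) + "\n")

def find_commands(log_path: str, needle: str) -> list:
    """Return every logged record whose command contains `needle`."""
    with open(log_path, encoding="utf-8") as log:
        records = [json.loads(line) for line in log if line.strip()]
    return [r for r in records if needle in r["command"]]
```

With a log in this shape, answering "did the agent run psql?" is a single call such as `find_commands("exec.log", "psql")`.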

When to Use exec vs. File Tools

The agent has a complete file tool surface. Not every operation needs a shell. Here's the decision rule:

Operation	Right tool
Read a file's content	`read_file`
Find files by name	`glob`
Search file contents	`grep`
Edit a specific section	`edit_file`
Create a new file	`write_file`
Run a build / test / deploy	`exec`
Check git status	`exec` (or `read_file` on `.git/`)
Run database migrations	`exec`
Install dependencies	`exec`
Process data with a script	`exec` (write script first, then run)

The file tools are for working with files. The exec tool is for invoking the system — running binaries, executing scripts, interacting with processes. The agent picks the right tool for the operation, not the more powerful one for every job.

Production Patterns

Pattern 1: CI/CD Pipeline Automation

A deployment workflow:

1. exec("git status", working_dir="projects/api-server") → check for uncommitted changes
2. exec("git pull origin main", working_dir="projects/api-server") → update code
3. exec("npm install", timeout=180) → install dependencies
4. exec("npm run test", timeout=300) → run tests
5. exec("npm run build") → build artifacts
6. exec("docker build -t api-server:latest .", timeout=300) → containerize
7. exec("kubectl apply -f deployment.yaml", timeout=60) → deploy
8. exec("kubectl rollout status deployment/api-server", timeout=180) → wait for rollout
9. Write status to MEMORY.md → record successful deployment

Eight shell commands, all time-bounded and logged, plus a final memory write: one coordinated deployment workflow. The agent did the work; the audit trail captured every step.
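A workflow like this is a sequence that should stop at the first failure, so a failed test run never reaches the deploy step. The sketch below shows that control flow. The `Step` type and the injected `run` callable are hypothetical stand-ins for calls to exec, which in practice the agent issues one at a time.

```python
from typing import Callable, List, NamedTuple, Tuple

class Step(NamedTuple):
    command: str
    timeout: int = 60          # seconds, matching the default described above
    working_dir: str = "."

def run_pipeline(steps: List[Step], run: Callable[[Step], int]) -> List[Tuple[str, int]]:
    """Run steps in order; stop after the first non-zero exit code."""
    results = []
    for step in steps:
        exit_code = run(step)
        results.append((step.command, exit_code))
        if exit_code != 0:
            break              # later steps (build, deploy) are skipped
    return results

DEPLOY_STEPS = [
    Step("git pull origin main", working_dir="projects/api-server"),
    Step("npm install", timeout=180),
    Step("npm run test", timeout=300),
    Step("npm run build"),
    Step("kubectl apply -f deployment.yaml", timeout=60),
]
```

Passing the runner in as a parameter keeps the sequencing logic testable without touching a real shell.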

Pattern 2: Data Pipeline with Script Generation

1. Write a Python data processing script to projects/etl/transform.py
2. exec("python projects/etl/transform.py", timeout=600) → execute
3. exec("wc -l projects/etl/output.csv") → verify output size
4. exec("head -5 projects/etl/output.csv") → verify output format
5. Deliver the output to the user

The agent writes the script (file tool), then runs it (exec tool), then verifies the output (exec tool). The full pipeline is reproducible: as long as the script is deterministic, re-running the same commands on the same input produces the same results.
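The write, run, verify loop can be reproduced locally in miniature. The sketch below stands in for the agent's tool calls with plain Python: it writes a small script, executes it with a 600-second timeout, and checks the resulting CSV. The script contents, file names, and the `write_run_verify` helper are invented for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# A tiny deterministic "transform" script, written to disk before running.
SCRIPT = '''\
import csv, sys
with open(sys.argv[1], "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    for i in range(3):
        writer.writerow([i, i * i])
'''

def write_run_verify(workdir: Path) -> List[str]:
    """Write the script, run it, and verify the output file's header."""
    script = workdir / "transform.py"
    output = workdir / "output.csv"
    script.write_text(SCRIPT)                          # step 1: write
    subprocess.run(                                    # step 2: execute
        [sys.executable, str(script), str(output)],
        check=True,
        timeout=600,
    )
    lines = output.read_text().splitlines()            # steps 3-4: verify
    if not lines or lines[0] != "id,value":
        raise ValueError("unexpected output format")
    return lines
```

Because the script has no randomness or external inputs, running it twice yields identical output, which is the property the reproducibility claim depends on.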

Pattern 3: System Diagnostics

A heartbeat task monitoring system health:

1. exec("df -h", working_dir="/") → check disk usage
2. exec("free -m") → check memory
3. exec("docker ps --format '{{.Names}}: {{.Status}}'") → check container health
4. exec("systemctl status nginx --no-pager | head -20") → check service status
5. Log findings via read_logs
6. Alert human if any check failed

Four quick diagnostic commands, all with default timeouts, all producing concise output. The agent runs a complete health check in under a minute.

The HITL Boundary: When exec Should Require Approval

Facio's exec tool doesn't gate individual commands behind HITL approval — that would make the tool too slow for routine work. But the broader pattern is HITL-gated at the workflow level:

The agent's overall mission is approved via ask_approval ("Approve this deployment plan?")
The exec commands within the approved plan run without re-approval
Destructive operations outside the approved plan should trigger a new approval request

For example: a deployment plan is approved. The agent runs git pull, npm install, npm test, docker build, kubectl apply. All approved by the initial go-ahead. But if the agent decides mid-execution to run kubectl delete namespace production to "clean up," that command should trigger a fresh ask_approval call — because it's outside the approved plan.

The HITL boundary is a discipline the agent follows, not a runtime enforcement. The agent's job is to recognize when a command exceeds the scope of its current approval and ask before executing.
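Since this boundary is a discipline rather than something the runtime enforces, the check below is only a way to picture the reasoning: compare each command against the commands named in the approved plan, and ask again when there is no match. The function name and the prefix-matching rule are assumptions, and a real plan comparison would need to be richer than this.

```python
import shlex
from typing import List

def needs_fresh_approval(command: str, approved_prefixes: List[List[str]]) -> bool:
    """True if `command` does not start with any approved command prefix."""
    tokens = shlex.split(command)
    return not any(tokens[: len(prefix)] == prefix for prefix in approved_prefixes)

# Prefixes taken from the deployment example above.
APPROVED_PLAN = [
    ["git", "pull"],
    ["npm", "install"],
    ["npm", "test"],
    ["docker", "build"],
    ["kubectl", "apply"],
]
```

Under this rule, `kubectl apply -f deployment.yaml` proceeds, while `kubectl delete namespace production` falls outside the plan and should go back through ask_approval.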

Bottom Line

AI agents that can write code but not run it are intellectual tools, not workers. The deployment never ships. The test never runs. The migration never completes. The agent contributes analysis but not execution — which means every workflow still needs a human at the keyboard for the final mile.

Facio's exec tool gives agents the final mile. Real shell access, with safety limits, timeouts, output truncation, and full audit trails. The agent can run builds, deploy code, query databases, install dependencies, and execute data pipelines — all within boundaries that prevent catastrophic mistakes and log every action for compliance.

Because an agent that can reason but can't act is only useful in research. To ship, the agent needs to run the command.

See the exec tool documentation for timeout configuration, working directory patterns, and integration with the audit trail.
