Facio's exec Tool: How AI Agents Run Shell Commands Safely With Timeouts and Audit Trails
An AI agent that can read files, write code, and search the web — but cannot execute what it has written — is a research tool, not a worker. The agent plans, drafts, and analyzes, but every execution step requires a human to copy-paste a command into a terminal. For research tasks and content creation, that's tolerable. For deployment pipelines, infrastructure automation, and DevOps workflows, it kills the productivity gain entirely.
Facio's exec tool gives agents controlled access to a real shell — with safety limits, configurable timeouts, output truncation, and full audit trails. Here's how the architecture works and why controlled shell access is the difference between an agent that helps and an agent that ships.
The Architecture: Real Shell, Constrained Surface
The exec tool runs shell commands directly on the host system — not in a sandboxed container, not in a restricted language subset. This is intentional. Agents working on real production codebases need access to the actual git, docker, psql, npm, and python binaries. Constraining the shell to a "safe" subset would make the tool useless for the workflows that need it most.
Instead of sandboxing, Facio uses a layered safety model:
| Layer | What it does |
|---|---|
| Command allowlist/blocklist | Dangerous commands (rm -rf, dd, shutdown, format, mkfs) are blocked at runtime |
| Timeout enforcement | Every command has a maximum execution time (default 60s, configurable up to 10 minutes) |
| Output truncation | Command output is capped at 10,000 characters to prevent context overflow |
| Working directory | Commands can be scoped to a specific working directory |
| Workspace boundaries | Optional restrictToWorkspace config can limit file access to the workspace |
| Audit trail | Every command, every output, every timeout is logged permanently |
The agent calls exec like a developer calls a terminal. The safety layers enforce discipline without the overhead of a sandbox.
exec(command="git status", working_dir="projects/api-server")
# Returns: clean exit, output of `git status`
exec(command="pytest tests/ -v", timeout=120)
# Returns: test results, truncated to 10,000 characters
exec(command="rm -rf /")
# Returns: "Command blocked: dangerous command 'rm -rf' is not allowed"
Timeouts: The Anti-Stuck-Agent Mechanism
Production agents that execute code inevitably hit slow operations. A long test suite, a network call to an unreachable service, a sleep command in a CI script. Without a timeout, the agent waits indefinitely — the session is stuck, the user can't get a response, the iteration budget burns.
Facio's exec tool enforces timeouts on every command:
- Default timeout: 60 seconds. Suitable for most read-only commands, git operations, and quick scripts.
- Configurable up to 10 minutes. Long builds, complex test suites, and data migrations need more.
- Hard kill on timeout. The process is terminated, not just the request — no orphan processes.
- Timeout response is informative. The agent receives: "Command exceeded timeout of 120s. Process terminated. Partial output: [first 10,000 chars]."
The agent learns to scope timeouts appropriately. A quick git log doesn't need 10 minutes. A full pytest run on a large codebase does. The timeout parameter is a first-class part of the tool's API, not a global config the agent has no control over.
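To make the hard-kill behavior concrete, here is a minimal sketch of how a timeout with process-group termination could be built on Python's standard library. It is not Facio's implementation: the function name `run_with_timeout` and the shape of the returned dict are assumptions for illustration, and the approach is POSIX-only.

```python
import os
import signal
import subprocess

MAX_OUTPUT_CHARS = 10_000  # output cap quoted in this article


def run_with_timeout(command: str, timeout: float = 60.0) -> dict:
    """Run a shell command; on timeout, kill its whole process group (POSIX only)."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # new session, so the shell leads its own process group
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        return {"timed_out": False, "exit_code": proc.returncode, "output": output}
    except subprocess.TimeoutExpired:
        try:
            # Kill the shell and every child it spawned, so nothing is orphaned.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        output, _ = proc.communicate()  # reap the process and collect partial output
        message = (
            f"Command exceeded timeout of {timeout:g}s. Process terminated. "
            f"Partial output: {output[:MAX_OUTPUT_CHARS]}"
        )
        return {"timed_out": True, "exit_code": None, "output": message}
```

Starting the command in its own session is the design choice that matters here: killing only the shell would leave a child such as a stuck `sleep` or test runner alive in the background.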
Output Truncation: Protecting the Context Window
A single exec call can produce enormous output. A verbose npm install, a database dump, a multi-gigabyte log file. Without truncation, the agent's context window fills with output it can't process, and the entire session goes off the rails.
Facio's exec tool caps output at 10,000 characters:
exec(command="docker logs api-server --tail=1000")
# Returns: first 10,000 characters of output, plus a notice if truncation occurred
The agent can use the truncation notice to decide whether to:
- grep the log file for specific patterns (using exec again with a focused command)
- Write the full output to a file (exec(command="docker logs api-server > /workspace/tmp/api-logs.txt")) and read_file the relevant sections
- Run the command with a more targeted scope (docker logs --since=10m instead of --tail=1000)
The truncation isn't a limitation — it's a forcing function for the agent to write efficient commands. The agent that learns to scope its commands well stays productive in the runtime. The agent that runs unfiltered commands gets truncated output and learns quickly.
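The truncation step itself is simple. A minimal sketch, assuming the cap is applied to characters and the notice is appended as a final line (the notice wording and the name `truncate_output` are hypothetical, not Facio's actual format):

```python
MAX_OUTPUT_CHARS = 10_000  # cap quoted in this article


def truncate_output(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the first `limit` characters and say how much was dropped."""
    if len(output) <= limit:
        return output
    omitted = len(output) - limit
    # The notice tells the agent the output is incomplete, so it can narrow the command.
    return f"{output[:limit]}\n[output truncated: {omitted} more characters omitted]"
```

Reporting the omitted count, rather than cutting silently, is what lets the agent choose between a narrower command and redirecting the output to a file.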
Dangerous Command Blocking
Facio's exec tool blocks the most destructive commands by default. The blocklist covers the obvious foot-guns:
- rm -rf: recursive force delete
- dd: direct disk writes
- shutdown / reboot / halt: system power management
- format / mkfs: filesystem creation
- And other patterns that have no legitimate use case in agent-driven workflows
When a blocked command is attempted, the agent receives a clear error: "Command blocked: dangerous command 'rm -rf' is not allowed." The agent doesn't silently fail — it knows the command was rejected and can choose an alternative.
This is the same HITL-adjacent principle that gates manage_mcp destructive operations: the runtime, rather than a human reviewer, stops the catastrophic mistake. For commands that carry elevated risk, a human can configure the allowlist to permit them, but the default is safe, and changing it requires explicit human action.
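One way such a check could work is to tokenize the command the way a shell would and inspect the word in command position of each segment. The sketch below is an illustration under that assumption, not Facio's matcher: the names `find_blocked` and `check_command` are hypothetical, and it does not handle wrappers such as sudo, env, xargs, or sh -c, which a real blocklist would need to consider.

```python
import os
import re
import shlex
from typing import List, Optional

BLOCKED_BINARIES = {"dd", "shutdown", "reboot", "halt", "format", "mkfs"}
SEPARATOR_CHARS = set(";&|()")


def _tokens(command: str) -> List[str]:
    """Split like a shell, keeping ; && || | as separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""
    return list(lexer)  # raises ValueError on unbalanced quotes


def find_blocked(command: str) -> Optional[str]:
    """Return the name of the first blocked pattern, or None if the command passes."""
    at_command_start = True
    current = ""
    rm_flags = set()
    for token in _tokens(command):
        if set(token) <= SEPARATOR_CHARS:
            at_command_start = True  # the next word starts a new command
            continue
        if at_command_start:
            current = os.path.basename(token)
            rm_flags = set()
            at_command_start = False
            if current in BLOCKED_BINARIES or current.startswith("mkfs."):
                return current
            continue
        if current == "rm":
            if re.fullmatch(r"-[A-Za-z]+", token):
                rm_flags.update(token[1:])  # collect short flags such as -r, -f, -fr
            elif token == "--recursive":
                rm_flags.add("r")
            elif token == "--force":
                rm_flags.add("f")
            if ({"r", "R"} & rm_flags) and "f" in rm_flags:
                return "rm -rf"
    return None


def check_command(command: str) -> Optional[str]:
    """Return the error message the agent would see, or None if allowed."""
    blocked = find_blocked(command)
    if blocked is None:
        return None
    return f"Command blocked: dangerous command '{blocked}' is not allowed"
```

Checking the command position rather than searching the raw string avoids false positives such as `grep format README.md`, while still catching `git status; rm -fr build`.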
Working Directory and Workspace Boundaries
By default, exec runs in the agent's current working directory. But the agent can scope commands to a specific directory:
exec(command="npm test", working_dir="projects/web-app")
For deployments and infrastructure work, this prevents the "wrong directory" problem: a git push to the wrong repo, a pytest run in the wrong project, a docker-compose up in the wrong stack. An explicit working directory rules out all three.
For even tighter isolation, restrictToWorkspace config can limit file access to the workspace directory:
# Without restriction: agent can read/write any path on the host
# With restriction: agent is limited to /data/facio-1/.facio/workspace
This is the same pattern as Facio's Docker Quickstart — the agent's execution environment can be scoped to its intended boundaries.
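A workspace boundary of this kind usually comes down to resolving the requested path and checking that it still sits under the workspace root. The sketch below illustrates that check for a working_dir value only; the function name `resolve_in_workspace` is hypothetical, and how restrictToWorkspace is actually enforced is not described here.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` relative to `workspace`; reject anything that escapes it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks before the comparison.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside the workspace {str(root)!r}")
    return candidate
```

Resolving before comparing is the important step: a plain string prefix check would accept a path like projects/../../etc, which points outside the workspace once the `..` segments are collapsed.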
Integration with the Audit Trail
Every exec call is logged:
- The command (verbatim)
- The working directory
- The timestamp
- The exit code
- The output (truncated to 10,000 characters)
- The duration
This is the operational audit trail that complements the higher-level audit trail of HITL decisions, MCP operations, and credential access. The compliance question "what did this agent actually do on the system?" is answerable by querying the exec log.
For regulated industries, this is the difference between "the agent touched the production database" being a vague suspicion and a documented fact. The audit trail says: at 2026-06-10 09:47:32 UTC, the agent ran psql -c "SELECT COUNT(*) FROM users" against the production database. Duration: 0.4 seconds. Output: 1,247,389.
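As an illustration of what querying the exec log could look like, here is a minimal sketch that stores one JSON object per line with the fields listed above. The record layout, the file format, and the names `ExecRecord`, `append_record`, and `commands_matching` are assumptions for this example, not Facio's actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


@dataclass
class ExecRecord:
    command: str              # verbatim command string
    working_dir: str
    timestamp: str            # ISO 8601, UTC
    exit_code: Optional[int]  # None if the process was killed on timeout
    output: str               # already truncated by the caller
    duration_s: float


def utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


def append_record(log_path: Path, record: ExecRecord) -> None:
    """Append one record as a single JSON line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def commands_matching(log_path: Path, needle: str) -> List[ExecRecord]:
    """Return every logged command whose text contains `needle`."""
    with log_path.open(encoding="utf-8") as fh:
        records = [ExecRecord(**json.loads(line)) for line in fh if line.strip()]
    return [r for r in records if needle in r.command]
```

With a log like this, the compliance question becomes a filter: `commands_matching(log_path, "psql")` lists every database command the agent ran, with its timestamp, exit code, and duration.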
When to Use exec vs. File Tools
The agent has a complete file tool surface. Not every operation needs a shell. Here's the decision rule:
| Operation | Right tool |
|---|---|
| Read a file's content | read_file |
| Find files by name | glob |
| Search file contents | grep |
| Edit a specific section | edit_file |
| Create a new file | write_file |
| Run a build / test / deploy | exec |
| Check git status | exec |
| Run database migrations | exec |
| Install dependencies | exec |
| Process data with a script | exec (write script first, then run) |
The file tools are for working with files. The exec tool is for invoking the system — running binaries, executing scripts, interacting with processes. The agent picks the right tool for the operation, not the more powerful one for every job.
Production Patterns
Pattern 1: CI/CD Pipeline Automation
A deployment workflow:
1. exec("git status", working_dir="projects/api-server") → check for uncommitted changes
2. exec("git pull origin main", working_dir="projects/api-server") → update code
3. exec("npm install", timeout=180) → install dependencies
4. exec("npm run test", timeout=300) → run tests
5. exec("npm run build") → build artifacts
6. exec("docker build -t api-server:latest .", timeout=300) → containerize
7. exec("kubectl apply -f deployment.yaml", timeout=60) → deploy
8. exec("kubectl rollout status deployment/api-server", timeout=180) → wait for rollout
9. Write status to MEMORY.md → record successful deployment
Eight shell commands, all time-bounded and logged, plus a final memory write, all part of one coordinated deployment workflow. The agent did the work; the audit trail captured every step.
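A workflow like this only makes sense if a failed step stops the steps after it. The sketch below shows that control flow with a local stand-in for the exec tool; `Step`, `local_exec`, and `run_pipeline` are hypothetical names, and the exit code 124 on timeout is borrowed from the GNU timeout utility rather than from Facio.

```python
import subprocess
from typing import Callable, List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    command: str
    timeout: int = 60                  # seconds, matching the article's default
    working_dir: Optional[str] = None  # None means the current directory


def local_exec(step: Step) -> int:
    """Stand-in for the exec tool: run locally and return the exit code."""
    try:
        result = subprocess.run(
            step.command,
            shell=True,
            cwd=step.working_dir,
            timeout=step.timeout,
            capture_output=True,
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        return 124  # same convention as the GNU `timeout` utility


def run_pipeline(
    steps: List[Step], run: Callable[[Step], int] = local_exec
) -> List[Tuple[str, int]]:
    """Run steps in order and stop at the first non-zero exit code."""
    completed = []
    for step in steps:
        code = run(step)
        completed.append((step.command, code))
        if code != 0:
            break  # never deploy after a failed install, test, or build
    return completed
```

Passing the runner in as a parameter keeps the sequencing logic separate from how commands are executed, so the same loop can be exercised with a fake runner in tests.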
Pattern 2: Data Pipeline with Script Generation
1. Write a Python data processing script to projects/etl/transform.py
2. exec("python projects/etl/transform.py", timeout=600) → execute
3. exec("wc -l projects/etl/output.csv") → verify output size
4. exec("head -5 projects/etl/output.csv") → verify output format
5. Deliver the output to the user
The agent writes the script (file tool), then runs it (exec tool), then verifies the output (exec tool). The pipeline is repeatable: as long as the script and its input data are deterministic, re-running the same commands produces the same results.
Pattern 3: System Diagnostics
A heartbeat task monitoring system health:
1. exec("df -h", working_dir="/") → check disk usage
2. exec("free -m") → check memory
3. exec("docker ps --format '{{.Names}}: {{.Status}}'") → check container health
4. exec("systemctl status nginx --no-pager | head -20") → check service status
5. Log findings via read_logs
6. Alert human if any check failed
Four quick diagnostic commands, all with default timeouts, all producing concise output. The agent runs a complete health check in under a minute.
The HITL Boundary: When exec Should Require Approval
Facio's exec tool doesn't gate individual commands behind HITL approval — that would make the tool too slow for routine work. But the broader pattern is HITL-gated at the workflow level:
- The agent's overall mission is approved via ask_approval("Approve this deployment plan?")
- The exec commands within the approved plan run without re-approval
- Destructive operations outside the approved plan should trigger a new approval request
For example: a deployment plan is approved. The agent runs git pull, npm install, npm test, docker build, kubectl apply. All approved by the initial go-ahead. But if the agent decides mid-execution to run kubectl delete namespace production to "clean up," that command should trigger a fresh ask_approval call — because it's outside the approved plan.
The HITL boundary is a discipline the agent follows, not a runtime enforcement. The agent's job is to recognize when a command exceeds the scope of its current approval and ask before executing.
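The scope check the agent is expected to perform can be written down in a few lines. This is a sketch of the discipline, not a Facio feature: `guarded_exec` is a hypothetical helper, the approved plan is modeled as a list of exact command strings, and `ask_approval` is assumed to return True or False.

```python
from typing import Callable, Iterable, Optional


def guarded_exec(
    command: str,
    approved_plan: Iterable[str],
    ask_approval: Callable[[str], bool],
    run: Callable[[str], str],
) -> Optional[str]:
    """Run `command` if it is in the approved plan; otherwise ask first."""
    approved = {c.strip() for c in approved_plan}
    if command.strip() not in approved:
        question = f"Command is outside the approved plan: {command}. Approve?"
        if not ask_approval(question):
            return None  # rejected, so the command never runs
    return run(command)
```

Exact string matching is the strictest possible reading of "within the approved plan". A real agent would more likely reason about intent, but the strict version shows where the boundary sits: kubectl delete namespace production is not in the list, so it triggers a fresh request.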
Bottom Line
AI agents that can write code but not run it are intellectual tools, not workers. The deployment never ships. The test never runs. The migration never completes. The agent contributes analysis but not execution — which means every workflow still needs a human at the keyboard for the final mile.
Facio's exec tool gives agents the final mile. Real shell access, with safety limits, timeouts, output truncation, and full audit trails. The agent can run builds, deploy code, query databases, install dependencies, and execute data pipelines — all within boundaries that prevent catastrophic mistakes and log every action for compliance.
Because an agent that can reason but can't act is only useful in research. To ship, the agent needs to run the command.
See the exec tool documentation for timeout configuration, working directory patterns, and integration with the audit trail.