
Product · Jun 10, 2026

Facio's exec Tool: How AI Agents Run Shell Commands Safely With Timeouts and Audit Trails

AI agents that can't execute code are research tools, not workers. Facio's exec tool gives agents controlled access to a real shell — with configurable timeouts, output truncation, dangerous-command blocking, and full audit trails. Here's how the safety architecture works and why it matters for production agents.

Shell Commands · Code Execution · Agent Tooling · Safety · Automation


An AI agent that can read files, write code, and search the web — but cannot execute what it has written — is a research tool, not a worker. The agent plans, drafts, and analyzes, but every execution step requires a human to copy-paste a command into a terminal. For research tasks and content creation, that's tolerable. For deployment pipelines, infrastructure automation, and DevOps workflows, it kills the productivity gain entirely.

Facio's exec tool gives agents controlled access to a real shell — with safety limits, configurable timeouts, output truncation, and full audit trails. Here's how the architecture works and why controlled shell access is the difference between an agent that helps and an agent that ships.

The Architecture: Real Shell, Constrained Surface

The exec tool runs shell commands directly on the host system — not in a sandboxed container, not in a restricted language subset. This is intentional. Agents working on real production codebases need access to the actual git, docker, psql, npm, and python binaries. Constraining the shell to a "safe" subset would make the tool useless for the workflows that need it most.

Instead of sandboxing, Facio uses a layered safety model:

Layer | What it does
--- | ---
Command allowlist/blocklist | Dangerous commands (rm -rf, dd, shutdown, format, mkfs) are blocked at runtime
Timeout enforcement | Every command has a maximum execution time (default 60s, configurable up to 10 minutes)
Output truncation | Command output is capped at 10,000 characters to prevent context overflow
Working directory | Commands can be scoped to a specific working directory
Workspace boundaries | Optional restrictToWorkspace config can limit file access to the workspace
Audit trail | Every command, every output, and every timeout is logged permanently

The agent calls exec like a developer calls a terminal. The safety layers enforce discipline without the overhead of a sandbox.

exec(command="git status", working_dir="projects/api-server")
# Returns: clean exit, output of `git status`

exec(command="pytest tests/ -v", timeout=120)
# Returns: test results, truncated to 10,000 characters if longer

exec(command="rm -rf /")
# Returns: "Command blocked: dangerous command 'rm -rf' is not allowed"
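The calls above share one shape: a command, an optional timeout, and an optional working_dir. The Python sketch below shows how such a request might be validated before anything runs. The ExecRequest class and its field names are hypothetical. The 60-second default and the 10-minute ceiling are the figures quoted in this post, not values read from Facio's source.

```python
from dataclasses import dataclass
from typing import Optional

# Limits taken from the figures quoted in this post; Facio's real values may differ.
DEFAULT_TIMEOUT_S = 60
MAX_TIMEOUT_S = 600  # 10 minutes


@dataclass(frozen=True)
class ExecRequest:
    """Hypothetical shape of one exec call."""
    command: str
    timeout: int = DEFAULT_TIMEOUT_S
    working_dir: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject empty commands outright rather than spawning a no-op shell.
        if not self.command.strip():
            raise ValueError("command must not be empty")
        # Enforce the configurable range: at least 1s, at most the 10-minute ceiling.
        if not 1 <= self.timeout <= MAX_TIMEOUT_S:
            raise ValueError(
                f"timeout must be between 1 and {MAX_TIMEOUT_S} seconds, got {self.timeout}"
            )


quick = ExecRequest(command="git status", working_dir="projects/api-server")
tests = ExecRequest(command="pytest tests/ -v", timeout=120)
print(quick.timeout, tests.timeout)  # 60 120
```

Validating up front means an out-of-range timeout fails immediately with a clear message. Without that check, the problem would surface later as a confusing runtime error.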

Timeouts: The Anti-Stuck-Agent Mechanism

Production agents that execute code inevitably hit slow operations. A long test suite, a network call to an unreachable service, a sleep command in a CI script. Without a timeout, the agent waits indefinitely — the session is stuck, the user can't get a response, the iteration budget burns.

Facio's exec tool enforces timeouts on every command:

  • Default timeout: 60 seconds. Suitable for most read-only commands, git operations, and quick scripts.
  • Configurable up to 10 minutes. Long builds, complex test suites, and data migrations need more.
  • Hard kill on timeout. The process is terminated, not just the request — no orphan processes.
  • Timeout response is informative. The agent receives: "Command exceeded timeout of 120s. Process terminated. Partial output: [first 10,000 chars]."

The agent learns to scope timeouts appropriately. A quick git log doesn't need 10 minutes. A full pytest run on a large codebase does. The timeout parameter is a first-class part of the tool's API, not a global config the agent has no control over.
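Python's standard subprocess module can approximate the hard-kill behavior described above. This is a minimal, POSIX-only sketch and not Facio's implementation. The command runs in its own process group (start_new_session=True). On timeout, the whole group is killed, including any children the shell spawned, rather than just the shell itself. The message format mirrors the one quoted above.

```python
import os
import signal
import subprocess
from typing import Optional, Tuple

MAX_OUTPUT_CHARS = 10_000


def run_with_timeout(
    command: str, timeout: int, cwd: Optional[str] = None
) -> Tuple[Optional[int], str]:
    """Run command in a shell; return (exit_code, output), with exit_code None on timeout."""
    proc = subprocess.Popen(
        command,
        shell=True,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the agent sees errors too
        text=True,
        start_new_session=True,  # new session => new process group for the command tree
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        try:
            # Kill every process in the group, not just /bin/sh, so nothing is orphaned.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        # Calling communicate() again after TimeoutExpired keeps the output read so far.
        output, _ = proc.communicate()
        return None, (
            f"Command exceeded timeout of {timeout}s. Process terminated. "
            f"Partial output: {output[:MAX_OUTPUT_CHARS]}"
        )


code, out = run_with_timeout("echo start; sleep 5", timeout=1)
print(code)  # None
```

Killing the process group, rather than only the PID returned by Popen, matters because `shell=True` puts an extra /bin/sh between the tool and the real command.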

Output Truncation: Protecting the Context Window

A single exec call can produce enormous output. A verbose npm install, a database dump, a multi-gigabyte log file. Without truncation, the agent's context window fills with output it can't process, and the entire session goes off the rails.

Facio's exec tool caps output at 10,000 characters:

exec(command="docker logs api-server --tail=1000")
# Returns: first 10,000 characters of output, plus a notice if truncation occurred

The agent can use the truncation notice to decide whether to:

  • Filter for specific patterns with grep (another exec call with a focused command, such as docker logs api-server 2>&1 | grep ERROR)
  • Write the full output to a file (exec(command="docker logs api-server > /workspace/tmp/api-logs.txt 2>&1")) and read_file the relevant sections
  • Rerun the command with a more targeted scope (docker logs --since=10m api-server instead of --tail=1000)

The truncation isn't a limitation — it's a forcing function for the agent to write efficient commands. The agent that learns to scope its commands well stays productive in the runtime. The agent that runs unfiltered commands gets truncated output and learns quickly.
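A truncation step like this is simple to express. The sketch below keeps the first 10,000 characters, as described above. It then appends a notice stating how much was dropped, so the agent can choose between grep, a file redirect, or a narrower command. The wording of the notice is an assumption, not Facio's actual message.

```python
MAX_OUTPUT_CHARS = 10_000


def truncate_output(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Return output unchanged if it fits, else the first `limit` chars plus a notice."""
    if len(output) <= limit:
        return output
    omitted = len(output) - limit
    notice = (
        f"\n[truncated: showing first {limit:,} of {len(output):,} characters, "
        f"{omitted:,} omitted; narrow the command or redirect output to a file]"
    )
    return output[:limit] + notice


print(truncate_output("ok"))  # ok
```

Keeping the head of the output, rather than the tail, is itself a choice. For logs, the most recent lines are often the interesting ones, which is one more reason a targeted command such as `--since=10m` beats a bigger buffer.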

Dangerous Command Blocking

Facio's exec tool blocks the most destructive commands by default. The blocklist covers the obvious foot-guns:

  • rm -rf — recursive force delete
  • dd — direct disk writes
  • shutdown / reboot / halt — system power management
  • format / mkfs — filesystem creation
  • And other patterns that have no legitimate use case in agent-driven workflows

When a blocked command is attempted, the agent receives a clear error: "Command blocked: dangerous command 'rm -rf' is not allowed." The agent doesn't silently fail — it knows the command was rejected and can choose an alternative.

This is the same HITL-adjacent principle that gates manage_mcp destructive operations: the runtime, not the human, is what stops the catastrophic mistake. A human can permit commands that carry elevated risk but are genuinely needed by configuring the allowlist. The default is safe, and changing it requires explicit human action.
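One way to implement a blocklist like this is to check the command name at the start of each shell segment rather than substring-matching the whole string. That way, `git add dd.txt` is not mistaken for `dd`. The Python sketch below is illustrative only. The blocked names come from the list above, and the allowlist parameter stands in for the human-configured override. It deliberately does not unwrap sudo, env, or sh -c, which a production blocklist would need to handle. Its operator split is also naive, so it fails closed (blocks) when a segment cannot be parsed.

```python
import re
import shlex
from typing import FrozenSet, List, Optional

BLOCKED = {"rm -rf", "dd", "shutdown", "reboot", "halt", "format", "mkfs"}


def _segment_name(tokens: List[str]) -> str:
    """Normalize one segment's command name, folding recursive+force rm into 'rm -rf'."""
    name = tokens[0].rsplit("/", 1)[-1]  # /sbin/shutdown -> shutdown
    if name.startswith("mkfs"):  # mkfs.ext4, mkfs.xfs, ...
        return "mkfs"
    if name == "rm":
        short = "".join(t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--"))
        recursive = "r" in short or "R" in short or "--recursive" in tokens
        force = "f" in short or "--force" in tokens
        if recursive and force:
            return "rm -rf"
    return name


def check_command(command: str, allowlist: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return a block message, or None if the command may run."""
    # Split on shell control operators so 'echo ok; dd ...' is inspected per segment.
    for segment in re.split(r"&&|\|\||[;|&]", command):
        try:
            tokens = shlex.split(segment)
        except ValueError:
            return "Command blocked: could not parse command"  # fail closed
        if not tokens:
            continue
        name = _segment_name(tokens)
        if name in BLOCKED and name not in allowlist:
            return f"Command blocked: dangerous command '{name}' is not allowed"
    return None


print(check_command("rm -rf /"))
# Command blocked: dangerous command 'rm -rf' is not allowed
```

Matching on parsed command names instead of raw substrings trades a little complexity for far fewer false positives on harmless commands.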

Working Directory and Workspace Boundaries

By default, exec runs in the agent's current working directory. But the agent can scope commands to a specific directory:

exec(command="npm test", working_dir="projects/web-app")

For deployments and infrastructure work, this prevents the "wrong directory" problem. A git push to the wrong repo, a pytest run in the wrong project, a docker-compose up in the wrong stack — all prevented by explicit working directory.

For even tighter isolation, restrictToWorkspace config can limit file access to the workspace directory:

# Without restriction: agent can read/write any path on the host
# With restriction: agent is limited to /data/facio-1/.facio/workspace

This is the same pattern as Facio's Docker Quickstart — the agent's execution environment can be scoped to its intended boundaries.
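A working_dir boundary check can be sketched with pathlib. Resolve the requested directory (following symlinks), then refuse anything that lands outside the workspace root. This is an assumption about how such a check could work, not Facio's restrictToWorkspace code. Note that it only validates where a command starts; on its own, it does not stop a command from touching absolute paths elsewhere on the host. Path.is_relative_to requires Python 3.9+.

```python
from pathlib import Path
from typing import Optional


def resolve_working_dir(workspace: str, working_dir: Optional[str] = None) -> Path:
    """Resolve working_dir against workspace and refuse escapes (.., absolute paths, symlinks)."""
    root = Path(workspace).resolve()
    # Joining an absolute path replaces root entirely; the check below catches that case.
    target = (root / working_dir).resolve() if working_dir else root
    if not target.is_relative_to(root):
        raise PermissionError(f"working_dir {working_dir!r} is outside workspace {root}")
    return target


# Workspace path taken from the example above; "projects/web-app" is illustrative.
ws = "/data/facio-1/.facio/workspace"
print(resolve_working_dir(ws, "projects/web-app"))
```

Resolving before comparing is the important part: a plain string-prefix check would accept `projects/../../etc`.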

Integration with the Audit Trail

Every exec call is logged:

  • The command (verbatim)
  • The working directory
  • The timestamp
  • The exit code
  • The output (up to 10,000 characters)
  • The duration

This is the operational audit trail that complements the higher-level audit trail of HITL decisions, MCP operations, and credential access. The compliance question "what did this agent actually do on the system?" is answerable by querying the exec log.

For regulated industries, this is the difference between "the agent touched the production database" being a vague suspicion and a documented fact. The audit trail says: at 2026-06-10 09:47:32 UTC, the agent ran psql -c "SELECT COUNT(*) FROM users" against the production database. Duration: 0.4 seconds. Output: 1,247,389.
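An audit trail with the fields listed above maps naturally onto an append-only JSON Lines file, with one record per exec call. The sketch below uses a hypothetical format: the AuditRecord fields and the file layout are assumptions, not Facio's schema. It shows how the compliance question "what did this agent run against the database?" becomes a simple filter over the log.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Iterator, Optional


@dataclass
class AuditRecord:
    command: str
    working_dir: str
    timestamp: str  # ISO 8601, UTC
    exit_code: Optional[int]  # None when the command timed out
    output: str  # already capped at 10,000 characters
    duration_s: float


def utc_now() -> str:
    """Current UTC time as an ISO 8601 string, e.g. 2026-06-10T09:47:32+00:00."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def append_record(path: str, record: AuditRecord) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def find_commands(path: str, needle: str) -> Iterator[AuditRecord]:
    """Yield records whose command contains needle (e.g. 'psql')."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = AuditRecord(**json.loads(line))
            if needle in record.command:
                yield record
```

Append-only writes keep the log easy to ship to external storage. Querying by substring is the simplest possible filter; a real deployment would more likely index these records in a log store.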

When to Use exec vs. File Tools

The agent has a complete file tool surface. Not every operation needs a shell. Here's the decision rule:

Operation | Right tool
--- | ---
Read a file's content | read_file
Find files by name | glob
Search file contents | grep
Edit a specific section | edit_file
Create a new file | write_file
Run a build / test / deploy | exec
Check git status | exec (or read_file on .git/)
Run database migrations | exec
Install dependencies | exec
Process data with a script | exec (write script first, then run)

The file tools are for working with files. The exec tool is for invoking the system — running binaries, executing scripts, interacting with processes. The agent picks the right tool for the operation, not the more powerful one for every job.

Production Patterns

Pattern 1: CI/CD Pipeline Automation

A deployment workflow:

1. exec("git status", working_dir="projects/api-server") → check for uncommitted changes
2. exec("git pull origin main", working_dir="projects/api-server") → update code
3. exec("npm install", working_dir="projects/api-server", timeout=180) → install dependencies
4. exec("npm run test", working_dir="projects/api-server", timeout=300) → run tests
5. exec("npm run build", working_dir="projects/api-server") → build artifacts
6. exec("docker build -t api-server:latest .", working_dir="projects/api-server", timeout=300) → containerize
7. exec("kubectl apply -f deployment.yaml", working_dir="projects/api-server", timeout=60) → deploy
8. exec("kubectl rollout status deployment/api-server", timeout=180) → wait for rollout
9. Write status to MEMORY.md → record successful deployment

Eight shell commands, all time-bounded and all logged, followed by a memory update, form one coordinated deployment workflow. The agent did the work; the audit trail captured every step.
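The sequential, stop-on-first-failure shape of this workflow can be sketched as a list of steps driven by an exec-like function. In the Python below, exec_fn is a hypothetical stand-in for the exec tool, assumed to return an exit code and output. Each step may also carry a check on its output. This matters because git status exits 0 even when the tree is dirty, so the sketch uses git status --porcelain and treats any output as uncommitted changes. The step list is illustrative and sets working_dir on every step that relies on relative paths.

```python
from typing import Callable, List, Optional, Tuple

# exec_fn(command, **kwargs) -> (exit_code, output); a stand-in for the exec tool.
ExecFn = Callable[..., Tuple[Optional[int], str]]
OutputCheck = Optional[Callable[[str], bool]]

API = {"working_dir": "projects/api-server"}

DEPLOY_STEPS: List[Tuple[str, dict, OutputCheck]] = [
    ("git status --porcelain", API, lambda out: out.strip() == ""),  # clean tree only
    ("git pull origin main", API, None),
    ("npm install", {**API, "timeout": 180}, None),
    ("npm run test", {**API, "timeout": 300}, None),
    ("npm run build", API, None),
    ("docker build -t api-server:latest .", {**API, "timeout": 300}, None),
    ("kubectl apply -f deployment.yaml", {**API, "timeout": 60}, None),
    ("kubectl rollout status deployment/api-server", {"timeout": 180}, None),
]


def run_pipeline(steps, exec_fn: ExecFn) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first non-zero exit, timeout, or failed check."""
    completed: List[str] = []
    for command, kwargs, check in steps:
        exit_code, output = exec_fn(command, **kwargs)
        if exit_code != 0:  # includes None, i.e. a timeout
            return completed, f"{command!r} failed (exit {exit_code}): {output[-500:]}"
        if check is not None and not check(output):
            return completed, f"{command!r} failed its output check: {output[-500:]}"
        completed.append(command)
    return completed, None
```

The final step from the list, recording the result in MEMORY.md, would run only when the returned error is None. The list of completed steps tells the agent (and the human reading the audit trail) exactly how far the deployment got.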

Pattern 2: Data Pipeline with Script Generation

1. Write a Python data processing script to projects/etl/transform.py
2. exec("python projects/etl/transform.py", timeout=600) → execute
3. exec("wc -l projects/etl/output.csv") → verify output size
4. exec("head -5 projects/etl/output.csv") → verify output format
5. Deliver the output to the user

The agent writes the script (file tool), then runs it (exec tool), then verifies the output (exec tool). The pipeline is reproducible: re-running the same commands against the same inputs produces the same results, as long as the script itself is deterministic.
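Reproducibility is easy to check rather than assume: hash the output after each run and compare. On Linux, exec("sha256sum projects/etl/output.csv") would do this from the shell; the Python equivalent below uses the standard hashlib module. Matching hashes across two runs on the same inputs suggest the script is deterministic. A mismatch points to something like unordered iteration or timestamps written into the output.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a: str, path_b: str) -> bool:
    """True if two runs produced byte-identical output files."""
    return file_sha256(path_a) == file_sha256(path_b)
```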

Pattern 3: System Diagnostics

A heartbeat task monitoring system health:

1. exec("df -h", working_dir="/") → check disk usage
2. exec("free -m") → check memory
3. exec("docker ps --format '{{.Names}}: {{.Status}}'") → check container health
4. exec("systemctl status nginx --no-pager | head -20") → check service status
5. Log findings via read_logs
6. Alert human if any check failed

Four quick diagnostic commands, all with default timeouts, all producing concise output. The agent runs a complete health check in under a minute.
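Turning raw diagnostic output into a pass/fail signal is where the agent decides whether to alert. As an illustration, the Python helper below parses df -h output and flags filesystems above a usage threshold. It assumes the GNU coreutils column layout (Filesystem, Size, Used, Avail, Use%, Mounted on). BSD and macOS df print extra inode columns, and df -hP gives a more predictable POSIX layout. The 90% threshold and the sample output are made up for the example.

```python
from typing import List, Tuple


def disks_over_threshold(df_output: str, threshold_pct: int = 90) -> List[Tuple[str, int]]:
    """Return (mount point, use%) for filesystems at or above threshold_pct."""
    alerts: List[Tuple[str, int]] = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # wrapped or non-standard row; skipped in this sketch
        use_pct = int(fields[4][:-1])
        mount = " ".join(fields[5:])  # mount points may contain spaces
        if use_pct >= threshold_pct:
            alerts.append((mount, use_pct))
    return alerts


# Made-up sample in GNU df -h format.
sample = """Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   46G  4.0G  92% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
/dev/sdb1       200G  120G   80G  60% /data"""
print(disks_over_threshold(sample))  # [('/', 92)]
```

An empty result means the disk check passed. Any returned entries feed the "alert human if any check failed" step directly.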

The HITL Boundary: When exec Should Require Approval

Facio's exec tool doesn't gate individual commands behind HITL approval — that would make the tool too slow for routine work. But the broader pattern is HITL-gated at the workflow level:

  • The agent's overall mission is approved via ask_approval ("Approve this deployment plan?")
  • The exec commands within the approved plan run without re-approval
  • Destructive operations outside the approved plan should trigger a new approval request

For example: a deployment plan is approved. The agent runs git pull, npm install, npm test, docker build, kubectl apply. All approved by the initial go-ahead. But if the agent decides mid-execution to run kubectl delete namespace production to "clean up," that command should trigger a fresh ask_approval call — because it's outside the approved plan.

The HITL boundary is a discipline the agent follows, not a runtime enforcement. The agent's job is to recognize when a command exceeds the scope of its current approval and ask before executing.
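This boundary is framed here as agent discipline rather than runtime enforcement, but the check itself can still be made mechanical. Below is a minimal sketch, assuming the approved plan is stored as a list of command prefixes (a hypothetical representation). A command needs a fresh ask_approval call unless its tokens start with one of the approved entries.

```python
import shlex
from typing import List


def needs_fresh_approval(command: str, approved_plan: List[str]) -> bool:
    """True unless the command's tokens begin with some approved entry's tokens."""
    tokens = shlex.split(command)
    for entry in approved_plan:
        prefix = shlex.split(entry)
        # Skip empty entries: an empty prefix would silently approve everything.
        if prefix and tokens[: len(prefix)] == prefix:
            return False
    return True


plan = [
    "git pull origin main",
    "npm install",
    "npm test",
    "docker build -t api-server:latest .",
    "kubectl apply -f deployment.yaml",
]
print(needs_fresh_approval("npm test", plan))  # False
print(needs_fresh_approval("kubectl delete namespace production", plan))  # True
```

Prefix matching is a deliberate trade-off. A broad entry such as a bare kubectl would approve far more than intended, so the plan should list commands as specifically as possible.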

Bottom Line

AI agents that can write code but not run it are intellectual tools, not workers. The deployment never ships. The test never runs. The migration never completes. The agent contributes analysis but not execution — which means every workflow still needs a human at the keyboard for the final mile.

Facio's exec tool gives agents the final mile. Real shell access, with safety limits, timeouts, output truncation, and full audit trails. The agent can run builds, deploy code, query databases, install dependencies, and execute data pipelines — all within boundaries that prevent catastrophic mistakes and log every action for compliance.

Because an agent that can reason but can't act is only useful in research. To ship, the agent needs to run the command.


See the exec tool documentation for timeout configuration, working directory patterns, and integration with the audit trail.
