Agents are useful because they have some autonomy: they can decide what steps to take, which tools to use, and when to stop. But this autonomy also means agents can make mistakes and make you look like a fool (see Hallucination). Guardrails and human-in-the-loop are common approaches to minimizing this risk.
Key concepts
Guardrails is a high-level conceptual term that generally refers to rules or checks that prevent an agent from performing harmful or undesirable actions.
- E.g. for a customer service agent, make sure the agent does not return harmful content to the user.
- E.g. for an agent with access to your files, make sure the agent cannot delete any files.
Guardrails can be implemented in various ways, e.g. by blocking the agent from invoking certain tools (such as a tool that deletes files). A guardrail can also be an LLM itself (e.g. a check that the customer service agent does not return harmful content to the user).
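The tool-blocking idea can be sketched in a few lines. This is a minimal illustration, not a real framework's API: the names `BLOCKED_TOOLS` and `guarded_invoke` are made up for this example.

```python
# Hypothetical sketch: a guardrail that blocks certain tools outright.
BLOCKED_TOOLS = {"delete_file"}  # tools the agent must never call

def guarded_invoke(tool_name, tool_fn, *args, **kwargs):
    """Run a tool on the agent's behalf, refusing anything on the blocklist."""
    if tool_name in BLOCKED_TOOLS:
        raise PermissionError(f"Guardrail: tool '{tool_name}' is blocked")
    return tool_fn(*args, **kwargs)
```

Wrapping every tool call in a single chokepoint like this is the key design choice: the guardrail sits between the agent and the tool, so the agent's own reasoning (or a jailbreak) cannot bypass it.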
Human-in-the-loop is one type of guardrail: a human is inserted into the agent's decision process, often to review, approve, or stop an action the agent wants to take. E.g. every time the agent makes a financial transaction over $1000, first get human approval (micro-manage your agent).
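The transaction example above can be sketched as follows. This is a hypothetical illustration, assuming an `approve_fn` callback that asks a human (e.g. via a review UI) and returns True or False:

```python
# Hypothetical sketch: require human approval for large transactions.
APPROVAL_THRESHOLD = 1000  # dollars; transactions above this need a human

def execute_transaction(amount, transfer_fn, approve_fn):
    """Run the transfer, pausing for human approval above the threshold."""
    if amount > APPROVAL_THRESHOLD and not approve_fn(amount):
        return "rejected"  # human said no; the agent does not proceed
    return transfer_fn(amount)
```

In a real agent the approval step would typically pause the agent loop and resume once the human responds, but the control flow is the same: the agent cannot complete the sensitive action on its own.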