Emma, a utility manager at an electric company, is excited about the new AI agent installed to assist with technical queries. She asks, “What is the number of insulators required for a 1 kV line?” The AI promptly answers, “A 1 kV line typically requires three insulators.” Impressed, Emma thinks, “Great! This AI agent can do everything.”
Encouraged, she asks another question: “Where exactly is tree growth causing the most problems with our power lines?” Answer: “Urban and rural areas with mature trees face frequent outages due to branches interfering with overhead lines.” Thanks, Captain Obvious! That wasn’t helpful at all.
Puzzled by the unhelpful, non-specific answer, Emma wonders: What’s missing? Maybe this AI agent thing is overhyped. What even is an AI agent?
AI Agent Definition
An AI agent has four key attributes:
- Acting on Behalf of Others: A human needs to be able to delegate tasks to the AI agent, and the agent acts on the human’s behalf.
- Autonomous Decision-Making: The agent needs to be able to make decisions independently without constant human intervention.
- Adapting to New Situations: The agent is not just a static script that runs the same process again and again. It must be able to adapt to dynamic, changing circumstances.
- Social Interaction with Others: To complete most tasks, collaboration with others is required. So, the agent needs to be able to socially interact with others to accomplish its goals. These interactions are either with other systems via APIs or with other humans via messaging, email, or voice.
What Agents Do Internally
Internally, AI agents perform a series of steps to achieve their objectives:
- Planning: The agent first makes a plan of the steps it needs to complete to achieve its goal.
- Critiquing: After the agent has come up with a plan, it should use self-reflection to critique that plan, fixing potential flaws and improving it. This loosely resembles the actor-critic methods used in reinforcement learning. The critic can be the same Large Language Model (LLM) that does the planning, or it can be a separate LLM optimized for generating critiques. The designer of the AI agent system may choose to run multiple planning-critiquing loops to improve the outcome.
- Executing: The agent then carries out the plan step by step.
- Adapting: As unforeseen circumstances occur, the agent needs to adjust to those circumstances, changing how it completes a step or modifying the plan. This is another self-reflection cycle that monitors the execution and adjusts to a changing environment.
- Summarizing: When the tasks have been completed, the AI agent should communicate back to the user who made the original request, summarizing what it has done and presenting the outcome.
- Feedback: After completing a task, the agent should reflect on how it accomplished the task, what it could have done better, and how to improve future operations.
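The steps above can be sketched as a single control loop. This is a minimal illustration rather than a description of any particular product: `run_agent`, `fake_llm`, and `fake_tool` are hypothetical names, the prompt prefixes are made up, and the canned replies stand in for calls to a real model and real tools.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def run_agent(goal: str, llm: LLM, execute_step: Callable[[str], str],
              critique_rounds: int = 1, max_retries: int = 1) -> Tuple[str, str]:
    # Planning: ask the model for a plan, one step per line.
    plan = llm(f"PLAN: list the steps to achieve: {goal}")

    # Critiquing: let the model (or a separate critic model) revise the plan.
    for _ in range(critique_rounds):
        plan = llm(f"CRITIQUE: improve this plan for '{goal}':\n{plan}")

    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    results: List[str] = []

    # Executing, with adapting when a step fails.
    for step in steps:
        for attempt in range(max_retries + 1):
            try:
                results.append(execute_step(step))
                break
            except Exception as err:
                if attempt == max_retries:
                    results.append(f"FAILED: {step} ({err})")
                else:
                    step = llm(f"ADAPT: step '{step}' failed with '{err}'; rewrite it")

    # Summarizing for the user, then feedback for future runs.
    summary = llm("SUMMARIZE:\n" + "\n".join(results))
    lessons = llm("FEEDBACK: what could be improved?\n" + "\n".join(results))
    return summary, lessons


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model."""
    if prompt.startswith("PLAN"):
        return "fetch outage records\nrank circuits"
    if prompt.startswith("CRITIQUE"):
        # Keep the proposed plan and append a verification step.
        return prompt.split("\n", 1)[1] + "\nverify ranking"
    if prompt.startswith("ADAPT"):
        return "rank circuits by outage count"
    if prompt.startswith("SUMMARIZE"):
        return "Completed: " + "; ".join(prompt.splitlines()[1:])
    return "Next time, specify the ranking criterion up front."


def fake_tool(step: str) -> str:
    """Stand-in for real tool calls; rejects one vague step to trigger adaptation."""
    if step == "rank circuits":
        raise ValueError("no ranking criterion given")
    return f"done({step})"


summary, lessons = run_agent("find worst vegetation outages", fake_llm, fake_tool)
print(summary)
print(lessons)
```

In this run the vague step "rank circuits" fails once, is rewritten by the adapt prompt, and then succeeds. A production agent would also need step budgets and timeouts, which are omitted here for brevity.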
How Do They Do It?
To be effective, the AI agent needs several internal capabilities:
- World Model: The AI needs to construct an internal representation of the world in which it operates, so it can simulate interactions and make plans grounded in that reality. For example, at Datch we help our customers map asset-centric ontologies that model relationships between their various enterprise systems. Our AI agents then use that ontological knowledge to help them understand the physical and digital worlds of each specific enterprise.
- Understanding Instructions: The AI needs to be able to interpret and act upon instructions, potentially asking follow-up questions if the task is unclear. A user could give these instructions as a direct request with no examples (zero-shot) or by providing specific examples of the expected outcome (one-shot or few-shot).
- Common Sense Knowledge: Common sense knowledge has historically been something AI has struggled to grasp. LLMs have a significant degree of common sense knowledge and can leverage that understanding to perform better reasoning.
- Detailed Contextual Knowledge: Beyond common sense knowledge, the AI needs detailed, specialized contextual knowledge in the relevant domain. A user might add relevant documentation into the context window or might choose to use a specific specialist LLM for certain sub-tasks.
- Memory: LLMs are stateless and lack inherent memory. To maintain continuity in conversations, they rely on including the entire conversation history with each new user prompt. However, an AI agent requires a more sophisticated memory system than simply including everything. It needs a well-structured memory that can selectively recall relevant information from past instructions, interactions, tasks, and experiences.
- Logical Reasoning: While LLMs don’t “reason” in the same way humans do, they have some computational process that can simulate reasoning if their training data includes examples of the types of problems they are likely to encounter while completing their task. These processes simulate some aspects of human reasoning and are critical to achieving the desired objectives.
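To make the memory point above concrete, here is a toy sketch of selective recall: instead of replaying the whole history, the agent pulls back only the entries that overlap with the current question. The `AgentMemory` class and the stored notes are invented for illustration, and simple word overlap is an assumption chosen for brevity; real systems typically use embeddings or a structured store.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class AgentMemory:
    """Hypothetical memory store that recalls entries by word overlap."""
    entries: List[str] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.entries.append(text)

    def recall(self, query: str, k: int = 2) -> List[str]:
        # Score each entry by how many words it shares with the query,
        # drop entries with no overlap, and return the top k.
        q = _words(query)
        scored = [(len(q & _words(entry)), entry) for entry in self.entries]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in relevant[:k]]


# Made-up notes, purely for illustration.
memory = AgentMemory()
memory.remember("User prefers outage reports grouped by feeder")
memory.remember("Feeder 12 had three tree-related outages in March")
memory.remember("User's lunch order was a sandwich")

question = "Which feeder has the most tree outages?"
context = memory.recall(question)

# Only the recalled entries are placed in the prompt, not the full history.
prompt = "Relevant memory:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The unrelated lunch note never reaches the prompt, which keeps the context window small and focused on the task.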
What Can They Not Do?
Despite their advanced capabilities, AI agents have limitations:
- Task Complexity: Most current AI agents can complete small tasks independently, but they often break down when given a more complex task that involves many steps. They get stuck in loops, hit an error they cannot resolve, or end up performing nonsensical actions. That said, the landscape is changing rapidly, and LLMs are increasingly able to handle more complex agentic tasks end-to-end.
- Lack of Intent or Desire: AI agents do not possess consciousness or personal motivations. They operate entirely based on programmed objectives and data inputs.
- Lack of Genuine Understanding: While the AI can process information and give useful responses to questions, it arguably does not actually comprehend the words (see John Searle’s Chinese Room Argument for more on this point). This leads to AIs sometimes making obvious mistakes that no human would make in the same situation. A famous example is counting the number of “r” letters in “strawberry,” a question that many LLMs have answered incorrectly.
- Speed and Cost of Execution: An AI agent needs to run many LLM queries to complete a task. This generally takes longer and costs more than running a single LLM prompt.
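The speed and cost point can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes the LLM calls run one after another and that every call uses the same number of tokens. The function name `estimate_run` and all figures are placeholders for illustration, not real prices or measurements.

```python
def estimate_run(llm_calls: int, tokens_per_call: int,
                 price_per_1k_tokens: float, seconds_per_call: float):
    """Return (cost, latency_seconds), assuming the calls run sequentially."""
    cost = llm_calls * tokens_per_call / 1000 * price_per_1k_tokens
    latency = llm_calls * seconds_per_call
    return cost, latency


# Placeholder figures for illustration only, not measurements or real prices.
TOKENS, PRICE, SECONDS = 2000, 0.01, 3.0

single_cost, single_latency = estimate_run(1, TOKENS, PRICE, SECONDS)
agent_cost, agent_latency = estimate_run(12, TOKENS, PRICE, SECONDS)

print(f"single prompt: ${single_cost:.2f}, {single_latency:.0f} s")
print(f"12-call agent: ${agent_cost:.2f}, {agent_latency:.0f} s")
```

Under these assumptions both cost and latency scale linearly with the number of calls, so a 12-call agent run is 12 times as slow and as expensive as a single prompt. Running independent steps in parallel would reduce the latency but not the cost.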
Conclusion
AI agents represent a significant step toward Artificial General Intelligence (AGI). They are powerful tools capable of acting autonomously to answer difficult questions, conduct complex analyses, and perform multi-step tasks. By interacting with both human and electronic systems—such as tools, APIs, and other agents—they execute tasks efficiently. Their internal processes enable them to plan, execute, and adapt actions effectively. However, their lack of intent, desire, and genuine understanding means that human oversight remains crucial. As we start building more and more of these AI agents, it is essential to understand their inner workings, recognize their limitations, and explore ways to enhance their capabilities.
Interested in learning more about how AI agents could be leveraged specifically within your workflows? Drop your email in the box below and we will be happy to provide a free consultation.