Can AI Learn From Its Surroundings to Achieve a Goal?
What would happen if you suddenly woke up on a deserted island?
You would likely start by defining a goal, such as finding shelter or food. And to do that, you’d have to understand where you are. You’d look around, trying to understand what kind of environment you’re in.
Then, based on what you learn, you would create a plan of action. For example, if you determine you have enough wood and palm leaves, you could collect them and build a shelter.
You can do this because your brain is capable of perception and reasoning. It can capture information from the environment, interpret it to understand how it works, and define the best way to proceed based on your goals or intentions.
Now, as advanced as popular large language models may be, they’re incapable of performing these kinds of cognitive functions. In other words, they can’t understand their surroundings and environment. Instead, they’re only capable of finding patterns in a given set of data and producing a response based on probability.
Until now.
By leveraging OpenCog Hyperon’s innovative artificial intelligence framework, SingularityNET and TrueAGI created an artificial intelligence agent that can be deployed into an unknown environment and learn about it to achieve a given goal.
In this article, we’ll explain how this innovative technology works and demonstrate it using the massively popular video game Minecraft.
Enter ROCCA, a Context-Aware, Problem-Solving AI Agent
The Rational OpenCog Control Agent—ROCCA—is an autonomous, plug-and-play agent that users can load into a digital environment—such as Minecraft—and define a goal. That’s all it takes!
Once that is done, the agent will analyze the environment by itself, understand how it works, and independently identify what actions it must take to achieve the goal it has been assigned. Furthermore, unlike most LLMs, ROCCA doesn’t need prior training—its ability to recognize conditions and circumstances allows it to fulfill its goals even in uncertain or unknown environments.
In other words, ROCCA is able to explore and discover regularities and patterns within an environment and act rationally in response.
To do this, the agent follows a series of steps:
- First, it implements a reasoning-based pattern miner—in other words, it uses reasoning to identify specific patterns within the data.
- Then, it combines these patterns into plans for achieving its goals, including the actions involved and the time they may take to complete.
- These plans are combined into a mixture from which the next action will be drawn.
- Finally, the agent selects the next action using Thompson Sampling—an approach that makes decisions based on the probability of success to maximize the rewards.
In short, ROCCA implements an observation-planning-action loop interleaved with learning and reasoning. The system that encompasses these processes, algorithms and technologies is what allows ROCCA to understand how an unknown environment works and act rationally to achieve a specific goal.
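To make this loop more concrete, here is a minimal Python sketch of a single observation-planning-action cycle. It is an illustration only: the ToyAgent class, the environment interface (env.observe(), env.step(), env.actions), and the dictionary-based schematics are hypothetical stand-ins, not ROCCA’s actual API.

```python
import random
import time


class ToyAgent:
    """Illustrative agent mimicking ROCCA's observation-planning-action loop."""

    def __init__(self, goal):
        self.goal = goal
        self.atomspace = []   # stand-in for the AtomSpace knowledge store
        self.schematics = []  # cognitive schematics learned so far

    def control_cycle(self, env):
        # Observation: timestamp incoming percepts and store them.
        percept = env.observe()
        self.atomspace.append((time.time(), percept))

        # Planning: keep only schematics whose context matches the current
        # percept and whose goal matches the goal being pursued.
        valid = [s for s in self.schematics
                 if s["context"] == percept and s["goal"] == self.goal]

        # Action: pick among the valid candidates (ROCCA uses Thompson
        # Sampling here, sketched further below); otherwise fall back to
        # random exploration, i.e. "motor babbling".
        if valid:
            action = random.choice(valid)["action"]
        else:
            action = random.choice(env.actions)

        self.atomspace.append((time.time(), action))
        return env.step(action)

    def learn(self):
        # Placeholder for the learning phase: pattern mining and temporal
        # reasoning over the timestamped contents of self.atomspace.
        pass
```

In a real deployment, many such control cycles run back to back, with learning phases interleaved between them, as described in the Minecraft experiment below.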
Control and Learning: How ROCCA “Thinks”
When deployed, ROCCA performs two core processes that allow it to observe and understand an unknown environment, and develop a plan to achieve its goals: Control and Learning.
In turn, these are divided into multiple sub-processes.
Control: Finding the Right Action to Achieve a Goal
The agent starts in a control phase. As it motor-babbles through its environment, it performs multiple control cycles, each decomposed into Observation, Planning and Action phases.
- Observation: During the observation phase, data coming from the environment is timestamped and stored in the AtomSpace, the knowledge metagraph of the OpenCog Hyperon framework.
- Planning: The first step of the planning phase is to select a goal to fulfill. Once the goal has been selected, the agent searches the AtomSpace for cognitive schematics that match the queried pattern. Cognitive schematics are predictive implications relating an observed context, a defined action, a time delay selected within a forward window, and the selected goal. All returned candidates are then filtered according to their contexts, and the valid ones are handed to the next phase for action selection.
- Action: In this stage, the agent first selects the next action from the valid options using Thompson Sampling (sketched below). Then, it timestamps and stores the selected action in the AtomSpace. Finally, it runs the selected action and updates the environment.
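Thompson Sampling itself is easy to sketch. In the hedged example below, each candidate action carries counts of how often it led to the goal in the past; a plausible success rate is sampled for each from a Beta distribution, and the action with the best sample is chosen. The candidate list and its fields are invented for illustration; in ROCCA, this information lives as truth values on cognitive schematics in the AtomSpace.

```python
import random

# Hypothetical candidates left after context filtering, with success/failure
# counts accumulated over previous control cycles.
candidates = [
    {"action": "go_to_key", "successes": 4, "failures": 1},
    {"action": "open_door", "successes": 1, "failures": 3},
    {"action": "dig_block", "successes": 0, "failures": 0},  # never tried
]


def thompson_select(candidates):
    # Sample a success rate for each action from Beta(successes+1, failures+1),
    # then act greedily on the samples. Untried actions get a uniform Beta(1, 1)
    # sample, so exploration happens naturally.
    best = max(candidates,
               key=lambda c: random.betavariate(c["successes"] + 1,
                                                c["failures"] + 1))
    return best["action"]


print(thompson_select(candidates))  # usually "go_to_key", but not always
```

The appeal of Thompson Sampling is that it balances exploiting actions that have worked with exploring uncertain ones, without requiring any extra tuning.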
Learning: Teaching AI to Improve Itself
Although we’ve simplified it in this article, discovering cognitive schematics that are as predictive and widely applicable as possible is as important as it is difficult.
To achieve this, ROCCA uses a combination of pattern mining and reasoning. Let’s go over each of these tasks in detail.
- Pattern Mining: Generally, this is the process of discovering meaningful, valuable, and often overlooked patterns or relationships in large datasets. It involves searching for recurring patterns, associations, or trends that can provide valuable insights or knowledge about the data. In particular, ROCCA uses this process first to discover regularities in the environment that help it understand its rules and dynamics, and then to discover action plans that help it achieve its goals.
- Temporal Reasoning: This process refers to the capacity to analyze and reason about events, actions, and phenomena that occur over time. It involves capturing and processing temporal information, such as the order of events, durations, temporal relationships, and dependencies, in order to make predictions, infer causal relationships, or perform various forms of reasoning. ROCCA applies temporal reasoning to update existing cognitive schematics obtained by pattern mining and to discover new cognitive schematics by combining existing ones. For instance, it can develop multi-action plans by stringing single actions together, as well as generalize or specialize their contexts or goals, as sketched below.
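To illustrate the chaining idea, here is a toy Python sketch of how two cognitive schematics might be combined into a multi-action plan. The Schematic dataclass, its field names, and the confidence arithmetic are simplifications invented for this example; ROCCA represents cognitive schematics as predictive implications with full truth values in the AtomSpace.

```python
from dataclasses import dataclass


@dataclass
class Schematic:
    context: str       # what must hold now
    actions: list      # action sequence to execute
    delay: int         # how many cycles until the outcome is expected
    outcome: str       # predicted result
    confidence: float  # how reliably the pattern held in past observations


# Two single-action schematics, as pattern mining might produce them.
grab_key = Schematic("near_key", ["grab_key"], 1, "holding_key", 0.9)
open_door = Schematic("holding_key", ["open_door"], 1, "door_open", 0.8)


def chain(first: Schematic, second: Schematic) -> Schematic:
    """Temporal deduction: if the first schematic's outcome matches the
    second's context, combine them into one multi-action plan."""
    assert first.outcome == second.context
    return Schematic(first.context,
                     first.actions + second.actions,
                     first.delay + second.delay,
                     second.outcome,
                     first.confidence * second.confidence)


plan = chain(grab_key, open_door)
print(plan)  # near_key + [grab_key, open_door] -> door_open in 2 steps
```

This is, in spirit, the kind of multi-step plan (fetch the key, then open the door) that the Minecraft experiment below requires.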
Finding Diamonds in Minecraft With ROCCA
To demonstrate ROCCA’s learning and problem-solving capabilities, SingularityNET and TrueAGI created a Minecraft world comprising a house filled with diamonds and a key. Then, they deployed the agent into it.
ROCCA’s objective was to retrieve the key, located somewhere in the vicinity of the house, and then unlock the house’s door. Once the door was unlocked, the agent could collect a diamond and receive a reward.
The aim of the experiment was to make ROCCA learn from interacting with the Minecraft environment and collect as many diamonds as possible.
The experiment consisted of two training iterations of fifty control cycles each, separated by a learning phase lasting a few hours. During the first iteration, no learning took place, as the agent had no prior knowledge; it simply explored the environment at random.
Afterwards, it entered the learning phase, discovering cognitive schematics through the pattern mining and temporal reasoning processes described above, which led it to achieve its goal more frequently during the second training iteration.
During this experiment, ROCCA performed multiple random actions, evaluated their outcomes, and stored everything it perceived from the environment in the AtomSpace. It then found patterns within the data it had collected about the world and combined them with reasoning to construct plans to reach its goal.
Finally, the agent carried out the plans it had constructed and successfully collected diamonds within the world. At the same time, it continued to explore the environment and gather data, further improving and optimizing its action plans.
Want to see ROCCA in action? Try Vereya, our custom Minecraft mod used to study neural-symbolic agents. Download Vereya from GitHub.
AI Can Help Us Solve Problems Without Supervision
ROCCA demonstrates how the OpenCog Hyperon framework can be used to control artificial intelligence agents in uncertain environments, allowing them to learn about the context and identify patterns to construct action plans and successfully achieve specific goals.
Moreover, the agent performs the full chain of reasoning and inference on its own, from learning to planning to executing those plans, which grants it full independence from the moment it is deployed.
Even at this exploratory stage, ROCCA has the potential to foster greater capabilities for meta-learning and self-improvement in AI-powered systems, capabilities that current LLMs largely lack.
We are confident that this is only the beginning and that, based on the foundations ROCCA is laying, artificial intelligence systems will continue to become more effective, efficient, and independent at solving real-world problems.