Developing Gameplay with Q-Learning
1. Introduction
Welcome to this tutorial! Here, we'll walk you through how to use the Q-Learning reinforcement learning algorithm to develop gameplay in the Dora SSR game engine. Don't worry if you're new to machine learning and game development; this tutorial is designed to be easy to understand.
2. What is Reinforcement Learning and Q-Learning?
Reinforcement Learning is a type of machine learning in which an agent takes actions in an environment to earn rewards or penalties, learning to maximize cumulative rewards.
Q-Learning is a model-free reinforcement learning algorithm. It estimates the maximum expected reward for taking an action `a` in a state `s` by learning a state-action value function `Q(s, a)`.
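For reference, the standard textbook Q-Learning update applied after taking action `a` in state `s`, receiving reward `r`, and arriving in a new state `s'` is:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

Here α (alpha) is the learning rate and γ (gamma) is the discount factor; both reappear below as parameters of the `QLearner` constructor.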
2.1 Applying Q-Learning in Game Development
In a game, the game character can be seen as the agent, and the game world is the environment. Through Q-Learning, the character can gradually learn the best actions to maximize rewards, like defeating enemies or collecting items, based on different states.
3. Understanding the QLearner Object
The Dora SSR engine provides a `QLearner` object, which includes the methods needed to implement Q-Learning. Here are the main methods and properties:
- `pack(hints, values)`: Combines multiple conditions into a unique state value.
- `QLearner(gamma, alpha, maxQ)`: Creates a `QLearner` instance.
- `update(state, action, reward)`: Updates Q-values based on the reward.
- `getBestAction(state)`: Retrieves the best action for a given state.
- `matrix`: A matrix storing states, actions, and their corresponding Q-values.
- `load(values)`: Loads Q-values from a known state-action pair matrix (see the sketch after this list).
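Although this tutorial focuses on `pack()` and the training loop, the `matrix` property and `load()` method can be paired to persist what an agent has learned between game sessions. The sketch below simply hands one learner's matrix to another; it assumes nothing about the matrix's internal layout beyond what the list above describes:

```lua
local ML = require("ML")

-- A learner that has already gone through some training (training loop omitted here).
local trainedLearner = ML.QLearner(0.5, 0.5, 100.0)
-- ... training loop runs here ...

-- Snapshot the learned state-action Q-values.
local snapshot = trainedLearner.matrix

-- Later, restore them into a fresh learner so it starts with the same knowledge.
local restoredLearner = ML.QLearner(0.5, 0.5, 100.0)
restoredLearner:load(snapshot)
```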
3.1 QLearner:pack() Detailed Explanation
Function Overview
The `QLearner:pack()` method combines multiple discrete conditions into a unique state value. It accepts two parameters:
- `hints`: An integer array indicating the number of possible values for each condition.
- `values`: An integer array representing the current value of each condition.
Why is pack() Necessary?
In reinforcement learning, states are often made up of multiple features. To store and retrieve these states efficiently in a Q-table, we need to combine the features into a single unique state identifier. The `pack()` method serves this purpose.
Working Principle
Imagine we have two conditions:
- Weather conditions with three possibilities: sunny (0), cloudy (1), and rainy (2).
- Number of enemies with two possibilities: few (0) and many (1).
Thus, `hints = {3, 2}` means the first condition has three possible values and the second has two.
If it is currently cloudy and there are many enemies, then `values = {1, 1}`.
Using `pack(hints, values)`, we can convert `values` into a unique state integer. For example:
- Lua
- Teal
- TypeScript
- YueScript
local ML = require("ML")
local state = ML.QLearner:pack({3, 2}, {1, 1})
print(state) -- Outputs a unique integer representing the current state
local ML = require("ML")
local state = ML.QLearner:pack({3, 2}, {1, 1})
print(state) -- Outputs a unique integer representing the current state
import { ML } from "Dora";
const state = ML.QLearner.pack([3, 2], [1, 1]);
print(state); // Outputs a unique integer representing the current state
_ENV = Dora
state = ML.QLearner\pack [3, 2], [1, 1]
print state -- Outputs a unique integer representing the current state
Mathematical Principle
The `pack()` method combines multiple conditions by encoding each one as a binary number and using bitwise operations to produce a unique integer.
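The engine handles this encoding internally. Purely for intuition, the sketch below shows one way such packing can work: give each condition a bit range wide enough for its number of possible values (`hints`), then shift each value into its own range. This is an illustration of the idea, not the engine's actual implementation:

```lua
-- Illustrative only: combine condition values into a single integer by giving
-- each condition its own bit range, sized from its number of possible values.
local function packIllustration(hints, values)
	local state, shift = 0, 0
	for i = 1, #hints do
		-- Bits needed to represent values 0 .. hints[i] - 1
		local bits = math.max(1, math.ceil(math.log(hints[i], 2)))
		state = state + (values[i] << shift) -- place this condition in its bit range
		shift = shift + bits
	end
	return state
end

print(packIllustration({3, 2}, {1, 1})) -- prints 5 with this illustrative encoding
```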
4. Step-by-Step Implementation
4.1 Importing the QLearner Module
First, import the `ML` module and create a `QLearner` instance:
- Lua
- Teal
- TypeScript
- YueScript
local ML = require("ML")
local qLearner = ML.QLearner(0.5, 0.5, 100.0) -- Adjust gamma, alpha, maxQ as needed
local ML = require("ML")
local qLearner = ML.QLearner(0.5, 0.5, 100.0) -- Adjust gamma, alpha, maxQ as needed
import { ML } from "Dora";
const qLearner = ML.QLearner(0.5, 0.5, 100.0); // Adjust gamma, alpha, maxQ as needed
_ENV = Dora
qLearner = ML.QLearner 0.5, 0.5, 100.0 -- Adjust gamma, alpha, maxQ as needed
4.2 Defining States and Actions
Let's assume we want the game character to learn which weapon to use in different environments. Our conditions and actions might look like this:
- Conditions (State Features):
- Environment type (3 types): Forest (0), Desert (1), Snow (2)
- Enemy type (2 types): Infantry (0), Tank (1)
- Actions:
- Use handgun (1)
- Use rocket launcher (2)
- Use sniper rifle (3)
4.3 Constructing State Values with the pack() Method
- Lua
- Teal
- TypeScript
- YueScript
local hints = {3, 2} -- Number of values for each condition
local environment = 1 -- Desert
local enemy = 0 -- Infantry
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
local hints = {3, 2} -- Number of values for each condition
local environment = 1 -- Desert
local enemy = 0 -- Infantry
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
const hints = [3, 2]; // Number of values for each condition
const environment = 1; // Desert
const enemy = 0; // Infantry
const stateValues = [environment, enemy];
const state = ML.QLearner.pack(hints, stateValues);
hints = [3, 2] -- Number of values for each condition
environment = 1 -- Desert
enemy = 0 -- Infantry
stateValues = [environment, enemy]
state = ML.QLearner\pack hints, stateValues
4.4 Choosing an Action
- Lua
- Teal
- TypeScript
- YueScript
local action = qLearner:getBestAction(state)
if action == 0 then -- 0 indicates no best action
-- Choose a random action if no best action exists
action = math.random(1, 3)
end
local action = qLearner:getBestAction(state)
if action == 0 then -- 0 indicates no best action
-- Choose a random action if no best action exists
action = math.random(1, 3)
end
let action = qLearner.getBestAction(state);
if (action === 0) { // 0 indicates no best action
// Choose a random action if no best action exists
action = Math.floor(Math.random() * 3) + 1;
}
action = qLearner\getBestAction state
if action == 0 -- 0 indicates no best action
-- Choose a random action if no best action exists
action = math.random 1, 3
4.5 Performing the Action and Receiving Rewards
- Lua
- Teal
- TypeScript
- YueScript
local reward = 0
if action == 1 then
-- Logic for using the handgun
reward = 10 -- Hypothetical reward value
elseif action == 2 then
-- Logic for using the rocket launcher
reward = 20
elseif action == 3 then
-- Logic for using the sniper rifle
reward = 15
end
local reward = 0
if action == 1 then
-- Logic for using the handgun
reward = 10 -- Hypothetical reward value
elseif action == 2 then
-- Logic for using the rocket launcher
reward = 20
elseif action == 3 then
-- Logic for using the sniper rifle
reward = 15
end
let reward = 0;
if (action === 1) {
// Logic for using the handgun
reward = 10; // Hypothetical reward value
} else if (action === 2) {
// Logic for using the rocket launcher
reward = 20;
} else if (action === 3) {
// Logic for using the sniper rifle
reward = 15;
}
reward = switch action
when 1 -- Logic for using the handgun
10 -- Hypothetical reward value
when 2 -- Logic for using the rocket launcher
20
when 3 -- Logic for using the sniper rifle
15
4.6 Updating Q-Values
- Lua
- Teal
- TypeScript
- YueScript
qLearner:update(state, action, reward)
qLearner:update(state, action, reward)
qLearner.update(state, action, reward);
qLearner\update state, action, reward
4.7 Training Loop
Place the steps above in a loop so the agent can continually learn and refine its strategy. Each iteration of a typical Q-Learning training loop follows the same cycle: observe the current state, choose an action, execute it and receive a reward, then update the Q-value.
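Here is a compact skeleton of that loop. The state features and reward are stand-ins (random conditions and a constant reward); the next section fills in a full, runnable example with real reward logic:

```lua
local ML = require("ML")
local qLearner = ML.QLearner(0.5, 0.5, 100.0)
local hints, actions = {3, 2}, {1, 2, 3}
local explorationRate = 0.1

for episode = 1, 1000 do
	-- 1. Observe the current state (random stand-ins for real game conditions)
	local state = ML.QLearner:pack(hints, {math.random(0, 2), math.random(0, 1)})
	-- 2. Pick the best known action, or explore occasionally
	local action = qLearner:getBestAction(state)
	if action == 0 or math.random() < explorationRate then
		action = actions[math.random(#actions)]
	end
	-- 3. Execute the action and measure the outcome (placeholder reward)
	local reward = 10
	-- 4. Reinforce what was learned
	qLearner:update(state, action, reward)
end
```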
5. Complete Code Example
Below is a complete code example demonstrating how to use `QLearner` in the Dora SSR engine to implement simple reinforcement learning. This example lets an agent learn to choose the best weapon for different environments and enemy types.
- Lua
- Teal
- TypeScript
- YueScript
-- Import the ML module
local ML = require("ML")
-- Create a QLearner instance with gamma, alpha, and maxQ set
local qLearner = ML.QLearner(0.5, 0.5, 100.0)
-- Define the number of possible values for each condition (hints)
-- Environment types: Forest (0), Desert (1), Snowy (2) => 3 types
-- Enemy types: Infantry (0), Tank (1) => 2 types
local hints = {3, 2}
-- Define action set
-- Use Handgun (1), Use Rocket Launcher (2), Use Sniper Rifle (3)
local actions = {1, 2, 3}
-- Simulate multiple learning iterations
for episode = 1, 1000 do
-- Randomly generate the current environment and enemy type
local environment = math.random(0, 2) -- 0: Forest, 1: Desert, 2: Snowy
local enemy = math.random(0, 1) -- 0: Infantry, 1: Tank
-- Use pack() method to combine current conditions into a unique state value
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
-- Attempt to get the best action for the given state
local action = qLearner:getBestAction(state)
-- If there is no best action, randomly select an action (exploration)
if action == 0 then
action = actions[math.random(#actions)]
else
-- With a certain probability, choose a random action to explore new strategies (ε-greedy strategy)
local explorationRate = 0.1 -- 10% chance to explore
if math.random() < explorationRate then
action = actions[math.random(#actions)]
end
end
-- Execute the action and get a reward based on the current environment and enemy type
local reward = 0
if action == 1 then -- Use Handgun
if enemy == 0 then -- Against Infantry (advantage)
reward = 20
else -- Against Tank (disadvantage)
reward = -10
end
elseif action == 2 then -- Use Rocket Launcher
if enemy == 1 then -- Against Tank (advantage)
reward = 30
else -- Against Infantry (disadvantage)
reward = 0
end
elseif action == 3 then -- Use Sniper Rifle
if environment == 2 then -- In Snowy environment (advantage)
reward = 25
else
reward = 10
end
end
-- Update Q value
qLearner:update(state, action, reward)
end
-- Test learning results
print("Learning complete, starting tests...")
-- Define test scenarios
local testScenarios = {
{environment = 0, enemy = 0}, -- Forest, against Infantry
{environment = 1, enemy = 1}, -- Desert, against Tank
{environment = 2, enemy = 0}, -- Snowy, against Infantry
}
for i, scenario in ipairs(testScenarios) do
local stateValues = {scenario.environment, scenario.enemy}
local state = ML.QLearner:pack(hints, stateValues)
local action = qLearner:getBestAction(state)
-- Display test results
local envNames = {"Forest", "Desert", "Snowy"}
local enemyNames = {"Infantry", "Tank"}
local actionNames = {"Handgun", "Rocket Launcher", "Sniper Rifle"}
print(string.format("Scenario %d: Environment-%s, Enemy-%s => Recommended Use %s",
i,
envNames[scenario.environment + 1],
enemyNames[scenario.enemy + 1],
actionNames[action]))
end
-- Import the ML module
local ML = require("ML")
-- Create a QLearner instance with gamma, alpha, and maxQ set
local qLearner = ML.QLearner(0.5, 0.5, 100.0)
-- Define the number of possible values for each condition (hints)
-- Environment types: Forest (0), Desert (1), Snowy (2) => 3 types
-- Enemy types: Infantry (0), Tank (1) => 2 types
local hints = {3, 2}
-- Define action set
-- Use Handgun (1), Use Rocket Launcher (2), Use Sniper Rifle (3)
local actions = {1, 2, 3}
-- Simulate multiple learning iterations
for episode = 1, 1000 do
-- Randomly generate the current environment and enemy type
local environment = math.random(0, 2) -- 0: Forest, 1: Desert, 2: Snowy
local enemy = math.random(0, 1) -- 0: Infantry, 1: Tank
-- Use pack() method to combine current conditions into a unique state value
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
-- Attempt to get the best action for the given state
local action = qLearner:getBestAction(state)
-- If there is no best action, randomly select an action (exploration)
if action == 0 then
action = actions[math.random(#actions)]
else
-- With a certain probability, choose a random action to explore new strategies (ε-greedy strategy)
local explorationRate = 0.1 -- 10% chance to explore
if math.random() < explorationRate then
action = actions[math.random(#actions)]
end
end
-- Execute the action and get a reward based on the current environment and enemy type
local reward = 0
if action == 1 then -- Use Handgun
if enemy == 0 then -- Against Infantry (advantage)
reward = 20
else -- Against Tank (disadvantage)
reward = -10
end
elseif action == 2 then -- Use Rocket Launcher
if enemy == 1 then -- Against Tank (advantage)
reward = 30
else -- Against Infantry (disadvantage)
reward = 0
end
elseif action == 3 then -- Use Sniper Rifle
if environment == 2 then -- In Snowy environment (advantage)
reward = 25
else
reward = 10
end
end
-- Update Q value
qLearner:update(state, action, reward)
end
-- Test learning results
print("Learning complete, starting tests...")
-- Define test scenarios
local testScenarios = {
{environment = 0, enemy = 0}, -- Forest, against Infantry
{environment = 1, enemy = 1}, -- Desert, against Tank
{environment = 2, enemy = 0}, -- Snowy, against Infantry
}
for i, scenario in ipairs(testScenarios) do
local stateValues = {scenario.environment, scenario.enemy}
local state = ML.QLearner:pack(hints, stateValues)
local action = qLearner:getBestAction(state)
-- Display test results
local envNames = {"Forest", "Desert", "Snowy"}
local enemyNames = {"Infantry", "Tank"}
local actionNames = {"Handgun", "Rocket Launcher", "Sniper Rifle"}
print(string.format("Scenario %d: Environment-%s, Enemy-%s => Recommended Use %s",
i,
envNames[scenario.environment + 1],
enemyNames[scenario.enemy + 1],
actionNames[action]))
end
// Import the ML module
import { ML } from "Dora";
// Create a QLearner instance with gamma, alpha, and maxQ set
const qLearner = ML.QLearner(0.5, 0.5, 100.0);
// Define the number of possible values for each condition (hints)
// Environment types: Forest (0), Desert (1), Snowy (2) => 3 types
// Enemy types: Infantry (0), Tank (1) => 2 types
const hints = [3, 2];
// Define action set
// Use Handgun (1), Use Rocket Launcher (2), Use Sniper Rifle (3)
const actions = [1, 2, 3];
// Simulate multiple learning iterations
for (let episode = 1; episode <= 1000; episode++) {
// Randomly generate the current environment and enemy type
const environment = math.random(0, 2); // 0: Forest, 1: Desert, 2: Snowy
const enemy = math.random(0, 1); // 0: Infantry, 1: Tank
// Use pack() method to combine current conditions into a unique state value
const stateValues = [environment, enemy];
const state = ML.QLearner.pack(hints, stateValues);
// Attempt to get the best action for the given state
let action = qLearner.getBestAction(state);
// If there is no best action, randomly select an action (exploration)
if (action === 0) {
action = actions[math.random(actions.length) - 1];
} else {
// With a certain probability, choose a random action to explore new strategies (ε-greedy strategy)
const explorationRate = 0.1; // 10% chance to explore
if (math.random() < explorationRate) {
action = actions[math.random(actions.length) - 1];
}
}
// Execute the action and get a reward based on the current environment and enemy type
let reward = 0;
if (action === 1) { // Use Handgun
if (enemy === 0) { // Against Infantry (advantage)
reward = 20;
} else { // Against Tank (disadvantage)
reward = -10;
}
} else if (action === 2) { // Use Rocket Launcher
if (enemy === 1) { // Against Tank (advantage)
reward = 30;
} else { // Against Infantry (disadvantage)
reward = 0;
}
} else if (action === 3) { // Use Sniper Rifle
if (environment === 2) { // In Snowy environment (advantage)
reward = 25;
} else {
reward = 10;
}
}
// Update Q value
qLearner.update(state, action, reward);
}
// Test learning results
print("Learning complete, starting tests...");
// Define test scenarios
const testScenarios = [
{ environment: 0, enemy: 0 }, // Forest, against Infantry
{ environment: 1, enemy: 1 }, // Desert, against Tank
{ environment: 2, enemy: 0 }, // Snowy, against Infantry
];
for (let i = 0; i < testScenarios.length; i++) {
const scenario = testScenarios[i];
const stateValues = [scenario.environment, scenario.enemy];
const state = ML.QLearner.pack(hints, stateValues);
const action = qLearner.getBestAction(state);
// Display test results
const envNames = ["Forest", "Desert", "Snowy"];
const enemyNames = ["Infantry", "Tank"];
const actionNames = ["Handgun", "Rocket Launcher", "Sniper Rifle"];
print(string.format("Scenario %d: Environment-%s, Enemy-%s => Recommended Use %s",
i + 1,
envNames[scenario.environment],
enemyNames[scenario.enemy],
actionNames[action - 1]));
}
-- Import the ML module
_ENV = Dora
-- Create a QLearner instance with gamma, alpha, and maxQ set
qLearner = ML.QLearner 0.5, 0.5, 100.0
-- Define the number of possible values for each condition (hints)
-- Environment types: Forest (0), Desert (1), Snowy (2) => 3 types
-- Enemy types: Infantry (0), Tank (1) => 2 types
hints = [3, 2]
-- Define action set
-- Use Handgun (1), Use Rocket Launcher (2), Use Sniper Rifle (3)
actions = [1, 2, 3]
-- Simulate multiple learning iterations
for episode = 1, 1000
-- Randomly generate the current environment and enemy type
environment = math.random 0, 2 -- 0: Forest, 1: Desert, 2: Snowy
enemy = math.random 0, 1 -- 0: Infantry, 1: Tank
-- Use pack() method to combine current conditions into a unique state value
stateValues = [environment, enemy]
state = ML.QLearner\pack hints, stateValues
-- Attempt to get the best action for the given state
action = qLearner\getBestAction state
-- If there is no best action, randomly select an action (exploration)
if action == 0
action = actions[math.random #actions]
else
-- With a certain probability, choose a random action to explore new strategies (ε-greedy strategy)
explorationRate = 0.1 -- 10% chance to explore
if math.random! < explorationRate
action = actions[math.random #actions]
-- Execute the action and get a reward based on the current environment and enemy type
reward = 0
reward = switch action
when 1 -- Use Handgun
if enemy == 0 -- Against Infantry (advantage)
20
else -- Against Tank (disadvantage)
-10
when 2 -- Use Rocket Launcher
if enemy == 1 -- Against Tank (advantage)
30
else -- Against Infantry (disadvantage)
0
when 3 -- Use Sniper Rifle
if environment == 2 -- In Snowy environment (advantage)
25
else
10
-- Update Q value
qLearner\update state, action, reward
-- Test learning results
print "Learning complete, starting tests..."
testScenarios =
* environment: 0 -- Forest, against Infantry
enemy: 0
* environment: 1 -- Desert, against Tank
enemy: 1
* environment: 2 -- Snowy, against Infantry
enemy: 0
for i, scenario in ipairs testScenarios
stateValues = [scenario.environment, scenario.enemy]
state = ML.QLearner\pack hints, stateValues
action = qLearner\getBestAction state
-- Display test results
envNames = ["Forest", "Desert", "Snowy"]
enemyNames = ["Infantry", "Tank"]
actionNames = ["Handgun", "Rocket Launcher", "Sniper Rifle"]
print string.format "Scenario %d: Environment-%s, Enemy-%s => Recommended Use %s",
i,
envNames[scenario.environment + 1],
enemyNames[scenario.enemy + 1],
actionNames[action]
5.1 Code Explanation
1. Import Modules and Create a QLearner Instance
- Lua
- Teal
- TypeScript
- YueScript
local ML = require("ML")
local qLearner = ML.QLearner(0.5, 0.5, 100.0)
local ML = require("ML")
local qLearner = ML.QLearner(0.5, 0.5, 100.0)
import { ML } from "Dora";
const qLearner = ML.QLearner(0.5, 0.5, 100.0);
_ENV = Dora
qLearner = ML.QLearner 0.5, 0.5, 100.0
Create a QLearner instance and set gamma, alpha, and maxQ.
- gamma: The discount factor, influencing the weight of future rewards.
- alpha: The learning rate, determining the influence of new information on Q-value updates.
- maxQ: The maximum limit of Q-values to prevent them from growing indefinitely.
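For intuition about these parameters: under the standard update rule shown in Section 2, with `alpha = 0.5`, a Q-value that currently holds 0 and receives a reward of 20 moves to roughly 10 after a single update (ignoring the future-reward term), while `maxQ = 100.0` caps how large any entry is allowed to grow.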
2. Define State Features and Action Set
- Lua
- Teal
- TypeScript
- YueScript
local hints = {3, 2}
local actions = {1, 2, 3}
local hints = {3, 2}
local actions = {1, 2, 3}
const hints = [3, 2];
const actions = [1, 2, 3];
hints = [3, 2]
actions = [1, 2, 3]
Define the number of possible values for each condition (`hints`) and the action set. Note that action numbering starts from 1, since a return value of 0 from `getBestAction()` indicates that no best action is known yet.
3. Run Learning Iterations
- Lua
- Teal
- TypeScript
- YueScript
for episode = 1, 1000 do
-- Learning process
end
for episode = 1, 1000 do
-- Learning process
end
for (let episode = 1; episode <= 1000; episode++) {
// Learning process
}
for episode = 1, 1000
-- Learning process
Use a loop to simulate multiple episodes so the agent can learn from many different states. The episode count is set to 1000 here and can be adjusted as needed; more episodes give the agent more experience to learn from.
4. Randomly Generate Environment and Enemy Type
- Lua
- Teal
- TypeScript
- YueScript
local environment = math.random(0, 2)
local enemy = math.random(0, 1)
local environment = math.random(0, 2)
local enemy = math.random(0, 1)
const environment = math.random(0, 2);
const enemy = math.random(0, 1);
environment = math.random 0, 2
enemy = math.random 0, 1
Randomly generating the environment and enemy type simulates diverse game scenarios, helping the agent accumulate broader experience.
5. Construct State Values Using the pack() Method
- Lua
- Teal
- TypeScript
- YueScript
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
local stateValues = {environment, enemy}
local state = ML.QLearner:pack(hints, stateValues)
const stateValues = [environment, enemy];
const state = ML.QLearner.pack(hints, stateValues);
stateValues = [environment, enemy]
state = ML.QLearner\pack hints, stateValues
Combine multiple conditions generated in this episode into a unique state integer for storage and retrieval in the Q-table.
6. Choose an Action
- Lua
- Teal
- TypeScript
- YueScript
local action = qLearner:getBestAction(state)
if action == 0 then
action = actions[math.random(#actions)]
else
local explorationRate = 0.1
if math.random() < explorationRate then
action = actions[math.random(#actions)]
end
end
local action = qLearner:getBestAction(state)
if action == 0 then
action = actions[math.random(#actions)]
else
local explorationRate = 0.1
if math.random() < explorationRate then
action = actions[math.random(#actions)]
end
end
let action = qLearner.getBestAction(state);
if (action === 0) {
action = actions[math.random(actions.length) - 1];
} else {
const explorationRate = 0.1;
if (math.random() < explorationRate) {
action = actions[math.random(actions.length) - 1];
}
}
action = qLearner\getBestAction state
if action == 0
action = actions[math.random #actions]
else
explorationRate = 0.1
if math.random! < explorationRate
action = actions[math.random #actions]
Use the `getBestAction(state)` method to retrieve the best known action for the current state, and apply an ε-greedy strategy for exploration. This strategy balances exploiting known information with exploring new options.
7. Execute Action and Get Reward
- Lua
- Teal
- TypeScript
- YueScript
local reward = 0
if action == 1 then
-- Calculate reward based on action and current state
end
local reward = 0
if action == 1 then
-- Calculate reward based on action and current state
end
let reward = 0;
if (action === 1) {
// Calculate reward based on action and current state
}
reward = 0
reward = switch action
when 1 -- Use Handgun
-- Calculate reward based on action and current state
Set the reward based on game logic. In a real game, the reward might be calculated after a series of actions.
8. Update Q Value
- Lua
- Teal
- TypeScript
- YueScript
qLearner:update(state, action, reward)
qLearner:update(state, action, reward)
qLearner.update(state, action, reward);
qLearner\update state, action, reward
Update the Q value with the received reward to improve strategy, allowing the agent to select optimal actions in different states.
9. Test the Learning Outcome
- Lua
- Teal
- TypeScript
- YueScript
for i, scenario in ipairs(testScenarios) do
-- Test different scenarios to view agent decisions
end
for i, scenario in ipairs(testScenarios) do
-- Test different scenarios to view agent decisions
end
for (let i = 0; i < testScenarios.length; i++) {
// Test different scenarios to view agent decisions
}
for i, scenario in ipairs testScenarios
-- Test different scenarios to view agent decisions
Run the tests and display the agent's decisions across the different scenarios. Note that the decisions may vary between runs, since the training scenarios are generated randomly.
5.2 Sample Output
Learning complete, starting tests...
Scenario 1: Environment-Forest, Enemy-Infantry => Recommended Use Handgun
Scenario 2: Environment-Desert, Enemy-Tank => Recommended Use Rocket Launcher
Scenario 3: Environment-Snowy, Enemy-Infantry => Recommended Use Sniper Rifle
6. Summary
With this complete code example, we achieved the following goals:
- Using the QLearner:pack() Method: Combined multiple conditions into a unique state value for easy storage and retrieval in the Q-table.
- Built a Reinforcement Learning Loop: Allowed the agent to try actions, gain rewards, and update its strategy across different states.
- Implemented ε-greedy Strategy: Balanced following known best strategies while maintaining exploration.
- Tested Learning Outcomes: Validated whether the agent learned to select optimal actions in predefined scenarios.
We hope this example helps you understand how to use the Q-Learning algorithm in the Dora SSR engine to develop game mechanics. For any questions, feel free to join our community for discussion!