This project implements a value-iteration-based multi-agent reinforcement learning algorithm for solving the Nash equilibrium problem in multi-agent systems.
The algorithm is based on the following key components:
- Value Iteration: Used to update Q-values and optimal policies for each agent.
- Actor-Critic Network: Approximates the value function and policy function.
- Gradient Clipping: Prevents gradient explosion problems.
- Adaptive Learning Rate: Decays over time to ensure the algorithm converges (a minimal sketch of these update mechanics follows this list).
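The snippet below is an illustrative sketch only: the feature matrix, targets, and constants are synthetic placeholders rather than values from the project code. It shows how a value-iteration-style critic update can be combined with gradient clipping and a decaying learning rate.

```matlab
% Illustrative sketch: all quantities are synthetic placeholders,
% not values or variable names from the project.
phi      = randn(8, 100);      % placeholder feature matrix (8 features, 100 samples)
target   = randn(1, 100);      % placeholder value-iteration targets
W        = zeros(8, 1);        % critic weight vector
alpha0   = 0.05;               % initial learning rate
clipNorm = 1.0;                % gradient clipping threshold

for k = 1:500
    alpha = alpha0 / (1 + 0.01 * k);       % adaptive learning rate (decays over time)
    err   = W' * phi - target;             % Bellman-style residual on the batch
    gradW = (phi * err') / size(phi, 2);   % gradient of the mean squared residual
    if norm(gradW) > clipNorm              % gradient clipping
        gradW = gradW * (clipNorm / norm(gradW));
    end
    W = W - alpha * gradW;                 % value-iteration style weight update
end
```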
The main functions are listed below (a conceptual sketch of the dynamics and tracking-error computations follows the list):
- `main_simulation()`: Main simulation loop
- `value_iteration()`: Performs value iteration updates
- `compute_Mi()`: Calculates the Mi matrix for each agent
- `actor_critic_network()`: Implements the Actor-Critic network
- `tracking_error()`: Computes tracking errors
- `system_dynamics()`: Simulates system dynamics
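As a rough picture of what `system_dynamics()` and `tracking_error()` compute, the sketch below steps a simple discrete-time model and measures each agent's deviation from a reference state. The matrices, dimensions, and variable names are assumptions made for illustration, not the project's actual model or interfaces.

```matlab
% Illustrative only: A, B, the agent states, and the reference are placeholders,
% not the project's actual multi-agent model.
A = [1 0.1; 0 1];                  % assumed per-agent state-transition matrix
B = [0; 0.1];                      % assumed input matrix
numAgents = 3;
x   = randn(2, numAgents);         % placeholder agent states (one column per agent)
u   = -0.5 * ones(1, numAgents);   % example control inputs
x_d = zeros(2, 1);                 % placeholder reference (desired) state

xNext = A * x + B * u;                      % one-step dynamics for all agents
e     = xNext - repmat(x_d, 1, numAgents);  % per-agent tracking error
disp(sqrt(sum(e.^2, 1)));                   % error magnitude for each agent
```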
Here are some key results from running the algorithm:
- Weight convergence: how the weights of the Critic and Actor networks change over time.
- Tracking errors: the tracking error of each agent over time.
- State trajectories: how the state of each agent evolves over time.
- Control inputs: how the control input of each agent changes over time.
- Ensure your MATLAB environment is properly configured.
- Run the `main_simulation()` function to start the simulation (see the example below).
- Results will be automatically saved in the `result` directory.
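Assuming `main_simulation()` takes no required input arguments (the actual signature may differ), a typical run from the MATLAB command window looks like this:

```matlab
% From the MATLAB command window, with the project root as the current folder:
addpath(genpath(pwd));   % make all project functions visible on the path
main_simulation();       % run the full simulation
% Generated plots and data should then appear in the result/ directory.
```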
- Algorithm performance may be affected by initial parameter settings.
- For large-scale systems, learning rates and iteration counts may need to be adjusted (see the sketch below).
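If tuning is needed, the relevant knobs typically sit near the top of the main script. The names below are hypothetical placeholders, not the project's actual variables; they only indicate the kind of values one might lower or raise for a larger system.

```matlab
% Hypothetical tuning parameters; the project's actual variable names may differ.
alpha0   = 0.01;    % smaller initial learning rate for a larger, stiffer system
maxIter  = 2000;    % more value-iteration sweeps to compensate for slower updates
clipNorm = 0.5;     % tighter gradient clipping if updates become unstable
```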
- Implement more complex reward functions
- Explore other types of Actor-Critic architectures
- Test algorithm performance on real physical systems