Reinforcement Learning for Inverted Pendulum Control
Overview
This project applied Reinforcement Learning (RL) algorithms to control both single and double inverted pendulum systems. Using Q-Learning, DQN, A2C, PPO, and DDPG, we implemented controllers for swing-up and stabilization tasks. The project explored the dynamic complexity of inverted pendulum systems and highlighted the effectiveness of RL techniques for non-linear control problems.
Results
- Single Inverted Pendulum:
  - Achieved a 100% success rate for swing-up and stabilization tasks under ideal conditions.
  - Maintained a 90% success rate under noisy conditions over a 30-second simulation.
- Double Inverted Pendulum:
  - Successfully stabilized the pendulum, but swing-up proved difficult for model-free RL methods.
- Performance Metrics:
  - Trained RL models for the swing-up and stabilization tasks in under 50,000 episodes.
  - Demonstrated the effectiveness of custom reward functions for dynamic control tasks.
Report PDF | GitHub (Chinese README)

Technical Details
- Algorithms Applied:
  - Q-Learning and DQN: Explored discrete action spaces in the initial experiments (a minimal tabular Q-learning sketch follows this list).
  - A2C and PPO: Achieved robust performance on stabilization tasks in continuous action spaces.
  - DDPG: Provided smooth control for swing-up tasks via deterministic policy gradients.
- Custom Toolkit:
  - Developed RL agents from scratch in PyTorch, including functions for initialization, model updates, and action sampling (see the agent skeleton below).
  - Designed visualization tools to monitor reward curves and other training metrics.
- Reward Design (sketched in code below):
  - Swing-up Task: Rewarded greater pendulum height while penalizing angular velocity near the peak.
  - Stabilization Task: Encouraged minimal deviation from the vertical position and low angular velocity.
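To make the discrete-action experiments concrete, here is a minimal tabular Q-learning sketch for a discretized pendulum. The bin counts, state bounds, and hyperparameters are illustrative assumptions, not the project's actual settings.

```python
import numpy as np

# Illustrative discretization: angle/velocity bins and 3 torque levels (all assumed).
N_ANGLE, N_VEL, N_ACTIONS = 32, 32, 3
Q = np.zeros((N_ANGLE, N_VEL, N_ACTIONS))
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # assumed hyperparameters

def discretize(theta, theta_dot, max_vel=8.0):
    """Map a continuous state to table indices; theta is the angle from
    upright, wrapped to [-pi, pi] (bounds are assumptions)."""
    a = int(np.clip((theta + np.pi) / (2 * np.pi) * N_ANGLE, 0, N_ANGLE - 1))
    v = int(np.clip((theta_dot + max_vel) / (2 * max_vel) * N_VEL, 0, N_VEL - 1))
    return a, v

def select_action(state, rng):
    """Epsilon-greedy action selection over the Q-table."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """One-step Q-learning backup toward the TD target."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state][action] += ALPHA * (td_target - Q[state][action])
```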
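The toolkit's actual interface is not shown in this write-up; the skeleton below is a hypothetical reconstruction of the structure it describes (initialization, action sampling, model updates), with all class and parameter names assumed.

```python
import torch
import torch.nn as nn

class PolicyAgent:
    """Hypothetical skeleton mirroring the toolkit structure described above."""

    def __init__(self, obs_dim, act_dim, lr=3e-4):
        # Initialization: a small Gaussian policy network and its optimizer.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.optimizer = torch.optim.Adam(
            list(self.net.parameters()) + [self.log_std], lr=lr)

    def sample_action(self, obs):
        # Action sampling: draw from a Gaussian centred on the network output.
        mean = self.net(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = dist.sample()
        return action.numpy(), dist.log_prob(action).sum()

    def update(self, loss):
        # Model update: one gradient step on a caller-supplied policy loss.
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
```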
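The exact reward shaping is not given in this summary; the functions below are a minimal sketch of the described design, with the angle convention and all weights assumed.

```python
import numpy as np

def swing_up_reward(theta, theta_dot, w_vel=0.1):
    """Reward pendulum height and damp angular velocity near the top so the
    stabilizer receives a catchable handoff. theta is the angle from upright
    (assumed convention), so cos(theta) is 1 upright and -1 hanging down."""
    height = np.cos(theta)
    vel_penalty = w_vel * theta_dot**2 if height > 0.9 else 0.0  # within ~25 deg of upright
    return height - vel_penalty

def stabilize_reward(theta, theta_dot, w_angle=1.0, w_vel=0.1):
    """Penalize deviation from vertical and angular velocity (weights assumed)."""
    return -(w_angle * (1.0 - np.cos(theta)) + w_vel * theta_dot**2)
```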
Challenges
- Swing-Up Task:
  - Coordinating motion during the throw-and-catch process was challenging, especially under noisy conditions.
  - Solution: Implemented collaborative agents for swing-up and stabilization, with a separate reward function for each sub-task (a sketch of the handoff logic follows this list).
- Double Inverted Pendulum:
  - Model-free RL struggled with the system's chaotic behavior.
  - Solution: Transitioned to model-based approaches such as PILCO, which predict state transitions and rewards from a learned dynamics model (see the sketch below).
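The coordination mechanism is not detailed in this summary; one plausible sketch of the described handoff is a supervisor that switches from the swing-up agent to the stabilizing agent once the pendulum enters a catchable region. The threshold values are assumptions.

```python
# Assumed catch region: near upright and slow enough to stabilize.
CATCH_ANGLE = 0.3   # rad from upright (assumption)
CATCH_VEL = 2.0     # rad/s (assumption)

def select_controller(theta, theta_dot, swing_agent, stab_agent):
    """Hand control to the stabilizing agent once the pendulum is catchable;
    theta is the angle from upright, wrapped to [-pi, pi]."""
    if abs(theta) < CATCH_ANGLE and abs(theta_dot) < CATCH_VEL:
        return stab_agent
    return swing_agent
```

Each agent trains against its own reward function; at execution time the supervisor only decides which policy acts on the current step.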
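PILCO proper learns a Gaussian-process dynamics model and propagates state uncertainty analytically, which does not fit in a short snippet. As a rough illustration of the model-based pattern it follows (learn a one-step dynamics model, then evaluate policies on imagined rollouts), here is a hypothetical sketch with a neural network standing in for the GP.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step model trained on observed (state, action, next_state) tuples."""

    def __init__(self, state_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def imagined_return(model, policy, reward_fn, s0, horizon=50):
    """Evaluate a policy by rolling it out inside the learned model rather
    than the real (chaotic) double-pendulum dynamics. The result is
    differentiable, so the policy can be improved by gradient ascent."""
    s, total = s0, torch.tensor(0.0)
    for _ in range(horizon):
        a = policy(s)
        total = total + reward_fn(s, a)
        s = model(s, a)
    return total
```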
Reflection and Insights
This project deepened my understanding of reinforcement learning and its application to real-world control problems. It highlighted the importance of tailored reward functions and robust algorithm selection for dynamic systems. The challenges in handling chaotic behaviors inspired further exploration into model-based strategies to enhance RL performance.
Team and Role
- Team: Worked with two teammates on RL model implementation and evaluation.
- My Role:
  - Focused on the single inverted pendulum tasks, including algorithm selection and reward function design.
  - Developed custom RL agents in PyTorch, optimizing hyperparameters for efficient training.
  - Led the implementation of the collaborative “throw-catch” process for swing-up tasks.