Reinforcement Learning for Inverted Pendulum Control

Overview

This project applied reinforcement learning (RL) algorithms to control single and double inverted pendulum systems. Using Q-Learning, DQN, A2C, PPO, and DDPG, we implemented controllers for swing-up and stabilization tasks. The project explored the dynamics of inverted pendulum systems and showed how effective RL techniques can be for nonlinear control problems.

Results

  • Single Inverted Pendulum:
    • Achieved a 100% success rate for swing-up and stabilization tasks under ideal conditions.
    • Maintained a 90% success rate under noisy conditions over 30-second simulations.
  • Double Inverted Pendulum:
    • Stabilized the pendulum successfully, but swing-up remained difficult with model-free RL methods.
  • Performance Metrics:
    • Trained RL models for swing-up and stabilization tasks in under 50,000 episodes.
    • Demonstrated the effectiveness of custom reward functions for dynamic control tasks.

Report PDF | GitHub (Chinese README)

Swing-up from a stationary state
Swing-up under noisy conditions
Stabilization of the double inverted pendulum

Technical Details

  • Algorithms Applied:
    • Q-Learning and DQN: Explored discrete action spaces for initial experiments.
    • A2C and PPO: Achieved robust performance for stabilization tasks in continuous action spaces.
    • DDPG: Used deterministic policy gradients to provide smooth control for swing-up tasks.
  • Custom Toolkit:
    • Developed RL agents from scratch using PyTorch, including functions for initialization, model updates, and action sampling.
    • Designed visualization tools to monitor reward curves and training metrics.
  • Reward Design:
    • Swing-up Task: Rewarded higher pendulum angles while penalizing angular velocity near the top of the swing.
    • Stabilization Task: Encouraged minimal deviation from the vertical position and low angular velocity.
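
As a rough illustration of the reward design above, the two rewards could be shaped as in the sketch below. This is not the project's actual code: the angle convention (0 rad = upright), the function names, and the weights w_angle and w_vel are all assumptions chosen for the example.

```python
import math


def wrap_angle(theta: float) -> float:
    """Map an angle to [-pi, pi). Assumed convention: 0 rad is upright."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def swing_up_reward(theta: float, theta_dot: float, w_vel: float = 0.1) -> float:
    """Reward height, and penalize angular velocity only near the top.

    w_vel is a hypothetical weight, not a value from the project.
    """
    height = math.cos(wrap_angle(theta))   # +1 upright, -1 hanging down
    near_top = 0.5 * (1.0 + height)        # 1 at the top, 0 at the bottom
    return height - w_vel * near_top * theta_dot ** 2


def stabilization_reward(theta: float, theta_dot: float,
                         w_angle: float = 1.0, w_vel: float = 0.1) -> float:
    """Penalize deviation from vertical and angular velocity (quadratic cost).

    w_angle and w_vel are hypothetical weights.
    """
    angle_error = wrap_angle(theta)
    return -(w_angle * angle_error ** 2 + w_vel * theta_dot ** 2)
```

Scaling the velocity penalty by near_top leaves the pendulum free to build speed at the bottom of the swing, while a fast pass through the upright position still costs reward.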

Challenges

  • Swing-Up Task:
    • Coordinating the handoff between swing-up and stabilization in the throw-and-catch process was challenging, especially under noisy conditions.
    • Solution: Implemented two collaborating agents, one for swing-up and one for stabilization, each with its own reward function.
  • Double Inverted Pendulum:
    • Model-free RL struggled with the system’s chaotic behavior.
    • Solution: Transitioned to model-based approaches such as PILCO, which learn a dynamics model of the system and use its predictions to improve the policy.
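
One way to picture the throw-and-catch handoff between the two single-pendulum agents is a small supervisor that picks which policy acts at each step. The sketch below is only an assumption about how such a handoff could work: the class name ThrowCatchController, the thresholds CATCH_ANGLE and CATCH_VELOCITY, and the state layout (theta, theta_dot) are hypothetical, and the policies are plain callables standing in for trained agents.

```python
import math
from typing import Callable, Tuple

State = Tuple[float, float]        # (theta, theta_dot); 0 rad = upright (assumed)
Policy = Callable[[State], float]  # maps a state to a control input

CATCH_ANGLE = 0.3     # rad, hypothetical handoff threshold
CATCH_VELOCITY = 2.0  # rad/s, hypothetical handoff threshold


def wrap_angle(theta: float) -> float:
    """Map an angle to [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def in_catch_region(state: State) -> bool:
    """True when the pendulum is close enough to upright, and slow enough, to catch."""
    theta, theta_dot = state
    return abs(wrap_angle(theta)) < CATCH_ANGLE and abs(theta_dot) < CATCH_VELOCITY


class ThrowCatchController:
    """Delegates each step to the swing-up or the stabilization policy."""

    def __init__(self, swing_up_policy: Policy, stabilize_policy: Policy) -> None:
        self.policies = {"swing_up": swing_up_policy, "stabilize": stabilize_policy}
        self.active = "swing_up"

    def act(self, state: State) -> float:
        # Re-evaluate the handoff every step, so a failed catch falls back to swing-up.
        self.active = "stabilize" if in_catch_region(state) else "swing_up"
        return self.policies[self.active](state)
```

Under noisy conditions a hard threshold like this can flip back and forth near the boundary; a hysteresis band (a wider region for staying in "stabilize" than for entering it) would be a natural refinement.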

Reflection and Insights

This project deepened my understanding of reinforcement learning and its application to real-world control problems. It highlighted the importance of tailored reward functions and careful algorithm selection for dynamic systems. The difficulty of handling chaotic behavior motivated further exploration of model-based strategies to improve RL performance.

Team and Role

  • Team: Worked with two teammates on RL model implementation and evaluation.
  • My Role:
    • Focused on the single inverted pendulum tasks, including algorithm selection and reward function design.
    • Developed custom RL agents using PyTorch, optimizing hyperparameters for efficient training.
    • Led the implementation of the collaborative “throw-and-catch” process for swing-up tasks.