Mathematical Contest in Modeling

Overview

As part of the Mathematical Contest in Modeling (MCM), our team developed models to analyze and predict trends in the game “Wordle.” We used ARIMA for time-series forecasting of participant numbers, a BP neural network to predict the distribution of player attempts, and K-means clustering to classify word difficulty, with the aim of giving game developers and researchers usable insights. The project showed how statistical and machine learning techniques can be applied to patterns in player data.

Results

  • Achievements:
    • Modeled and predicted participant numbers with ARIMA, achieving high accuracy (R² = 0.982).
    • Classified Wordle words into “easy,” “medium,” and “difficult” categories using K-means clustering.
    • Predicted the distribution of attempts for each word from its features with a BP neural network.
  • Model Outputs:
    • Predicted the number of participants on March 1, 2023, to be between 10,288 and 10,624.
    • Classified the word EERIE as medium difficulty based on clustering results.
    • Visualized predicted attempt distributions for multiple words.
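For reference, the fit statistics cited in this write-up (R² for the ARIMA model, RMSE and MAPE for the neural network) follow their standard definitions. The sketch below is a minimal Python illustration, not the team's contest code; the function names and the toy numbers are assumptions made for the example.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if an actual value is 0)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Toy values for illustration only; these are NOT contest data.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(r_squared(actual, predicted), 3))  # 0.964
print(round(rmse(actual, predicted), 4))       # 2.1213
print(round(mape(actual, predicted), 3))       # 10.625
```

Because MAPE divides each error by the actual value, it grows quickly when actual values are close to zero, so it is worth reading alongside RMSE rather than on its own.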

[Report PDF]

Technical Details

  • Time-Series Forecasting (ARIMA):
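    • Applied ARIMA(1,1,0) (one autoregressive term, first-order differencing, no moving-average term) to model participant trends over time.
    • Checked stationarity with the augmented Dickey-Fuller (ADF) test and residual autocorrelation with the Ljung-Box test.
    • Forecast participant numbers from the fitted model (R² = 0.982).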
  • Attempts Percentage Prediction (BP Neural Network):
    • Extracted word features, including isolation level, priority, and elimination value.
    • Designed and trained a BP (backpropagation) neural network on 70% of the data to predict attempt distributions.
    • Key Metrics: RMSE = 3.57, MAPE = 45.28%.
  • Difficulty Classification (K-means):
    • Clustered words into three difficulty levels using player attempt distributions.
    • Evaluated and classified the word EERIE as medium difficulty based on clustering results.
    • Improved model robustness by excluding outliers and noisy data.
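The clustering step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the team's implementation: the six attempt distributions are made-up toy data (percentages for 1 to 6 tries plus a failure bucket), the farthest-point initialization is chosen only to keep the example deterministic, and ranking clusters by the mean number of attempts of their centroid is an assumed way of attaching the "easy", "medium", and "difficult" labels.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def mean_attempts(dist):
    """Average tries implied by a distribution; a failure is counted as 7 (assumption)."""
    return sum(t * pct for t, pct in zip(range(1, 8), dist)) / sum(dist)

# Toy attempt distributions in percent (hypothetical, NOT contest data).
words = [
    [1, 10, 30, 35, 17, 6, 1], [2, 12, 32, 33, 15, 5, 1],
    [0, 4, 18, 35, 28, 12, 3], [0, 5, 20, 34, 27, 11, 3],
    [0, 1, 7, 20, 32, 28, 12], [0, 1, 8, 22, 31, 27, 11],
]

centroids, labels = kmeans(words, k=3)

# Name clusters by how many tries their centroid implies (assumed convention).
order = sorted(range(3), key=lambda c: mean_attempts(centroids[c]))
names = dict(zip(order, ("easy", "medium", "difficult")))
difficulty = [names[lab] for lab in labels]
print(difficulty)
# ['easy', 'easy', 'medium', 'medium', 'difficult', 'difficult']
```

A new word would be classified the same way: compute its distance to each centroid and take the name of the nearest one. Library implementations usually replace the seeding used here with k-means++ or several random restarts.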

Challenges

  • Data limitations: A small dataset (~359 samples) constrained model generalization.
  • Non-linear relationships between features and outcomes made the models hard to interpret.
  • Overfitting risks in ARIMA required careful parameter tuning and validation.
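One common way to guard against overfitting in a time-series model is to fit on the early part of the series and score forecasts on a held-out tail. The sketch below illustrates this for ARIMA(1,1,0), which is an AR(1) model on first differences. It is an assumption-laden example rather than the team's procedure: the parameters are estimated by ordinary least squares on the differenced series instead of the maximum-likelihood fit a statistics library would perform, and the series is synthetic and noise-free so that the recovered parameters are easy to check.

```python
import math

def fit_arima_110(series):
    """Estimate (c, phi) in d_t = c + phi * d_(t-1) + e_t, with d_t = y_t - y_(t-1),
    by ordinary least squares on the differenced series."""
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi

def forecast_arima_110(series, c, phi, steps):
    """Iterate the difference equation forward and accumulate back to levels."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        out.append(level)
    return out

def simulate(y0, d0, c, phi, n):
    """Noise-free synthetic series following the same difference equation."""
    series, diff = [y0, y0 + d0], d0
    while len(series) < n:
        diff = c + phi * diff
        series.append(series[-1] + diff)
    return series

# Synthetic data (hypothetical, NOT the contest series): a declining count
# whose day-to-day drop shrinks over time.
series = simulate(y0=1000.0, d0=-100.0, c=-2.0, phi=0.5, n=30)
train, test = series[:24], series[24:]

c_hat, phi_hat = fit_arima_110(train)
forecast = forecast_arima_110(train, c_hat, phi_hat, steps=len(test))

# Out-of-sample error on the held-out tail.
holdout_rmse = math.sqrt(
    sum((f - t) ** 2 for f, t in zip(forecast, test)) / len(test)
)
print(c_hat, phi_hat, holdout_rmse)
```

On real, noisy data the holdout error would not be near zero; comparing it across candidate orders, and against the in-sample error, is what signals whether a more complex model is only fitting noise.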

Reflection and Insights

This competition provided valuable experience in applying statistical and machine learning techniques to real-world problems. It reinforced the importance of data cleaning, feature extraction, and model validation. The integration of multiple models (ARIMA, BP neural networks, and K-means) highlighted the versatility of these methods in addressing diverse challenges.

Team and Role

  • Team: Collaborated with two teammates on data analysis, modeling, and report writing.
  • My Role:
    • Played a leading role in modeling tasks, including the design and implementation of ARIMA, BP neural networks, and K-means clustering.
    • Assisted in programming tasks, focusing on feature engineering and model optimization.
    • Contributed significantly to the report by summarizing results, interpreting findings, and drafting technical sections.