Training error bigger than testing (validation) error (XGBRegressor)



  • I'm building an XGBRegressor() model for time series forecasting with 96 rows of data. After tuning the model with grid search, the testing (validation) error rarely exceeds the training error; the training RMSE is almost always the larger of the two. The evaluation metric I use is RMSE. Can anyone tell me what I did wrong with my model and what I should do?

    This is my code; a sketch for inspecting the per-fold train vs. validation RMSE follows it.

    
    # Imports needed to run this snippet
    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
    
    # Create the parameter grid: gbm_param_grid
    gbm_param_grid = {
        'learning_rate': [0.01, 0.05, 0.1, 0.2],
        'subsample': [0.2, 0.4, 0.6, 0.8],
        'colsample_bytree': [0.2, 0.4, 0.6, 0.8],
        'n_estimators': [1, 10, 100],
        'max_depth': [3, 4, 5, 6, 7, 8],
        'gamma': [0, 0.1, 0.2],
        'reg_alpha': [0, 0.001, 0.002],
    }
    
    # Instantiate the regressor: gbm
    gbm = xgb.XGBRegressor(objective="reg:squarederror")
    
    # Time-series CV: folds preserve temporal order
    tscv = TimeSeriesSplit(n_splits=4)
    
    # Perform grid search: grid_mse
    # X_train and y_train are the prepared features/target (defined elsewhere)
    grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid,
                            scoring='neg_mean_squared_error', cv=tscv, verbose=1)
    grid_mse.fit(X_train, y_train)
    
    # Print the best parameters and lowest RMSE
    # best_score_ is negative MSE, so take abs() before the square root
    print("Best parameters found: ", grid_mse.best_params_)
    print("Lowest RMSE found: ", np.sqrt(np.abs(grid_mse.best_score_)))
    
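    One way to see the gap directly is to have the grid search record training scores alongside the validation scores. This is only a minimal sketch, assuming the same gbm_param_grid, gbm, tscv, X_train, and y_train as above; return_train_score is a standard GridSearchCV option, and the scores it records are negative MSE under this scorer.
    
    # Sketch: compare mean train vs. validation RMSE for the best candidate.
    # Assumes gbm_param_grid, gbm, tscv, X_train, y_train from the code above.
    grid_mse = GridSearchCV(estimator=gbm, param_grid=gbm_param_grid,
                            scoring='neg_mean_squared_error', cv=tscv,
                            return_train_score=True, verbose=1)
    grid_mse.fit(X_train, y_train)
    
    best = grid_mse.best_index_
    # Scores are negative MSE, so negate before taking the square root
    train_rmse = np.sqrt(-grid_mse.cv_results_['mean_train_score'][best])
    valid_rmse = np.sqrt(-grid_mse.cv_results_['mean_test_score'][best])
    print("Mean train RMSE:     ", train_rmse)
    print("Mean validation RMSE:", valid_rmse)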
