Basis in Machine Learning

Basis in Machine Learning

2018, Sep 24    

About Train, Validation and Test Sets in Machine Learning

Quote the base definitions from Tarang Shah

A visualisation of the splits

Training Dataset: The sample of data used to fit the model.

Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.

Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. The Test dataset provides the gold standard used to evaluate the model. It is only used once a model is completely trained(using the train and validation sets)