Bagging vs Boosting

Bagging and boosting are ensemble techniques that reduce error and increase the stability of the final model by combining multiple models. The principal idea is to group weak learners to form one strong learner. Errors from machine learning models are usually due to variance, noise, or bias; ensemble techniques work to reduce variance and bias.

How are they similar?

Bagging and boosting are similar in that both start from an original dataset $D$ and use random sampling with replacement to create $m$ new datasets. Because sampling is done with replacement, some observations are repeated in each new training dataset $D_i$. When each $D_i$ has the same size as $D$, around 63.2% (i.e., $1-1/e$) of the original data points are expected to appear in it, and the remaining entries are duplicates.
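The 63.2% figure can be checked with a small simulation. The sketch below assumes each new dataset is drawn with the same size $n$ as the original; the function name `unique_fraction` and the chosen values of `n` and the seed are illustrative only.

```python
import math
import random

def unique_fraction(n, rng):
    """Draw one bootstrap sample of size n; return the share of distinct original points."""
    sample = [rng.randrange(n) for _ in range(n)]  # indices drawn with replacement
    return len(set(sample)) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 10_000               # assumed dataset size, chosen only for illustration

fractions = [unique_fraction(n, rng) for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)

expected = 1 - (1 - 1 / n) ** n   # exact expectation for finite n
limit = 1 - 1 / math.e            # value approached as n grows

print(f"simulated: {mean_fraction:.4f}, exact: {expected:.4f}, limit: {limit:.4f}")
```

A given point is missed by one draw with probability $1 - 1/n$, so it is missed by all $n$ draws with probability $(1 - 1/n)^n$, which approaches $1/e$ for large $n$.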

A machine learning model can be trained on each of these datasets $D_1, \dots, D_m$, so we end up with multiple learners (or models).

How are they different?




Probability of data points

Bagging: Each element has the same probability of occurring in a new dataset.

Boosting: Each element is weighted, so some elements may appear in new datasets more often than others.
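The difference between equal and weighted selection probabilities can be illustrated with the standard library's `random.choices`. This is only a sketch: the ten data points, the weight of 50 on point 0, and the 1000 draws are made-up values chosen to make the effect visible.

```python
import random
from collections import Counter

rng = random.Random(42)
points = list(range(10))   # stand-ins for ten observations

# Equal probability: every point is equally likely on each draw.
uniform_sample = rng.choices(points, k=1000)

# Weighted: draws follow per-point weights. Point 0 gets a large
# hypothetical weight, as if it had been misclassified repeatedly.
weights = [50] + [1] * 9
weighted_sample = rng.choices(points, weights=weights, k=1000)

uniform_counts = Counter(uniform_sample)
weighted_counts = Counter(weighted_sample)
print("point 0, uniform draws: ", uniform_counts[0])
print("point 0, weighted draws:", weighted_counts[0])
```

Under equal probabilities point 0 is drawn about as often as any other point, while under the weighted scheme it dominates the sample.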

Different training mechanism

Bagging: Each classifier is built independently.

Boosting: Each classifier is trained on the data while accounting for the previous classifiers' success. After each training step the weights are redistributed: misclassified data points have their weights increased to emphasize the most difficult prediction cases. Thus, subsequent learners focus on successfully predicting these difficult cases.
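One common way to carry out this reweighting is the AdaBoost update rule. The text above does not name a specific algorithm, so the sketch below is an assumption: it uses the AdaBoost formulas, and the helper name `reweight` and the toy inputs are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style update. Assumes `weights` sum to 1.

    Returns (new_weights, alpha), where alpha is the learner's weight.
    """
    # Weighted error: total weight of the misclassified points.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)

    # Shrink weights of correctly classified points, grow the misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

weights = [0.2] * 5                           # five points, equal starting weights
correct = [True, True, True, False, True]     # the fourth point was misclassified
new_weights, alpha = reweight(weights, correct)
print(new_weights, alpha)
```

In this toy case the single misclassified point ends up with about half of the total weight, so the next learner is pushed to get it right.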

Averaging process for multiple learners

Bagging: A final result is obtained by averaging the responses of the $m$ learners.

Boosting: A learner with a good classification result on the training data is assigned a higher weight than a poor one (i.e., the weight is based on the accuracy of the learner). Thus, boosting requires keeping track of each learner's errors. Some boosting techniques apply a threshold whereby learners are discarded if their performance is low (accuracy below 50%). The final result can be obtained via a weighted majority vote across the learners.
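The two ways of combining learners can be contrasted in a few lines. This is a minimal sketch for the classification case; the function names, labels, and learner weights are all hypothetical and chosen so that the two schemes disagree.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Equal weighting: every learner's vote counts the same."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, learner_weights):
    """Accuracy-based weighting: each vote is scaled by its learner's weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, learner_weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three learners predict a label for the same input.
predictions = ["spam", "ham", "ham"]
learner_weights = [1.2, 0.3, 0.4]   # made-up weights; the first learner is the strongest

print(majority_vote(predictions))
print(weighted_vote(predictions, learner_weights))
```

With equal weights the two weaker learners outvote the strong one; with accuracy-based weights the strong learner's vote (1.2) outweighs the other two combined (0.7).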

Which is more useful?

Bagging and boosting are both useful because they decrease the variance of a single estimate by combining several estimates from different models. Thus, the resulting model has greater stability.

If a single model has poor predictive performance (high bias), boosting is the better option. This is because boosting trains multiple learners in sequence, each concentrating on the cases that earlier learners got wrong, and gives strong learners more weight in the final result.

If a single model is overfitting (high variance), bagging is the better option. This is because bagging trains multiple learners independently and weights them equally, which averages out their individual errors.

