So far we have learned to predict something with one model, one set of selected features (or columns), and one set of parameters. And in EP 7, we predicted with one model and one set of selected features, where the parameters came out of a model selection process.
What if we cannot select features, say because there are too many of them to pick from?
ensemble is a module in the scikit-learn package. It combines many estimators, each trained on a different subset of the data, to produce a better result than any single one.
For example, suppose we have many data dimensions for residences such as geo-location, size, land price, number of floors, reference web rating, etc., and we need to predict the price of a house downtown. With that many features we run into tough problems, and this is exactly what ensembles are for.
This time we look at two common ensemble types: Bagging and Random Forest.
Bagging stands for Bootstrap Aggregating. It trains several estimators, each on a random sample of the dataset, over all features. Here are its main parameters:
- estimator: the type of base estimator (a decision tree by default).
- n_estimators: the number of estimators (10 by default).
- max_samples: the number of samples drawn to train each estimator.
Import sklearn.ensemble and create a BaggingRegressor() with a DecisionTreeRegressor() inside. Set n_estimators to 5 and max_samples to 25. After prediction, we found its MedAE to be 65,667.
Then we created 3 more models with different values of max_samples. The first one was still the best.
As in the latest episode, we try running GridSearchCV() over it.
The best estimator from the search gives a model with a MedAE of only 63,443 points. It uses 16 of the 44 features from the source.
Now we move on to Random Forest. Random Forest differs from Bagging in that each tree is also built on a random subset of the features, not all of them. We can apply the Random Forest estimator in the same way as a decision tree: just set the parameters and fit. Ooh, we made an estimator from Random Forest and it's better than Bagging's. Its MedAE is just 14,334, with 17 features occupied.
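The Random Forest step might look like the sketch below, again on synthetic stand-in data (so the printed MedAE will not match the 14,334 reported here). The max_features parameter is what restricts each split to a random feature subset, which is the difference from plain Bagging.

```python
# Sketch: a Random Forest regressor, where each split considers only
# a random subset of features (max_features), unlike plain bagging.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import median_absolute_error

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,      # number of trees
    max_features="sqrt",   # features considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)
print(median_absolute_error(y_test, forest.predict(X_test)))
```

forest.feature_importances_ afterwards shows how much each feature contributed, which is one way to see which features the forest actually relies on.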
I have to say this one is quite complex for me, and I need more practice.
Let's see what's next, and I'll share it with you all.