# Note of data science training EP 7: Metrics – It is qualified

prev: Note of data science training EP 6: Decision Tree – At a point of distraction

After playing with supervised ML algorithms, regression and classification, we now need to validate how strong our models are.

# Technical terms

**Bias**

Biased data drifts away from “**normal**” values. For example, we find the number 100 in a list of employees’ ages, where every value should be under 50.

**Variance**

Data with high variance spreads out beyond the “**trending**” values. For example, the overtime (OT) data of a factory department ranges over 0 – 20 hours when it should be narrower, such as 0 – 5 hours. This makes our predictions more difficult.

**Overfitting**

This happens when our model tries to **capture all of the data**. As a result, the model has low bias but high variance: it fits the training data closely, but its predictions swing a lot when the data changes. A decision tree is an example of this case.

**Underfitting**

This is the opposite of overfitting: the model captures only the trend of the data. It has high bias but low variance. Regressions are typical examples.
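To make the two cases more concrete, here is a minimal sketch rather than the code from the training. It assumes a synthetic noisy dataset from `make_classification`, and a depth-1 tree stands in for an overly simple model (the text above names regressions, so this is a substitution for illustration). The variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (an assumption for illustration only).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    flip_y=0.1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Unlimited depth: the tree is free to memorize every training row.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth 1: a single split, too simple to follow the data closely.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)

deep_train = deep_tree.score(X_train, y_train)
deep_test = deep_tree.score(X_test, y_test)
stump_train = stump.score(X_train, y_train)
stump_test = stump.score(X_test, y_test)

print(f"deep tree: train={deep_train:.2f} test={deep_test:.2f}")
print(f"stump:     train={stump_train:.2f} test={stump_test:.2f}")
```

A large gap between the training and test scores of the deep tree is the usual sign of overfitting, while a model that scores poorly even on its own training data is likely underfitting.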

Ref: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229

# What are the tools to test models?

## Regressors’ benchmarks

We already met some of these in EP 4. Here is a recap.

- \(r^2\) score

Compares predictions with the real results. Higher is better, with a maximum of 1. It indicates how well our model performs compared to the baseline (see the dummy model in the last episode): a score of 0 means the model is no better than always predicting the mean.

- \(MedAE\) or Median Absolute Error

Median of the absolute errors. Lower is better. It is robust to outliers.

- \(MAE\) or Mean Absolute Error

Average of the absolute errors. Lower is better. Unlike MedAE, it is pulled up by outliers, so a large gap between the two hints at outliers in the data.

- \(MSE\) or Mean Squared Error

Average of the squared errors. Lower is better. Because the errors are squared, big misses are penalized much more heavily than small ones, so it shows how strongly large errors affect our model.
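As a quick illustration of the four scores, here is a small sketch using the functions in `sklearn.metrics`. The `y_true` and `y_pred` values are made-up numbers, not data from the training.

```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)

# Made-up real values and predictions (illustration only).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
medae = median_absolute_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

# Baseline: always predict the mean of the real values.
mean_value = sum(y_true) / len(y_true)
baseline_r2 = r2_score(y_true, [mean_value] * len(y_true))

print(f"r2={r2:.3f} MedAE={medae} MAE={mae} MSE={mse}")
print(f"baseline r2={baseline_r2}")
```

Note that all of these functions take the real values first and the predictions second.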

Reference links:

- https://medium.com/@george.drakos62/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0
- https://peltarion.com/knowledge-center/documentation/evaluation-view/regression-loss-metrics

## Classifiers’ benchmarks

- Accuracy score

Ratio of correct predictions to the total number of predictions. Worst at 0 and best at 1.

- Precision score

Ratio of correct positive predictions to all positive predictions. Worst at 0 and best at 1.

- Recall score

Ratio of correct positive predictions to all real positive data. Worst at 0 and best at 1.

- \(F_1\) score

Calculated by the formula \(F_1 = \left(\frac{Precision^{-1} + Recall^{-1}}{2}\right)^{-1} = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}\), the harmonic mean of precision and recall. It is used in many real-world problems because it ignores true negatives, so it stays informative when the real data contains a large number of negatives. Worst at 0 and best at 1.

- Confusion matrix

This matrix counts every combination of predicted and real classes. An alternative is `pd.crosstab()`, which can also display them as proportions through its `normalize` parameter.
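The scores above can be sketched with `sklearn.metrics` and `pd.crosstab()` as follows. The labels are made-up values (1 is positive, 0 is negative), and the names `real` and `predicted` are only chosen for this illustration.

```python
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels (illustration only): 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # 5 correct out of 8
precision = precision_score(y_true, y_pred)  # 2 correct out of 3 predicted positives
recall = recall_score(y_true, y_pred)        # 2 found out of 4 real positives
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two above

# Rows are the real classes, columns are the predicted classes.
matrix = confusion_matrix(y_true, y_pred)

# Same table with pandas; normalize="index" turns each row into proportions.
table = pd.crosstab(
    pd.Series(y_true, name="real"),
    pd.Series(y_pred, name="predicted"),
    normalize="index",
)

print(accuracy, precision, recall, f1)
print(matrix)
print(table)
```

Without `normalize`, `pd.crosstab()` shows plain counts, just like the confusion matrix.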

Reference links:

- https://www.bualabs.com/archives/1968/what-is-confusion-matrix-what-is-metrics-accuracy-precision-recall-f1-score-difference-metrics-ep-1/
- https://medium.com/analytics-vidhya/accuracy-vs-f1-score-6258237beca2

## Model Selection

How can we build a model that generates the best scores? Here is a solution: the class `GridSearchCV` in the module `sklearn.model_selection`. It trains a model with every combination of the parameters we give it, scores each one with cross-validation, and keeps the best.

For the example, we defined a `GridSearchCV()` with a parameter set where “criterion” is “gini” and “max_depth” starts at 3 and stays below 10. “cv” is the number of cross-validation folds, left at its default value (5 in newer versions of the library at the time of writing).

As a result, the sample model performs best when “max_depth” is 3, as shown in `.best_params_`, and the `.best_score_` reaches 0.76.
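A grid search like the one described might look like the sketch below. The dataset from the training is not shown here, so this assumes the iris dataset bundled with scikit-learn as a stand-in. The best parameters and score will therefore differ from the ones quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (an assumption; not the data used in the training).
X, y = load_iris(return_X_y=True)

# Every combination in this grid is trained and scored.
param_grid = {
    "criterion": ["gini"],
    "max_depth": list(range(3, 10)),  # 3, 4, ..., 9
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,  # number of cross-validation folds
)
search.fit(X, y)

print(search.best_params_)  # the winning combination
print(search.best_score_)   # its mean cross-validated score
```

After fitting, `search.best_estimator_` holds a model refitted on the whole dataset with the winning parameters, so it can be used for predictions directly.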

These are basic model evaluations, and I found lots of other ways while researching them. You can try new methods to test your models, and feel free to share them with me. 😄

See ya next time.

Bye~

next: Note of data science training EP 8: Ensemble – Avenger's ensemble