Using the figure below, classify the following penguins (in the table) as either an Adelie or a Gentoo penguin.
species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g
---|---|---|---|---
? | 49.1 | 14.8 | 220.0 | 5150.0
? | 37.7 | 19.8 | 198.0 | 3500.0
Using the figure below, classify the following penguins as either an Adelie, Gentoo, or Chinstrap penguin. (A code sketch covering both tables follows.)
species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g
---|---|---|---|---
? | 35.2 | 15.9 | 186.0 | 3050.0
? | 51.3 | 18.2 | 197.0 | 3750.0
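The intended answer reads the species regions off the figure; as a cross-check, here is a minimal sketch that fits a decision tree to the Palmer penguins data (assuming seaborn's bundled `penguins` dataset as a stand-in for the course data) and predicts all four rows above.

```python
# Sketch, not the figure-based answer: fit a shallow decision tree on the
# Palmer penguins data and predict the four unknown rows from the tables.
import pandas as pd
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier

features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

penguins = sns.load_dataset("penguins").dropna(subset=features + ["species"])
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(penguins[features], penguins["species"])

unknown = pd.DataFrame(
    [[49.1, 14.8, 220.0, 5150.0],
     [37.7, 19.8, 198.0, 3500.0],
     [35.2, 15.9, 186.0, 3050.0],
     [51.3, 18.2, 197.0, 3750.0]],
    columns=features,
)
print(clf.predict(unknown))  # likely: Gentoo, Adelie, Adelie, Chinstrap
```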
Demonstrate why entropy or Gini impurity is better than classification error for identifying which of the following is the better splitting scenario:
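The original scenarios are not reproduced here, so the sketch below uses an assumed textbook-style example: a parent node with class counts (40, 40) and two candidate splits. Classification error rates both splits as equally good, while Gini impurity separates them.

```python
# Assumed example counts: classification error gives the same gain for both
# candidate splits, while Gini impurity prefers split B (it isolates a pure
# child node).
import numpy as np

def gini(counts):
    p = np.asarray(counts) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def class_error(counts):
    p = np.asarray(counts) / np.sum(counts)
    return 1.0 - p.max()

def gain(parent, children, impurity):
    n = sum(sum(c) for c in children)
    weighted = sum(sum(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

parent = (40, 40)
split_a = [(30, 10), (10, 30)]
split_b = [(20, 40), (20, 0)]

for name, f in [("error", class_error), ("gini", gini)]:
    print(name, gain(parent, split_a, f), gain(parent, split_b, f))
# error: 0.25 vs 0.25 (a tie)  |  gini: 0.125 vs ~0.167 (prefers split B)
```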
Although Gini impurity is generally lower in both child nodes than in the parent node, demonstrate, using a 1D dataset with the ordered classes A, B, A, A, A, that this is not always the case.
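A minimal sketch: splitting the ordered dataset after the second sample gives children [A, B] and [A, A, A]. The left child's Gini impurity rises above the parent's, even though the weighted impurity of the children still falls.

```python
# The ordered 1D dataset A, B, A, A, A split after the second sample.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

data = ["A", "B", "A", "A", "A"]
left, right = data[:2], data[2:]

print(gini(data))               # parent:   0.32
print(gini(left), gini(right))  # children: 0.50 (higher!), 0.00
w = len(left) / len(data) * gini(left) + len(right) / len(data) * gini(right)
print(w)                        # weighted: 0.20 < 0.32
```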
If a decision tree is overfitting the training set, would it be a good idea to try decreasing `max_depth`? Hint: `max_depth` has a regularization effect on the model.
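A sketch of the regularization effect on assumed synthetic data (not the exercise's dataset): shrinking `max_depth` lowers training accuracy but narrows the train/test gap.

```python
# Compare an unconstrained tree with a depth-limited one on noisy data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
# Unconstrained depth: ~1.00 train accuracy with a large test gap;
# max_depth=3: lower train accuracy, much smaller gap.
```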
If a decision tree is underfitting the training set, would it be a good idea to scale the input features?
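A sketch of why scaling does not help here, on assumed synthetic data: decision trees split on one feature at a time, so a monotone per-feature rescaling leaves the splits (and the underfitting) unchanged.

```python
# Fit the same tree on raw and standardized features; predictions match.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

raw = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Typically prints True: the trees are identical up to threshold scaling.
print(np.array_equal(raw.predict(X), scaled.predict(X_scaled)))
```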
Assume we have three base classifiers $C_1, C_2, C_3$ in a majority voting ensemble, each predicting a class label $i \in \{0, 1\}$. The classifiers predict the following: $$C_1(x) \rightarrow 0, \quad C_2(x) \rightarrow 0, \quad C_3(x) \rightarrow 1$$ What is the predicted class of the majority voting ensemble if... (a worked sketch follows part c)
a. ...no weights are assigned?
b. ...$C_1$ and $C_2$ have a weight of 0.2, and $C_3$ has a weight of 0.6?
c. ...the classifiers have weights as in (b), but instead predict class probabilities $C_1(x) \rightarrow [0.9, 0.1]$, $C_2(x) \rightarrow [0.8, 0.2]$, $C_3(x) \rightarrow [0.4, 0.6]$?
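A minimal sketch checking all three parts with NumPy, using the weights and predictions given above.

```python
import numpy as np

weights = np.array([0.2, 0.2, 0.6])
labels = np.array([0, 0, 1])  # class-label predictions of C1, C2, C3

# (a) unweighted majority vote
print(np.argmax(np.bincount(labels)))                   # -> 0

# (b) weighted majority vote
print(np.argmax(np.bincount(labels, weights=weights)))  # -> 1

# (c) weighted average of the predicted class probabilities
probas = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]])
avg = np.average(probas, axis=0, weights=weights)
print(avg, np.argmax(avg))                              # [0.58 0.42] -> 0
```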
If you trained five different models on the same training data, and they all achieve 95% precision, would combining these classifiers lead to better results? Explain your reasoning.
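A sketch of combining diverse models with a hard-voting ensemble (assumed models and data; the exercise's five models are not specified).

```python
# Hard voting over a few heterogeneous classifiers on synthetic data.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

voting = VotingClassifier([
    ("lr", LogisticRegression(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(random_state=42)),
], voting="hard").fit(X_tr, y_tr)

for name, est in voting.named_estimators_.items():
    print(name, est.score(X_te, y_te))
print("ensemble", voting.score(X_te, y_te))  # often >= the best single model
```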
Why might out-of-bag evaluation slightly improve training performance when tuning hyperparameters, compared to cross-validation?
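A sketch of out-of-bag evaluation (assumed data): each tree is scored on the roughly 37% of training samples it never saw, so a validation-like score is obtained without holding any data out of training.

```python
# Enable OOB scoring on a random forest and read the resulting estimate.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)

forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=42).fit(X, y)
print(forest.oob_score_)  # validation-like score, no held-out set needed
```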
Why are "Extra-Trees" more random than regular "Random Forests"? Why would you want to use "Extra-Trees"? Do you think "Extra-Trees" would be faster or slower to train?