Question
This quantity, which ranges from zero to one-half when there are two classes, represents the probability that a randomly chosen point in the dataset would be misclassified if it were labeled at random according to the class distribution of the data. For 10 points each:
[10h] Name this quantity that equals the sum of pk (“p-sub-k”) times “one minus pk” over all classes k. A standard criterion used by the CART algorithm chooses splits to minimize this quantity.
ANSWER: Gini impurity [or Gini index; reject “Gini coefficient”]
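Note: the formula in this part can be checked with a minimal Python sketch. The function names `gini_impurity` and `split_impurity` are hypothetical helpers chosen for illustration, not taken from any CART implementation, and the size-weighted average used to score a split is the usual convention, assumed here rather than stated in the question.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the sum over classes k of p_k * (1 - p_k) for a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention assumed here: an empty node is treated as pure
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a two-way split (hypothetical helper)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
```

A pure node scores 0 and an even two-class node scores 0.5. With more than two classes the maximum is higher: K evenly balanced classes give 1 - 1/K.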
[10m] The CART algorithm uses Gini impurity to form these constructs. These constructs are fit on bootstrap samples and then aggregated in a “random” ensemble method named for containing many of them.
ANSWER: decision trees [prompt on trees] (Random forests average over multiple decision trees.)
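Note: the bootstrap-and-aggregate idea in this part can be sketched as follows. This is only an illustration under stated assumptions: `bootstrap_sample` and `majority_vote` are hypothetical helper names, the toy `data` list is made up, and the tree-fitting step is left out entirely. Random forests also subsample features at each split, which this sketch does not show.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Toy (feature, label) pairs, invented purely for illustration.
data = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
rng = random.Random(0)  # fixed seed so the draws are repeatable

# One bootstrap sample per tree; each tree would be fit on its own sample.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
```

For classification the per-tree predictions are combined by a vote, as in `majority_vote(["a", "b", "a"])`; for regression they would be averaged instead.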
[10e] Random forests can be used for both classification and this other task, its continuous analogue. A common approach to this task minimizes the sum of squared residuals.
ANSWER: regression [accept least squares regression; accept ordinary least squares regression; prompt on ordinary least squares or OLS]
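Note: "minimizes the sum of squared residuals" can be made concrete with a small sketch of simple (one-variable) least squares. The names `fit_line` and `sum_squared_residuals` are hypothetical, and the closed-form slope and intercept used here apply only to the single-predictor case, which is an assumption made to keep the example short.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for one predictor: slope = S_xy / S_xx.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
```

Any other line through the same points gives a sum of squared residuals at least as large as the fitted one, which is the sense in which the fit is "least squares".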
<Nageswaran, Other Science>
Summary
2024 ESPN @ Stanford | 03/09/2024 | Y | 2 | 10.00 | 100% | 0% | 0% |
Data
Berkeley A | Berkeley B | 0 | 0 | 10 | 10 |
Free Agents | Stanford | 0 | 0 | 10 | 10 |