Question
This quantity, which ranges from zero to one-half in the two-class case, represents the probability that a randomly chosen point in the dataset would be misclassified if it were labeled at random according to the class distribution of the data. For 10 points each:
[10h] Name this quantity that equals the sum of pk (“p-sub-k”) times “one minus pk” over all classes k. A standard criterion used by the CART algorithm chooses splits to minimize this quantity.
ANSWER: Gini impurity [or Gini index; reject “Gini coefficient”]
[10m] The CART algorithm uses Gini impurity to form these constructs. These constructs are formed via bootstrap samples and then aggregated in a “random” method named for containing many of them.
ANSWER: decision trees [prompt on trees] (Random forests average over multiple decision trees.)
[10e] Random forests can be used for both classification and this other task, its continuous analogue. A common approach to this task minimizes the sum of squared residuals.
ANSWER: regression [accept least squares regression; accept ordinary least squares regression; prompt on ordinary least squares or OLS]
<Nageswaran, Other Science>
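As an illustration of the formula in the first part, here is a minimal Python sketch that computes the sum of pk times “one minus pk” over all classes, and the size-weighted impurity of a candidate split. The function names gini_impurity and split_impurity are hypothetical, chosen for this example only, and the weighting of child nodes by size is an assumption about how a CART-style criterion is typically scored, not a reproduction of any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the sum over classes k of p_k * (1 - p_k).

    p_k is the fraction of items in `labels` that belong to class k.
    An empty collection is treated as pure (impurity 0.0) by assumption.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())


def split_impurity(left, right):
    """Size-weighted average Gini impurity of the two sides of a split.

    A CART-style learner would prefer the candidate split with the
    lowest value of this score (illustrative helper, not a library API).
    """
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


pure = gini_impurity(["a", "a", "a", "a"])        # one class only: 0.0
balanced = gini_impurity(["a", "a", "b", "b"])    # two equal classes: 0.5
three_way = gini_impurity(["a", "b", "c"])        # three equal classes: above 0.5

good_split = split_impurity(["a", "a"], ["b", "b"])  # both sides pure
bad_split = split_impurity(["a", "b"], ["a", "b"])   # both sides still mixed
```

With two classes the value peaks at one-half when the classes are evenly balanced; with more classes the maximum is higher, which is why the three-way example exceeds one-half.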
Summary
2024 ESPN @ Brown | 04/06/2024 | Y | 2 | 15.00 | 100% | 50% | 0% |
2024 ESPN @ Cambridge | 04/06/2024 | Y | 2 | 10.00 | 100% | 0% | 0% |
2024 ESPN @ Chicago | 03/23/2024 | Y | 5 | 10.00 | 80% | 20% | 0% |
2024 ESPN @ Columbia | 03/23/2024 | Y | 7 | 11.43 | 100% | 14% | 0% |
2024 ESPN @ Duke | 03/23/2024 | Y | 2 | 10.00 | 50% | 50% | 0% |
2024 ESPN @ Online | 06/01/2024 | Y | 4 | 12.50 | 100% | 25% | 0% |