
R: The cp Value in rpart Decision Trees

Date: 2020-05-28 11:41:42


When building a decision tree, you need to balance accuracy against complexity.

Why do we need this bias-variance tradeoff? Why do we need pruning?

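The number of splits (#splits) measures the complexity of the tree. The larger the tree, the more complex it is, and the more accurately it fits the training set.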

If the tree is allowed to grow without limit, it overfits (over-learns): it performs well on the training set but may perform badly on future data.
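A minimal sketch of this effect, assuming the `kyphosis` data set that ships with rpart. The seed, the 60-row training split, and `minsplit = 5` are arbitrary illustrative choices, and `n_splits` and `err_rate` are hypothetical helper names. Both trees use the same `minsplit`, so only cp differs:

```r
library(rpart)

set.seed(1)  # arbitrary seed so the train/test split is repeatable
idx   <- sample(nrow(kyphosis), 60)
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

# Two fits that differ only in cp (minsplit = 5 is an illustrative choice)
shallow <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class",
                 control = rpart.control(cp = 0.01, minsplit = 5))
deep    <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class",
                 control = rpart.control(cp = 0, minsplit = 5))

# Hypothetical helpers: count internal nodes, and compute the misclassification rate
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")
err_rate <- function(fit, data) {
  mean(predict(fit, data, type = "class") != data$Kyphosis)
}

result <- data.frame(
  tree        = c("cp = 0.01", "cp = 0"),
  splits      = c(n_splits(shallow), n_splits(deep)),
  train_error = c(err_rate(shallow, train), err_rate(deep, train)),
  test_error  = c(err_rate(shallow, test), err_rate(deep, test))
)
print(result)
```

The tree grown with `cp = 0` cannot have a higher training error than the other one, but its error on the held-out rows may or may not be lower. That gap is what pruning tries to control.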

What is cp used for?

The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile.

cp (complexity parameter) is used to control this bias-variance tradeoff. A well-chosen cp gives a good balance between the two.

cp is the threshold that decides whether a split is allowed or pruned.

What is cp?

Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall R-squared must increase by cp at each step.

Essentially, the user informs the program that any split which does not improve the fit by cp will likely be pruned off by cross-validation, and that hence the program need not pursue it.
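For an anova tree, the `rel error` column of the fitted object's `cptable` is one minus the training R-squared, so the gain per split can be read off directly. The sketch below assumes the built-in `mtcars` data set. The value `cp_threshold = 0.02` is an arbitrary illustrative choice, and `r_squared` and `gain_per_split` are names introduced here:

```r
library(rpart)

cp_threshold <- 0.02  # illustrative value, not a recommendation
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(cp = cp_threshold))

tab <- fit$cptable
r_squared <- 1 - tab[, "rel error"]  # training R-squared at each stage of the table

# Average R-squared gained per split between consecutive rows of the table
gain_per_split <- diff(r_squared) / diff(tab[, "nsplit"])

print(cbind(tab[, c("CP", "nsplit"), drop = FALSE], r_squared))
print(gain_per_split)
```

Comparing `gain_per_split` with `cp_threshold` shows how much each retained stage of the tree improves the fit relative to the threshold.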

When the change in accuracy from adding a node is less than cp times the change in tree complexity, that node must be pruned.
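This rule can be written as a cost-complexity measure, following the formulation in the rpart vignette. The symbols are assumptions of this sketch rather than notation from the text above: $R(T)$ is the risk (error) of tree $T$, $|T|$ is its number of splits, and $T_1$ is the tree with no splits.

```latex
R_{cp}(T) = R(T) + cp \cdot |T| \cdot R(T_1)

% Let T' be T with one additional split, so |T'| = |T| + 1.
% The extra split is worth keeping only if R_{cp}(T') \le R_{cp}(T):
R(T') + cp \cdot (|T| + 1) \cdot R(T_1) \le R(T) + cp \cdot |T| \cdot R(T_1)
\iff
R(T) - R(T') \ge cp \cdot R(T_1)
```

In words, a split must reduce the error by at least cp times the error of the root tree, which is why cp is scaled to lie between 0 and 1.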

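How is the change in accuracy measured? rpart runs 10-fold cross-validation on the training set and reports the cross-validated error (xerror) for each candidate cp. (The xval argument of rpart.control, the number of cross-validations, defaults to 10.)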
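One common way to use the cross-validation results is to grow a large tree, read the `xerror` column of `cptable`, and prune back to the cp with the lowest cross-validated error. The sketch below assumes the `kyphosis` data set from rpart. The seed and the starting `cp = 0.001` are arbitrary, and `best_cp` is a name introduced here:

```r
library(rpart)

set.seed(42)  # cross-validation folds are random; the seed is arbitrary
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(cp = 0.001, xval = 10))

printcp(fit)  # columns: CP, nsplit, rel error, xerror, xstd

# Pick the cp whose cross-validated error is smallest, then prune to it
tab     <- fit$cptable
best_cp <- tab[which.min(tab[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

print(pruned)
```

Because the folds are random, `xerror` and therefore `best_cp` can change from run to run on a small data set like this one. `plotcp(fit)` shows the same table graphically.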

Therefore, setting cp to a small value produces a large tree.
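To see the relationship directly, the sketch below fits the same model over a small grid of cp values and counts the splits in each tree. It assumes the `kyphosis` data set from rpart. The grid, `minsplit = 5`, and the variable names are illustrative choices, and `xval = 0` skips cross-validation because only the tree size matters here:

```r
library(rpart)

cp_values <- c(0.1, 0.01, 0.001, 0)  # illustrative grid, largest to smallest

n_splits <- sapply(cp_values, function(cp) {
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
               control = rpart.control(cp = cp, minsplit = 5, xval = 0))
  sum(fit$frame$var != "<leaf>")  # internal nodes = number of splits
})

print(data.frame(cp = cp_values, splits = n_splits))
```

The number of splits should never decrease as cp gets smaller, since a lower threshold can only admit more splits.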
