
Andrew Ng Machine Learning Notes (5): Regularization


Regularization

The problem of overfitting

Underfitting → high bias; overfitting → high variance.

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples (e.g., predicting prices on new examples). In other words, the model generalizes poorly.

Addressing overfitting

Options:

1) Reduce the number of features:

– Manually select which features to keep.

– Use a model selection algorithm.

2) Regularization:

– Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.

– Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

The regularized cost function

Consider the effect of penalizing two of the parameter values for being large: adding a large penalty on two of the parameters (in the lecture, the high-order terms $\theta_3$ and $\theta_4$) forces them toward zero, which effectively removes those terms and yields a simpler, smoother hypothesis.

More generally, we add a penalty term for every parameter $\theta_j$.

In regularized linear regression, we choose $\theta$ to minimize the regularized cost.

Regularized linear regression cost function:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Goal: $\min_{\theta} J(\theta)$

$\lambda$: the regularization parameter, which controls the trade-off between fitting the training set well and keeping the parameters small.

What happens if $\lambda$ is very large? Every $\theta_j$ (for $j \geq 1$) is penalized heavily and driven toward zero, leaving $h_\theta(x) \approx \theta_0$: the hypothesis becomes an almost flat line and underfits the data.
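To make the formula concrete, here is a minimal NumPy sketch of this regularized cost. The function name, the argument layout, and the convention that the first column of `X` is all ones ($x_0 = 1$) are assumptions for illustration, not from the lecture:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta) from the formula above.

    X   : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y   : (m,) target vector
    lam : the regularization parameter lambda
    """
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i)
    penalty = lam * np.sum(theta[1:] ** 2)    # theta_0 is not penalized
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```

Note that the penalty deliberately skips `theta[0]`, matching the convention that $\theta_0$ is not regularized.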

Regularized linear regression

Gradient descent

The gradient descent algorithm (note that $\theta_0$ is not regularized):

Repeat:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\frac{1}{m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right] \qquad (j=1,2,\dots,n)$$

Equivalently:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j\left(1-\alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

Since $1-\alpha\frac{\lambda}{m}$ is slightly less than 1, each iteration first shrinks $\theta_j$ a little and then performs the usual (unregularized) gradient step.
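A minimal sketch of one iteration of this update in NumPy (the function name and argument layout are assumptions; the matrix products vectorize the sums above):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One iteration of the regularized gradient descent update above.

    Equivalent to theta_j := theta_j * (1 - alpha*lam/m) - alpha * grad_j,
    except that theta_0 receives no shrinkage term.
    """
    m = len(y)
    grad = X.T @ (X @ theta - y) / m        # (1/m) sum (h(x)-y) x_j, all j
    new_theta = theta - alpha * grad        # ordinary gradient step
    new_theta[1:] -= alpha * (lam / m) * theta[1:]  # penalty, j >= 1 only
    return new_theta
```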

Normal equation

The normal equation gives a closed-form solution for $\theta$.

Suppose $m \leq n$ (the number of examples is at most the number of features); in that case $X^TX$ is non-invertible (singular), and the unregularized solution below cannot be computed directly:

$$\theta=(X^TX)^{-1}X^Ty$$

If $\lambda > 0$, the regularized normal equation is:

$$\theta=\left(X^TX+\lambda\underbrace{\begin{bmatrix}0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1\end{bmatrix}}_{(n+1)\times(n+1)}\right)^{-1}X^Ty$$

As long as $\lambda > 0$, the matrix in parentheses is guaranteed to be non-singular, i.e., invertible, so regularization also fixes the non-invertibility problem.
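A minimal sketch of this closed-form solution in NumPy (the function name is assumed; `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form regularized linear regression solution.

    L is the (n+1)x(n+1) identity matrix with L[0, 0] = 0, so that theta_0
    is not penalized. For lam > 0 the matrix X^T X + lam * L is invertible
    even when m <= n.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```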

Regularized logistic regression

The regularized logistic regression cost function:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
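A matching NumPy sketch of this cost (again, the function names and the ones-column convention for `X` are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: h_theta(x) = 1 / (1 + e^(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost from the formula above."""
    m = len(y)
    h = sigmoid(X @ theta)                   # predictions h_theta(x^(i))
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2.0 * m) * np.sum(theta[1:] ** 2)  # skip theta_0
    return cross_entropy + penalty
```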
