Regularization
The problem of overfitting
Underfitting –> high bias. Overfitting –> high variance.
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples (e.g., predicting prices on examples it has not seen). The model generalizes poorly.
Addressing overfitting
Options:
1) Reduce the number of features
–Manually select which features to keep
–Use a model selection algorithm
2) Regularization
–Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
–Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Cost function (regularized cost function)
The intuition: penalizing two of the parameter values for being large forces them close to zero and smooths the hypothesis. More generally, we add a penalty term for every parameter $\theta_j$:
In regularized linear regression, we choose $\theta$ to minimize the regularized linear regression cost function:
$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$
(Note the regularization sum runs over $j=1,\dots,n$; by convention $\theta_0$ is not penalized.)
Goal: $\underset{\theta}{\min}\,J(\theta)$
$\lambda$: the regularization parameter
What happens if $\lambda$ is very large? Every parameter $\theta_1,\dots,\theta_n$ is penalized so heavily that it is driven close to zero, leaving $h_\theta(x)\approx\theta_0$: a flat hypothesis that underfits the data.
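The regularized cost above can be sketched in NumPy as follows (a minimal illustration; the function name `regularized_cost` and the convention that the first column of `X` is the all-ones feature $x_0=1$ are my own assumptions, not from the notes):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).
    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    theta: (n+1,) parameter vector; theta[0] is NOT regularized.
    lam: the regularization parameter lambda.
    """
    m = len(y)
    errors = X @ theta - y                             # h_theta(x^(i)) - y^(i)
    fit = (errors @ errors) / (2 * m)                  # squared-error term
    penalty = lam * (theta[1:] @ theta[1:]) / (2 * m)  # sum over j = 1..n, skipping theta_0
    return fit + penalty
```

With $\lambda=0$ this reduces to the ordinary least-squares cost; increasing `lam` adds the shrinkage penalty on $\theta_1,\dots,\theta_n$.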
Regularized linear regression
Gradient descent
The gradient descent algorithm:
repeat {
$$\theta_0 := \theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j-\alpha\frac{1}{m}\left[\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right]\qquad (j=1,2,\dots,n)$$
}
Equivalently (the $\theta_0$ update is unchanged):
$$\theta_j := \theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
Since $1-\alpha\frac{\lambda}{m}<1$, each iteration first shrinks $\theta_j$ slightly toward zero, then applies the usual gradient step.
Normal equation
The normal equation:
Suppose $m\leq n$ (examples $\leq$ features).
$$\theta=(X^TX)^{-1}X^Ty$$
If $\lambda>0$,
$$\theta=\left(X^TX+\lambda\underbrace{\begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}}_{(n+1)\times(n+1)}\right)^{-1}X^Ty$$
As long as $\lambda>0$, the matrix inside the parentheses is guaranteed to be non-singular, i.e., invertible, even when $X^TX$ itself is singular (as can happen when $m\leq n$).
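The regularized normal equation can be sketched as (a minimal illustration; the name `normal_equation` is my own, and `np.linalg.solve` is used instead of forming the explicit inverse, which is numerically preferable but mathematically equivalent):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized solution. The (n+1)x(n+1) penalty matrix
    is the identity with its top-left entry zeroed, so theta_0 is not
    regularized, matching the lambda*diag(0,1,...,1) matrix above."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

With $\lambda>0$, the solve succeeds even when $X^TX$ is singular.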
Regularized logistic regression
The regularized logistic regression cost function:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
(As with linear regression, the regularization sum runs over $j=1,\dots,n$ and skips $\theta_0$.)
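This cost can be sketched in NumPy as follows (a minimal illustration under the same assumptions as the earlier sketches: an all-ones first column in `X` and no penalty on `theta[0]`; for simplicity it does not guard against `log(0)` when the hypothesis saturates):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost; theta_0 excluded from the penalty."""
    m = len(y)
    h = sigmoid(X @ theta)                                   # h_theta(x^(i))
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam * (theta[1:] @ theta[1:]) / (2 * m)        # lambda/(2m) * sum theta_j^2
    return cross_entropy + penalty
```

Only the cost changes relative to unregularized logistic regression; the gradient-descent update has the same form as the regularized linear regression update, with $h_\theta(x)=g(\theta^Tx)$.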