
[Deep Learning] Andrew Ng's Deep Learning Course 2 — Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization (Week 3 Quiz)

Posted: 2021-03-25 20:12:59


Video link: [Chinese-English subtitles] Andrew Ng Deep Learning Course 2 — Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization

References:

[Chinese-English] [Andrew Ng course quizzes] Course 2 — Improving Deep Neural Networks — Week 3 Quiz; Andrew Ng — Neural Networks and Deep Learning — L2W3 exercises


Exercises

1.If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

A.False

B.True

2.Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?

A.False

B.True

3.During hyperparameter search, whether you try to babysit one model (“Panda” strategy) or train a lot of models in parallel (“Caviar”) is largely determined by:

A.Whether you use batch or mini-batch optimization

B.The presence of local minima (and saddle points) in your neural network

C.The amount of computational power you can access

D.The number of hyperparameters you have to tune

4.If you think β (hyperparameter for momentum) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for beta?

A.

r = np.random.rand()
beta = r * 0.09 + 0.9

B.

r = np.random.rand()
beta = 1 - 10 ** (-r - 1)

C.

r = np.random.rand()
beta = 1 - 10 ** (-r + 1)

D.

r = np.random.rand()
beta = r * 0.9 + 0.09
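A quick numerical check makes the difference between these options concrete. The sketch below (a minimal illustration using NumPy, not part of the original quiz) samples option B and confirms that a uniform r spreads β uniformly on a log scale of 1 − β, covering the whole [0.9, 0.99) range:

```python
import numpy as np

# Option B: r uniform in [0, 1), beta = 1 - 10**(-r - 1).
# Uniform r makes log10(1 - beta) uniform on [-2, -1],
# i.e. beta is sampled on a log scale between 0.9 and 0.99.
rng = np.random.default_rng(0)
r = rng.random(100_000)
beta = 1 - 10 ** (-r - 1)

print(beta.min())        # just above 0.9  (r near 0)
print(beta.max())        # just below 0.99 (r near 1)
print(np.median(beta))   # about 1 - 10**-1.5, roughly 0.968
```

The median lands near 0.968 rather than the arithmetic midpoint 0.945, which is exactly the point: values of β close to 1 matter more, so the log scale gives them proportionally more samples.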

5.Finding good hyperparameter values is very time-consuming. So typically you should do it once at the start of the project, and try to find very good hyperparameters so that you don’t ever have to revisit tuning them again. True or false?

A.False

B.True

6.In batch normalization as presented in the videos, if you apply it on the l-th layer of your neural network, what are you normalizing?

A. $z^{[l]}$

B. $W^{[l]}$

C. $a^{[l]}$

D. $b^{[l]}$

7.In the normalization formula $z^{(i)}_{norm} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}$, why do we use $\epsilon$?

A. For more accurate standardization

B. To avoid division by zero

C. To accelerate convergence

D. To prevent $\mu$ from being too small
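The role of $\epsilon$ is easiest to see on a degenerate batch. The sketch below (a minimal NumPy illustration; the function name and shapes are my own, not from the course) normalizes pre-activations whose variance is exactly zero, which would divide by zero without $\epsilon$:

```python
import numpy as np

def batchnorm_normalize(z, gamma, beta, eps=1e-8):
    """Normalize pre-activations z (shape: units x batch) across the batch,
    then scale and shift with the learnable parameters gamma and beta."""
    mu = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)
    z_norm = (z - mu) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * z_norm + beta

# A constant batch has zero variance; eps keeps the result finite.
z = np.array([[1.0, 1.0, 1.0]])
out = batchnorm_normalize(z, gamma=1.0, beta=0.0)
print(out)  # all zeros, and finite
```

With `eps=0` the same call would produce NaNs (0/0), which is precisely the failure mode the quiz answer refers to.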

8.Which of the following statements about γ and β in Batch Norm are true? (Check all that apply)

A. There is one global value of $\gamma$ and one global value of $\beta$ for each layer, and they apply to all the hidden units in that layer.

B. $\gamma$ and $\beta$ are hyperparameters of the algorithm, which we tune via random sampling.

C. They set the mean and variance of the linear variable $z^{[l]}$ of a given layer.

D. The optimal values are $\gamma = \sqrt{\sigma^2 + \epsilon}$ and $\beta = \mu$.

E. They can be learned using Adam, gradient descent with momentum, or RMSprop, not just with gradient descent.

9.After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:

A. If you implemented Batch Norm on mini-batches of (say) 256 examples, then to evaluate on one test example, duplicate that example 256 times so that you can make predictions with a mini-batch the same size as during training.

B. Use the most recent mini-batch's values of $\mu$ and $\sigma^2$ to perform the needed normalizations.

C. Skip the step where you normalize using $\mu$ and $\sigma^2$, since a single test example does not need to be normalized.

D. Perform the needed normalizations, using $\mu$ and $\sigma^2$ estimated with an exponentially weighted average across mini-batches seen during training.
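Option D's exponentially weighted averages can be sketched in a few lines. The snippet below (an illustrative toy with synthetic data; the helper name and momentum value are my own choices, not from the course) tracks running estimates of $\mu$ and $\sigma^2$ across mini-batches, exactly the quantities a Batch Norm layer would use at test time:

```python
import numpy as np

def update_running_stats(run_mu, run_var, batch_mu, batch_var, momentum=0.9):
    """One exponentially-weighted-average step on the batch statistics."""
    run_mu = momentum * run_mu + (1 - momentum) * batch_mu
    run_var = momentum * run_var + (1 - momentum) * batch_var
    return run_mu, run_var

# Toy "training loop": mini-batches drawn from a fixed distribution
# with true mean 3 and true variance 4.
rng = np.random.default_rng(0)
mu, var = 0.0, 1.0
for _ in range(500):
    batch = rng.normal(loc=3.0, scale=2.0, size=256)
    mu, var = update_running_stats(mu, var, batch.mean(), batch.var())

print(mu, var)  # running estimates settle near the true mean 3 and variance 4
```

At test time a single example is then normalized with these stored estimates, so no mini-batch statistics are needed at all.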

10.Which of these statements about deep learning programming frameworks are true? (Check all that apply)

A.A programming framework allows you to code up deep learning algorithms with typically fewer lines of code than a lower-level language such as Python.

B.Even if a project is currently open source, good governance of the project helps ensure that it remains open even in the long term, rather than become closed or modified to benefit only one company.

C.Deep learning programming frameworks require cloud-based machines to run.

Answers

1. A (False). The original post illustrated grid search versus random search with two figures (omitted here). Because we do not know in advance which hyperparameter matters most, random sampling is preferred: it tries more distinct values of each hyperparameter.

2. A. As in the previous question, hyperparameters vary in importance; the less important ones are not worth continual tuning effort.

3. C. The "Panda" strategy (babysitting one model) fits large models trained without sufficient computational resources (CPUs/GPUs); the "Caviar" strategy fits settings where you can afford to train many models in parallel.

4. B. r is drawn from [0, 1]. At r = 0, $β = 1 − 10^{-1} = 1 − 0.1 = 0.9$; at r = 1, $β = 1 − 10^{-2} = 1 − 0.01 = 0.99$.

5. A. Even small changes to the model may force you to search for good hyperparameter values all over again.

6. A. In practice, $z^{[l]}$ is usually normalized before the activation is applied.

7. B. The video says $\epsilon$ is there for numerical stability, i.e. to guarantee there is no division by zero.

8. C, E. (The original author was unsure of this one and offered no explanation.)

9. D. At test time, $μ$ and $σ^2$ are obtained from exponentially weighted averages of the values computed during training.

10. A, B. Professor Ng's point was that companies sometimes move some features onto a cloud platform and charge for them, not that frameworks require cloud machines, as option C claims.

