
Preventing Overfitting in Keras (Part 1): Dropout Layer Source-Code Details

Date: 2021-12-22 14:46:14

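When working with deep learning models you run into two problems: overfitting and underfitting. Underfitting is usually tackled by improving the model (more capacity, better features, longer training), and the right fix depends on the specific problem. Overfitting can be mitigated with fairly simple techniques such as Dropout, L1/L2 regularization terms, or a larger dataset, and plenty of articles online introduce these methods. What is usually missing is a description of how the code actually implements them and of the details in that implementation. So I am going to write a few blog posts on countering overfitting, which is also a good way to push myself to keep learning!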

How Dropout works

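Many good articles already explain the theory behind Dropout, so go search for them. I will only touch on it briefly here and focus on how Keras implements it.

The original figure shows Dropout applied to a neural network. During the forward pass in training, some neurons are randomly switched off, so their connections contribute nothing. This improves generalization and keeps the network from relying too heavily on a few local features.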

To implement Dropout, two things are needed:

1. A neuron is switched off by setting its value to 0.

2. The neurons to drop are chosen at random.
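The two requirements above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Keras does it: the name naive_dropout and the sample activations are made up for this example, and the sketch deliberately leaves out any rescaling of the surviving values.

```python
import random


def naive_dropout(values, rate, rng):
    """Zero each value independently with probability `rate` (no rescaling)."""
    return [0.0 if rng.random() < rate else v for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [0.5, 1.2, -0.7, 2.0, 0.3, 1.1]  # made-up neuron outputs
dropped = naive_dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 or the original value
```

Each value is decided independently, so the number of zeros varies from call to call and is only equal to rate on average.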

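Source-code details

The Keras source for Dropout is in keras\layers\core.py, and the official documentation describes the layer and its arguments.

Part of the code is shown below: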

def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
    super(Dropout, self).__init__(**kwargs)
    self.rate = min(1., max(0., rate))
    self.noise_shape = noise_shape
    self.seed = seed
    self.supports_masking = True

def _get_noise_shape(self, inputs):
    if self.noise_shape is None:
        return self.noise_shape

    symbolic_shape = K.shape(inputs)
    noise_shape = [symbolic_shape[axis] if shape is None else shape
                   for axis, shape in enumerate(self.noise_shape)]
    return tuple(noise_shape)

def call(self, inputs, training=None):
    if 0. < self.rate < 1.:
        noise_shape = self._get_noise_shape(inputs)

        def dropped_inputs():
            return K.dropout(inputs, self.rate, noise_shape,
                             seed=self.seed)
        return K.in_train_phase(dropped_inputs, inputs,
                                training=training)
    return inputs

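The constructor takes three arguments: rate, noise_shape, and seed. The official documentation covers all of them, but noise_shape is somewhat hard to grasp, so I will explain it further down with the help of the code.

Looking at the code, the core part is: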

def dropped_inputs():
    return K.dropout(inputs, self.rate, noise_shape,
                     seed=self.seed)
return K.in_train_phase(dropped_inputs, inputs,
                        training=training)

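Start with the final return statement, which calls K.in_train_phase.

Its job is to make sure dropout is applied only during training: in the training phase it returns the result of dropped_inputs, otherwise it returns inputs unchanged. Without it, the model would also drop units at test time, which amounts to throwing away part of a model that has already been trained.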
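The switching behaviour can be imitated without Keras. The sketch below is a simplified stand-in, not the real backend function: in_train_phase here just picks a branch from an explicit boolean flag, dropout_call is a hypothetical name that mirrors the structure of Dropout.call, and the inner function assumes the rescaling of surviving values that tf.nn.dropout documents.

```python
import random


def in_train_phase(train_fn, alt, training):
    """Hypothetical stand-in for K.in_train_phase: choose a branch by flag."""
    return train_fn() if training else alt


def dropout_call(inputs, rate, training, rng):
    """Mirror of the control flow in Dropout.call, for a flat list of floats."""
    if 0.0 < rate < 1.0:
        def dropped_inputs():
            scale = 1.0 / (1.0 - rate)  # survivors are scaled up
            return [v * scale if rng.random() >= rate else 0.0 for v in inputs]
        return in_train_phase(dropped_inputs, inputs, training)
    return inputs  # a rate of 0 or 1 leaves the input untouched, as in the source


rng = random.Random(0)
x = [1.0, 2.0, 3.0]
print(dropout_call(x, 0.5, training=True, rng=rng))   # some entries zeroed, rest doubled
print(dropout_call(x, 0.5, training=False, rng=rng))  # unchanged at inference time
```

Passing dropped_inputs as a function rather than a value means the random mask is only computed when the training branch is actually taken.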

K.dropout lives in keras\backend\tensorflow_backend.py (I use TensorFlow as the backend). Its source:

def dropout(x, level, noise_shape=None, seed=None):
    if seed is None:
        seed = np.random.randint(10e6)
    return tf.nn.dropout(x, rate=level, noise_shape=noise_shape, seed=seed)

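So this is still not the actual implementation, and the next stop is the source of tf.nn.dropout. After some digging, it turns out to be in tensorflow\python\ops\nn_ops.py, and the function that does the work is def dropout_v2(x, rate, noise_shape=None, seed=None, name=None).

The TF source file documents this function. The most important part: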

Computes dropout: randomly sets elements to zero to prevent overfitting.

Inputs elements are randomly set to zero (and the other elements are rescaled). This encourages each node to be independently useful, as it cannot rely on the output of other nodes.

More precisely: With probability `rate` elements of `x` are set to `0`. The remaining elements are scaled up by `1.0 / (1 - rate)`, so that the expected value is preserved.

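These three sentences say, in essence:

1. Elements are randomly set to 0.

2. The random zeroing pushes each node to be useful on its own, because it cannot depend on the output of other nodes.

3. Elements that are not set to 0 are scaled up by a factor of 1.0 / (1 - rate), so that the expected value stays the same.

The first two points were already covered at the start. The third deserves attention, because some hand-written dropout implementations I have seen in other blogs do not take it into account.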
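Why the factor 1.0 / (1 - rate) preserves the expected value can be checked with a short calculation. The notation is mine and not from the TF source: p stands for rate, m is the 0/1 keep indicator of a single element, x is its input value, and y is its output.

```latex
\begin{aligned}
m &\sim \mathrm{Bernoulli}(1 - p), \qquad y = \frac{m}{1 - p}\,x \\
\mathbb{E}[y] &= \frac{x}{1 - p}\,\mathbb{E}[m] = \frac{x}{1 - p}\,(1 - p) = x \\
\mathbb{E}[m\,x] &= (1 - p)\,x \quad \text{(without rescaling)}
\end{aligned}
```

Without the rescaling, the layer's average output during training would be smaller than at test time by a factor of 1 - p, and the following layers would see a shifted input distribution once dropout is switched off.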

The core of the actual implementation (finally, the part that does the work...):

keep_prob = 1 - rate
scale = 1 / keep_prob
scale = ops.convert_to_tensor(scale, dtype=x_dtype)
ret = gen_math_ops.mul(x, scale)

noise_shape = _get_noise_shape(x, noise_shape)
random_tensor = random_ops.random_uniform(
    noise_shape, seed=seed, dtype=x_dtype)
# NOTE: if (1.0 + rate) - 1 is equal to rate, then that float is selected,
# hence a >= comparison is used.
keep_mask = random_tensor >= rate
ret = gen_math_ops.mul(ret, gen_math_ops.cast(keep_mask, x_dtype))
return ret

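Reading the code: when noise_shape is not used, every element is first scaled up by 1.0 / (1 - rate). TF then draws a random tensor with the same shape as x, with values uniformly distributed in [0, 1). Positions where the random value is greater than or equal to rate are kept (mask 1) and the rest are dropped (mask 0). Finally the mask is multiplied element-wise with the scaled x (ret), so each element ends up as 0 with probability rate.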
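To make the order of operations concrete, here is a pure-Python mirror of those steps for a flat list of floats. It is a sketch under simplifying assumptions, not the TF implementation: dropout_steps is a hypothetical name, Python's random module replaces random_ops.random_uniform, and noise_shape is ignored.

```python
import random


def dropout_steps(x, rate, rng):
    """Pure-Python mirror of the steps above for a flat list (no noise_shape)."""
    keep_prob = 1.0 - rate
    scale = 1.0 / keep_prob
    ret = [v * scale for v in x]                    # 1. scale every element up
    random_tensor = [rng.random() for _ in x]       # 2. uniform samples in [0, 1)
    keep_mask = [r >= rate for r in random_tensor]  # 3. keep where sample >= rate
    return [v * float(k) for v, k in zip(ret, keep_mask)]  # 4. apply the 0/1 mask


rng = random.Random(42)
x = [1.0] * 100_000
y = dropout_steps(x, rate=0.3, rng=rng)

zero_fraction = y.count(0.0) / len(y)
mean_output = sum(y) / len(y)
print(f"fraction zeroed: {zero_fraction:.3f}")  # close to rate, not exactly rate
print(f"mean output:     {mean_output:.3f}")    # close to the input mean of 1.0
```

The fraction of zeros only approaches rate for large inputs, because each element is dropped independently rather than a fixed share of elements being removed.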

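Finally, a word on how noise_shape is implemented. The Keras Chinese documentation is a bit confusing about what noise_shape does; the TF source also describes it and gives an example: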

By default, each element is kept or dropped independently. If `noise_shape` is specified, it must be broadcastable to the shape of `x`, and only dimensions with `noise_shape[i] == shape(x)[i]` will make independent decisions. This is useful for dropping whole channels from an image or sequence. For example:

>>> tf.random.set_seed(0)
>>> x = tf.ones([3,10])
>>> tf.nn.dropout(x, rate = 2/3, noise_shape=[1,10], seed=1).numpy()
array([[0., 0., 0., 3., 3., 0., 3., 3., 3., 0.],
       [0., 0., 0., 3., 3., 0., 3., 3., 3., 0.],
       [0., 0., 0., 3., 3., 0., 3., 3., 3., 0.]], dtype=float32)

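Judging from the description and the example, noise_shape makes the dropped positions identical along every axis whose noise_shape entry is 1: the same random mask is shared along that axis.

The implementation is very simple. As explained above, TF decides which positions become 0 by drawing a random tensor.

When noise_shape is given, the random tensor takes its shape from noise_shape instead of x.shape. The code agrees: when noise_shape is not given, x.shape is used in its place. As for noise_shape = _get_noise_shape(x, noise_shape), as far as I can tell it is a defensive helper (I am not entirely sure): it falls back to the shape of x when noise_shape is None and fills in unspecified dimensions from x. In my own tests with a fully specified noise_shape, _get_noise_shape() simply returned noise_shape.

In the example, x.shape is [3,10] and noise_shape is [1,10]. Without noise_shape, random_tensor has shape [3,10]; with it, the shape is [1,10]. Multiplying a [3,10] mask element-wise with the [3,10] input zeroes every element independently with probability rate. Multiplying a [1,10] mask with the [3,10] input broadcasts the single mask row along axis 0, so all three rows are zeroed at the same column positions. This is what the Keras Chinese documentation means when it says the dropout mask is the same for all time steps.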
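The broadcasting behaviour can be reproduced without TensorFlow. The sketch below handles only the 2-D case and makes several assumptions of its own: dropout_2d is a hypothetical helper, the input is a list of rows, each noise_shape entry must be either 1 or the matching size of x, and Python's random module supplies the random numbers, so the pattern of zeros will not match the TF output shown above.

```python
import random


def dropout_2d(x, rate, noise_shape, rng):
    """Dropout on a list of rows; noise_shape entries must be 1 or match x."""
    rows, cols = len(x), len(x[0])
    n_rows, n_cols = noise_shape
    assert n_rows in (1, rows) and n_cols in (1, cols), "noise_shape must broadcast to x"
    scale = 1.0 / (1.0 - rate)
    # The random mask is drawn with noise_shape, not with the shape of x.
    mask = [[rng.random() >= rate for _ in range(n_cols)] for _ in range(n_rows)]
    # Broadcasting: a mask axis of size 1 is reused for every index on that axis.
    return [
        [x[i][j] * scale if mask[i % n_rows][j % n_cols] else 0.0 for j in range(cols)]
        for i in range(rows)
    ]


rng = random.Random(1)
x = [[1.0] * 10 for _ in range(3)]
shared = dropout_2d(x, rate=2 / 3, noise_shape=(1, 10), rng=rng)       # one mask row for all rows
independent = dropout_2d(x, rate=2 / 3, noise_shape=(3, 10), rng=rng)  # a separate mask per row

for row in shared:
    print(row)  # the three printed rows are identical; kept values are about 3.0
```

With noise_shape=(1, 10) only ten random decisions are made and each one applies to a whole column, which is the "dropping whole channels" use case mentioned in the TF description.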

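That concludes the walk-through of the Dropout layer source. In actual use, all we need to do is add a Dropout layer after another layer. For example, to add a dropout layer with a drop rate of 0.3 after a fully connected layer: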

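from keras.layers import Dense, Dropout
# model_input is the input tensor of the model, defined elsewhere
nn = Dense(128)(model_input)
nn = Dropout(0.3)(nn)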
