by Thomas Simonini

How AI can learn to generate pictures of cats

The research paper Generative Adversarial Nets (GAN) by Goodfellow et al. was a breakthrough in the field of generative models.

Leading researcher Yann LeCun himself called adversarial nets “the coolest idea in machine learning in the last twenty years.”

Today, thanks to this architecture, we’re going to build an AI that generates realistic pictures of cats. How awesome is that?!

To view the full working code, see my GitHub repository. It will help if you already have some experience with Python, Deep Learning, TensorFlow, and CNNs (Convolutional Neural Nets).

If you're new to Deep Learning, check out this excellent series of articles:


Machine Learning is Fun! The world’s easiest introduction to Machine Learning

What is DCGAN?

Deep Convolutional Generative Adversarial Networks (DCGANs) are a deep learning architecture that generates outputs similar to the data in the training set.


This model replaces the fully connected layers of the generative adversarial network model with convolutional layers.


To explain how DCGAN works, let’s use the metaphor of the art expert and the counterfeiter.


The counterfeiter (a.k.a. “the generator”) tries to produce fake Van Gogh paintings and pass them off as real.


On the other hand, the art expert (a.k.a., “the discriminator”) tries to catch the counterfeiter by using their knowledge of real Van Gogh paintings.


Over time, the art expert gets better at detecting counterfeit paintings, and the counterfeiter gets better at faking them.


As we can see, DCGANs are composed of two separate deep neural networks competing against each other.
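
Formally, Goodfellow et al. describe this competition as a two-player minimax game over a value function V(D, G). Here D(x) is the discriminator's estimated probability that x is real, and G(z) is the image the generator produces from noise z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to push D(x) toward 1 on real images and D(G(z)) toward 0 on fakes. The generator tries to push D(G(z)) toward 1. In practice, the generator is usually trained to maximize log D(G(z)) rather than to minimize log(1 - D(G(z))). The second form gives weak gradients early in training, when the discriminator rejects the fakes easily.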


The generator is a counterfeiter trying to produce seemingly real data. It has no idea what the real data looks like, but it learns to adjust from the feedback of the other model.

The discriminator is an inspector trying to determine which data is counterfeit (by comparing it with real data), while trying not to raise false positives on the real data. Its output provides the gradients that are backpropagated to train the generator.

The generator takes a random noise vector and generates a picture. This picture is fed into the discriminator, which compares it against images from the training set. The discriminator returns a number between 0 (fake image) and 1 (real image).
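
Here is a deliberately tiny, hypothetical sketch in plain Python (no TensorFlow) that makes this data flow concrete. `toy_generator` and `toy_discriminator` are illustrative stand-ins, not the article's networks. The generator turns a noise vector into tanh-squashed "pixels". The discriminator reduces an image to a single logit, and a sigmoid maps that logit into the (0, 1) range.

```python
import math
import random


def toy_generator(z, weights):
    """Hypothetical stand-in for the generator: maps noise to 'pixels' in [-1, 1] via tanh."""
    return [math.tanh(w * zi) for w, zi in zip(weights, z)]


def toy_discriminator(image, weights, bias=0.0):
    """Hypothetical stand-in for the discriminator: one logit squashed by a sigmoid into (0, 1)."""
    logit = sum(w * p for w, p in zip(weights, image)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
z = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # random noise vector
fake = toy_generator(z, [1.0, -0.5, 2.0, 0.3])
score = toy_discriminator(fake, [0.2, 0.1, -0.3, 0.4])
print(f"D(G(z)) = {score:.3f}")  # near 0 means "fake", near 1 means "real"
```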

Let’s create a DCGAN!

Now, we’re ready to create our AI.


In this part, we will focus on the main elements of our model. If you want to check out the whole code, use the notebook here.

Inputs

Here, we create the input placeholders: inputs_real for the discriminator and inputs_z for the generator.


Note that we use two learning rates, one for the generator and one for the discriminator.


DCGANs are very sensitive to hyperparameters, so it’s very important to tune them precisely.
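
A small config object is one way to keep these hyperparameters in one place. The sketch below is hypothetical. The field names are illustrative, and the 128×128 RGB image size is an assumption rather than the notebook's setting. The defaults are the starting values reported in the DCGAN paper by Radford et al.: Adam learning rate 0.0002, beta1 = 0.5, Leaky ReLU slope 0.2, batch size 128, and a 100-dimensional z. The config also holds the 0.1 label smoothing described later in this article. Finally, it derives the placeholder shapes, using None for the batch dimension as a TensorFlow 1.x placeholder would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DCGANConfig:
    """Hypothetical hyperparameter bundle; values are starting points, not tuned settings."""
    image_shape: tuple = (128, 128, 3)  # (height, width, channels); assumed, not from the notebook
    z_dim: int = 100                    # length of the noise vector fed to the generator
    lr_d: float = 0.0002                # discriminator learning rate
    lr_g: float = 0.0002                # generator learning rate, kept separate so it can be tuned
    beta1: float = 0.5                  # Adam beta1, lowered from the usual 0.9 as in the DCGAN paper
    alpha: float = 0.2                  # Leaky ReLU slope for negative inputs
    smooth: float = 0.1                 # label smoothing: real labels become 1 - smooth
    batch_size: int = 128

    def placeholder_shapes(self):
        """Shapes for inputs_real and inputs_z; None stands for the batch dimension."""
        return (None, *self.image_shape), (None, self.z_dim)


cfg = DCGANConfig()
real_shape, z_shape = cfg.placeholder_shapes()
print(real_shape, z_shape)  # (None, 128, 128, 3) (None, 100)
```

Because the dataclass is frozen, a different run is created explicitly, for example `DCGANConfig(lr_g=0.001)`, instead of a field being mutated halfway through an experiment.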


The discriminator and the generator

We use tf.variable_scope for two reasons.


First, we want to make sure that all variable names start with generator or discriminator. This will help later when we train the two networks.

Second, we want to reuse these networks with different inputs:


For the generator: we’re going to train it, but also sample fake images from it after training.

For the discriminator: we need to share variables between the fake and real input images.
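
TensorFlow 1.x enforces this through `tf.variable_scope(..., reuse=...)` and `tf.get_variable`. Creating a variable whose name already exists raises an error unless reuse is enabled; with reuse enabled, the existing variable is returned. The plain-Python sketch below is a hypothetical toy analogue of that behavior: `ParamStore` and `scoped_discriminator` are illustrative, not TensorFlow APIs. It shows why the second call to the discriminator, on fake images, must reuse the weights created for the real images.

```python
class ParamStore:
    """Toy analogue of TF1 variable scopes: maps 'scope/name' keys to parameter lists."""

    def __init__(self):
        self.params = {}

    def get(self, scope, name, init, reuse=False):
        key = f"{scope}/{name}"
        if reuse:
            if key not in self.params:
                raise ValueError(f"{key} does not exist; cannot reuse it")
            return self.params[key]
        if key in self.params:
            raise ValueError(f"{key} already exists; pass reuse=True to share it")
        self.params[key] = init()
        return self.params[key]


def scoped_discriminator(store, image, reuse=False):
    """Scores an image with weights stored under the 'discriminator' prefix."""
    w = store.get("discriminator", "w", lambda: [0.1] * len(image), reuse=reuse)
    return sum(wi * xi for wi, xi in zip(w, image))


store = ParamStore()
real_score = scoped_discriminator(store, [1.0, 2.0, 3.0])              # creates the weights
fake_score = scoped_discriminator(store, [0.5, 0.5, 0.5], reuse=True)  # shares the same weights
print(sorted(store.params))  # ['discriminator/w']
```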

Now let’s create the discriminator. Remember, it takes a real or fake image as input and outputs a score.

Some technical remarks:


The principle is to double the number of filters at each convolutional layer.


We don’t use pooling layers for downsampling. Instead, we use only strided convolutional layers.

We use batch normalization at each layer (except for the input layer) because it reduces internal covariate shift. For more information, check this great article.

We use Leaky ReLU as the activation function because it lets a small gradient flow through negative inputs, which helps avoid vanishing gradients.
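
The shape bookkeeping behind these remarks can be sketched in plain Python. The sketch assumes 128×128 RGB inputs, 64 filters in the first layer, and four stride-2 convolutions with 'same' padding; these are illustrative choices, not necessarily the notebook's. Each layer halves the height and width, doubles the number of filters, and uses batch normalization everywhere except the first layer. The `leaky_relu` helper shows the activation on a single value, using the 0.2 slope from the DCGAN paper.

```python
import math


def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: negative inputs keep a small slope instead of being zeroed."""
    return x if x > 0 else alpha * x


def discriminator_layers(image_size=128, first_filters=64, n_layers=4):
    """Trace (height, width, filters, uses_batch_norm) through stride-2 'same' convolutions."""
    layers = []
    size, filters = image_size, first_filters
    for i in range(n_layers):
        size = math.ceil(size / 2)                    # stride 2 with 'same' padding halves height/width
        layers.append((size, size, filters, i > 0))   # no batch norm on the first layer
        filters *= 2                                  # double the number of filters for the next layer
    return layers


for layer in discriminator_layers():
    print(layer)
# (64, 64, 64, False)
# (32, 32, 128, True)
# (16, 16, 256, True)
# (8, 8, 512, True)
```

In a full model, the final 8×8×512 feature map would then be flattened and mapped to a single logit.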

Then, we create the generator. Remember, it takes a random noise vector (z) as input and outputs a fake image, thanks to transposed convolution layers.

The idea is that at each layer we halve the number of filters and double the width and height of the image.


The generator has been found to perform best using tanh as the output activation function.
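
The generator mirrors the discriminator's bookkeeping in reverse. In this hypothetical sketch, z is first projected and reshaped to an 8×8×512 tensor (an assumed starting shape). Each stride-2 transposed convolution then doubles the height and width and halves the number of filters, and the last layer produces 3 channels for RGB. Because tanh outputs values in [-1, 1], training images are typically rescaled to the same range, as the DCGAN paper does. The two helper functions show that mapping for 8-bit pixel values.

```python
def generator_shapes(z_dim=100, start_size=8, start_filters=512, n_layers=4, channels=3):
    """Trace tensor shapes from the noise vector z to the output image."""
    shapes = [(z_dim,), (start_size, start_size, start_filters)]  # z, then dense + reshape
    size, filters = start_size, start_filters
    for i in range(n_layers):
        size *= 2  # stride-2 transposed convolution with 'same' padding doubles height/width
        last = i == n_layers - 1
        filters = channels if last else filters // 2  # halve filters; final layer outputs RGB
        shapes.append((size, size, filters))
    return shapes


def to_tanh_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1], the output range of tanh."""
    return pixel / 127.5 - 1.0


def from_tanh_range(value):
    """Map a generator output in [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5


print(generator_shapes())
# [(100,), (8, 8, 512), (16, 16, 256), (32, 32, 128), (64, 64, 64), (128, 128, 3)]
```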


Discriminator and generator losses

Because we train the generator and discriminator at the same time, we need to calculate losses for both networks.


We want the discriminator to output 1 when it “thinks” an image is real, and 0 for fake images. Therefore, we need to set up the losses to reflect that.

The discriminator loss is the sum of loss for real and fake images:


d_loss = d_loss_real + d_loss_fake

d_loss_real is the loss when the discriminator predicts an image is fake, when in fact it was a real image. It is calculated as follows:

Uses d_logits_real, and the labels are all 1 (since all of these images are real).


labels = tf.ones_like(tensor) * (1 - smooth)

We use label smoothing: the labels are reduced slightly, from 1.0 to 0.9, to help the discriminator generalize better.

d_loss_fake is the loss when the discriminator predicts an image is real, when in fact it was a fake image.


Uses d_logits_fake, and the labels are all 0.


The generator loss again uses the d_logits_fake from the discriminator. This time the labels are all 1, because the generator wants to fool the discriminator.
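
The three losses can be checked numerically with a plain-Python sketch. In TensorFlow, they would typically be built from `tf.nn.sigmoid_cross_entropy_with_logits` averaged with `tf.reduce_mean`. Here, `sigmoid_ce_with_logits` re-implements the numerically stable formula TensorFlow documents for that op: max(x, 0) - x * z + log(1 + exp(-|x|)). The `smooth` default and the example logits are illustrative.

```python
import math


def sigmoid_ce_with_logits(logit, label):
    """Stable form of -[label*log(sigmoid(logit)) + (1-label)*log(1-sigmoid(logit))]."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def gan_losses(d_logits_real, d_logits_fake, smooth=0.1):
    """Return (d_loss, g_loss) computed from the discriminator's raw logits."""
    # Real images: labels are 1, smoothed down to 1 - smooth.
    d_loss_real = mean([sigmoid_ce_with_logits(x, 1.0 - smooth) for x in d_logits_real])
    # Fake images: labels are 0.
    d_loss_fake = mean([sigmoid_ce_with_logits(x, 0.0) for x in d_logits_fake])
    # Generator: same fake logits, but labels are 1 because it wants to fool the discriminator.
    g_loss = mean([sigmoid_ce_with_logits(x, 1.0) for x in d_logits_fake])
    return d_loss_real + d_loss_fake, g_loss


d_loss, g_loss = gan_losses(d_logits_real=[2.0, 1.5], d_logits_fake=[-1.0, 0.5])
print(f"d_loss={d_loss:.4f} g_loss={g_loss:.4f}")
```

Working on logits rather than on sigmoid outputs avoids taking log(0) when the discriminator becomes very confident.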

Optimizers

After calculating the losses, we need to update the generator and discriminator separately.


To do this, we get the variables for each network with tf.trainable_variables(), which returns a list of all the trainable variables defined in our graph. We then keep only those whose names start with generator or discriminator.
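
In TensorFlow 1.x, the usual pattern is to filter `tf.trainable_variables()` by name prefix. Each list is then passed to its own optimizer (for example, one Adam optimizer per network with its own learning rate) through the `var_list` argument of `minimize`. The plain-Python sketch below mimics that split with hypothetical variable names. It uses a simple gradient-descent step as a stand-in for Adam, to show that each update touches only its own network's parameters.

```python
def split_by_scope(var_names):
    """Split variable names into generator and discriminator lists by their scope prefix."""
    g_vars = [n for n in var_names if n.startswith("generator")]
    d_vars = [n for n in var_names if n.startswith("discriminator")]
    return g_vars, d_vars


def gradient_step(params, grads, var_list, lr):
    """Apply one gradient-descent update to the variables in var_list only."""
    for name in var_list:
        params[name] -= lr * grads[name]


# Hypothetical variable names, shaped like TF1 'scope/layer/variable' names.
params = {"generator/dense/kernel": 1.0, "discriminator/conv1/kernel": 1.0}
grads = {"generator/dense/kernel": 0.5, "discriminator/conv1/kernel": 0.5}

g_vars, d_vars = split_by_scope(params)
gradient_step(params, grads, d_vars, lr=0.1)   # discriminator update, with its own learning rate
gradient_step(params, grads, g_vars, lr=0.01)  # generator update, with its own learning rate
print(params)
```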

Training

Here, we’re implementing the training function.


The idea is relatively simple:


We’re saving the model every five epochs.

We’re saving a picture in the images folder every ten batches trained.

We’re displaying the g_loss, the d_loss, and the generated image every 15 epochs. The reason is simple: a Jupyter notebook can get buggy if too many pictures are displayed.

Alternatively, we can directly generate realistic images by loading the saved model (this will save you 20 hours of training).
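
The training schedule described above can be sketched as a plain-Python loop skeleton. Everything here is hypothetical. `train_step`, `save_model`, `save_image`, and `show_progress` are placeholder callbacks standing in for the TensorFlow session calls. Counting batches globally, rather than restarting the count each epoch, is an assumption.

```python
def train(n_epochs, batches_per_epoch, train_step, save_model, save_image, show_progress):
    """Run the training schedule; train_step() returns (d_loss, g_loss) for one batch."""
    if batches_per_epoch < 1:
        raise ValueError("need at least one batch per epoch")
    batch_count = 0
    for epoch in range(1, n_epochs + 1):
        for _ in range(batches_per_epoch):
            d_loss, g_loss = train_step()  # placeholder for the real updates (e.g. one D step, then one G step)
            batch_count += 1
            if batch_count % 10 == 0:
                save_image(epoch, batch_count)  # write a sample to the images folder
        if epoch % 5 == 0:
            save_model(epoch)  # checkpoint every five epochs
        if epoch % 15 == 0:
            show_progress(epoch, d_loss, g_loss)  # display sparingly to keep Jupyter responsive


events = {"model": [], "image": [], "shown": []}
train(
    n_epochs=30,
    batches_per_epoch=20,
    train_step=lambda: (0.7, 1.2),
    save_model=lambda epoch: events["model"].append(epoch),
    save_image=lambda epoch, batch: events["image"].append(batch),
    show_progress=lambda epoch, d, g: events["shown"].append(epoch),
)
print({k: len(v) for k, v in events.items()})  # {'model': 6, 'image': 60, 'shown': 2}
```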

How to run it

You can’t run this on your personal computer — unless you have your own GPUs or are ready to wait maybe 10 years!


Instead, you must use cloud GPU services, such as AWS or FloydHub.


Personally, I trained this DCGAN for 20 hours with Microsoft Azure and their Deep Learning Virtual Machine.

Disclaimer: I don’t have any business relations with Azure. I just loved their excellent customer service!

If you have trouble running it on a virtual machine, follow this excellent article here.


That’s all! I hope this tutorial has been helpful.


If you’ve improved the model, don’t hesitate to make a pull request.


