Create Your Own Friends with a GAN

Introduction

Cryptocurrencies, NFTs, and the web3 movement seem to be all the rage these days! Digital assets are being listed on marketplaces for astounding amounts of money, and just about every celebrity is debuting their own NFT collection. While your crypto assets may be taxable, such as in Canada, today we'll explore a fun and tax-free way to generate your own assortment of procedurally generated CryptoPunks.

Generative Adversarial Networks, often known just as GANs, are a class of deep-learning models designed to learn from an input dataset and create (generate!) new material that is convincingly similar to elements of the original training set. Famously, the website thispersondoesnotexist.com went viral with lifelike, yet synthetic, images of people generated with a model called StyleGAN2. GANs have gained traction in the machine learning world, and are now being used to generate all sorts of images, text, and even music!

Today we'll briefly look at the high-level intuition behind GANs, and then we'll build a small demo around a pre-trained GAN to see what all the fuss is about. Here's a peek at what we're going to be putting together:

Prerequisites

Make sure you have the gradio Python package already installed. To use the pretrained model, also install torch and torchvision. The weights are downloaded with the huggingface_hub package, which the code below imports.

GANs: a very brief introduction

Originally proposed in Goodfellow et al. 2014, GANs are made up of two neural networks that compete with the intention of outsmarting each other. One network, known as the generator, is responsible for generating images. The other network, the discriminator, receives an image at a time from the generator along with a real image from the training data set. The discriminator then has to guess: which image is the fake?

The generator is constantly training to create images that are trickier for the discriminator to identify, while the discriminator raises the bar for the generator every time it correctly detects a fake. As the networks engage in this competitive (adversarial!) relationship, the generated images improve to the point where human eyes can no longer tell them apart from real ones!
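This back-and-forth is usually written as a two-player minimax game. As a reference point, here is the value function from Goodfellow et al. 2014, where $D(x)$ is the discriminator's estimate that $x$ is real, $G(z)$ is the generator's output for a noise vector $z$, and $p_{\text{data}}$ and $p_z$ are the data and noise distributions:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make $V$ large and the generator tries to make it small. The pretrained model used in this guide may have been trained with a variant of this objective, so treat the formula as intuition rather than a description of that exact training run.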

For a more in-depth look at GANs, you can take a look at this excellent post on Analytics Vidhya or this PyTorch tutorial. For now, though, we'll dive into a demo!

Step 1 - Create the Generator model

To generate new images with a GAN, you only need the generator model. There are many different architectures that the generator could use, but for this demo we'll use a pretrained GAN generator model with the following architecture:

from torch import nn

class Generator(nn.Module):
    # Refer to the link below for explanations about nc, nz, and ngf
    # https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html#inputs
    def __init__(self, nc=4, nz=100, ngf=64):
        super(Generator, self).__init__()
        self.network = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 4, 3, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 0, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, input):
        output = self.network(input)
        return output
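To get a feel for what these layers do to the input, the spatial size after each nn.ConvTranspose2d can be worked out by hand. The sketch below is not part of the original demo: conv_transpose_out and trace_sizes are hypothetical helper names, and the formula assumes the layer defaults dilation=1 and output_padding=0, which is what the Generator above relies on.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output height/width of nn.ConvTranspose2d, assuming dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel_size, stride, padding) of the four ConvTranspose2d layers in Generator
LAYERS = [(3, 1, 0), (3, 2, 1), (4, 2, 0), (4, 2, 1)]


def trace_sizes(start=1):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [start]
    for kernel, stride, padding in LAYERS:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


sizes = trace_sizes()
print(sizes)  # [1, 3, 5, 12, 24]
```

So a 100x1x1 noise vector comes out as an nc x 24 x 24 image, which lines up with the 24x24 pixel size of the original CryptoPunks. With the default nc=4 the output has four channels, presumably RGBA.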

We're taking the generator from this repo by @teddykoker, where you can also see the original discriminator model structure.

After instantiating the model, we'll load in the weights from the Hugging Face Hub, stored at nateraw/cryptopunks-gan:

from huggingface_hub import hf_hub_download
import torch

model = Generator()
weights_path = hf_hub_download('nateraw/cryptopunks-gan', 'generator.pth')
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu'))) # Use 'cuda' if you have a GPU available
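If you want the same loading code to work with or without a GPU, one option is to pick the device at runtime and pass it to map_location. The following is a minimal sketch under stated assumptions: a tiny nn.Linear layer and a temporary file stand in for the real generator and generator.pth, so nothing is downloaded, and the variable names are illustrative only.

```python
import os
import tempfile

import torch
from torch import nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the downloaded generator.pth: save a tiny layer's weights to a temp file.
tiny = nn.Linear(2, 1)
weights_path = os.path.join(tempfile.mkdtemp(), "tiny.pth")
torch.save(tiny.state_dict(), weights_path)

# map_location remaps the stored tensors onto the chosen device while loading.
state_dict = torch.load(weights_path, map_location=device)
restored = nn.Linear(2, 1)
restored.load_state_dict(state_dict)
restored.to(device)  # the module must live on the same device as its inputs
```

With the real model, the same pattern would also mean moving the noise tensor to that device before passing it through the generator.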

Step 2 - Defining a `predict` function

The predict function is the key to making Gradio work! Whatever inputs we choose through the Gradio interface will get passed to our predict function, which should operate on the inputs and generate outputs that we can display with Gradio output components. For GANs it's common to pass random noise into the model as the input, so we'll generate a tensor of random numbers and pass that through the model. We can then use torchvision's save_image function to save the output of the model as a png file, and return the file name:

from torchvision.utils import save_image

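def predict(seed):
    num_punks = 4
    torch.manual_seed(seed)  # fix the RNG so the same seed gives the same punks
    z = torch.randn(num_punks, 100, 1, 1)  # one 100x1x1 noise vector per punk
    punks = model(z)
    # normalize=True rescales the Tanh output into the range image files expect
    save_image(punks, "punks.png", normalize=True)
    return 'punks.png'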
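The Tanh at the end of the generator produces values in [-1, 1], while an image file needs values in a fixed range such as [0, 1]. Passing normalize=True asks save_image to rescale the tensor, by default using the minimum and maximum found in it. Here is a plain-Python sketch of that idea; min_max_normalize is a hypothetical helper for illustration, not torchvision's own code:

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    span = max(high - low, 1e-5)  # guard against division by zero for constant input
    return [(v - low) / span for v in values]


tanh_like = [-1.0, -0.5, 0.0, 1.0]
scaled = min_max_normalize(tanh_like)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```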

We're giving our predict function a seed parameter, so that we can fix the random tensor generation with a seed. Passing in the same seed again will then reproduce the same punks.

Note! Our model needs an input tensor of dimensions 100x1x1 to do a single inference, or (BatchSize)x100x1x1 for generating a batch of images. In this demo we'll start by generating 4 punks at a time.
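The seeding and the batch shape can be checked on their own, without the generator. The sketch below assumes only that torch is installed; make_noise is a hypothetical helper name, not something the demo defines.

```python
import torch


def make_noise(seed, num_punks, nz=100):
    """Return a (num_punks, nz, 1, 1) batch of Gaussian noise, reproducible for a given seed."""
    torch.manual_seed(int(seed))
    return torch.randn(int(num_punks), nz, 1, 1)


first = make_noise(42, 4)
again = make_noise(42, 4)
other = make_noise(43, 4)

print(first.shape)                # torch.Size([4, 100, 1, 1])
print(torch.equal(first, again))  # True: same seed, same noise
print(torch.equal(first, other))  # False: different seed, different noise
```

Because the generator's output is computed from this noise, identical noise for a given seed is what makes the punks reproducible.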

Step 3 - Creating a Gradio interface

At this point you can even run the code you have with predict(<SOME_NUMBER>), and you'll find your freshly generated punks in your file system at ./punks.png. To make a truly interactive demo, though, we'll build out a simple interface with Gradio. Our goals here are to:

Set a slider input so users can choose the "seed" value
Use an image component for our output to showcase the generated punks
Use our predict() to take the seed and generate the images

With gr.Interface(), we can define all of that with a single function call:

import gradio as gr

gr.Interface(
    predict,
    inputs=[
        gr.Slider(0, 1000, label='Seed', value=42), # value sets the initial position (older Gradio releases called it default)
    ],
    outputs="image",
).launch()

Launching the interface should present you with something like this:

Step 4 - Even more punks!

Generating 4 punks at a time is a good start, but maybe we'd like to control how many we make each time. Adding more inputs to our Gradio interface is as simple as adding another item to the inputs list that we pass to gr.Interface:

gr.Interface(
    predict,
    inputs=[
        gr.Slider(0, 1000, label='Seed', value=42),
        gr.Slider(4, 64, label='Number of Punks', step=1, value=10), # Adding another slider!
    ],
    outputs="image",
).launch()

The new input will be passed to our predict() function, so we have to change that function to accept a new parameter:

def predict(seed, num_punks):
    torch.manual_seed(seed)
    num_punks = int(num_punks)  # sliders may deliver floats; torch.randn needs an int size
    z = torch.randn(num_punks, 100, 1, 1)
    punks = model(z)
    save_image(punks, "punks.png", normalize=True)
    return 'punks.png'

When you relaunch your interface, you should see a second slider that'll let you control the number of punks!
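Input values reach the function positionally, in the order the components appear in the inputs list, so the function needs one parameter per input component. A quick way to sanity-check that pairing is sketched below; the predict here is a placeholder with the same signature as the demo's function, and input_labels and slider_values are illustrative names.

```python
import inspect


def predict(seed, num_punks):
    """Placeholder with the same signature as the demo's predict()."""
    return f"seed={seed}, num_punks={num_punks}"


# Labels of the input components, in the order they appear in the inputs list.
input_labels = ["Seed", "Number of Punks"]

param_names = list(inspect.signature(predict).parameters)
assert len(param_names) == len(input_labels), "one function parameter per input component"

# Values arrive positionally: first component -> first parameter, and so on.
slider_values = [42, 10]
print(predict(*slider_values))  # seed=42, num_punks=10
```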

Step 5 - Polishing it up

Your Gradio app is pretty much good to go, but you can add a few extra things to really make it ready for the spotlight ✨

We can add some examples that users can easily try out by passing an examples argument to gr.Interface:

gr.Interface(
    # ...
    # keep everything as it is, and then add
    examples=[[123, 15], [42, 29], [456, 8], [1337, 35]],
    cache_examples=True, # cache_examples is optional and belongs to gr.Interface, not launch()
).launch()

The examples parameter takes a list of lists, where the items in each sublist follow the same order as the inputs. So in our case, each sublist is [seed, num_punks]. Give it a try!
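To make the ordering concrete, each example row can be paired with the input names by position and compared against the slider ranges. This is a sketch for illustration only: label_examples and out_of_range are hypothetical helpers, and it does not assume anything about whether Gradio itself enforces slider ranges for examples.

```python
# Slider ranges copied from the interface definition: name -> (minimum, maximum)
slider_ranges = {"seed": (0, 1000), "num_punks": (4, 64)}
examples = [[123, 15], [42, 29], [456, 8], [1337, 35]]


def label_examples(rows, names):
    """Pair each value in a row with the input it belongs to, by position."""
    return [dict(zip(names, row)) for row in rows]


def out_of_range(rows, ranges):
    """Return the rows that contain a value outside its slider's range."""
    names = list(ranges)
    flagged = []
    for row in rows:
        for name, value in zip(names, row):
            low, high = ranges[name]
            if not low <= value <= high:
                flagged.append(row)
                break
    return flagged


print(label_examples(examples, list(slider_ranges))[0])  # {'seed': 123, 'num_punks': 15}
print(out_of_range(examples, slider_ranges))             # [[1337, 35]]
```

Note that the seed 1337 lies above the Seed slider's maximum of 1000. How that is handled may depend on your Gradio version, so you may want to widen the slider or pick a smaller seed for that example.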

You can also try adding a title, description, and article to the gr.Interface. Each of those parameters accepts a string, so try it out and see what happens 👀 The article parameter also accepts HTML, as explored in a previous guide!
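As a starting point, the three strings can be kept in one place and unpacked into gr.Interface. The wording of the strings and the names INTERFACE_TEXT and build_demo are made up for this sketch, and the slider settings simply mirror the ones used earlier in this guide.

```python
# Text shown around the demo; the wording here is made up for illustration.
INTERFACE_TEXT = {
    "title": "CryptoPunks GAN",
    "description": "Pick a seed and a number of punks, then generate!",
    "article": "<p>Generator weights: <code>nateraw/cryptopunks-gan</code> on the Hugging Face Hub.</p>",
}


def build_demo(predict_fn):
    """Wrap predict_fn in a gr.Interface that carries the title, description, and article."""
    import gradio as gr  # imported here so the text above can be edited without Gradio installed

    return gr.Interface(
        predict_fn,
        inputs=[
            gr.Slider(0, 1000, label="Seed", value=42),
            gr.Slider(4, 64, label="Number of Punks", step=1, value=10),
        ],
        outputs="image",
        **INTERFACE_TEXT,
    )
```

Calling build_demo(predict).launch() with the predict function from Step 4 would then start the demo with the extra text in place.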

When you're all done, you may end up with something like this:

For reference, here is our full code:

import torch
from torch import nn
from huggingface_hub import hf_hub_download
from torchvision.utils import save_image
import gradio as gr

class Generator(nn.Module):
    # Refer to the link below for explanations about nc, nz, and ngf
    # https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html#inputs
    def __init__(self, nc=4, nz=100, ngf=64):
        super(Generator, self).__init__()
        self.network = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 4, 3, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 3, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 0, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, input):
        output = self.network(input)
        return output

model = Generator()
weights_path = hf_hub_download('nateraw/cryptopunks-gan', 'generator.pth')
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu'))) # Use 'cuda' if you have a GPU available

def predict(seed, num_punks):
    torch.manual_seed(seed)
    num_punks = int(num_punks)  # sliders may deliver floats; torch.randn needs an int size
    z = torch.randn(num_punks, 100, 1, 1)
    punks = model(z)
    save_image(punks, "punks.png", normalize=True)
    return 'punks.png'

gr.Interface(
    predict,
    inputs=[
        gr.Slider(0, 1000, label='Seed', value=42),
        gr.Slider(4, 64, label='Number of Punks', step=1, value=10),
    ],
    outputs="image",
    examples=[[123, 15], [42, 29], [456, 8], [1337, 35]],
    cache_examples=True, # optional; belongs to gr.Interface, not launch()
).launch()

Congratulations! You've built out your very own GAN-powered CryptoPunks generator, with a fancy Gradio interface that makes it easy for anyone to use. Now you can scour the Hub for more GANs (or train your own) and continue making even more awesome demos 🤗
