Related spaces: https://huggingface.co/spaces/abidlabs/vision-transformer
Tags: VISION, TRANSFORMERS, HUB
Image classification is a central task in computer vision. Building better classifiers to identify which objects are present in a picture is an active area of research, with applications ranging from facial recognition to manufacturing quality control.
State-of-the-art image classifiers are based on the transformer architecture, originally popularized for NLP tasks. Such architectures are typically called vision transformers (ViT), and they pair naturally with Gradio's image input component. In this tutorial, we will build a web demo to classify images using Gradio. We will be able to build the whole web application in a single line of Python, and it will look like this (try one of the examples!):
Let's get started!
Make sure you have the gradio Python package already installed.
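If you want to confirm the installation, a quick sanity check like the one below works (a minimal sketch; if the import fails, installing the package with pip install gradio is the usual fix):

import gradio as gr

# Prints the installed version; an ImportError above means gradio is missing
print(gr.__version__)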
First, we will need an image classification model. For this tutorial, we will use a model from the Hugging Face Model Hub. The Hub contains thousands of models covering dozens of different machine learning tasks.
Expand the Tasks category on the left sidebar and select "Image Classification" as our task of interest. You will then see all of the models on the Hub that are designed to classify images.
At the time of writing, the most popular one is google/vit-base-patch16-224, which has been trained on ImageNet images at a resolution of 224x224 pixels. We will use this model for our demo.
When using a model from the Hugging Face Hub, we do not need to define the input or output components for the demo. Similarly, we do not need to be concerned with the details of preprocessing or postprocessing. All of these are automatically inferred from the model tags.
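To see what is being inferred for us, here is a hedged sketch of a roughly equivalent manual setup, with the components and a prediction function defined by hand. The predict helper and the use of the transformers pipeline are illustrative assumptions for running the same checkpoint locally, not what Gradio does internally:

import gradio as gr
from transformers import pipeline

# Run the same ViT checkpoint locally (assumes transformers is installed)
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

def predict(image):
    # The pipeline returns a list of {"label": ..., "score": ...} dicts;
    # gr.Label expects a {label: confidence} mapping
    return {p["label"]: p["score"] for p in classifier(image)}

gr.Interface(
    fn=predict,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=5),
).launch()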
Besides the import statement, it only takes a single line of Python to load and launch the demo.
We use the gr.Interface.load() method and pass in the path to the model, including the huggingface/ prefix to designate that it comes from the Hugging Face Hub.
import gradio as gr

gr.Interface.load(
    "huggingface/google/vit-base-patch16-224",
    examples=["alligator.jpg", "laptop.jpg"]
).launch()
Notice that we have added one more parameter, examples, which allows us to prepopulate our interface with a few predefined examples.
This produces the following interface, which you can try right here in your browser. When you input an image, it is automatically preprocessed and sent to the Hugging Face Hub API, where it is passed through the model and returned as a human-interpretable prediction. Try uploading your own image!
And you're done! In one line of code, you have built a web demo for an image classifier. If you'd like to share with others, try setting share=True when you launch() the Interface!
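For example, here is the same one-liner with sharing enabled (assuming the example images sit next to the script):

import gradio as gr

gr.Interface.load(
    "huggingface/google/vit-base-patch16-224",
    examples=["alligator.jpg", "laptop.jpg"]
).launch(share=True)  # share=True generates a temporary public URL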