## Using Gradio for Tabular Data Science Workflows

Related spaces: https://huggingface.co/spaces/scikit-learn/gradio-skops-integration, https://huggingface.co/spaces/scikit-learn/tabular-playground, https://huggingface.co/spaces/merve/gradio-analysis-dashboard

### Introduction

Tabular data science is the most widely used domain of machine learning, with problems ranging from customer segmentation to churn prediction. Throughout the various stages of a tabular data science workflow, communicating your work to stakeholders or clients can be cumbersome, which keeps data scientists from focusing on what matters, such as data analysis and model building. Data scientists can end up spending hours building a dashboard that takes in a dataframe and returns plots, or that returns predictions or a plot of the clusters in a dataset. In this guide, we'll go through how to use `gradio` to improve your data science workflows. We will also talk about how to use `gradio` and `skops` to build interfaces with only one line of code!

### Prerequisites

Make sure you have the `gradio` Python package already installed.
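One way to confirm the prerequisite before running the examples is to ask Python which packages it can locate. The snippet below is a minimal sketch rather than part of Gradio itself: `missing_packages` is a hypothetical helper, and the `required` list is an assumption based on the imports used by the examples in this guide.

```python
from importlib import util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if util.find_spec(name) is None]


# Assumed list: the packages imported by the examples in this guide.
required = ["gradio", "pandas", "joblib", "datasets", "seaborn", "matplotlib"]
missing = missing_packages(required)

if missing:
    print("Install before continuing: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

The check only looks packages up, so it runs quickly and does not import any of them.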

### Let's Create a Simple Interface!

We will take a look at how to create a simple UI that predicts failures based on product information.

```python
import gradio as gr
import pandas as pd
import joblib
import datasets

# Editable input table: starts at 2 rows x 4 columns, and both dimensions can grow.
inputs = [gr.Dataframe(row_count=(2, "dynamic"), col_count=(4, "dynamic"), label="Input Data", interactive=True)]

# Output table with a single fixed column that holds the predictions.
outputs = [gr.Dataframe(row_count=(2, "dynamic"), col_count=(1, "fixed"), label="Predictions", headers=["Failures"])]

# Load the trained model from disk.
model = joblib.load("model.pkl")

# We will use rows from our dataset as the example input.
df = datasets.load_dataset("merve/supersoaker-failures")
df = df["train"].to_pandas()

def infer(input_dataframe):
    # Return the predictions as a one-column dataframe.
    return pd.DataFrame(model.predict(input_dataframe))

gr.Interface(fn=infer, inputs=inputs, outputs=outputs, examples=[[df.head(2)]]).launch()
```

Let's break down the code above.

  • `fn`: the inference function that takes the input dataframe and returns predictions.

  • `inputs`: the component we take our input with. We define the input as a dataframe with 2 rows and 4 columns, which initially looks like an empty dataframe of that shape. When `row_count` is set to `"dynamic"`, the data you enter does not have to match the predefined shape of the component.

  • `outputs`: the dataframe component that stores the outputs. This UI can take a single sample or multiple samples for inference and returns 0 or 1 for each sample in one column, so we set `row_count` to 2 and `col_count` to 1 above. `headers` is a list of the column header names for the dataframe.

  • `examples`: you can pass input either by dragging and dropping a CSV file onto the component or by supplying a pandas DataFrame through `examples`; the interface picks up the column headers automatically.
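The contract between `fn` and the two dataframe components can be tried without a trained model or a running UI. The sketch below is a minimal illustration under stated assumptions: `ThresholdModel` is a hypothetical stand-in for the model loaded from `model.pkl`, the sample values are made up, and the numeric coercion is a defensive step in case cell values arrive as strings.

```python
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for the model loaded from model.pkl."""

    def predict(self, frame):
        # Flag a row as a failure (1) when its first column exceeds 0.5.
        return [int(value > 0.5) for value in frame.iloc[:, 0]]


model = ThresholdModel()


def infer(input_dataframe):
    # Defensive assumption: coerce every cell to a number before predicting.
    numeric = input_dataframe.apply(pd.to_numeric, errors="coerce")
    predictions = model.predict(numeric)
    # One named column, matching headers=["Failures"] on the output component.
    return pd.DataFrame({"Failures": predictions})


# Made-up sample rows standing in for data entered in the input table.
sample = pd.DataFrame({"measurement_13": [0.2, 0.9], "measurement_15": [1.0, 2.0]})
result = infer(sample)
print(result)
```

Returning a dataframe with one row per input row is what lets the output component show a prediction next to each sample, however many rows are entered.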

We will now create an example of a minimal data visualization dashboard. You can find a more comprehensive version in the related Spaces.

```python
import gradio as gr
import pandas as pd
import datasets
import seaborn as sns
import matplotlib.pyplot as plt

df = datasets.load_dataset("merve/supersoaker-failures")
df = df["train"].to_pandas()
df.dropna(axis=0, inplace=True)

def plot(df):
    # Draw each chart on its own figure so the plots do not overlap in the saved images.
    plt.figure()
    plt.scatter(df.measurement_13, df.measurement_15, c=df.loading, alpha=0.5)
    plt.savefig("scatter.png")
    plt.close()

    plt.figure()
    df["failure"].value_counts().plot(kind="bar")
    plt.savefig("bar.png")
    plt.close()

    plt.figure()
    sns.heatmap(df.select_dtypes(include="number").corr())
    plt.savefig("corr.png")
    plt.close()

    return ["corr.png", "scatter.png", "bar.png"]

inputs = [gr.Dataframe(label="Supersoaker Production Data")]
# .style() is the layout API of older Gradio releases; newer releases take
# layout arguments such as columns= directly on gr.Gallery.
outputs = [gr.Gallery(label="Profiling Dashboard").style(grid=(1, 3))]

gr.Interface(plot, inputs=inputs, outputs=outputs, examples=[df.head(100)], title="Supersoaker Failures Analysis Dashboard").launch()
```

We will use the same dataset we used to train our model, but this time we will make a dashboard to visualize it.

  • `fn`: the function that creates plots from the data.

  • `inputs`: we use the same `Dataframe` component we used above.

  • `outputs`: the `Gallery` component is used to hold our visualizations.

  • `examples`: we use the dataset itself as the example.

### Easily Load Tabular Data Interfaces with One Line of Code Using skops

`skops` is a library built on top of `huggingface_hub` and `sklearn`. With the recent `gradio` integration of `skops`, you can build tabular data interfaces with one line of code!

```python
import gradio as gr

# title and description are optional
title = "Supersoaker Defective Product Prediction"
description = "This model predicts Supersoaker production line failures. Drag and drop any slice from the dataset or edit values as you wish in the dataframe component below."

# Interface.load is the loader of older Gradio releases; newer releases expose gr.load instead.
gr.Interface.load("huggingface/scikit-learn/tabular-playground", title=title, description=description).launch()
```

`sklearn` models pushed to the Hugging Face Hub using `skops` include a `config.json` file that contains an example input with column names and the task being solved (either `tabular-classification` or `tabular-regression`). From the task type, `gradio` constructs the `Interface`, and it uses the column names and the example input to build it. You can refer to the `skops` documentation on hosting models on the Hub to learn how to push your models to the Hub using `skops`.
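To make the role of `config.json` more concrete, the sketch below reads the three pieces of information mentioned above (task, column names, example input) from a small configuration string. It is illustrative only: the key names (`sklearn`, `columns`, `example_input`, `task`) and all values are assumptions chosen to mirror the description, not a guaranteed schema, and `summarize_config` is a hypothetical helper rather than a `skops` or `gradio` function.

```python
import json

# Hypothetical config.json contents. The key names and values are assumptions
# that mirror the description above, not a guaranteed schema.
CONFIG_TEXT = """
{
  "sklearn": {
    "columns": ["loading", "measurement_13", "measurement_15"],
    "example_input": {
      "loading": [101.5, 98.2],
      "measurement_13": [15.1, 16.0],
      "measurement_15": [13.0, 14.2]
    },
    "task": "tabular-classification"
  }
}
"""

SUPPORTED_TASKS = {"tabular-classification", "tabular-regression"}


def summarize_config(text):
    """Return the task, column names, and example rows from a config string."""
    section = json.loads(text)["sklearn"]
    task = section["task"]
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task}")
    columns = section["columns"]
    # Turn the column-oriented example into a list of rows for display.
    rows = [list(row) for row in zip(*(section["example_input"][c] for c in columns))]
    return task, columns, rows


task, columns, rows = summarize_config(CONFIG_TEXT)
print(task, columns, rows)
```

With the task known, an interface builder can choose a suitable output, and the column names and example rows are enough to pre-fill an input dataframe.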