Running Background Tasks

Related spaces: https://huggingface.co/spaces/freddyaboulton/gradio-google-forms Tags: TASKS, SCHEDULED, TABULAR, DATA

Introduction

This guide explains how you can run background tasks from your gradio app. Background tasks are operations that you'd like to perform outside the request-response lifecycle of your app, either once or on a periodic schedule. Examples include periodically synchronizing data to an external database or sending a report of model predictions via email.

Overview

We will be creating a simple "Google-forms-style" application to gather feedback from users of the gradio library. We will use a local sqlite database to store our data, but we will periodically synchronize the state of the database with a HuggingFace Dataset so that our user reviews are always backed up. The synchronization will happen in a background task that runs every 60 seconds.

At the end of the demo, you'll have a fully working application like the one in the related Space linked at the top of this guide.

Step 1 - Write your database logic 💾

Our application will store the name of the reviewer, their rating of gradio on a scale of 1 to 5, and any comments they want to share about the library. Let's write some code that creates a database table to store this data. We'll also write functions to insert a review into that table and to fetch the latest 10 reviews.

We're going to use the sqlite3 library to connect to our sqlite database, but gradio will work with any library.

The code will look like this:

import sqlite3

import pandas as pd

DB_FILE = "./reviews.db"
db = sqlite3.connect(DB_FILE)

# Create table if it doesn't already exist
try:
    # Probe the table; this raises OperationalError when it is missing
    db.execute("SELECT * FROM reviews").fetchall()
    db.close()
except sqlite3.OperationalError:
    db.execute(
        '''
        CREATE TABLE reviews (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
                              created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
                              name TEXT, review INTEGER, comments TEXT)
        ''')
    db.commit()
    db.close()

def get_latest_reviews(db: sqlite3.Connection):
    # Fetch the 10 newest rows and the total number of rows
    reviews = db.execute("SELECT * FROM reviews ORDER BY id DESC LIMIT 10").fetchall()
    total_reviews = db.execute("SELECT COUNT(id) FROM reviews").fetchone()[0]
    reviews = pd.DataFrame(reviews, columns=["id", "date_created", "name", "review", "comments"])
    return reviews, total_reviews

def add_review(name: str, review: int, comments: str):
    # Open a fresh connection per call, insert the row, then return the refreshed view
    db = sqlite3.connect(DB_FILE)
    cursor = db.cursor()
    cursor.execute("INSERT INTO reviews(name, review, comments) VALUES(?,?,?)", [name, review, comments])
    db.commit()
    reviews, total_reviews = get_latest_reviews(db)
    db.close()
    return reviews, total_reviews
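As an alternative to probing the table with a SELECT and catching the error, SQLite's CREATE TABLE IF NOT EXISTS clause can do the same job in one statement. The following is a minimal sketch under that assumption. The helper name ensure_table and the in-memory database are hypothetical and used only for illustration; the schema and queries mirror the ones above.

```python
import sqlite3

# Hypothetical in-memory database so the sketch leaves no file behind.
db = sqlite3.connect(":memory:")

CREATE_SQL = """
CREATE TABLE IF NOT EXISTS reviews (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
    name TEXT, review INTEGER, comments TEXT)
"""

def ensure_table(conn: sqlite3.Connection) -> None:
    """Create the reviews table only if it is missing (hypothetical helper)."""
    conn.execute(CREATE_SQL)
    conn.commit()

ensure_table(db)
ensure_table(db)  # calling it again leaves the existing table untouched

# Insert one review with a parameterized query, as add_review does.
db.execute(
    "INSERT INTO reviews(name, review, comments) VALUES(?,?,?)",
    ["Ada", 5, "Great library"],
)
db.commit()

rows = db.execute(
    "SELECT name, review, comments FROM reviews ORDER BY id DESC LIMIT 10"
).fetchall()
total = db.execute("SELECT COUNT(id) FROM reviews").fetchone()[0]
```

Either approach works; the single-statement form simply avoids using an exception for control flow.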

Let's also write a function to load the latest reviews when the gradio application loads:

def load_data():
    db = sqlite3.connect(DB_FILE)
    reviews, total_reviews = get_latest_reviews(db)
    db.close()
    return reviews, total_reviews

Step 2 - Create a gradio app ⚡

Now that we have our database logic defined, we can use gradio to create a dynamic web page to ask our users for feedback!

import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            name = gr.Textbox(label="Name", placeholder="What is your name?")
            review = gr.Radio(label="How satisfied are you with using gradio?", choices=[1, 2, 3, 4, 5])
            comments = gr.Textbox(label="Comments", lines=10, placeholder="Do you have any feedback on gradio?")
            submit = gr.Button(value="Submit Feedback")
        with gr.Column():
            data = gr.Dataframe(label="Most recently created 10 rows")
            count = gr.Number(label="Total number of reviews")
    submit.click(add_review, [name, review, comments], [data, count])
    demo.load(load_data, None, [data, count])
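The click handler above passes the form values straight to add_review. A user may submit the form without typing a name or picking a rating, so it can help to check the inputs first. The sketch below shows one possible check. The function validate_review is hypothetical and not part of the original app, and it assumes an empty field may arrive as None or as an empty string.

```python
def validate_review(name, review, comments):
    """Return cleaned (name, review, comments) or raise ValueError.

    Hypothetical helper: assumes empty inputs arrive as None or "".
    """
    if name is None or not name.strip():
        raise ValueError("name is required")
    # The form offers the ratings 1 to 5; reject anything else.
    if review not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be between 1 and 5")
    return name.strip(), int(review), (comments or "").strip()

cleaned = validate_review("  Ada ", 5, None)
```

A wrapper around add_review could call this helper first and store only the cleaned values.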

Step 3 - Synchronize with HuggingFace Datasets 🤗

We could call demo.launch() after step 2 and have a fully functioning application. However, our data would be stored locally on our machine. If the sqlite file were accidentally deleted, we'd lose all of our reviews! Let's back up our data to a dataset on the HuggingFace Hub.

Create a dataset on the HuggingFace Hub before proceeding.

Now, at the top of our script, we'll use the huggingface_hub client library to connect to our dataset and pull the latest backup.

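import os
import shutil

import huggingface_hub

# Read the access token from the environment instead of hard-coding it
TOKEN = os.environ.get('HUB_TOKEN')
repo = huggingface_hub.Repository(
    local_dir="data",
    repo_type="dataset",
    clone_from="",  # set this to the id of the dataset you created
    use_auth_token=TOKEN
)
repo.git_pull()

# Restore the local database from the latest backup
shutil.copyfile("./data/reviews.db", DB_FILE)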

Note that you'll have to get an access token from the "Settings" tab of your HuggingFace account for the above code to work. In the script, the token is read from an environment variable so that it never appears in the source code.
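The restore step, shutil.copyfile("./data/reviews.db", DB_FILE), raises an error if the dataset does not contain a backup yet, which can happen on the very first run. Here is a minimal sketch of a guarded restore. The helper restore_backup and the temporary paths are hypothetical and serve only to make the example runnable on its own.

```python
import os
import shutil
import tempfile

def restore_backup(backup_path: str, db_file: str) -> bool:
    """Copy the backup over the local database if the backup exists.

    Hypothetical helper: returns True when a restore happened, False otherwise.
    """
    if not os.path.exists(backup_path):
        return False
    shutil.copyfile(backup_path, db_file)
    return True

# Demonstration with temporary files standing in for ./data/reviews.db and DB_FILE.
workdir = tempfile.mkdtemp()
backup = os.path.join(workdir, "backup_reviews.db")
target = os.path.join(workdir, "reviews.db")

restored_before = restore_backup(backup, target)  # no backup file yet

with open(backup, "wb") as f:
    f.write(b"placeholder bytes")

restored_after = restore_backup(backup, target)  # backup file now exists
```

When no backup is found, the app can simply start with a fresh, empty database.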


Now we will create a background task to sync our local database to the HuggingFace Hub dataset every 60 seconds. We will use the Advanced Python Scheduler (APScheduler) to handle the scheduling. However, this is not the only task scheduling library available. Feel free to use whatever you are comfortable with.
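For readers who prefer not to add a dependency, a periodic task can also be sketched with the standard library alone. This is a swapped-in technique, not what this guide uses: the helper run_every is hypothetical, and it relies only on threading.Event and a daemon thread. The short interval is there just to keep the demonstration quick.

```python
import threading
import time

def run_every(interval_seconds: float, func) -> threading.Event:
    """Call func repeatedly on a daemon thread (hypothetical helper).

    Set the returned Event to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop runs func once per interval until stopped.
        while not stop.wait(interval_seconds):
            func()

    threading.Thread(target=loop, daemon=True).start()
    return stop

calls = []
stop = run_every(0.01, lambda: calls.append(time.time()))
time.sleep(0.2)  # let the task fire a few times
stop.set()       # ask the background loop to exit
```

A scheduling library adds features such as cron-style triggers and job management, which is why this guide uses one.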

The function to back up our data will look like this:

import datetime

from apscheduler.schedulers.background import BackgroundScheduler

def backup_db():
    # Copy the database file and a CSV export into the cloned dataset repo
    shutil.copyfile(DB_FILE, "./data/reviews.db")
    db = sqlite3.connect(DB_FILE)
    reviews = db.execute("SELECT * FROM reviews").fetchall()
    db.close()
    pd.DataFrame(reviews).to_csv("./data/reviews.csv", index=False)
    print("updating db")
    # Push without blocking so the scheduler thread is not held up
    repo.push_to_hub(blocking=False, commit_message=f"Updating data at {datetime.datetime.now()}")

# Run backup_db every 60 seconds on a background thread
scheduler = BackgroundScheduler()
scheduler.add_job(func=backup_db, trigger="interval", seconds=60)
scheduler.start()
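One caveat with copying the database file directly is that a copy taken while a write is in progress may not be a consistent snapshot. The sqlite3 module offers Connection.backup (Python 3.7 and later) for this case. The sketch below shows how it could be used. The helper snapshot_db and the temporary paths are hypothetical and are not part of the original app.

```python
import os
import sqlite3
import tempfile

def snapshot_db(src_path: str, dest_path: str) -> None:
    """Write a consistent copy of src_path to dest_path (hypothetical helper)."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies the database through SQLite's backup API.
        src.backup(dest)
    finally:
        dest.close()
        src.close()

# Demonstration with temporary files standing in for DB_FILE and ./data/reviews.db.
workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "reviews.db")
copy = os.path.join(workdir, "backup.db")

conn = sqlite3.connect(live)
conn.execute("CREATE TABLE reviews (name TEXT, review INTEGER, comments TEXT)")
conn.execute("INSERT INTO reviews VALUES (?,?,?)", ["Ada", 5, "Great"])
conn.commit()
conn.close()

snapshot_db(live, copy)

check = sqlite3.connect(copy)
copied = check.execute("SELECT name, review FROM reviews").fetchall()
check.close()
```

In backup_db, a call like this could take the place of the shutil.copyfile line if concurrent writes are a concern.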

Step 4 (Bonus) - Deployment to HuggingFace Spaces

You can use the HuggingFace Spaces platform to deploy this application for free ✨

If you haven't used Spaces before, follow the earlier guide on deploying to Spaces. You will have to add the HUB_TOKEN environment variable as a secret in your Space.

Conclusion

Congratulations! You know how to run background tasks from your gradio app on a schedule ⏲️.

Check out the application running on Spaces at https://huggingface.co/spaces/freddyaboulton/gradio-google-forms. The complete code is available in that Space's repository.