Gradio & LLM Agents 🤝

Large Language Models (LLMs) are very impressive, but they become even more powerful when we give them the skills to accomplish specialized tasks.

The gradio_tools library can turn any Gradio application into a tool that an agent can use to complete its task. For example, an LLM could use a Gradio tool to transcribe a voice recording it finds online and then summarize it for you. Or it could use a different Gradio tool to apply OCR to a document on your Google Drive and then answer questions about it.

This guide shows how you can use gradio_tools to give your LLM agent access to the cutting-edge Gradio applications hosted around the world. Although gradio_tools is compatible with more than one agent framework, this guide focuses on LangChain agents.

Some background

What are agents?

A LangChain agent is a Large Language Model (LLM) that takes user input and reports an output based on using one of many tools at its disposal.
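
To make the idea of "choosing one of many tools" concrete, here is a toy sketch in plain Python. It is not how LangChain works internally: a real agent asks the LLM to pick a tool from the tool descriptions, whereas this sketch assumes a simple word-overlap score as a stand-in for the model. The names ToyTool and choose_tool are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    """Hypothetical tool record: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def choose_tool(request: str, tools: List[ToyTool]) -> ToyTool:
    """Pick the tool whose description shares the most words with the request.

    A real agent delegates this choice to the LLM; word overlap is only a
    stand-in so the example runs without a model.
    """
    request_words = set(request.lower().split())

    def overlap(tool: ToyTool) -> int:
        return len(request_words & set(tool.description.lower().split()))

    return max(tools, key=overlap)


tools = [
    ToyTool("Calculator", "evaluate an arithmetic expression", lambda q: "4"),
    ToyTool("Captioner", "describe the content of an image file", lambda q: "a dog"),
]

chosen = choose_tool("please describe this image", tools)
print(chosen.name)
```

The point to take away is that the description is the only thing the selection step sees, which is why tool descriptions matter so much later in this guide.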

What is Gradio?

Gradio is the de facto standard framework for building Machine Learning Web Applications and sharing them with the world - all with just Python! 🐍

gradio_tools - An end-to-end example

To get started with gradio_tools, all you need to do is import and initialize your tools and pass them to the LangChain agent!

In the following example, we import the StableDiffusionPromptGeneratorTool to create a good prompt for stable diffusion, the StableDiffusionTool to create an image with our improved prompt, the ImageCaptioningTool to caption the generated image, and the TextToVideoTool to create a video from a prompt.

We then ask the agent to create an image of a dog riding a skateboard, but to improve our prompt before generating it. We also ask it to caption the generated image and to create a video from the improved prompt. The agent decides which tool to use at each step without us explicitly telling it.

import os

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY must be set")

from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from gradio_tools import (StableDiffusionTool, ImageCaptioningTool, StableDiffusionPromptGeneratorTool,
                          TextToVideoTool)

from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
tools = [StableDiffusionTool().langchain, ImageCaptioningTool().langchain,
         StableDiffusionPromptGeneratorTool().langchain, TextToVideoTool().langchain]

agent = initialize_agent(tools, llm, memory=memory, agent="conversational-react-description", verbose=True)
output = agent.run(input=("Please create a photo of a dog riding a skateboard "
                          "but improve my prompt prior to using an image generator. "
                          "Please caption the generated image and create a video for it using the improved prompt."))

You'll note that we are using some pre-built tools that come with gradio_tools. See the gradio_tools documentation for a complete list of the tools it ships with. If you would like to use a tool that's not currently in gradio_tools, it is very easy to add your own. That's what the next section covers.

gradio_tools - Creating your own tool

The core abstraction is the GradioTool, which lets you define a new tool for your LLM as long as you implement a standard interface:

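from abc import abstractmethod
from typing import Any, Tuple

from gradio_client.client import Job


class GradioTool(BaseTool):

    def __init__(self, name: str, description: str, src: str) -> None:
        ...  # body omitted here; only the interface matters for this guide

    @abstractmethod
    def create_job(self, query: str) -> Job:
        # Parse the query string and submit it to the gradio client.
        pass

    @abstractmethod
    def postprocess(self, output: Tuple[Any] | Any) -> str:
        # Convert the job's result into a string the LLM can show the user.
        pass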

The requirements are:
1. The name for your tool.
2. The description for your tool. This is crucial! Agents decide which tool to use based on its description. Be precise, and be sure to include an example of what the input and the output of the tool should look like.
3. The URL or space id of the Gradio application, e.g. freddyaboulton/calculator. Based on this value, gradio_tools will create a gradio client instance to query the upstream application via its API. If you are not familiar with the gradio client library, read its documentation first.
4. create_job - Given a string, this method should parse that string and return a job from the client. Most of the time, this is as simple as passing the string to the submit function of the client. The gradio client documentation has more information on creating jobs.
5. postprocess - Given the result of the job, convert it to a string the LLM can display to the user.
6. Optional - Some libraries, e.g. MiniChain, may need some information about the underlying gradio input and output types used by the tool. By default, these return gr.Textbox(), but if you'd like to provide more accurate information, implement the _block_input(self, gr) and _block_output(self, gr) methods of the tool. The gr variable is the gradio module (the result of import gradio as gr). It is automatically imported by the GradioTool parent class and passed to the _block_input and _block_output methods.
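
To show how requirements 1 through 5 fit together without needing a network connection, here is a minimal sketch. It does not use the real GradioTool or the real gradio client: FakeClient, FakeJob, SketchTool, and WordCountTool are hypothetical stand-ins that only mirror the submit/result shape described above, and the run method is an assumption added so the flow can be exercised end to end.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class FakeJob:
    """Hypothetical stand-in for a gradio client job; it runs eagerly."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self._value = fn(*args)

    def result(self) -> Any:
        return self._value


class FakeClient:
    """Hypothetical stand-in for the gradio client; no network involved."""

    def __init__(self, fn: Callable[..., Any]) -> None:
        self._fn = fn

    def submit(self, *args: Any) -> FakeJob:
        return FakeJob(self._fn, *args)


class SketchTool(ABC):
    """Mirrors the shape of the interface above; NOT the real GradioTool."""

    def __init__(self, name: str, description: str, client: FakeClient) -> None:
        self.name = name
        self.description = description
        self.client = client

    @abstractmethod
    def create_job(self, query: str) -> FakeJob:
        pass

    @abstractmethod
    def postprocess(self, output: Any) -> str:
        pass

    def run(self, query: str) -> str:
        # Assumed glue: submit the job, wait for it, then post-process.
        return self.postprocess(self.create_job(query).result())


class WordCountTool(SketchTool):
    def __init__(self) -> None:
        super().__init__(
            name="WordCounter",
            description=(
                "Counts the words in a text. Input should be the text itself, "
                "e.g. 'a dog'. The output is a sentence stating the count."
            ),
            client=FakeClient(lambda text: len(text.split())),
        )

    def create_job(self, query: str) -> FakeJob:
        # The simplest case: pass the query string straight to submit.
        return self.client.submit(query)

    def postprocess(self, output: Any) -> str:
        return f"The text has {output} words."


tool = WordCountTool()
print(tool.run("a dog riding a skateboard"))
```

With a real tool, the client would be created from the src value and the job would run against the upstream Gradio application; only create_job and postprocess would need to be written by hand.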

And that's it!

Once you have created your tool, open a pull request to the gradio_tools repo! We welcome all contributions.

Example tool - Stable Diffusion

Here is the code for the StableDiffusion tool as an example:

import os

from gradio_client.client import Job
from gradio_tools import GradioTool

class StableDiffusionTool(GradioTool):
    """Tool for calling stable diffusion from llm"""

    def __init__(
        self,
        name="StableDiffusion",
        description=(
            "An image generator. Use this to generate images based on "
            "text input. Input should be a description of what the image should "
            "look like. The output will be a path to an image file."
        ),
        src="gradio-client-demos/stable-diffusion",
        hf_token=None,
    ) -> None:
        super().__init__(name, description, src, hf_token)

    def create_job(self, query: str) -> Job:
        return self.client.submit(query, "", 9, fn_index=1)

    def postprocess(self, output: str) -> str:
        return [os.path.join(output, i) for i in os.listdir(output) if not i.endswith("json")][0]

    def _block_input(self, gr) -> "gr.components.Component":
        return gr.Textbox()

    def _block_output(self, gr) -> "gr.components.Component":
        return gr.Image()

Some notes on this implementation:
1. All instances of GradioTool have an attribute called client that points to the underlying gradio client. That is what you should use in the create_job method.
2. create_job just passes the query string to the submit function of the client with some other parameters hardcoded, i.e. the negative prompt string and the guidance scale. We could modify our tool to also accept these values from the input string in a subsequent version.
3. The postprocess method simply returns the first image from the gallery of images created by the stable diffusion space. We use the os module to get the full path of the image.
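
As a sketch of the improvement suggested in the second note, the helper below parses the negative prompt and guidance scale out of the input string. The "prompt | negative prompt | guidance scale" format and the parse_query name are assumptions made for this illustration, not part of gradio_tools; the defaults match the values hardcoded in the tool above.

```python
from typing import Tuple

# Defaults taken from the hardcoded arguments in the tool above.
DEFAULT_NEGATIVE_PROMPT = ""
DEFAULT_GUIDANCE_SCALE = 9.0


def parse_query(query: str) -> Tuple[str, str, float]:
    """Split 'prompt | negative prompt | guidance scale' into its parts.

    The '|' delimiter is an assumed convention. Missing or empty fields
    fall back to the defaults.
    """
    parts = [part.strip() for part in query.split("|")]
    prompt = parts[0]
    negative = parts[1] if len(parts) > 1 and parts[1] else DEFAULT_NEGATIVE_PROMPT
    scale = float(parts[2]) if len(parts) > 2 and parts[2] else DEFAULT_GUIDANCE_SCALE
    return prompt, negative, scale


print(parse_query("a dog riding a skateboard"))
print(parse_query("a dog riding a skateboard | blurry, low quality | 7.5"))
```

Inside create_job, the three parsed values could then replace the hardcoded arguments in the call to self.client.submit. If you adopt a format like this, describe it in the tool's description so the agent knows how to write its input.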

Conclusion

You now know how to extend the abilities of your LLM with the thousands of Gradio Spaces running in the wild! Again, we welcome any contributions to the gradio_tools library. We're excited to see the tools you all build!