Outlines Structured Generation

2026-05-22 · Skills Center

Outlines: structured LLM output generation with JSON, regex, and Pydantic

Outlines structured text generation

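Use cases

Use Outlines when you need to:

  • Guarantee that generated JSON, XML, or code is valid, instead of parsing and repairing it after generation
  • Get type-safe output from Pydantic models
  • Run local models (Transformers, llama.cpp, vLLM)
  • Keep inference fast: the constraint adds very little overhead per generated token
  • Generate valid output automatically from a JSON Schema
  • Control token sampling at the grammar level

GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)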

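Installation


# Basic installation
# (the examples in this guide use the outlines.generate API from the 0.x releases)
pip install outlines

# With a specific backend
pip install outlines transformers  # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm  # vLLM, high throughput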

Quick start

Basic example: classification


import outlines

# Load the model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain the output to one of the listed labels
prompt = "Sentiment of the review 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # e.g. "positive" (always one of the three labels)

Using Pydantic models


from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate output that conforms to the Pydantic model
generator = outlines.generate.json(model, User)
result = generator("Create a user named Alice, 30 years old, with the email alice@example.com.")
print(result)
# result is a User instance, e.g. name='Alice' age=30 email='alice@example.com'

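Using JSON Schema


import json
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number", "minimum": 1, "maximum": 5},
        "review": {"type": "string"}
    },
    "required": ["title", "rating"]
}

# Pass the schema as a JSON string, so serialize the dict first
generator = outlines.generate.json(model, json.dumps(schema))
result = generator("Write a book review for 'The Great Gatsby'.")
print(result)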
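
Constrained decoding fixes the shape of the output, but a token limit can still cut the text short, and it is safest not to assume that numeric bounds such as minimum and maximum are enforced while decoding. The following is a minimal defensive sketch using only the standard library. The check_review helper and the sample string are hypothetical and not part of Outlines; it assumes you have the raw JSON text in hand.

```python
import json

def check_review(raw: str) -> dict:
    """Parse raw model output and re-check the review schema's constraints."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Usually a sign that generation stopped before the closing brace.
        raise ValueError(f"incomplete or invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("title", "rating"):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    rating = data["rating"]
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        raise ValueError("rating must be a number")
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return data

# Hypothetical sample output, used only to exercise the helper.
review = check_review('{"title": "The Great Gatsby", "rating": 4.5}')
print(review["rating"])
```

A check like this is cheap to run on every result and turns a silently truncated or out-of-range answer into an explicit error.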

Using regular expressions


import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# The output must match the regular expression exactly
phone_gen = outlines.generate.regex(model, r'\+?1?\d{9,15}')
phone = phone_gen("Get me a phone number.")
print(phone)  # e.g. "+1234567890"
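
Before handing a pattern to a model, it can help to sanity-check it against a few strings you expect to pass or fail. This sketch uses Python's built-in re module and a hypothetical matches_fully helper. The regex engine behind constrained generation may differ from re on advanced syntax, so treat this as a quick check under that assumption rather than a guarantee.

```python
import re

PHONE_PATTERN = r"\+?1?\d{9,15}"  # same pattern as in the phone number example

def matches_fully(pattern: str, text: str) -> bool:
    """True only if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Hypothetical candidates: one valid number, one too short, one with spaces.
for candidate in ["+1234567890", "12345", "+1 234 567 890"]:
    print(candidate, matches_fully(PHONE_PATTERN, candidate))
```

Note that this pattern does not allow spaces or dashes, so formatted numbers are rejected; widen the pattern if that is what you need.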

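Core concepts

Generator types

Generator | Purpose | Examples
json | JSON objects | Pydantic, JSON Schema
regex | Regex-matched text | Phone numbers, dates
choice | One of several options | Classification, ratings
format | Values of a Python type | Python types such as int or float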

Model backends


# Transformers (local)
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# llama.cpp
model = outlines.models.llamacpp("path/to/model.gguf")

# vLLM
model = outlines.models.vllm("meta-llama/Llama-3-8B-Instruct")

# OpenAI
model = outlines.models.openai("gpt-4")

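Token-level guidance (token masking)

Outlines restricts the set of tokens that can be sampled at each generation step, instead of validating the text after it has been generated. This is what keeps the overhead of structured generation low.

  • No samples are wasted on invalid tokens
  • The output format is guaranteed to be correct
  • Typically faster than generate-then-parse loops built on ReAct-style prompting or a custom parser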
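
The idea of masking tokens during generation can be illustrated with a toy example. This is only a sketch under simplifying assumptions: the vocabulary, the fixed scores in FAKE_LOGITS, and the prefix check are all made up for illustration, whereas a real implementation works on the tokenizer vocabulary and a state machine compiled from the regex or schema.

```python
import math

CHOICES = ["positive", "negative", "neutral"]

# Hypothetical model scores: the "model" prefers tokens that are invalid here.
FAKE_LOGITS = {
    "great": 5.0, "!": 4.0, "bad": 3.5,
    "pos": 3.0, "neg": 2.0, "neu": 1.0,
    "itive": 0.5, "ative": 0.4, "tral": 0.3,
}

def allowed_tokens(prefix: str) -> set[str]:
    """Tokens that keep prefix + token a prefix of at least one choice."""
    return {t for t in FAKE_LOGITS if any(c.startswith(prefix + t) for c in CHOICES)}

def constrained_decode() -> str:
    prefix = ""
    while prefix not in CHOICES:
        mask = allowed_tokens(prefix)
        # Invalid tokens are set to -inf, so they can never be selected.
        masked = {t: (s if t in mask else -math.inf) for t, s in FAKE_LOGITS.items()}
        prefix += max(masked, key=masked.get)
    return prefix

print(max(FAKE_LOGITS, key=FAKE_LOGITS.get))  # unconstrained pick: "great"
print(constrained_decode())                   # constrained result: "positive"
```

Because invalid tokens are removed before a token is chosen, no output ever has to be thrown away and regenerated.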

Advanced usage

Pydantic with validation


from pydantic import BaseModel, field_validator
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

    @field_validator('age')
    @classmethod
    def age_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Age must be positive')
        return v

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, User)
result = generator("Create a user named Bob, age 25, email bob@test.com")

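List generation


import json
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate a comma-separated list of words by constraining the text with a regex
list_gen = outlines.generate.regex(model, r"[a-z]{2,12}(, [a-z]{2,12}){1,2}")
items = list_gen("List three fruits: apple, ")
print(items)  # e.g. "banana, orange"

# Generate a JSON array of strings (the schema is passed as a JSON string)
array_schema = json.dumps({"type": "array", "items": {"type": "string"}})
generator = outlines.generate.json(model, array_schema)
result = generator("List three colors")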

Structured extraction (function-calling style)


from pydantic import BaseModel
import outlines

class ExtractInfo(BaseModel):
    name: str
    organization: str | None = None
    title: str | None = None

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, ExtractInfo)

text = "John Smith is the CEO of Acme Corp."
result = generator(f"Extract: {text}")
# result is an ExtractInfo instance, e.g. name='John Smith' organization='Acme Corp.' title='CEO'

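XML generation


# Describe the XML layout as a regex; the tag names are assumed, since the original template lost them
xml_pattern = r"<user><name>[A-Za-z ]{1,40}</name><age>[0-9]{1,3}</age><email>[a-z0-9.]+@[a-z0-9.]+</email></user>"

generator = outlines.generate.regex(model, xml_pattern)
result = generator("Generate user data in XML format.")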
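
Once XML text has been generated, it still has to be turned into values your program can use. The sketch below parses it with the standard library. The user, name, age, and email element names, the parse_user helper, and the sample string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def parse_user(xml_text: str) -> dict:
    """Parse a <user> element into a dict; raises ET.ParseError on malformed XML."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "age": int(root.findtext("age")),
        "email": root.findtext("email"),
    }

# Hypothetical sample standing in for model output.
sample = "<user><name>Alice</name><age>30</age><email>alice@example.com</email></user>"
print(parse_user(sample))
```

If the text was cut off before the closing tag, ET.fromstring raises a ParseError, which is a convenient signal to retry with a larger token budget.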

Common workflows

Workflow 1: RAG extraction


from pydantic import BaseModel
import outlines

class DocumentMetadata(BaseModel):
    title: str
    author: str | None
    date: str | None
    summary: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentMetadata)

text = "The paper 'Attention Is All You Need' by Vaswani et al. (2017) introduced the Transformer architecture."

result = generator(f"Extract metadata from: {text}")
# result is a DocumentMetadata instance, e.g. title='Attention Is All You Need' author='Vaswani et al.' date='2017' summary='...'

Workflow 2: batch classification


import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])

reviews = [
    "This product works really well!",
    "Terrible, it does not work at all.",
    "It's okay, nothing special."
]

for review in reviews:
    sentiment = generator(f"Sentiment analysis: {review}")
    print(f"'{review}' -> {sentiment}")
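
Because every label comes from a fixed set, the results of a batch run can be aggregated directly without any cleanup. The sketch below counts labels with collections.Counter. The stub_classifier function is a hypothetical keyword-based stand-in for the model-backed generator, used only so the example runs without loading a model.

```python
from collections import Counter
from typing import Callable, Iterable

def stub_classifier(prompt: str) -> str:
    """Hypothetical stand-in for the generator: returns one of three fixed labels."""
    text = prompt.lower()
    if "great" in text:
        return "positive"
    if "terrible" in text:
        return "negative"
    return "neutral"

def tally(reviews: Iterable[str], classify: Callable[[str], str]) -> Counter:
    """Classify each review and count how often each label occurs."""
    return Counter(classify(f"Sentiment analysis: {r}") for r in reviews)

counts = tally(["Great product", "Terrible quality", "It is okay"], stub_classifier)
print(counts)
```

In a real run you would pass the constrained generator as the classify argument in place of the stub.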

Workflow 3: code generation


import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Fix the function signature with a regex and let the model fill in the returned string
py_pattern = r'def greet\(name: str, age: int\) -> str:\n    return f"[^"\n]{1,80}"'

generator = outlines.generate.regex(model, py_pattern)
code = generator("Write a greet function.")
print(code)
# e.g.
# def greet(name: str, age: int) -> str:
#     return f"Hello, {name}! You are {age} years old."
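
A format constraint does not by itself prove that the generated text is valid Python, so it is worth parsing the result before using it. The sketch below relies on the standard ast module. The defines_function helper and the candidate string are hypothetical.

```python
import ast

def defines_function(source: str, name: str) -> bool:
    """True if source parses as Python and defines a top-level function `name`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in tree.body
    )

# Hypothetical generated code, used only to exercise the check.
candidate = (
    "def greet(name: str, age: int) -> str:\n"
    '    return f"Hello, {name}! You are {age} years old."'
)
print(defines_function(candidate, "greet"))
```

Parsing only checks syntax; it never executes the generated code, which keeps the check safe to run on untrusted output.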

Configuration

Sampling parameters


import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Custom sampling: multinomial sampling restricted to the 50 most likely tokens
result = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"],
    sampler=outlines.samplers.multinomial(top_k=50)
)("This movie is amazing!")

# Greedy sampling
result = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"],
    sampler=outlines.samplers.greedy()
)("This movie is amazing!")
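
To make the difference between the two strategies concrete, here is a toy sketch in plain Python. The logits dictionary and both helper functions are made up for illustration and are not how any library implements its samplers.

```python
import math
import random

def greedy(logits: dict[str, float]) -> str:
    """Always pick the highest-scoring token."""
    return max(logits, key=logits.get)

def top_k_sample(logits: dict[str, float], k: int, rng: random.Random) -> str:
    """Keep the k best tokens, then sample among them in proportion to exp(logit)."""
    top = sorted(logits, key=logits.get, reverse=True)[:k]
    weights = [math.exp(logits[t]) for t in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical scores for the three labels.
logits = {"positive": 2.0, "neutral": 0.5, "negative": -1.0}
rng = random.Random(0)

print(greedy(logits))
print({top_k_sample(logits, 2, rng) for _ in range(200)})
```

Greedy decoding is deterministic, which suits classification. Top-k sampling keeps some variety while ruling out the least likely tokens.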

Common issues

Problem | Solution
Generation gets stuck in an endless loop | Make sure the schema or regex is complete and has no ambiguous branches
Incomplete JSON output | Set max_tokens high enough for the output to finish
Unsupported model | Check the outlines version, or use the transformers backend
Pydantic validation fails | Add a field_validator or use nullable fields

Resources

  • GitHub: https://github.com/dottxt-ai/outlines
  • Docs: https://dottxt-ai.github.io/outlines/
  • Pydantic integration: https://dottxt-ai.github.io/outlines/reference/pydantic/
  • Regex syntax: https://dottxt-ai.github.io/outlines/reference/regex/