Series: "Learning AI: This Much Python Is Enough", a Python crash course for AI agent developers

Tags: Python requests httpx pydantic dataclasses AI Agent

Difficulty: ⭐⭐⭐☆☆


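Preface

If I could pick only four Python libraries for AI development, I would recommend these without hesitation:

  • requests: the classic HTTP client and the usual first choice for calling REST APIs
  • httpx: a modern alternative to requests with native async support
  • pydantic: a data validation library, so API parameters are never passed around unchecked
  • dataclasses: the standard library's tool for plain data structures, a natural home for configuration and model objects

These four show up everywhere in AI agent development: calling APIs, validating parameters, managing configuration, and modeling data all rely on them. Three are third-party packages; dataclasses ships with Python.

This is the third article in the "Learning AI: This Much Python Is Enough" series, and it covers each of the four in depth.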


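1. requests: the Swiss Army knife of HTTP calls

"requests is the kind of library that, once you have used it, you never want to go back to urllib." (nearly every Python developer)

1.1 Basic requests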

import requests

# GET request: the most common operation
response = requests.get("https://api.github.com/users/octocat", timeout=10)
print(response.status_code)  # 200 on success
print(response.json())       # parse the JSON body directly

# GET request with query parameters
response = requests.get(
    "https://api.example.com/search",
    params={"q": "python", "page": 1, "per_page": 20},
    timeout=10,
)

# POST request: send JSON data
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer sk-xxx"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,  # always set a timeout!
)

⚠️ Important: never forget the timeout parameter! requests has no default timeout, so a stalled request can leave your program waiting forever.

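1.2 Session: reuse connections for better performance

# ❌ A new connection for every request (slow)
for i in range(100):
    requests.get("https://api.example.com/data", timeout=10)

# ✅ Reuse connections with a Session
with requests.Session() as session:
    # Headers shared by every request
    session.headers.update({
        "Authorization": "Bearer sk-xxx",
        "User-Agent": "MyAgent/1.0",
    })
    # A Session has no default-timeout setting (assigning session.timeout
    # has no effect), so pass the timeout on each call
    timeout = (5, 30)  # (connect timeout, read timeout)

    for i in range(100):
        response = session.get("https://api.example.com/data", timeout=timeout)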
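
requests.Session does not read a timeout attribute, so a default has to be injected some other way. One common workaround, sketched here on the assumption that a small subclass is acceptable in your codebase, is to override request() and fill in a timeout only when the caller has not passed one. TimeoutSession and default_timeout are hypothetical names for illustration, not part of requests.

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a default timeout to every request (hypothetical helper)."""

    def __init__(self, timeout=(5, 30)):
        super().__init__()
        # (connect timeout, read timeout) in seconds
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only fill in the timeout if the caller did not pass one explicitly
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

Because get(), post(), and the other helpers all go through request(), every call made on such a session would pick up the default, while an explicit timeout= argument still wins.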

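1.3 Error handling and retries

import requests
from typing import Optional
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_robust_session() -> requests.Session:
    """Create a Session with automatic retries."""
    session = requests.Session()

    # Configure the retry strategy
    retry_strategy = Retry(
        total=3,                    # at most 3 retries
        backoff_factor=1,           # exponential backoff; the exact schedule depends on the urllib3 version
        status_forcelist=[429, 500, 502, 503, 504],  # status codes that trigger a retry
        allowed_methods=["GET", "POST"],  # POST is not idempotent; retry it only if the API tolerates duplicates
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    return session

# Usage
def safe_api_call(url: str) -> Optional[dict]:
    """Return the parsed JSON body, or None if the request fails."""
    session = create_robust_session()
    try:
        response = session.get(url, timeout=10)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
        return response.json()
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e}")
        return None
    except requests.exceptions.ConnectionError:
        print("Network connection error")
        return None
    except requests.exceptions.RequestException as e:
        # Everything else, e.g. RetryError once the retries are exhausted
        print(f"Request failed: {e}")
        return None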
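
To get a feel for what backoff_factor means in practice, the exponential schedule can be worked out by hand. The sketch below assumes the delay before retry n is backoff_factor * 2^(n-1), capped at an assumed maximum; urllib3's actual behavior differs between versions (some skip the sleep before the first retry), so treat the numbers as an approximation. backoff_delays is a hypothetical helper, not a urllib3 function.

```python
def backoff_delays(total: int, backoff_factor: float, max_delay: float = 120.0) -> list:
    """Delay before each retry under backoff_factor * 2 ** (retry - 1), capped at max_delay."""
    return [
        min(backoff_factor * 2 ** (retry - 1), max_delay)
        for retry in range(1, total + 1)
    ]


print(backoff_delays(total=3, backoff_factor=1))    # [1, 2, 4]
print(backoff_delays(total=5, backoff_factor=0.5))  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

With total=3 and backoff_factor=1, a request that keeps failing would spend about seven seconds sleeping on top of the per-attempt timeouts, which is worth keeping in mind when choosing the timeout for the overall call.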

1.4 In practice: wrapping an LLM API call

import json
import requests

class SimpleLLMClient:
    """LLM API client built on requests."""

    def __init__(self, api_key: str, base_url: str = "https://api.openai.com"):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        })
        self.base_url = base_url

    def chat(self, messages: list, model: str = "gpt-4", **kwargs) -> str:
        """Send a chat request and return the response text."""
        payload = {
            "model": model,
            "messages": messages,
            "temperature": kwargs.get("temperature", 0.7),
            "max_tokens": kwargs.get("max_tokens", 2048),
        }

        response = self.session.post(
            f"{self.base_url}/v1/chat/completions",
            json=payload,
            timeout=60,
        )
        response.raise_for_status()
        data = response.json()

        # Log token usage alongside the response
        usage = data.get("usage", {})
        print(f"[Token] input: {usage.get('prompt_tokens')}, "
              f"output: {usage.get('completion_tokens')}")

        return data["choices"][0]["message"]["content"]

    def stream_chat(self, messages: list, model: str = "gpt-4") -> str:
        """Streaming chat: print text as it arrives and return the full response."""
        payload = {
            "model": model,
            "messages": messages,
            "stream": True,
        }

        with self.session.post(
            f"{self.base_url}/v1/chat/completions",
            json=payload,
            timeout=60,
            stream=True,  # the key setting: receive the body as a stream
        ) as response:
            response.raise_for_status()
            full_text = ""
            for line in response.iter_lines():  # yields bytes
                if line.startswith(b"data: "):
                    data = line[6:]  # strip the "data: " prefix
                    if data == b"[DONE]":
                        break
                    chunk = json.loads(data)
                    content = chunk["choices"][0]["delta"].get("content", "")
                    if content:
                        full_text += content
                        print(content, end="", flush=True)
            return full_text
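
The line handling inside stream_chat is easier to test when it lives in a small pure function. The following is a minimal sketch of that idea, assuming the stream uses the "data: {json}" line format with a "[DONE]" sentinel shown above; parse_sse_line and the sample payloads are made up for illustration.

```python
import json
from typing import Optional


def parse_sse_line(line: bytes) -> Optional[str]:
    """Return the text delta carried by one stream line, or None if there is none."""
    prefix = b"data: "
    if not line.startswith(prefix):
        return None  # blank keep-alive lines and other fields
    payload = line[len(prefix):]
    if payload == b"[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content") or None


sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
text = "".join(piece for piece in map(parse_sse_line, sample) if piece)
print(text)  # Hello
```

This version treats "[DONE]" like any other line without text, so a caller that wants to stop reading early would still need its own check for the sentinel.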

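2. httpx: a modern sync + async HTTP client

2.1 httpx vs requests: why upgrade

Feature | requests | httpx
Sync requests | ✅ | ✅
Async requests (async/await) | ❌ | ✅
HTTP/2 support | ❌ | ✅ (optional extra)
Complete type hints | Partial | ✅
API style | Classic | Nearly identical to requests
Connection pooling | ✅ | ✅
Streaming responses | ✅ | ✅

Conclusion: if you only need synchronous requests, requests is enough. If your project involves async code (which is close to inevitable in AI development), go straight to httpx.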

2.2 Sync usage (nearly identical to requests)

import httpx

# Basic usage: the API closely mirrors requests
response = httpx.get("https://api.github.com/users/octocat")
data = response.json()

# Client as a context manager: reuses connections
with httpx.Client(
    base_url="https://api.openai.com",
    headers={"Authorization": "Bearer sk-xxx"},
    timeout=30.0,
) as client:
    # Relative paths are joined onto base_url
    response = client.post(
        "/v1/chat/completions",
        json={"model": "gpt-4", "messages": [...]}  # [...] stands in for your message list
    )

2.3 Async usage: httpx's real killer feature

import httpx
import asyncio

async def fetch_multiple_apis():
    """Call several APIs concurrently with httpx's async client."""
    async with httpx.AsyncClient(timeout=30.0) as client:
        # Build one coroutine per request
        tasks = [
            client.get(f"https://api.example.com/data/{i}")
            for i in range(5)
        ]
        # Run all requests concurrently
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        # Failed requests come back as exception objects; skip them here
        return [r.json() for r in responses if not isinstance(r, Exception)]

# Run it
# asyncio.run(fetch_multiple_apis())
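
With return_exceptions=True, asyncio.gather returns exceptions in place of results instead of raising, and the function above silently drops them. A minimal sketch of keeping successes and failures apart follows; fake_fetch is a stand-in for a real HTTP call so the example runs offline.

```python
import asyncio


async def fake_fetch(i: int) -> dict:
    """Stand-in for an HTTP request; item 2 fails on purpose."""
    await asyncio.sleep(0.01)
    if i == 2:
        raise RuntimeError(f"request {i} failed")
    return {"id": i}


async def fetch_all(n: int):
    results = await asyncio.gather(
        *(fake_fetch(i) for i in range(n)),
        return_exceptions=True,  # exceptions are returned, not raised
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return ok, failed


ok, failed = asyncio.run(fetch_all(5))
print([r["id"] for r in ok])    # [0, 1, 3, 4]
print([str(e) for e in failed])  # ['request 2 failed']
```

gather preserves the order of its inputs, so results can also be matched back to the original requests by position when that matters.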

2.4 In practice: a full-featured async API client

import httpx
import asyncio
import json
from typing import Optional, AsyncIterator

class AsyncAIClient:
    """AI API client supporting streaming and non-streaming calls."""

    def __init__(
        self,
        api_key: str,
        base_url: str = "https://api.openai.com",
        max_concurrent: int = 10,
    ):
        self._client: Optional[httpx.AsyncClient] = None
        self.api_key = api_key
        self.base_url = base_url
        self._semaphore = asyncio.Semaphore(max_concurrent)

    async def __aenter__(self):
        self._client = httpx.AsyncClient(
            base_url=self.base_url,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            timeout=httpx.Timeout(30.0, connect=5.0),
        )
        return self

    async def __aexit__(self, *args):
        if self._client is not None:
            await self._client.aclose()

    async def chat(self, messages: list, model: str = "gpt-4") -> dict:
        """Non-streaming chat (with concurrency control)."""
        async with self._semaphore:
            response = await self._client.post(
                "/v1/chat/completions",
                json={"model": model, "messages": messages, "stream": False},
            )
            response.raise_for_status()
            return response.json()

    async def chat_stream(self, messages: list, model: str = "gpt-4") -> AsyncIterator[str]:
        """Streaming chat: an async generator that yields text chunks."""
        async with self._semaphore:
            async with self._client.stream(
                "POST",
                "/v1/chat/completions",
                json={"model": model, "messages": messages, "stream": True},
            ) as response:
                response.raise_for_status()
                async for line in response.aiter_lines():  # yields str
                    if line.startswith("data: "):
                        data = line[6:]
                        if data == "[DONE]":
                            break
                        chunk = json.loads(data)
                        content = chunk["choices"][0]["delta"].get("content", "")
                        if content:
                            yield content

    async def batch_chat(
        self, prompts: list[list[dict]], model: str = "gpt-4"
    ) -> list:
        """Batch concurrent chat: failed items are returned as exception objects."""
        tasks = [self.chat(messages, model) for messages in prompts]
        return await asyncio.gather(*tasks, return_exceptions=True)

# Usage example (the client must be used inside "async with")
async def demo():
    async with AsyncAIClient(api_key="sk-xxx") as client:
        # Streaming chat
        async for chunk in client.chat_stream(
            [{"role": "user", "content": "Write a poem about AI"}]
        ):
            print(chunk, end="", flush=True)
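
The max_concurrent semaphore is what keeps batch_chat from firing every request at once. Its effect can be observed in isolation with a small experiment that counts how many workers are inside the semaphore at the same time; run_limited and the sleep-based worker are illustrative stand-ins for real API calls.

```python
import asyncio


async def run_limited(n_tasks: int, max_concurrent: int) -> int:
    """Run n_tasks workers and return the highest number that ran at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def worker() -> None:
        nonlocal active, peak
        async with semaphore:          # at most max_concurrent workers pass this point
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
            active -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(n_tasks=20, max_concurrent=3))
print(peak)  # 3
```

All 20 coroutines are scheduled immediately, but only three hold the semaphore at any moment, which is the behavior you want when an API enforces rate or connection limits.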

3. pydantic: type-safe data validation

In AI development, large amounts of data flow between APIs as JSON. pydantic gives that data a shape, rules, and validation.

3.1 Defining basic models

from pydantic import BaseModel, Field
from typing import Optional, List

class ChatMessage(BaseModel):
    """Data model for a chat message."""
    role: str = Field(..., description="Message role: system/user/assistant")
    content: str = Field(..., min_length=1, description="Message content")
    name: Optional[str] = Field(None, description="Optional sender name")

class ChatRequest(BaseModel):
    """Chat Completion request model."""
    model: str = Field(default="gpt-4", description="Model name")
    messages: List[ChatMessage] = Field(..., min_length=1)
    temperature: float = Field(default=0.7, ge=0, le=2.0)
    max_tokens: int = Field(default=2048, ge=1, le=128000)
    stream: bool = Field(default=False)

# Automatic validation and type coercion
request = ChatRequest(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],  # dict is converted to ChatMessage
    temperature="0.7",  # numeric string is coerced to float
    max_tokens="2048",  # numeric string is coerced to int
)
print(request.model_dump_json(indent=2))
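
The example above shows the happy path. It is just as useful to see what a rejected payload looks like. The sketch below assumes pydantic v2 and uses Literal to restrict the role, which the str-typed role field above does not enforce; StrictMessage and StrictRequest are hypothetical models for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class StrictMessage(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: str = Field(..., min_length=1)


class StrictRequest(BaseModel):
    messages: list[StrictMessage] = Field(..., min_length=1)
    temperature: float = Field(default=0.7, ge=0, le=2.0)


try:
    StrictRequest(
        messages=[{"role": "robot", "content": "Hello"}],  # unknown role
        temperature=3.5,                                   # above le=2.0
    )
except ValidationError as exc:
    # Each error records the path to the offending field in "loc"
    failed_fields = sorted(
        ".".join(str(part) for part in err["loc"]) for err in exc.errors()
    )
    print(failed_fields)  # ['messages.0.role', 'temperature']
```

pydantic collects every problem in one pass rather than stopping at the first, so a single ValidationError can be turned into a complete error report for the caller.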

3.2 Custom validators

from pydantic import BaseModel, Field, ValidationError, field_validator
from typing import List
import re

class AgentConfig(BaseModel):
    """Agent configuration model with custom validation."""
    name: str = Field(..., min_length=2, max_length=50)
    system_prompt: str = Field(..., min_length=10)
    model: str = "gpt-4"
    temperature: float = Field(default=0.7, ge=0, le=2.0)
    tools: List[str] = Field(default_factory=list)

    @field_validator("name")
    @classmethod
    def name_must_be_valid(cls, v: str) -> str:
        """Allow only letters, digits, underscores, Chinese characters, spaces, and hyphens."""
        if not re.match(r'^[\w一-鿿\s-]+$', v):
            raise ValueError("name may contain only letters, digits, Chinese characters, underscores, spaces, and hyphens")
        return v.strip()

    @field_validator("temperature")
    @classmethod
    def round_temperature(cls, v: float) -> float:
        """Round to 2 decimal places."""
        return round(v, 2)

# A failed validation produces a clear error message
try:
    config = AgentConfig(
        name="A",  # too short!
        system_prompt="Hi",  # too short!
    )
except ValidationError as e:
    print(e)
    # Reports 2 validation errors for AgentConfig (paraphrased):
    #   name: string should have at least 2 characters
    #   system_prompt: string should have at least 10 characters
3.3 Nested models: complex configuration in one go

from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum
import yaml  # third-party package: pip install pyyaml

class ModelProvider(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    LOCAL = "local"

class ModelConfig(BaseModel):
    provider: ModelProvider = ModelProvider.OPENAI
    name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2048

class MemoryConfig(BaseModel):
    type: str = "buffer"  # buffer / vector / hybrid
    max_messages: int = 20
    vector_db_url: Optional[str] = None

class ToolConfig(BaseModel):
    name: str
    description: str
    enabled: bool = True
    parameters: dict = Field(default_factory=dict)

class FullAgentConfig(BaseModel):
    """Complete agent configuration built from nested models."""
    agent_name: str
    model: ModelConfig = Field(default_factory=ModelConfig)  # nested!
    memory: MemoryConfig = Field(default_factory=MemoryConfig)
    tools: List[ToolConfig] = Field(default_factory=list)
    system_prompt: str

# Load the configuration from a YAML file
with open("agent_config.yaml", encoding="utf-8") as f:
    config = FullAgentConfig(**yaml.safe_load(f))
print(config.model_dump_json(indent=2))
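
The same nested loading works for data that is already a dict, for example parsed JSON. Below is a reduced sketch, assuming pydantic v2, that shows nested dicts being turned into model instances and missing sections falling back to their defaults; ModelSettings and AgentSettings are trimmed-down, hypothetical versions of the models above.

```python
import json

from pydantic import BaseModel, Field


class ModelSettings(BaseModel):
    name: str = "gpt-4"
    temperature: float = 0.7


class AgentSettings(BaseModel):
    agent_name: str
    model: ModelSettings = Field(default_factory=ModelSettings)
    tools: list[str] = Field(default_factory=list)


raw = json.loads('{"agent_name": "helper", "model": {"temperature": "0.2"}}')
settings = AgentSettings.model_validate(raw)

print(type(settings.model).__name__)  # ModelSettings
print(settings.model.temperature)     # 0.2
print(settings.model.name)            # gpt-4
print(settings.tools)                 # []
```

Only the keys present in the input override the defaults, so a config file can stay short and list just the settings that differ.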

3.4 In practice: validating API responses

import requests
from pydantic import BaseModel
from typing import Optional, List

# ChatMessage is the model defined in section 3.1

class TokenUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class Choice(BaseModel):
    index: int
    message: ChatMessage
    finish_reason: Optional[str] = None

class ChatResponse(BaseModel):
    """Chat Completion response model."""
    id: str
    object: str
    created: int
    model: str
    choices: List[Choice]
    usage: TokenUsage

    @property
    def content(self) -> str:
        """Shortcut to the response text."""
        return self.choices[0].message.content

    @property
    def total_cost(self) -> float:
        """Rough cost estimate in USD (illustrative GPT-4 rates; check current pricing)."""
        input_cost = self.usage.prompt_tokens * 0.03 / 1000       # $0.03 per 1K input tokens
        output_cost = self.usage.completion_tokens * 0.06 / 1000  # $0.06 per 1K output tokens
        return round(input_cost + output_cost, 4)

# Usage
def call_llm(messages: list) -> ChatResponse:
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-xxx"},
        json={"model": "gpt-4", "messages": messages},
        timeout=60,
    )
    response.raise_for_status()
    return ChatResponse(**response.json())  # validate and parse in one step

response = call_llm([{"role": "user", "content": "Hello"}])
print(response.content)       # the response text
print(f"${response.total_cost}")  # the estimated cost
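
The cost formula is simple enough to check by hand. As a worked example, using the same illustrative rates of $0.03 and $0.06 per 1,000 tokens (assumed values, not current pricing), a call with 1,000 input tokens and 500 output tokens comes to $0.03 + $0.03. estimate_cost is a hypothetical standalone version of the total_cost property.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate: float = 0.03,   # USD per 1,000 input tokens (assumed)
    output_rate: float = 0.06,  # USD per 1,000 output tokens (assumed)
) -> float:
    """Estimated cost in USD, rounded to 4 decimal places."""
    input_cost = prompt_tokens * input_rate / 1000
    output_cost = completion_tokens * output_rate / 1000
    return round(input_cost + output_cost, 4)


# 1,000 input tokens -> $0.03, 500 output tokens -> $0.03
print(estimate_cost(1000, 500))  # 0.06
```

Passing the rates as parameters keeps the arithmetic reusable when prices change or when a different model is used.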

4. dataclasses: the standard library's data container

pydantic is powerful but is an extra dependency. If you only need plain data structures (no runtime validation), dataclasses is the lighter choice.

4.1 Basic usage

from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    """Data class for a knowledge-base document."""
    title: str
    content: str
    source: str
    chunk_id: str = ""
    metadata: dict = field(default_factory=dict)  # mutable defaults must use default_factory!

# __init__, __repr__, and __eq__ are generated automatically
doc = Document(
    title="Async programming in Python",
    content="asyncio is Python's async programming library...",
    source="https://docs.python.org/3/library/asyncio.html",
)
print(doc)  # Document(title='Async programming in Python', content='asyncio is...', ...)
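
dataclasses has no built-in JSON support, but asdict() bridges the gap to the json module. A minimal round-trip sketch, reusing the Document shape from above with made-up field values:

```python
import json
from dataclasses import asdict, astuple, dataclass, field


@dataclass
class Document:
    title: str
    content: str
    source: str
    chunk_id: str = ""
    metadata: dict = field(default_factory=dict)


doc = Document(title="Intro", content="hello", source="local", metadata={"lang": "en"})

as_dict = asdict(doc)                       # recursively converts to plain dicts
as_json = json.dumps(as_dict)               # now serializable with the json module
restored = Document(**json.loads(as_json))  # no validation happens here

print(astuple(doc))     # ('Intro', 'hello', 'local', '', {'lang': 'en'})
print(restored == doc)  # True
```

The round trip works because every field here is already a JSON-friendly type; fields such as datetime would need a custom encoder.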

4.2 Immutable data classes

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True makes instances immutable
class ModelInfo:
    name: str
    provider: str
    max_tokens: int
    cost_per_1k: float

# Similar to a named tuple, but more flexible
gpt4 = ModelInfo("gpt-4", "openai", 8192, 0.03)
# gpt4.max_tokens = 128000  # raises FrozenInstanceError
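
Since a frozen instance cannot be changed in place, the usual way to "modify" one is dataclasses.replace(), which builds a new instance. The sketch below also shows a side benefit: frozen data classes are hashable and can serve as dict keys. The values are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    max_tokens: int
    cost_per_1k: float


gpt4 = ModelInfo("gpt-4", "openai", 8192, 0.03)

try:
    gpt4.max_tokens = 128000
except FrozenInstanceError:
    print("cannot modify a frozen instance")

# replace() builds a new instance with selected fields changed
bigger = replace(gpt4, max_tokens=128000)
print(gpt4.max_tokens, bigger.max_tokens)  # 8192 128000

# frozen=True together with the default eq=True makes instances hashable
limits = {gpt4: "default", bigger: "long-context"}
print(limits[ModelInfo("gpt-4", "openai", 8192, 0.03)])  # default
```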

4.3 __post_init__: processing after initialization

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AgentRun:
    """Record of one agent run."""
    agent_name: str
    input_text: str
    output_text: str = ""
    started_at: datetime = field(default_factory=datetime.now)
    finished_at: datetime | None = None  # the "X | None" syntax needs Python 3.10+
    token_usage: dict = field(default_factory=dict)

    def __post_init__(self):
        """Runs automatically right after __init__."""
        if not self.input_text.strip():
            raise ValueError("input must not be empty")
        # Truncate overly long input
        if len(self.input_text) > 10000:
            self.input_text = self.input_text[:10000] + "..."
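
Another common use of __post_init__ is filling in derived fields. Declaring a field with field(init=False) keeps it out of the generated __init__, so it can be computed from the other fields. TokenStats is a hypothetical class for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenStats:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int = field(init=False)  # computed, not passed to __init__

    def __post_init__(self):
        if self.prompt_tokens < 0 or self.completion_tokens < 0:
            raise ValueError("token counts must be non-negative")
        self.total_tokens = self.prompt_tokens + self.completion_tokens


stats = TokenStats(prompt_tokens=120, completion_tokens=30)
print(stats.total_tokens)  # 150
```

The derived field still appears in repr() and comparisons, so two TokenStats objects with the same inputs compare equal.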

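4.4 dataclasses vs pydantic: how to choose

Dimension | dataclasses | pydantic
Type validation | ❌ type hints only | ✅ enforced at runtime
Type coercion | ❌ none | ✅ automatic (e.g. str→int)
JSON serialization | ❌ manual (asdict() plus json.dumps) | ✅ built-in model_dump / model_dump_json
Performance | ⚡ faster (no validation overhead) | 🐢 slower (validates)
Dependency | 📦 standard library | 📦 needs installation
Field constraints | ❌ none | ✅ gt/ge/lt/le/min_length, etc.
Custom validation | ❌ hand-written code | ✅ @field_validator
Nested models | ✅ (no validation) | ✅ (with validation)
Best for | internal data structures, config classes | API parameter validation, user input checks

# Rules of thumb:
# ✅ Use dataclasses: internal data passing, config objects, simple data containers
# ✅ Use pydantic: API payloads, user input validation, data that must be serialized/deserialized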
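
The "type hints only" row is the difference that most often surprises people. A short demonstration, with hypothetical class names, of a dataclass accepting a wrongly typed value and of the hand-written check needed to reject it:

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_tokens: int


# The annotation is not enforced: the string is stored unchanged
loose = Limits(max_tokens="2048")
print(type(loose.max_tokens).__name__)  # str


@dataclass
class CheckedLimits:
    max_tokens: int

    def __post_init__(self):
        # Hand-written check, the dataclass counterpart of a validator
        if not isinstance(self.max_tokens, int):
            raise TypeError("max_tokens must be an int")


try:
    CheckedLimits(max_tokens="2048")
except TypeError as exc:
    print(exc)  # max_tokens must be an int
```

Once a class needs more than one or two checks like this, that is usually the point where switching to pydantic pays off.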

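5. Choosing a library

5.1 Choosing an HTTP client

What is your scenario?
├── Simple synchronous API calls → requests (most stable, largest ecosystem)
├── Need async support → httpx (async and sync in one library)
├── HTTP/2 or other modern features → httpx
├── Maintaining a legacy project → requests (leave it alone)
└── New project, high concurrency → httpx (get it right from the start)

5.2 Choosing a data model

What do you need?
├── A simple data container, validation not a concern → dataclasses
├── API requests/responses that need strict validation → pydantic
├── Config files (loaded from YAML/JSON) → pydantic (automatic validation)
├── High-performance scenarios (creating many objects) → dataclasses
└── Integration with FastAPI/LangChain → pydantic

5.3 Installing dependencies

pip install requests     # HTTP client (classic)
pip install httpx        # HTTP client (modern, async support)
pip install pydantic     # data validation library
pip install pyyaml       # only needed for the YAML loading example in 3.3
# dataclasses has been part of the standard library since Python 3.7; nothing to install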

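6. Summary and next steps

Key points at a glance

The four core libraries for AI development
├── requests
│   ├── Basic GET/POST requests + Session connection reuse
│   ├── Retries (urllib3 Retry)
│   └── Streaming responses (stream=True + iter_lines)
├── httpx
│   ├── Sync API (nearly identical to requests)
│   ├── Async API (AsyncClient + async with)
│   └── HTTP/2 + type hints
├── pydantic
│   ├── BaseModel + Field constraints
│   ├── @field_validator custom validation
│   ├── Nested models + model_dump serialization
│   └── Automatic type coercion
└── dataclasses
    ├── @dataclass + field
    ├── frozen=True immutable objects
    ├── __post_init__ post-initialization processing
    └── asdict() / astuple() conversion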

Quick reference card

# requests essentials
session = requests.Session()
session.headers.update({"Authorization": "Bearer xxx"})
response = session.post(url, json=data, timeout=30)

# httpx essentials
async with httpx.AsyncClient(base_url=url) as client:
    response = await client.post("/endpoint", json=data)

# pydantic essentials
class MyModel(BaseModel):
    name: str = Field(..., min_length=1)
    age: int = Field(ge=0, le=150)
data = MyModel(**json_data)  # validates and coerces automatically

# dataclasses essentials
@dataclass
class Config:
    name: str
    values: list = field(default_factory=list)

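Next steps

With these four libraries under your belt, you can already write AI application code that is both clean and robust. In the next article we will look at how to give your agent a "memory": persisting data with JSON, YAML, and file I/O.

Previous: Learning AI: This Much Python Is Enough (2): async/await asynchronous programming, the concurrency recipe for making AI agents 10x faster

Next: Learning AI: This Much Python Is Enough (4): JSON, YAML, and file I/O, data persistence for AI agents in practice


This is part 3 of the "Learning AI: This Much Python Is Enough" series. The right library halves the work; the wrong one doubles it. Remember: requests for sync, httpx for async, pydantic for validation, dataclasses for lightweight data.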
