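Learn AI, This Python Is All You Need (Part 3): requests, httpx, pydantic, dataclasses, the 4 Python Libraries Every AI Developer Should Install
Series: "Learn AI, This Python Is All You Need", a Python crash course for AI agent developers
Tags: Python, requests, httpx, pydantic, dataclasses, AI Agent
Difficulty: ⭐⭐⭐☆☆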
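Preface
If I could install only four Python libraries for AI development, I would pick these without hesitation:
- requests: the classic HTTP client and the usual first choice for calling REST APIs
- httpx: a modern alternative to requests with native async support
- pydantic: a data validation library, so API parameters are never passed around unchecked
- dataclasses: the standard library's tool for lightweight data structures such as internal records and simple configuration objects
These four show up throughout AI agent development: calling APIs, validating parameters, managing configuration, and modeling data.
This is the third article in the "Learn AI, This Python Is All You Need" series, and it takes a closer look at each of the four.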
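1. requests: the Swiss Army knife of HTTP calls
"requests is the kind of library that, once you have used it, makes you never want to go back to urllib." (nearly every Python developer)
1.1 Basic requests

    import requests

    # GET request: the most common operation
    response = requests.get("https://api.github.com/users/octocat", timeout=10)
    print(response.status_code)  # 200 on success
    print(response.json())       # parse the JSON body directly

    # GET request with query parameters
    response = requests.get(
        "https://api.example.com/search",
        params={"q": "python", "page": 1, "per_page": 20},
        timeout=10,
    )

    # POST request: send JSON data
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-xxx"},
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "Hello"}],
        },
        timeout=30,  # always set a timeout!
    )

⚠️ Important: never forget the timeout parameter. requests applies no timeout by default, so a single stalled request can leave your program waiting forever.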
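To see what `params` and `json` actually put on the wire without touching the network, a request can be built and prepared but never sent. The following is a minimal sketch: `requests.Request(...).prepare()` is part of the library, while the `/items` URL and the variable names are assumptions made up for illustration.

```python
import requests

# Build the request object without sending it.
req = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "python", "page": 1, "per_page": 20},
)
prepared = req.prepare()

# The params dict is URL-encoded and appended as a query string.
print(prepared.method)  # GET
print(prepared.url)     # https://api.example.com/search?q=python&page=1&per_page=20

# A JSON body is serialized and the Content-Type header is set for you.
post = requests.Request(
    "POST",
    "https://api.example.com/items",  # hypothetical endpoint
    json={"name": "demo"},
).prepare()
print(post.headers["Content-Type"])  # application/json
print(post.body)
```

Inspecting a prepared request this way is handy when an API rejects a call and you need to check exactly what was encoded.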
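1.2 Session: reuse connections for better performance

    # ❌ A new connection is set up for every request (slow)
    for i in range(100):
        requests.get("https://api.example.com/data", timeout=10)

    # ✅ Use a Session to reuse connections
    with requests.Session() as session:
        # Headers shared by every request
        session.headers.update({
            "Authorization": "Bearer sk-xxx",
            "User-Agent": "MyAgent/1.0",
        })
        # Session has no default-timeout setting: assigning session.timeout
        # has no effect, so the timeout is passed on every call.
        timeout = (5, 30)  # (connect timeout, read timeout)
        for i in range(100):
            response = session.get("https://api.example.com/data", timeout=timeout)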
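requests' `Session` does not honor a `timeout` attribute, so a default timeout has to be supplied some other way. One common approach, sketched here under the assumption that a small subclass is acceptable in your codebase, is to override `request()`; `TimeoutSession` and `default_timeout` are hypothetical names, not part of requests.

```python
import requests


class TimeoutSession(requests.Session):
    """A Session that applies a default timeout unless the caller passes one."""

    def __init__(self, timeout=(5, 30)):
        super().__init__()
        self.default_timeout = timeout  # (connect timeout, read timeout)

    def request(self, method, url, **kwargs):
        # Fill in the default only when the caller did not pass timeout=...
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

Because `session.get()`, `session.post()` and the other helpers all go through `request()`, every call made on a `TimeoutSession` picks up the default, and an explicit `timeout=` argument still wins.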
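1.3 Error handling and retries

    from typing import Optional

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def create_robust_session() -> requests.Session:
        """Create a Session with a retry policy"""
        session = requests.Session()
        # Configure the retry strategy
        retry_strategy = Retry(
            total=3,            # retry at most 3 times
            backoff_factor=1,   # wait grows roughly as backoff_factor * 2^(retry-1)
            status_forcelist=[429, 500, 502, 503, 504],  # status codes that trigger a retry
            # Needs urllib3 >= 1.26. Retrying POST can repeat a non-idempotent
            # request, so include it only if the API tolerates duplicates.
            allowed_methods=["GET", "POST"],
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount("https://", adapter)
        session.mount("http://", adapter)
        return session

    # Usage
    def safe_api_call(url: str) -> Optional[dict]:
        """Return the parsed JSON body, or None if the request fails"""
        with create_robust_session() as session:
            try:
                response = session.get(url, timeout=10)
                response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
                return response.json()
            except requests.exceptions.Timeout:
                print("Request timed out")
            except requests.exceptions.HTTPError as e:
                print(f"HTTP error: {e}")
            except requests.exceptions.ConnectionError:
                print("Network connection error")
            except requests.exceptions.RequestException as e:
                # Anything else, e.g. RetryError once the retries are exhausted
                print(f"Request failed: {e}")
            return None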
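The same backoff idea applies to any flaky call, not only to HTTP adapters. Below is a hand-rolled sketch of exponential backoff using only the standard library. It is not urllib3's implementation, its delay schedule is a simplification, and `retry_with_backoff` with its parameters is a hypothetical helper introduced for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    backoff_factor: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on failure wait backoff_factor * 2**attempt seconds, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(backoff_factor * (2 ** attempt))
    raise AssertionError("unreachable")


# Demo: a function that fails twice, then succeeds.
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, retries=3, backoff_factor=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the helper testable: the demo records the delays rather than waiting for them.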
1.4 In practice: wrapping an LLM API

    import json

    import requests

    class SimpleLLMClient:
        """LLM API client built on requests"""

        def __init__(self, api_key: str, base_url: str = "https://api.openai.com"):
            self.session = requests.Session()
            self.session.headers.update({
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            })
            self.base_url = base_url

        def chat(self, messages: list, model: str = "gpt-4", **kwargs) -> str:
            """Send a chat request and return the response text"""
            payload = {
                "model": model,
                "messages": messages,
                "temperature": kwargs.get("temperature", 0.7),
                "max_tokens": kwargs.get("max_tokens", 2048),
            }
            response = self.session.post(
                f"{self.base_url}/v1/chat/completions",
                json=payload,
                timeout=60,
            )
            response.raise_for_status()
            data = response.json()
            # Log token usage alongside the call
            usage = data.get("usage", {})
            print(f"[Token] input: {usage.get('prompt_tokens')}, "
                  f"output: {usage.get('completion_tokens')}")
            return data["choices"][0]["message"]["content"]

        def stream_chat(self, messages: list, model: str = "gpt-4") -> str:
            """Streaming chat: print chunks as they arrive, return the full text"""
            payload = {
                "model": model,
                "messages": messages,
                "stream": True,
            }
            with self.session.post(
                f"{self.base_url}/v1/chat/completions",
                json=payload,
                timeout=60,
                stream=True,  # key point: receive the body incrementally
            ) as response:
                response.raise_for_status()
                full_text = ""
                for line in response.iter_lines():  # yields bytes
                    if line.startswith(b"data: "):
                        data = line[6:]  # strip the "data: " prefix
                        if data == b"[DONE]":
                            break
                        chunk = json.loads(data)
                        content = chunk["choices"][0]["delta"].get("content", "")
                        if content:
                            full_text += content
                            print(content, end="", flush=True)
                return full_text
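The line-parsing part of streaming can be pulled out into a pure function and exercised without a network connection. This is a minimal sketch that assumes the OpenAI-style `data: {...}` line format used in the client above. `iter_sse_content` is a hypothetical helper name, and the sample lines are hand-written rather than recorded from an API.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text deltas from OpenAI-style 'data: {...}' stream lines."""
    for line in lines:
        if not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            yield content


# Hand-written sample lines standing in for response.iter_lines()
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Hello
```

Separating parsing from I/O means the same generator can be fed by `response.iter_lines()` in production and by a list of bytes in tests.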
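2. httpx: a modern sync and async HTTP client
2.1 httpx vs requests: why upgrade
| Feature | requests | httpx |
|---|---|---|
| Synchronous requests | ✅ | ✅ |
| Asynchronous requests (async/await) | ❌ | ✅ |
| HTTP/2 support | ❌ | ✅ (optional: install httpx[http2] and pass http2=True) |
| Complete type hints | Partial | ✅ |
| API style | Classic | Almost identical to requests |
| Connection pooling | ✅ | ✅ |
| Streaming responses | ✅ | ✅ |
Conclusion: if you only need synchronous requests, requests is enough. If your project involves async code, which is common in AI development, go straight to httpx.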
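2.2 Synchronous usage (almost the same as requests)

    import httpx

    # Basic usage: the call syntax mirrors requests. Unlike requests, httpx
    # applies a 5-second timeout by default and does not follow redirects
    # unless asked to.
    response = httpx.get("https://api.github.com/users/octocat")
    data = response.json()

    # Client as a context manager: connection reuse
    with httpx.Client(
        base_url="https://api.openai.com",
        headers={"Authorization": "Bearer sk-xxx"},
        timeout=30.0,
    ) as client:
        # Relative paths are joined onto base_url automatically
        response = client.post(
            "/v1/chat/completions",
            json={"model": "gpt-4", "messages": [...]}  # [...] stands for your message list
        )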
2.3 Asynchronous usage: httpx's real killer feature

    import asyncio

    import httpx

    async def fetch_multiple_apis():
        """Call several APIs concurrently with httpx's async client"""
        async with httpx.AsyncClient(timeout=30.0) as client:
            # Create the request coroutines (nothing is sent yet)
            tasks = [
                client.get(f"https://api.example.com/data/{i}")
                for i in range(5)
            ]
            # Run all requests concurrently
            responses = await asyncio.gather(*tasks, return_exceptions=True)
            # Failed requests come back as exception objects and are dropped here
            return [r.json() for r in responses if not isinstance(r, Exception)]

    # Run it
    # asyncio.run(fetch_multiple_apis())
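Silently dropping failed requests can hide problems, so it often helps to keep successes and failures apart. The sketch below shows how `asyncio.gather(..., return_exceptions=True)` behaves, using only the standard library. `fake_fetch` is a hypothetical stand-in for an HTTP call, so the example runs without a network.

```python
import asyncio


async def fake_fetch(i: int) -> dict:
    """Stand-in for an HTTP call: sleeps briefly and fails for one input."""
    await asyncio.sleep(0.05)
    if i == 2:
        raise RuntimeError(f"request {i} failed")
    return {"id": i}


async def fetch_all(n: int):
    results = await asyncio.gather(
        *(fake_fetch(i) for i in range(n)),
        return_exceptions=True,  # exceptions are returned, not raised
    )
    # Results come back in the same order as the inputs.
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


successes, failures = asyncio.run(fetch_all(5))
print(successes)      # [{'id': 0}, {'id': 1}, {'id': 3}, {'id': 4}]
print(len(failures))  # 1
```

Returning the failures alongside the results lets the caller decide whether to log, retry, or abort.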
2.4 In practice: a full-featured async API client

    import asyncio
    import json
    from typing import AsyncIterator, Optional

    import httpx

    class AsyncAIClient:
        """AI API client supporting both streaming and non-streaming calls"""

        def __init__(
            self,
            api_key: str,
            base_url: str = "https://api.openai.com",
            max_concurrent: int = 10,
        ):
            self._client: Optional[httpx.AsyncClient] = None
            self.api_key = api_key
            self.base_url = base_url
            self._semaphore = asyncio.Semaphore(max_concurrent)

        async def __aenter__(self):
            self._client = httpx.AsyncClient(
                base_url=self.base_url,
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "Content-Type": "application/json",
                },
                timeout=httpx.Timeout(30.0, connect=5.0),
            )
            return self

        async def __aexit__(self, *args):
            await self._client.aclose()

        async def chat(self, messages: list, model: str = "gpt-4") -> dict:
            """Non-streaming chat (with concurrency control)"""
            async with self._semaphore:
                response = await self._client.post(
                    "/v1/chat/completions",
                    json={"model": model, "messages": messages, "stream": False},
                )
                response.raise_for_status()
                return response.json()

        async def chat_stream(self, messages: list, model: str = "gpt-4") -> AsyncIterator[str]:
            """Streaming chat: an async generator that yields text chunks"""
            async with self._semaphore:
                async with self._client.stream(
                    "POST",
                    "/v1/chat/completions",
                    json={"model": model, "messages": messages, "stream": True},
                ) as response:
                    response.raise_for_status()
                    async for line in response.aiter_lines():  # yields str
                        if line.startswith("data: "):
                            data = line[6:]
                            if data == "[DONE]":
                                break
                            chunk = json.loads(data)
                            content = chunk["choices"][0]["delta"].get("content", "")
                            if content:
                                yield content

        async def batch_chat(
            self, prompts: list[list[dict]], model: str = "gpt-4"
        ) -> list:
            """Batch of concurrent chats: where async pays off"""
            tasks = [self.chat(messages, model) for messages in prompts]
            return await asyncio.gather(*tasks, return_exceptions=True)

    # Usage example
    async def demo():
        async with AsyncAIClient(api_key="sk-xxx") as client:
            # Streaming chat
            async for chunk in client.chat_stream(
                [{"role": "user", "content": "Write a poem about AI"}]
            ):
                print(chunk, end="", flush=True)
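The `max_concurrent` semaphore is what keeps a large batch from opening too many requests at once. Here is a minimal standard-library sketch of that mechanism: `run_bounded` and its counters are hypothetical names, and `asyncio.sleep` stands in for the HTTP round trip.

```python
import asyncio


async def run_bounded(n_tasks: int, max_concurrent: int) -> int:
    """Run n_tasks fake requests and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker() -> None:
        nonlocal in_flight, peak
        async with semaphore:          # at most max_concurrent pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
            in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


print(asyncio.run(run_bounded(n_tasks=20, max_concurrent=3)))  # 3
```

All 20 workers are scheduled immediately, but only three are ever inside the `async with` block at the same time, which is the behavior a rate-limited API needs.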
3. pydantic: type-safe data validation
In AI development, large amounts of data flow between APIs as JSON. pydantic gives that data a shape, rules, and validation.
3.1 Defining a basic model

    from typing import List, Optional

    from pydantic import BaseModel, Field

    class ChatMessage(BaseModel):
        """Data model for a chat message"""
        role: str = Field(..., description="Message role: system/user/assistant")
        content: str = Field(..., min_length=1, description="Message content")
        name: Optional[str] = Field(None, description="Optional sender name")

    class ChatRequest(BaseModel):
        """Chat Completion request model"""
        model: str = Field(default="gpt-4", description="Model name")
        messages: List[ChatMessage] = Field(..., min_length=1)
        temperature: float = Field(default=0.7, ge=0, le=2.0)
        max_tokens: int = Field(default=2048, ge=1, le=128000)
        stream: bool = Field(default=False)

    # Automatic validation and type conversion
    request = ChatRequest(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],  # dict is converted to ChatMessage
        temperature="0.7",   # numeric string is converted to float
        max_tokens="2048",   # numeric string is converted to int
    )
    print(request.model_dump_json(indent=2))
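Two things are worth seeing in action here: restricting a string field to a fixed set of values, and what a failed validation looks like to calling code. The sketch below assumes pydantic v2 and uses compact stand-in models (`Message`, `Request`) rather than the article's full classes. Using `Literal` for `role` is a suggestion, not something the original model does.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class Message(BaseModel):
    # Literal restricts role to the three allowed strings.
    role: Literal["system", "user", "assistant"]
    content: str = Field(..., min_length=1)


class Request(BaseModel):
    model: str = "gpt-4"
    messages: List[Message] = Field(..., min_length=1)
    temperature: float = Field(default=0.7, ge=0, le=2.0)


# Round trip: model -> JSON string -> model
original = Request(messages=[{"role": "user", "content": "Hello"}])
restored = Request.model_validate_json(original.model_dump_json())
print(restored == original)  # True

# An out-of-range value is rejected with a structured error.
try:
    Request(messages=[{"role": "user", "content": "Hi"}], temperature=3.0)
except ValidationError as exc:
    error = exc.errors()[0]
    print(error["loc"], error["type"])  # ('temperature',) less_than_equal
```

`exc.errors()` returns a list of dicts, which is easier to turn into an API error response than the formatted message string.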
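3.2 Custom validators

    import re
    from typing import List

    from pydantic import BaseModel, Field, ValidationError, field_validator

    class AgentConfig(BaseModel):
        """Agent configuration model with custom validation"""
        name: str = Field(..., min_length=2, max_length=50)
        system_prompt: str = Field(..., min_length=10)
        model: str = "gpt-4"
        temperature: float = Field(default=0.7, ge=0, le=2.0)
        tools: List[str] = Field(default_factory=list)

        @field_validator("name")
        @classmethod
        def name_must_be_valid(cls, v: str) -> str:
            """Allow only letters, digits, underscores, Chinese characters, spaces, and hyphens"""
            # In Python 3, \w already matches Chinese characters in str patterns
            if not re.match(r'^[\w一-鿿\s-]+$', v):
                raise ValueError(
                    "name may contain only letters, digits, Chinese characters, "
                    "underscores, spaces, and hyphens"
                )
            return v.strip()

        @field_validator("temperature")
        @classmethod
        def round_temperature(cls, v: float) -> float:
            """Round to 2 decimal places"""
            return round(v, 2)

    # A failed validation produces a clear error message
    try:
        config = AgentConfig(
            name="A",            # too short!
            system_prompt="Hi",  # too short!
        )
    except ValidationError as e:
        print(e)
        # Prints roughly (pydantic v2):
        # 2 validation errors for AgentConfig
        # name
        #   String should have at least 2 characters [type=string_too_short, ...]
        # system_prompt
        #   String should have at least 10 characters [type=string_too_short, ...]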
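By default a `field_validator` runs after the `Field` constraints, so stripping whitespace inside it happens too late to affect `min_length`: a padded one-letter name would pass the length check and only then be stripped. If that matters, a validator with `mode="before"` normalizes the input first. The sketch below assumes pydantic v2, and `StrictName` is a hypothetical model made up for the demo.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class StrictName(BaseModel):
    name: str = Field(..., min_length=2, max_length=50)

    @field_validator("name", mode="before")
    @classmethod
    def strip_whitespace(cls, v):
        # Runs before the length check, so padding cannot satisfy min_length.
        return v.strip() if isinstance(v, str) else v


print(StrictName(name="  agent  ").name)  # agent

try:
    StrictName(name="  A  ")
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # string_too_short
```

The `isinstance` check is there because a "before" validator receives the raw input, which may not be a string yet.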
3.3 Nested models: complex configuration in one go

    from enum import Enum
    from typing import List, Optional

    import yaml  # provided by PyYAML: pip install pyyaml
    from pydantic import BaseModel, Field

    class ModelProvider(str, Enum):
        OPENAI = "openai"
        ANTHROPIC = "anthropic"
        LOCAL = "local"

    class ModelConfig(BaseModel):
        provider: ModelProvider = ModelProvider.OPENAI
        name: str = "gpt-4"
        temperature: float = 0.7
        max_tokens: int = 2048

    class MemoryConfig(BaseModel):
        type: str = "buffer"  # buffer / vector / hybrid
        max_messages: int = 20
        vector_db_url: Optional[str] = None

    class ToolConfig(BaseModel):
        name: str
        description: str
        enabled: bool = True
        parameters: dict = Field(default_factory=dict)

    class FullAgentConfig(BaseModel):
        """Complete agent configuration built from nested models"""
        agent_name: str
        model: ModelConfig = Field(default_factory=ModelConfig)  # nested!
        memory: MemoryConfig = Field(default_factory=MemoryConfig)
        tools: List[ToolConfig] = Field(default_factory=list)
        system_prompt: str

    # Load the configuration from a YAML file (JSON works the same way with json.load)
    with open("agent_config.yaml", encoding="utf-8") as f:
        config = FullAgentConfig(**yaml.safe_load(f))
    print(config.model_dump_json(indent=2))
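What nested validation does with a loaded config is easiest to see on a plain dict, which is the shape `yaml.safe_load()` or `json.load()` would hand back. This sketch assumes pydantic v2 and uses trimmed-down stand-in models (`Provider`, `ModelSettings`, `Tool`, `AgentSettings`). The values in `raw` are made up, including the model name.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, Field


class Provider(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


class ModelSettings(BaseModel):
    provider: Provider = Provider.OPENAI
    name: str = "gpt-4"
    temperature: float = 0.7


class Tool(BaseModel):
    name: str
    enabled: bool = True


class AgentSettings(BaseModel):
    agent_name: str
    model: ModelSettings = Field(default_factory=ModelSettings)
    tools: List[Tool] = Field(default_factory=list)


# Shape of the data a YAML or JSON loader would return (values are illustrative).
raw = {
    "agent_name": "researcher",
    "model": {"provider": "anthropic", "name": "example-model"},
    "tools": [{"name": "search"}, {"name": "shell", "enabled": False}],
}

settings = AgentSettings.model_validate(raw)
print(settings.model.provider.value)  # anthropic
print(settings.model.temperature)     # 0.7
print([t.name for t in settings.tools if t.enabled])  # ['search']
```

Nested dicts become model instances, the string `"anthropic"` becomes an enum member, and omitted fields fall back to their defaults.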
3.4 In practice: validating API responses

    from typing import List, Optional

    import requests
    from pydantic import BaseModel

    # ChatMessage is the model defined in section 3.1

    class TokenUsage(BaseModel):
        prompt_tokens: int
        completion_tokens: int
        total_tokens: int

    class Choice(BaseModel):
        index: int
        message: ChatMessage
        finish_reason: Optional[str] = None

    class ChatResponse(BaseModel):
        """Chat Completion response model"""
        id: str
        object: str
        created: int
        model: str
        choices: List[Choice]
        usage: TokenUsage

        @property
        def content(self) -> str:
            """Shortcut for the response text"""
            return self.choices[0].message.content

        @property
        def total_cost(self) -> float:
            """Rough cost estimate. The per-1K-token rates are hard-coded example
            GPT-4 prices; check current pricing before relying on them."""
            input_cost = self.usage.prompt_tokens * 0.03 / 1000
            output_cost = self.usage.completion_tokens * 0.06 / 1000
            return round(input_cost + output_cost, 4)

    # Usage
    def call_llm(messages: list) -> ChatResponse:
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": "Bearer sk-xxx"},
            json={"model": "gpt-4", "messages": messages},
            timeout=60,
        )
        response.raise_for_status()
        return ChatResponse.model_validate(response.json())  # validate and parse

    response = call_llm([{"role": "user", "content": "Hello"}])
    print(response.content)              # the text, directly
    print(f"${response.total_cost}")     # the estimated cost, directly
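Response models can be tested against a sample payload before any real API call is made. The sketch below assumes pydantic v2. The payload is hand-written, not a recorded API response, and the model names are compact stand-ins. It also makes `content` optional, on the assumption that some responses (for example tool calls) carry a null `content`.

```python
from typing import List, Optional

from pydantic import BaseModel


class Message(BaseModel):
    role: str
    content: Optional[str] = None  # assumed nullable, e.g. for tool-call replies


class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class Choice(BaseModel):
    index: int
    message: Message
    finish_reason: Optional[str] = None


class Response(BaseModel):
    id: str
    model: str
    choices: List[Choice]
    usage: Usage


# Hand-written sample payload, not a recorded API response.
payload = {
    "id": "chatcmpl-demo",
    "object": "chat.completion",  # keys the model does not declare are ignored by default
    "created": 0,
    "model": "gpt-4",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
}

parsed = Response.model_validate(payload)
print(parsed.choices[0].message.content)  # Hi there
print(parsed.usage.total_tokens)          # 30
```

If a provider changes its response shape, a test like this fails with a clear validation error rather than a `KeyError` deep inside application code.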
4. dataclasses: data containers from the standard library
pydantic is powerful but needs a separate install. If all you need is a plain data structure with no runtime validation, dataclasses is the lighter choice.
4.1 Basic usage

    from dataclasses import dataclass, field

    @dataclass
    class Document:
        """Data class for a knowledge-base document"""
        title: str
        content: str
        source: str
        chunk_id: str = ""
        metadata: dict = field(default_factory=dict)  # mutable defaults must use default_factory!

    # __init__, __repr__, and __eq__ are generated automatically
    doc = Document(
        title="Python async programming",
        content="asyncio is Python's asynchronous programming library...",
        source="https://docs.python.org/3/library/asyncio.html",
    )
    print(doc)  # Document(title='Python async programming', content='asyncio is...', ...)
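Two behaviors are easy to verify directly: `default_factory` gives each instance its own mutable default, and `asdict()` / `astuple()` turn an instance into plain Python data that `json.dumps` can serialize. This is a minimal standard-library sketch with a trimmed-down `Document`.

```python
import json
from dataclasses import asdict, astuple, dataclass, field


@dataclass
class Document:
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


a = Document("A", "first")
b = Document("B", "second")
a.metadata["lang"] = "en"

# default_factory gives every instance its own dict.
print(b.metadata)  # {}

# asdict() recursively converts to plain dicts, ready for json.dumps.
print(json.dumps(asdict(a)))  # {"title": "A", "content": "first", "metadata": {"lang": "en"}}
print(astuple(b))             # ('B', 'second', {})
```

Writing `metadata: dict = {}` instead would raise a `ValueError` at class definition time, because dataclasses refuse mutable defaults that would be shared between instances.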
4.2 Immutable data classes

    from dataclasses import dataclass

    @dataclass(frozen=True)  # frozen=True makes instances immutable
    class ModelInfo:
        name: str
        provider: str
        max_tokens: int
        cost_per_1k: float

    # Similar to a named tuple, but more flexible
    gpt4 = ModelInfo("gpt-4", "openai", 8192, 0.03)
    # gpt4.max_tokens = 128000  # raises dataclasses.FrozenInstanceError
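Since a frozen instance cannot be edited, changes are made by creating a modified copy with `dataclasses.replace()`. Frozen instances are also hashable, so they can serve as dict keys. Here is a short standard-library sketch using a trimmed-down `ModelInfo`. The `limits` mapping and its labels are made up for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    max_tokens: int


base = ModelInfo("gpt-4", "openai", 8192)

try:
    base.max_tokens = 128000
except FrozenInstanceError:
    print("cannot modify a frozen instance")

# replace() builds a new instance with some fields changed.
bigger = replace(base, max_tokens=128000)
print(base.max_tokens, bigger.max_tokens)  # 8192 128000

# frozen=True (with the default eq=True) makes instances hashable.
limits = {base: "default", bigger: "long-context"}
print(limits[ModelInfo("gpt-4", "openai", 8192)])  # default
```

The dict lookup succeeds with a freshly built instance because equality and hashing are both based on field values.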
4.3 __post_init__: processing after initialization

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Optional

    @dataclass
    class AgentRun:
        """Record of one agent run"""
        agent_name: str
        input_text: str
        output_text: str = ""
        started_at: datetime = field(default_factory=datetime.now)
        finished_at: Optional[datetime] = None  # `datetime | None` also works on Python 3.10+
        token_usage: dict = field(default_factory=dict)

        def __post_init__(self):
            """Runs automatically right after the generated __init__"""
            if not self.input_text.strip():
                raise ValueError("input must not be empty")
            # Truncate overly long input
            if len(self.input_text) > 10000:
                self.input_text = self.input_text[:10000] + "..."
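`__post_init__` is also the usual place to fill in derived fields that callers should not pass in, which are declared with `field(init=False)`. The sketch below uses only the standard library. `Chunk`, `max_chars`, and `truncated` are hypothetical names, and the 20-character limit is chosen only to keep the demo small.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    max_chars: int = 20
    truncated: bool = field(init=False, default=False)  # derived, not an argument

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if len(self.text) > self.max_chars:
            self.text = self.text[: self.max_chars] + "..."
            self.truncated = True


short = Chunk("hello")
long = Chunk("x" * 50)
print(short.truncated, len(short.text))  # False 5
print(long.truncated, len(long.text))    # True 23
```

Recording whether truncation happened gives downstream code a way to tell the difference between short input and input that was cut off.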
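4.4 dataclasses vs pydantic: how to choose
| Dimension | dataclasses | pydantic |
|---|---|---|
| Type validation | ❌ Type hints only | ✅ Enforced at runtime |
| Type conversion | ❌ | ✅ Automatic (e.g. str→int) |
| JSON serialization | ⚠️ Manual: asdict() plus json.dumps | ✅ Built-in model_dump / model_dump_json |
| Performance | ⚡ Faster (no validation overhead) | 🐢 Slower (validation on every instance) |
| Dependency | 📦 Standard library | 📦 Must be installed |
| Field constraints | ❌ None | ✅ gt/ge/lt/le/min_length, etc. |
| Custom validation | ❌ Hand-written code | ✅ @field_validator |
| Nested models | ✅ (without validation) | ✅ (with validation) |
| Typical use | Internal data structures, simple config objects | API parameter validation, user input checks |

    # Rule of thumb:
    # ✅ Use dataclasses: internal data passing, config objects, simple data containers
    # ✅ Use pydantic: API data, user input validation, data that must be serialized/deserialized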
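The first two rows of the comparison can be checked in a few lines. This sketch assumes pydantic v2 with its default (non-strict) conversion mode. `PlainConfig` and `CheckedConfig` are hypothetical names used only for this demo.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PlainConfig:
    max_tokens: int


class CheckedConfig(BaseModel):
    max_tokens: int


# dataclasses: the annotation is only a hint, the string is stored unchanged.
print(repr(PlainConfig(max_tokens="2048").max_tokens))    # '2048'

# pydantic: a numeric string is converted, a non-numeric one is rejected.
print(repr(CheckedConfig(max_tokens="2048").max_tokens))  # 2048
try:
    CheckedConfig(max_tokens="lots")
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # int_parsing
```

With the dataclass, the wrong type only surfaces later, wherever `max_tokens` is first used as a number.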
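5. Library selection guide
5.1 Choosing an HTTP client

    What is your scenario?
    ├── Simple synchronous API calls → requests (most stable, largest ecosystem)
    ├── Async support needed → httpx (async and sync in one library)
    ├── HTTP/2 or other modern features → httpx
    ├── Maintaining a legacy project → requests (leave it alone)
    └── New project, high concurrency → httpx (get it right from the start)

5.2 Choosing a data model

    What do you need?
    ├── Simple data container, validation not a concern → dataclasses
    ├── API requests/responses that need strict validation → pydantic
    ├── Config files (loaded from YAML/JSON) → pydantic (automatic validation)
    ├── Performance-sensitive code (creating many objects) → dataclasses
    └── Integration with FastAPI/LangChain → pydantic

5.3 Installing dependencies

    pip install requests   # HTTP client (classic)
    pip install httpx      # HTTP client (modern, async support)
    pip install pydantic   # data validation library
    # dataclasses ships with the standard library in Python 3.7+, nothing to install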
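6. Summary and next steps
Key points of this article at a glance

    Four libraries for AI development
    ├── requests
    │   ├── Basic GET/POST requests + Session connection reuse
    │   ├── Retry mechanism (urllib3 Retry)
    │   └── Streaming responses (stream=True + iter_lines)
    ├── httpx
    │   ├── Sync API (almost identical to requests)
    │   ├── Async API (AsyncClient + async with)
    │   └── HTTP/2 + type hints
    ├── pydantic
    │   ├── BaseModel + Field constraints
    │   ├── @field_validator custom validation
    │   ├── Nested models + model_dump serialization
    │   └── Automatic type conversion
    └── dataclasses
        ├── @dataclass + field
        ├── frozen=True immutable objects
        ├── __post_init__ post-initialization processing
        └── asdict() / astuple() conversion

Quick reference card

    # requests essentials
    session = requests.Session()
    session.headers.update({"Authorization": "Bearer xxx"})
    response = session.post(url, json=data, timeout=30)

    # httpx essentials
    async with httpx.AsyncClient(base_url=url) as client:
        response = await client.post("/endpoint", json=data)

    # pydantic essentials
    class MyModel(BaseModel):
        name: str = Field(..., min_length=1)
        age: int = Field(ge=0, le=150)
    data = MyModel(**json_data)  # validates and converts automatically

    # dataclasses essentials
    @dataclass
    class Config:
        name: str
        values: list = field(default_factory=list)

Next steps
With these four libraries you can already write AI application code that is both clean and robust. The next article covers how to give your agent a "memory": persisting data with JSON, YAML, and file I/O.
This is Part 3 of the "Learn AI, This Python Is All You Need" series. The right library gets you twice the result for half the effort; the wrong one does the opposite. Remember: requests for sync, httpx for async, pydantic for validation, dataclasses for lightweight data.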