Mirror of https://github.com/Usagi-org/ai-goofish-monitor.git, synced 2025-11-25 03:15:07 +08:00
Add Basic authentication and reorder the prompt so that AI replies adhere more closely to the JSON format
@@ -49,3 +49,7 @@ AI_DEBUG_MODE=false

# Custom server port; defaults to 8000 when unset
SERVER_PORT=8000

# Web service authentication settings
WEB_USERNAME=admin
WEB_PASSWORD=admin123
146
AUTH_README.md
Normal file
@@ -0,0 +1,146 @@
# Web Service Authentication Configuration

## Overview

The project's web service now supports HTTP Basic authentication, so that only authorized users can access the management interface and the API.

## Configuration

### 1. Environment Variables

Add the following settings to the `.env` file:

```bash
# Web service authentication settings
WEB_USERNAME=admin
WEB_PASSWORD=admin123
```

### 2. Default Credentials

If no credentials are set in the `.env` file, the system falls back to these defaults:
- Username: `admin`
- Password: `admin123`

**⚠️ Important: always change the default password in production!**
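
The fallback described above can be sketched as follows. This is only an illustration: the helper name `load_web_credentials` and the `uses_defaults` flag are assumptions made for this example and are not part of the project; only the variable names `WEB_USERNAME` / `WEB_PASSWORD` and the default values come from this document.

```python
import os

# Defaults documented above.
DEFAULT_USERNAME = "admin"
DEFAULT_PASSWORD = "admin123"


def load_web_credentials(env=None):
    """Return (username, password, uses_defaults) read from a mapping such as os.environ.

    Hypothetical helper: the third element is True when both values equal the
    documented defaults, which is a hint that they should be changed.
    """
    env = os.environ if env is None else env
    username = env.get("WEB_USERNAME", DEFAULT_USERNAME)
    password = env.get("WEB_PASSWORD", DEFAULT_PASSWORD)
    uses_defaults = (username, password) == (DEFAULT_USERNAME, DEFAULT_PASSWORD)
    return username, password, uses_defaults
```

A startup script could call `load_web_credentials()` and print a warning whenever the third element is `True`.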

## Authentication Scope

### Endpoints That Require Authentication

All of the following API endpoints and pages require Basic authentication:

- **Web interface**: `/` - main management interface
- **Task management**:
  - `GET /api/tasks` - list tasks
  - `POST /api/tasks/generate` - generate a task with AI
  - `POST /api/tasks` - create a task
  - `PATCH /api/tasks/{task_id}` - update a task
  - `POST /api/tasks/start/{task_id}` - start a task
  - `POST /api/tasks/stop/{task_id}` - stop a task
  - `DELETE /api/tasks/{task_id}` - delete a task
- **Log management**:
  - `GET /api/logs` - fetch logs
  - `DELETE /api/logs` - clear logs
- **Result management**:
  - `GET /api/results/files` - list result files
  - `GET /api/results/{filename}` - get the content of a result file
  - `DELETE /api/results/files/{filename}` - delete a result file
- **System settings**:
  - `GET /api/settings/status` - get the system status
  - `GET /api/settings/notifications` - get the notification settings
  - `PUT /api/settings/notifications` - update the notification settings
- **Prompt management**:
  - `GET /api/prompts` - list prompt files
  - `GET /api/prompts/{filename}` - get the content of a prompt file
  - `PUT /api/prompts/{filename}` - update a prompt file
- **Login state management**:
  - `POST /api/login-state` - update the login state
  - `DELETE /api/login-state` - delete the login state
- **Static files**: `/static/*` - CSS, JS, images, and other static assets
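
Every endpoint above expects an `Authorization: Basic <base64(username:password)>` request header. As a rough sketch of how such a header can be decoded on the server side, using only the standard library (the function name `parse_basic_auth` is an assumption for this example and is not the project's code):

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header value, or None if malformed."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers binascii.Error and UnicodeDecodeError, both ValueError subclasses.
        return None
    # Split on the first colon only: the password itself may contain colons.
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

Splitting on the first colon matters because only the username is forbidden from containing `:`.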

### Endpoints That Do Not Require Authentication

- `GET /health` - health check endpoint

## Usage

### 1. Browser Access

When you open the web interface in a browser, an authentication dialog appears; enter the configured username and password.

### 2. API Calls

API requests must carry Basic authentication credentials in the request headers:

```bash
# curl example
curl -u admin:admin123 http://localhost:8000/api/tasks
```

```python
# Python requests example
import requests
from requests.auth import HTTPBasicAuth

response = requests.get(
    'http://localhost:8000/api/tasks',
    auth=HTTPBasicAuth('admin', 'admin123')
)
```
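
If `requests` is not available, the same header can be built by hand. The following is a minimal standard-library sketch; the helper names `basic_auth_header` and `build_tasks_request` are made up for this example, and the URL and credentials are the placeholder values used above.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_tasks_request(base_url, username, password):
    """Prepare (but do not send) a GET request for /api/tasks with Basic credentials."""
    request = urllib.request.Request(f"{base_url}/api/tasks")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

With the server running, the prepared request can be sent with `urllib.request.urlopen(build_tasks_request("http://localhost:8000", "admin", "admin123"))`.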
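
### 3. Frontend JavaScript

The frontend JavaScript handles authentication automatically; no changes are needed.

## Security Recommendations

1. **Change the default credentials**: always change the default username and password in production
2. **Use a strong password**: include upper- and lowercase letters, digits, and special characters
3. **Deploy over HTTPS**: Basic authentication sends credentials merely Base64-encoded, so use HTTPS in production
4. **Rotate credentials regularly**: change the authentication credentials periodically
5. **Restrict source IPs**: a firewall can limit which IP ranges may connect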
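
One further hardening step, not listed above and offered only as a suggestion, is to compare credentials in constant time so that response timing reveals less about how much of a guess was correct. A minimal sketch with the standard library's `secrets.compare_digest` follows; the function name `credentials_match` is hypothetical.

```python
import secrets


def credentials_match(given_user, given_password, expected_user, expected_password):
    """Compare both fields in constant time; True only if both match."""
    # Encode to bytes so that non-ASCII usernames or passwords are handled too.
    user_ok = secrets.compare_digest(
        given_user.encode("utf-8"), expected_user.encode("utf-8")
    )
    password_ok = secrets.compare_digest(
        given_password.encode("utf-8"), expected_password.encode("utf-8")
    )
    # Both comparisons are evaluated before combining, so a wrong username
    # does not short-circuit the password check.
    return user_ok and password_ok
```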

## Troubleshooting

### Authentication Fails

1. Check the `WEB_USERNAME` and `WEB_PASSWORD` settings in the `.env` file
2. Confirm that the environment variables were actually loaded
3. Check that the username and password were typed correctly
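
The first check can be automated with a few lines of Python. The sketch below uses a deliberately simplified `KEY=VALUE` parser written for this example; it does not reproduce every parsing rule of python-dotenv, and the function names are assumptions.

```python
def read_env_values(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_auth_keys(text):
    """Return the authentication keys that are absent or empty in the given .env text."""
    values = read_env_values(text)
    return [key for key in ("WEB_USERNAME", "WEB_PASSWORD") if not values.get(key)]
```

Reading the `.env` file and passing its text to `missing_auth_keys` returns an empty list when both settings are present and non-empty.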

### Static Resources Fail to Load

1. Confirm that the browser has authenticated
2. Check that the static file paths are correct
3. Inspect the network requests in the browser developer tools

## Configuration Examples

### Complete .env Example

```bash
# Web service authentication settings
WEB_USERNAME=myadmin
WEB_PASSWORD=MySecurePassword123!

# Other settings...
OPENAI_API_KEY=your_openai_api_key
NTFY_TOPIC_URL=https://ntfy.sh/your_topic
```

### Docker Deployment

```yaml
version: '3.8'
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - WEB_USERNAME=admin
      - WEB_PASSWORD=secure_password_here
    volumes:
      - ./config.json:/app/config.json
      - ./logs:/app/logs
      - ./results:/app/results
```
48
README.md
@@ -87,9 +87,13 @@ pip install -r requirements.txt
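| `RUN_HEADLESS` | Whether to run the crawler browser in headless mode. | No | Defaults to `true`. Set it to `false` to handle CAPTCHAs manually when debugging locally. **Must be `true` for Docker deployments**. |
| `AI_DEBUG_MODE` | Whether to enable AI debug mode. | No | Defaults to `false`. When enabled, detailed AI request and response logs are printed to the console. |
| `SERVER_PORT` | Port the Web UI service runs on. | No | Defaults to `8000`. |
| `WEB_USERNAME` | Login username for the web interface. | No | Defaults to `admin`. Be sure to change it in production. |
| `WEB_PASSWORD` | Login password for the web interface. | No | Defaults to `admin123`. Be sure to change it to a strong password in production. |

> 💡 **Debugging tip**: If you hit a 404 error while configuring the AI API, first debug with the API offered by Alibaba Cloud or Volcano Engine, and try other providers only once the basic functionality works. Some API providers may have compatibility issues or require special configuration.

> 🔐 **Security notice**: The web interface is protected by Basic authentication. The default username and password are `admin` / `admin123`; be sure to change them to a strong password in production!

2. **Obtain the login state (important!)**: For the crawler to access Xianyu while logged in, you must first supply valid login credentials. We recommend doing this through the Web UI:

    **Recommended: update through the Web UI**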
@@ -224,6 +228,50 @@ graph TD
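    I --> C;
```

## 🔐 Web Interface Authentication

### Authentication Configuration

The web interface is protected by Basic authentication, so that only authorized users can access the management interface and the API.

#### Configuration

Set the credentials in the `.env` file:

```bash
# Web service authentication settings
WEB_USERNAME=admin
WEB_PASSWORD=admin123
```

#### Default Credentials

If no credentials are set in the `.env` file, the system falls back to these defaults:
- Username: `admin`
- Password: `admin123`

**⚠️ Important: always change the default password in production!**

#### Authentication Scope

- **Authentication required**: all API endpoints, the web interface, and static resources
- **No authentication required**: the health check endpoint (`/health`)

#### Usage

1. **Browser access**: an authentication dialog appears when you open the web interface
2. **API calls**: requests must include Basic authentication credentials in the request headers
3. **Frontend JavaScript**: handles authentication automatically; no changes are needed

#### Security Recommendations

1. Change the default password to a strong one
2. Use HTTPS in production
3. Rotate the credentials regularly
4. Restrict source IP ranges with a firewall

See [AUTH_README.md](AUTH_README.md) for detailed configuration instructions.

## Frequently Asked Questions (FAQ)

This section collects common questions raised by community users in Issues, together with their answers.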
15
spider_v2.py
@@ -52,17 +52,32 @@ async def main():

                # Dynamically assemble the final prompt
                task['ai_prompt_text'] = base_prompt.replace("{{CRITERIA_SECTION}}", criteria_text)

                # Check that the generated prompt is valid
                if len(task['ai_prompt_text']) < 100:
                    print(f"Warning: the prompt generated for task '{task['task_name']}' is too short ({len(task['ai_prompt_text'])} characters) and may be faulty.")
                elif "{{CRITERIA_SECTION}}" in task['ai_prompt_text']:
                    print(f"Warning: the prompt for task '{task['task_name']}' still contains the placeholder; substitution may have failed.")
                else:
                    print(f"✅ Prompt for task '{task['task_name']}' generated successfully, length: {len(task['ai_prompt_text'])} characters")

            except FileNotFoundError as e:
                print(f"Warning: a prompt file for task '{task['task_name']}' is missing: {e}; AI analysis for this task will be skipped.")
                task['ai_prompt_text'] = ""
            except Exception as e:
                print(f"Error: an exception occurred while processing the prompt files for task '{task['task_name']}': {e}; AI analysis for this task will be skipped.")
                task['ai_prompt_text'] = ""
        elif task.get("enabled", False) and task.get("ai_prompt_file"):
            try:
                with open(task["ai_prompt_file"], 'r', encoding='utf-8') as f:
                    task['ai_prompt_text'] = f.read()
                print(f"✅ Prompt file for task '{task['task_name']}' read successfully, length: {len(task['ai_prompt_text'])} characters")
            except FileNotFoundError:
                print(f"Warning: prompt file '{task['ai_prompt_file']}' for task '{task['task_name']}' was not found; AI analysis for this task will be skipped.")
                task['ai_prompt_text'] = ""
            except Exception as e:
                print(f"Error: an exception occurred while reading the prompt file for task '{task['task_name']}': {e}; AI analysis for this task will be skipped.")
                task['ai_prompt_text'] = ""

    print("\n--- Starting monitoring tasks ---")
    if args.debug_limit > 0:
@@ -5,6 +5,7 @@ import os
import re
import sys
import shutil
from datetime import datetime
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl

import requests
@@ -133,6 +134,68 @@ def encode_image_to_base64(image_path):
        return None


def validate_ai_response_format(parsed_response):
    """Check that the AI response matches the expected structure."""
    required_fields = [
        "prompt_version",
        "is_recommended",
        "reason",
        "risk_tags",
        "criteria_analysis"
    ]

    criteria_analysis_fields = [
        "model_chip",
        "battery_health",
        "condition",
        "history",
        "seller_type",
        "shipping",
        "seller_credit"
    ]

    seller_type_fields = [
        "status",
        "persona",
        "comment",
        "analysis_details"
    ]

    # Check the top-level fields
    for field in required_fields:
        if field not in parsed_response:
            safe_print(f" [AI analysis] Warning: the response is missing required field '{field}'")
            return False

    # Check the criteria_analysis fields
    criteria_analysis = parsed_response.get("criteria_analysis", {})
    for field in criteria_analysis_fields:
        if field not in criteria_analysis:
            safe_print(f" [AI analysis] Warning: criteria_analysis is missing field '{field}'")
            return False

    # Check analysis_details inside seller_type
    seller_type = criteria_analysis.get("seller_type", {})
    if "analysis_details" in seller_type:
        analysis_details = seller_type["analysis_details"]
        required_details = ["temporal_analysis", "selling_behavior", "buying_behavior", "behavioral_summary"]
        for detail in required_details:
            if detail not in analysis_details:
                safe_print(f" [AI analysis] Warning: analysis_details is missing field '{detail}'")
                return False

    # Check the data types
    if not isinstance(parsed_response.get("is_recommended"), bool):
        safe_print(" [AI analysis] Warning: the is_recommended field is not a boolean")
        return False

    if not isinstance(parsed_response.get("risk_tags"), list):
        safe_print(" [AI analysis] Warning: the risk_tags field is not a list")
        return False

    return True


@retry_on_failure(retries=3, delay=5)
async def send_ntfy_notification(product_data, reason):
    """When a recommended item is found, asynchronously send a high-priority ntfy.sh notification."""
@@ -361,7 +424,7 @@ async def send_ntfy_notification(product_data, reason):
        safe_print(f" -> Unknown error while sending the Webhook notification: {e}")


@retry_on_failure(retries=5, delay=10)
@retry_on_failure(retries=3, delay=5)
async def get_ai_analysis(product_data, image_paths=None, prompt_text=""):
    """Send the complete product JSON data and all images to the AI for analysis (async)."""
    if not client:
@@ -383,21 +446,23 @@ async def get_ai_analysis(product_data, image_paths=None, prompt_text=""):

    if AI_DEBUG_MODE:
        safe_print("\n--- [AI DEBUG] ---")
        safe_print("--- PROMPT TEXT (first 500 chars) ---")
        safe_print(prompt_text[:500] + "...")
        safe_print("--- PRODUCT DATA (JSON) ---")
        safe_print(product_details_json)
        safe_print("--- PROMPT TEXT (full content) ---")
        safe_print(prompt_text)
        safe_print("-------------------\n")

    combined_text_prompt = f"""{system_prompt}

Based on your expertise and my requirements, analyze the following complete product JSON data:
    combined_text_prompt = f"""Based on your expertise and my requirements, analyze the following complete product JSON data:

```json
    {product_details_json}
"""
    user_content_list = [{"type": "text", "text": combined_text_prompt}]
```

{system_prompt}
"""
    user_content_list = []

    # Add the image content first
    if image_paths:
        for path in image_paths:
            base64_image = encode_image_to_base64(path)
@@ -405,38 +470,124 @@ async def get_ai_analysis(product_data, image_paths=None, prompt_text=""):
                user_content_list.append(
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}})

    # Then add the text content
    user_content_list.append({"type": "text", "text": combined_text_prompt})

    messages = [{"role": "user", "content": user_content_list}]

    response = await client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        response_format={"type": "json_object"}
    )

    ai_response_content = response.choices[0].message.content

    if AI_DEBUG_MODE:
        safe_print("\n--- [AI DEBUG] ---")
        safe_print("--- RAW AI RESPONSE ---")
        safe_print(ai_response_content)
        safe_print("---------------------\n")

    # Save the final request payload to a log file
    try:
        # --- Added code: extract the JSON from a Markdown code block ---
        # Find the first "{" and the last "}" to capture the whole JSON object
        json_start_index = ai_response_content.find('{')
        json_end_index = ai_response_content.rfind('}')
        # Create the logs folder
        logs_dir = "logs"
        os.makedirs(logs_dir, exist_ok=True)

        if json_start_index != -1 and json_end_index != -1:
            clean_json_str = ai_response_content[json_start_index : json_end_index + 1]
            return json.loads(clean_json_str)
        else:
            # If "{" or "}" cannot be found, the response format is abnormal; try parsing it as-is and be ready to catch the error
            safe_print("---!!! AI RESPONSE WARNING: Could not find JSON object markers '{' and '}' in the response. !!!---")
            return json.loads(ai_response_content)  # This line will most likely raise again, but is kept for logical completeness
        # --- End of modification ---
        # Generate the log file name (current time)
        current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
        log_filename = f"{current_time}.log"
        log_filepath = os.path.join(logs_dir, log_filename)

        # Prepare the log content: store the raw request payload directly
        log_content = json.dumps(messages, ensure_ascii=False)

        # Write the log file
        with open(log_filepath, 'w', encoding='utf-8') as f:
            f.write(log_content)

        safe_print(f" [Log] AI analysis request saved to: {log_filepath}")

    except Exception as e:
        safe_print(f" [Log] Error while saving the AI analysis log: {e}")

    except json.JSONDecodeError as e:
        safe_print("---!!! AI RESPONSE PARSING FAILED (JSONDecodeError) !!!---")
        safe_print(f"Raw response from AI:\n---\n{ai_response_content}\n---")
        raise e
    # Enhanced AI call with stricter format control and a retry mechanism
    max_retries = 3
    for attempt in range(max_retries):
        try:
            # Adjust the parameters according to the retry count
            current_temperature = 0.1 if attempt == 0 else 0.05  # use a lower temperature on retries

            response = await client.chat.completions.create(
                model=MODEL_NAME,
                messages=messages,
                response_format={"type": "json_object"},
                temperature=current_temperature,
                max_tokens=4000,
            )

            ai_response_content = response.choices[0].message.content

            if AI_DEBUG_MODE:
                safe_print(f"\n--- [AI DEBUG] Attempt {attempt + 1} ---")
                safe_print("--- RAW AI RESPONSE ---")
                safe_print(ai_response_content)
                safe_print("---------------------\n")

            # Try to parse the JSON directly
            try:
                parsed_response = json.loads(ai_response_content)

                # Validate the response format
                if validate_ai_response_format(parsed_response):
                    safe_print(f" [AI analysis] Attempt {attempt + 1} succeeded; response format validated")
                    return parsed_response
                else:
                    safe_print(f" [AI analysis] Attempt {attempt + 1} failed format validation")
                    if attempt < max_retries - 1:
                        safe_print(f" [AI analysis] Preparing attempt {attempt + 2}...")
                        continue
                    else:
                        safe_print(" [AI analysis] All retries exhausted; using the last result")
                        return parsed_response

            except json.JSONDecodeError:
                safe_print(f" [AI analysis] Attempt {attempt + 1}: JSON parsing failed; trying to clean up the response...")

                # Strip possible Markdown code-fence markers
                cleaned_content = ai_response_content.strip()
                if cleaned_content.startswith('```json'):
                    cleaned_content = cleaned_content[7:]
                if cleaned_content.startswith('```'):
                    cleaned_content = cleaned_content[3:]
                if cleaned_content.endswith('```'):
                    cleaned_content = cleaned_content[:-3]
                cleaned_content = cleaned_content.strip()

                # Locate the boundaries of the JSON object
                json_start_index = cleaned_content.find('{')
                json_end_index = cleaned_content.rfind('}')

                if json_start_index != -1 and json_end_index != -1 and json_end_index > json_start_index:
                    json_str = cleaned_content[json_start_index:json_end_index + 1]
                    try:
                        parsed_response = json.loads(json_str)
                        if validate_ai_response_format(parsed_response):
                            safe_print(f" [AI analysis] Attempt {attempt + 1} succeeded after cleanup")
                            return parsed_response
                        else:
                            if attempt < max_retries - 1:
                                safe_print(f" [AI analysis] Preparing attempt {attempt + 2}...")
                                continue
                            else:
                                safe_print(" [AI analysis] All retries exhausted; using the cleaned-up result")
                                return parsed_response
                    except json.JSONDecodeError as e:
                        safe_print(f" [AI analysis] Attempt {attempt + 1}: JSON parsing still failed after cleanup: {e}")
                        if attempt < max_retries - 1:
                            safe_print(f" [AI analysis] Preparing attempt {attempt + 2}...")
                            continue
                        else:
                            raise e
                else:
                    safe_print(f" [AI analysis] Attempt {attempt + 1}: no valid JSON object found in the response")
                    if attempt < max_retries - 1:
                        safe_print(f" [AI analysis] Preparing attempt {attempt + 2}...")
                        continue
                    else:
                        raise json.JSONDecodeError("No valid JSON object found", ai_response_content, 0)

        except Exception as e:
            safe_print(f" [AI analysis] Attempt {attempt + 1}: AI call failed: {e}")
            if attempt < max_retries - 1:
                safe_print(f" [AI analysis] Preparing attempt {attempt + 2}...")
                continue
            else:
                raise e
@@ -22,16 +22,8 @@ DETAIL_API_URL_PATTERN = "h5api.m.goofish.com/h5/mtop.taobao.idle.pc.detail"
# --- Environment Variables ---
API_KEY = os.getenv("OPENAI_API_KEY")
BASE_URL = os.getenv("OPENAI_BASE_URL")
# Strip non-printable characters from BASE_URL
if BASE_URL:
    # Remove common non-printable characters, including carriage return (\r), newline (\n), tab (\t), etc.
    BASE_URL = ''.join(char for char in BASE_URL if char.isprintable()).strip()
MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
PROXY_URL = os.getenv("PROXY_URL")
# Strip non-printable characters from PROXY_URL
if PROXY_URL:
    # Remove common non-printable characters, including carriage return (\r), newline (\n), tab (\t), etc.
    PROXY_URL = ''.join(char for char in PROXY_URL if char.isprintable()).strip()
NTFY_TOPIC_URL = os.getenv("NTFY_TOPIC_URL")
GOTIFY_URL = os.getenv("GOTIFY_URL")
GOTIFY_TOKEN = os.getenv("GOTIFY_TOKEN")
136
web_server.py
@@ -6,9 +6,11 @@ import glob
import asyncio
import signal
import sys
import base64
from contextlib import asynccontextmanager
from dotenv import dotenv_values
from fastapi import FastAPI, Request, HTTPException
from fastapi import FastAPI, Request, HTTPException, Depends, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials
from src.prompt_utils import generate_criteria, update_config_with_new_task
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
@@ -176,12 +178,88 @@ def save_notification_settings(settings: dict):

app = FastAPI(title="Xianyu Intelligent Monitoring Bot", lifespan=lifespan)

# --- Authentication configuration ---
security = HTTPBasic()

# Read the authentication credentials from environment variables
def get_auth_credentials():
    """Get the authentication credentials from environment variables."""
    username = os.getenv("WEB_USERNAME", "admin")
    password = os.getenv("WEB_PASSWORD", "admin123")
    return username, password

def verify_credentials(credentials: HTTPBasicCredentials = Depends(security)):
    """Verify the Basic authentication credentials."""
    username, password = get_auth_credentials()

    # Check whether the username and password match
    if credentials.username == username and credentials.password == password:
        return credentials.username
    else:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Authentication failed",
            headers={"WWW-Authenticate": "Basic"},
        )

# --- Globals for process and scheduler management ---
scraper_processes = {}  # Changed from a single process variable to a dict so that multiple task processes can be managed {task_id: process}
scheduler = AsyncIOScheduler(timezone="Asia/Shanghai")

# Mount static files
app.mount("/static", StaticFiles(directory="static"), name="static")
# Custom static file handler that adds authentication
class AuthenticatedStaticFiles(StaticFiles):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    async def __call__(self, scope, receive, send):
        # Check for an Authorization header
        headers = dict(scope.get("headers", []))
        authorization = headers.get(b"authorization", b"").decode()

        if not authorization.startswith("Basic "):
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [
                    # The realm is a quoted string because it contains a space
                    (b"www-authenticate", b'Basic realm="Authorization Required"'),
                    (b"content-type", b"text/plain"),
                ],
            })
            await send({
                "type": "http.response.body",
                "body": b"Authentication required",
            })
            return

        # Verify the credentials
        try:
            credentials = base64.b64decode(authorization[6:]).decode()
            username, password = credentials.split(":", 1)

            expected_username, expected_password = get_auth_credentials()
            if username != expected_username or password != expected_password:
                raise ValueError("Invalid credentials")

        except Exception:
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [
                    (b"www-authenticate", b'Basic realm="Authorization Required"'),
                    (b"content-type", b"text/plain"),
                ],
            })
            await send({
                "type": "http.response.body",
                "body": b"Authentication failed",
            })
            return

        # Authentication succeeded; continue serving the static file
        await super().__call__(scope, receive, send)

# Mount static files with authentication
app.mount("/static", AuthenticatedStaticFiles(directory="static"), name="static")

# Setup templates
templates = Jinja2Templates(directory="templates")
@@ -300,8 +378,18 @@ async def reload_scheduler_jobs():
    scheduler.print_jobs()


@app.get("/health")
async def health_check():
    """Health check endpoint; no authentication required."""
    return {"status": "healthy", "message": "Service is running normally"}

@app.get("/auth/status")
async def auth_status(username: str = Depends(verify_credentials)):
    """Check the authentication status."""
    return {"authenticated": True, "username": username}

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
async def read_root(request: Request, username: str = Depends(verify_credentials)):
    """
    Serve the main page of the Web UI.
    """
@@ -312,7 +400,7 @@ async def read_root(request: Request):
CONFIG_FILE = "config.json"

@app.get("/api/tasks")
async def get_tasks():
async def get_tasks(username: str = Depends(verify_credentials)):
    """
    Read and return all tasks in config.json.
    """
@@ -333,7 +421,7 @@ async def get_tasks():


@app.post("/api/tasks/generate", response_model=dict)
async def generate_task(req: TaskGenerateRequest):
async def generate_task(req: TaskGenerateRequest, username: str = Depends(verify_credentials)):
    """
    Use AI to generate a new analysis-criteria file and create a new task from it.
    """
@@ -399,7 +487,7 @@ async def generate_task(req: TaskGenerateRequest):


@app.post("/api/tasks", response_model=dict)
async def create_task(task: Task):
async def create_task(task: Task, username: str = Depends(verify_credentials)):
    """
    Create a new task and add it to config.json.
    """
@@ -426,7 +514,7 @@ async def create_task(task: Task):


@app.patch("/api/tasks/{task_id}", response_model=dict)
async def update_task(task_id: int, task_update: TaskUpdate):
async def update_task(task_id: int, task_update: TaskUpdate, username: str = Depends(verify_credentials)):
    """
    Update the attributes of the task with the given ID.
    """
@@ -540,7 +628,7 @@ async def update_task_running_status(task_id: int, is_running: bool):


@app.post("/api/tasks/start/{task_id}", response_model=dict)
async def start_single_task(task_id: int):
async def start_single_task(task_id: int, username: str = Depends(verify_credentials)):
    """Start a single task."""
    try:
        async with aiofiles.open(CONFIG_FILE, 'r', encoding='utf-8') as f:
@@ -559,7 +647,7 @@ async def start_single_task(task_id: int):


@app.post("/api/tasks/stop/{task_id}", response_model=dict)
async def stop_single_task(task_id: int):
async def stop_single_task(task_id: int, username: str = Depends(verify_credentials)):
    """Stop a single task."""
    await stop_task_process(task_id)
    return {"message": f"Stop signal sent to task ID {task_id}."}
@@ -568,7 +656,7 @@ async def stop_single_task(task_id: int):


@app.get("/api/logs")
async def get_logs(from_pos: int = 0):
async def get_logs(from_pos: int = 0, username: str = Depends(verify_credentials)):
    """
    Get the content of the crawler log file. Supports incremental reads from a given position.
    """
@@ -607,7 +695,7 @@ async def get_logs(from_pos: int = 0):


@app.delete("/api/logs", response_model=dict)
async def clear_logs():
async def clear_logs(username: str = Depends(verify_credentials)):
    """
    Clear the content of the log file.
    """
@@ -625,7 +713,7 @@ async def clear_logs():


@app.delete("/api/tasks/{task_id}", response_model=dict)
async def delete_task(task_id: int):
async def delete_task(task_id: int, username: str = Depends(verify_credentials)):
    """
    Delete the task with the given ID from config.json.
    """
@@ -666,7 +754,7 @@ async def delete_task(task_id: int):


@app.get("/api/results/files")
async def list_result_files():
async def list_result_files(username: str = Depends(verify_credentials)):
    """
    List all generated .jsonl result files.
    """
@@ -678,7 +766,7 @@ async def list_result_files():


@app.delete("/api/results/files/{filename}", response_model=dict)
async def delete_result_file(filename: str):
async def delete_result_file(filename: str, username: str = Depends(verify_credentials)):
    """
    Delete the specified result file.
    """
@@ -697,7 +785,7 @@ async def delete_result_file(filename: str):


@app.get("/api/results/{filename}")
async def get_result_file_content(filename: str, page: int = 1, limit: int = 20, recommended_only: bool = False, sort_by: str = "crawl_time", sort_order: str = "desc"):
async def get_result_file_content(filename: str, page: int = 1, limit: int = 20, recommended_only: bool = False, sort_by: str = "crawl_time", sort_order: str = "desc", username: str = Depends(verify_credentials)):
    """
    Read the content of the specified .jsonl file, with pagination, filtering, and sorting.
    """
@@ -756,7 +844,7 @@ async def get_result_file_content(filename: str, page: int = 1, limit: int = 20,


@app.get("/api/settings/status")
async def get_system_status():
async def get_system_status(username: str = Depends(verify_credentials)):
    """
    Check the status of key system files and configuration.
    """
@@ -798,7 +886,7 @@ async def get_system_status():
PROMPTS_DIR = "prompts"

@app.get("/api/prompts")
async def list_prompts():
async def list_prompts(username: str = Depends(verify_credentials)):
    """
    List all .txt files in the prompts/ directory.
    """
@@ -808,7 +896,7 @@ async def list_prompts():


@app.get("/api/prompts/{filename}")
async def get_prompt_content(filename: str):
async def get_prompt_content(filename: str, username: str = Depends(verify_credentials)):
    """
    Get the content of the specified prompt file.
    """
@@ -825,7 +913,7 @@ async def get_prompt_content(filename: str):


@app.put("/api/prompts/{filename}")
async def update_prompt_content(filename: str, prompt_update: PromptUpdate):
async def update_prompt_content(filename: str, prompt_update: PromptUpdate, username: str = Depends(verify_credentials)):
    """
    Update the content of the specified prompt file.
    """
@@ -845,7 +933,7 @@ async def update_prompt_content(filename: str, prompt_update: PromptUpdate):


@app.post("/api/login-state", response_model=dict)
async def update_login_state(data: LoginStateUpdate):
async def update_login_state(data: LoginStateUpdate, username: str = Depends(verify_credentials)):
    """
    Receive the login-state JSON string sent by the frontend and save it to xianyu_state.json.
    """
@@ -865,7 +953,7 @@ async def update_login_state(data: LoginStateUpdate):


@app.delete("/api/login-state", response_model=dict)
async def delete_login_state():
async def delete_login_state(username: str = Depends(verify_credentials)):
    """
    Delete the xianyu_state.json file.
    """
@@ -880,7 +968,7 @@ async def delete_login_state():


@app.get("/api/settings/notifications", response_model=dict)
async def get_notification_settings():
async def get_notification_settings(username: str = Depends(verify_credentials)):
    """
    Get the notification settings.
    """
@@ -888,7 +976,7 @@ async def get_notification_settings():


@app.put("/api/settings/notifications", response_model=dict)
async def update_notification_settings(settings: NotificationSettings):
async def update_notification_settings(settings: NotificationSettings, username: str = Depends(verify_credentials)):
    """
    Update the notification settings.
    """