mirror of
https://github.com/fish2018/pansou.git
synced 2025-11-25 03:14:59 +08:00
Add the xiaoji plugin
1
main.go
@@ -63,6 +63,7 @@ import (
	_ "pansou/plugin/clxiong"
	_ "pansou/plugin/jutoushe"
	_ "pansou/plugin/sdso"
	_ "pansou/plugin/xiaoji"
)

// Global cache write manager
270
plugin/xiaoji/html结构分析.md
Normal file
@@ -0,0 +1,270 @@
# 小鸡影视 (xiaojitv.com) Search Result HTML Structure Analysis

## Site Information

- **Site name**: 小鸡影视
- **Domain**: `www.xiaojitv.com`
- **Search URL format**: `https://www.xiaojitv.com/?s={keyword}`
- **Detail page URL format**: `https://www.xiaojitv.com/{ID}.html`
- **Key characteristics**: a film and TV resource site that offers links to several cloud drives and hides the real links behind base64 encoding

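
The two URL patterns above can be filled in with the standard library alone. The following is a minimal sketch, not the plugin's code: `buildSearchURL` and `buildDetailURL` are hypothetical helper names, and only the domain and the two patterns come from this document.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL is the site root listed in the site information above.
const baseURL = "https://www.xiaojitv.com"

// buildSearchURL (hypothetical helper) query-escapes the keyword and
// fills in the `/?s={keyword}` pattern.
func buildSearchURL(keyword string) string {
	return fmt.Sprintf("%s/?s=%s", baseURL, url.QueryEscape(keyword))
}

// buildDetailURL (hypothetical helper) fills in the `/{ID}.html` pattern.
func buildDetailURL(id int) string {
	return fmt.Sprintf("%s/%d.html", baseURL, id)
}

func main() {
	fmt.Println(buildSearchURL("凡人修仙传"))
	fmt.Println(buildDetailURL(656))
}
```

Escaping the keyword matters because search terms on this site are usually Chinese and may contain spaces.
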
## HTML Structure

### Search Results Page Structure

The search results page uses a poster layout. The main content sits inside the `.poster-grid` element:

```html
<div class="poster-grid">
    <article class="poster-item excerpt-1">
        <!-- a single search result -->
    </article>
    <article class="poster-item excerpt-2">
        <!-- a single search result -->
    </article>
    <!-- more search results... -->
</div>
```

### Single Search Result Structure

Each search result contains the following main elements:

#### 1. Cover Image and Detail Page Link

```html
<div class="poster-image">
    <a class="poster-link" href="https://www.xiaojitv.com/656.html">
        <img src="https://www.xiaojitv.com/wp-content/uploads/2025/09/47a3352110f7e36.webp"
             alt="凡人修仙传(2020) | 小鸡影视"
             class="thumb">
    </a>
    <div class="poster-top-left"></div>
    <div class="poster-rating poster-top-right">
        <span class="rating-score">7.9</span>
    </div>
    <div class="poster-category poster-bottom-left">
        <a href="https://www.xiaojitv.com/dongman">动漫</a>
    </div>
    <div class="poster-views">阅读(<span class="ajaxlistpv" data-id="656"></span>)</div>
</div>
```

#### 2. Title and Tags

```html
<div class="poster-content">
    <h2 class="poster-title">
        <a href="https://www.xiaojitv.com/656.html" title="凡人修仙传(2020) | 小鸡影视">
            凡人修仙传(2020)
        </a>
    </h2>
    <div class="poster-tags">
        <a href="https://www.xiaojitv.com/tag/2020年">2020年</a> /
        <a href="https://www.xiaojitv.com/tag/7-9分">7.9分</a> /
        <a href="https://www.xiaojitv.com/tag/中国大陆">中国大陆</a> /
        <a href="https://www.xiaojitv.com/tag/动画">动画</a> /
        <a href="https://www.xiaojitv.com/tag/奇幻">奇幻</a>
    </div>
</div>
```

## Detail Page Structure

The detail page contains the full title information and the cloud drive download links.

### 1. Page Title and Basic Information

```html
<h1 class="article-title">
    <a href="https://www.xiaojitv.com/656.html">凡人修仙传(2020)</a>
</h1>
```

### 2. Related Resources (相关资源) Section ⭐ Important

The cloud drive download links live in the related resources section, which is the xiaoji plugin's main extraction target:

```html
<div class="cloud-search-resource-results" data-post-id="656">
    <div class="cloud-search-resource-header">
        <h3>相关资源</h3>
        <!-- action buttons -->
    </div>

    <!-- resource list -->
    <div class="resource-compact-item">
        <div class="resource-compact-link">
            <a href="https://www.xiaojitv.com/go.html?url=aHR0cHM6Ly9wYW4ucXVhcmsuY24vcy9kNjQ5MWJmZWQxNmI="
               target="_blank" rel="nofollow">
                凡人修仙传 2024 4K 持续更新中
            </a>
        </div>
        <div class="resource-compact-info">
            <span class="resource-compact-source">聚合盘</span>
        </div>
    </div>

    <div class="resource-compact-item">
        <div class="resource-compact-link">
            <a href="https://www.xiaojitv.com/go.html?url=aHR0cHM6Ly9jbG91ZC4xODkuY24vdC9JYmFVVnpFN1puZXk="
               target="_blank" rel="nofollow">
                凡人修仙传 2024 4K 持续更新中 txb
            </a>
        </div>
        <div class="resource-compact-info">
            <span class="resource-compact-source">小愛盘②</span>
        </div>
    </div>

    <!-- more resources... -->
</div>
```

### 3. Decoding Base64-Encoded Links 🔑 Key Feature

The xiaoji site wraps each outbound link in a redirect URL whose `url` query parameter holds the base64-encoded target.

**Original link format**:
```
https://www.xiaojitv.com/go.html?url=aHR0cHM6Ly9wYW4ucXVhcmsuY24vcy9kNjQ5MWJmZWQxNmI=
```

**Extraction steps**:
1. Extract the base64 string from the `url` parameter: `aHR0cHM6Ly9wYW4ucXVhcmsuY24vcy9kNjQ5MWJmZWQxNmI=`
2. Base64-decode it: `https://pan.quark.cn/s/d6491bfed16b`
3. The decoded value is the real cloud drive link

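
The three steps above can be sketched with `net/url` and `encoding/base64`. This is an illustrative variant, not the plugin's own `decodeGoLink` method (which uses a regular expression); the function name and its error handling here are assumptions. One detail worth noting: query parsing turns `+` into a space, so the sketch restores `+` before decoding.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
	"strings"
)

// decodeGoLink (illustrative) extracts the `url` query parameter from a
// go.html redirect link and base64-decodes it.
func decodeGoLink(goLink string) (string, error) {
	u, err := url.Parse(goLink)
	if err != nil {
		return "", err
	}
	encoded := u.Query().Get("url")
	if encoded == "" {
		return "", fmt.Errorf("no url parameter in %q", goLink)
	}
	// Query decoding maps "+" to a space; undo that for the base64 alphabet.
	encoded = strings.ReplaceAll(encoded, " ", "+")
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(decoded), nil
}

func main() {
	link := "https://www.xiaojitv.com/go.html?url=aHR0cHM6Ly9wYW4ucXVhcmsuY24vcy9kNjQ5MWJmZWQxNmI="
	real, err := decodeGoLink(link)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(real) // https://pan.quark.cn/s/d6491bfed16b
}
```
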
## Extraction Logic

### Search Results Page Extraction Logic

1. Locate every `article.poster-item` element
2. For each element:
   - Read the detail page link from the `href` attribute of `.poster-link`
   - Extract the resource ID from that link (regex: `/(\d+)\.html`)
   - Read the title from `.poster-title a`
   - Read the rating from `.poster-rating .rating-score`
   - Read the category from `.poster-category a`
   - Read the cover image URL from the `src` attribute of `.poster-image img`
   - Read the tags from `.poster-tags a`

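
The resource ID step can be checked in isolation. Below is a small sketch that applies the regular expression quoted above to a detail page URL; `extractResourceID` is a hypothetical helper name, while the pattern itself is the one from this document.

```go
package main

import (
	"fmt"
	"regexp"
)

// detailIDRegex is the pattern quoted in the extraction steps above.
var detailIDRegex = regexp.MustCompile(`/(\d+)\.html`)

// extractResourceID (hypothetical helper) returns the numeric ID embedded in
// a detail page URL, or false when the URL does not match the pattern.
func extractResourceID(detailURL string) (string, bool) {
	m := detailIDRegex.FindStringSubmatch(detailURL)
	if len(m) < 2 {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := extractResourceID("https://www.xiaojitv.com/656.html")
	fmt.Println(id, ok)

	// Category pages such as /dongman carry no ID and are rejected.
	_, ok = extractResourceID("https://www.xiaojitv.com/dongman")
	fmt.Println(ok)
}
```
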
### Detail Page Extraction Logic

1. Collect the basic resource information:
   - Title: the text content of `.article-title a`
   - Resource ID: extracted from the URL

2. Extract the cloud drive links ⭐ core logic:
   ```go
   // links collects every link found on the detail page
   var links []model.Link

   // 1. Find all resource links
   doc.Find(".resource-compact-link a").Each(func(i int, s *goquery.Selection) {
   	href, exists := s.Attr("href")
   	if !exists {
   		return
   	}

   	var realURL string

   	// 2. Check the link type and handle it
   	if strings.Contains(href, "/go.html?url=") {
   		// Base64-encoded link: decode it
   		parts := strings.SplitN(href, "url=", 2)
   		if len(parts) == 2 {
   			encoded := parts[1]
   			decoded, err := base64.StdEncoding.DecodeString(encoded)
   			if err == nil {
   				realURL = string(decoded)
   			}
   		}
   	} else if strings.HasPrefix(href, "http://") || strings.HasPrefix(href, "https://") ||
   		strings.HasPrefix(href, "magnet:") || strings.HasPrefix(href, "ed2k://") {
   		// Direct link: no decoding needed
   		realURL = href
   	}

   	// 3. Keep valid links
   	if realURL != "" {
   		link := model.Link{
   			Type:     determineCloudType(realURL),
   			URL:      realURL,
   			Password: "", // links on the xiaoji site usually have no password
   		}
   		links = append(links, link)
   	}
   })
   ```

3. Extract the resource description:
   - Resource name: the text content of `.resource-compact-link a`
   - Resource source: the text content of `.resource-compact-source`

## Supported Cloud Drive Types

Based on the analysis, links on the xiaoji site point to several kinds of storage:

- **Quark (夸克网盘)**: `https://pan.quark.cn/s/xxxxx`
- **Tianyi Cloud (天翼云盘)**: `https://cloud.189.cn/t/xxxxx`
- **Aliyun Drive (阿里云盘)**: `https://www.alipan.com/s/xxxxx`
- **Baidu Netdisk (百度网盘)**: `https://pan.baidu.com/s/xxxxx`
- **115 (115网盘)**: `https://115.com/s/xxxxx`, `https://115cdn.com/s/xxxxx`
- **CTFile (城通网盘)**: `https://url91.ctfile.com/f/xxxxx` (classified as `others`)
- **Magnet links**: `magnet:?xt=urn:btih:xxxxx`
- **ED2K links**: `ed2k://xxxxx`

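
As an illustration of how the list above maps to type labels, here is a sketch that classifies a link by its parsed hostname. It is an alternative to the substring matching used by the plugin's `determineCloudType`, not a copy of it: `classifyLink` and `hostTypes` are hypothetical names, and the labels (`quark`, `tianyi`, and so on) are taken from the plugin source in this commit.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostTypes maps a domain to the type label used by the plugin in this commit.
var hostTypes = []struct{ domain, kind string }{
	{"pan.quark.cn", "quark"},
	{"cloud.189.cn", "tianyi"},
	{"alipan.com", "aliyun"},
	{"pan.baidu.com", "baidu"},
	{"115.com", "115"},
	{"115cdn.com", "115"},
}

// classifyLink (hypothetical helper) matches on the hostname rather than on
// the whole string, so a domain appearing in a path is not misclassified.
func classifyLink(raw string) string {
	switch {
	case strings.HasPrefix(raw, "magnet:"):
		return "magnet"
	case strings.HasPrefix(raw, "ed2k://"):
		return "ed2k"
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "others"
	}
	host := strings.ToLower(u.Hostname())
	for _, ht := range hostTypes {
		if host == ht.domain || strings.HasSuffix(host, "."+ht.domain) {
			return ht.kind
		}
	}
	// ctfile.com and any unknown host fall through to "others".
	return "others"
}

func main() {
	for _, link := range []string{
		"https://pan.quark.cn/s/xxxxx",
		"https://www.alipan.com/s/xxxxx",
		"https://url91.ctfile.com/f/xxxxx",
		"magnet:?xt=urn:btih:xxxxx",
	} {
		fmt.Println(classifyLink(link), link)
	}
}
```

Hostname matching is stricter than `strings.Contains`; whether that strictness is wanted depends on how varied the site's links turn out to be.
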
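## Key Findings and Caveats

### 1. Base64 Encoding Protection 🔐

The site hides the real cloud drive links behind base64 encoding, which is the most distinctive aspect of the xiaoji plugin:
- Cloud drive links on the analyzed pages are base64-encoded (the extraction code still accepts direct `http(s)`, `magnet:` and `ed2k://` links)
- Link format: `/go.html?url={base64 string}`
- The link must be decoded to obtain the real URL

### 2. Search Result Layout

The site uses a modern poster layout rather than a traditional list layout:
- Uses CSS Grid
- Every result has a cover image
- Each result includes rating and category information

### 3. Dynamic Loading

The page may load some content via AJAX:
- Some content may only appear after JavaScript runs
- Setting an appropriate User-Agent on requests is recommended

### 4. Anti-Scraping Measures

The site may apply some anti-scraping measures:
- Send a complete set of browser request headers
- JavaScript-rendered content may need to be handled
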
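
A request that follows the two header recommendations above might be built as follows. This is a sketch only: `newBrowserRequest` is a hypothetical helper, the header values mirror those set by `setRequestHeaders` in the plugin source of this commit, and nothing here is sent over the network.

```go
package main

import (
	"fmt"
	"net/http"
)

// newBrowserRequest (hypothetical helper) builds a GET request carrying
// browser-like headers. It does not send the request.
func newBrowserRequest(target string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
	req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
	req.Header.Set("Referer", "https://www.xiaojitv.com/")
	return req, nil
}

func main() {
	req, err := newBrowserRequest("https://www.xiaojitv.com/?s=test")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Referer"))
}
```

Headers alone do not execute JavaScript, so content that only appears after client-side rendering would still be missing from the response.
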
## Field Extraction Mapping

| Field | HTML location | Extraction method |
|------|----------|----------|
| Title | `.poster-title a` | Text content |
| Detail page link | `.poster-link` | `href` attribute |
| Resource ID | Detail page URL | Regex extraction |
| Cover image | `.poster-image img` | `src` attribute |
| Rating | `.rating-score` | Text content |
| Category | `.poster-category a` | Text content |
| Tags | `.poster-tags a` | Array of text contents |
| Cloud drive link | `.resource-compact-link a` | `href` attribute (base64 decoding required) |
| Resource description | `.resource-compact-link a` | Text content |
| Resource source | `.resource-compact-source` | Text content |

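## Implementation Priority

1. **High priority**: xiaoji is a film and TV resource site with fairly good quality, so priority 2 is recommended (note that `NewXiaojiPlugin` in this commit passes `3` to `plugin.NewBaseAsyncPlugin`)
2. **Service-layer filtering**: use the standard service-layer filtering; do not skip it
3. **Caching strategy**: set a reasonable cache lifetime to avoid frequent requests
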
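## Development Notes

1. **Base64 decoding**: base64 decoding must be implemented
2. **Cloud drive type detection**: use the `determineCloudType` function (implemented as a method on the plugin in `xiaoji.go`)
3. **Error handling**: handle base64 decoding failures
4. **Link deduplication**: avoid emitting duplicate cloud drive links
5. **Request headers**: send complete browser request headers to avoid being blocked
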
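
The deduplication note can be illustrated with an order-preserving filter. This is a minimal sketch on plain strings; `dedupeLinks` is a hypothetical helper, whereas the plugin itself tracks seen URLs in a map while it builds `model.Link` values.

```go
package main

import "fmt"

// dedupeLinks (hypothetical helper) removes repeated URLs while keeping the
// first occurrence of each in its original position.
func dedupeLinks(urls []string) []string {
	seen := make(map[string]struct{}, len(urls))
	out := make([]string, 0, len(urls))
	for _, u := range urls {
		if _, dup := seen[u]; dup {
			continue
		}
		seen[u] = struct{}{}
		out = append(out, u)
	}
	return out
}

func main() {
	links := []string{
		"https://pan.quark.cn/s/d6491bfed16b",
		"https://cloud.189.cn/t/IbaUVzE7Zney",
		"https://pan.quark.cn/s/d6491bfed16b",
	}
	fmt.Println(dedupeLinks(links))
}
```
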
482
plugin/xiaoji/xiaoji.go
Normal file
@@ -0,0 +1,482 @@
package xiaoji

import (
	"context"
	"encoding/base64"
	"fmt"
	"net/http"
	"net/url"
	"pansou/model"
	"pansou/plugin"
	"regexp"
	"strings"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// Precompiled regular expressions
var (
	// Extracts the ID from a detail page URL
	detailIDRegex = regexp.MustCompile(`/(\d+)\.html`)

	// Matches go.html links and captures the base64-encoded part
	goLinkRegex = regexp.MustCompile(`/go\.html\?url=([A-Za-z0-9+/]+=*)`)

	// Extracts a four-digit year
	yearRegex = regexp.MustCompile(`(\d{4})`)

	// Cache state
	detailCache     = sync.Map{} // caches parsed detail page results
	lastCleanupTime = time.Now()
	cacheTTL        = 1 * time.Hour
)

const (
	// Basic configuration
	pluginName = "xiaoji"
	baseURL    = "https://www.xiaojitv.com"

	// Timeouts
	DefaultTimeout = 10 * time.Second
	DetailTimeout  = 8 * time.Second

	// Concurrency limit
	MaxConcurrency = 15

	// HTTP connection pool settings
	MaxIdleConns        = 100
	MaxIdleConnsPerHost = 30
	MaxConnsPerHost     = 50
	IdleConnTimeout     = 90 * time.Second
)

// Register the plugin in init
func init() {
	plugin.RegisterGlobalPlugin(NewXiaojiPlugin())

	// Start the cache cleanup goroutine
	go startCacheCleaner()
}

// startCacheCleaner periodically clears the detail page cache
func startCacheCleaner() {
	ticker := time.NewTicker(30 * time.Minute)
	defer ticker.Stop()

	for range ticker.C {
		// Delete entries in place: reassigning the sync.Map would race
		// with concurrent Load/Store calls in fetchDetailPageLinks
		detailCache.Range(func(key, _ interface{}) bool {
			detailCache.Delete(key)
			return true
		})
		lastCleanupTime = time.Now()
	}
}

// XiaojiAsyncPlugin is the async plugin for 小鸡影视
type XiaojiAsyncPlugin struct {
	*plugin.BaseAsyncPlugin
	optimizedClient *http.Client
}

// createOptimizedHTTPClient creates an HTTP client with a tuned connection pool
func createOptimizedHTTPClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        MaxIdleConns,
		MaxIdleConnsPerHost: MaxIdleConnsPerHost,
		MaxConnsPerHost:     MaxConnsPerHost,
		IdleConnTimeout:     IdleConnTimeout,
		DisableKeepAlives:   false,
		ForceAttemptHTTP2:   true,
	}
	return &http.Client{Transport: transport, Timeout: DefaultTimeout}
}

// NewXiaojiPlugin creates a new 小鸡影视 async plugin
func NewXiaojiPlugin() *XiaojiAsyncPlugin {
	return &XiaojiAsyncPlugin{
		BaseAsyncPlugin: plugin.NewBaseAsyncPlugin(pluginName, 3),
		optimizedClient: createOptimizedHTTPClient(),
	}
}

// Search is a compatibility method that delegates to SearchWithResult
func (p *XiaojiAsyncPlugin) Search(keyword string, ext map[string]interface{}) ([]model.SearchResult, error) {
	result, err := p.SearchWithResult(keyword, ext)
	if err != nil {
		return nil, err
	}
	return result.Results, nil
}

// SearchWithResult runs the search and returns a result carrying the IsFinal flag
func (p *XiaojiAsyncPlugin) SearchWithResult(keyword string, ext map[string]interface{}) (model.PluginSearchResult, error) {
	return p.AsyncSearchWithResult(keyword, p.searchImpl, p.MainCacheKey, ext)
}

// searchImpl is the concrete search implementation
func (p *XiaojiAsyncPlugin) searchImpl(client *http.Client, keyword string, ext map[string]interface{}) ([]model.SearchResult, error) {
	// 1. Build the search URL
	encodedKeyword := url.QueryEscape(keyword)
	searchURL := fmt.Sprintf("%s/?s=%s", baseURL, encodedKeyword)

	// 2. Create a context with a timeout
	ctx, cancel := context.WithTimeout(context.Background(), DefaultTimeout)
	defer cancel()

	// 3. Create the request
	req, err := http.NewRequestWithContext(ctx, "GET", searchURL, nil)
	if err != nil {
		return nil, fmt.Errorf("[%s] 创建请求失败: %w", pluginName, err)
	}

	// 4. Set the request headers
	p.setRequestHeaders(req)

	// 5. Send the request
	resp, err := p.doRequestWithRetry(req, client)
	if err != nil {
		return nil, fmt.Errorf("[%s] 搜索请求失败: %w", pluginName, err)
	}
	defer resp.Body.Close()

	// 6. Check the status code
	if resp.StatusCode != 200 {
		return nil, fmt.Errorf("[%s] 请求返回状态码: %d", pluginName, resp.StatusCode)
	}

	// 7. Parse the HTML
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("[%s] HTML解析失败: %w", pluginName, err)
	}

	// 8. Parse the search results
	results := p.parseSearchResults(doc, keyword)

	// 9. Filter by keyword
	return plugin.FilterResultsByKeyword(results, keyword), nil
}

// setRequestHeaders sets HTTP request headers that mimic a real browser
func (p *XiaojiAsyncPlugin) setRequestHeaders(req *http.Request) {
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
	req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
	req.Header.Set("Connection", "keep-alive")
	req.Header.Set("Referer", baseURL+"/")
	req.Header.Set("Cache-Control", "max-age=0")
	req.Header.Set("Upgrade-Insecure-Requests", "1")
}

// doRequestWithRetry sends an HTTP request with retries
func (p *XiaojiAsyncPlugin) doRequestWithRetry(req *http.Request, client *http.Client) (*http.Response, error) {
	maxRetries := 3
	var lastErr error

	for i := 0; i < maxRetries; i++ {
		if i > 0 {
			// Exponential backoff between attempts: 200ms, then 400ms
			backoff := time.Duration(1<<uint(i-1)) * 200 * time.Millisecond
			time.Sleep(backoff)
		}

		// Clone the request to avoid concurrency issues
		reqClone := req.Clone(req.Context())

		resp, err := client.Do(reqClone)
		if err == nil && resp.StatusCode == 200 {
			return resp, nil
		}

		if resp != nil {
			resp.Body.Close()
		}
		if err == nil {
			// Record non-200 responses so the final error never wraps nil
			err = fmt.Errorf("unexpected status code: %d", resp.StatusCode)
		}
		lastErr = err
	}

	return nil, fmt.Errorf("重试 %d 次后仍然失败: %w", maxRetries, lastErr)
}

// parseSearchResults parses the search results page
func (p *XiaojiAsyncPlugin) parseSearchResults(doc *goquery.Document, keyword string) []model.SearchResult {
	results := make([]model.SearchResult, 0)

	// Find every search result item
	doc.Find("article.poster-item").Each(func(i int, s *goquery.Selection) {
		result := p.parseSearchResultItem(s, keyword)
		if result != nil {
			results = append(results, *result)
		}
	})

	return results
}

// parseSearchResultItem parses a single search result item
func (p *XiaojiAsyncPlugin) parseSearchResultItem(s *goquery.Selection, keyword string) *model.SearchResult {
	// 1. Extract the detail page link
	detailLink, exists := s.Find(".poster-link").Attr("href")
	if !exists || detailLink == "" {
		return nil
	}

	// 2. Make sure the link is absolute
	if strings.HasPrefix(detailLink, "/") {
		detailLink = baseURL + detailLink
	}

	// 3. Extract the resource ID
	matches := detailIDRegex.FindStringSubmatch(detailLink)
	if len(matches) < 2 {
		return nil
	}
	resourceID := matches[1]

	// 4. Extract the title
	title := strings.TrimSpace(s.Find(".poster-title a").Text())
	if title == "" {
		return nil
	}

	// 5. Extract the rating
	rating := strings.TrimSpace(s.Find(".rating-score").Text())

	// 6. Extract the category
	category := strings.TrimSpace(s.Find(".poster-category a").Text())

	// 7. Extract the tags
	var tags []string
	s.Find(".poster-tags a").Each(func(i int, tagSel *goquery.Selection) {
		tag := strings.TrimSpace(tagSel.Text())
		if tag != "" {
			tags = append(tags, tag)
		}
	})

	// 8. Extract the cover image
	coverImg, _ := s.Find(".poster-image img").Attr("src")

	// 9. Build the summary text
	content := fmt.Sprintf("分类: %s", category)
	if rating != "" {
		content += fmt.Sprintf(" | 评分: %s", rating)
	}
	if len(tags) > 0 {
		content += fmt.Sprintf(" | 标签: %s", strings.Join(tags, ", "))
	}

	// 10. Fetch the download links from the detail page
	links := p.fetchDetailPageLinks(detailLink)

	// 11. Build the search result
	result := &model.SearchResult{
		UniqueID: fmt.Sprintf("%s-%s", pluginName, resourceID),
		Title:    title,
		Content:  content,
		Datetime: time.Now(),
		Tags:     tags,
		Links:    links,
		Channel:  "", // must be an empty string for plugin search results
	}

	// 12. A cover image, if present, could be attached as extra information
	if coverImg != "" {
		// Placeholder for attaching image information; not handled in this version
	}

	return result
}

// fetchDetailPageLinks fetches the download links from a detail page
func (p *XiaojiAsyncPlugin) fetchDetailPageLinks(detailURL string) []model.Link {
	// 1. Check the cache
	if cached, ok := detailCache.Load(detailURL); ok {
		if links, ok := cached.([]model.Link); ok {
			return links
		}
	}

	// 2. Create the request
	ctx, cancel := context.WithTimeout(context.Background(), DetailTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, "GET", detailURL, nil)
	if err != nil {
		return nil
	}

	// 3. Set the request headers
	p.setRequestHeaders(req)

	// 4. Send the request
	resp, err := p.doRequestWithRetry(req, p.optimizedClient)
	if err != nil {
		return nil
	}
	defer resp.Body.Close()

	// 5. Parse the HTML
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return nil
	}

	// 6. Extract the download links
	links := p.parseDetailPageLinks(doc)

	// 7. Cache non-empty results
	if len(links) > 0 {
		detailCache.Store(detailURL, links)
	}

	return links
}

// parseDetailPageLinks parses the download links on a detail page
func (p *XiaojiAsyncPlugin) parseDetailPageLinks(doc *goquery.Document) []model.Link {
	links := make([]model.Link, 0)
	seenLinks := make(map[string]bool) // used for deduplication

	// Find the links in the related resources section
	doc.Find(".resource-compact-link a").Each(func(i int, s *goquery.Selection) {
		href, exists := s.Attr("href")
		if !exists {
			return
		}

		var realURL string

		// Check whether this is a go.html link (needs base64 decoding)
		if strings.Contains(href, "/go.html?url=") {
			// Extract and decode the real link
			realURL = p.decodeGoLink(href)
		} else if strings.HasPrefix(href, "http://") || strings.HasPrefix(href, "https://") || strings.HasPrefix(href, "magnet:") || strings.HasPrefix(href, "ed2k://") {
			// Direct link (magnet links, cloud drive links, and so on)
			realURL = href
		}

		// Keep valid links that have not been seen yet
		if p.isValidURL(realURL) && !seenLinks[realURL] {
			// Determine the cloud drive type
			linkType := p.determineCloudType(realURL)

			// Build the link object
			link := model.Link{
				Type:     linkType,
				URL:      realURL,
				Password: "", // links on the xiaoji site usually have no password
			}

			links = append(links, link)
			seenLinks[realURL] = true
		}
	})

	return links
}

// decodeGoLink decodes a go.html link and returns the real cloud drive link
func (p *XiaojiAsyncPlugin) decodeGoLink(goLink string) string {
	// 1. Extract the base64-encoded part
	matches := goLinkRegex.FindStringSubmatch(goLink)
	if len(matches) < 2 {
		return ""
	}

	encoded := matches[1]

	// 2. Clean up the encoded string
	encoded = strings.TrimSpace(encoded)
	if encoded == "" {
		return ""
	}

	// 3. Base64-decode
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		// Try to recover from possible URL-encoding issues
		encoded = strings.ReplaceAll(encoded, " ", "+")
		// Try to repair missing padding
		switch len(encoded) % 4 {
		case 2:
			encoded += "=="
		case 3:
			encoded += "="
		}
		decoded, err = base64.StdEncoding.DecodeString(encoded)
		if err != nil {
			return ""
		}
	}

	realURL := strings.TrimSpace(string(decoded))

	// 4. Verify that the decoded value is a valid URL
	if p.isValidURL(realURL) {
		return realURL
	}

	return ""
}

// isValidURL reports whether the URL is valid
func (p *XiaojiAsyncPlugin) isValidURL(urlStr string) bool {
	if urlStr == "" {
		return false
	}

	// Check the basic URL format
	if strings.HasPrefix(urlStr, "http://") || strings.HasPrefix(urlStr, "https://") {
		// HTTP/HTTPS links need a domain
		if len(urlStr) <= 8 || urlStr == "http://" || urlStr == "https://" {
			return false
		}
		// Rough check that a domain is present
		return strings.Contains(urlStr[8:], ".")
	}

	// Magnet links
	if strings.HasPrefix(urlStr, "magnet:") {
		return len(urlStr) > 7 && strings.Contains(urlStr, "xt=")
	}

	// ED2K links
	if strings.HasPrefix(urlStr, "ed2k://") {
		return len(urlStr) > 7
	}

	return false
}

// determineCloudType determines the cloud drive type of a link
// (the parameter is named rawURL to avoid shadowing the net/url package)
func (p *XiaojiAsyncPlugin) determineCloudType(rawURL string) string {
	switch {
	case strings.Contains(rawURL, "pan.quark.cn"):
		return "quark"
	case strings.Contains(rawURL, "drive.uc.cn"):
		return "uc"
	case strings.Contains(rawURL, "pan.baidu.com"):
		return "baidu"
	case strings.Contains(rawURL, "aliyundrive.com") || strings.Contains(rawURL, "alipan.com"):
		return "aliyun"
	case strings.Contains(rawURL, "pan.xunlei.com"):
		return "xunlei"
	case strings.Contains(rawURL, "cloud.189.cn"):
		return "tianyi"
	case strings.Contains(rawURL, "115.com") || strings.Contains(rawURL, "115cdn.com"):
		return "115"
	case strings.Contains(rawURL, "123pan.com"):
		return "123"
	case strings.Contains(rawURL, "caiyun.139.com"):
		return "mobile"
	case strings.Contains(rawURL, "mypikpak.com"):
		return "pikpak"
	case strings.Contains(rawURL, "magnet:"):
		return "magnet"
	case strings.Contains(rawURL, "ed2k://"):
		return "ed2k"
	default:
		// ctfile.com and any other unknown drive are classified as others
		return "others"
	}
}