Commit Graph

32 Commits

Author | SHA1 | Message | Date
wizardchen | bff0e742fa | fix: try fix ocr avx not support | 2025-09-11 13:21:21 +08:00
wizardchen | 6f6ca84dae | feat(docreader): add health check | 2025-09-10 20:22:14 +08:00
wizardchen | 7cfae7e0d3 | fix: pre fetch ocr models in docker container | 2025-09-10 17:24:26 +08:00
wizardchen | 19d2493afc | fix: make file docker build not work | 2025-09-10 15:13:12 +08:00
wizardchen | 0e1d7edca3 | fix: image parser concurrency error | 2025-09-10 13:19:39 +08:00
wizardchen | 7775559a9b | feat: use paddle ocr v4 instead | 2025-09-10 01:22:25 +08:00
wizardchen | 2b6cbee1b6 | feat: add aliyun rerank | 2025-09-10 01:22:25 +08:00
begoniezhao | 3f8a1d20c1 | fix(docreader): update paddle version | 2025-09-09 19:25:02 +08:00
Liwx | 4489a4da7f | Update base_parser.py | 2025-09-08 14:58:37 +08:00
Liwx | 202f353543 | Update base_parser.py | 2025-09-08 14:58:37 +08:00
Liwx1014 | 696815ddfb | update pdf_parser.py | 2025-09-08 14:58:37 +08:00
Liwx1014 | 88b467caf0 | fix:build docreader timeout; update ocr config;support pdf tables parsing | 2025-09-08 14:58:37 +08:00
Liwx1014 | eb27a30c41 | fix:build docreader timeout; update ocr config;support pdf tables parsing | 2025-09-08 14:58:37 +08:00
Liwx1014 | 3aad892a62 | fix:build docreader timeout; update ocr config;support pdf tables parsing | 2025-09-08 14:58:37 +08:00
fatelei | d74ae9153b | fix: https://github.com/Tencent/WeKnora/issues/114 | 2025-09-08 10:36:37 +08:00
Liwx | a1473fe731 | Update ocr_engine.py | 2025-08-29 12:24:58 +08:00
Liwx | 7d0037fc2d | Update ocr_engine.py | 2025-08-29 12:24:58 +08:00
Liwx1014 | 11910048c0 | fix:ocr extract error list out of range | 2025-08-29 12:24:58 +08:00
wizardchen | f8394c7e4d | fix processed_content used before assignment | 2025-08-21 16:52:39 +08:00
wizardchen | d801112f5f | fix: use image_data_list before assign | 2025-08-21 10:08:37 +08:00
wizardchen | 785261313f | feat: make CONCURRENCY_POOL_SIZE configurable | 2025-08-16 13:27:01 +08:00
begoniezhao | 20049d034a | refactor: optimize storage configuration priority and VLM configuration check logic | 2025-08-15 17:33:44 +08:00
wizardchen | 09d038eeb7 | fix: strip minio path prefix | 2025-08-15 01:36:04 +08:00
begoniezhao | f77720155c | feat: Added WEB_PROXY environment variable to optimize web content processing | 2025-08-14 17:09:11 +08:00
wizardchen | 8b43931886 | feat: support minio storage | 2025-08-14 12:16:08 +08:00
dongyuxiang | 396fd9326b | chore: ignore mac .DS_Store | 2025-08-12 17:55:51 +08:00
wizardchen | ddcf5edf02 | fix: Fix docx parser init failed | 2025-08-11 11:03:39 +08:00
wizardchen | bdabed6bfa | feat: Added web page for configuring model information | 2025-08-10 17:11:07 +08:00
begoniezhao | 24c190c492 | feat: add multimodal model configuration and VLM model authentication | 2025-08-08 17:05:24 +08:00
begoniezhao | 6d1e192a2c | refactor(caption.py): improve CaptionChatResp parsing logic and make field handling more robust | 2025-08-06 16:33:16 +08:00
begoniezhao | 8557297f28 | fix(docreader): add detail parameter for openai interface | 2025-08-06 11:52:35 +08:00
wizardchen | 56eb2bce33 | init commit | 2025-08-05 15:08:07 +08:00