crawl4zeroerr/requirements.txt
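Oo 9e14b56275 Improve heading hierarchy and link fidelity in document export, unify the body heading mapping, and strengthen hyperlink handling for Word paragraphs.
Also remove document post-processing dependencies that are no longer used, reducing redundant steps in the summary export flow.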

Made-with: Cursor
2026-03-30 10:32:34 +08:00

25 lines
567 B
Plaintext

# Dependencies for the 零差云控 (ZeroErr) official website crawler
requests>=2.28.0
beautifulsoup4>=4.11.0
markdown>=3.6
markdownify>=0.11.0
python-docx>=0.8.11
lxml>=4.9.0
# Dependencies for doc2md.py
Pillow>=9.0.0
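matplotlib>=3.5.0 # optional: used to render LaTeX formulas
# wand>=0.6.0 # optional: used for WMF/EMF conversion (requires ImageMagick to be installed on the system)
# html2image>=2.0.0 # optional: used to render tables as images
# Dependencies for test_llm.py - RAG approach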
openai>=1.0.0
langchain>=0.1.0
langchain-openai>=0.1.0
langchain-community>=0.0.20
faiss-cpu>=1.7.4
tiktoken>=0.5.0
sentence-transformers>=2.2.0
torch>=2.0.0