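Update the crawler design document and add a summary-extraction module that generates document summaries; improve title extraction in the base crawler class to support multiple selectors, and adjust content processing to remove duplicate titles.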

oy2020
2026-01-31 16:34:13 +08:00
parent 3c625d1c3a
commit c707704d80
5 changed files with 355 additions and 31 deletions

.gitignore

@@ -32,6 +32,7 @@ wheels/
# Output files
output/
output_post/
# Temporary files
*.tmp