scalett/crawl4zeroerr
5 Commits 1 Branch 0 Tags
dbe9ba36290321f767acf0d93e1b07b3a0c7e12e

2 Commits

Author SHA1 Message Date
oy2020
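dbe9ba3629 Add heading-hierarchy handling rules: 1. add a homepage link; 2. add docx post-processing that merges headings at the same level; 3. refine the hierarchy so that h1 is not repeated. 2026-02-09 18:53:32 +08:00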
oy2020
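c707704d80 Update the crawler design document, adding a summary-extraction module that generates document summaries; improve the base crawler class's title-extraction logic to support multiple selectors, and adjust content processing to remove duplicate titles. 2026-01-31 16:34:13 +08:00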