scalett
1 Followers
·
0 Following
Joined on
2025-06-10
Repositories
1
Projects
Packages
Public Activity
Starred Repositories
1
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-02-09 18:54:23 +08:00
dbe9ba3629
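Add heading-hierarchy handling rules: 1. add a homepage link; 2. add docx post-processing that merges headings at the same level; 3. refine the hierarchy so that h1 is not repeated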
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-01-31 16:35:56 +08:00
c707704d80
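Update the crawler design document and add a summary-extraction module to generate document summaries; improve the base crawler class's title-extraction logic to support multiple selectors, and adjust content processing to remove duplicate titles.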
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-01-31 09:32:04 +08:00
3c625d1c3a
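Update the crawler design document and add output information for the "Service & Support" detail pages; improve the base crawler class with stronger title extraction and content deduplication; adjust image handling based on doc2md.py to improve Word document generation.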
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-01-29 17:45:38 +08:00
3670129972
Ignore the output folder and remove the already-tracked output files
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-01-29 17:42:52 +08:00
2e6c5159d2
Ignore the output folder and remove the already-tracked output files
scalett
pushed to
main
at
scalett/crawl4zeroerr
2026-01-29 17:39:10 +08:00
51b67b9e68
Initial commit: crawler project for the 零差云控 official website
scalett
created branch
main
in
scalett/crawl4zeroerr
2026-01-29 17:39:10 +08:00
scalett
created repository
scalett/crawl4zeroerr
2026-01-29 17:35:50 +08:00