Changelog - Crawlboy

v1.2.0

Latest 2026-05-29

Changed

Crawl4AI — minimum dependency raised to >=0.8.6 (unclecode/crawl4ai 0.8.x: security fixes, MarkdownGenerationResult API, Playwright/patchright updates)
lxml — CI and Docker installs now upgrade lxml to >=6.1.0 after the main pip install to address PYSEC-2026-87, until crawl4ai relaxes its lxml~=5.3 pin

Install

$ pip install crawlboy==1.2.0

$ crawl4ai-setup

After install, upgrade lxml if you run your own security audits:

$ pip install 'lxml>=6.1.0'

GitHub release PyPI Full changelog

2026-04-18

Added

--meta-frontmatter — optional YAML frontmatter on each Markdown file with source_url and extracted HTML metadata (title, canonical, meta name / property / http-equiv), plus matching interactive wizard option
PyYAML — dependency for frontmatter serialization
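Downstream tooling that consumes these files may need to separate the frontmatter from the Markdown body. The sketch below uses only the standard library and assumes the conventional layout of a YAML block delimited by `---` lines at the top of the file. The helper name `split_frontmatter` and the sample keys other than `source_url` are illustrative assumptions, not Crawlboy's documented output.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a Markdown document into (frontmatter_yaml, body).

    Returns an empty frontmatter string when the document does not start
    with a '---' line or the closing '---' line is missing.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    return "", text


if __name__ == "__main__":
    # Hypothetical file content; the real key layout may differ.
    sample = (
        "---\n"
        "source_url: https://example.com/docs/\n"
        "title: Docs\n"
        "---\n"
        "# Docs\n"
    )
    frontmatter, body = split_frontmatter(sample)
    print(frontmatter, end="")
    print(body, end="")
```

The returned frontmatter string can then be passed to `yaml.safe_load` from PyYAML to obtain a dictionary.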


2026-04-11

Added

Sitemap crawling — sequentially crawls every URL from XML sitemaps with Crawl4AI
Nested sitemap support — recursively follows <sitemapindex> entries
Markdown output — converts crawled pages to Markdown, one file per URL with mirrored directory structure
HTML export — optional --save-html flag to preserve raw HTML alongside Markdown
Image download — --download-images to save media locally with content-addressed filenames (deduped across crawl) and automatic path rewriting in Markdown and HTML
Automatic sitemap discovery — auto-detects sitemap from robots.txt or common paths (/sitemap.xml, /sitemap_index.xml, etc.)
Interactive CLI — guided wizard with questionary and Rich for easy configuration
Flexible URL modes — direct sitemap URL (--sitemap-url) or site root discovery (--site-url)
Host filtering — respects site origin by default; --include-offsite-urls to crawl all listed URLs
Error logging — failures logged to errors.jsonl with paths and error details
Performance tuning — configurable per-page delay, page timeout, and max URL limit
Browser control — --no-headless to show browser window for debugging
Docker support — includes Dockerfile for containerized execution with pre-installed browser dependencies
Fail-fast mode — --fail-fast to stop on first error for rapid iteration
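Content-addressed filenames, as mentioned for --download-images, derive the filename from the bytes of the file rather than from its URL, so identical images fetched from different pages collapse to one file. The sketch below shows the general idea under stated assumptions: SHA-256 as the hash, the URL path's extension as the suffix, and an in-memory dict standing in for the output directory. Crawlboy's actual hash, naming scheme, and storage may differ, and both function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def content_addressed_name(data: bytes, source_url: str) -> str:
    """Name a file after the SHA-256 of its bytes, keeping the URL's extension."""
    digest = hashlib.sha256(data).hexdigest()
    suffix = PurePosixPath(urlparse(source_url).path).suffix.lower()
    return f"{digest}{suffix}"


def save_once(store: dict[str, bytes], data: bytes, source_url: str) -> str:
    """Store data under its content-addressed name; duplicates are written once."""
    name = content_addressed_name(data, source_url)
    store.setdefault(name, data)
    return name


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    first = save_once(store, b"same-bytes", "https://example.com/a/logo.png")
    second = save_once(store, b"same-bytes", "https://example.com/b/logo.png?v=2")
    print(first == second, len(store))
```

Because the name depends only on content and extension, any reference to the image in Markdown or HTML can be rewritten to the same local path regardless of which page linked to it.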
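After a crawl, errors.jsonl can be inspected with a few lines of Python. This is a sketch that assumes the file follows the usual JSON Lines convention of one JSON object per line. The field names inside each record are not documented here, so the helper (hypothetical name `read_jsonl`) returns plain dictionaries and leaves interpretation to the caller.

```python
import json
from pathlib import Path


def read_jsonl(path: str) -> list[dict]:
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumes a crawl has already written errors.jsonl in the current directory.
    for record in read_jsonl("errors.jsonl"):
        print(record)
```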
