ArchiveBox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
ai-scraping, archivebox, chrome, cli, cli-tool, crawling, curl, downloader, gallery-dl, headless, http-client, internet-archiving, playwright, puppeteer, scraping, wget, youtube-dl, yt-dlp
First Claude commit: Mar 15, 2026
Last Claude commit: 2mo ago
Discovered: Mar 16, 2026
Recent Claude Commits
cbc7544 · 2mo ago · Fix total_hooks NameError, progress bar tracking, create_bus **kwargs
6fcf984 · 2mo ago · Set event_timeout=phase_timeout on cleanup events
ab5b9f0 · 2mo ago · Use per-plugin timeout as SIGTERM grace period during daemon cleanup
e648aca · 2mo ago · Compute per-plugin phase timeouts instead of naive num_hooks * TIMEOUT
f930aad · 2mo ago · Use full event timeout for bg daemons instead of no timeout
5087319 · 2mo ago · Fix bg daemon hooks killed prematurely by per-hook timeout
2929a8b · 2mo ago · Remove BinaryLoadedEvent, scope synthetic ArchiveResults to Snapshot hooks
fb21247 · 2mo ago · Include binary provider plugins in filter_plugins, fix BinaryInstalled dedup
08a1179 · 2mo ago · Add 'noresult' status, fix install to use Binary events, include provider deps
93dd306 · 2mo ago · Fix install hooks not showing in Install Results table
df78cd0 · 2mo ago · Rename ProcessRecordOutputtedEvent to ProcessStdoutEvent, emit every line
1495e4b · 2mo ago · Move end_ts and output_files computation to ProcessRecordOutputtedEvent
da63594 · 2mo ago · Glob output_files on inline ArchiveResultEvents
8b4ba52 · 2mo ago · Set process_id, start_ts, end_ts on inline ArchiveResultEvents
7152587 · 2mo ago · Add process_id, output_files, start_ts, end_ts to ArchiveResultEvent
ce5d728 · 2mo ago · Decouple ArchiveResult from process metadata, use bus.find for dedup
44028d6 · 2mo ago · Remove ArchiveResult/Snapshot events from CrawlSetup docstrings
65fbb8d · 2mo ago · Replace on_result callback with bus.on() event subscriptions
0f4f085 · 2mo ago · Make psutil a hard import (already a project dependency)
516c628 · 2mo ago · Clean up docstrings, fix all ruff and pyright errors
a2098e4 · 2mo ago · Move ArchiveResult ownership from ProcessService to ArchiveResultService
2d643fb · 2mo ago · Decouple JSONL routing from ProcessService via ProcessRecordOutputtedEvent
810baca · 2mo ago · Add tests for BinaryLoaded/InstalledEvent and update docstrings
52b0daa · 2mo ago · Add SnapshotEvent/ArchiveResultEvent, replace callback with bus events
edef975 · 2mo ago · Add BinaryLoadedEvent and BinaryInstalledEvent informational events
e9539d7 · 2mo ago · Rename ProcessCompleted → ProcessCompletedEvent for consistency
a333e47 · 2mo ago · Add url to cleanup events, rename event_kwargs → crawl_kwargs/snapshot_kwargs
5949d66 · 2mo ago · Update docstrings to reflect new lifecycle event flow
2930c7d · 2mo ago · Rename CrawlSetupCompletedEvent → CrawlStartEvent
42aff3f · 2mo ago · Add CrawlSetupEvent/CrawlSetupCompletedEvent to demarkate crawl phases
bba1b05 · 2mo ago · Fold cleanup emission into on_CrawlEvent/on_SnapshotEvent, add ordering test
a1c7470 · 2mo ago · Remove HookRunnerService, flatten to BaseService + make_hook_handler
d5ae1de · 2mo ago · Replace _cleanup_bg_hooks with CrawlCleanupEvent/SnapshotCleanupEvent
86b1656 · 2mo ago · Remove EVENT_CLASS indirection, use explicit event classes in bus.on()
6e0a331 · 2mo ago · Pre-register all hook handlers at init time, avoid dynamic dispatch
3051707 · 2mo ago · Extract HookRunnerService, fix emit_result to include Process records
dbd60d2 · 2mo ago · Compute CrawlEvent timeout from num_snapshot_hooks * TIMEOUT, remove copy header
7851604 · 2mo ago · Merge plugins.py into models.py and use archivebox://install for install
a0eefa6 · 2mo ago · Consolidate process kill logic into process_utils.py with 15s grace period
1e5bd30 · 2mo ago · Fix leaked subprocess on exception during stdout streaming
1be2a4d · 2mo ago · Add comprehensive docstrings to all services, orchestrator, and events
1aaa392 · 2mo ago · Fix Chrome launch: strip googleapis/google.com from NO_PROXY and tune EventBus timeouts
8927409 · 2mo ago · Fix EventBus history limit and bump abx-plugins to 1.9.18
36ba3f5 · 2mo ago · Clean up services: remove no-op handlers, add hierarchy docstrings, fix timeout race
c068265 · 2mo ago · Move SnapshotEvent under CrawlEvent for proper parent/child hierarchy
b36dae8 · 2mo ago · Remove synthetic "started" ArchiveResult for bg hooks
b907968 · 2mo ago · Fix bg daemon lifecycle: proper parent/child tracking, SIGTERM before timeout
07ba307 · 2mo ago · Fix timeout hang and crawl daemon blocking
b18c996 · 2mo ago · Stream stdout JSONL in realtime, fix crawl hook config propagation
7de965e · 2mo ago · Add ProcessKillEvent to SIGTERM daemon hooks through the event system