ArchiveBox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
ai-scraping, archivebox, chrome, cli, cli-tool, crawling, curl, downloader, gallery-dl, headless, http-client, internet-archiving, playwright, puppeteer, scraping, wget, youtube-dl, yt-dlp
First Claude commit: Mar 15, 2026
Last Claude commit: 1mo ago
Discovered: Mar 16, 2026
Recent Claude Commits
Fix total_hooks NameError, progress bar tracking, create_bus **kwargs (cbc7544, 1mo ago)
Set event_timeout=phase_timeout on cleanup events (6fcf984, 1mo ago)
Use per-plugin timeout as SIGTERM grace period during daemon cleanup (ab5b9f0, 1mo ago)
Compute per-plugin phase timeouts instead of naive num_hooks * TIMEOUT (e648aca, 1mo ago)
Use full event timeout for bg daemons instead of no timeout (f930aad, 1mo ago)
Fix bg daemon hooks killed prematurely by per-hook timeout (5087319, 1mo ago)
Remove BinaryLoadedEvent, scope synthetic ArchiveResults to Snapshot hooks (2929a8b, 1mo ago)
Include binary provider plugins in filter_plugins, fix BinaryInstalled dedup (fb21247, 1mo ago)
Add 'noresult' status, fix install to use Binary events, include provider deps (08a1179, 1mo ago)
Fix install hooks not showing in Install Results table (93dd306, 1mo ago)
Rename ProcessRecordOutputtedEvent to ProcessStdoutEvent, emit every line (df78cd0, 1mo ago)
Move end_ts and output_files computation to ProcessRecordOutputtedEvent (1495e4b, 1mo ago)
Glob output_files on inline ArchiveResultEvents (da63594, 1mo ago)
Set process_id, start_ts, end_ts on inline ArchiveResultEvents (8b4ba52, 1mo ago)
Add process_id, output_files, start_ts, end_ts to ArchiveResultEvent (7152587, 1mo ago)
Decouple ArchiveResult from process metadata, use bus.find for dedup (ce5d728, 1mo ago)
Remove ArchiveResult/Snapshot events from CrawlSetup docstrings (44028d6, 1mo ago)
Replace on_result callback with bus.on() event subscriptions (65fbb8d, 1mo ago)
Make psutil a hard import (already a project dependency) (0f4f085, 1mo ago)
Clean up docstrings, fix all ruff and pyright errors (516c628, 1mo ago)
Move ArchiveResult ownership from ProcessService to ArchiveResultService (a2098e4, 1mo ago)
Decouple JSONL routing from ProcessService via ProcessRecordOutputtedEvent (2d643fb, 1mo ago)
Add tests for BinaryLoaded/InstalledEvent and update docstrings (810baca, 1mo ago)
Add SnapshotEvent/ArchiveResultEvent, replace callback with bus events (52b0daa, 1mo ago)
Add BinaryLoadedEvent and BinaryInstalledEvent informational events (edef975, 1mo ago)
Rename ProcessCompleted → ProcessCompletedEvent for consistency (e9539d7, 1mo ago)
Add url to cleanup events, rename event_kwargs → crawl_kwargs/snapshot_kwargs (a333e47, 1mo ago)
Update docstrings to reflect new lifecycle event flow (5949d66, 1mo ago)
Rename CrawlSetupCompletedEvent → CrawlStartEvent (2930c7d, 1mo ago)
Add CrawlSetupEvent/CrawlSetupCompletedEvent to demarkate crawl phases (42aff3f, 1mo ago)
Fold cleanup emission into on_CrawlEvent/on_SnapshotEvent, add ordering test (bba1b05, 1mo ago)
Remove HookRunnerService, flatten to BaseService + make_hook_handler (a1c7470, 1mo ago)
Replace _cleanup_bg_hooks with CrawlCleanupEvent/SnapshotCleanupEvent (d5ae1de, 1mo ago)
Remove EVENT_CLASS indirection, use explicit event classes in bus.on() (86b1656, 1mo ago)
Pre-register all hook handlers at init time, avoid dynamic dispatch (6e0a331, 1mo ago)
Extract HookRunnerService, fix emit_result to include Process records (3051707, 1mo ago)
Compute CrawlEvent timeout from num_snapshot_hooks * TIMEOUT, remove copy header (dbd60d2, 1mo ago)
Merge plugins.py into models.py and use archivebox://install for install (7851604, 1mo ago)
Consolidate process kill logic into process_utils.py with 15s grace period (a0eefa6, 1mo ago)
Fix leaked subprocess on exception during stdout streaming (1e5bd30, 1mo ago)
Add comprehensive docstrings to all services, orchestrator, and events (1be2a4d, 1mo ago)
Fix Chrome launch: strip googleapis/google.com from NO_PROXY and tune EventBus timeouts (1aaa392, 1mo ago)
Fix EventBus history limit and bump abx-plugins to 1.9.18 (8927409, 1mo ago)
Clean up services: remove no-op handlers, add hierarchy docstrings, fix timeout race (36ba3f5, 1mo ago)
Move SnapshotEvent under CrawlEvent for proper parent/child hierarchy (c068265, 1mo ago)
Remove synthetic "started" ArchiveResult for bg hooks (b36dae8, 1mo ago)
Fix bg daemon lifecycle: proper parent/child tracking, SIGTERM before timeout (b907968, 1mo ago)
Fix timeout hang and crawl daemon blocking (07ba307, 1mo ago)
Stream stdout JSONL in realtime, fix crawl hook config propagation (b18c996, 1mo ago)
Add ProcessKillEvent to SIGTERM daemon hooks through the event system (7de965e, 1mo ago)