Fast, resilient, resumable CLI (and Go library) for downloading models and datasets from the Hugging Face Hub—now with a clean flag set, a colorful TUI, structured JSON events, retry/backoff, and filesystems‑only resume (no progress files).
v2.0.0 is a breaking redesign of the CLI and library. The old README summarized that this major version simplifies flags, adds dry‑run plans, verification options, multipart thresholding, and JSON output. This document is the definitive, up‑to‑date guide for v2.0.0.
-
Resumable by default
- LFS files: verified by SHA‑256 (when provided by the repo).
- Non‑LFS files: verified by size.
- Large files use multipart range downloads with per‑part resume.
-
Beautiful live TUI
- Auto‑adapts to terminal width/height; smart truncation; per‑file bars, speeds, ETA.
- Colorful when supported; graceful plain‑text fallback (
TERM=dumb
orNO_COLOR=1
).
-
Robust + fast cancellation
- Ctrl‑C (SIGINT) or SIGTERM aborts immediately across goroutines; second Ctrl‑C exits.
-
Structured progress events (
--json
) for CI/logging. -
Practical controls
- Overall concurrency and per‑file connection limits.
- Retry with exponential backoff.
- Verification policy for non‑LFS files:
none | size | etag | sha256
. - Dry‑run “plan” mode (table or JSON).
-
No progress/meta files
- Skip decisions are made only from what’s on disk (checksums/sizes).
Note: the previous README mentioned saving an
.hfdownloader.meta.json
. v2.0.0 no longer persists such files; resume is purely filesystem‑based.
- Skip decisions are made only from what’s on disk (checksums/sizes).
Note: the previous README mentioned saving an
git clone https://github.com/bodaay/HuggingFaceModelDownloader
cd HuggingFaceModelDownloader
go build -o hfdownloader .
# optional:
# go install . # installs into your $GOBIN
- Go 1.21+
- macOS / Linux / Windows (modern terminals support the live TUI; otherwise it falls back to plain text)
# Public model
hfdownloader download TheBloke/Mistral-7B-Instruct-v0.2-GGUF -o ./Models
# Private or gated repo (requires token)
HF_TOKEN=xxxx hfdownloader download owner/private-model -o ./Models
# Filter LFS artifacts by name; append a subdir for each filter
hfdownloader download TheBloke/vicuna-13b-v1.3.0-GGML:q4_0,q5_0 \
--append-filter-subdir -o ./Models -c 8 --max-active 3
# Dataset mode
hfdownloader download facebook/flores --dataset -o ./Datasets
# Plan only (no downloads), pretty-printed JSON
hfdownloader download TheBloke/Mistral-7B-Instruct-v0.2-GGUF:q4_0 --dry-run --plan-format json
Tip: You can include filters in REPO
via owner/name:filter1,filter2
or pass -F, --filters
.
hfdownloader download [REPO] [flags]
# REPO: owner/name or owner/name:filter1,filter2
-
Destination
-o, --output
Storage — base folder for downloads
-
What to fetch
-r, --repo
(optional if REPO positional is present)--dataset
— treat repo as dataset instead of model-b, --revision
main — branch/tag/sha-F, --filters
— comma‑separated LFS name filters (q4_0,q5_0
)--append-filter-subdir
— put each filter’s files under its own subfolder
-
Speed and parallelism
-c, --connections
8 — per‑file HTTP range connections--max-active
GOMAXPROCS — max concurrent files--multipart-threshold
32MiB — only multipart files ≥ threshold
-
Reliability
--retries
4 — retry attempts per request/part--backoff-initial
400ms,--backoff-max
10s--verify
size — non‑LFS verification:none | size | etag | sha256
-
Planning / logging
--dry-run
— plan only (no downloads)--plan-format
table —table | json
(with--dry-run
)--json
— emit machine‑readable events-q, --quiet
— reduce console noise-v, --verbose
— more diagnostics
-
Auth & config
-t, --token
— Hugging Face token (orHF_TOKEN
env)--config
— path to JSON config (defaults to~/.config/hfdownloader.json
if present)
The old README showed a larger flag surface and described resume/overwrite toggles and metadata persistence; v2.0.0 deliberately simplifies this—resume is always on, and skip decisions are based on your files only (no metadata saved).
- Header: repo, revision, dataset flag, output dir, connections, verification, retries.
- Global progress bar with %, bytes, speed, ETA.
- Per‑file rows: status (▶/✓/•/×), filename (middle‑ellipsis), bar, bytes, speed, ETA.
- Auto‑adjusts to width/height; plain text fallback when colors/ANSI aren’t available.
Disable color: set NO_COLOR=1
or run in non‑TTY pipelines.
JSON mode: --json
bypasses the TUI for structured events (see below).
The downloader emits typed events:
scan_start
plan_item
(one per file; includes size and whether it’s LFS)file_start
file_progress
(bytes/total, periodic for multipart)retry
(attempt #, message)file_done
(includesmessage: "skip (...)"
when a file is skipped)error
done
Example:
{"time":"2025-09-05T18:42:10Z","event":"scan_start","repo":"owner/name","revision":"main","message":"scanning repo"}
{"time":"2025-09-05T18:42:11Z","event":"plan_item","path":"model-q4_0.gguf","total":4227858432,"repo":"owner/name","revision":"main"}
{"time":"2025-09-05T18:42:29Z","event":"file_done","path":"model-q4_0.gguf","repo":"owner/name","revision":"main"}
{"time":"2025-09-05T18:42:29Z","event":"done","message":"download complete (downloaded 1, skipped 12)","repo":"owner/name","revision":"main"}
Skip lines: each file prints at most one “skip (…)” per run. Deduplication is enforced internally.
-
LFS files (SHA available) → compute local SHA‑256 and compare to the repo’s SHA.
- If equal → skip (
file_done
withskip (sha256 match)
). - If different (even if size matches) → re‑download.
- If equal → skip (
-
Non‑LFS / SHA unknown → compare file size.
- If equal → skip (
skip (size match)
). - If different → re‑download.
- If equal → skip (
-
Multipart parts → each range part downloads to
path.part-00
,path.part-01
, …; matching‑length parts are not re‑fetched and the file is assembled when all parts are present.
- SIGINT/SIGTERM: fast, cooperative cancellation—no new work is scheduled, all goroutines exit promptly, and ongoing HTTP calls are context‑bound.
- SIGKILL (9): cannot be intercepted by any program; the OS terminates immediately.
Download GGUF variants into separate subfolders
hfdownloader download TheBloke/Mistral-7B-Instruct-v0.2-GGUF:q4_0,q5_0 \
--append-filter-subdir -o ./Models --connections 8 --max-active 3
Stricter verification (non‑LFS)
hfdownloader download owner/name --verify etag
Datasets
hfdownloader download huggingface/awesome-dataset --dataset -o ./Datasets
Plan first, then run
hfdownloader download owner/name:q4_0 --dry-run --plan-format json
hfdownloader download owner/name:q4_0
If --config
is not provided, the tool will read ~/.config/hfdownloader.json
when present and use values as defaults (CLI flags still override).
Example:
{
"output": "Storage",
"connections": 8,
"max-active": 3,
"multipart-threshold": "256MiB",
"verify": "size",
"retries": 4,
"backoff-initial": "400ms",
"backoff-max": "10s",
"token": "hf_xxx"
}
package main
import (
"context"
"log"
"github.com/bodaay/HuggingFaceModelDownloader/hfdownloader"
)
func main() {
job := hfdownloader.Job{
Repo: "TheBloke/Mistral-7B-Instruct-v0.2-GGUF:q4_0",
Revision: "main",
IsDataset: false,
Filters: []string{"q4_0"},
// AppendFilterSubdir: true, // optional
}
cfg := hfdownloader.Settings{
OutputDir: "Storage",
Concurrency: 8,
MaxActiveDownloads: 3,
MultipartThreshold: "256MiB",
Verify: "size", // none|size|etag|sha256
Retries: 4,
BackoffInitial: "400ms",
BackoffMax: "10s",
Token: "", // or os.Getenv("HF_TOKEN")
}
progress := func(ev hfdownloader.ProgressEvent) {
switch ev.Event {
case "file_done":
if strings.HasPrefix(ev.Message, "skip") {
log.Printf("skip: %s (%s)", ev.Path, ev.Message)
} else {
log.Printf("done: %s", ev.Path)
}
case "retry":
log.Printf("retry %s: attempt %d: %s", ev.Path, ev.Attempt, ev.Message)
}
}
if err := hfdownloader.Download(context.Background(), job, cfg, progress); err != nil {
log.Fatal(err)
}
}
Events you can handle: scan_start
, plan_item
, file_start
, file_progress
, retry
, file_done
, error
, done
.
- 401 Unauthorized
Provide a token:
-t TOKEN
orHF_TOKEN=...
. Some repos require auth/acceptance. - 403 Forbidden (terms) Visit the repo page and accept terms, then retry.
- Range requests disabled Multipart falls back to a single GET automatically; downloads still work.
- Slow throughput
Increase
--connections
and--max-active
gradually; ensure disk/FS and network can keep up. - Repeated “skip” lines v2.0.0 emits at most one “skip (…)” per file per run. If you still see duplicates, check for duplicate paths in the upstream tree or path collisions on Windows.
- Cleaner mental model (one download command, sensible defaults).
- Filesystem‑only resume—reliable and transparent; no “state” files to corrupt.
- JSON events and a TUI that looks great everywhere.
- Strong cancellation story for real‑world, long‑running downloads.
The prior README’s CLI outline and examples helped guide this cleanup; this version documents the finalized v2 surface and behavior.
Apache‑2.0 (see LICENSE
).
Thanks to the HF community and tooling ecosystem—this project tries to be a pragmatic drop‑in for everyday model & dataset fetching.