How It Works
Verification pipeline
Every email runs through up to 5 sequential checks. Each check either passes (adding to the score), fails (stopping the pipeline), warns (noting uncertainty), or skips (service unavailable).
Email input
│
▼
① Syntax check ──────────── RFC 5322 validation via email-validator
│ FAIL → stop               Normalises to lowercase, rejects typos
▼
② Disposable check ──────── 500+ known throwaway domain blocklist
│ FAIL → stop               (mailinator.com, guerrillamail.com, etc.)
▼
③ DNS / MX check ────────── Async DNS lookup via dnspython
│ FAIL → stop               Fails on NXDOMAIN or zero MX records
│                           Detects catch-all domains
▼
④ Reacher SMTP ──────────── POST /v0/check_email to Reacher microservice
│ SKIP if no Docker         Rust service handles Gmail/Outlook quirks
│ FAIL → stop               Returns: is_deliverable, is_catch_all, can_connect
▼
⑤ Holehe platforms ──────── Checks email against 120+ platforms (opt-in)
│ SKIP unless --holehe      Useful on catch-all domains where SMTP is unreliable
▼
Confidence score [0–100]
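The stop-on-FAIL control flow in the diagram above can be sketched as a loop over check functions. This is only an illustration, not ColdReach's actual code: `Status`, `CheckResult`, `run_pipeline`, and the two toy checks are hypothetical names, and the checks are simplified stand-ins for the real validators.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"
    SKIP = "skip"


@dataclass
class CheckResult:
    name: str
    status: Status


# Hypothetical check signature: takes an email, returns a CheckResult.
Check = Callable[[str], CheckResult]


def syntax_check(email: str) -> CheckResult:
    # Toy stand-in for real RFC 5322 validation.
    local, sep, domain = email.partition("@")
    ok = bool(local and sep and "." in domain)
    return CheckResult("syntax", Status.PASS if ok else Status.FAIL)


def disposable_check(email: str) -> CheckResult:
    # Toy blocklist with two entries; the real list is far longer.
    blocked = {"mailinator.com", "guerrillamail.com"}
    domain = email.rsplit("@", 1)[-1].lower()
    status = Status.FAIL if domain in blocked else Status.PASS
    return CheckResult("disposable", status)


def run_pipeline(email: str, checks: List[Check]) -> List[CheckResult]:
    """Run checks in order; stop at the first FAIL, continue past WARN and SKIP."""
    results = []
    for check in checks:
        result = check(email)
        results.append(result)
        if result.status is Status.FAIL:
            break
    return results
```

Calling `run_pipeline("x@mailinator.com", [syntax_check, disposable_check])` returns two results, the second with `Status.FAIL`, and any later checks in the list would not run.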
Score baseline and deltas
All emails start at 30. Each passing check adds points:
| Check | Delta | Condition |
|---|---|---|
| Syntax | 0 | Gate only — no delta |
| Disposable | +5 | Not a disposable domain |
| DNS | +10 | MX records found |
| Reacher | +20 | SMTP accepted, not catch-all |
| Holehe | +15 | Registered on ≥2 platforms |
| Holehe | +5 | Registered on exactly 1 platform |
| Source hint | +15–35 | Website contact page, team page, etc. |
Final score is clamped to [0, 100].
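As a worked example of the table, an address that is not disposable, has MX records, and is accepted over SMTP on a non-catch-all domain scores 30 + 5 + 10 + 20 = 65. The sketch below encodes the same arithmetic. The function and parameter names are hypothetical, and because the table gives only a range for source hints, the hint is passed in as a plain number.

```python
BASELINE = 30


def confidence_score(
    not_disposable: bool = False,
    mx_found: bool = False,
    smtp_accepted: bool = False,
    catch_all: bool = False,
    holehe_platforms: int = 0,
    source_hint: int = 0,
) -> int:
    """Apply the deltas from the table, then clamp to [0, 100]."""
    score = BASELINE
    if not_disposable:
        score += 5
    if mx_found:
        score += 10
    if smtp_accepted and not catch_all:
        score += 20
    if holehe_platforms >= 2:
        score += 15
    elif holehe_platforms == 1:
        score += 5
    score += source_hint  # assumed to be 0 or a value in the 15-35 range
    return max(0, min(100, score))
```

With every delta at its maximum the raw sum is 30 + 5 + 10 + 20 + 15 + 35 = 115, which the clamp reduces to 100.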
Catch-all domains
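Catch-all domains accept every RCPT TO command whether or not the mailbox exists,
so SMTP verification cannot confirm an individual address. Domains hosted on Google
Workspace or Office 365 can be configured this way. Use --holehe on these domains:
platform presence is a more useful signal when SMTP isn't.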
Discovery pipeline
When you run coldreach find, all sources execute concurrently:
coldreach find --domain acme.com
│
├─── Cache check ──── HIT → return immediately
│
├─── Source 1: WebCrawlerSource (homepage + /contact + /team + /about)
├─── Source 2: WhoisSource (registrant contact)
├─── Source 3: GitHubSource (commit author emails)
├─── Source 4: RedditSource (mentions with email patterns)
├─── Source 5: SearchEngineSource (SearXNG → DDG → Brave fallback)
├─── Source 6: HarvesterSource (theHarvester Docker container)
└─── Source 7: SpiderFootSource (SpiderFoot Docker container)
│
▼
Merge all SourceResult lists
│
▼
Pattern generation
(if --name: infer email format from found emails → generate targeted guesses)
│
▼
Deduplicate by email address (keep highest confidence_hint)
│
▼
Verification (run_basic_pipeline for each unique email)
│
▼
Score + rank results
│
▼
Store in cache (full results, before min_confidence filter)
│
▼
Display (filter by min_confidence unless --all)
Sources that are unavailable (Docker not running, timeout, etc.) are skipped without raising an error: they are reported with a SKIP status in the summary but don't fail the run.
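The concurrent fan-out, skip handling, and deduplication steps can be sketched with `asyncio.gather`. This is a simplified illustration under stated assumptions: the three source coroutines and the `discover` function are hypothetical, results are plain dicts rather than SourceResult objects, and the pattern generation and verification steps are left out.

```python
import asyncio


# Hypothetical sources returning hard-coded hits for illustration.
async def web_crawler(domain):
    return [
        {"email": f"info@{domain}", "confidence_hint": 35},
        {"email": f"jane@{domain}", "confidence_hint": 15},
    ]


async def github(domain):
    return [{"email": f"jane@{domain}", "confidence_hint": 25}]


async def harvester(domain):
    # Simulates an unavailable Docker-backed source.
    raise ConnectionError("Docker not running")


async def discover(domain, sources):
    """Run all sources concurrently, skip failures, dedupe by email."""
    # return_exceptions=True keeps one failing source from aborting the rest.
    outcomes = await asyncio.gather(
        *(source(domain) for source in sources), return_exceptions=True
    )
    best, skipped = {}, []
    for source, outcome in zip(sources, outcomes):
        if isinstance(outcome, BaseException):
            skipped.append(source.__name__)
            continue
        for hit in outcome:
            key = hit["email"].lower()
            # Keep the hit with the highest confidence_hint per address.
            if key not in best or hit["confidence_hint"] > best[key]["confidence_hint"]:
                best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h["confidence_hint"], reverse=True)
    return ranked, skipped
```

Running `asyncio.run(discover("acme.com", [web_crawler, github, harvester]))` yields two unique addresses and reports `harvester` as skipped.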
Pattern generation
When a target name is provided (--name "Jane Smith"), ColdReach uses found emails to infer the company's email format before generating guesses.
Example:
Known emails found: ["m.chen@acme.com", "r.jones@acme.com"]
↓
Inferred format: "f.last" (first initial + dot + last name)
↓
Generated for "Jane Smith":
- j.smith@acme.com (inferred format — confidence: higher)
- jane.smith@acme.com (companion format — common co-occurrence)
When no known emails exist, ColdReach falls back to the top 3 most common B2B formats: first.last, flast, first.
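One way to approximate the inference step is to classify each known local part by its shape and take a majority vote. The sketch below is a rough heuristic, not ColdReach's actual algorithm: `SHAPES` and `infer_formats` are hypothetical names, only two shapes that can be recognised without knowing the person's name are covered, and companion formats are not modelled.

```python
import re
from collections import Counter

# Fallback order taken from the paragraph above.
FALLBACK_FORMATS = ["first.last", "flast", "first"]

# Shapes recognisable from the local part alone (an assumption of this sketch).
SHAPES = [
    ("f.last", re.compile(r"^[a-z]\.[a-z]{2,}$")),
    ("first.last", re.compile(r"^[a-z]{2,}\.[a-z]{2,}$")),
]


def infer_formats(known_emails):
    """Return the most common recognised format, or the fallback list."""
    votes = Counter()
    for email in known_emails:
        local = email.split("@", 1)[0].lower()
        for name, pattern in SHAPES:
            if pattern.match(local):
                votes[name] += 1
                break
    if not votes:
        return list(FALLBACK_FORMATS)
    return [votes.most_common(1)[0][0]]
```

For the example above, `infer_formats(["m.chen@acme.com", "r.jones@acme.com"])` returns `["f.last"]`, while an empty list returns the three fallback formats.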
Supported formats
| Format | Example |
|---|---|
| first.last | jane.smith@acme.com |
| flast | jsmith@acme.com |
| first | jane@acme.com |
| f.last | j.smith@acme.com |
| firstl | janes@acme.com |
| last | smith@acme.com |
| last.first | smith.jane@acme.com |
| first.last.initial | jane.smith.j@acme.com |
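The formats in the table can be expressed as small templates over a first and last name. The sketch below is illustrative only: `FORMATS` and `generate` are hypothetical names, the trailing initial in first.last.initial is assumed to be the first-name initial (as in the table's example), and name handling is deliberately naive (lowercase, first and last token only).

```python
# Each format maps (first, last) to a local part.
FORMATS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "flast": lambda f, l: f"{f[0]}{l}",
    "first": lambda f, l: f,
    "f.last": lambda f, l: f"{f[0]}.{l}",
    "firstl": lambda f, l: f"{f}{l[0]}",
    "last": lambda f, l: l,
    "last.first": lambda f, l: f"{l}.{f}",
    "first.last.initial": lambda f, l: f"{f}.{l}.{f[0]}",
}


def generate(full_name, domain, formats):
    """Build candidate addresses for a name in the requested formats."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    return [f"{FORMATS[fmt](first, last)}@{domain}" for fmt in formats]
```

For instance, `generate("Jane Smith", "acme.com", ["f.last", "first.last"])` returns `["j.smith@acme.com", "jane.smith@acme.com"]`.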
Caching
ColdReach uses a two-layer cache to avoid repeating expensive discovery runs.
coldreach find --domain acme.com
│
▼
① Check Redis (if available)
│ HIT → deserialize → return
│ MISS ↓
② Check SQLite (~/.coldreach/cache.db)
│ HIT + not expired → promote to Redis → return
│ MISS ↓
③ Run all sources (expensive)
│
▼
Store full results in both Redis + SQLite
(TTL: 7 days, configurable via COLDREACH_CACHE_TTL_DAYS)
Important: results are cached before the min_confidence filter is applied. This means a future call with --all will still use the cached full result set without re-querying.
Use --refresh to bypass the cache for one run, or coldreach cache clear to remove entries.
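The lookup order described in this section can be sketched as follows. This is a simplified model, not ColdReach's implementation: the `TwoLayerCache` class and the table schema are hypothetical, a plain dict stands in for Redis (a real Redis layer would also carry its own expiry), and SQLite runs in memory instead of at ~/.coldreach/cache.db.

```python
import json
import os
import sqlite3
import time

# TTL in days, read from the environment variable named above (default 7).
TTL_SECONDS = float(os.environ.get("COLDREACH_CACHE_TTL_DAYS", "7")) * 24 * 3600


class TwoLayerCache:
    def __init__(self, db_path=":memory:"):
        self.fast = {}  # stand-in for Redis in this sketch
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "domain TEXT PRIMARY KEY, payload TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def get(self, domain, now=None):
        now = time.time() if now is None else now
        # Layer 1: fast cache.
        if domain in self.fast:
            return json.loads(self.fast[domain])
        # Layer 2: SQLite, honouring the expiry timestamp.
        row = self.db.execute(
            "SELECT payload, expires_at FROM cache WHERE domain = ?", (domain,)
        ).fetchone()
        if row and row[1] > now:
            self.fast[domain] = row[0]  # promote to the fast layer
            return json.loads(row[0])
        return None  # miss: the caller runs all sources

    def set(self, domain, results, now=None):
        now = time.time() if now is None else now
        payload = json.dumps(results)
        self.fast[domain] = payload
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (domain, payload, now + TTL_SECONDS),
        )
        self.db.commit()
```

Storing the full result list and filtering only at display time mirrors the note above: a later lookup with a different min_confidence can reuse the same cached entry.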