
How It Works


Verification pipeline

Every email runs through up to 5 sequential checks. Each check returns one of four outcomes: pass (which may add to the score), fail (which stops the pipeline), warn (which records uncertainty), or skip (the service is unavailable or the check is not enabled).

Email input
① Syntax check ─────────── RFC 5322 validation via email-validator
    │ FAIL → stop           Normalises to lowercase, rejects malformed addresses
② Disposable check ──────── 500+ known throwaway domain blocklist
    │ FAIL → stop           (mailinator.com, guerrillamail.com, etc.)
③ DNS / MX check ────────── Async DNS lookup via dnspython
    │ FAIL → stop           Fails on NXDOMAIN or zero MX records
    │                       Detects catch-all domains
④ Reacher SMTP ──────────── POST /v0/check_email to Reacher microservice
    │ SKIP if no Docker      Rust service handles Gmail/Outlook quirks
    │ FAIL → stop            Returns: is_deliverable, is_catch_all, can_connect
⑤ Holehe platforms ──────── Checks email against 120+ platforms (opt-in)
    │ SKIP unless --holehe   Useful on catch-all domains where SMTP is unreliable
Confidence score [0–100]
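
The stop-on-FAIL behaviour in the diagram above can be sketched in a few lines. This is only an illustration under stated assumptions: the names Status, CheckResult, and run_pipeline_sketch are hypothetical, the syntax check is a crude placeholder rather than real RFC 5322 validation, and the blocklist holds just the two domains named above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"
    SKIP = "skip"


@dataclass
class CheckResult:
    name: str
    status: Status
    delta: int = 0  # points contributed to the score on PASS


# Tiny stand-in for the 500+ domain blocklist.
DISPOSABLE = {"mailinator.com", "guerrillamail.com"}


def syntax_check(email: str) -> CheckResult:
    # Placeholder only: requires "local@domain.tld" shape, nothing more.
    local, sep, domain = email.partition("@")
    ok = bool(local) and sep == "@" and "." in domain
    return CheckResult("syntax", Status.PASS if ok else Status.FAIL)


def disposable_check(email: str) -> CheckResult:
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE:
        return CheckResult("disposable", Status.FAIL)
    return CheckResult("disposable", Status.PASS, delta=5)


def run_pipeline_sketch(
    email: str, checks: List[Callable[[str], CheckResult]]
) -> List[CheckResult]:
    results: List[CheckResult] = []
    for check in checks:
        result = check(email)
        results.append(result)
        if result.status is Status.FAIL:
            break  # a FAIL stops the pipeline; later checks never run
    return results
```

Calling run_pipeline_sketch("bob@mailinator.com", [syntax_check, disposable_check]) returns a PASS followed by a FAIL, and any checks listed after the disposable check would not execute.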

Score baseline and deltas

All emails start at 30. Each passing check adds points:

Check        Delta     Condition
Syntax       0         Gate only, no delta
Disposable   +5        Not a disposable domain
DNS          +10       MX records found
Reacher      +20       SMTP accepted, not catch-all
Holehe       +15       Registered on ≥2 platforms
Holehe       +5        Registered on exactly 1 platform
Source hint  +15–35    Website contact page, team page, etc.

Final score is clamped to [0, 100].
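
As a worked illustration of the arithmetic, the baseline, deltas, and clamp can be written as a small function. The function and parameter names here are hypothetical; only the numbers come from the table above.

```python
BASELINE = 30


def holehe_delta(platform_count: int) -> int:
    """+15 for two or more platforms, +5 for exactly one, otherwise 0."""
    if platform_count >= 2:
        return 15
    if platform_count == 1:
        return 5
    return 0


def confidence_score(
    *,
    not_disposable: bool = False,
    mx_found: bool = False,
    smtp_accepted: bool = False,
    catch_all: bool = False,
    holehe_platforms: int = 0,
    source_hint: int = 0,  # 15 to 35 depending on where the email was found
) -> int:
    score = BASELINE
    if not_disposable:
        score += 5
    if mx_found:
        score += 10
    if smtp_accepted and not catch_all:
        score += 20  # Reacher delta applies only off catch-all domains
    score += holehe_delta(holehe_platforms)
    score += source_hint
    return max(0, min(100, score))  # clamp to [0, 100]


# Disposable, DNS, and Reacher all pass: 30 + 5 + 10 + 20 = 65
print(confidence_score(not_disposable=True, mx_found=True, smtp_accepted=True))
```

With three Holehe platforms and a source hint of 35 on top of that, the raw total is 115, which the clamp reduces to 100.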

Catch-all domains

Catch-all domains, which are common on some Google Workspace and Office 365 setups, accept every RCPT TO command, so SMTP verification cannot tell a real mailbox from a nonexistent one. Use --holehe on these: platform presence remains a usable signal when SMTP isn't.


Discovery pipeline

When you run coldreach find, ColdReach checks the cache first. On a miss, all sources execute concurrently:

coldreach find --domain acme.com
        ├─── Cache check ──── HIT → return immediately
        ├─── Source 1: WebCrawlerSource      (homepage + /contact + /team + /about)
        ├─── Source 2: WhoisSource           (registrant contact)
        ├─── Source 3: GitHubSource          (commit author emails)
        ├─── Source 4: RedditSource          (mentions with email patterns)
        ├─── Source 5: SearchEngineSource    (SearXNG → DDG → Brave fallback)
        ├─── Source 6: HarvesterSource       (theHarvester Docker container)
        └─── Source 7: SpiderFootSource      (SpiderFoot Docker container)
        Merge all SourceResult lists
        Pattern generation
        (if --name: infer email format from found emails → generate targeted guesses)
        Deduplicate by email address (keep highest confidence_hint)
        Verification (run_basic_pipeline for each unique email)
        Score + rank results
        Store in cache (full results, before min_confidence filter)
        Display (filter by min_confidence unless --all)

Sources that are unavailable (Docker not running, timeout, etc.) are skipped without failing the run. Each one is reported with a SKIP status in the summary.
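
The concurrent fan-out, skip-on-failure, and deduplication steps can be sketched with asyncio. This is a minimal sketch, not ColdReach's implementation: the SourceResult fields, the three stub sources, and the discover function are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass
class SourceResult:
    email: str
    confidence_hint: int
    source: str


Source = Callable[[str], Awaitable[List[SourceResult]]]


# Stub sources standing in for the real ones.
async def web_crawler(domain: str) -> List[SourceResult]:
    return [SourceResult(f"info@{domain}", 35, "web")]


async def github(domain: str) -> List[SourceResult]:
    return [
        SourceResult(f"info@{domain}", 15, "github"),
        SourceResult(f"dev@{domain}", 15, "github"),
    ]


async def harvester(domain: str) -> List[SourceResult]:
    raise ConnectionError("Docker not running")


async def discover(
    domain: str, sources: List[Source], timeout: float = 5.0
) -> Tuple[List[SourceResult], Dict[str, str]]:
    async def guarded(src: Source) -> Tuple[str, Optional[List[SourceResult]]]:
        try:
            return src.__name__, await asyncio.wait_for(src(domain), timeout)
        except Exception:
            return src.__name__, None  # None marks the source as skipped

    # All sources run concurrently; one failure does not affect the others.
    outcomes = await asyncio.gather(*(guarded(s) for s in sources))

    best: Dict[str, SourceResult] = {}
    summary: Dict[str, str] = {}
    for name, results in outcomes:
        if results is None:
            summary[name] = "SKIP"
            continue
        summary[name] = "OK"
        for r in results:
            key = r.email.lower()
            # Deduplicate by address, keeping the highest confidence_hint.
            if key not in best or r.confidence_hint > best[key].confidence_hint:
                best[key] = r
    return list(best.values()), summary
```

Running asyncio.run(discover("acme.com", [web_crawler, github, harvester])) yields two unique addresses, keeps the hint of 35 for info@acme.com, and records harvester as SKIP in the summary.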


Pattern generation

When a target name is provided (--name "Jane Smith"), ColdReach uses found emails to infer the company's email format before generating guesses.

Example:

Known emails found: ["m.chen@acme.com", "r.jones@acme.com"]
Inferred format: "f.last"   (first initial + dot + last name)
Generated for "Jane Smith":
  - j.smith@acme.com   (inferred format — confidence: higher)
  - jane.smith@acme.com  (companion format — common co-occurrence)

When no known emails exist, ColdReach falls back to the top 3 most common B2B formats: first.last, flast, first.

Supported formats

Format              Example
first.last          jane.smith@acme.com
flast               jsmith@acme.com
first               jane@acme.com
f.last              j.smith@acme.com
firstl              janes@acme.com
last                smith@acme.com
last.first          smith.jane@acme.com
first.last.initial  jane.smith.j@acme.com
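
The formats above, together with the infer-then-generate flow, can be sketched as follows. The shape-based inference is a deliberate simplification and an assumption of this sketch: without knowing the owners' names it recognises only f.last and first.last, it cannot distinguish first.last from last.first, and it does not model companion formats. The names FORMATS, infer_format, and generate are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

# Each format maps (first name, last name) to a local part.
FORMATS: Dict[str, Callable[[str, str], str]] = {
    "first.last": lambda fn, ln: f"{fn}.{ln}",
    "flast": lambda fn, ln: f"{fn[0]}{ln}",
    "first": lambda fn, ln: fn,
    "f.last": lambda fn, ln: f"{fn[0]}.{ln}",
    "firstl": lambda fn, ln: f"{fn}{ln[0]}",
    "last": lambda fn, ln: ln,
    "last.first": lambda fn, ln: f"{ln}.{fn}",
    "first.last.initial": lambda fn, ln: f"{fn}.{ln}.{fn[0]}",
}

FALLBACK = ["first.last", "flast", "first"]

# Simplified shape heuristics (assumption): first match wins.
SHAPES = [
    (re.compile(r"^[a-z]\.[a-z]{2,}$"), "f.last"),
    (re.compile(r"^[a-z]{2,}\.[a-z]{2,}$"), "first.last"),
]


def infer_format(known_emails: List[str]) -> Optional[str]:
    votes: Counter = Counter()
    for email in known_emails:
        local = email.split("@", 1)[0].lower()
        for pattern, name in SHAPES:
            if pattern.match(local):
                votes[name] += 1
                break
    return votes.most_common(1)[0][0] if votes else None


def generate(full_name: str, domain: str, known_emails: List[str]) -> List[str]:
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    inferred = infer_format(known_emails)
    names = [inferred] if inferred else FALLBACK
    return [f"{FORMATS[n](first, last)}@{domain}" for n in names]


print(generate("Jane Smith", "acme.com", ["m.chen@acme.com", "r.jones@acme.com"]))
print(generate("Jane Smith", "acme.com", []))
```

The first call prints only the inferred f.last guess, and the second falls back to the three common formats in the order listed earlier.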

Caching

ColdReach uses a two-layer cache to avoid repeating expensive discovery runs.

coldreach find --domain acme.com
① Check Redis (if available)
        │ HIT → deserialize → return
        │ MISS ↓
② Check SQLite (~/.coldreach/cache.db)
        │ HIT + not expired → promote to Redis → return
        │ MISS ↓
③ Run all sources (expensive)
Store full results in both Redis + SQLite
(TTL: 7 days, configurable via COLDREACH_CACHE_TTL_DAYS)

Important: results are cached before the min_confidence filter is applied. This means a future call with --all will still use the cached full result set without re-querying.

Use --refresh to bypass the cache for one run, or coldreach cache clear to remove entries.
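
The lookup order, promotion step, and cache-before-filter behaviour can be illustrated with a small sketch. It makes several assumptions: a plain dict stands in for Redis (so the fast layer has no expiry of its own here), the SQLite schema is invented, and TwoLayerCache and find are hypothetical names rather than ColdReach's API.

```python
import json
import sqlite3
import time
from typing import Callable, Dict, List, Optional

TTL_SECONDS = 7 * 24 * 3600  # 7 days, matching the default TTL

Results = List[dict]


class TwoLayerCache:
    def __init__(self, db_path: str = ":memory:", ttl: float = TTL_SECONDS):
        self.fast: Dict[str, str] = {}  # stand-in for Redis
        self.ttl = ttl
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "domain TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def get(self, domain: str, now: Optional[float] = None) -> Optional[Results]:
        now = time.time() if now is None else now
        if domain in self.fast:  # layer 1 hit
            return json.loads(self.fast[domain])
        row = self.db.execute(
            "SELECT payload, stored_at FROM cache WHERE domain = ?", (domain,)
        ).fetchone()
        if row and now - row[1] < self.ttl:  # layer 2 hit and not expired
            self.fast[domain] = row[0]  # promote to the fast layer
            return json.loads(row[0])
        return None

    def put(self, domain: str, results: Results, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        payload = json.dumps(results)
        self.fast[domain] = payload
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (domain, payload, now)
        )
        self.db.commit()


def find(
    domain: str,
    cache: TwoLayerCache,
    run_sources: Callable[[str], Results],
    min_confidence: int = 0,
    refresh: bool = False,
) -> Results:
    results = None if refresh else cache.get(domain)
    if results is None:
        results = run_sources(domain)  # the expensive path
        cache.put(domain, results)  # full set is stored before filtering
    return [r for r in results if r["confidence"] >= min_confidence]
```

Because the unfiltered list is what gets stored, a second call with a lower min_confidence is served from the cache and still sees the low-scoring entries.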