
Data Models

Core Pydantic models and enums that flow through the entire ColdReach pipeline — from source discovery through verification and into storage.


Core models

coldreach.core.models

ColdReach core Pydantic models.

These are the primary data structures that flow through the entire pipeline — from source discovery through verification and into storage/output.

Design rules
  • Models are mutable (frozen=False, the default, for update ergonomics), but validation runs on every assignment.
  • All email strings are normalized to lowercase on input.
  • Confidence is always in range [0, 100].
  • Timestamps are always naive datetimes holding UTC values (no tzinfo is attached, which keeps SQLite storage simple).
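
The rules above can be sketched with a small Pydantic v2 model. `ExampleRecord`, its fields, and `utc_now_naive` are hypothetical stand-ins rather than the actual ColdReach classes, and the sketch assumes that `validate_assignment=True` is how validation on assignment is enabled.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


def utc_now_naive() -> datetime:
    """Current UTC time with tzinfo stripped (naive, but UTC by convention)."""
    return datetime.now(timezone.utc).replace(tzinfo=None)


class ExampleRecord(BaseModel):
    """Hypothetical model illustrating the design rules; not a ColdReach class."""

    # Mutable (frozen=False is the default), but every assignment is re-validated.
    model_config = ConfigDict(validate_assignment=True)

    email: str
    confidence: int = Field(default=0, ge=0, le=100)
    checked_at: datetime = Field(default_factory=utc_now_naive)

    @field_validator("email")
    @classmethod
    def lowercase_email(cls, v: str) -> str:
        return v.strip().lower()


record = ExampleRecord(email="  Jane@Example.COM ")
record.confidence = 90  # accepted: within [0, 100]

try:
    record.confidence = 150  # rejected on assignment
except ValidationError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
```

After the failed assignment, `record.confidence` still holds the last valid value.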

VerificationStatus

Bases: StrEnum

Result of the full verification pipeline for one email address.

VALID class-attribute instance-attribute

VALID = 'valid'

SMTP accepted the address and it's not a catch-all.

INVALID class-attribute instance-attribute

INVALID = 'invalid'

Definitively invalid: bad syntax, NXDOMAIN, or SMTP 550.

RISKY class-attribute instance-attribute

RISKY = 'risky'

Passes basic checks but has low-confidence signals.

UNKNOWN class-attribute instance-attribute

UNKNOWN = 'unknown'

Cannot determine — catch-all domain or SMTP unreachable.

CATCH_ALL class-attribute instance-attribute

CATCH_ALL = 'catch_all'

Domain accepts all RCPT TO addresses — unverifiable via SMTP.

DISPOSABLE class-attribute instance-attribute

DISPOSABLE = 'disposable'

Known throwaway / temporary email service.

UNDELIVERABLE class-attribute instance-attribute

UNDELIVERABLE = 'undeliverable'

No MX records — domain cannot receive email.

EmailSource

Bases: StrEnum

Where an email address was discovered.

SourceRecord

Bases: BaseModel

A single discovery event — one source that found one email.

url class-attribute instance-attribute

url = None

The page URL or API endpoint where the email was found.

context class-attribute instance-attribute

context = ''

Surrounding text snippet or metadata that led to the discovery.

EmailRecord

Bases: BaseModel

A single email address with its verification state and discovery sources.

Attributes:

email (str): The email address (normalized to lowercase on input).

confidence (int): Integer in [0, 100]. Higher = more likely to be valid and deliverable.

status (VerificationStatus): Verification status from the pipeline.

sources (list[SourceRecord]): All sources that discovered this address (de-duplicated upstream).

is_catch_all_domain (bool): True if the email's domain accepts all RCPT TO probes, in which case SMTP verification is meaningless.

mx_records (list[str]): MX hostnames for the domain, sorted by priority.

holehe_platforms (list[str]): Platform names where this email was confirmed registered (via Holehe).

checked_at (datetime): When verification was last run.

domain property

domain

The domain part of the email address.

local_part property

local_part

The local (username) part of the email address.
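
The implementation of these two properties is not shown here. A minimal sketch, assuming the address is split on its last "@" (`split_email` is a hypothetical helper, not part of ColdReach):

```python
def split_email(email: str) -> tuple[str, str]:
    """Return (local_part, domain), splitting on the last '@'."""
    local_part, _, domain = email.rpartition("@")
    return local_part, domain


local_part, domain = split_email("jane.doe@example.com")
```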

source_names property

source_names

Deduplicated list of source identifiers that found this email.
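
One common way to deduplicate while keeping first-seen order is `dict.fromkeys`. Whether ColdReach preserves order this way is an assumption, and the source names below are made up for illustration.

```python
def dedupe_preserving_order(names: list[str]) -> list[str]:
    """Drop repeated names, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


names = dedupe_preserving_order(["website", "github", "website", "whois"])
```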

primary_source property

primary_source

The highest-priority source that found this email.

normalise_email classmethod

normalise_email(v)

Strip whitespace and lowercase the address, then reject any value that has no "@" or has "@" as its first or last character.

Source code in coldreach/core/models.py
@field_validator("email")
@classmethod
def normalise_email(cls, v: str) -> str:
    """Lowercase and strip whitespace."""
    v = v.strip().lower()
    if "@" not in v or v.startswith("@") or v.endswith("@"):
        raise ValueError(f"Invalid email format: {v!r}")
    return v
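
To see which inputs the validator accepts, its body can be copied into a standalone function. `is_accepted` is a hypothetical helper added for the demonstration.

```python
def normalise_email(v: str) -> str:
    """Standalone copy of the validator body shown above."""
    v = v.strip().lower()
    if "@" not in v or v.startswith("@") or v.endswith("@"):
        raise ValueError(f"Invalid email format: {v!r}")
    return v


def is_accepted(raw: str) -> bool:
    """Return True if normalise_email does not raise for this input."""
    try:
        normalise_email(raw)
    except ValueError:
        return False
    return True


cleaned = normalise_email("  Jane.Doe@Example.COM\n")

# The check is shallow: only the presence and position of "@" is inspected.
results = {
    raw: is_accepted(raw)
    for raw in ["a@b", "no-at-sign", "@example.com", "jane@", "a@@b"]
}
```

Note that `"a@@b"` passes, so stricter syntax checks have to happen elsewhere in the pipeline.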

confidence_label

confidence_label()

Human-readable confidence tier.

Source code in coldreach/core/models.py
def confidence_label(self) -> str:
    """Human-readable confidence tier."""
    if self.confidence >= 80:
        return "high"
    if self.confidence >= 50:
        return "medium"
    return "low"
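
The tier boundaries are inclusive at 80 and 50. The sketch below restates the method as a plain function (an adaptation for illustration) and evaluates it at the edges.

```python
def confidence_label(confidence: int) -> str:
    """Function form of the method above: map a score to a tier name."""
    if confidence >= 80:
        return "high"
    if confidence >= 50:
        return "medium"
    return "low"


# Evaluate at and around each threshold.
labels = {score: confidence_label(score) for score in (100, 80, 79, 50, 49, 0)}
```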

to_dict

to_dict()

Flat dict suitable for CSV export.

Source code in coldreach/core/models.py
def to_dict(self) -> dict[str, Any]:
    """Flat dict suitable for CSV export."""
    return {
        "email": self.email,
        "confidence": self.confidence,
        "status": self.status.value,
        "sources": ", ".join(self.source_names),
        "is_catch_all": self.is_catch_all_domain,
        "holehe_platforms": ", ".join(self.holehe_platforms),
        "checked_at": self.checked_at.isoformat(),
    }
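
A dict of this shape can be written with the standard `csv.DictWriter`. The row below is made-up sample data with the same keys that `to_dict()` returns.

```python
import csv
import io

# Sample row shaped like the output of to_dict(); values are illustrative.
rows = [
    {
        "email": "jane@example.com",
        "confidence": 92,
        "status": "valid",
        "sources": "website, github",
        "is_catch_all": False,
        "holehe_platforms": "",
        "checked_at": "2024-01-01T12:00:00",
    },
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Since list fields are joined with ", ", the writer quotes those cells so the embedded commas do not split the column.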

DomainResult

Bases: BaseModel

All email addresses discovered for one domain.

Attributes:

domain (str): The domain that was scanned (e.g. "stripe.com").

company_name (str | None): Human-readable company name if known.

emails (list[EmailRecord]): Discovered and verified email addresses.

is_catch_all (bool): True if the domain's mail server accepts all RCPT TO probes.

mx_records (list[str]): MX records for the domain.

crawled_at (datetime): Timestamp when the scan completed.

best_email property

best_email

Return the email with the highest confidence score.

sorted_emails

sorted_emails(min_confidence=0)

Return emails sorted by confidence descending.

Parameters:

min_confidence (int, default 0): Exclude emails below this confidence threshold.

Source code in coldreach/core/models.py
def sorted_emails(self, min_confidence: int = 0) -> list[EmailRecord]:
    """Return emails sorted by confidence descending.

    Parameters
    ----------
    min_confidence:
        Exclude emails below this confidence threshold.
    """
    filtered = [e for e in self.emails if e.confidence >= min_confidence]
    return sorted(filtered, key=lambda e: e.confidence, reverse=True)
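
The sketch below runs the same filter-and-sort logic on a minimal stand-in class. `Email` is hypothetical and carries only the two fields the method uses.

```python
from dataclasses import dataclass


@dataclass
class Email:
    """Minimal stand-in for EmailRecord (only the fields the sort uses)."""

    email: str
    confidence: int


def sorted_emails(emails: list[Email], min_confidence: int = 0) -> list[Email]:
    """Same logic as the method above, written as a free function."""
    filtered = [e for e in emails if e.confidence >= min_confidence]
    return sorted(filtered, key=lambda e: e.confidence, reverse=True)


emails = [
    Email("info@example.com", 40),
    Email("jane@example.com", 90),
    Email("sales@example.com", 90),
    Email("old@example.com", 10),
]

top = [e.email for e in sorted_emails(emails, min_confidence=40)]
```

The threshold is inclusive, and because `sorted` is stable, records with equal confidence keep their original relative order.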

add_email

add_email(record)

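Add an email record unless one with the same address is already tracked. In the code shown, a duplicate is dropped and the existing record is left unchanged; no fields are merged.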

Source code in coldreach/core/models.py
def add_email(self, record: EmailRecord) -> None:
    """Add or merge an email record, avoiding exact duplicates."""
    for existing in self.emails:
        if existing.email == record.email:
            return  # already tracked
    self.emails.append(record)
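
If merging duplicates is wanted, one possible policy is to keep the higher confidence and union the sources. This is a hypothetical variant, not ColdReach's behavior; `Record` and `add_or_merge` are illustrative names, and sources are plain strings here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Stand-in for EmailRecord; `sources` holds plain strings here."""

    email: str
    confidence: int = 0
    sources: list[str] = field(default_factory=list)


def add_or_merge(emails: list[Record], record: Record) -> None:
    """Hypothetical policy: keep max confidence, union sources in order."""
    for existing in emails:
        if existing.email == record.email:
            existing.confidence = max(existing.confidence, record.confidence)
            # dict.fromkeys removes repeats while keeping first-seen order.
            existing.sources = list(dict.fromkeys(existing.sources + record.sources))
            return
    emails.append(record)


tracked: list[Record] = []
add_or_merge(tracked, Record("jane@example.com", 60, ["website"]))
add_or_merge(tracked, Record("jane@example.com", 85, ["github", "website"]))
```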

Exceptions

coldreach.exceptions

ColdReach custom exceptions.

Hierarchy

ColdReachError
├── ConfigError              bad or missing configuration
├── SourceError              data source (scraper / API) failed
│   └── RateLimitError       upstream rate limit hit
├── VerificationError        error during email verification
└── ServiceUnavailableError  a Docker service is not reachable
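
With a hierarchy like this, handlers should list the most specific class first. The sketch uses simplified stand-in classes (constructors omitted), and the handling policy returned by `classify` is hypothetical.

```python
class ColdReachError(Exception):
    """Stand-in for the base exception."""


class SourceError(ColdReachError):
    """Stand-in for a failed data source."""


class RateLimitError(SourceError):
    """Stand-in with the constructor arguments omitted."""


def classify(exc: Exception) -> str:
    """Map an exception to an illustrative handling decision."""
    try:
        raise exc
    except RateLimitError:  # most specific first
        return "back off and retry"
    except SourceError:
        return "skip this source"
    except ColdReachError:  # catch-all for the rest of the hierarchy
        return "abort the run"
```

Exceptions outside the hierarchy are not caught and propagate to the caller.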

ColdReachError

Bases: Exception

Base exception for all ColdReach errors.

ConfigError

Bases: ColdReachError

Raised when configuration is invalid or missing.

SourceError

Bases: ColdReachError

Raised when a data source fails to return results.

RateLimitError

RateLimitError(service, retry_after=None)

Bases: SourceError

Raised when an upstream service rate-limits the request.

Attributes:

service: Human-readable service name (e.g. "SearXNG").

retry_after: Suggested number of seconds to wait before retrying, if provided by the upstream service.

Source code in coldreach/exceptions.py
def __init__(self, service: str, retry_after: int | None = None) -> None:
    self.service = service
    self.retry_after = retry_after
    msg = f"Rate limited by {service}"
    if retry_after is not None:
        msg += f" — retry after {retry_after}s"
    super().__init__(msg)

VerificationError

Bases: ColdReachError

Raised when the verification pipeline encounters an unrecoverable error.

ServiceUnavailableError

ServiceUnavailableError(service, url)

Bases: ColdReachError

Raised when a required Docker service cannot be reached.

Attributes:

service: Short service name used in docker-compose.yml (e.g. "reacher").

url: The URL that was attempted.

Source code in coldreach/exceptions.py
def __init__(self, service: str, url: str) -> None:
    self.service = service
    self.url = url
    super().__init__(
        f"Service '{service}' is not available at {url}.\n"
        f"Start it with:  docker compose up {service}"
    )
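
The sketch below shows the message a caller sees. The class is a local stand-in with the same constructor (base class simplified), and the URL is a made-up example.

```python
class ServiceUnavailableError(Exception):
    """Stand-in with the constructor shown above (base class simplified)."""

    def __init__(self, service: str, url: str) -> None:
        self.service = service
        self.url = url
        super().__init__(
            f"Service '{service}' is not available at {url}.\n"
            f"Start it with:  docker compose up {service}"
        )


try:
    raise ServiceUnavailableError("reacher", "http://localhost:8080")
except ServiceUnavailableError as exc:
    message = str(exc)
    failed_service = exc.service  # available for programmatic handling
```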