Pattern Generation API

Generates likely email addresses from a person's name and a company domain. When known emails are available, the domain's format is inferred first so only targeted guesses are produced.


Pattern generation

coldreach.generate.patterns

Email pattern generator — produces candidate addresses from a person's name + domain.

Given a full name like "John Smith" and domain "acme.com", generates the 12 most common professional email formats used by B2B companies:

john@acme.com
john.smith@acme.com
jsmith@acme.com
j.smith@acme.com
smithj@acme.com
smith.j@acme.com
johnsmith@acme.com
smith@acme.com
johns@acme.com
john-smith@acme.com
j-smith@acme.com
js@acme.com

Names are normalised: accents stripped, hyphenated names split, suffixes (Jr, Sr, III, etc.) removed before pattern expansion.
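
The normalisation step can be sketched as follows. This is a minimal illustration, not the library's `_parse_name`, whose source is not shown on this page: the function name `normalise_name`, the `SUFFIXES` set, and the choice to keep the first and last tokens of a multi-token name are all assumptions.

```python
import re
import unicodedata

# Hypothetical suffix list; the real module's list is not shown on this page.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalise_name(full_name: str) -> tuple[str, str]:
    """Return (first, last) as lowercase ASCII tokens, or ("", "") if empty."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", full_name)
    ascii_name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Split on whitespace and hyphens so "Anne-Marie" becomes two tokens.
    tokens = [t for t in re.split(r"[\s\-]+", ascii_name.lower()) if t]
    # Strip punctuation such as the dot in "Jr.", then drop suffix tokens.
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]
    tokens = [t for t in tokens if t and t not in SUFFIXES]
    if not tokens:
        return "", ""
    if len(tokens) == 1:
        return tokens[0], ""
    # Assumption: use the first and last remaining tokens.
    return tokens[0], tokens[-1]


print(normalise_name("José García Jr."))
```

Under these assumptions a single-token name yields an empty last name, which matches the single-token branch in `generate_patterns`.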

Usage
from coldreach.generate.patterns import generate_patterns

candidates = generate_patterns("John Smith", "acme.com")
# → [EmailPattern(email="john@acme.com", format_name="first"), ...]

EmailPattern dataclass

EmailPattern(email, format_name)

A single generated email candidate.

Attributes:

Name Type Description
email str

The generated email address (already lowercased).

format_name str

Short identifier for the pattern (e.g. "first.last").

generate_patterns

generate_patterns(full_name, domain)

Generate candidate email addresses for full_name at domain.

Parameters:

Name Type Description Default
full_name str

The person's full name, e.g. "John Smith".

required
domain str

The company domain, e.g. "acme.com".

required

Returns:

Type Description
list[EmailPattern]

Deduplicated list of candidates ordered from most-common to least. Empty list if name cannot be parsed (e.g. empty string).

Examples:

>>> patterns = generate_patterns("John Smith", "acme.com")
>>> [p.format_name for p in patterns[:3]]
['first', 'first.last', 'flast']
Source code in coldreach/generate/patterns.py
def generate_patterns(full_name: str, domain: str) -> list[EmailPattern]:
    """Generate candidate email addresses for *full_name* at *domain*.

    Parameters
    ----------
    full_name:
        The person's full name, e.g. ``"John Smith"``.
    domain:
        The company domain, e.g. ``"acme.com"``.

    Returns
    -------
    list[EmailPattern]
        Deduplicated list of candidates ordered from most-common to least.
        Empty list if name cannot be parsed (e.g. empty string).

    Examples
    --------
    >>> patterns = generate_patterns("John Smith", "acme.com")
    >>> [p.format_name for p in patterns[:3]]
    ['first', 'first.last', 'flast']
    """
    domain = domain.strip().lower().removeprefix("www.")
    first, last = _parse_name(full_name)

    if not first:
        return []

    f = first
    l = last  # noqa: E741
    fi = f[0] if f else ""
    li = l[0] if l else ""

    # Build candidates in priority order (most common B2B formats first)
    raw: list[tuple[str, str]] = []

    if l:
        raw = [
            (f"{f}", "first"),
            (f"{f}.{l}", "first.last"),
            (f"{fi}{l}", "flast"),
            (f"{fi}.{l}", "f.last"),
            (f"{l}{fi}", "lastf"),
            (f"{l}.{fi}", "last.f"),
            (f"{f}{l}", "firstlast"),
            (f"{l}", "last"),
            (f"{f}{li}", "firsts"),  # e.g. johns (first + last initial)
            (f"{f}-{l}", "first-last"),
            (f"{fi}-{l}", "f-last"),
            (f"{fi}{li}", "initials"),
        ]
    else:
        # Single-token name — only generate what makes sense
        raw = [
            (f"{f}", "first"),
        ]

    # Deduplicate while preserving order
    seen: set[str] = set()
    results: list[EmailPattern] = []
    for local, fmt in raw:
        if not local or local in seen:
            continue
        seen.add(local)
        results.append(EmailPattern(email=f"{local}@{domain}", format_name=fmt))

    return results
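
The deduplication loop matters when several templates expand to the same local part, for example when the first name is a single letter. The sketch below reproduces just that loop in isolation; `dedupe_locals` is a hypothetical helper name, and only a subset of the templates is used to keep the example short.

```python
def dedupe_locals(raw: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (local, format_name) pair for each distinct local part."""
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for local, fmt in raw:
        if not local or local in seen:
            continue
        seen.add(local)
        kept.append((local, fmt))
    return kept


# With first="j" the first initial equals the whole first name,
# so several templates collide.
first, last = "j", "smith"
initial = first[0]
raw = [
    (f"{first}", "first"),
    (f"{first}.{last}", "first.last"),
    (f"{initial}{last}", "flast"),
    (f"{initial}.{last}", "f.last"),    # same local part as first.last
    (f"{first}{last}", "firstlast"),    # same local part as flast
]
kept = dedupe_locals(raw)
print(kept)
```

Because the earlier entry wins, the surviving `format_name` is the higher-priority one, and the lower-priority format name does not appear in the result at all.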

generate_role_emails

generate_role_emails(domain)

Generate common role-based email candidates for domain.

Returns candidates like info@domain.com, sales@domain.com. These are low-confidence guesses — always verify before using.

Parameters:

Name Type Description Default
domain str

The company domain, e.g. "acme.com".

required

Returns:

Type Description
list[EmailPattern]

Role email candidates with format_name like "role:info".

Source code in coldreach/generate/patterns.py
def generate_role_emails(domain: str) -> list[EmailPattern]:
    """Generate common role-based email candidates for *domain*.

    Returns candidates like ``info@domain.com``, ``sales@domain.com``.
    These are low-confidence guesses — always verify before using.

    Parameters
    ----------
    domain:
        The company domain, e.g. ``"acme.com"``.

    Returns
    -------
    list[EmailPattern]
        Role email candidates with ``format_name`` like ``"role:info"``.
    """
    domain = domain.strip().lower().removeprefix("www.")
    return [
        EmailPattern(email=f"{role}@{domain}", format_name=f"role:{role}") for role in _ROLE_LOCALS
    ]
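
A standalone sketch of the same idea, showing the effect of the domain normalisation. The `EmailPattern` class here is a stand-in that mirrors the documented attributes, and `ROLE_LOCALS` is a hypothetical list, since the contents of `_ROLE_LOCALS` are not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class EmailPattern:
    # Stand-in mirroring the documented attributes.
    email: str
    format_name: str


# Hypothetical role list; the real _ROLE_LOCALS is not shown on this page.
ROLE_LOCALS = ("info", "sales", "contact")


def role_emails(domain: str) -> list[EmailPattern]:
    # Same normalisation as the source: trim, lowercase, drop a leading "www.".
    domain = domain.strip().lower().removeprefix("www.")
    return [EmailPattern(f"{role}@{domain}", f"role:{role}") for role in ROLE_LOCALS]


candidates = role_emails("  WWW.Acme.com ")
print(candidates[0])
```

Note that `str.removeprefix` requires Python 3.9 or later.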

most_likely_format

most_likely_format(known_emails, domain)

Infer the most common email format from a list of known addresses.

Useful when you already have one confirmed email at a domain and want to generate candidates for other people using the same format.

Parameters:

Name Type Description Default
known_emails list[str]

List of confirmed email addresses at the domain.

required
domain str

The domain to analyse.

required

Returns:

Type Description
str | None

The format_name of the most common pattern, or None if it cannot be determined.

Examples:

>>> most_likely_format(["john.smith@acme.com", "jane.doe@acme.com"], "acme.com")
'first.last'
Source code in coldreach/generate/patterns.py
def most_likely_format(known_emails: list[str], domain: str) -> str | None:
    """Infer the most common email format from a list of known addresses.

    Useful when you already have one confirmed email at a domain and want
    to generate candidates for other people using the same format.

    Parameters
    ----------
    known_emails:
        List of confirmed email addresses at the domain.
    domain:
        The domain to analyse.

    Returns
    -------
    str | None
        The ``format_name`` of the most common pattern, or ``None`` if
        it cannot be determined.

    Examples
    --------
    >>> most_likely_format(["john.smith@acme.com", "jane.doe@acme.com"], "acme.com")
    'first.last'
    """
    from collections import Counter

    format_counts: Counter[str] = Counter()

    for email in known_emails:
        if "@" not in email:
            continue
        local, email_domain = email.lower().rsplit("@", 1)
        if email_domain != domain:
            continue

        # Classify the local part into a pattern
        if "." in local:
            parts = local.split(".", 1)
            if len(parts[0]) == 1:
                format_counts["f.last"] += 1
            elif len(parts[1]) == 1:
                format_counts["last.f"] += 1
            else:
                format_counts["first.last"] += 1
        elif "-" in local:
            parts = local.split("-", 1)
            if len(parts[0]) == 1:
                format_counts["f-last"] += 1
            else:
                format_counts["first-last"] += 1
        elif len(local) <= 3:
            format_counts["initials"] += 1
        elif len(local) <= 6:
            format_counts["flast"] += 1
        else:
            format_counts["firstlast"] += 1

    if not format_counts:
        return None
    return format_counts.most_common(1)[0][0]
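
To see how individual addresses are classified, the branch logic above can be lifted into a small function and run on a few local parts. `classify_local` is a hypothetical name used only for this illustration; the branches are copied from the source shown.

```python
def classify_local(local: str) -> str:
    """Mirror of the branch logic in most_likely_format for one local part."""
    if "." in local:
        head, tail = local.split(".", 1)
        if len(head) == 1:
            return "f.last"
        if len(tail) == 1:
            return "last.f"
        return "first.last"
    if "-" in local:
        head, _ = local.split("-", 1)
        return "f-last" if len(head) == 1 else "first-last"
    # Without a separator, only the length is used.
    if len(local) <= 3:
        return "initials"
    if len(local) <= 6:
        return "flast"
    return "firstlast"


samples = ["john.smith", "j.smith", "smith.j", "jsmith", "johnsmith", "john", "info"]
labels = {s: classify_local(s) for s in samples}
for sample, label in labels.items():
    print(f"{sample:12} {label}")
```

Two things follow from the source as shown. Separator-free local parts are classified purely by length, so `john` and `info` both count as `"flast"`, and the formats `"first"`, `"last"`, `"lastf"` and `"firsts"` are never returned. The domain comparison is also exact: each email is lowercased, but the `domain` argument is not normalised, so it should be passed in lowercase without a `www.` prefix.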

Format learner

coldreach.generate.learner

Domain email format learner.

Infers a company's email format from confirmed addresses at that domain, then generates targeted candidates for a specific person — only the format(s) that match the domain's known pattern.

This avoids the shotgun approach of generating all 12 variants and running each through SMTP verification (expensive and likely to trigger rate limits).

Confidence tiers:

- Known format match → confidence_hint = 10 (format confirmed from real emails)
- Blind guess → confidence_hint = 5 (no known emails, guessing top-3 formats)

Example
from coldreach.generate.learner import targeted_patterns

# Domain uses "first.last" format (inferred from jane.doe@acme.com)
patterns = targeted_patterns("John Smith", "acme.com", ["jane.doe@acme.com"])
# → [EmailPattern("john.smith@acme.com", "first.last")]

# Domain format unknown — return top-3 guesses
patterns = targeted_patterns("John Smith", "acme.com", [])
# → [EmailPattern("john.smith@acme.com", "first.last"),
#    EmailPattern("jsmith@acme.com", "flast"),
#    EmailPattern("john@acme.com", "first")]

learn_format

learn_format(known_emails, domain)

Return the most likely email format_name for domain.

Analyses the local parts of known_emails and returns the format_name (e.g. "first.last", "flast") that best describes them.

Returns None if the format cannot be determined (too few emails, or local parts are too ambiguous like role addresses info@, hr@).

Source code in coldreach/generate/learner.py
def learn_format(known_emails: list[str], domain: str) -> str | None:
    """Return the most likely email format_name for *domain*.

    Analyses the local parts of *known_emails* and returns the format_name
    (e.g. ``"first.last"``, ``"flast"``) that best describes them.

    Returns ``None`` if the format cannot be determined (too few emails,
    or local parts are too ambiguous like role addresses ``info@``, ``hr@``).
    """
    fmt = most_likely_format(known_emails, domain)
    if fmt:
        logger.debug("Format learner: %s → %s (from %d email(s))", domain, fmt, len(known_emails))
    else:
        logger.debug("Format learner: %s → unknown (from %d email(s))", domain, len(known_emails))
    return fmt

targeted_patterns

targeted_patterns(
    full_name, domain, known_emails, *, max_fallback=3
)

Generate targeted email candidates for full_name at domain.

When a domain format can be inferred from known_emails, only patterns matching that format (plus close companions) are returned.

When the format is unknown, the max_fallback most common B2B formats are returned.

Parameters:

Name Type Description Default
full_name str

Person's full name, e.g. "John Smith".

required
domain str

Company domain, e.g. "acme.com".

required
known_emails list[str]

Confirmed email addresses at domain (used to infer format).

required
max_fallback int

Number of fallback formats to try when domain format is unknown.

3

Returns:

Type Description
list[EmailPattern]

Targeted candidates, deduplicated, ordered by confidence. Empty if full_name cannot be parsed.

Source code in coldreach/generate/learner.py
def targeted_patterns(
    full_name: str,
    domain: str,
    known_emails: list[str],
    *,
    max_fallback: int = 3,
) -> list[EmailPattern]:
    """Generate targeted email candidates for *full_name* at *domain*.

    When a domain format can be inferred from *known_emails*, only patterns
    matching that format (plus close companions) are returned.

    When the format is unknown, the *max_fallback* most common B2B formats
    are returned.

    Parameters
    ----------
    full_name:
        Person's full name, e.g. ``"John Smith"``.
    domain:
        Company domain, e.g. ``"acme.com"``.
    known_emails:
        Confirmed email addresses at *domain* (used to infer format).
    max_fallback:
        Number of fallback formats to try when domain format is unknown.

    Returns
    -------
    list[EmailPattern]
        Targeted candidates, deduplicated, ordered by confidence.
        Empty if *full_name* cannot be parsed.
    """
    all_patterns = generate_patterns(full_name, domain)
    if not all_patterns:
        return []

    # Index patterns by format_name for quick lookup
    by_format: dict[str, EmailPattern] = {p.format_name: p for p in all_patterns}

    inferred = learn_format(known_emails, domain) if known_emails else None

    if inferred and inferred in by_format:
        # Pick the inferred format + any companions
        selected_formats = [inferred] + [
            f for f in _COMPANION_FORMATS.get(inferred, []) if f in by_format
        ]
    else:
        # No known format — use fallback list
        selected_formats = [f for f in _FALLBACK_FORMATS if f in by_format][:max_fallback]

    # Build final list, preserving order, deduplicating by email address
    seen_emails: set[str] = set()
    result: list[EmailPattern] = []
    for fmt in selected_formats:
        pat = by_format.get(fmt)
        if pat and pat.email not in seen_emails:
            seen_emails.add(pat.email)
            result.append(pat)

    logger.debug(
        "Learner generated %d candidate(s) for '%s' at %s (inferred: %s)",
        len(result),
        full_name,
        domain,
        inferred or "unknown",
    )
    return result
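
The selection step can be sketched in isolation as below. `select_formats` is a hypothetical helper, and both tables are assumptions: the contents of `_COMPANION_FORMATS` and `_FALLBACK_FORMATS` are not shown on this page, so the companion pairs are invented for illustration and only the first three fallback entries follow the order in the usage example for this module.

```python
from typing import Optional

# Hypothetical tables standing in for _COMPANION_FORMATS and _FALLBACK_FORMATS.
COMPANION_FORMATS = {"flast": ["f.last"], "f.last": ["flast"]}
FALLBACK_FORMATS = ["first.last", "flast", "first", "firstlast"]


def select_formats(
    available: set[str], inferred: Optional[str], max_fallback: int = 3
) -> list[str]:
    """Pick format names following the branch structure of targeted_patterns."""
    if inferred and inferred in available:
        # Inferred format first, then any companions that were generated.
        companions = [f for f in COMPANION_FORMATS.get(inferred, []) if f in available]
        return [inferred] + companions
    # Unknown or unavailable format: take the top fallback formats.
    return [f for f in FALLBACK_FORMATS if f in available][:max_fallback]


available = {"first", "first.last", "flast", "f.last", "firstlast"}
known = select_formats(available, "flast")
blind = select_formats(available, None)
single = select_formats({"first"}, "first.last")
print(known, blind, single)
```

The last call illustrates a detail of the source: if the inferred format was not generated for this person (for example a single-token name, or a format removed by deduplication), the function falls back to the generic list instead of returning nothing.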