
re: avoid per-call mark-array allocation for patterns with no capturing groups #150717

@gaborbernat


Feature or enhancement

Every match, search, or fullmatch call on a pattern with no capturing groups allocates the capture-group mark array, then frees it without ever reading it. Group-less patterns are common in validation and scanning code, so this wasted work runs often.

Examples:

  • Checking the format of millions of records during an import, e.g. re.match(r"\d{4}-\d{2}-\d{2}", value).
  • Scanning each log line with re.search(r"ERROR|WARN", line).
  • Routers and frameworks testing many small patterns per request.
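To check whether a given pattern falls into the group-less category, the documented `Pattern.groups` attribute can be inspected. The following is a minimal sketch; the helper name `has_capturing_groups` and the extra sample patterns are illustrative assumptions, not part of the proposal.

```python
import re


def has_capturing_groups(pattern: str) -> bool:
    """Return True if the compiled pattern defines at least one capturing group."""
    # Pattern.groups is the number of capturing groups in the pattern.
    return re.compile(pattern).groups > 0


# Patterns from the examples above: no capturing groups.
DATE = r"\d{4}-\d{2}-\d{2}"
LEVEL = r"ERROR|WARN"

# Hypothetical variants for comparison.
NON_CAPTURING = r"(?:ERROR|WARN): "   # (?:...) does not capture
WITH_GROUP = r"(\d{4})-\d{2}-\d{2}"   # one capturing group
```

Non-capturing groups such as `(?:...)` do not count toward `Pattern.groups`, so patterns written that way would presumably also qualify.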

Proposed change: skip that allocation when a pattern has no capturing groups. Patterns with groups stay untouched, and results are identical.
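As a rough illustration of what "results are identical" would have to mean from Python code, the sketch below collects the publicly observable parts of a match for a group-less pattern. The helper `observable_result` and the sample input are assumptions made for illustration; this is not the C-level change itself.

```python
import re


def observable_result(pattern: str, text: str):
    """Collect the user-visible parts of a search result, or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # For a group-less pattern, groups() is empty and lastindex is None;
    # only the whole-match text and its span carry information.
    return (m.group(0), m.span(), m.groups(), m.lastindex)


result = observable_result(r"ERROR|WARN", "12:00 WARN disk")
# result == ("WARN", (6, 10), (), None)
```

A comparison of such tuples before and after the change, across a set of group-less patterns, is one way a regression test could be framed.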

On a local optimized build, re.match and re.search run 11 to 13 percent faster for group-less patterns on short inputs, with no change for patterns that use groups.
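A measurement of this kind could be approximated with `timeit`. This is a minimal sketch and not the benchmark used for the numbers above; the helper `seconds_per_call`, the sample inputs, and the loop counts are all assumptions. Absolute timings depend on the build and machine, so only the before/after ratio on the same setup is meaningful.

```python
import re
import timeit


def seconds_per_call(pattern: str, text: str, number: int = 200_000) -> float:
    """Return the best observed time per Pattern.match call, in seconds."""
    compiled = re.compile(pattern)
    # Time only the bound match call; take the minimum of several repeats
    # to reduce noise from other processes.
    timings = timeit.repeat(
        "match(text)",
        globals={"match": compiled.match, "text": text},
        number=number,
        repeat=5,
    )
    return min(timings) / number


if __name__ == "__main__":
    groupless = seconds_per_call(r"\d{4}-\d{2}-\d{2}", "2024-01-31")
    grouped = seconds_per_call(r"(\d{4})-(\d{2})-(\d{2})", "2024-01-31")
    print(f"group-less: {groupless * 1e9:.1f} ns/call")
    print(f"grouped:    {grouped * 1e9:.1f} ns/call")
```

Running the same script on builds with and without the change, and comparing the group-less figure, would give a comparable percentage.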


Labels: 3.16, performance, topic-regex, type-feature
