CWE-625: Permissive Regular Expression — Weakness Reference

01 Summary

A permissive regular expression accepts a wider range of input than intended, allowing malicious or unexpected data to pass validation checks. It typically arises when a pattern uses overly broad character classes or wildcards, leaves metacharacters such as . unescaped, or is not anchored to the start and end of the input. Attackers can then bypass security controls that rely on the regex to filter dangerous input.

02 How It Happens

Regular expressions are commonly used to validate email addresses, URLs, file names, phone numbers, and other structured data. When a regex pattern is written without sufficient specificity—for example, using . to match any character instead of a defined set, or forgetting to anchor the pattern with ^ and $—it becomes permissive. An attacker can craft input that technically matches the pattern but contains unexpected or malicious content. For instance, a regex meant to validate a domain name might accidentally accept input with embedded newlines, special characters, or path traversal sequences if not carefully constructed.
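As a rough illustration of the domain-name case described above, the following sketch compares a loose check with a stricter one. The function names and both patterns are assumptions made for this example, and the stricter pattern is not a complete hostname validator.

```python
import re

# Hypothetical patterns for illustration only.
# Loose: not anchored, and the "." is an unescaped wildcard.
LOOSE_DOMAIN = re.compile(r"[a-z0-9-]+.[a-z]+")
# Stricter: literal dots between labels, applied with fullmatch below.
STRICT_DOMAIN = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}")

def is_domain_loose(value: str) -> bool:
    # re.search succeeds if the pattern occurs anywhere in the string.
    return LOOSE_DOMAIN.search(value) is not None

def is_domain_strict(value: str) -> bool:
    # re.fullmatch requires the entire string to match the pattern.
    return STRICT_DOMAIN.fullmatch(value) is not None

samples = [
    "example.com",
    "example.com/../../etc/passwd",   # path traversal appended
    "example.com\nX-Injected: 1",     # embedded newline
    "example_com",                    # "_" satisfies the unescaped "."
]
for s in samples:
    print(repr(s), is_domain_loose(s), is_domain_strict(s))
```

The loose check accepts all four samples, while the stricter one accepts only the first. The difference comes from two changes: escaping the dot and requiring the whole string to match.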

03 Real-World Impact

Permissive regex validation can lead to injection attacks, path traversal, header injection, or bypass of access controls. For example, if an email validation regex is too loose, an attacker might inject newlines to craft a malicious email header. If a file name regex doesn't properly restrict characters, an attacker could upload a file with a path traversal sequence (e.g., ../../../etc/passwd). In authentication contexts, a permissive regex on usernames or tokens could allow attackers to forge credentials or bypass rate-limiting checks.

04 Vulnerable & Fixed Patterns


Vulnerable pattern (Python)

import re

def validate_email(user_input):
    pattern = r"[a-zA-Z0-9]+@[a-zA-Z0-9]+"
    if re.match(pattern, user_input):
        return True
    return False

# Attacker input: "admin@example.com\nBcc: attacker@evil.com"
# This passes validation because the pattern doesn't anchor the end
email = "admin@example.com\nBcc: attacker@evil.com"
print(validate_email(email))  # Returns True — dangerous!

Why it's vulnerable:
re.match anchors only at the start of the string, and the pattern has no end anchor, so the match stops at "admin@example" and everything after it, including the newline and the injected Bcc header, is never checked. The pattern also doesn't require a dot-separated domain, so a value such as "a@b" passes.

Fixed pattern (Python)

import re

# re.fullmatch requires the whole string to match. Using "$" with re.match
# would still accept a single trailing newline.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def validate_email(user_input):
    return EMAIL_PATTERN.fullmatch(user_input) is not None

# Attacker input: "admin@example.com\nBcc: attacker@evil.com"
# This is now rejected because the newline and everything after it must also match
email = "admin@example.com\nBcc: attacker@evil.com"
print(validate_email(email))  # Prints False
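One subtlety worth checking in Python: without re.MULTILINE, $ matches at the end of the string and also just before a single trailing newline. The sketch below reuses the character classes from the fixed example to compare a ^...$ pattern under re.match with the same expression under re.fullmatch. The helper names are assumptions made for this illustration.

```python
import re

# Same character classes as the fixed example, with conventional ^...$ anchors.
ANCHORED = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
# The same expression without anchors, for use with re.fullmatch.
BARE = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def check_with_dollar(value: str) -> bool:
    return re.match(ANCHORED, value) is not None

def check_with_fullmatch(value: str) -> bool:
    return re.fullmatch(BARE, value) is not None

trailing_newline = "admin@example.com\n"
print(check_with_dollar(trailing_newline))     # True: $ matches before the final "\n"
print(check_with_fullmatch(trailing_newline))  # False: the whole string must match
```

Ending the pattern with \Z instead of $ has the same effect as re.fullmatch here, since \Z matches only at the very end of the string.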

Vulnerable pattern (PHP)

<?php
function validate_filename($user_input) {
    $pattern = "/^[a-zA-Z0-9]+/";
    if (preg_match($pattern, $user_input)) {
        return true;
    }
    return false;
}

// Attacker input: "document.txt../../../etc/passwd"
// This passes because the pattern doesn't anchor the end
$filename = "document.txt../../../etc/passwd";
if (validate_filename($filename)) {
    // File is "validated" but contains path traversal
    move_uploaded_file($_FILES['upload']['tmp_name'], "/uploads/" . $filename);
}
?>

Why it's vulnerable:
The ^ anchors the start, but nothing anchors the end, so the match succeeds on the leading "document" and the rest of the filename is never checked. Path traversal sequences and other dangerous characters pass through.

Fixed pattern (PHP)

<?php
function validate_filename($user_input) {
    // \A and \z anchor to the true start and end of the string; "$" would
    // also match just before a trailing newline. The first character must be
    // alphanumeric, which rejects ".", ".." and hidden dotfiles.
    $pattern = '/\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/';
    return preg_match($pattern, $user_input) === 1;
}

// Attacker input: "document.txt../../../etc/passwd"
// This is now rejected because "/" is not in the allowed set
$filename = "document.txt../../../etc/passwd";
if (validate_filename($filename)) {
    move_uploaded_file($_FILES['upload']['tmp_name'], "/uploads/" . $filename);
} else {
    // Reject the upload
    die("Invalid filename");
}
?>

05 Prevention Checklist

Always anchor regex patterns to the start and end of the input so the entire string must match, not just a substring. In many engines, including Python and PCRE, $ also matches just before a trailing newline, so prefer a full-match function or a strict end-of-string anchor such as \Z (Python) or \z (PCRE).

Be explicit about allowed characters. Use character classes like [a-zA-Z0-9] instead of . (which by default matches any character except a newline), and escape . when a literal dot is intended.

Test regex patterns thoroughly with both valid and invalid inputs, including edge cases like newlines, special characters, and boundary conditions.

Use allowlists, not blocklists. Define exactly what is allowed rather than trying to block what isn't.

Consider using a dedicated validation library for common patterns (email, URL, phone) rather than writing regex from scratch.

Document the intent of each regex so future maintainers understand what it should and shouldn't match.
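The testing advice in the checklist can be sketched as a small table of cases run against a validator. Everything here is hypothetical: the allowlist rule, the 64-character limit, and the function names are assumptions chosen for the example, not a recommended policy.

```python
import re

# Hypothetical allowlist for upload file names; the exact rule is an assumption.
FILENAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")

def is_safe_filename(name: str) -> bool:
    """Return True only if the whole name matches the allowlist."""
    return FILENAME.fullmatch(name) is not None

# Each case pairs an input with the result the validator is expected to give.
CASES = [
    ("report.pdf", True),
    ("report-2024_v2.tar.gz", True),
    ("", False),                      # empty input
    ("report.pdf\n", False),          # trailing newline
    ("../../etc/passwd", False),      # path traversal
    ("report.pdf\x00.png", False),    # embedded NUL byte
    (".htaccess", False),             # leading dot
    ("a" * 65, False),                # over the length limit
]

def run_cases() -> list:
    """Return a description of every case whose result differs from expectations."""
    failures = []
    for value, expected in CASES:
        if is_safe_filename(value) != expected:
            failures.append(f"{value!r}: expected {expected}")
    return failures

if __name__ == "__main__":
    problems = run_cases()
    print("all cases passed" if not problems else "\n".join(problems))
```

Keeping the expected results next to the inputs also documents the intent of the pattern, which makes a later loosening of the regex easier to spot in review.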

06 Signs You May Already Be Affected

Review your application's validation logic for regex patterns that lack ^ and $ anchors, or that use overly broad character classes like . without proper context. Check logs for rejected input that should have been valid, or conversely, for accepted input that looks suspicious (e.g., containing newlines, path traversal sequences, or unexpected special characters). If you've recently had a validation bypass or injection vulnerability, audit all regex patterns in your codebase.
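A first pass over a Python codebase could be partly automated with a heuristic like the one below. This is only a sketch under strong assumptions: it inspects string literals passed directly to re.match or re.search on a single line, it misses patterns held in variables or compiled elsewhere, and it treats an escaped \$ at the end as an anchor. The function name find_unanchored is made up for this example.

```python
import re

# Heuristic: find re.match / re.search calls whose first argument is a simple
# string literal, then check that literal for start and end anchors.
CALL = re.compile(r"""re\.(match|search)\(\s*r?(["'])(.*?)\2""")

def find_unanchored(source: str) -> list:
    """Return (line number, pattern) pairs that look unanchored."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for call in CALL.finditer(line):
            func, pattern = call.group(1), call.group(3)
            # re.match already anchors at the start; re.search does not.
            start_ok = func == "match" or pattern.startswith(("^", r"\A"))
            end_ok = pattern.endswith(("$", r"\Z"))
            if not (start_ok and end_ok):
                findings.append((lineno, pattern))
    return findings

# Made-up source text to scan.
sample_source = '''
if re.match(r"[a-z]+@[a-z]+", addr):
    pass
if re.search("^[0-9]+$", code):
    pass
if re.search(r"admin", role):
    pass
'''

for lineno, pattern in find_unanchored(sample_source):
    print(f"line {lineno}: possibly unanchored pattern {pattern!r}")
```

Findings from a scan like this are candidates for manual review rather than confirmed flaws, since some patterns are intentionally used for substring search.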

07 Related Recent Vulnerabilities

CVE-2026-34830 Rack: Rack::Sendfile regex injection via HTTP_X_ACCEL_MAPPING header allows arbitrary file reads through nginx CVSS 5.9/10 MEDIUM
CVE-2026-34763 Rack: Rack::Directory info disclosure and DoS via unescaped regex interpolation CVSS 5.3/10 MEDIUM
CVE-2026-32973 OpenClaw < 2026.3.11 - Exec Allowlist Pattern Overmatch via POSIX Path Normalization CVSS 8.8/10 HIGH
CVE-2026-23651 Microsoft ACI Confidential Containers Elevation of Privilege Vulnerability CVSS 6.7/10 MEDIUM
CVE-2023-6544 Keycloak: authorization bypass CVSS 5.4/10 MEDIUM
CVE-2020-8910 Auth Bypass in Google's Closure-Library CVSS 6.5/10 MEDIUM