Weakness reference
CWE-186

Overly Restrictive Regular Expression

01 Summary

This weakness occurs when a regular expression is written so strictly that it rejects legitimate input that should be accepted. The intent is usually to improve security by filtering dangerous patterns, but an overly restrictive regex breaks legitimate functionality, frustrates users, and pushes developers or users to bypass the validation entirely, often in unsafe ways.

02 How It Happens

Regular expressions are commonly used to validate user input: email addresses, usernames, phone numbers, URLs, and file names. When a developer writes a regex that is too narrow, whether from an incomplete understanding of the valid formats or from an attempt to be "extra safe", it blocks valid data. Users then either cannot complete their tasks or find workarounds (such as entering data in a different format, or disabling validation) that may introduce real security gaps. The weakness is not that the regex itself is exploitable; it is that the overly strict check denies service to legitimate users or creates an incentive to adopt insecure alternatives.
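
As an illustration of how a narrow assumption about "valid" input turns into rejected users, the sketch below compares two rules for a personal-name field. Both patterns and both function names are hypothetical and chosen only for this example; the broader rule is one possible loosening, not a recommended standard.

```python
import re
import unicodedata

# Hypothetical narrow rule: ASCII letters only.
NARROW_NAME = re.compile(r"[A-Za-z]+")

# Hypothetical broader rule: runs of Unicode letters, optionally joined by a
# single space, apostrophe, or hyphen. [^\W\d_] means "word characters except
# digits and underscore", which is roughly "letters" in Python 3.
BROAD_NAME = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")


def is_name_narrow(name: str) -> bool:
    return NARROW_NAME.fullmatch(name) is not None


def is_name_broad(name: str) -> bool:
    # Normalize first so that "e" + combining accent becomes a single letter.
    normalized = unicodedata.normalize("NFC", name)
    return BROAD_NAME.fullmatch(normalized) is not None


for sample in ["Alice", "José", "O'Brien", "Anne-Marie", "<script>"]:
    print(f"{sample!r}: narrow={is_name_narrow(sample)}, broad={is_name_broad(sample)}")
```

The narrow rule accepts only "Alice" from this list, while the broader rule also accepts the accented, apostrophe, and hyphenated names and still rejects the markup-like input.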

03 Real-World Impact

The practical consequences range from user frustration to security degradation. A site that rejects valid email addresses or international characters may lock out legitimate users. Developers who run into such restrictions may remove the regex check altogether or replace it with a less secure alternative. In some cases, users are forced to enter data in non-standard formats that are harder to process correctly downstream, which increases the risk of injection or parsing errors.

04 Vulnerable & Fixed Patterns

Vulnerable pattern
import re

def validate_username(username):
    # Regex is too restrictive: only allows 5-8 lowercase letters
    pattern = r'^[a-z]{5,8}$'
    if re.match(pattern, username):
        return True
    return False

# Rejects valid usernames like "alice_2024", "Bob", or "user-name"

Why it's vulnerable:
The regex rejects legitimate usernames that contain numbers, underscores, hyphens, or uppercase letters. Users may then bypass the validation or use insecure workarounds.

Fixed pattern
import re

def validate_username(username):
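    # Allows ASCII letters, digits, underscore, hyphen; 3-32 characters.
    # This is an example policy, not a formal standard: adjust it to your needs.
    # fullmatch is used because re.match with '$' also accepts a trailing newline.
    pattern = r'[a-zA-Z0-9_-]{3,32}'
    return re.fullmatch(pattern, username) is not None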

# Accepts "alice_2024", "Bob", "user-name", etc.
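
A loosened pattern is easier to trust when it is checked against a small table of inputs that must pass and inputs that must still fail. The following is a minimal sketch of such a check; the sample lists and the `find_regressions` helper are assumptions made for illustration, and the validator uses the same character set and length range as the fixed pattern above.

```python
import re

# Same character set and length range as the fixed pattern above.
USERNAME = re.compile(r"[a-zA-Z0-9_-]{3,32}")


def validate_username(username: str) -> bool:
    # fullmatch requires the entire string to match, so no anchors are needed.
    return USERNAME.fullmatch(username) is not None


# Hypothetical samples: extend both lists with real examples from your users.
MUST_ACCEPT = ["alice_2024", "Bob", "user-name", "a" * 32]
MUST_REJECT = ["", "ab", "a" * 33, "alice smith", "bob;rm", "alice\n"]


def find_regressions():
    """Return (value, expected_result) pairs that the validator gets wrong."""
    wrong = [(v, True) for v in MUST_ACCEPT if not validate_username(v)]
    wrong += [(v, False) for v in MUST_REJECT if validate_username(v)]
    return wrong


print(find_regressions())  # an empty list means every sample behaved as expected
```

The "alice\n" entry is included because `re.match` with a `$` anchor accepts a single trailing newline, whereas `re.fullmatch` does not.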

Vulnerable pattern
<?php
function validate_email($email) {
    // Overly restrictive: rejects many valid email formats
    $pattern = '/^[a-z]+@[a-z]+\.[a-z]{2,3}$/';
    if (preg_match($pattern, $email)) {
        return true;
    }
    return false;
}

// Rejects "user+tag@example.co.uk", "User@Example.COM", etc.
?>

Why it's vulnerable:
The regex rejects valid emails that contain uppercase letters, digits, dots, plus signs, multiple domain levels, or top-level domains longer than three letters. Users with such addresses cannot register or log in.

Fixed pattern
<?php
function validate_email($email) {
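    // Use PHP's built-in filter instead of a hand-written regex.
    // It follows the RFC 822 addr-spec syntax, with documented exceptions
    // (no comments, no whitespace folding, no dotless domain names).
    // Add FILTER_FLAG_EMAIL_UNICODE if non-ASCII local parts must be accepted.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Accepts common forms such as "user+tag@example.co.uk" and "User@Example.COM"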
?>

05 Prevention Checklist

Test your regex against real-world valid input before deployment. Include international characters, numbers, special characters, and edge cases (e.g., emails with plus signs, usernames with underscores).
Use built-in validation functions where available (e.g., filter_var() in PHP, email-validator libraries in Python) rather than writing custom regexes for common formats.
Document the regex intent and constraints in code comments, and review it with a colleague to catch overly narrow patterns.
Provide clear error messages that explain what format is accepted, so users understand why their input was rejected and can adjust accordingly.
Prefer allowlisting over denylisting, but ensure your allowlist is based on actual standards (RFC 5322 for email, etc.), not assumptions.
Monitor user feedback and failed validation logs for patterns of legitimate rejections, and adjust the regex if needed.
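
The "clear error messages" item can be sketched as a validator that reports why an input was rejected instead of returning a bare true or false. The policy values, the `DISALLOWED` pattern, and the `username_problems` function below are assumptions for illustration only.

```python
import re

# Hypothetical policy; the character set and length limits are assumptions.
MIN_LEN, MAX_LEN = 3, 32
DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")


def username_problems(username: str) -> list[str]:
    """Return human-readable rejection reasons; an empty list means valid."""
    problems = []
    if not MIN_LEN <= len(username) <= MAX_LEN:
        problems.append(
            f"Use between {MIN_LEN} and {MAX_LEN} characters (got {len(username)})."
        )
    # Collect each offending character once so the message stays short.
    bad = sorted(set(DISALLOWED.findall(username)))
    if bad:
        shown = ", ".join(repr(ch) for ch in bad)
        problems.append(f"Only letters, digits, '_' and '-' are allowed; remove: {shown}.")
    return problems


print(username_problems("alice_2024"))
print(username_problems("a b!"))
```

Because the messages echo characters from the input, they should be escaped like any other user-supplied text before being rendered in a page.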

06 Signs You May Already Be Affected

Check your application logs for repeated validation failures on input that appears legitimate (e.g., many failed login attempts with valid-looking email addresses, or support tickets from users unable to enter their name or address). If users report that they cannot complete forms despite entering data in standard formats, or if you see evidence of workarounds (e.g., users entering "john.doe" instead of "john_doe"), your regex may be too restrictive.
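
One rough way to look for this in logs is to compute a rejection rate per form field and review the fields that stand out. The sketch below assumes a hypothetical event format (a field name plus an accepted flag) and an arbitrary threshold; real logs and sensible thresholds will differ.

```python
from collections import Counter

# Hypothetical validation events; a real log will have a different shape.
# Only the field name and the outcome are stored, not the submitted value.
EVENTS = [
    {"field": "email", "accepted": False},
    {"field": "email", "accepted": False},
    {"field": "email", "accepted": True},
    {"field": "email", "accepted": False},
    {"field": "username", "accepted": True},
    {"field": "username", "accepted": True},
]


def rejection_rates(events):
    """Map each field name to the fraction of its submissions that were rejected."""
    totals, rejected = Counter(), Counter()
    for event in events:
        totals[event["field"]] += 1
        if not event["accepted"]:
            rejected[event["field"]] += 1
    return {field: rejected[field] / totals[field] for field in totals}


def suspicious_fields(events, threshold=0.5):
    """Return fields whose rejection rate exceeds the (assumed) threshold."""
    rates = rejection_rates(events)
    return sorted(field for field, rate in rates.items() if rate > threshold)


print(rejection_rates(EVENTS))
print(suspicious_fields(EVENTS))
```

A high rate does not prove the regex is wrong, but it marks the field whose rejected inputs are worth sampling and comparing against the formats users legitimately need.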