Weakness reference
CWE-185

Incorrect Regular Expression


01 Summary

Regular expressions are a powerful tool for validating and filtering input, but a poorly written regex can silently accept dangerous data or reject legitimate input. When a regex pattern doesn't accurately match its intended set of values, attackers can craft inputs that bypass validation, or legitimate users encounter false rejections. This weakness is particularly dangerous because the flaw is often subtle and hard to spot during code review.

02 How It Happens

Regular expressions are notoriously easy to write incorrectly. Common mistakes include forgetting anchors (^ and $), using overly permissive character classes, failing to escape special characters, or misunderstanding quantifiers. For example, a regex intended to match only numeric input might accidentally match strings containing numbers mixed with other characters. Another frequent error is assuming a regex will reject input outside its pattern when it actually only checks whether the pattern *exists* somewhere in the input. These mistakes often go undetected because the regex works correctly for common test cases but fails on edge cases or adversarial input.
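Three of the mistakes described above can be reproduced in a few lines. This is a minimal sketch; the variable names and sample inputs are made up for illustration.

```python
import re

# Mistake 1: no anchors. search() only asks "does the pattern occur somewhere?"
digits = re.compile(r"[0-9]+")
print(digits.search("42; DROP TABLE users") is not None)     # True: accepted
print(digits.fullmatch("42; DROP TABLE users") is not None)  # False: rejected

# Mistake 2: an unescaped dot matches ANY character, not just a literal "."
host_unescaped = re.compile(r"www.example.com")
host_escaped = re.compile(r"www\.example\.com")
print(host_unescaped.fullmatch("wwwXexample.com") is not None)  # True
print(host_escaped.fullmatch("wwwXexample.com") is not None)    # False

# Mistake 3: the range A-z also covers [ \ ] ^ _ and the backtick,
# which sit between "Z" and "a" in ASCII.
letters_too_wide = re.compile(r"[A-z]+")
letters_exact = re.compile(r"[A-Za-z]+")
print(letters_too_wide.fullmatch("abc_[]^") is not None)  # True
print(letters_exact.fullmatch("abc_[]^") is not None)     # False
```

Each pair uses the same input, so the only difference between accept and reject is the pattern or the matching function.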

03 Real-World Impact

An incorrect regex used for email validation might accept malformed addresses that cause downstream errors or bypass security checks. A regex meant to block SQL keywords might fail to catch variations in capitalization or encoding. Input validation bypasses can lead to injection attacks, path traversal, command execution, or account takeover depending on what the regex was supposed to protect. Even seemingly minor validation failures can compound when combined with other weaknesses in the application.
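The keyword-filter case can be illustrated with a short sketch. The blocklist, the helper `is_blocked`, and the sample payloads are hypothetical and exist only to show how capitalization and URL encoding slip past a naive pattern. A blocklist like this is not an adequate defense against SQL injection in any case.

```python
import re
from urllib.parse import unquote

# Hypothetical keyword blocklists; the keyword list is illustrative only.
NAIVE_BLOCKLIST = re.compile(r"\b(select|union|drop)\b")
CASE_INSENSITIVE_BLOCKLIST = re.compile(r"\b(select|union|drop)\b", re.IGNORECASE)

def is_blocked(blocklist: re.Pattern, value: str) -> bool:
    """Return True if the blocklist pattern occurs anywhere in value."""
    return blocklist.search(value) is not None

# Capitalization: the naive pattern only knows lowercase keywords.
print(is_blocked(NAIVE_BLOCKLIST, "1 union select password from users"))  # True
print(is_blocked(NAIVE_BLOCKLIST, "1 UNION SELECT password FROM users"))  # False
print(is_blocked(CASE_INSENSITIVE_BLOCKLIST,
                 "1 UNION SELECT password FROM users"))                   # True

# Encoding: the filter sees "%55nion %53elect" unless the input is decoded first.
encoded = "1 %55nion %53elect password from users"
print(is_blocked(CASE_INSENSITIVE_BLOCKLIST, encoded))           # False
print(is_blocked(CASE_INSENSITIVE_BLOCKLIST, unquote(encoded)))  # True
```

The pattern has to be applied to the same representation of the input that downstream code will eventually consume. Otherwise the check and the use disagree about what the value is.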

04 Vulnerable & Fixed Patterns

Vulnerable pattern
import re

def validate_username(user_input):
    # Intended to match only alphanumeric usernames, 3-20 chars
    pattern = r'[a-zA-Z0-9]{3,20}'
    if re.search(pattern, user_input):
        return True
    return False

# Problem: re.search() finds the pattern ANYWHERE in the string
# "admin123!!!!" passes because "admin123" matches
username = "admin123!!!!"
print(validate_username(username))  # Returns True — WRONG

Why it's vulnerable:
The regex pattern lacks anchors (^ and $), so re.search() matches if the pattern appears *anywhere* in the input. An attacker can append or prepend arbitrary characters to bypass the validation.

Fixed pattern
import re

def validate_username(user_input):
    # fullmatch() requires the ENTIRE string to match the pattern.
    # Note: with re.match() and '^...$', '$' also matches just before a
    # trailing newline, so "admin123\n" would still be accepted.
    pattern = r'[a-zA-Z0-9]{3,20}'
    return re.fullmatch(pattern, user_input) is not None

username = "admin123!!!!"
print(validate_username(username))  # Prints False (correct)

Vulnerable pattern
<?php
function validate_email($user_input) {
    // Intended to match valid email addresses
    $pattern = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/';
    if (preg_match($pattern, $user_input)) {
        return true;
    }
    return false;
}

// Problem: preg_match() finds the pattern ANYWHERE in the string
// "prefix_user@example.com_suffix" passes because the substring "prefix_user@example.com" matches
$email = "prefix_user@example.com_suffix";
var_dump(validate_email($email));  // Returns true — WRONG
?>

Why it's vulnerable:
Without anchors, preg_match() succeeds if the pattern is found anywhere in the input. An attacker can embed a valid-looking email within a longer malicious string.

Fixed pattern
<?php
function validate_email($user_input) {
    // \A and \z anchor the pattern to the ENTIRE string.
    // A plain '$' also matches just before a trailing newline,
    // so "user@example.com\n" would still be accepted.
    $pattern = '/\A[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\z/';
    // preg_match() returns 1 on a match, 0 on no match, false on error
    return preg_match($pattern, $user_input) === 1;
}

$email = "prefix_user@example.com_suffix";
var_dump(validate_email($email));  // bool(false) (correct)
?>

05 Prevention Checklist

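Always anchor validation regexes
so the entire input must match, not just a substring. Prefer a full-match API or absolute anchors (re.fullmatch() in Python, \A and \z in PCRE) over ^ and $, because $ alone also matches just before a trailing newline.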
Test edge cases
including empty strings, special characters, very long input, and boundary values — don't rely on happy-path tests alone.
Use a regex tester tool
(regex101.com, regexpal.com) to verify your pattern matches intended inputs and rejects invalid ones before deploying.
Consider alternatives to regex
for simple validations (e.g., built-in email validators, length checks, allowlists) — regex is powerful but error-prone.
Document the intent
of each regex with a comment explaining what it should match and what it should reject.
Review regex changes carefully
in code review, since regex bugs are subtle and easy to miss when reading a pattern.
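The "test edge cases" item can be turned into a small table-driven check. The sketch below reuses the 3-20 character alphanumeric username rule from the example above; the case list is illustrative, not exhaustive.

```python
import re

USERNAME = re.compile(r"[a-zA-Z0-9]{3,20}")

def validate_username(user_input: str) -> bool:
    # fullmatch() requires the whole string to match the pattern
    return USERNAME.fullmatch(user_input) is not None

# (input, expected result) pairs covering boundaries and adversarial input
CASES = [
    ("abc", True),             # minimum length
    ("a" * 20, True),          # maximum length
    ("ab", False),             # one below the minimum
    ("a" * 21, False),         # one above the maximum
    ("", False),               # empty string
    ("admin123!!!!", False),   # valid prefix followed by junk
    ("admin123\n", False),     # trailing newline
    (" admin", False),         # leading whitespace
    ("ädmin", False),          # non-ASCII letter
]

for value, expected in CASES:
    assert validate_username(value) is expected, repr(value)
print("all cases passed")
```

Keeping the cases next to the pattern also documents what the regex is meant to accept and reject.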

06 Signs You May Already Be Affected

Check your application logs for unexpected input patterns that should have been rejected, or for errors occurring in code that processes supposedly validated input. If you notice users reporting that legitimate input is being rejected while similar-looking malicious input passes through, a regex validation bug may be the cause. Review your validation regexes for missing anchors or overly permissive character classes.
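As a starting point for that review, a rough heuristic can flag `re.search()` and `re.match()` calls whose literal pattern is not anchored at both ends. This is a sketch under clear assumptions: the function name `find_unanchored` and the sample source are hypothetical. It only sees patterns passed as string literals directly to the `re` module, and it will miss patterns held in variables or precompiled with `re.compile()`. It also treats a trailing `$` as an end anchor even though `$` tolerates a trailing newline.

```python
import ast

def find_unanchored(source: str) -> list[tuple[int, str, str]]:
    """Return (line, function, pattern) for re.search/re.match calls
    whose literal pattern is not anchored at both ends (heuristic)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "re"
                and node.func.attr in {"search", "match"}
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            continue
        pattern = node.args[0].value
        # re.match() is implicitly anchored at the start of the string
        start_ok = node.func.attr == "match" or pattern.startswith(("^", r"\A"))
        end_ok = pattern.endswith(("$", r"\Z"))
        if not (start_ok and end_ok):
            findings.append((node.lineno, node.func.attr, pattern))
    return sorted(findings)

# Hypothetical source to scan; it is parsed, never executed.
SAMPLE = '''
import re
re.search(r"[0-9]+", value)
re.match(r"^[a-z]+$", value)
re.match(r"[a-z]+", value)
'''

for line, func, pattern in find_unanchored(SAMPLE):
    print(f"line {line}: re.{func}({pattern!r}) is not fully anchored")
```

Findings from a scan like this are candidates for manual review, not confirmed bugs, since some patterns are intentionally substring searches.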

07 Related Recent Vulnerabilities