A permissive regular expression accepts a wider range of input than intended, allowing malicious or unexpected data to pass validation checks. This weakness…
A permissive regular expression accepts a wider range of input than intended, allowing malicious or unexpected data to pass validation checks. This weakness often occurs when regex patterns are too loose, use overly broad character classes, or fail to anchor boundaries properly. The result is that attackers can bypass security controls that rely on the regex to filter dangerous input.
02How It Happens
Regular expressions are commonly used to validate email addresses, URLs, file names, phone numbers, and other structured data. When a regex pattern is written without sufficient specificity—for example, using . to match any character instead of a defined set, or forgetting to anchor the pattern with ^ and $—it becomes permissive. An attacker can craft input that technically matches the pattern but contains unexpected or malicious content. For instance, a regex meant to validate a domain name might accidentally accept input with embedded newlines, special characters, or path traversal sequences if not carefully constructed.
03Real-World Impact
Permissive regex validation can lead to injection attacks, path traversal, header injection, or bypass of access controls. For example, if an email validation regex is too loose, an attacker might inject newlines to craft a malicious email header. If a file name regex doesn't properly restrict characters, an attacker could upload a file with a path traversal sequence (e.g., ../../../etc/passwd). In authentication contexts, a permissive regex on usernames or tokens could allow attackers to forge credentials or bypass rate-limiting checks.
04Vulnerable & Fixed Patterns
Vulnerable pattern
import re
def validate_email(user_input):
pattern = r"[a-zA-Z0-9]+@[a-zA-Z0-9]+"
if re.match(pattern, user_input):
return True
return False
# Attacker input: "admin@example.com\nBcc: attacker@evil.com"
# This passes validation because the pattern doesn't anchor the end
email = "admin@example.com\nBcc: attacker@evil.com"
print(validate_email(email)) # Returns True — dangerous!
Why it's vulnerable: The pattern lacks $ anchoring at the end, so it matches the beginning of the string and ignores everything after. It also doesn't require a proper domain structure, allowing newlines and other characters to slip through.
Fixed pattern
import re
def validate_email(user_input):
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if re.match(pattern, user_input):
return True
return False
# Attacker input: "admin@example.com\nBcc: attacker@evil.com"
# This now correctly rejects the input
email = "admin@example.com\nBcc: attacker@evil.com"
print(validate_email(email)) # Returns False — safe!
Vulnerable pattern
<?php
function validate_filename($user_input) {
$pattern = "/^[a-zA-Z0-9]+/";
if (preg_match($pattern, $user_input)) {
return true;
}
return false;
}
// Attacker input: "document.txt../../../etc/passwd"
// This passes because the pattern doesn't anchor the end
$filename = "document.txt../../../etc/passwd";
if (validate_filename($filename)) {
// File is "validated" but contains path traversal
move_uploaded_file($_FILES['upload']['tmp_name'], "/uploads/" . $filename);
}
?>
Why it's vulnerable: The pattern lacks $ anchoring and doesn't restrict the full filename, so path traversal sequences and dangerous characters pass through.
Fixed pattern
<?php
function validate_filename($user_input) {
$pattern = "/^[a-zA-Z0-9._-]+$/";
if (preg_match($pattern, $user_input)) {
return true;
}
return false;
}
// Attacker input: "document.txt../../../etc/passwd"
// This now correctly rejects the input
$filename = "document.txt../../../etc/passwd";
if (validate_filename($filename)) {
move_uploaded_file($_FILES['upload']['tmp_name'], "/uploads/" . $filename);
} else {
// Reject the upload
die("Invalid filename");
}
?>
05Prevention Checklist
Always anchor regex patterns with ^ at the start and $ at the end to ensure the entire input matches, not just a substring.
Be explicit about allowed characters — use character classes like [a-zA-Z0-9] instead of . (which matches anything), and avoid overly broad patterns.
Test regex patterns thoroughly with both valid and invalid inputs, including edge cases like newlines, special characters, and boundary conditions.
Use allowlists, not blocklists — define exactly what *is* allowed rather than trying to block what *isn't*.
Consider using a dedicated validation library for common patterns (email, URL, phone) rather than writing regex from scratch.
Document the intent of each regex so future maintainers understand what it should and shouldn't match.
06Signs You May Already Be Affected
Review your application's validation logic for regex patterns that lack ^ and $ anchors, or that use overly broad character classes like . without proper context. Check logs for rejected input that should have been valid, or conversely, for accepted input that looks suspicious (e.g., containing newlines, path traversal sequences, or unexpected special characters). If you've recently had a validation bypass or injection vulnerability, audit all regex patterns in your codebase.