This weakness occurs when a program validates user input *before* sanitizing or filtering it, rather than the other way around. If the sanitization step accidentally reintroduces dangerous content—or if validation rules are too loose—malicious data can slip through. The order matters: filter first, then validate the result.
02 How It Happens
Most developers intuitively validate input early: check that a username is the right length, that an email looks valid, that a number is in range. But validation alone doesn't remove dangerous characters or code. If sanitization happens *after* validation, a carefully crafted input can pass validation, then be transformed by the filter in a way that creates a vulnerability. For example, an input might pass a length check, then be processed by a filter that removes certain characters—but the removal itself could create a valid SQL command or script tag from what was previously benign.
The core issue is that validation and filtering serve different purposes. Validation answers "is this the right shape?" Filtering answers "is this safe to use?" Validating first means the check runs on a form of the data that the program will not actually use: the filter changes the input afterwards, and nothing re-checks the result.
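The ordering problem can be sketched in a few lines. This is a minimal illustration, not a production filter: the function names, the "reject script tags" rule, and the control-character filter are all assumptions chosen to show how a removal step can assemble a tag that the earlier check never saw.

```python
import re


def contains_script_tag(text: str) -> bool:
    """Hypothetical validation rule: reject anything that opens or closes a script tag."""
    return re.search(r"<\s*/?\s*script", text, re.IGNORECASE) is not None


def strip_control_chars(text: str) -> str:
    """Hypothetical filter: drop ASCII control characters (including NUL)."""
    return re.sub(r"[\x00-\x1f\x7f]", "", text)


def validate_then_filter(text: str) -> str:
    # Wrong order: the check runs on data that is about to change.
    if contains_script_tag(text):
        raise ValueError("script tag rejected")
    return strip_control_chars(text)


def filter_then_validate(text: str) -> str:
    # Right order: the check runs on the final form of the data.
    cleaned = strip_control_chars(text)
    if contains_script_tag(cleaned):
        raise ValueError("script tag rejected")
    return cleaned


# A NUL byte splits the tag name, so the validator sees no script tag.
payload = "<scr\x00ipt>alert(1)</scr\x00ipt>"

print(validate_then_filter(payload))  # <script>alert(1)</script>
```

With the same payload, `filter_then_validate` raises `ValueError`, because the tag is already reassembled by the time the check runs.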
03 Real-World Impact
An attacker can craft input that passes validation checks but becomes dangerous after filtering. This can lead to SQL injection, cross-site scripting (XSS), command injection, or other code execution vulnerabilities. For instance, a comment field might validate that the input is under 500 characters, then apply a filter that removes <script> tags—but if the filter is naive, an attacker might use encoding or nested tags to bypass it, resulting in stored XSS. The validation step gave false confidence that the input was safe.
04 Vulnerable & Fixed Patterns
Vulnerable pattern (Python)
def process_comment(user_input):
    # Validate first: check length
    if len(user_input) > 500:
        raise ValueError("Comment too long")
    # Then filter: remove <script> tags (naive approach)
    filtered = user_input.replace("<script>", "").replace("</script>", "")
    # Store the filtered comment (db is the application's storage handle, defined elsewhere)
    db.store_comment(filtered)
    return filtered
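Why it's vulnerable: The length check passes for any input of 500 characters or fewer and says nothing about content. The filter then removes only the exact lowercase strings <script> and </script>, in a single pass. An attacker can use <SCRIPT> (uppercase), which the filter ignores, or <scr<script>ipt> (nested), which the removal itself collapses into <script>. Entity-encoded payloads also pass through untouched and become dangerous if a later step decodes them. Because nothing checks the filtered result, the reassembled tag is stored as-is.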
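The uppercase and nested bypasses can be reproduced directly against the same replace-based filter. The sketch below isolates the filter in a helper named `naive_filter` (a name introduced here for illustration) and leaves out the database call.

```python
def naive_filter(text: str) -> str:
    # Same single-pass, case-sensitive removal as in the vulnerable pattern.
    return text.replace("<script>", "").replace("</script>", "")


nested = "<scr<script>ipt>alert(1)</scr</script>ipt>"
upper = "<SCRIPT>alert(1)</SCRIPT>"

# Removing the inner tags splices the outer fragments into working tags.
print(naive_filter(nested))  # <script>alert(1)</script>

# The uppercase variant never matches, so it passes through unchanged.
print(naive_filter(upper))   # <SCRIPT>alert(1)</SCRIPT>
```

Both inputs are far below the 500-character limit, so the earlier length check offers no protection here.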
Fixed pattern (Python)
import html

def process_comment(user_input):
    # Filter first: escape HTML special characters (&, <, >, and quotes)
    filtered = html.escape(user_input)
    # Then validate: check length of filtered result
    if len(filtered) > 500:
        raise ValueError("Comment too long after sanitization")
    # Store the safe, filtered comment (db is the application's storage handle, defined elsewhere)
    db.store_comment(filtered)
    return filtered
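One consequence of checking the length after escaping is that the limit applies to the escaped text, which can be longer than what the user typed. The sketch below mirrors the fixed pattern without the storage call; `sanitize_then_validate` and `MAX_LEN` are names introduced here for illustration.

```python
import html

MAX_LEN = 500  # same limit as in the pattern above


def sanitize_then_validate(user_input: str) -> str:
    escaped = html.escape(user_input)
    if len(escaped) > MAX_LEN:
        raise ValueError("Comment too long after sanitization")
    return escaped


raw = "a" * 498 + "<>"
print(len(raw))               # 500
print(len(html.escape(raw)))  # 506, since "<" and ">" each become 4 characters
```

Here `sanitize_then_validate(raw)` raises `ValueError` even though the raw input is exactly 500 characters. Whether the limit should apply to raw or escaped length is a design decision; the point is that the check runs on the value that will actually be stored.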
Vulnerable pattern (PHP)
// $db is passed in by the caller; PHP functions cannot see outer variables by default
function process_comment($db, $user_input) {
    // Validate first: check length
    if (strlen($user_input) > 500) {
        throw new Exception("Comment too long");
    }
    // Then filter: remove <script> tags (naive approach)
    $filtered = str_replace(array("<script>", "</script>"), "", $user_input);
    // Store the filtered comment (interpolated straight into the SQL string)
    $db->query("INSERT INTO comments (text) VALUES ('$filtered')");
    return $filtered;
}
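Why it's vulnerable: The validation passes, but the filter only removes exact <script> tags. An attacker can use case variations or nested tags to bypass the filter and inject malicious code, and nothing checks the filtered result. The filtered value is also interpolated directly into the SQL string, so a single quote in the input is enough for SQL injection.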
Fixed pattern (PHP)
// $db is a mysqli connection passed in by the caller
function process_comment($db, $user_input) {
    // Filter first: escape HTML special characters
    $filtered = htmlspecialchars($user_input, ENT_QUOTES, 'UTF-8');
    // Then validate: check length of filtered result (strlen counts bytes)
    if (strlen($filtered) > 500) {
        throw new Exception("Comment too long after sanitization");
    }
    // Use prepared statement to prevent SQL injection
    $stmt = $db->prepare("INSERT INTO comments (text) VALUES (?)");
    $stmt->bind_param("s", $filtered);
    $stmt->execute();
    return $filtered;
}
05 Prevention Checklist
Filter before validation: Always sanitize or escape user input *before* checking it against validation rules, so that validation sees the same data the rest of the program will use.
Use context-appropriate encoding: HTML-escape for display, SQL parameterization for queries, URL-encode for URLs. Don't rely on generic string replacement.
Validate the filtered result: After sanitization, re-validate to ensure the output meets your requirements (length, format, etc.).
Avoid blacklist-based filters: Don't try to remove "bad" patterns like <script>. Use allowlisting or context-aware encoding instead.
Test filter bypass scenarios: Include test cases with uppercase variants, nested tags, HTML entities, and other encoding tricks to verify your filter is robust.
Use established libraries: Rely on well-maintained escaping and sanitization functions (e.g., html.escape in Python, htmlspecialchars() in PHP) rather than writing custom filters.
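The "context-appropriate encoding" item can be made concrete with the Python standard library. This is a sketch under assumptions: the sample comment, the in-memory SQLite database, and the table layout are invented for illustration. It treats each output context separately instead of applying one generic replacement.

```python
import html
import sqlite3
from urllib.parse import quote

comment = "It's <b>bold</b> & \"quoted\""

# HTML context: escape markup-significant characters.
html_safe = html.escape(comment)
print(html_safe)  # It&#x27;s &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quoted&quot;

# SQL context: pass the value as a bound parameter, never via string formatting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (text TEXT)")
conn.execute("INSERT INTO comments (text) VALUES (?)", (comment,))
stored = conn.execute("SELECT text FROM comments").fetchone()[0]

# URL context: percent-encode everything outside the unreserved set.
url_safe = quote(comment, safe="")
```

The bound parameter stores the comment byte-for-byte, quotes included, without the quotes ever being interpreted as SQL syntax.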
06 Signs You May Already Be Affected
Review your codebase for validation logic that runs before sanitization. Look for patterns where input length or format is checked, then passed to a string-replacement filter. Check your logs for unusual HTML entities, repeated tag patterns, or encoded characters in user-submitted content—these may indicate bypass attempts. If you've had XSS or injection vulnerabilities in the past, audit whether the fix was a better filter or a reordering of validation and filtering steps.
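A rough way to start the log review is to scan raw submitted content for the patterns mentioned above. The sketch below is a heuristic only: the pattern set and the `flag_suspicious` helper are assumptions made for illustration, and they will produce both false positives and false negatives.

```python
import re

# Hypothetical heuristics; tune these to the application's own data.
SUSPICIOUS_PATTERNS = {
    "nested_tag": re.compile(r"<\w*<\s*script", re.IGNORECASE),
    "encoded_angle": re.compile(r"%3c|&lt;|&#x0*3c|&#0*60", re.IGNORECASE),
    "script_tag": re.compile(r"<\s*/?\s*script", re.IGNORECASE),
}


def flag_suspicious(text: str) -> list[str]:
    """Return the names of all heuristics that match the given text."""
    return sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)
    )


print(flag_suspicious("<scr<script>ipt>alert(1)"))  # ['nested_tag', 'script_tag']
print(flag_suspicious("%3Cscript%3Ealert(1)"))      # ['encoded_angle']
print(flag_suspicious("Nice article, thanks!"))     # []
```

If comments are stored already HTML-escaped, the `encoded_angle` heuristic will match ordinary content, so it is best run against the raw request data where that is available.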