This weakness occurs when input filtering or transformation logic unintentionally converts multiple different inputs into a single output value, and that output is not validated independently. An attacker can exploit this by crafting input that passes the filter but collapses into a dangerous value during processing. The result is a bypass of intended security controls.
02 How It Happens
The vulnerability arises when a developer applies a transformation (such as normalization, encoding removal, or case conversion) to user input with the assumption that the transformation is reversible or that it preserves safety properties. However, if multiple distinct inputs can collapse into the same output, and that output is then used in a security-sensitive context without re-validation, an attacker can supply an input that appears safe before transformation but becomes dangerous after it.
Common scenarios include:
- Path traversal via normalization: Multiple representations of the same path (e.g., ..%2f, ..%5c, ..\\) collapse into ../ after decoding and normalization, bypassing a blacklist that only checked the original form.
- Encoding bypass: Double-encoded or mixed-case input collapses into a dangerous string after decoding, evading a filter applied to the pre-decoded form.
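- Case folding: A case-sensitive blacklist checks the input exactly as supplied, and later processing lowercases it or interprets it case-insensitively, so a mixed-case variant of a blocked value collapses into that value after the check has already passed.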
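The encoding scenarios can be reproduced in a few lines. The sketch below is illustrative only: `naive_check` and `safe_check` are hypothetical helpers, not part of any library, and it assumes the application percent-decodes the value exactly once (here with `urllib.parse.unquote`) after the check has run.

```python
from urllib.parse import unquote

FORBIDDEN = "../"


def used_value(raw: str) -> str:
    """The value the (assumed) application ends up using: decoded once,
    with backslashes treated as path separators."""
    return unquote(raw).replace("\\", "/")


def naive_check(raw: str) -> bool:
    """Hypothetical filter that inspects only the raw, still-encoded input."""
    return FORBIDDEN not in raw


def safe_check(raw: str) -> bool:
    """Inspects the decoded value, i.e. what will actually be used."""
    return FORBIDDEN not in used_value(raw)


# Distinct raw inputs that all collapse into the same traversal sequence.
variants = [
    "..%2fsecret.txt",
    "..%2Fsecret.txt",
    "%2e%2e/secret.txt",
    "..%5csecret.txt",
]

for raw in variants:
    print(f"{raw!r:22} naive={naive_check(raw)} safe={safe_check(raw)} "
          f"used={used_value(raw)!r}")
```

Every variant passes the raw check yet decodes to `../secret.txt`, which is the collapse described above. If the application decodes more than once, the check has to run after the last decoding step.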
03 Real-World Impact
An attacker can bypass input validation and execute unintended operations: accessing files outside intended directories, injecting SQL or script code, executing arbitrary commands, or escalating privileges. The impact depends on the context—file access vulnerabilities can lead to information disclosure, while injection flaws can enable code execution or data manipulation.
04 Vulnerable & Fixed Patterns
Vulnerable pattern
import os

# Assume a whitelist of allowed directories
allowed_dirs = ['/var/www/uploads']

def serve_file(user_path):
    # Filter: remove "../" from the input in a single pass
    sanitized = user_path.replace('../', '')
    # Construct full path from the filtered value without re-validating it
    full_path = os.path.join(allowed_dirs[0], sanitized)
    # Open and serve the file
    with open(full_path, 'r') as f:
        return f.read()

# Attacker input: "....//secret.txt" collapses to "../secret.txt" after the filter
result = serve_file('....//secret.txt')
Why it's vulnerable: The filter removes each ../ it finds, but removing the inner ../ from ....// leaves a new ../ behind. The filtered value ../secret.txt is then joined with the base path to give /var/www/uploads/../secret.txt, which refers to /var/www/secret.txt, outside the allowed directory. The sanitized path is never re-validated after the transformation.
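To see the collapse in isolation, the following sketch assumes a hypothetical single-pass filter, `strip_traversal`, that deletes every `../` it finds. It is not taken from any library; it only shows that a filter's output can itself contain the sequence the filter was meant to remove, which is why the output has to be validated rather than trusted.

```python
def strip_traversal(value: str) -> str:
    """Hypothetical single-pass filter: delete each "../" found in the input."""
    return value.replace("../", "")


for raw in ["../secret.txt", "....//secret.txt", "..././secret.txt"]:
    filtered = strip_traversal(raw)
    still_dangerous = "../" in filtered
    print(f"{raw!r} -> {filtered!r} (still contains '../': {still_dangerous})")
```

The plain `../secret.txt` is neutralized, but the two nested forms both collapse into `../secret.txt`. Checking the filtered value for `../` would have caught them.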
Fixed pattern
from pathlib import Path

allowed_dirs = ['/var/www/uploads']

def serve_file(user_path):
    # Resolve the full path; this collapses "..", symlinks, and absolute overrides
    base = Path(allowed_dirs[0]).resolve()
    requested = (base / user_path).resolve()
    # Validate the resolved path, not the raw input. A path-aware comparison is
    # used because a string prefix test would also accept a sibling directory
    # whose name merely starts with the same characters.
    if base not in requested.parents:
        raise ValueError("Path traversal attempt detected")
    with open(requested, 'r') as f:
        return f.read()
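When checking containment, a path-aware comparison is generally safer than a raw string-prefix test, because a sibling directory can share the prefix. The sketch below compares the two approaches; the sibling path `/var/www/uploads_private` and both helper names are hypothetical, and `PurePosixPath` is used so the comparison runs without touching the filesystem (it assumes the paths have already been resolved).

```python
from pathlib import PurePosixPath

base = PurePosixPath("/var/www/uploads")
inside = PurePosixPath("/var/www/uploads/img/cat.png")
sibling = PurePosixPath("/var/www/uploads_private/key.pem")  # hypothetical


def prefix_contains(root: PurePosixPath, candidate: PurePosixPath) -> bool:
    """String-prefix test: compares characters, not path components."""
    return str(candidate).startswith(str(root))


def path_contains(root: PurePosixPath, candidate: PurePosixPath) -> bool:
    """Component-wise test: root must be an actual ancestor of candidate."""
    return root in candidate.parents


print(prefix_contains(base, sibling))  # True: the prefix test is fooled
print(path_contains(base, sibling))    # False
print(path_contains(base, inside))     # True
```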
Vulnerable pattern
<?php
// Blacklist of files that must never be loaded directly
$blocked = ['admin.php', 'config.php'];

function load_page($filename) {
    global $blocked;
    // Check against the blacklist BEFORE any transformation
    if (in_array($filename, $blocked, true)) {
        die('Access denied');
    }
    // Filter: remove null bytes (applied after the check)
    $clean = str_replace("\0", '', $filename);
    // Include the file using the transformed, unvalidated value
    include($clean);
}

// Attacker input: "admin\0.php" is not on the blacklist, so it passes the check,
// but it collapses into "admin.php" once the null byte is stripped
load_page("admin\0.php");
?>
Why it's vulnerable: The null byte is removed after the blacklist check, not before. An attacker can insert a null byte into a blocked filename so that the raw input no longer matches any blacklist entry, and the later stripping step collapses it back into the blocked name. The value passed to include is never checked in its final form.
Fixed pattern
<?php
$allowed = ['profile.php', 'settings.php'];

function load_page($filename) {
    global $allowed; // without this, $allowed is undefined inside the function
    // Validate the exact value that will be used; no transformation follows
    if (!in_array($filename, $allowed, true)) {
        die('Access denied');
    }
    // Defense in depth: reject null bytes and unexpected characters
    if (strpos($filename, "\0") !== false || !preg_match('/^[a-z0-9._-]+$/i', $filename)) {
        die('Invalid filename');
    }
    include($filename);
}

load_page('profile.php');
?>
05 Prevention Checklist
- Validate after transformation: Always re-validate input after any normalization, decoding, or case conversion. Do not assume that a check applied before the transformation still holds afterwards.
- Use allowlists, not blacklists: Define what is acceptable (e.g., allowed characters, paths, or values) rather than trying to block dangerous patterns.
- Canonicalize early: Convert input to a canonical form (e.g., fully decoded, lowercase, normalized path) once, then validate that canonical form and use only that form from then on.
- Avoid multi-step filtering: Do not apply multiple filters in sequence expecting them to compose safely. A single, well-defined validation step is more reliable.
- Test edge cases: Specifically test inputs that might collapse into the same value after transformation (e.g., .., ...., %2e%2e, mixed case).
- Use framework-provided functions: Rely on language and framework functions designed for safe path handling, file inclusion, and query construction rather than custom filtering.
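The "canonicalize once, then validate" items can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete defense: `canonicalize` and `is_allowed` are hypothetical helper names, the upload root `/var/www/uploads` is reused from the earlier example, the input is assumed to be percent-encoded at most once, and `posixpath.normpath` is purely lexical, so it does not resolve symlinks (resolving against the real filesystem is needed where symlinks matter).

```python
import posixpath
from urllib.parse import unquote

BASE = "/var/www/uploads"  # assumed upload root


def canonicalize(raw: str) -> str:
    """Decode, unify separators, and normalize the path exactly once."""
    decoded = unquote(raw).replace("\\", "/")
    return posixpath.normpath(posixpath.join(BASE, decoded))


def is_allowed(raw: str) -> bool:
    """Validate the canonical form, which is the value that will be used."""
    canonical = canonicalize(raw)
    return "\0" not in canonical and canonical.startswith(BASE + "/")


# Edge cases that collapse into the same value after transformation.
cases = [
    "img/cat.png",
    "img/../cat.png",
    "../secret.txt",
    "..%2fsecret.txt",
    "%2e%2e%5csecret.txt",
    "IMG/..%2F..%2Fsecret.txt",
    "/etc/passwd",
    "cat.png%00.txt",
]

for raw in cases:
    print(f"{raw!r:28} allowed={is_allowed(raw)} canonical={canonicalize(raw)!r}")
```

Only the first two inputs are accepted. Code that adopts this approach should open the canonical value returned by `canonicalize`, never the raw input, so that the validated value and the used value cannot diverge.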
06 Signs You May Already Be Affected
Look for unexpected file access, unusual include/require statements in logs, or files being loaded from outside intended directories. Check for evidence of path traversal attempts in access logs (e.g., requests with .., encoded dots, or mixed-case variations). Review custom input filtering logic for assumptions about the safety of transformations.
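A rough way to triage access logs for such attempts is sketched below. It assumes the request paths have already been extracted from the log lines (log formats vary, so parsing is left out), and the pattern is a heuristic that will need tuning for real traffic. `fully_decode` and `is_suspicious` are hypothetical helper names.

```python
import re
from urllib.parse import unquote

# ".." as a whole path segment, preceded by a separator, "=" or string start.
TRAVERSAL = re.compile(r"(?:^|[\\/=])\.\.(?:[\\/]|$)")


def fully_decode(value: str, max_rounds: int = 3) -> str:
    """Percent-decode repeatedly (bounded) to expose double-encoded input."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def is_suspicious(request_path: str) -> bool:
    """Flag paths that decode to a traversal segment or contain a null byte."""
    decoded = fully_decode(request_path)
    return "\0" in decoded or bool(TRAVERSAL.search(decoded))


sample_paths = [
    "/uploads/cat.png",
    "/uploads/..%2f..%2fetc/passwd",
    "/uploads/%252e%252e%252fsecret.txt",
    "/page?file=profile.php%00.txt",
    "/docs/v1..2/notes.txt",
]

for path in sample_paths:
    print(f"{path!r:42} suspicious={is_suspicious(path)}")
```

A match is only a lead to investigate: it shows that a traversal was attempted, not that it succeeded.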