CWE-180: Incorrect Behavior Order: Validate Before Canonicalize — Weakness Reference

01 Summary

This weakness occurs when software validates user input *before* converting it to its canonical (standard) form. An attacker can craft input using alternate encodings, escape sequences, or representations that pass validation but turn into malicious content once normalized. A classic example is validating a file path before decoding URL-encoded characters: a validator that searches for ../ will not find it in ..%2F..%2F (a path traversal with encoded slashes), which becomes ../../ after decoding.
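A minimal Python sketch of that example, using the standard library's urllib.parse.unquote. The naive_check function is hypothetical and stands in for any validator that searches the raw string for ../ before decoding.

```python
from urllib.parse import unquote

def naive_check(path: str) -> bool:
    """Hypothetical validator: reject anything containing a literal '../'."""
    return "../" not in path

raw = "..%2F..%2F"

# Checking the raw input: '../' does not appear, so the check passes.
print(naive_check(raw))        # True

# Decoding afterwards turns %2F into '/', producing a traversal sequence.
decoded = unquote(raw)
print(decoded)                 # ../../

# The same check applied to the canonical form catches it.
print(naive_check(decoded))    # False
```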

02 How It Happens

The vulnerability arises from a logic-order mistake: validation happens on the "raw" input, while the application later processes a canonicalized version. Between validation and use, the input is decoded, unescaped, or normalized in some way. If the canonical form differs from what was validated, an attacker can bypass security checks by using alternate representations. Common scenarios include double-encoding, Unicode normalization, case folding, or filesystem-specific path resolution (e.g., Windows short names like PROGRA~1).
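The sketch below reproduces three of these alternate representations with Python's standard library: a double-encoded sequence that only becomes ../ after a second decode, fullwidth Unicode characters that NFKC normalization folds into ASCII, and a case variant that only matches once folded. The payloads and the denylist are illustrative assumptions; which transformations actually run depends on the framework and platform in use.

```python
import unicodedata
from urllib.parse import unquote

# Double encoding: '%25' is an encoded '%', so one decode yields '%2e%2e%2f'
# and only a second decode yields '../'.
double = "%252e%252e%252f"
once = unquote(double)
twice = unquote(once)
print(once, twice)                               # %2e%2e%2f ../

# Unicode normalization: FULLWIDTH FULL STOP (U+FF0E) and FULLWIDTH SOLIDUS
# (U+FF0F) become '.' and '/' under NFKC.
fullwidth = "\uff0e\uff0e\uff0f"
print("../" in fullwidth)                        # False
print(unicodedata.normalize("NFKC", fullwidth))  # ../

# Case folding: a case-sensitive denylist misses 'ADMIN', while a
# case-insensitive lookup later in the pipeline treats it as 'admin'.
denylist = {"admin"}  # hypothetical denylist
name = "ADMIN"
print(name in denylist, name.casefold() in denylist)  # False True
```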

The root cause is a false assumption that validation on the raw form is sufficient. In reality, the application should validate *after* canonicalization, ensuring that the final form used by the application is the one that was checked.

03 Real-World Impact

Attackers can exploit this to read restricted files, traverse directory structures, bypass allowlists, or inject malicious content. For example, a file upload filter that rejects .exe files might accept shell.exe%00.jpg because the raw name ends in .jpg; after decoding, the name contains a null byte, and any API that treats it as a C string stops reading there and sees shell.exe. Similarly, path validation that rejects any input containing .. would accept %2e%2e%2f%2e%2e%2fetc%2fpasswd (dots and slashes percent-encoded), which becomes ../../etc/passwd after URL decoding. The consequences range from information disclosure to remote code execution, depending on what the canonicalized input is used for.
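A small Python simulation of the null-byte case. Python's own open() refuses paths with embedded null bytes, so the C-string truncation that affects some lower-level APIs is modeled here by splitting at the first "\x00". The extension_allowed filter is hypothetical.

```python
from urllib.parse import unquote

def extension_allowed(name: str) -> bool:
    """Hypothetical upload filter that only accepts .jpg files."""
    return name.lower().endswith(".jpg")

raw = "shell.exe%00.jpg"
print(extension_allowed(raw))           # True: the raw name ends in .jpg

decoded = unquote(raw)                  # 'shell.exe\x00.jpg'

# A C-style API stops reading at the first NUL byte; model that truncation.
as_c_string = decoded.split("\x00", 1)[0]
print(as_c_string)                      # shell.exe
print(extension_allowed(as_c_string))   # False
```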

04 Vulnerable & Fixed Patterns


Vulnerable pattern

import urllib.parse

def validate_filename(user_input):
    # Validate BEFORE canonicalizing
    if ".." in user_input or user_input.startswith("/"):
        raise ValueError("Invalid path")
    return user_input

def read_file(filename):
    # Canonicalize AFTER validation
    decoded = urllib.parse.unquote(filename)
    with open(f"/var/data/{decoded}", "r") as f:
        return f.read()

# Attacker passes "%2e%2e%2f%2e%2e%2fetc%2fpasswd" (dots and slashes percent-encoded)
# Validation finds no literal ".." and no leading "/" in the raw string, so it passes
# After unquote(), it becomes "../../etc/passwd" and the path traversal succeeds
user_file = validate_filename("%2e%2e%2f%2e%2e%2fetc%2fpasswd")
content = read_file(user_file)

Why it's vulnerable:
The validation checks the encoded form. When the dots and slashes are percent-encoded (%2e, %2F), the literal .. sequence never appears, so the check passes; URL decoding happens only afterwards and turns the input back into a working ../../ traversal.
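To see where the decoded path lands without opening any file, this sketch re-implements the validator from the vulnerable example and uses posixpath.normpath to collapse the .. segments the way a POSIX filesystem would (ignoring symlinks). Note that the dots themselves must be encoded as %2e: a payload such as ..%2F still contains a literal .. and would be rejected.

```python
import posixpath
from urllib.parse import unquote

def validate_filename(user_input: str) -> str:
    # Same check as the vulnerable example: it inspects the raw string only.
    if ".." in user_input or user_input.startswith("/"):
        raise ValueError("Invalid path")
    return user_input

payload = "%2e%2e%2f%2e%2e%2fetc%2fpasswd"  # dots and slashes percent-encoded
checked = validate_filename(payload)         # passes: no literal '..' or leading '/'

decoded = unquote(checked)                   # '../../etc/passwd'
target = posixpath.normpath(posixpath.join("/var/data", decoded))
print(target)                                # /etc/passwd
```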

Fixed pattern

import urllib.parse
from pathlib import Path

def read_file(filename):
    # Canonicalize FIRST
    decoded = urllib.parse.unquote(filename)
    
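    # Validate AFTER canonicalization
    safe_base = Path("/var/data").resolve()
    requested = (safe_base / decoded).resolve()  # collapses ".." and follows symlinks

    # Compare path components, not string prefixes: "/var/data2" must not pass
    if not requested.is_relative_to(safe_base):  # Python 3.9+
        raise ValueError("Path traversal detected")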
    
    with open(requested, "r") as f:
        return f.read()
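A raw string-prefix comparison is not a reliable containment check, because a sibling directory whose name merely starts with the base name also matches. This sketch contrasts str.startswith with pathlib's PurePath.is_relative_to (Python 3.9+). Pure paths do no filesystem access, so the example assumes the paths have already been resolved; /var/data2 is a hypothetical sibling directory.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/var/data")
sibling = PurePosixPath("/var/data2/secret.txt")   # hypothetical sibling directory
inside = PurePosixPath("/var/data/reports/q1.txt")

# A raw string-prefix test wrongly accepts the sibling directory.
print(str(sibling).startswith(str(base)))   # True

# Component-wise comparison only accepts paths under /var/data itself.
print(sibling.is_relative_to(base))         # False
print(inside.is_relative_to(base))          # True
```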

Vulnerable pattern

<?php
function validate_filename($user_input) {
    // Validate BEFORE canonicalizing
    if (strpos($user_input, "..") !== false || strpos($user_input, "/etc") !== false) {
        throw new Exception("Invalid path");
    }
    return $user_input;
}

function read_file($filename) {
    // Canonicalize AFTER validation
    $decoded = urldecode($filename);
    $content = file_get_contents("/var/data/" . $decoded);
    return $content;
}

// Attacker passes "%2e%2e%2f%2e%2e%2fetc%2fpasswd" (dots and slashes percent-encoded)
// Validation finds neither ".." nor "/etc" in the raw string, so it passes
// After urldecode(), it becomes "../../etc/passwd"
$file = validate_filename("%2e%2e%2f%2e%2e%2fetc%2fpasswd");
echo read_file($file);
?>

Why it's vulnerable:
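The validation runs on the URL-encoded string. When the dots and slashes are percent-encoded (%2e, %2f), neither .. nor /etc appears in the checked string, so the check passes; urldecode() runs only afterwards and restores the traversal sequence.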

Fixed pattern

<?php
function read_file($filename) {
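    // Canonicalize FIRST: decode, then resolve "..", "." and symlinks
    $base = realpath("/var/data");
    $decoded = urldecode($filename);
    $realpath = realpath($base . "/" . $decoded);

    // Validate AFTER canonicalization. Compare against the base plus a trailing
    // separator so that a sibling directory such as /var/data2 is not accepted.
    if ($base === false || $realpath === false
        || strpos($realpath, $base . DIRECTORY_SEPARATOR) !== 0) {
        throw new Exception("Path traversal detected");
    }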
    
    $content = file_get_contents($realpath);
    return $content;
}
?>

05 Prevention Checklist

Canonicalize first, validate second.
Decode, unescape, and normalize input into the exact form the application will use, apply security checks to that form, and pass only the checked value onward.

Use allowlists on canonical form.
Define what characters, patterns, or values are acceptable *after* normalization, not before.

Leverage platform APIs for path validation.
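Use realpath() (PHP), Path.resolve() (Python), or equivalent to resolve paths to their true form before checking boundaries. Then test containment by path component (for example, Path.is_relative_to() in Python 3.9+) rather than with a raw string-prefix match, which would accept /var/data2 when the base is /var/data.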

Be aware of multiple encoding layers.
Watch for double-encoding (e.g., %252F), Unicode normalization, case folding, and filesystem-specific tricks (Windows short names, alternate data streams).

Test with encoded payloads.
Include URL-encoded, HTML-encoded, and Unicode variants in your test cases to catch validation-order bugs.

Document canonicalization rules.
Make it explicit in code comments and design docs which transformations happen and in what order.
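A minimal sketch of the first four checklist items applied to a single filename, assuming the application percent-decodes input and that filenames should be limited to a conservative allowlist. The helper name canonical_filename, the decode limit, and the allowlist pattern are hypothetical choices for illustration. Decoding repeats until the value stops changing, so double-encoded input is fully unwrapped before any check runs; input that is still changing after the limit is rejected.

```python
import re
import unicodedata
from urllib.parse import unquote

# Hypothetical allowlist: letters, digits, underscore, hyphen, one optional extension.
ALLOWED = re.compile(r"[A-Za-z0-9_-]{1,64}(\.[A-Za-z0-9]{1,8})?")
MAX_DECODE_ROUNDS = 3  # hypothetical limit on nested percent-encoding

def canonical_filename(raw: str) -> str:
    """Canonicalize first, then validate, and return the canonical value."""
    value = raw
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(value)
        if decoded == value:
            break  # stable: no encoding layers left
        value = decoded
    if unquote(value) != value:
        raise ValueError("too many encoding layers")

    # Fold compatibility characters (e.g. fullwidth '.') into their ASCII forms.
    value = unicodedata.normalize("NFKC", value)

    # Validate the canonical form against the allowlist.
    if not ALLOWED.fullmatch(value):
        raise ValueError(f"rejected filename: {value!r}")
    return value  # callers must use this value, not `raw`

print(canonical_filename("r%65port.txt"))  # report.txt
```

Callers should use the returned value directly and not decode it again; a further decode would reopen the gap between what was checked and what is used.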

06 Signs You May Already Be Affected

Look for file-access logs showing requests with unusual encoding patterns (e.g., %2F, %2e, %00) that successfully accessed files outside intended directories. Check for unexpected files in sensitive locations or evidence of directory traversal in web server logs. Review validation code to see if checks happen before URL decoding, HTML unescaping, or path resolution.
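A rough triage sketch for that log review: it flags request lines containing encoded dot, slash, backslash, or null-byte sequences, including double-encoded forms such as %252e. The pattern and sample lines are hypothetical; real log formats, and what the server decodes before logging, will differ, and legitimate traffic can also contain these sequences, so treat matches as leads for manual review rather than proof of compromise.

```python
import re

# Encoded '.', '/', '\' and NUL, optionally double-encoded via one or more '%25'.
SUSPICIOUS = re.compile(r"%(?:25)*(?:2e|2f|5c|00)", re.IGNORECASE)

def flag_suspicious(lines):
    """Yield (line_number, line) for log lines containing encoded path tokens."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line

# Hypothetical access-log excerpts.
sample_log = [
    'GET /static/app.js HTTP/1.1" 200',
    'GET /files/%2e%2e%2f%2e%2e%2fetc%2fpasswd HTTP/1.1" 200',
    'GET /files/%252e%252e%252fconfig.php HTTP/1.1" 403',
    'GET /upload/shell.exe%00.jpg HTTP/1.1" 200',
]

for number, line in flag_suspicious(sample_log):
    print(number, line)  # flags lines 2, 3 and 4
```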

07 Related Recent Vulnerabilities

CVE-2026-7120: @fastify/static vulnerable to Authorization Bypass via Non-Canonical URL Paths (CVSS 5.3/10, MEDIUM)
CVE-2026-52747: ModSecurity: Multipart form-data parser silently strips embedded line breaks from form-field values, enabling request-body inspection bypass (CVSS 8.6/10, HIGH)
CVE-2026-42462: Fedify has an LD-Signature Bypass via JSON-LD Named-Graph Restructuring (CVSS 7.0/10, HIGH)
CVE-2026-45022: go-git: Improper parsing of specially crafted objects may lead to inconsistent interpretation compared to upstream Git (CVSS 7.0/10, HIGH)
CVE-2026-39409: Hono has incorrect IP matching in ipRestriction() for IPv4-mapped IPv6 addresses (CVSS 6.3/10, MEDIUM)
CVE-2026-39364: Vite has a `server.fs.deny` bypass with queries (CVSS 8.2/10, HIGH)
CVE-2026-34786: Rack: Rack::Static header_rules bypass via URL-encoded paths (CVSS 5.3/10, MEDIUM)
CVE-2026-24895: FrankenPHP affected by Path Confusion via Unicode casing in CGI path splitting allows execution of arbitrary files (CVSS 8.9/10, HIGH)