Weakness reference
CWE-20

Improper Input Validation

01 Summary

Improper input validation occurs when a program fails to check, or incorrectly checks, data received from users, files, APIs, or other external sources before using it. The weakness underlies many security vulnerabilities: without validation, attackers can inject malicious data, bypass logic, or trigger unintended behavior. It is one of the most common root causes of exploitable flaws.

02 How It Happens

Input validation failures arise when developers assume data is safe, trustworthy, or in the expected format without checking it explicitly. Common cases include accepting strings of any length, trusting user-supplied file types, failing to check numeric ranges, and not verifying that data matches an expected pattern. The weakness becomes critical when unvalidated input flows into sensitive operations such as database queries, file operations, command execution, or business logic decisions. Even "internal" data (from cookies, hidden form fields, or APIs) must be validated, because attackers can forge or manipulate these sources.
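
As a minimal sketch of the explicit checks described above, the following Python function bounds the length of a string and verifies it against an expected pattern. The function name and the username rule (3 to 32 letters, digits, or underscores) are assumptions chosen for illustration, not requirements from any standard.

```python
import re

# Hypothetical rule for illustration: 3-32 characters from letters, digits, underscore.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")

def validate_username(value):
    """Return the username if it passes explicit checks, else raise ValueError."""
    # Type check first: request data may arrive as None, a list, or a number
    if not isinstance(value, str):
        raise ValueError("username must be a string")
    # fullmatch requires the whole string to match, so trailing junk is rejected
    if USERNAME_PATTERN.fullmatch(value) is None:
        raise ValueError("username has an invalid length or characters")
    return value
```

Using fullmatch rather than match with a `$` anchor avoids accepting a value that ends in a newline, which is a common gap in pattern-based checks.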

03 Real-World Impact

Unvalidated input is the entry point for SQL injection, cross-site scripting (XSS), command injection, path traversal, and buffer overflows. An attacker might submit a negative quantity to a shopping cart, bypass authentication checks, upload a malicious file disguised as an image, or inject code into a database field. The consequences range from data theft and account compromise to complete system takeover, depending on where the unvalidated input is used.
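
The negative-quantity case mentioned above can be blocked with a strict numeric check. This is a sketch only: the function name and the upper limit of 100 are hypothetical, and a real application would take its limits from its own business rules.

```python
MAX_QUANTITY = 100  # hypothetical per-order limit, chosen for illustration

def parse_quantity(raw):
    """Convert a raw request value to an int in 1..MAX_QUANTITY, else raise ValueError."""
    # Require a string of ASCII digits: int() alone would also accept
    # "-5", " 7 ", or "1_000", none of which should reach the cart logic
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError("quantity must be a whole number")
    quantity = int(raw)
    # Range check: rejects zero and anything above the limit
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError("quantity is out of range")
    return quantity
```

Checking the digits before calling int() keeps the accepted format narrow, so the range check only ever sees values that were written as plain positive integers.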

04 Vulnerable & Fixed Patterns

Vulnerable pattern
import sqlite3

def fetch_user_data(user_id):
    conn = sqlite3.connect('app.db')
    cursor = conn.cursor()
    # user_id comes directly from user input (e.g., request parameter)
    query = f"SELECT * FROM users WHERE id = {user_id}"
    cursor.execute(query)
    return cursor.fetchall()

Why it's vulnerable:
The user_id parameter is interpolated directly into the SQL string without validation or parameterization, so the database parses it as part of the query. An attacker can supply a value such as 1 OR 1=1 to change the WHERE clause so that it matches every row, or use a more elaborate payload to extract unauthorized data.

Fixed pattern
import sqlite3

def fetch_user_data(user_id):
    # Validate that user_id is an integer
    try:
        user_id = int(user_id)
    except (ValueError, TypeError):
        raise ValueError("user_id must be a valid integer")
    
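    conn = sqlite3.connect('app.db')
    try:
        cursor = conn.cursor()
        # Use a parameterized query so the value is never parsed as SQL
        query = "SELECT * FROM users WHERE id = ?"
        cursor.execute(query, (user_id,))
        return cursor.fetchall()
    finally:
        # Close the connection even if the query raises
        conn.close()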

Vulnerable pattern
<?php
$filename = $_GET['file'];
// No validation of filename
$content = file_get_contents('/uploads/' . $filename);
echo $content;
?>

Why it's vulnerable:
The $filename parameter is not validated or sanitized. An attacker can supply a path traversal payload (e.g., ../../etc/passwd) to read files outside the intended directory.

Fixed pattern
<?php
$filename = $_GET['file'] ?? '';

// Validate: only alphanumeric, dash, underscore, and .txt extension
if (!preg_match('/^[a-zA-Z0-9_-]+\.txt$/', $filename)) {
    die('Invalid filename');
}

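// Ensure the resolved path stays within the uploads directory.
// The trailing separator stops a sibling such as /uploads_old from matching.
$baseDir = realpath('/uploads');
$filepath = realpath('/uploads/' . $filename);
if ($baseDir === false || $filepath === false
    || strpos($filepath, $baseDir . DIRECTORY_SEPARATOR) !== 0) {
    die('Access denied');
}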

$content = file_get_contents($filepath);
echo htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
?>
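
For comparison, the same two-step check (filename allowlist, then path containment) might look roughly like this in Python. The function name and the base_dir parameter are assumptions for illustration; the filename rule mirrors the PHP pattern above.

```python
import re
from pathlib import Path

# Same rule as the PHP example: alphanumerics, dash, underscore, and a .txt extension
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.txt")

def resolve_upload(filename, base_dir):
    """Return the resolved path of filename inside base_dir, else raise ValueError."""
    if not isinstance(filename, str) or FILENAME_PATTERN.fullmatch(filename) is None:
        raise ValueError("invalid filename")
    base = Path(base_dir).resolve()
    # resolve() follows symlinks and collapses ".." segments before the comparison
    candidate = (base / filename).resolve()
    # Comparing against parents works on whole path components,
    # so a sibling such as /uploads_old does not count as inside /uploads
    if base not in candidate.parents:
        raise ValueError("access denied")
    return candidate
```

The containment check is deliberately kept even though the pattern already rejects slashes: it also catches a symlink inside the directory that points elsewhere.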

05 Prevention Checklist

Define an allowlist.
For each input field, decide what values are acceptable (e.g., email format, numeric range, file extension) and reject everything else.
Validate type and length.
Check that integers are integers, strings don't exceed a reasonable length, and arrays have the expected structure.
Use parameterized queries.
Never concatenate user input into SQL, command-line, or template strings; use prepared statements or parameterized APIs.
Reject unexpected formats.
Use regex, type casting, or parsing libraries to enforce strict input format (e.g., ISO 8601 dates, valid JSON).
Validate on the server.
Client-side validation is a convenience, not a security control; always re-validate on the server.
Fail securely.
If input is invalid, reject it with a generic error message; do not echo back the invalid input or reveal why it failed.
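
Several of the checklist items can be combined in one server-side validator. The sketch below is illustrative only: the parameter names (sort, start), the allowlist contents, and the InvalidInput exception are all hypothetical. Note that date.fromisoformat accepts YYYY-MM-DD, and newer Python versions also accept some other ISO 8601 forms.

```python
from datetime import date

ALLOWED_SORT_FIELDS = {"name", "created", "price"}  # hypothetical allowlist

class InvalidInput(Exception):
    """Raised with a generic message that does not echo the rejected value."""

def validate_report_request(params):
    """Validate a dict of raw request parameters and return typed values."""
    sort = params.get("sort")
    start = params.get("start")
    # Allowlist: only known field names pass; everything else is rejected
    if not isinstance(sort, str) or sort not in ALLOWED_SORT_FIELDS:
        raise InvalidInput("invalid request")
    # Type and length check before parsing
    if not isinstance(start, str) or len(start) > 10:
        raise InvalidInput("invalid request")
    try:
        # Strict format: let a parser decide instead of a hand-written check
        start_date = date.fromisoformat(start)
    except ValueError:
        # Fail securely: same generic message, no detail about what failed
        raise InvalidInput("invalid request") from None
    return {"sort": sort, "start": start_date}
```

Every failure path raises the same message, so a caller probing the endpoint learns nothing about which check rejected the input; details can still be recorded in server-side logs.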

06 Signs You May Already Be Affected

Look for unexpected behavior tied to unusual input: error messages that reference database syntax, files appearing in unexpected locations, or log entries showing malformed requests that succeeded. If you find user-controlled data being used directly in queries, file paths, or command execution without validation, that's a red flag. Review access logs for patterns like repeated requests with special characters or path traversal sequences.
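
A first pass over access logs can be automated. The sketch below assumes plain-text log lines and uses a small, hypothetical set of indicators (path traversal sequences, a script tag, and two simple SQL injection shapes); it is a starting point for review, not a detection rule set.

```python
import re
from urllib.parse import unquote

# Hypothetical indicators; a real deployment would tune these to its own traffic
SUSPICIOUS = re.compile(
    r"\.\./|\.\.\\|<script|union\s+select|\bor\s+1\s*=\s*1\b",
    re.IGNORECASE,
)

def find_suspicious_requests(log_lines):
    """Return (line_number, line) pairs whose URL-decoded text matches an indicator."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        # Decode percent-encoding first so ..%2F is seen as ../
        decoded = unquote(line)
        if SUSPICIOUS.search(decoded):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Matches are only leads: a hit with a successful status code deserves a closer look, while a miss does not prove the input was safe.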

07 Related Recent Vulnerabilities