Improper Neutralization of Data within XPath Expressions
XPath injection occurs when user-supplied input is directly embedded into an XPath query without proper escaping or validation. An attacker can inject XPath syntax to alter the query logic, potentially bypassing authentication, extracting unauthorized data from XML documents, or modifying query results. This weakness is particularly common in applications that use XML for configuration, data storage, or authentication.
02 How It Happens
XPath is a query language for selecting nodes in XML documents. When an application constructs an XPath expression by concatenating user input directly into the query string, an attacker can inject XPath operators and functions to change the query's meaning. Unlike SQL, where parameterized queries are available almost everywhere, XPath variable binding is offered by only some libraries (for example, lxml in Python and XPathVariableResolver in Java), so developers often fall back on manual quoting or allowlisting. The vulnerability arises when that quoting is omitted or incomplete, allowing characters like quotes, brackets, and logical operators to break out of the intended query context.
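To make the mechanism concrete, the following minimal sketch builds a query string the way a vulnerable application might and prints the result for a benign and a malicious input. The function name build_query and the users/user/name layout are illustrative assumptions; no XPath engine is involved, only the string construction.

```python
def build_query(username: str) -> str:
    """Hypothetical helper: builds an XPath query by naive interpolation."""
    return f"//user[name='{username}']/password"


benign = build_query("alice")
malicious = build_query("' or '1'='1")

print(benign)     # //user[name='alice']/password
print(malicious)  # //user[name='' or '1'='1']/password
```

In the second query the injected quote closes the string literal early, and the trailing '1'='1' comparison makes the predicate true for every user node.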
03 Real-World Impact
XPath injection can lead to authentication bypass (e.g., injecting ' or '1'='1 into a login query), unauthorized data extraction from XML documents, or logic manipulation. In systems that store user credentials, configuration data, or sensitive records in XML, a successful injection can expose all records or allow an attacker to assume any identity. The impact depends on what data the XPath query accesses and what operations the application performs based on the results.
04 Vulnerable & Fixed Patterns
Vulnerable pattern (Python)
# lxml (third-party) is used here because the standard library's ElementTree
# implements only a small XPath subset: it rejects absolute paths on elements
# and "or" predicates, so it cannot evaluate the injected query at all.
from lxml import etree

xml_data = """
<users>
<user><name>alice</name><password>secret123</password></user>
<user><name>bob</name><password>pass456</password></user>
</users>
"""
root = etree.fromstring(xml_data)
username = input("Enter username: ")
# Vulnerable: XPath expression built by string interpolation
xpath_query = f"//user[name='{username}']/password"
result = root.xpath(xpath_query)
if result:
    print(f"Password: {result[0].text}")
else:
    print("User not found")
Why it's vulnerable: The username input is directly interpolated into the XPath expression. An attacker entering ' or '1'='1 would change the query to //user[name='' or '1'='1']/password, matching all users instead of a specific one.
Fixed pattern (Python)
import xml.etree.ElementTree as ET

xml_data = """
<users>
<user><name>alice</name><password>secret123</password></user>
<user><name>bob</name><password>pass456</password></user>
</users>
"""
tree = ET.ElementTree(ET.fromstring(xml_data))
username = input("Enter username: ")
# Fixed: Iterate and compare, avoiding dynamic XPath construction
root = tree.getroot()
for user in root.findall("user"):
    name_elem = user.find("name")
    if name_elem is not None and name_elem.text == username:
        password_elem = user.find("password")
        if password_elem is not None:
            print(f"Password: {password_elem.text}")
        break
else:
    # The for/else branch runs only when no user matched (no break)
    print("User not found")
Vulnerable pattern (PHP)
<?php
$xml_string = <<<XML
<users>
<user><name>alice</name><password>secret123</password></user>
<user><name>bob</name><password>pass456</password></user>
</users>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml_string);
$xpath = new DOMXPath($dom);
$username = $_GET['username'];
// Vulnerable: XPath expression built by string concatenation
$query = "//user[name='" . $username . "']/password";
$result = $xpath->query($query);
if ($result->length > 0) {
    echo "Password: " . $result->item(0)->nodeValue;
} else {
    echo "User not found";
}
?>
Why it's vulnerable: The $username parameter is directly concatenated into the XPath query string. An attacker can inject XPath operators to alter the query logic and extract unintended data.
Fixed pattern (PHP)
<?php
$xml_string = <<<XML
<users>
<user><name>alice</name><password>secret123</password></user>
<user><name>bob</name><password>pass456</password></user>
</users>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml_string);
$xpath = new DOMXPath($dom);
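$username = $_GET['username'] ?? '';
// XPath 1.0 (the version DOMXPath implements) has no escape sequence for
// quotes inside a string literal, so doubling them does not work.
// Build a literal that the parser can only read as a single string instead.
function xpath_literal(string $value): string {
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: split on single quotes and rejoin with concat()
    return "concat('" . str_replace("'", "', \"'\", '", $value) . "')";
}
// Fixed: user input is embedded only as a safely quoted string literal
$query = "//user[name=" . xpath_literal($username) . "]/password";
$result = $xpath->query($query);
if ($result !== false && $result->length > 0) {
    echo "Password: " . htmlspecialchars($result->item(0)->nodeValue);
} else {
    echo "User not found";
}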
?>
05 Prevention Checklist
Avoid dynamic XPath construction. Where possible, use static XPath queries and iterate over results in application code to filter by user input.
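Quote user input safely. If dynamic XPath is unavoidable, remember that XPath 1.0 has no escape sequence for quotes inside a string literal. Delimit the value with whichever quote character it does not contain, or build it with concat() when it contains both. Doubling quotes (' → '') is valid only in XPath 2.0 and later.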
Use allowlisting. Restrict input to a known set of safe values (e.g., usernames matching [a-zA-Z0-9_]+) and reject anything else.
Validate input type and length. Ensure input matches expected format and is not excessively long.
Apply principle of least privilege. Ensure the XML document and XPath queries access only the minimum data necessary for the operation.
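Prefer XPath APIs that bind variables. Some libraries let the query reference a variable (such as $name) whose value is supplied separately, for example lxml's xpath() keyword arguments or Java's XPathVariableResolver; prefer them over manual string concatenation.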
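The allowlisting and quoting items above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the 1 to 32 character username policy, the helper names is_valid_username, xpath_literal, and find_passwords, and the sample document are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed policy for this sketch: 1-32 characters from [A-Za-z0-9_].
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")

# Hypothetical sample data standing in for the application's XML store.
USERS = ET.fromstring(
    "<users><user><name>alice</name><password>secret123</password></user></users>"
)


def is_valid_username(value: str) -> bool:
    """Allowlist check: accept only values that fully match the pattern."""
    return USERNAME_RE.fullmatch(value) is not None


def xpath_literal(value: str) -> str:
    """Quote a string as an XPath 1.0 literal (hypothetical helper)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: join the pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def find_passwords(username: str) -> list:
    """Reject invalid input before any query string is built."""
    if not is_valid_username(username):
        return []
    query = f".//user[name={xpath_literal(username)}]/password"
    return [elem.text for elem in USERS.findall(query)]


print(find_passwords("alice"))        # ['secret123']
print(find_passwords("' or '1'='1"))  # []
```

Once the allowlist has passed, xpath_literal is redundant for this particular pattern and serves only as defense in depth. The concat() branch is meant for full XPath 1.0 engines; the standard library's ElementTree, used here to keep the sketch dependency-free, does not evaluate concat().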
06 Signs You May Already Be Affected
Review application logs for unusual XPath queries or XML parsing errors, particularly those containing quote characters or logical operators in user-supplied parameters. Check for unexpected data exposure in XML-backed authentication or configuration systems, or for cases where login attempts with special characters succeed when they should fail. If your application constructs XPath queries by string concatenation, audit those code paths immediately.
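As a starting point for that log review, the sketch below flags parameter values that contain quote characters next to logical operators, XPath function calls, or a quote that closes a predicate. The patterns and the name looks_like_xpath_injection are assumptions chosen for illustration; they are heuristics that will produce both false positives and false negatives and should be tuned against real traffic.

```python
import re
from urllib.parse import unquote_plus

# Heuristic patterns only; none of these is proof of an attack.
SUSPICIOUS_PATTERNS = [
    # A quote followed by a logical operator, e.g. ' or '1'='1
    re.compile(r"['\"]\s*(?:or|and)\b", re.IGNORECASE),
    # Calls to XPath functions often used for blind extraction
    re.compile(r"\b(?:count|substring|string-length|contains)\s*\(", re.IGNORECASE),
    # A quote immediately closing a predicate, e.g. x'] | //user/password
    re.compile(r"['\"]\s*\]"),
]


def looks_like_xpath_injection(raw_value: str) -> bool:
    """Return True if a URL-encoded parameter value matches any heuristic."""
    value = unquote_plus(raw_value)
    return any(pattern.search(value) for pattern in SUSPICIOUS_PATTERNS)


# Hypothetical parameter values as they might appear in an access log.
samples = ["alice", "O%27Brien", "%27+or+%271%27%3D%271", "x%27%5D+%7C+//user/password"]
for sample in samples:
    print(sample, looks_like_xpath_injection(sample))
```

Values are URL-decoded before matching because access logs usually record the encoded form of query parameters.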