Weakness reference
CWE-776

Improper Restriction of Recursive Entity References in DTDs (XML Entity Expansion)

01 Summary

This weakness occurs when an application parses XML documents without limiting how many times entity references can be nested or expanded. An attacker can craft a small XML file whose nested entity definitions expand exponentially, consuming memory and CPU until the application crashes or becomes unresponsive. This is a form of denial-of-service attack sometimes called a "billion laughs" attack or "XML bomb."

02 How It Happens

XML allows documents to define custom entities: named placeholders that expand to text or to other entities when parsed. A well-formed document may not contain an entity that refers to itself, directly or indirectly, but the XML specification places no limit on how many times one entity may reference another. A carefully constructed DTD (Document Type Definition) can therefore define a small base entity, a second entity that references the base several times, a third that references the second several times, and so on, so that the expanded size grows exponentially with the number of levels. When the parser resolves the outermost reference, it expands every level, rapidly consuming memory until the system runs out of resources.
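The growth described above can be worked out without parsing anything. The following is a minimal sketch: the function names, the entity names `e0`, `e1`, ... and the `bomb` root element are illustrative assumptions, and the example deliberately builds only a tiny document.

```python
def build_nested_entity_doc(levels: int, fanout: int, seed: str = "lol") -> str:
    """Build a document whose entity e{levels} expands to fanout**levels seeds.

    Hypothetical helper for illustration; keep levels small.
    """
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE bomb [\n  {dtd}\n]>\n"
        f"<bomb>&e{levels};</bomb>"
    )


def expanded_length(levels: int, fanout: int, seed: str = "lol") -> int:
    """Number of characters the outermost entity expands to."""
    return len(seed) * fanout ** levels


# A harmless three-level document, printed rather than parsed.
doc = build_nested_entity_doc(levels=3, fanout=2)
print(doc)
print("characters after expansion:", expanded_length(3, 2))
```

With nine levels above the base entity and a fan-out of ten, the same arithmetic gives `expanded_length(9, 10) == 3_000_000_000`, that is, about 3 GB of text from a document well under a kilobyte long.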

The vulnerability exists because many XML parsers enable entity expansion by default and do not enforce limits on recursion depth, entity size, or total expansion ratio. Developers often assume that XML parsing is a safe operation and do not configure parser security settings.
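Whether a given runtime enforces any expansion limit depends on the parser build underneath it. As one hedged illustration for Python: the standard library documentation states that Expat 2.4.1 and newer is not vulnerable to billion laughs, so the linked Expat version can be checked at startup. The helper name and the decision to treat 2.4.1 as the threshold are assumptions of this sketch, not a complete audit.

```python
from xml.parsers import expat

# Threshold taken from the Python standard library documentation.
MIN_GUARDED_EXPAT = (2, 4, 1)


def expat_has_expansion_guard(version_info=expat.version_info) -> bool:
    """Return True if the given Expat version includes built-in expansion limits."""
    return tuple(version_info) >= MIN_GUARDED_EXPAT


print("Expat:", expat.EXPAT_VERSION)
print("built-in expansion guard:", expat_has_expansion_guard())
```

A check like this is a diagnostic only; rejecting entity declarations in untrusted input remains the safer default, because it does not rely on which library version happens to be installed.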

03 Real-World Impact

An attacker can send a specially crafted XML document to any application that parses untrusted XML input—such as a web service accepting file uploads, API endpoints processing XML payloads, or document processors. The application will hang or crash while attempting to expand the entities, denying service to legitimate users. In multi-tenant environments, a single malicious XML file can affect all users of the service. Recovery typically requires restarting the affected process or service.

04 Vulnerable & Fixed Patterns

Vulnerable pattern
import xml.etree.ElementTree as ET

# Receives untrusted XML from user input
xml_data = request.files['document'].read()

# Parse without entity expansion restrictions.
# fromstring() accepts the raw bytes; ET.parse() expects a filename or file object.
root = ET.fromstring(xml_data)
process_data(root)

Why it's vulnerable:
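The default ElementTree parser expands internal entities declared in the document's DTD and applies no limit of its own. When Python is linked against an Expat release older than 2.4.1, a malicious XML document with nested entity definitions will cause exponential memory consumption during parsing; the Python documentation states that Expat 2.4.1 and newer is not vulnerable, but application code cannot always control which version is installed.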

Fixed pattern
from defusedxml import EntitiesForbidden
from defusedxml.ElementTree import fromstring as safe_fromstring

xml_data = request.files['document'].read()

# defusedxml (third-party) raises EntitiesForbidden as soon as the
# document declares an entity, so nothing is ever expanded.
# Setting attributes such as parser.entity on the stdlib XMLParser
# does not disable entity expansion.
try:
    root = safe_fromstring(xml_data)
except EntitiesForbidden:
    raise ValueError("XML entity declarations are not allowed")
process_data(root)
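
Vulnerable pattern
<?php
// Receives untrusted XML from user input
$xml_string = file_get_contents($_FILES['document']['tmp_name']);

// Parse with entity substitution enabled and libxml2's built-in limits disabled.
// loadXML() parses a string; load() expects a path or URL.
$dom = new DOMDocument();
$dom->loadXML($xml_string, LIBXML_NOENT | LIBXML_PARSEHUGE);
process_xml($dom);
?>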

Why it's vulnerable:
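LIBXML_NOENT tells libxml2 to substitute entity references during parsing, and LIBXML_PARSEHUGE relaxes the parser's hardcoded size and depth limits. With both set, a nested entity definition in the DTD can cause the parser to consume exponential memory. Even without these flags, entity references left in the tree may be expanded when application code reads node values, so documents that carry a DOCTYPE are best rejected outright.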

Fixed pattern
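<?php
$xml_string = file_get_contents($_FILES['document']['tmp_name']);

// Parse the uploaded string with loadXML(); load() expects a path or URL.
// Do not pass LIBXML_NOENT, LIBXML_DTDLOAD, or LIBXML_PARSEHUGE: they enable
// entity substitution, external DTD loading, and relaxed parser limits.
// libxml_disable_entity_loader() is deprecated as of PHP 8.0 and is not needed here.
$dom = new DOMDocument();
if (!$dom->loadXML($xml_string, LIBXML_NONET)) {
    throw new RuntimeException('Malformed XML');
}

// Entities are declared in the DOCTYPE, so reject any document that has one
if ($dom->doctype !== null) {
    throw new RuntimeException('DOCTYPE declarations are not allowed');
}
process_xml($dom);
?>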

05 Prevention Checklist

Disable external entity processing
in your XML parser configuration; set FEATURE_SECURE_PROCESSING or equivalent to true.
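Disable DTD processing entirely
if your application does not require DTD validation; for example, forbid DOCTYPE declarations with forbid_dtd=True in defusedxml or the disallow-doctype-decl feature in Xerces-based Java parsers. Do not set LIBXML_NOENT for untrusted input: despite its name, it enables entity substitution in libxml2.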
Use a hardened XML library
such as defusedxml (Python) or libxml2 with security patches (PHP), which disable dangerous features by default.
Validate XML schema
against a strict, pre-defined schema before processing; reject documents that do not conform.
Set resource limits
on the XML parser: maximum entity expansion depth, maximum entity size, and maximum total document size.
Sanitize and validate file uploads
before parsing; reject files that are suspiciously large or contain DTD declarations if not required.
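
Several of the checklist items (a document size limit, rejecting DTD declarations, refusing entity declarations) can be combined in a standard-library guard when a hardened library is not available. The following is a minimal sketch, not a replacement for defusedxml: `ForbiddenXML`, `parse_untrusted`, and the 1 MB limit are assumptions chosen for illustration, and the document is scanned once with Expat before ElementTree parses it.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

MAX_DOCUMENT_BYTES = 1_000_000  # assumed limit; tune for the application


class ForbiddenXML(ValueError):
    """Raised when a document is too large or declares a DTD or entities."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise ForbiddenXML(f"DOCTYPE declaration not allowed: {name!r}")


def _reject_entity(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # Only reached if the DOCTYPE handler above is removed to allow DTDs.
    raise ForbiddenXML(f"entity declaration not allowed: {name!r}")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Scan for forbidden declarations, then parse with ElementTree."""
    if len(xml_bytes) > MAX_DOCUMENT_BYTES:
        raise ForbiddenXML("document exceeds size limit")

    # The handlers fire when a declaration is read, before any reference
    # to the entity is expanded; exceptions raised in them propagate
    # out of Parse().
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.EntityDeclHandler = _reject_entity
    scanner.Parse(xml_bytes, True)

    return ET.fromstring(xml_bytes)


root = parse_untrusted(b"<order><item>1</item></order>")
print(root.tag)
```

Malformed input still raises `expat.ExpatError` from the scan, so callers would normally catch that alongside `ForbiddenXML`. Parsing twice costs some CPU; the trade-off is that the check uses only documented Expat handlers.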

06 Signs You May Already Be Affected

Monitor application logs and system resource usage for sudden spikes in CPU or memory consumption coinciding with XML file uploads or API requests. Check for parser errors or timeouts when processing specific XML documents. If your application has experienced unexplained outages or slowdowns after accepting XML input from external sources, investigate whether entity expansion attacks may have occurred.
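
If request bodies or uploaded files are retained, they can be triaged for entity declarations without being parsed. The following is a rough heuristic sketch: the function name and report keys are assumptions, the byte patterns only match ASCII-compatible encodings such as UTF-8, and the reference count includes harmless predefined entities such as `&amp;`.

```python
import re

# XML keywords are case-sensitive, so match "<!ENTITY" exactly.
ENTITY_DECL = re.compile(rb"<!ENTITY\s")
ENTITY_REF = re.compile(rb"&[A-Za-z_][\w.\-]*;")


def triage_payload(payload: bytes, max_declarations: int = 0) -> dict:
    """Count entity declarations and references in a stored payload."""
    declarations = len(ENTITY_DECL.findall(payload))
    references = len(ENTITY_REF.findall(payload))
    return {
        "entity_declarations": declarations,
        "entity_references": references,
        "suspicious": declarations > max_declarations,
    }


sample = b'<!DOCTYPE d [<!ENTITY a "x"><!ENTITY b "&a;&a;">]><d>&b;</d>'
print(triage_payload(sample))
```

A payload flagged this way is only a candidate for investigation; correlating its timestamp with the CPU or memory spikes described above is what ties it to an outage.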

07 Related Recent Vulnerabilities