Description
Version
v17.5.0
Platform
Windows 10
Subsystem
http
What steps will reproduce the bug?
Run this HTTP server:
import http from "http";
const name = `Rock & roll 音楽 («🎵🎶»).txt`; // Also used as the response body
const defaultCdHeader = getContentDisposition(name);
// console.log(defaultCdHeader === Buffer.from(`inline; filename="${name}"`).toString("binary")); // true
const host = "localhost";
const port = 8000;
const origin = `http://${host}:${port}`;
const server = http.createServer(requestListener);
server.listen(port, host, () => {
console.log("Server is running on: " + origin + " (open the web page)");
console.log("Downloads the file: " + origin + "/?dl=1");
console.log("Remove Transfer-Encoding: " + origin + "/?te=0" + " (open the web page)");
console.log("Downloads w/o T-E header: " + origin + "/?dl=1&te=0");
});
function requestListener(req, res) {
let cdHeader = defaultCdHeader;
const {dl, te} = Object.fromEntries(new URL(origin + req.url).searchParams.entries());
if (dl === "1") {
cdHeader = getContentDisposition(name, {type: "attachment"});
}
res.setHeader("Content-Disposition", cdHeader);
// The header must be a `ByteString`. For example, I can't do that:
// res.setHeader("Header-X", name); // TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["Header-X"]
if (te === "0") { // Note: "Transfer-Encoding: chunked" is set by default.
res.removeHeader("Transfer-Encoding"); // Setting any other Transfer-Encoding value has the same effect as removing the header
const byteCount = new TextEncoder().encode(name).length;
res.setHeader("Content-Length", byteCount);
}
res.setHeader("Content-Type", "text/html; charset=utf-8");
res.writeHead(200);
res.end(name);
}
// Note: For this issue you can ignore the following function. It just returns the C-D header as a `ByteString`.
/**
* Simple implementation for getting "Content-Disposition" header from filename
* @example
* // By default, it produces the same result as this (with all double quotes replaced by underscores):
* Buffer.from(`inline; filename="${name}"`).toString("binary")
*
* @param {string} name
* @param {Object} opts
* @param {"inline"|"attachment"} [opts.type="inline"]
* @param {Boolean} [opts.encoded=false]
* @param {Boolean} [opts.filename=true]
* @param {String} [opts.replacer="_"]
* @return {string}
*/
function getContentDisposition(name, opts = {}) {
const {
type, encoded, filename, replacer
} = Object.assign({type: "inline", encoded: false, filename: true, replacer: "_"}, opts);
const fixName = (name) => name.replaceAll(`"`, replacer); // The most trivial fix, since the filename is quoted. Prevents incorrect header parsing, for example with `";"` (3 chars inside the backticks).
const encodeMap = new Map([["'", "%27"], ["(", "%28"], [")", "%29"], ["*", "%2A"]]); // These must be escaped for old browsers (for example, Chromium from 2013)
const getEncodedName = (name) => encodeURIComponent(name).replaceAll(/['()*]/g, (val) => encodeMap.get(val));
const encodedStr = encoded ? `filename*=UTF-8''${getEncodedName(name)}` : "";
const filenameStr = filename ? `filename="${fixName(name)}"` : "";
const header = [type, filenameStr, encodedStr].filter(x => x).join("; ");
return Buffer.from(header).toString("binary");
}
- Click the link http://localhost:8000/?dl=1 to download a file named Rock & roll 音楽 («🎵🎶»).txt.
- Click the link http://localhost:8000/?dl=1&te=0 to download a file named Rock & roll 音楽 («🎵🎶»).txt (in this case the "Transfer-Encoding" header is removed).
How often does it reproduce? Is there a required condition?
Always.
What is the expected behavior?
Both files are downloaded with the name Rock & roll 音楽 («🎵🎶»).txt.
What do you see instead?
The first file has the correct name: Rock & roll 音楽 («🎵🎶»).txt.
The second one has the wrong name: Rock & roll é_³æ¥½ («ð__µð__¶Â»).txt
Additional information
TL;DR
If the "Transfer-Encoding: chunked" header is present (with exactly the value chunked), setHeader works properly: it sets the input header (a ByteString) as is.
(Note: "Transfer-Encoding: chunked" is set by default.)
In any other case it additionally (and unnecessarily) encodes the header to a ByteString.
So the header is encoded twice, which is wrong.
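The double encoding can be reproduced without a server. The sketch below is only an illustration of the encoding steps, not Node.js's internal code: `toByteString` mirrors the helper used later in this report, while `decodeWire` is a hypothetical name for what a UTF-8-aware client does with the header bytes it receives.

```javascript
// Illustration only: simulates the encoding steps, not Node.js internals.

// USVString -> ByteString: take the UTF-8 bytes and show each one as a Latin-1 character.
function toByteString(string) {
  return Buffer.from(string, "utf8").toString("latin1");
}

// Hypothetical client side: take the header's bytes and decode them as UTF-8.
function decodeWire(byteString) {
  return Buffer.from(byteString, "latin1").toString("utf8");
}

const name = "音楽.txt";
const once = toByteString(name);  // what the application passes to setHeader
const twice = toByteString(once); // what ends up on the wire if the value is encoded again

console.log(decodeWire(once) === name);  // true: the client recovers the original name
console.log(decodeWire(twice) === once); // true: the client only recovers the mojibake form
console.log(once.length, twice.length);  // 10 16: every non-ASCII byte was expanded to two
```

The second comparison matches the observed symptom: the downloaded file is named after the once-encoded ByteString instead of the original string.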
Additional info
Most HTTP headers contain only ASCII characters. But when you need to put a string with non-ASCII* characters into a header (for example, "Content-Disposition" or any custom header), you can't just pass it as is to setHeader.
For example:
// TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["Content-Disposition"]
res.setHeader("Content-Disposition", `attachment; filename="Rock & roll 音楽 («🎵🎶»).txt"`);
An HTTP header is a binary string (ByteString): UTF-8 bytes stored in a String object.*
There is no problem with headers that contain only ASCII characters, since the ASCII charset is a subset of both the UTF-8 and Latin 1 encodings, so toByteString(ASCIIString) === ASCIIString.
To get a ByteString from a USVString, you just take the UTF-8 bytes of the input string and represent them in the Latin 1 (ISO-8859-1) encoding.
For example, in Node.js:
function toByteString(string) {
return Buffer.from(string).toString("binary"); // or "latin1"
}
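As a quick sanity check of the properties claimed above, here is a minimal sketch. `fromByteString` and `isByteString` are hypothetical helpers introduced only for this illustration; they are not part of Node.js or of the demo server.

```javascript
// Same conversion as in the report: UTF-8 bytes represented as Latin-1 characters.
function toByteString(string) {
  return Buffer.from(string, "utf8").toString("latin1");
}

// Hypothetical inverse: read the code units back as bytes and decode them as UTF-8.
function fromByteString(byteString) {
  return Buffer.from(byteString, "latin1").toString("utf8");
}

// Hypothetical check: a ByteString has no code unit above 0xFF.
function isByteString(string) {
  for (let i = 0; i < string.length; i++) {
    if (string.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

const ascii = "report.txt";
const name = "Rock & roll 音楽 («🎵🎶»).txt";

console.log(toByteString(ascii) === ascii);               // true: ASCII is left untouched
console.log(isByteString(name));                          // false: contains code units above 0xFF
console.log(isByteString(toByteString(name)));            // true
console.log(fromByteString(toByteString(name)) === name); // true: the conversion is reversible
```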
*To be honest, here is the entire quote from the definition of [ByteString](https://webidl.spec.whatwg.org/#idl-ByteString):
Such sequences might be interpreted as UTF-8 encoded strings [RFC3629] or strings in some other 8-bit-per-code-unit encoding, although this is not required.
As far as I can see, a browser can also detect whether the string is "just" 8859-1 rather than UTF-8 bytes encoded in 8859-1.
So both "Content-Disposition" headers are "valid"**:
res.setHeader("Content-Disposition", `attachment; filename="¡«£»÷ÿ.png"`); // Correct ONLY for certain browser/OS languages
res.setHeader("Content-Disposition", `attachment; filename="${toByteString("¡«£»÷ÿ.png")}"`); // Always correct
The result in "both"** cases is a file named "¡«£»÷ÿ.png"**, even though "¡«£»÷ÿ.png" !== toByteString("¡«£»÷ÿ.png").
UPDATE:
**Using non-UTF-8 bytes ("some other 8-bit-per-code-unit encoding") in a ByteString is browser/OS language dependent!
For example, in Firefox with a non-English language, using "¡«£»÷ÿ.png" as is (without toByteString()) results in the filename ������.png instead of ¡«£»÷ÿ.png.
In Chrome it will be Ў«Ј»чя.png for Cyrillic.
So I think using 8859-1 in the "usual way" should be strongly discouraged.
Headers should always be a ByteString containing only UTF-8 bytes represented as 8859-1 (Latin 1).
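The language dependence can be illustrated with `TextDecoder`. This is only a sketch of how the same raw bytes read under different decodings, not a description of any browser's actual detection logic. `decodeHeaderBytes` is a hypothetical helper, and the `windows-1251` label assumes a Node.js build with full ICU, so that call is guarded.

```javascript
const rawName = "¡«£»÷ÿ.png";

// Bytes sent when the name is used "as is": one Latin-1 byte per character.
const rawBytes = Buffer.from(rawName, "latin1"); // A1 AB A3 BB F7 FF, then ".png"

// Hypothetical stand-in for a client decoding the header bytes with a given charset.
function decodeHeaderBytes(bytes, label) {
  return new TextDecoder(label).decode(bytes);
}

// As UTF-8, each of the six high bytes is invalid and becomes U+FFFD.
console.log(decodeHeaderBytes(rawBytes, "utf-8")); // ������.png

// As a Cyrillic single-byte charset, the same bytes map to different letters.
try {
  console.log(decodeHeaderBytes(rawBytes, "windows-1251")); // Ў«Ј»чя.png
} catch (err) {
  console.log("windows-1251 is not available in this build:", err.message);
}

// UTF-8 bytes, by contrast, decode to the intended name.
const utf8Bytes = Buffer.from(rawName, "utf8");
console.log(decodeHeaderBytes(utf8Bytes, "utf-8") === rawName); // true
```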
Problem
The problem is that I can't correctly set a header that is a ByteString (UTF-8 bytes in Latin 1) if the original string contains non-ASCII characters, the way other servers do.
The problem appears only when the Transfer-Encoding: chunked header (which is present by default) is removed (or changed).
In this case setHeader encodes the header to a binary string.
That is unnecessary, since it's already a ByteString.
It's not possible to pass a USVString to setHeader, since that throws a TypeError [ERR_INVALID_CHAR]: Invalid character in header content error.
So the header is encoded to "binary" twice, and browsers download the file with the wrong filename:
Rock & roll é_³æ¥½ («ð__µð__¶Â»).txt instead of Rock & roll 音楽 («🎵🎶»).txt.
You can open the demo server with the Transfer-Encoding: chunked header disabled (http://localhost:8000/?te=0) and check it in the browser console:
function bSrt2Str(bString) {
return new TextDecoder().decode(binaryStringToArrayBuffer(bString));
}
function binaryStringToArrayBuffer(binaryString) {
const u8Array = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
u8Array[i] = binaryString.charCodeAt(i);
}
return u8Array;
}
let cd = (await fetch("")).headers.get("Content-Disposition");
console.log(cd);
console.log(bSrt2Str(cd));
console.log(bSrt2Str(bSrt2Str(cd)));
The header is encoded twice!
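Since ASCII-only values pass through toByteString unchanged, an ASCII-only header value should not be affected by a second encoding pass. One possible way to sidestep the problem, sketched below, is the percent-encoded `filename*=UTF-8''...` parameter, which is what the `encoded` option of `getContentDisposition` in the demo server produces. `encodeExtValue` and `attachmentHeader` are hypothetical names, and this is a workaround idea under those assumptions rather than a fix for the bug itself.

```javascript
// Percent-encode a filename for the filename* parameter.
// encodeURIComponent leaves ' ( ) * unescaped, so they are escaped explicitly.
function encodeExtValue(name) {
  return encodeURIComponent(name).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Hypothetical helper: builds an ASCII-only Content-Disposition value.
function attachmentHeader(name) {
  return `attachment; filename*=UTF-8''${encodeExtValue(name)}`;
}

const fileName = "Rock & roll 音楽 («🎵🎶»).txt";
const header = attachmentHeader(fileName);

console.log(header);
console.log(/^[\x20-\x7e]*$/.test(header)); // true: printable ASCII only
// Encoding it to a ByteString (once or twice) changes nothing:
console.log(Buffer.from(header, "utf8").toString("latin1") === header); // true
```

Older clients that do not understand `filename*` would ignore it, which is presumably why the demo function can also emit the quoted `filename` parameter alongside it.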
Examples
A lot of forums (XenForo and vBulletin, for example) encode headers this way for attached files.
Real-life examples:
- https://xenforo.com/community/attachments/rock-roll-音楽-«🎵🎶»-png.266784/
- https://xenforo.com/community/attachments/¡«£»ÿ-png.266785/
Oh, wait, those require an account. If you don't have one (or don't want to create one), just use my demo server.
Anyway, just look at the screenshots below.
In the browser console you can verify that the headers are ByteStrings:
function bSrt2Str(bString) {
return new TextDecoder().decode(binaryStringToArrayBuffer(bString));
}
function binaryStringToArrayBuffer(binaryString) {
const u8Array = new Uint8Array(binaryString.length);
for (let i = 0; i < binaryString.length; i++) {
u8Array[i] = binaryString.charCodeAt(i);
}
return u8Array;
}
let cd = (await fetch("")).headers.get("Content-Disposition");
console.log(cd);
console.log(bSrt2Str(cd));
As a bonus, here is an example of a Java server built with ServerSocket that also works properly:
Main.java
package com.company;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
public class Main {
public static void main(String[] args) {
try (ServerSocket serverSocket = new ServerSocket(8000)) {
System.out.println("Server is running on http://127.0.0.1:8000");
while (true) {
Socket socket = serverSocket.accept();
try (BufferedReader input = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
PrintWriter output = new PrintWriter(socket.getOutputStream())) {
while (input.ready()) { // Print headers
System.out.println(input.readLine());
}
String name = "Rock & roll 音楽 («\uD83C\uDFB5\uD83C\uDFB6»).txt";
System.out.println(name);
output.println("HTTP/1.1 200 OK");
output.println("Content-Type: text/html; charset=utf-8");
output.println("Content-Disposition: attachment; filename=\"" + name + "\"");
output.println();
output.println(name);
output.flush();
}
}
} catch (IOException ex) {
ex.printStackTrace();
}
}
}