JavaScript file encoding
Although JavaScript itself uses Unicode internally, you can still run into charset conversion problems. Consider the following example of charset conversion issues with a very simple HTML and JS file.
In this example, a hello.html document says “Hello” when you click a button. Each button calls a snippet of JavaScript (the sayHello function) to display an alert dialog box. BTN 1 invokes the sayHello function using a local variable, localCustName, which contains the text “José”. BTN 2 invokes the same function using an externally defined variable, remoteCustName, which also contains the text “José”.
hello.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hello, world!</title>
<script type="text/javascript" >
var localCustName = "José";
function sayHello(custName) {
if (custName == null) { // loose equality with null also matches undefined
custName = "world";
}
alert("Hello, "+ custName);
}
</script>
<script type="text/javascript" src="./remoteCustName.js"></script>
</head>
<body>
<p>Hello, world!</p>
<p>
<button onclick="sayHello(localCustName)">BTN 1: Say hello to local José</button>
</p>
<p>
<button onclick="sayHello(remoteCustName)">BTN 2: Say hello to remote José</button>
</p>
</body>
</html>
remoteCustName.js
// this file is encoded as charset = 8859-1
var remoteCustName = "José";
When you load the hello.html file, you’ll see two buttons. One says hello to the “José” stored in a local JavaScript variable; the other says hello to the “José” stored in an external js file. Note that the html file is encoded as UTF-8 and the js file as ISO-8859-1. These encodings are arbitrary and could have been any of those defined in the IANA charset registry. The point is that the two encodings differ from each other.
Suppose you click BTN 1. You should see this:
Figure 1: alert dialog reading “Hello, José”.

In this example, the HTML file is UTF-8. The localCustName variable therefore begins as UTF-8 bytes in the HTML file itself, and the interpreter decodes those bytes into its own internal representation, which is conveniently also Unicode. The declared encoding matches the actual encoding, so the text survives intact.
Now let’s imagine you click BTN 2. You should see this:
Figure 2: alert dialog in which the “é” of “José” appears as a garbled character.

In Figure 2, the page links to an external JS file that is encoded as ISO-8859-1. When the browser pulls in remoteCustName.js, it converts the file's bytes to Unicode. But how does it know the source encoding? Absent any other information, it assumes the external file uses the same encoding as the HTML document, which is UTF-8. So the remoteCustName text inside the interpreter is Unicode, but the conversion was incorrect: the browser guessed UTF-8 when the JavaScript file is actually ISO-8859-1. As a result, the alert shows a garbled character where the ‘é’ should be.
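The wrong guess described above can be reproduced outside the page. The following is a minimal sketch, not what the browser literally runs: it assumes a runtime that provides the standard TextDecoder global (current browsers and Node.js), and the names latin1NameBytes and decodeLatin1 are made up for this illustration.

```javascript
// The text "José" as it is stored in an ISO-8859-1 file:
// J = 0x4A, o = 0x6F, s = 0x73, é = 0xE9 (a single byte).
const latin1NameBytes = new Uint8Array([0x4a, 0x6f, 0x73, 0xe9]);

// Wrong guess: treat the bytes as UTF-8. In UTF-8 the byte 0xE9 starts a
// multi-byte sequence that never completes here, so the decoder substitutes
// U+FFFD, the replacement character.
const decodedAsUtf8 = new TextDecoder("utf-8").decode(latin1NameBytes);

// Correct guess: ISO-8859-1 maps every byte to the code point with the
// same numeric value, so it can be decoded by hand.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}
const decodedAsLatin1 = decodeLatin1(latin1NameBytes);

console.log(decodedAsUtf8);   // "Jos" followed by U+FFFD
console.log(decodedAsLatin1); // "José"
```

The replacement character in the first result corresponds to the garbled character in the alert dialog.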
What’s the fix?
We can fix this by simply telling the interpreter explicitly what the JS file encoding is. The following revised HTML file does this:
... <script type="text/javascript" charset="ISO-8859-1" src="./remoteCustName.js"></script> ...
Now, when we click on either BTN 1 or BTN 2, we see the same thing:
Figure 3: alert dialog reading “Hello, José”.

The Problem
JavaScript uses Unicode as its underlying character set for all text strings. However, characters don’t simply appear in the interpreter; they get there from a file. Common file types that contain JavaScript program text include these:
- html
- js
- jsp
The JavaScript interpreter receives text from these files and interprets that text into JavaScript. Although all text inside the interpreter is Unicode, a text’s source encoding from its surrounding html, js, or jsp file is not always Unicode. The text that contains JavaScript language lines can be in a variety of charset encodings.
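To make the difference between source encodings concrete, here is a small sketch that prints the bytes each encoding uses for the same character. It assumes the standard TextEncoder global is available; the helpers toHex and encodeLatin1 are hypothetical names written for this example only.

```javascript
// é (U+00E9), written as an escape so that this snippet does not depend
// on the encoding of the file it is saved in.
const sample = "\u00e9";

// Format a sequence of byte values as upper-case hex, e.g. "C3 A9".
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// Hand-written ISO-8859-1 encoder: each character becomes one byte equal
// to its code point. Characters above U+00FF cannot be represented.
function encodeLatin1(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0xff) {
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    return codePoint;
  });
}

const utf8Hex = toHex(new TextEncoder().encode(sample)); // TextEncoder always emits UTF-8
const latin1Hex = toHex(encodeLatin1(sample));

console.log("UTF-8:      " + utf8Hex);   // C3 A9
console.log("ISO-8859-1: " + latin1Hex); // E9
```

The same character is two bytes in one file and one byte in the other, which is why the interpreter has to be told, or has to guess, which encoding it is reading.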
The Solution
There are several things to remember about charset encodings and JavaScript:
- The JavaScript interpreter works with Unicode.
- The JavaScript interpreter converts JavaScript text into Unicode.
- The JavaScript interpreter assumes that JavaScript strings are encoded in the charset of the enclosing HTML or JSP document.
- When linking to external JavaScript files (.js) from HTML, the interpreter will assume that the external file is encoded in the same charset as the HTML document unless you override that assumption with a charset attribute.
- Always use the charset attribute in script tags.
- Specifically, you probably should save all JavaScript files as UTF-8 encoded files and use the charset="UTF-8" attribute in script tags.
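If you follow the advice to save everything as UTF-8, it can help to check whether a file's bytes really are UTF-8 and to convert the ones that are not. The sketch below is one possible approach, assuming the standard TextDecoder and TextEncoder globals; isValidUtf8 and latin1ToUtf8 are illustrative names, and the conversion assumes the input really is ISO-8859-1.

```javascript
// Returns true if the bytes form a well-formed UTF-8 sequence.
// With { fatal: true } the decoder throws instead of inserting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Converts ISO-8859-1 bytes to UTF-8 bytes: decode each byte to the code
// point of the same value, then re-encode the resulting string as UTF-8.
function latin1ToUtf8(bytes) {
  const text = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  return new TextEncoder().encode(text);
}

// "José" as ISO-8859-1 bytes, standing in for the contents of a legacy file.
const legacyBytes = new Uint8Array([0x4a, 0x6f, 0x73, 0xe9]);
const convertedBytes = latin1ToUtf8(legacyBytes);

console.log(isValidUtf8(legacyBytes));    // false
console.log(isValidUtf8(convertedBytes)); // true
```

Treat the validity check as a heuristic: pure ASCII text is valid in both encodings, and some ISO-8859-1 byte sequences also happen to be well-formed UTF-8, so a passing check does not prove the file was saved as UTF-8.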

Useful info,
thanks!
thanks a lot!
Very helpful!Tks.
Thank a lot. Solve my problem. Nice explanation.
Man, you save my life! Thanks a lot!
This is good info. I had fixed most of my problems with keeping consistent use of the UTF-8 character set throughout my code. However I still had a problem on AJAX submissions which is why I suspected the javascript code and found this article. I had every bit of my code set to UTF-8 and it still converted non-standard characters like smartquotes and ellipsis and en/em dash to junk. It wasn’t the JS though. I finally found I had to set my web server software (Tomcat for me since I’m using Java) to UTF-8 as well. This site was helpful to me – http://www.jvmhost.com/articles/tomcat-java-mysql-jdbc-and-unicode . As you see there, I added it to my Connector directive. That article showed a way to do it independent of the server, but it involved a clunky conversion like – name = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8"); – and that seemed ridiculous to me, especially after I already set the request object to use UTF-8 encoding from the beginning.
excellent…thnks
very very helpfull. i had some buttons text in english on js sample script, and i changed them to hebrew (windows-1255) letters – and got Gibberish. then i added the encoding to js tag and all got fixed.
perfect, I was wondering such a long time about this.
congrats for place #1 in google search “encode js files utf8”
Sir, you just have fixed my day.
This article was helpfull for me too. It also works for cyrillics.
But there is an error in a source code
replace
with
Damn! It removed the code from comment. So remove fourth bracket in “meta” tag