
JavaScript file encoding

Although JavaScript itself uses Unicode internally, you can still run into charset conversion problems. Consider the following example, which uses a very simple HTML file and a JS file.

In this example, a hello.html document says “Hello” when you click a button. Each button calls a snippet of JavaScript (the sayHello function) to display an alert dialog box. BTN 1 invokes sayHello with a local variable, localCustName, which contains the text “José”. BTN 2 invokes the same function with an externally defined variable, remoteCustName, which also contains the text “José”.

hello.html

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Hello, world!</title>
        <script type="text/javascript" >
            var localCustName = "José";
            function sayHello(custName) {
                // "== null" matches both null and undefined, so a missing name falls back to "world"
                if (custName == null) {
                    custName = "world";
                }
                alert("Hello, "+ custName);
            }
        </script>
        <script type="text/javascript" src="./remoteCustName.js"></script>
    </head>
    <body>

        <p>Hello, world!</p>
        <p>
        <button onclick="sayHello(localCustName)">BTN 1: Say hello to local José</button>
        </p>
        <p>
        <button onclick="sayHello(remoteCustName)">BTN 2: Say hello to remote José</button>
        </p>
    </body>
</html>

remoteCustName.js

// this file is encoded as ISO-8859-1
var remoteCustName = "José";

When you load the hello.html file, you’ll see two buttons. One button says hello to “José” stored in a local JavaScript variable; the other says hello to “José” stored in an external JS file. Note that the HTML file is encoded as UTF-8 and the JS file is encoded as ISO-8859-1. These encodings are arbitrary and could have been any of the encodings defined in the IANA charset registry. The point is that the two encodings differ from each other.
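To make “the encodings differ” concrete, here is a small sketch, runnable in Node.js or a browser console, that prints the bytes each file would contain for the name. TextEncoder is a standard API that always produces UTF-8; latin1Bytes is a hypothetical helper written for this example and only handles characters up to U+00FF. The name is written as "Jos\u00E9" so the example does not itself depend on how its own file is saved.

```javascript
// UTF-8 bytes, as stored in hello.html (TextEncoder always encodes UTF-8)
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// ISO-8859-1 bytes, as stored in remoteCustName.js.
// Hypothetical helper: ISO-8859-1 maps U+0000..U+00FF to exactly one byte each.
function latin1Bytes(text) {
  return Array.from(text, (ch) => {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError(`U+${code.toString(16)} cannot be encoded in ISO-8859-1`);
    }
    return code;
  });
}

// Format a byte array as uppercase hex for easy comparison
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

const custName = "Jos\u00E9"; // "José", escaped so this file's own encoding doesn't matter
console.log("UTF-8:     ", hex(utf8Bytes(custName)));   // 4A 6F 73 C3 A9
console.log("ISO-8859-1:", hex(latin1Bytes(custName))); // 4A 6F 73 E9
```

The same four characters take five bytes in UTF-8 and four in ISO-8859-1, and only the ‘é’ differs.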

Suppose you click BTN 1. You should see this:

Figure 1: alert shown after clicking BTN 1

In this example, the HTML file is encoded as UTF-8, so the localCustName string literal is stored in the file as UTF-8 bytes. Because the document declares UTF-8, those bytes are decoded correctly and converted into the interpreter’s own internal encoding, which is conveniently also Unicode.
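The BTN 1 path can be approximated with the standard TextEncoder and TextDecoder APIs. This is a sketch, assuming the browser’s decoding step behaves like TextDecoder; it shows that once the UTF-8 bytes are decoded, the string in memory is a sequence of UTF-16 code units rather than UTF-8 bytes.

```javascript
// Bytes of "José" as saved in the UTF-8 file hello.html
const htmlBytes = new TextEncoder().encode("Jos\u00E9");

// The document declares UTF-8, so the bytes are decoded as UTF-8
const localCustName = new TextDecoder("utf-8").decode(htmlBytes);

console.log(localCustName);                            // José
console.log(htmlBytes.length);                         // 5 (é takes two bytes in UTF-8)
console.log(localCustName.length);                     // 4 (one UTF-16 code unit per character here)
console.log(localCustName.charCodeAt(3).toString(16)); // e9, i.e. U+00E9
```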

Now let’s imagine you click BTN 2. You should see this:

Figure 2: alert shown after clicking BTN 2

In Fig 2, we have linked to an external JS file, which has the encoding ISO-8859-1. When the browser pulls that remoteCustName.js file in, it converts it to Unicode. However, how does it know the source encoding? It assumes the source encoding is the same as the HTML document, which is UTF-8. So, now within the browser interpreter, the remoteCustName variable text is Unicode, but the conversion was incorrect. It guessed incorrectly that the external JS file was encoded as UTF-8; instead, the JavaScript file itself is encoded as ISO-8859-1. The visible display of the remoteCustName variable shows a garbled character for what should have been an ‘é’ character.

What’s the fix?

We can fix this by explicitly telling the browser what the JS file’s encoding is, using a charset attribute on the script tag. The following change to hello.html does this:

...
<script type="text/javascript" charset="ISO-8859-1" src="./remoteCustName.js"></script>
...

Now, when we click on either BTN 1 or BTN 2, we see the same thing:

Figure 3: alert shown after clicking either button
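With the charset attribute in place, each file is decoded with its own encoding, and both variables end up holding the same Unicode string. This is a sketch under the same assumptions as before (TextDecoder standing in for the browser, latin1Encode a hypothetical helper). One detail worth knowing: under the WHATWG Encoding Standard, which browsers and TextDecoder follow, the label "iso-8859-1" actually selects windows-1252, which is identical to ISO-8859-1 for ‘é’ (0xE9).

```javascript
// Hypothetical helper: save text as ISO-8859-1 (one byte per character, U+0000..U+00FF only)
function latin1Encode(text) {
  return Uint8Array.from(text, (ch) => {
    const code = ch.codePointAt(0);
    if (code > 0xff) throw new RangeError(`U+${code.toString(16)} is not in ISO-8859-1`);
    return code;
  });
}

// hello.html is saved as UTF-8, remoteCustName.js as ISO-8859-1
const htmlBytes = new TextEncoder().encode("Jos\u00E9");
const jsBytes = latin1Encode("Jos\u00E9");

// Inline script: decoded with the page charset.
// External script: decoded with its charset="ISO-8859-1" attribute.
const localCustName = new TextDecoder("utf-8").decode(htmlBytes);
const remoteCustName = new TextDecoder("iso-8859-1").decode(jsBytes);

console.log(localCustName, remoteCustName);    // José José
console.log(localCustName === remoteCustName); // true
```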

The Problem

JavaScript uses Unicode as its underlying character set for all text strings. However, characters don’t simply appear in the interpreter; they arrive from a file. Common file types that contain JavaScript program text include:

  • html
  • js
  • jsp

The JavaScript interpreter receives text from these files and interprets it as JavaScript. Although all text inside the interpreter is Unicode, the HTML, JS, or JSP file that the text comes from is not necessarily stored in a Unicode encoding. The text that contains JavaScript language lines can be in a variety of charset encodings.

The Solution

Here are a few things to remember about charset encodings and JavaScript:

  1. The JavaScript interpreter works with Unicode.
  2. The JavaScript interpreter converts JavaScript text into Unicode.
  3. The JavaScript interpreter assumes that JavaScript strings are encoded in the charset of the enclosing HTML or JSP document.
  4. When linking to external JavaScript files (.js) from HTML, the interpreter will assume that the external file is encoded in the same charset as the HTML document unless you override that assumption with a charset attribute.
  5. Always use the charset attribute in script tags.
  6. Specifically, you probably should save all JavaScript files as UTF-8 encoded files and use the charset="UTF-8" attribute in script tags.
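Rules 3 and 4 amount to a simple precedence rule, sketched below with a hypothetical chooseScriptCharset function. It is a simplification, not the browser’s actual code: real browsers also let a byte order mark or a charset in the HTTP Content-Type header take precedence, and this model ignores both. The examples show why rule 6 helps: a UTF-8 file with charset="UTF-8" on its tag decodes correctly regardless of the page’s own charset.

```javascript
// Hypothetical model of rules 3 and 4: the script tag's charset attribute,
// when present, overrides the charset of the enclosing document.
// (BOM and HTTP Content-Type precedence are deliberately not modeled.)
function chooseScriptCharset(charsetAttribute, documentCharset) {
  return charsetAttribute ? charsetAttribute : documentCharset;
}

// Decode a script file's bytes the way the model above says the browser would
function decodeScript(bytes, charsetAttribute, documentCharset) {
  const charset = chooseScriptCharset(charsetAttribute, documentCharset);
  return new TextDecoder(charset).decode(bytes); // labels are case-insensitive
}

// remoteCustName's value saved as UTF-8, as rule 6 recommends
const utf8File = new TextEncoder().encode("Jos\u00E9");

// UTF-8 page, charset="UTF-8" on the tag: correct
console.log(decodeScript(utf8File, "UTF-8", "UTF-8"));      // José
// ISO-8859-1 page, charset="UTF-8" on the tag: still correct
console.log(decodeScript(utf8File, "UTF-8", "ISO-8859-1")); // José
// ISO-8859-1 page, no charset attribute: é's two UTF-8 bytes read as two characters
console.log(decodeScript(utf8File, undefined, "ISO-8859-1")); // JosÃ©
```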
  1. mozz
    September 9th, 2008 at 00:24 | #1

    Useful info,
    thanks!

  2. Milos
    February 16th, 2009 at 09:35 | #2

    thanks a lot!

  4. April 8th, 2009 at 10:00 | #4

    Very helpful! Tks.

  5. Rastael
    May 21st, 2009 at 23:55 | #5

    Thanks a lot. Solved my problem. Nice explanation.

  6. Andre
    July 31st, 2009 at 10:12 | #6

    Man, you save my life! Thanks a lot!

  7. March 8th, 2012 at 06:29 | #7

    This is good info. I had fixed most of my problems by using the UTF-8 character set consistently throughout my code. However, I still had a problem with AJAX submissions, which is why I suspected the JavaScript code and found this article. I had every bit of my code set to UTF-8, and it still converted non-standard characters like smart quotes, ellipses, and en/em dashes to junk. It wasn’t the JS, though. I finally found I had to set my web server software (Tomcat for me, since I’m using Java) to UTF-8 as well. I found http://www.jvmhost.com/articles/tomcat-java-mysql-jdbc-and-unicode helpful. As you can see there, I added the setting to my Connector directive. That article also showed a way to do it independent of the server, but it involved a clunky conversion like name = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8"); and that seemed ridiculous to me, especially after I had already set the request object to use UTF-8 encoding from the beginning.

  8. Anand
    April 16th, 2012 at 02:02 | #8

    excellent…thnks

  9. April 18th, 2012 at 22:58 | #9

    Very, very helpful. I had some button text in English in a sample JS script, and I changed it to Hebrew (windows-1255) letters and got gibberish. Then I added the encoding to the script tag and everything got fixed.
