Home > Java, Unicode > Writing UTF-8 CSV Files for Excel

Writing UTF-8 CSV Files for Excel

Yesterday a coworker complained that Excel wasn’t displaying a CSV (comma separated values) file correctly. Our application allows the user to send a report via email. The application provides the report as a CSV file. Because the report can contain multilingual text, we’ve decided to encode it in UTF-8. Unfortunately, when users click on the file to display it, usually in Excel, all of the multi-byte encoded characters display incorrectly.

The problem was immediately clear to me…Excel was opening the UTF-8 encoded files, but it was incorrectly identifying them as Latin-1 encoded files. In the absence of any charset identification, Excel must guess about a file’s content encoding. In our environment, many host PCs use en_US locales with Latin-1 as the typical charset. Excel uses that default to read and display CSV files.

My solution to the problem was to use the byte-order marker (BOM) to identify the CSV file as a Unicode file. I instructed my colleague to prepend the FEFF character to the file. The Java application that writes the file uses a FileWriter that encodes to UTF-8 to create the CSV file. It was simple to just output the BOM as the first character in the file.

Now when our customers double-click on these files, Excel opens the file, notices the BOM, and automatically selects UTF-8 as the file’s charset encoding. Now Excel displays the previously mangled characters correctly. And I was able to help resolve a problem with an easy solution.

Maybe you can give your applications a hint about plain text files as well. Writing the BOM to your file can help Unicode-enabled applications know how to decode your Unicode files.

VN:F [1.9.22_1171]
Rating: 3.0/5 (1 vote cast)
VN:F [1.9.22_1171]
Rating: +2 (from 2 votes)
Writing UTF-8 CSV Files for Excel, 3.0 out of 5 based on 1 rating
Be Sociable, Share!
Categories: Java, Unicode Tags:
  1. Ben
    March 24th, 2010 at 09:29 | #1

    pardon the obvious pun but that’s the BOM

    VA:F [1.9.22_1171]
    Rating: 0.0/5 (0 votes cast)
    VA:F [1.9.22_1171]
    Rating: 0 (from 0 votes)
  2. VM
    June 18th, 2010 at 13:12 | #2

    Hi there,

    I am trying to display Thai baht symbol in a CSV file. I write it using java.util.Currency

    Currency.getInstance(“THB”).getSymbol(new Locale(“th”, “TH”))

    I enabled Thai using Control panel but Excel was showing garbage character where as I am able to see this symbol in notepad. I used your suggestion but it didnt work.

    I tried UTF-16LE encoding for setting the bom by encoding but that didnot work either. Any suggestions?? Thanks.

    Here is the test case:
    ======================
    import java.io.*;
    import org.apache.commons.io.FileUtils;
    import com.Ostermiller.util.CSVPrint;
    import com.Ostermiller.util.ExcelCSVPrinter;
    import java.util.*;

    public class ThaiCurrencySymbolTest {

    public static void main(String[] args) throws Exception{
    File f1 = new File(“C:\\Test-Thai-Symbol.csv”);
    CSVPrint csvWriter = ThaiCurrencySymbolTest.getCSVPrintWriter(f1);
    csvWriter.print(Currency.getInstance(“THB”).getSymbol(new Locale(“th”, “TH”)) + “1523.65″);
    csvWriter.flush();
    csvWriter.close();
    }

    public static CSVPrint getCSVPrintWriter(File file) throws IOException {
    //OutputStreamWriter buf = new OutputStreamWriter(new FileOutputStream(file),”TIS620″);
    OutputStreamWriter buf = new OutputStreamWriter(new FileOutputStream(file),”UTF-8″);
    CSVPrint csvWriter = new ExcelCSVPrinter(buf);
    csvWriter.setAlwaysQuote(true);
    csvWriter.print(“\uFEFF”);
    return csvWriter;
    }
    }

    VA:F [1.9.22_1171]
    Rating: 1.0/5 (1 vote cast)
    VA:F [1.9.22_1171]
    Rating: 0 (from 0 votes)
  3. January 25th, 2011 at 06:34 | #3

    Isn’t FEFF the BOM for UTF-16 BE ?

    VA:F [1.9.22_1171]
    Rating: 5.0/5 (1 vote cast)
    VA:F [1.9.22_1171]
    Rating: 0 (from 0 votes)
  4. October 30th, 2011 at 16:36 | #4

    This is great. I think the only missing thing is the ability to customize ribbons with it these (csv addin) commands.

    VA:F [1.9.22_1171]
    Rating: 0.0/5 (0 votes cast)
    VA:F [1.9.22_1171]
    Rating: 0 (from 0 votes)
  5. x
    May 8th, 2012 at 22:53 | #5

    great!

    VA:F [1.9.22_1171]
    Rating: 5.0/5 (1 vote cast)
    VA:F [1.9.22_1171]
    Rating: +2 (from 2 votes)
  6. x
    May 8th, 2012 at 22:56 | #6

    you can try this one: it works, you can open csv or text file without any symbol or character.

    —————————————————-
    Sub OpenTextFile()

    filetoopen = Application.GetOpenFilename(“Text Files (*.txt;*.csv), *.txt;*.csv”)
    If filetoopen = Null Or filetoopen = Empty Then Exit Sub

    Workbooks.OpenText Filename:=filetoopen, _
    Origin:=65001, DataType:=xlDelimited, Comma:=True, Tab:=True

    End Sub
    ———————————————-

    VA:F [1.9.22_1171]
    Rating: 5.0/5 (1 vote cast)
    VA:F [1.9.22_1171]
    Rating: +1 (from 1 vote)
  1. No trackbacks yet.