Are PDF files bad source documents for translation?
While reading my latest copy of Multilingual magazine, I found an interesting assertion about creating translation-friendly source documents. One article starts its discussion by stating this:
Whenever possible, avoid using PDF files as the source document format for translation. Always try to provide the original file format … [because] PDF files cannot currently be edited in some programs and instead have to be transformed into another format (usually Word) before translation.
[Multilingual, Oct/Nov 2011, "Creating translation-oriented source documents," p. 43]
This statement makes sense in some ways, I suppose. For example, I don’t usually think of PDF files as being easily editable, and perhaps translation tools don’t usually understand the format. However, I’m somewhat new to the PDF format, so please forgive my ignorance. I do have some questions, especially for those of you at a translation or translation tools company:
- Do you agree with the author’s statement? If so, why?
- If PDF files do fall into this category of being difficult source documents, can translation tool vendors do something about that? What is possible?
- If you get a PDF document to translate, what do you do? Return it? Does it fit into your translation workflow and toolsets?

PDF loses the original sentence (or segment) structure and inserts hard carriage returns for formatting purposes. Both of these are bad things for translation environment tools. Also, in some cases the running text comes out in a different order than intended, which can complicate the whole process. Original file formats preserve all of this, but of course they are only useful if your vendor and/or translator has the same application (or if the translation environment tool has a filter for that application). Best of all would be XML, as it separates the “data” from the format (for the most part).
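To illustrate the hard-carriage-return problem: text pulled out of a PDF typically arrives as short fixed-width lines, so a sentence that a translation tool would treat as one segment gets chopped into several pieces. The following is a minimal Python sketch of undoing that before segmentation. The function names `rejoin_hard_wraps` and `segment` are hypothetical (not part of any translation tool), and the sketch assumes that paragraphs are separated by blank lines and that a line ending in a hyphen is a word split across the break. It does nothing about the reading-order problem, and real translation environment tools use far more careful segmentation rules.

```python
import re


def rejoin_hard_wraps(text: str) -> list[str]:
    """Rejoin lines broken by hard carriage returns into whole paragraphs.

    Assumption: paragraphs are separated by blank lines, and a line ending
    in "-" is a word that was hyphenated across the line break.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-"):
                # Glue the two halves of a hyphenated word back together.
                joined = joined[:-1] + line
            elif joined:
                # A hard return inside a sentence becomes an ordinary space.
                joined += " " + line
            else:
                joined = line
        paragraphs.append(joined)
    return paragraphs


def segment(paragraph: str) -> list[str]:
    """Naively split a paragraph into sentence segments after . ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


if __name__ == "__main__":
    # Hypothetical text as it might come out of a PDF, with hard returns.
    sample = (
        "PDF loses the original sen-\ntence structure and inserts\n"
        "hard returns. Both are bad\nfor translation tools.\n\n"
        "Original formats keep it."
    )
    for para in rejoin_hard_wraps(sample):
        print(segment(para))
```

The hyphen rule is deliberately crude: it would also turn a genuine compound split at the end of a line, such as "well-" followed by "known", into "wellknown", which is one reason the cleanup step usually still needs a human pass.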
1. For the love of god, yes! Most conversion tools do not so much convert as “crack” the .pdf format, either directly or using OCR. I use some pretty sophisticated tools which eliminate many of the problems BobD mentions, but they are not free, and both the initial conversion and the post-formatting can take some time.
2. In my experience, it is best for the conversion to take place outside of the translation environment.
3. I translate it with a smile – but I add a surcharge and might, depending on the condition of the source document, add time for conversion and formatting. Like I said, my nice tools are not free. There are free tools available, but most of these do not preserve confidentiality or formatting.
-Jenn Mercer
French to English Translation
Thank you for your comments. Very helpful. To me, PDF files have somehow been out-featured by products like Word. PDF files seem very good at representing the visual look-and-feel or the visual intention of the author, but seem inadequate for almost anything else.
The interesting thing about products like Word is that you *can* distribute locked or password protected documents to preserve the content, and I *think* it’s possible to include fonts with your document so that visual representation is preserved, AND it has very nice editing/review/comment tracking too. PDF doesn’t seem to fit my workflow for anything anymore, and yet I’m in an environment where PDF document distribution is the norm.