
DTP and Graphic Design

Published on 06/06/2024

The TMX format has become the standard for exchanging translation memories between translation systems, even those installed on different operating systems. In 2007, when our current marketing director wrote the article "Testing the implementation of the TMX standards" in Multilingual, most programmes accepted the .txt format generated by the version of what is now RWS Trados Studio available at the time.

Today, that article would no longer make sense. Keep reading to expand your knowledge of one of the most widely adopted standards in the industry, alongside the XLIFF format.

What is the TMX format?

The Translation Memory eXchange format is an XML-based open standard designed for storing translation memories (TM) and exchanging them between different computer-assisted translation (CAT) tools and applications. A translation memory is a database that stores previously translated text segments so they can be reused in future translations, speeding up work while ensuring stylistic and terminological consistency, whether through 100% matches or partial matches (what we often call fuzzy matches in professional jargon).

The TMX format was one of the legacies of LISA (Localization Industry Standards Association) before its dissolution in 2011 and has been widely adopted by multiple translation and localisation tools. Its main objective is to standardise the storage of TMs, facilitating their interoperability between different platforms.

Structure of the TMX format

TMX files are based on XML, which implies the use of tags to encode information so that it can be read by both humans and machines. In general, the structure consists of a header followed by a body that contains the translation units (TU), that is, the previously translated text segments.

The header contains descriptive information about the TM, such as the name, the source language and the target language(s), as well as other additional data like the tool used for its creation, the creation date or any revisions made.

The body contains the TUs, including both the original plain text segment and its translation into one or more languages, as well as the format tags (depending on the level of TMX format implementation, which we will see later). Furthermore, additional information can be included, such as the context of use of the segment or the translator's comments.
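As a rough illustration of this structure, the sketch below parses a small hand-written TMX document with Python's standard XML library. The sample content (tool name, segments) is invented for the example, and the element and attribute names follow our reading of TMX 1.4b; a real file exported by a CAT tool will carry more attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny TMX document with one header and one body.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the Start button.</seg></tuv>
      <tuv xml:lang="es"><seg>Pulse el botón Inicio.</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        units.append(
            {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        )
    return units


units = read_units(SAMPLE_TMX)
print(units)
```

Each translation unit comes back as a small dictionary keyed by language code, which is usually all that is needed to inspect or filter a memory.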

Multilingual and bilingual TMX files

As mentioned above, TMX files can be multilingual or bilingual, depending on the number of target languages included:

  • Bilingual TMX files: contain text segments in two languages, the source and the target language. They are the most common and are mainly used in specific translation projects.
  • Multilingual TMX files: contain text segments in multiple languages, allowing translations to be managed and reused in several languages from a single file. These are useful for large localisation projects.
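The difference between the two kinds of file can be sketched in a few lines of Python. The body below is a made-up multilingual sample, and the helper names are our own; the sketch simply counts the languages present and extracts one bilingual pair from the multilingual content.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical multilingual body: one unit in three languages, one in two.
BODY = """<body>
  <tu>
    <tuv xml:lang="en"><seg>Warning</seg></tuv>
    <tuv xml:lang="fr"><seg>Avertissement</seg></tuv>
    <tuv xml:lang="de"><seg>Warnung</seg></tuv>
  </tu>
  <tu>
    <tuv xml:lang="en"><seg>Cancel</seg></tuv>
    <tuv xml:lang="fr"><seg>Annuler</seg></tuv>
  </tu>
</body>"""


def languages(body):
    """Collect every language code used in the translation units."""
    return sorted({tuv.get(XML_LANG) for tuv in body.iter("tuv")})


def bilingual_pairs(body, source, target):
    """Keep only the units that have both the source and the target language."""
    pairs = []
    for tu in body.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if source in segs and target in segs:
            pairs.append((segs[source], segs[target]))
    return pairs


body = ET.fromstring(BODY)
langs = languages(body)
kind = "multilingual" if len(langs) > 2 else "bilingual"
en_de = bilingual_pairs(body, "en", "de")
print(kind, langs, en_de)
```

Real files may mix code variants such as "en", "en-GB" or "EN-US", so a production script would probably normalise the codes before comparing them.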

Metadata in a TMX file

In addition to the translated text segments, TMX files can contain various types of metadata that provide extra information about the TUs and facilitate the effective management and use of TMs. We have already mentioned some of them, but here is a complete list of the most common ones:

  • Source and target language: specifies the languages of the original segment and its translation. This is essential to ensure that the TUs are used correctly in multilingual contexts, by filtering and applying only those that match the specific languages of the current project.
  • Author and creation date: indicates who created the segment and when, which facilitates the assignment of responsibilities, the tracking of translation quality, and communication between professionals.
  • Client and project: relevant information about the client and the project associated with the TM. When working on a specific project, previously used TUs for the same client can be prioritised, ensuring terminological and stylistic consistency.
  • Translation status: indicates whether the translation has been reviewed, approved or has a pending review, helping to manage the workflow of the translation company.
  • Notes and comments: allows additional annotations to be inserted that may be useful for future translators and reviewers when making decisions.
  • Context and segmentation: information about the context of the segment within the source text, which helps maintain consistency and accuracy in future translations.
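To show where this metadata usually lives, here is a minimal sketch that reads the attributes, properties and notes of a single translation unit. The attribute names (tuid, creationid, creationdate) and the prop and note elements come from TMX 1.4b as we understand it; the property types "x-client", "x-project" and "x-status" are invented for the example, since each tool chooses its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit; the x-... property types are made up for illustration.
TU = """<tu tuid="42" creationid="jsmith" creationdate="20240606T101500Z">
  <prop type="x-client">ACME</prop>
  <prop type="x-project">User manual 2024</prop>
  <prop type="x-status">approved</prop>
  <note>Use the formal register.</note>
  <tuv xml:lang="en"><seg>Open the lid.</seg></tuv>
  <tuv xml:lang="fr"><seg>Ouvrez le couvercle.</seg></tuv>
</tu>"""


def unit_metadata(tu):
    """Gather the unit's attributes, its typed properties and its notes."""
    meta = dict(tu.attrib)
    meta["props"] = {p.get("type"): p.text for p in tu.findall("prop")}
    meta["notes"] = [n.text for n in tu.findall("note")]
    return meta


meta = unit_metadata(ET.fromstring(TU))
is_acme_approved = (
    meta["props"].get("x-client") == "ACME"
    and meta["props"].get("x-status") == "approved"
)
print(meta["creationid"], meta["creationdate"], is_acme_approved)
```

A filter like is_acme_approved is the kind of test that lets a project manager prioritise units from the same client, as described in the list above.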

Implementation of the TMX format

The implementation of the TMX format in different CAT tools is carried out at three different levels, depending on the type of format codes and tags that can be recognised. In other words, the implementation levels determine the complexity and amount of information that can be included in a file. These levels are the following:

  • Level 1 (Plain Text Only): this is the most basic level and ensures compatibility between different CAT tools. It includes minimal information such as the pairs of text segments in the source and target language. It is the simplest option when looking for equivalences between pairs, as the reading of the segments is not hindered by the presence of tags.
  • Level 2 (Meta-Markup): this level also reads the formatting information carried by the TMX tags, taking into account details about the text format (bold, italics, underline…).
  • Level 3 (Native-Markup): this is the most advanced level and enables the recognition of both TMX tags and the native code of each element, without losing any information. This implies the possibility of recreating the exact structure and format of the original document in the translation using only the TMX file.
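To make the difference between plain-text and markup-aware handling more concrete, the sketch below reduces a segment that carries inline TMX tags to its plain text, roughly what a tool limited to Level 1 would keep. It assumes the inline elements defined in TMX 1.4b (bpt, ept, ph, it, ut for native codes, hi for highlighting); the sample segment is invented, and skipping the content of code elements entirely is a simplification.

```python
import xml.etree.ElementTree as ET

# Inline elements whose content is native formatting code, not translatable text.
CODE_TAGS = {"bpt", "ept", "ph", "it", "ut"}


def plain_text(elem):
    """Return the text of a <seg>, dropping the content of code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            # Elements such as <hi> wrap real text, so keep what is inside them.
            parts.append(plain_text(child))
        # Text that follows a child element always belongs to the segment.
        parts.append(child.tail or "")
    return "".join(parts)


seg = ET.fromstring(
    '<seg>Press <bpt i="1">&lt;b&gt;</bpt>Start<ept i="1">&lt;/b&gt;</ept>'
    " to <hi>begin</hi>.</seg>"
)
text = plain_text(seg)
print(text)
```

The formatting codes are lost in the process, which is exactly the information loss that can occur when a memory passes through a tool with a more basic implementation.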

Compatibility issues of TMX formats in CAT tools

Despite being an open standard, TMX files can face compatibility issues between different CAT tools. Some common issues include:

  • Different levels of implementation: as we saw in the implementation levels, not all tools are capable of interpreting the same type of data contained in a TMX file, which entails the possible loss of important information from one tool to another.
  • Differences in XML parsing: some tools do not use standard XML parsers, so they may not accept some valid TMX files.
  • Generation of invalid TMX files: even if they can correctly read the XML, certain tools are not capable of generating valid TMX format files, which causes problems when being read later by other programs.
  • New XML versions: there are still tools that work with older versions of XML, so they will not be able to read more recent TMX files.
  • Multilingual TMX: some tools only allow two languages and do not support multilingual TMX files.
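Some of these problems can be caught before a file is sent to another tool. The following sketch runs a few basic checks with a standard XML parser; the function name and the list of checks are our own choice, and passing them does not mean the file is valid against the TMX DTD, only that the most obvious structural problems are absent.

```python
import xml.etree.ElementTree as ET


def check_tmx(data):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "tmx":
        problems.append(f"root element is <{root.tag}>, expected <tmx>")
    header = root.find("header")
    if header is None or not header.get("srclang"):
        problems.append("missing <header> or srclang attribute")
    for number, tu in enumerate(root.iter("tu"), start=1):
        tuvs = tu.findall("tuv")
        if not tuvs or any(tuv.find("seg") is None for tuv in tuvs):
            problems.append(f"unit {number}: missing <tuv> or <seg>")
    return problems


good = (
    '<tmx version="1.4"><header srclang="en"/><body><tu>'
    '<tuv xml:lang="en"><seg>Hi</seg></tuv></tu></body></tmx>'
)
truncated = "<tmx><body><tu>"
no_seg = (
    '<tmx version="1.4"><header srclang="en"/><body><tu>'
    '<tuv xml:lang="en"/></tu></body></tmx>'
)
print(check_tmx(good), check_tmx(truncated), check_tmx(no_seg))
```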

Conclusion

The TMX format is an essential tool for both professional translators and translation agencies, as it provides a standardised way of storing translation memories and allows them to be exchanged between professionals regardless of the CAT tool they use. However, users must be aware of the implementation levels and possible compatibility issues to avoid loss of information. Furthermore, by knowing how to manipulate and manage TMX files, and understanding all the information they can contain, we are able to streamline the translation process, save time and effort, and improve our work.

Iván Vázquez

Graduate in Translation and Interpreting from the University of Granada, specializing in French and Chinese. He has worked on several literary translation and web translation projects in Spain and France. Currently, he is a project management assistant and content writer at AbroadLink.

Published on 31/05/2024

Communication between companies from different cultures and languages has become a common necessity today. It is crucial for companies seeking to expand internationally to understand the linguistic and cultural particularities of their target markets. One of these often overlooked particularities is the reading direction. 

Did you know that some cultures read from right to left? This detail may seem minor, but it has a significant impact on the layout of documents and marketing materials.

Why reading direction is important

When designing documents, websites or any other form of visual communication, we take into account the reading habits of our audience. In most Western cultures, we read from left to right. However, in many other cultures, especially those using Arabic, Hebrew and Persian, reading is done from right to left.

This difference in reading direction entails more than the simple adjustment of text. In fact, it affects the overall visual structure of a document, the layout, the alignment of elements and even the order in which information is perceived and assimilated.

The challenges of layout for RTL languages

1. Alignment and structure

In RTL languages (right-to-left), the main alignment must be on the right, contrary to LTR languages (left-to-right). This means that titles, paragraphs and even images must be aligned differently. For example, a document in English will normally have titles aligned to the left and a text flow that goes from left to right. In Arabic, this same document must be reorganised so that the text begins on the right and flows to the left.
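For digital content, the base direction of a text can often be guessed automatically. The sketch below uses the first character with a strong direction, a simplified version of the rule in the Unicode Bidirectional Algorithm; the function names and the left-to-right default are our own assumptions.

```python
import unicodedata


def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first character with a strong direction."""
    for char in text:
        bidi = unicodedata.bidirectional(char)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic, CJK and other LTR letters
            return "ltr"
    return "ltr"  # assumption: default when no strong character is found


def alignment(text):
    """Main alignment to use for a title or paragraph."""
    return "right" if base_direction(text) == "rtl" else "left"


print(alignment("Welcome"), alignment("مرحبا"), alignment("2024: שלום"))
```

Digits and punctuation have no strong direction, so a title that starts with a year is still recognised as RTL when its first letter is Hebrew or Arabic.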

2. Images and graphics

Visual elements such as images and graphics also require special attention. If an image contains text, it must be translated and reoriented to follow the reading direction of the target language. Moreover, the direction of the visual flow of graphics and diagrams may also need to be reversed. For example, a chart showing a temporal progression from left to right must be modified to show this same progression from right to left in an RTL language.

3. User Interface (UI) and User Experience (UX)

For digital companies, adapting user interfaces for RTL languages is essential. This includes menus, buttons and even icons. A website or application must be completely reconfigured to offer an intuitive and natural user experience for RTL readers. Navigation menus must be moved to the right and interactive elements must be reorganised accordingly.

Examples of successful adaptations

Take the example of large companies like Apple or Google, which have seamlessly integrated RTL compatibility into their products. Operating systems like iOS and Android support RTL languages, offering a smooth and consistent experience to users who have these languages as their mother tongue.

These companies demonstrate that special attention to layout and user interface details can make a significant difference in the reception and use of their products in RTL markets.

How a translation company can help you

Any professional translation company must understand the importance of accurately adapting the text to the target culture and language. Their translation services are not limited to translating words, but adopt a holistic approach to localisation. With their high-quality translations, they ensure that documents and digital media are perfectly adapted to the reading habits and cultural expectations of target markets. Ultimately, using their language services will allow you to strengthen your brand image.

Services offered:

  1. Document translation and localisation: they adapt your documents to be perfectly readable and aesthetically appealing, both for LTR and RTL readers.
  2. Adapted graphic design: professional translators will take care of reorienting images, illustrations and diagrams.
  3. User interface adaptation: professional translators help reconfigure your websites and applications to offer an optimal user experience.

Conclusion

Reading direction is a fundamental aspect of intercultural communication that is often overlooked. For companies wishing to expand into regions that use RTL languages, understanding and adapting to these differences can make a difference. Translation agencies guide you through every stage of your translation project, ensuring that your message is not only understood but also captures the cultural nuances of your audience.

Investing in accurate and culturally respectful adaptation shows your clients that you care about their needs and are willing to go the extra mile to serve them. Contact our translation company to discover how we can help you succeed in the foreign market.

Virginia Pacheco

Blog writer and Community Manager interested in multiculturality and linguistic diversity. From her native Venezuela, she has travelled and lived for many years in France, Germany, Cameroon and Spain, passing on her passion for writing and her intercultural experiences.

Published on 05/11/2020

PDF files are one of the most widespread formats for displaying text content in documents. This is why it is often the only format we have for translating a contract, a brochure, a data sheet or a user manual. However, PDF files are only content exchange files. In fact, PDF stands for Portable Document Format.

The only purpose in life of a PDF file is that we can see it and share it without compatibility problems. It's important to understand this. A PDF file can be modified, but it is not actually intended to be modified.

Let's see what we have to do to translate a PDF using the computer-assisted translation programs normally used by professional translators and translation agencies, or to use a machine translator such as Google Translate if we do not need a 100% reliable translation, or even to translate the old-fashioned way, overwriting the text but keeping the document format.

To make use of language technology we will need to have the PDF file in an editable format that we can easily handle. The most common way to translate a PDF is to convert it to Word, as a .doc or .docx document.

1. What is the ideal solution for translating a PDF file?

This is going to sound a little bit stupid, but the best way to translate a PDF file is not to translate it. I mean, better not to translate the PDF file but the original, editable file with which the PDF was created.

For example, if that PDF file was created in Word, it is best to use the original Word file. If it was created with FrameMaker or InDesign, it is best to use these formats. If you have InDesign files and can forget about the PDF, you might be interested in our blog: "DTP/layout best practices for translating InDesign files".

However, it is quite possible that your company no longer knows where the original files are or who created them. Or, if you are a distributor or importer, it is possible that the manufacturer has only provided you with the PDF files to do the translation.

However, it is worth investing a little bit of time in investigating whether someone still has them, or insisting that the manufacturer send them to us. This will save us a lot of headaches during translation. When I say headaches, I mean the most common issues: time and money.

When we want to preserve the format of the original text, getting the originals will be even more important. PDF files are usually created in low-resolution versions, so they won't work if we need them for high quality professional printing.

On the other hand, when we convert the PDF files to Word for translation, we will see that the layout can change quite a lot. So it can be very laborious and expensive to reproduce 100% of the original layout.

In short, the best results in terms of quality and time/money will be achieved by working from the originals for translation. However, you probably wouldn't be reading this article if you did, would you?

2. How do you know which program the PDF was created with?

In general, we will be able to easily find out which program was used to create the original PDF. If we open the PDF with Acrobat or Acrobat Reader and go to File > Properties, we can see the application used to create it. Here is an example of a PDF created in Word:

Acrobat Reader

We can also see who the author is and the date when it was created, which can give us a clue as to who to ask for the originals.
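When many PDFs have to be checked, the same properties can sometimes be read with a script. The sketch below is a rough heuristic written with the Python standard library only: it searches the raw bytes for entries such as /Creator or /Author and decodes the two common string notations. It assumes the document information is stored uncompressed, which is not the case in every PDF, and it does not handle escaped parentheses, so a dedicated PDF library will be more reliable. The function names are hypothetical.

```python
import re


def decode_pdf_string(raw, is_hex):
    """Decode a PDF string given as literal bytes or as hexadecimal digits."""
    data = bytes.fromhex(raw.decode("ascii")) if is_hex else raw
    if data.startswith(b"\xfe\xff"):  # UTF-16BE text with a byte order mark
        return data[2:].decode("utf-16-be", errors="replace")
    return data.decode("latin-1")  # approximation of PDFDocEncoding


def info_field(pdf_bytes, key):
    """Look for /Key (text) or /Key <hex> in the raw, uncompressed bytes."""
    pattern = rb"/" + key.encode("ascii") + rb"\s*(?:\(([^)]*)\)|<([0-9A-Fa-f]+)>)"
    match = re.search(pattern, pdf_bytes)
    if match is None:
        return None
    literal, hexa = match.groups()
    if literal is not None:
        return decode_pdf_string(literal, is_hex=False)
    return decode_pdf_string(hexa, is_hex=True)


# Made-up fragment imitating the information dictionary of a PDF file.
sample = (
    b"1 0 obj\n<< /Creator (Microsoft Word) /Producer <FEFF00410042> "
    b"/Author (J. Doe) >>\nendobj"
)
print(info_field(sample, "Creator"), info_field(sample, "Author"))

# With a real file (hypothetical path):
# with open("manual.pdf", "rb") as fh:
#     print(info_field(fh.read(), "Creator"))
```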

3. Why convert a PDF to Word for translation?

As I said, the standard solution for translating PDF files used by translation companies is to go through Word. However, someone might think this doesn't make sense because PDF files can be edited.

Well, this is true to a certain extent. First, yes, they can be modified, but for that we will need to have the paid professional version of Acrobat. Most translators will only have the free version: Acrobat Reader. This free version is limited to a few functions.

Second, even if we use a translator who does have the professional version, changing the text in Acrobat will take much longer. We may be able to convince novice translators to work in this way at their usual rate. However, more experienced professional translators are likely to apply a surcharge to work this way. In the worst case, they will directly reject a translation project under such conditions.

These two problems can be solved by sending a Word file for translation. In addition, Word files will allow translators or translation agencies to use translation assistance tools. These tools create a database of the translations performed by the professional translator.

These tools also allow you to analyse how much repeated text the documents to be translated have. This is especially important when it comes to translating technical manuals.

Technical manuals often include a lot of repeated text both in the same manual and/or between manuals for similar products. Many translators or translation agencies, like ourselves, will agree to offer discounts based on the volume of repeated text.
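The idea behind a repetition analysis can be sketched as follows. This is a deliberately simple approximation, not the algorithm of any particular CAT tool: it splits the text into sentences with a rough rule, counts every occurrence of a sentence after the first as a repetition, and reports the share of repeated words. The sample text is invented.

```python
import re
from collections import Counter


def split_sentences(text):
    """Very rough segmentation: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def repetition_report(text):
    """Count segments, words and the words that fall in repeated segments."""
    segments = split_sentences(text)
    counts = Counter(segments)
    total_words = sum(len(s.split()) for s in segments)
    # Every occurrence after the first one counts as a repetition.
    repeated_words = sum(len(s.split()) * (n - 1) for s, n in counts.items())
    return {
        "segments": len(segments),
        "total_words": total_words,
        "repeated_words": repeated_words,
        "repeated_share": repeated_words / total_words if total_words else 0.0,
    }


manual = (
    "Switch off the device. Remove the cover. Replace the filter. "
    "Switch off the device. Remove the cover. Clean the fan."
)
report = repetition_report(manual)
print(report)
```

In this small sample 7 of the 20 words belong to repeated sentences, which is the kind of figure a discount for repetitions is based on.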

Converting a PDF to Word can be a very simple and efficient step. There are also cases where it will be a real headache. Next, we will explain how to create a Word file depending on the type of PDF we have.

4. How can I translate an editable PDF file?

Once we have made sure that we do not have the original files, there will be no option other than using the PDF files for translation. The best situation we can face is that the PDF files are editable.

When we say that they are editable, we mean that the text of the PDF can be easily modified. That is, it will not be an image as it happens with scanned PDFs or vectorized text. We'll look at these cases later.

The best solution will be to convert the PDF into a Microsoft Word document. This file format is a standard today. This means that we can send them to any translator or translation company. Today all translation professionals have Word or a compatible program.

Word documents are also easily handled by translation programs. By translation programs I mean both computer-assisted translation tools (such as SDL or memoQ) and automatic/machine translation tools (such as Google Translate).

The secret to getting a good PDF to Word conversion is the program we use. There is a whole series of OCR (optical character recognition) programs on the market that make very decent conversions. These programs have been on the market for years and are mature.

In our experience working with PDF files in our translation agency, the best programs for this purpose are Adobe Acrobat, OmniPage and ABBYY FineReader. There are also other good programs. See other programs on this blog: Extract Text from Images and PDFs with Best OCR Software.

The best practice is to have several of these programs, if our budget allows for it. Depending on the document, sometimes Adobe Acrobat will give us an optimal result. Other times it will be OmniPage or ABBYY FineReader.

Once we have the converted document, we will have to review the layout and modify it if necessary. For example, the converter may have placed a paragraph mark in the middle of the sentences, splitting them. This type of formatting will complicate the translation process and should be avoided.
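Part of this clean-up can be automated. The sketch below merges a line into the previous one when the previous line does not end with sentence-final punctuation. It is only a heuristic under our own assumptions: headings without a final full stop would be merged with the text that follows them, so the result still needs a human review.

```python
def rejoin_broken_sentences(lines):
    """Merge a line into the previous one when that one does not end a sentence."""
    sentence_end = (".", "!", "?", ":")
    paragraphs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines left by the converter
        if paragraphs and not paragraphs[-1].endswith(sentence_end):
            paragraphs[-1] += " " + line
        else:
            paragraphs.append(line)
    return paragraphs


# Invented example of text as it may come out of a PDF to Word conversion.
converted = [
    "Before cleaning the appliance, always",
    "disconnect it from the mains.",
    "",
    "Do not immerse the base in water.",
]
fixed = rejoin_broken_sentences(converted)
print(fixed)
```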

5. How can I translate a scanned (or vectorized) PDF file?

The conversion of a scanned PDF does not differ from that of an editable PDF, except for the result we can expect. In general, scanned PDFs are going to have poorer results.

If working with OmniPage, you can instruct the program to help you interpret your scanned document. You can basically tell OmniPage if there is a table, if it is a text paragraph or if it is an image. It also allows you to indicate the text orientation when it changes. These basic instructions can optimize the text to be translated.

A problem that can be insurmountable is when the resolution of the scanned PDF is not sufficient to perform OCR. OCR programs need a minimum resolution to work. If we encounter this problem, we will have to ask for the documents to be rescanned at a higher resolution. If this is not possible, we can print the documents and scan them ourselves. This will sometimes be a valid workaround.

A case similar to scanned PDFs is that of PDFs where the text has been converted to vector graphics. Design programs such as Illustrator, InDesign or Corel have a function that converts text to vector outlines, after which it can no longer be edited or translated as text. This is done to avoid having to send in the sources. In general, the text in this type of PDF will convert well.

6. PDFs generated from databases

There are many programs that use the PDF format to create documents from data in a database. An example of this might be a simple invoice or report generated from an ERP or CRM.

Safety data sheets are a typical example that we usually find in translation companies. Most product safety data sheets are generated from a management program that has all the necessary information.

Since these PDFs are not generated from a previous design that you can have in Word or InDesign, they tend to generate more problematic conversions. Typical problems are text in split text boxes, columns whose text does not follow a logical order or sentences cut by paragraph marks.

Watermarks are usually the biggest problem that we can find in this type of documents when we want to translate them, since they are usually put there precisely to avoid the conversion of the document into an editable format.

In conclusion, the translation of PDF files is often a headache for translation agency project managers. There is a wide variety of possible problems, and sometimes managing them can be a major effort in parallel with translation. It is important that you, as a customer, are clear about the results you expect from the translation of a PDF document. On many occasions, getting the same format as the original implies a lot of layout work that someone has to pay for.

Josh Gambin

Josh Gambin holds a 5-year degree in Biology from the University of Valencia (Spain) and a 4-year degree in Translation and Interpreting from the University of Granada (Spain). He has worked as a freelance translator, in-house translator, desktop publisher and project manager. He has been a founding member of AbroadLink since 2002 and is the CMO of the company.

Published on 29/05/2019

Although there are still marketing departments that send Word documents for translation of documents created in InDesign, many of them are already aware of the possibility of sending InDesign files directly for translation. Today all computer-assisted translation programs used by professional translators and translation companies have the ability to filter text from InDesign files. This process is more efficient, as we can avoid the manual cut-and-paste process, which is time-consuming and carries a greater risk of human error. In this blog, I make a series of recommendations to be taken into account in order to best integrate the creation of documents in InDesign with the translation of these documents.

The recommendations made in this blog can and should affect the way the production department works to achieve a more efficient workflow throughout the multilingual production chain. It is therefore important for creatives and DTPers to understand how translations are done in order to understand how design and DTP work can affect the quality and cost of the translation.

1. How do translation agencies and professional translators translate InDesign files?

Today, and for more than 30 years, professional technical and commercial translation is almost entirely carried out with the use of computer-assisted translation software. These programs have two basic features: 1. They generate a database with the translations so that they can be efficiently reused. Translators can also easily perform searches for terms already translated; 2. These programs are capable of extracting text from a large number of formats, InDesign among them, and segmenting the text to provide a homogeneous translation interface, usually in the form of a double column.

Below is a screenshot of SDL Trados Studio, the market-leading computer-aided translation tool and the one we use most in our translation company, although there are many other solutions:

As we can see the text is divided into segments and it is the way these segments are created where the format of InDesign files plays a fundamental role. In general, we can say that a segment will correspond to a sentence. In any case, it is advisable that it is at least a complete unit of meaning.

2. Formats to avoid for a more efficient InDesign file translation

There are two fundamental aspects of translation that will be affected by the way your InDesign files have been formatted. The first aspect concerns the segments that are translated by the translator and the second, the CAT tool's ability to identify repeated sentences. The first aspect can affect the quality of the translation and the second can affect the level of consistency and cost of the translation. This is especially true for the translation of technical manuals or other documents with a lot of repeated text, be it medical or another translation speciality. In other cases, such as marketing translation or the translation of legal documents, it will not have much relevance, since these types of document do not usually contain much repeated text.

In general, the formats to avoid are all those that are to be entered in the middle of a sentence. It is common to use tabs, paragraph marks and line breaks to adjust the text to the design. As an example, we present a text in InDesign where these formatting strategies have been used and its effect on the segmentation to be used by the translation tool. 

We'll now see how this text would look once the InDesign IDML file is processed with an assisted translation tool for translation:

We can see that on this occasion a sentence appears split into several segments. This is basically due to the use of paragraph marks to cut sentences for design purposes. If this has to be done, it is always preferable to use line breaks, which will have an effect on the detection of repetitions with respect to similar sentences where they are not used, but will not cause a sentence to be divided into several segments.
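The effect of both formatting strategies on segmentation can be imitated with a small sketch. It is not how any specific CAT tool works: we assume that a paragraph mark reaches the tool as a newline, that a forced line break is kept inside the text as the character U+2028, and that segments end at sentence-final punctuation. The sample sentence is invented.

```python
import re

PARAGRAPH_MARK = "\n"  # assumption: how a paragraph return appears in the extracted text
LINE_BREAK = "\u2028"  # assumption: forced line break kept as an inline character


def segment(story):
    """Split on paragraph marks first, then on sentence-final punctuation."""
    segments = []
    for paragraph in story.split(PARAGRAPH_MARK):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                segments.append(sentence)
    return segments


with_paragraph_marks = "Our new range of\nprofessional tools\nis now available."
with_line_breaks = (
    "Our new range of" + LINE_BREAK + "professional tools" + LINE_BREAK + "is now available."
)

split_by_marks = segment(with_paragraph_marks)
kept_whole = segment(with_line_breaks)
print(len(split_by_marks), len(kept_whole))
```

With paragraph marks the translator receives three fragments that make no sense on their own; with line breaks the sentence stays in one segment, although it is no longer identical to the same sentence written without breaks.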

Below is how this text could have been formatted to achieve a similar format and achieve a segmentation that facilitates the translator's work and increases the possibility of detecting repetitions.

Let us now look at how the text would appear once processed. We can see that each phrase or title corresponds to a segment: 

3. Layout with the expansion of the translation in mind

Naturally, each language will need a different space to display the same information. This is a factor that if taken into account during the document production phase will help to achieve a smoother process of producing translated documentation, reducing the time needed to adapt the translated text to the initial layout.

If we take English as a reference, we can expect an expansion of 20% when translating into French, 30% into German, between 20% and 30% into Spanish, 10% into Italian, etc. The problem during the layout of InDesign files of the translated versions arises when the text has been arranged very tightly in the original template, with small font sizes and little spacing. When this happens we will need to invest more time during this DTP phase to find solutions to fit the expanded translation, and will often have to lower the design standards of the original document.
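These percentages can be used to estimate, while the original is still being designed, how much room the translations will need. The sketch below applies the figures quoted above to a text length; measuring in characters and the function name are our own assumptions, and the percentages are only rules of thumb.

```python
# Expansion figures quoted above, taking English as the reference (low, high).
EXPANSION = {
    "French": (0.20, 0.20),
    "German": (0.30, 0.30),
    "Spanish": (0.20, 0.30),
    "Italian": (0.10, 0.10),
}


def expected_length(english_chars, language):
    """Return the (minimum, maximum) expected length of the translated text."""
    low, high = EXPANSION[language]
    return round(english_chars * (1 + low)), round(english_chars * (1 + high))


for language in EXPANSION:
    print(language, expected_length(1000, language))
```

A text box that is already full with 1,000 characters of English will therefore need to hold around 1,300 characters in German, which is worth knowing before the template is finalised.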

4. Text boxes on images

This is another recommendation that may affect how the images and illustrations in our advertising brochures, catalogues and user manuals are handled, but it has a strong impact on the resources we will need later for their translation.

When we add text to an image or illustration that we will link to or embed in our InDesign documents, we can do it in basically two ways. The most immediate and natural way is to add the text directly in the image editing program we are using (for example, Photoshop or Illustrator). When we do so, we'll need to process each of these images with text for each of the languages we have, whether we do it through cut-and-paste or use an application to export/import the text automatically (see our blog on how to translate Illustrator files). However, text in image files can be a significant workload. For example, if we had a catalogue with 100 images with text that we translated into 10 languages and we spent an average of 3 minutes on each image for management and layout, we will need to spend about 50 hours that could be reduced to 5-10 hours using the recommendation we make here.
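The estimate in the example above is simple arithmetic, written out here so that it can be repeated with your own figures. The function name is hypothetical.

```python
def layout_hours(images, languages, minutes_per_image):
    """Total time, in hours, spent handling text embedded in images."""
    return images * languages * minutes_per_image / 60


# 100 images x 10 languages x 3 minutes = 3,000 minutes
hours = layout_hours(images=100, languages=10, minutes_per_image=3)
print(hours)
```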

When for design reasons we do not need to enter the text into the image/illustration editing program, the layout of the translated versions will be more efficient if we enter the text directly using text boxes in InDesign placed over the linked images. In the case of AbroadLink Translations, when our customers make use of our integrated translation services with multilingual DTP, if we find an InDesign document with images with text to be translated into several languages, the first step in preparing the document for translation will be the creation of text boxes over the images linked to InDesign, whenever possible, to achieve a reduction in delivery time and total cost.

5. Maintain editable text versions of images and illustrations

When, for design reasons, it is necessary to use the text directly in the editing program, it will be very important to have the image file with the text in an editable format during the translation phase. It is common along the document production chain to not have the editable files from which the .jpg, .tiff or .png files were created. These are bitmap files that do not allow separate editing of the text, so the translation of these files becomes practically impossible when the text is not on a homogeneous background, as is often the case with advertising brochures, or even catalogues, when the text is placed on a creative photograph. The same problem often occurs with vector files (.eps or .ai) that allow designers to make very creative transformations of the text and where many times the non-editable version of the file is used to avoid the problem that may arise for not having the original fonts used.

The problem during the translation layout phase when you don't have the editable files is that recreating the files to be translated from non-editable files will take much longer and sometimes we will have to accept solutions that change the layout. In short, more time, more costs and, sometimes, lower-quality DTP work.

6. Outsourcing of the layout of the translated versions to translation companies

Many companies or translation service agencies, such as AbroadLink Translations, offer their clients integral solutions that include among their services the layout of the translated files. As such, these services are not layout services per se, as they use the templates of the original documents to enter the translated text. Outsourcing this task to translation companies means that both the InDesign files and the graphics and fonts used in the design must be sent to the company. As discussed in the previous section, it will be advisable to provide the graphics in a format in which the text to be translated can be edited to reduce costs and shorten working times.

The main advantage over commissioning the work from an external design company or advertising agency is that translation companies tend to have more experience in solving the problems inherent in multilingual DTP, such as adapting the layout to translated texts that are longer or shorter than the original, handling right-to-left languages (such as Hebrew or Arabic) or switching to Unicode fonts that contain the characters required by the target language.

What to do if we work with an advertising agency and they do not provide us with the InDesign files and artwork?
 

In cases where the creation of brochures or manuals was entrusted to an advertising agency, we may find that, due to the contractual relationship with the agency, they cannot provide us with the InDesign files and the corresponding fonts and graphics. When this happens, the layout of the translated versions will have to be done by the advertising agency. However, it is possible that they can provide you with the files in IDML format, so that the translation agency can deliver them translated for subsequent layout. This process will simplify the work of the advertising agency and it is possible that a reduction in the cost of DTP can be negotiated with respect to sending them the translations in Word files.

7. Conclusion

The most efficient way to translate InDesign files is by processing them in IDML format through computer aided translation tools. Following a series of good practices can have an effect on the quality and cost of translations and the layout of translated versions.

Did you find this blog useful? Help us and share it on your social networks.

Do you work with a translation agency but did not know about discounts for repetitions or the integration of translation with the DTP process? Ask us for a translation quote for your next project and take advantage of these benefits.

Josh Gambin

Josh Gambin holds a 5-year degree in Biology from the University of Valencia (Spain) and a 4-year degree in Translation and Interpreting from the University of Granada (Spain). He has worked as a freelance translator, in-house translator, desktop publisher and project manager. A founding member of AbroadLink since 2002, he is the company's CMO.

Published on 23/03/2017

Years ago Sysfilter came into my life, and I can assure you that it is one of the best investments we have made in the company.

It provides DTP operators with a tool that, I would say, is essential in the day-to-day operations of the translation industry and, of course, of multilingual layout (or DTP if you prefer).

Extracting texts for translation

For those of you who aren't familiar with Sysfilter, it is a programme (or rather, a suite of programmes) that extracts text for translation from applications such as Illustrator, Photoshop, Corel Draw, InDesign, Visio, PowerPoint... provided that these texts are editable. The suite includes the one that I will address in this blog: Sysfilter for Illustrator.

This way, you can forget about creating horrible bilingual tables in Word for translators or translation companies, and about the resulting cut and paste, which can be tiring and sometimes frustrating. Firstly, because you have to type the text into the Word table, and secondly, because you have to create text boxes in the right format and, once the translator has finished, copy and paste the translation into the target file... if you're a DTP operator, I'm sure you know what I'm talking about.

How to use Sysfilter for Illustrator for translations

What happens if the text to be translated isn't editable?

I have to say that you may sometimes come across a small problem that does not originate from Sysfilter for Illustrator: the text to be translated is often outlined, so it is not editable. The same happens if the text is a bitmap, but in Illustrator the most common situation is outlined text with no original editable file. If this happens, you'll have to create text boxes in the Illustrator file, because Sysfilter cannot read outlined text. This will take a little longer, depending on the file. In this case, you'll have to notify the client and charge for the extra work. We call this file preparation and, since time is money, try to account for it as early as possible, particularly in this profession, where timely delivery is of the utmost importance.

Tip: open all of the Illustrator files before you do anything else and check that the text is not outlined and that no fonts are missing.

Possible formats for processing with translation memories

Sysfilter for Illustrator can extract texts for translation in the following formats:

  • DOCX - MS Word
  • DOC - MS Word
  • RTF - MS Word
  • XML - (in UTF-8)

Without further delay, I'll explain how to make the best use of Sysfilter for Illustrator.

Initial checks and export

The first thing to do when you receive a file is to check that the text to be translated is editable. The second is to check that the text isn't split, meaning that there are no individual, independent words or sentences that do not flow within the paragraph. I would also recommend that you remove any line breaks and tabs: leave the text clean so that it flows, and format it later when it comes back. This will benefit translators and DTP operators enormously. Translators will be able to process the text with whichever computer-assisted translation tool they use, such as memoQ, Déjà Vu, SDL Studio, Wordfast, Across..., and the tool will not split the text into meaningless segments. The text will be continuous, with all the tags in place, and bold and other formatting will be preserved where appropriate.

In the case of DTP operators, they will find part of the work almost done. Let's remember that you have avoided copying and pasting. When importing, each text will take its own specific place. You'll have to check that everything is in place and complete the formatting process.

Once text has been arranged, the next step will be to save it as .ai or .eps, depending on what the client has sent.

You may well have come across jpg or tiff files at some point. I advise you to open these files with Illustrator and create text boxes. In this case you'll have to enter the text manually and save the file with the .ai extension so that it can later be exported with Sysfilter. Sometimes it also helps to use an OCR programme so you don't have to type the entire text.

You then put the files to be exported in a folder and place this folder on the desktop. I recommend not working on the local network (but on the computer), as otherwise export/import does not work well.

Open Sysfilter for Illustrator and check the Type of file, Version and Format settings. If you speak English, German or Spanish then you're in luck, because Sysfilter for Illustrator comes in those three languages!

Type of file: Locate the folder that you previously saved to the desktop (in this case we have called it Test) and in Type of file select either .ai or .eps, so that the programme understands that you're going to export Illustrator text. Then, in the window where the files appear, select the files that you want to export by clicking them while holding down Ctrl.

It is important to check the version used to save the files. For example, if you have used the CS6 version of Illustrator, go to Tools>Options... and select version CS6 Illustrator. Here you can also select the format that you want to export the text to: xml, rtf, Word... Save the settings and then click on Text export.

When the export has finished, check that all of the files have been exported correctly. Sysfilter will generate a text document called SysFilter4Illustrator-Export. If you open it with Notepad, you will be able to see the exported words and the version that you have used, and check that the export was successful.

Tip: When the export has finished, open any xml or Word files that have been generated and check that the text has been exported. Sometimes text is not exported because certain layers may be blocked in Illustrator, so be careful with this.

The next thing will be to send the generated xml or Word files to the translator.

How to import text after translation

When the translator returns the xml or Word files, whether or not they have been processed with a translation tool, use them to replace the original files in the Test folder on the desktop that you created for the export (do not delete this folder in the meantime).

Open Sysfilter for Illustrator. It automatically opens on the first tab, Export, but as you now need to import the translated files, go to the Import tab. Follow the same process as for the export: check the version and make sure that File type is set to xml, which was the type of file exported (and the most advisable one for the tools that professional translators usually use), then select the files and move them to Text Import.

When you open the files in Illustrator after importing them, they should already be in the translated language.

Tip: ask the translator not to change the name of the xml or Word files. If the name is changed, the import will not work. It can be stressful not knowing why the import is not working. If this happens, there is a simple way to fix it, by changing the name of the files to how they were when the export was carried out.
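A quick way to catch renamed files before attempting the import is to compare the names that were sent out with the names that came back. The following Python sketch is only an illustration: the function name and the sample file names are hypothetical and are not part of Sysfilter.

```python
from pathlib import PurePath


def find_renamed_files(exported_names, returned_names):
    """Compare the file names sent to the translator with those returned.

    Returns two sorted lists: names that were exported but did not come
    back, and names that came back but were never exported.
    """
    exported = {PurePath(name).name for name in exported_names}
    returned = {PurePath(name).name for name in returned_names}
    missing = sorted(exported - returned)      # expected, but not returned
    unexpected = sorted(returned - exported)   # returned under an unknown name
    return missing, unexpected


# Hypothetical example: the translator added a language suffix to one file.
missing, unexpected = find_renamed_files(
    ["brochure_p1.xml", "brochure_p2.xml"],
    ["brochure_p1.xml", "brochure_p2_FR.xml"],
)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Any file listed as unexpected is a likely candidate for renaming back to the matching missing name.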

Generating the PDF files for the translators and other interesting options

Bear in mind that Sysfilter for Illustrator can generate PDF files of exported and imported files. You only have to activate this specific option. If the PDF files still have not been generated, you can use this option to generate them in batches and you can send them to the translators so that they can use them as a guide when doing the translation. They will appreciate it!

Without examining these options any further, you can also export hidden text and only certain layers of text...

Finally, as you can see it's very complete, so I love it. I urge you to try this product. On the official page you can download a trial version which lasts 28 days. And I'm not getting any commission!

Inma Cantón

Inma Cantón holds a higher technical qualification in Editorial Production and Design and a degree in Art History from the University of Granada. She combines DTP project management with work as a desktop publisher and designer as Head of the DTP Department at AbroadLink.

Published on 12/02/2016

This article describes how to process AutoCAD files in order to translate them with computer-assisted translation tools such as SDL Trados Studio, memoQ, Wordfast, STAR Transit, Déjà Vu, etc. We will also offer some pointers to avoid certain problems that can arise during the export/import process or when typesetting AutoCAD files after translation.

TranslateCAD: a programme to extract text from AutoCAD files that actually works!

I don't know any programme that doesn't fail at some point, and the same thing happens with this one, TranslateCAD, but the day we discovered it our AutoCAD typesetting projects started to become less unpredictable and more profitable, and our quotes much more competitive. A great find.

TranslateCAD extracts text from AutoCAD files

It is quite an intuitive programme and is easy to use. I won't go into a detailed explanation of how it works; you can find that information on the developer's website. I will mention, however, that the programme does not work directly with native AutoCAD files (DWG extension) but with the DXF exchange format. So the first thing you have to do is convert the AutoCAD files to this format. This can be done directly by opening the files in AutoCAD and exporting them to DXF. There are also several programmes, some of them free, that carry out this conversion. Later I'll talk about the one that we use and that has worked well for us.

TranslateCAD extracting text

The programme creates two TXT files from the DXF files: 1) one contains the extracted text (with the suffix –trans1) which will be used for translation; 2) the other contains the code from the DXF file (–trans2).

When you receive the translated TXT files, you will have to reverse the process by importing the text and creating DXF files with the translations. You will then be able to open them in AutoCAD and save them in their native format, DWG, in the same version of AutoCAD in which you first received them. This can also be done with a conversion programme.

You can buy TranslateCAD here. At the time of publishing this article it was priced at 29 USD, so it is a small investment and will allow you to translate AutoCAD files more efficiently and professionally. After only a few AutoCAD files it will have paid for itself.

What are DXF files?

DXF Exchange Format

DXF stands for Drawing Exchange Format. It is a format used to exchange technical drawing and industrial design files between different design programmes. It was mainly created to exchange files between the leading technical drawing software, AutoCAD, and the rest of the programmes in the market. For more information about the DXF format you can see the Wikipedia entry.

How to convert native AutoCAD files (.DWG) to the exchange format (.DXF)

To quickly convert AutoCAD files (.DWG) to the exchange format (.DXF) it is useful to use a conversion programme. If you have tens or hundreds of files then it could save you hours of work.

We've achieved good results with Teigha File Converter, a free programme by the Open Design Alliance, which converts DWG files to the DXF format and vice versa, and is an essential tool for working efficiently with TranslateCAD. You can access the programme here for different platforms. For Windows, it can be downloaded directly from the Open Design Alliance website by clicking here.

Teigha File Converter

Tip 1: check that the DXF conversion has worked correctly

One problem that you might come across is that the conversion to DXF format has failed, whether you have done it with a converter or with AutoCAD. Usually when this happens you will get an error when trying to extract text with TranslateCAD.

Conversion from DWG to DXF

It is also advisable to open the resulting DXF file with AutoCAD to check that the file is like the original and that no information has been lost. That way you can avoid any surprises before it's too late.
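Before opening each drawing, a first automated pass can at least confirm that every DWG file has a non-empty DXF counterpart. The Python sketch below assumes that the converter writes each DXF with the same base name as its DWG into a separate output folder; the function name and the folder paths in the usage comment are hypothetical.

```python
from pathlib import Path


def dxf_conversion_gaps(dwg_dir, dxf_dir):
    """Return the DWG file names that have no non-empty DXF counterpart.

    Assumption: the converter keeps the base name, so plan_a.dwg is
    expected to become plan_a.dxf in dxf_dir (case is ignored).
    """
    converted = {
        path.stem.lower()
        for path in Path(dxf_dir).iterdir()
        if path.suffix.lower() == ".dxf" and path.stat().st_size > 0
    }
    return sorted(
        path.name
        for path in Path(dwg_dir).iterdir()
        if path.suffix.lower() == ".dwg" and path.stem.lower() not in converted
    )


# Hypothetical usage:
# for name in dxf_conversion_gaps("C:/jobs/1234/dwg", "C:/jobs/1234/dxf"):
#     print("Not converted:", name)
```

This only detects missing or empty output. It does not replace the visual check in AutoCAD, which is the only way to see whether information was lost.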

In some cases the conversion from DWG to DXF may be impossible. Because the DWG format has evolved quickly and grown complex, certain features cannot be converted to DXF. This happens with AutoCAD files that use advanced and complex functions of the programme. When it does, you won't be able to work with TranslateCAD. In this case, you can try TransTools for AutoCAD, a programme that also seems to work well according to my sources (http://www.translatortools.net/autocad-about.html).

Tip 2: prepare the text so that the translation process runs smoothly

In AutoCAD you will often find sentences that have been split with paragraph marks. This makes the translator's job more difficult, as they will need the PDF file to check the context and work out the correct order of the segments. It will sometimes also mean having to reverse the order of the translations in the text segments.

Depending on the number of files, the budget and the time available, you should ideally change the format of text so that all sentences appear in the same segment, assisting the work of the translator.

Include Coordinates

The programme includes an option to help the translator locate text when context is needed: if you select it, the X and Y coordinates are included.

Include Coordinates with TranslateCAD

Tip 3: check if there is outlined or bitmap text

Like in any other job involving text extraction with filters or typesetting, you may find elements that are outlined (vector graphics) or in bitmap (raster graphics). As this text is not editable in its image format, TranslateCAD will not be able to extract the text to translate it. Generally, you will have to extract the text manually and make it editable.

Check if there is outlined text or bitmap

Leaving this type of text in image format is bad practice when the files need to be handled and translated. If you work with the creator of the files regularly, it is advisable to let them know, to make sure that the process is as affordable and quick as possible. This advice is also valid for other types of files such as Word, QuarkXpress, Illustrator, FrameMaker or InDesign, amongst others.

Tip 4: delete the ##number## codes

TranslateCAD adds a line of code, marked with “##”, for each segment in the TXT file that is sent to the translator. While some computer-assisted translation software will not count these lines (such as the versions of Trados and Wordfast that work from a Word interface), it is advisable to delete them before doing the word count and analysis. SDL Studio, for example, will treat these lines of code as text for translation, which will markedly inflate the word count and the translation invoice.

Example of text extracted with TranslateCAD

Although you can create a Word document from the TXT files and hide these lines with the help of a macro, this process can be quite time-consuming. We use Notepad++, an advanced open source text editor, which allows us to quickly delete these sequences using a regular expression and so create the files to be analysed against our translation databases (translation memories, as they are known in our industry). See the previous screenshot to see how. The translators will have to work with the files that include the lines of code. These lines will not be a big problem for a translator working with a translation tool, because once the first such segment has been inserted it will be auto-propagated to all of the segments that contain lines of code.
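For batches of files, the same clean-up can be scripted. The Python sketch below is an illustration only: it assumes that each code line contains nothing but a number wrapped in "##" (for example ##12##), which may not match the exact layout of your TXT files, so the pattern should be adjusted after inspecting a real export.

```python
import re

# Assumption: a code line holds only a number wrapped in "##", e.g. "##12##".
# The pattern removes the whole line, including its line break.
CODE_LINE = re.compile(r"^[ \t]*##\d+##[ \t]*(?:\r?\n|\Z)", re.MULTILINE)


def strip_code_lines(text):
    """Return the text without the ##number## lines, for word counts only."""
    return CODE_LINE.sub("", text)


# Hypothetical sample of an extracted file.
sample = "##1##\nMain valve\n##2##\nSee detail A\n"
print(strip_code_lines(sample))
```

The stripped copy is meant only for the word count and analysis; the file sent to the translator should keep its code lines.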

Tip 5: make sure that the translated TXT files have UTF encoding

In our experience, some computer-assisted translation programmes don't return the TXT in Unicode format. This can be a problem when converting back to DXF format after receiving the translated files. You can check this by opening the file with Windows Notepad and clicking "Save as", where you can see the file's encoding. In Notepad++, which I mentioned in the previous section, this information can be consulted directly in the main menu.
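When many files come back at once, a rough automated check can flag the ones that need a closer look. The following Python sketch is a heuristic, not a definitive detector: it looks for a byte order mark and otherwise tests whether the bytes are valid UTF-8. The function name and its labels are hypothetical, and UTF-16 text saved without a byte order mark may be misreported.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Give a rough description of the encoding of a returned TXT file."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Typical for files saved in a legacy Windows code page.
        return "not Unicode (legacy code page?)"
    return "UTF-8 without BOM (or plain ASCII)"


# Hypothetical usage with a returned file:
# from pathlib import Path
# print(describe_encoding(Path("drawing-trans1.txt").read_bytes()))
```

Files reported as not Unicode can then be opened in Notepad or Notepad++ and saved again in a Unicode encoding before converting back to DXF.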

Windows Notepad encoding

Conclusion

TranslateCAD in combination with Teigha is a solution that works for processing batches of AutoCAD files that need to be translated and typeset. When working in Unicode the process is compatible with almost all languages. The most important restriction of the process is the problem with converting certain DWG files to DXF.

I hope this entry has been useful for those searching for a solution for AutoCAD files.

Josh Gambin

Josh Gambin holds a 5-year degree in Biology from the University of Valencia (Spain) and a 4-year degree in Translation and Interpreting from the University of Granada (Spain). He has worked as a freelance translator, in-house translator, desktop publisher and project manager. A founding member of AbroadLink since 2002, he is the company's CMO.
