Exchange of translation memories: the TMX format

The TMX format has become the standard for exchanging translation memories between translation systems, even those installed on different operating systems. In 2007, when our current marketing director wrote the article "Testing the implementation of the TMX standards" in Multilingual, most programmes supported the .txt format produced by the version of Trados available at the time (the product now known as RWS Trados Studio).
Today, that article would no longer make sense. Keep reading to learn more about one of the most widely adopted standards in the industry, alongside the XLIFF format.
What is the TMX format?
The Translation Memory eXchange format is an XML-based open standard designed for storing translation memories (TM) and exchanging them between different computer-assisted translation (CAT) tools and applications. A translation memory is a database that stores previously translated text segments so they can be reused in future translations, speeding up work while ensuring stylistic and terminological consistency, whether through 100% matches or partial matches (what we often call fuzzy matches in professional jargon).
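To make the difference between a 100% match and a fuzzy match more tangible, here is a minimal sketch in Python. It is only an illustration: the memory contents, the `best_match` function and the 0.75 threshold are our own assumptions, and the similarity score comes from the standard library's `difflib`, not from the matching algorithm of any particular CAT tool.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> stored translation.
MEMORY = {
    "Save the file before closing.": "Guarde el archivo antes de cerrar.",
    "Open the settings menu.": "Abra el menú de configuración.",
}

def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored segment,
    or None if nothing reaches the threshold (an assumed cut-off)."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None

exact = best_match("Save the file before closing.", MEMORY)  # identical segment
fuzzy = best_match("Save the file before exiting.", MEMORY)  # one word changed
miss = best_match("Print the invoice.", MEMORY)              # nothing similar enough

print(exact)
print(fuzzy)
print(miss)
```

The identical segment scores 1.0 (a 100% match), the segment with one changed word scores lower but still clears the threshold (a fuzzy match), and the unrelated segment returns nothing.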
The TMX format was one of the legacies of LISA (Localization Industry Standards Association) before its dissolution in 2011 and has been widely adopted by multiple translation and localisation tools. Its main objective is to standardise the storage of TMs, facilitating their interoperability between different platforms.
Structure of the TMX format
TMX files are based on XML, which uses tags to encode information so that it can be read by both humans and machines. In general, a TMX file consists of a header followed by a body that contains the translation units (TU), that is, each of the previously translated text segments.
The header contains descriptive information about the TM, such as the source language, the tool used to create it, the creation date and any later changes.
The body contains the TUs, each holding the original segment and its translation into one or more languages, with the language of every version declared on the segment itself. Segments may also carry format tags (depending on the level of TMX format implementation, which we will see later), and additional information can be included, such as the context of use of the segment or the translator's comments.
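As an illustration of this structure, the sketch below reads a small TMX document with Python's standard `xml.etree.ElementTree` module. The sample file, its segments and the tool name `ExampleTool` are invented for the example. The element and attribute names (`header`, `body`, `tu`, `tuv`, `seg`, `srclang`, `xml:lang`) follow TMX 1.4.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its full namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative TMX 1.4 document; all values are invented.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="ExampleFormat" adminlang="en-US"
          srclang="en-US" datatype="plaintext"
          creationdate="20240101T120000Z"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Hello, world!</seg></tuv>
      <tuv xml:lang="es-ES"><seg>¡Hola, mundo!</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

root = ET.fromstring(SAMPLE_TMX.encode("utf-8"))
header = root.find("header")

units = []
for tu in root.find("body").findall("tu"):
    # Map each language code to the text of its segment.
    variants = {
        tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
        for tuv in tu.findall("tuv")
    }
    units.append(variants)

print("Source language:", header.get("srclang"))
print("Created with:", header.get("creationtool"))
for unit in units:
    print(unit)
```

Each translation unit becomes a small dictionary keyed by language code, which is a convenient shape for the lookups a CAT tool performs.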
Multilingual and bilingual TMX files
As mentioned above, TMX files can be multilingual or bilingual, depending on the number of target languages included:
- Bilingual TMX files: contain text segments in two languages, the source and the target language. They are the most common and are mainly used in specific translation projects.
- Multilingual TMX files: contain text segments in multiple languages, allowing translations to be managed and reused in several languages from a single file. These are useful for large localisation projects.
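A quick way to find out which kind of file we are dealing with is to collect the language codes it declares. The following sketch assumes an invented sample and a helper of our own, `languages_in`. It simply counts the distinct `xml:lang` values found on the `tuv` elements.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: the first unit has three languages, the second only two.
SAMPLE = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0" segtype="sentence"
          o-tmf="ExampleFormat" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Welcome</seg></tuv>
      <tuv xml:lang="fr"><seg>Bienvenue</seg></tuv>
      <tuv xml:lang="de"><seg>Willkommen</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="fr"><seg>Merci</seg></tuv>
    </tu>
  </body>
</tmx>"""

def languages_in(tmx_text):
    """Return the set of language codes declared on any <tuv> in the file."""
    root = ET.fromstring(tmx_text)
    return {tuv.get(XML_LANG) for tuv in root.iter("tuv")}

langs = languages_in(SAMPLE)
kind = "multilingual" if len(langs) > 2 else "bilingual"
print(sorted(langs), kind)
```

Note that a multilingual file does not need every language in every unit: here the second unit has no German version, yet the file as a whole still covers three languages.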
Metadata in a TMX file
In addition to the translated text segments, TMX files can contain various types of metadata that provide extra information about the TUs and facilitate the effective management and use of TMs. We have already mentioned some of them, but here is a list of the most common ones:
- Source and target language: specifies the languages of the original segment and its translation. This is essential to ensure that the TUs are used correctly in multilingual contexts, by filtering and applying only those that match the specific languages of the current project.
- Author and creation date: indicates who created the segment and when, which facilitates the assignment of responsibilities, the tracking of translation quality, and communication between professionals.
- Client and project: relevant information about the client and the project associated with the TM. When working on a specific project, previously used TUs for the same client can be prioritised, ensuring terminological and stylistic consistency.
- Translation status: indicates whether the translation has been reviewed, approved or has a pending review, helping to manage the workflow of the translation company.
- Notes and comments: allows additional annotations to be inserted that may be useful for future translators and reviewers when making decisions.
- Context and segmentation: information about the context of the segment within the source text, which helps maintain consistency and accuracy in future translations.
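The sketch below shows how some of this metadata might be read and used to filter units. `creationid` and `creationdate` are standard TMX attributes, and `prop` and `note` are standard elements. The property names `x-client` and `x-status`, however, are hypothetical: each tool defines its own property types, so real files will probably use different names.

```python
import xml.etree.ElementTree as ET

# Invented sample. The "x-client" and "x-status" property names are
# hypothetical, since <prop> types are defined by each tool.
SAMPLE = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0" segtype="sentence"
          o-tmf="ExampleFormat" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu creationid="jdoe" creationdate="20240115T093000Z">
      <note>Button label, keep it short.</note>
      <prop type="x-client">Acme</prop>
      <prop type="x-status">approved</prop>
      <tuv xml:lang="en"><seg>Sign in</seg></tuv>
      <tuv xml:lang="fr"><seg>Connexion</seg></tuv>
    </tu>
    <tu creationid="asmith" creationdate="20240220T161500Z">
      <prop type="x-client">Acme</prop>
      <prop type="x-status">pending</prop>
      <tuv xml:lang="en"><seg>Sign out</seg></tuv>
      <tuv xml:lang="fr"><seg>Déconnexion</seg></tuv>
    </tu>
  </body>
</tmx>"""

def unit_metadata(tu):
    """Collect author, creation date, properties and notes of one <tu>."""
    return {
        "author": tu.get("creationid"),
        "created": tu.get("creationdate"),
        "props": {p.get("type"): p.text for p in tu.findall("prop")},
        "notes": [n.text for n in tu.findall("note")],
    }

root = ET.fromstring(SAMPLE)
records = [unit_metadata(tu) for tu in root.iter("tu")]

# Keep only the units for one client that have already been approved.
approved = [
    r for r in records
    if r["props"].get("x-client") == "Acme"
    and r["props"].get("x-status") == "approved"
]

for record in approved:
    print(record["author"], record["created"], record["notes"])
```

Filtering on client and status in this way is what allows previously approved units for the same client to be prioritised.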
Implementation of the TMX format
The implementation of the TMX format in different CAT tools is carried out at three different levels, depending on the type of format codes and tags that can be recognised. In other words, the implementation levels determine the complexity and amount of information that can be included in a file. These levels are the following:
- Level 1 (Plain Text Only): this is the most basic level and ensures compatibility between different CAT tools. It includes minimal information such as the pairs of text segments in the source and target language. It is the simplest option when looking for equivalences between pairs, as the reading of the segments is not hindered by the presence of tags.
- Level 2 (Meta-Markup): this level also reads the formatting information carried by the TMX tags, taking into account details about the text format (bold, italics, underline…).
- Level 3 (Native-Markup): this is the most advanced level and enables the recognition of both TMX tags and the native code of each element, without losing any information. This implies the possibility of recreating the exact structure and format of the original document in the translation using only the TMX file.
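To see what the presence or absence of tags means in practice, the sketch below takes a segment that carries inline formatting and reduces it to plain text. The inline elements `bpt`, `ept`, `ph`, `it` and `ut` are defined by TMX to wrap native formatting codes. The sample segment and the `plain_text` helper are our own illustration, not the way any specific tool performs the conversion.

```python
import xml.etree.ElementTree as ET

# Inline elements whose content is native formatting code, not translatable text.
CODE_ELEMENTS = {"bpt", "ept", "ph", "it", "ut"}

def plain_text(elem):
    """Return the text of a <seg>, skipping the content of code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_ELEMENTS:
            # Elements such as <hi> wrap real text, so keep what they contain.
            parts.append(plain_text(child))
        # Text following the child always belongs to the segment.
        parts.append(child.tail or "")
    return "".join(parts)

# Invented segment: the word "Save" is wrapped in a paired bold code.
SEG = (
    '<seg>Click <bpt i="1" type="bold">&lt;b&gt;</bpt>Save'
    '<ept i="1">&lt;/b&gt;</ept> to continue.</seg>'
)

seg = ET.fromstring(SEG)
print(plain_text(seg))          # Click Save to continue.
print("".join(seg.itertext()))  # Click <b>Save</b> to continue.
```

The first line is what a tool limited to plain text would work with. The second keeps the native codes, which is the information that gets lost when a richer file is imported into a tool that only handles plain text.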
Compatibility issues of TMX formats in CAT tools
Despite being an open standard, TMX files can face compatibility issues between different CAT tools. Some common issues include:
- Different levels of implementation: as we saw in the implementation levels, not all tools are capable of interpreting the same type of data contained in a TMX file, which entails the possible loss of important information from one tool to another.
- Differences in XML parsing: some tools do not use standard XML parsers, so they may not accept some valid TMX files.
- Generation of invalid TMX files: even if they can correctly read the XML, certain tools are not capable of generating valid TMX format files, which causes problems when being read later by other programs.
- New XML versions: there are still tools that work with older versions of XML, so they will not be able to read more recent TMX files.
- Multilingual TMX: some tools only allow two languages and do not support multilingual TMX files.
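Some of these problems can be caught or worked around before a file changes hands. The sketch below makes two assumptions of our own: `check_tmx` runs a few basic structural checks (it is not a full validation against the TMX DTD), and `to_bilingual` reduces a multilingual file to a single language pair for tools that only accept two languages.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented multilingual sample.
GOOD = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0" segtype="sentence"
          o-tmf="ExampleFormat" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Welcome</seg></tuv>
      <tuv xml:lang="fr"><seg>Bienvenue</seg></tuv>
      <tuv xml:lang="de"><seg>Willkommen</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you</seg></tuv>
      <tuv xml:lang="fr"><seg>Merci</seg></tuv>
    </tu>
  </body>
</tmx>"""

# Same file with the closing </body> tag missing, so it is not well-formed.
BROKEN = GOOD.replace("</body>", "")

def check_tmx(text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "tmx":
        problems.append("root element is not <tmx>")
    if root.find("header") is None:
        problems.append("missing <header>")
    body = root.find("body")
    if body is None:
        problems.append("missing <body>")
        return problems
    for number, tu in enumerate(body.findall("tu"), start=1):
        tuvs = tu.findall("tuv")
        if not tuvs:
            problems.append(f"unit {number}: no <tuv> elements")
        for tuv in tuvs:
            if tuv.get(XML_LANG) is None:
                problems.append(f"unit {number}: <tuv> without xml:lang")
            if tuv.find("seg") is None:
                problems.append(f"unit {number}: <tuv> without <seg>")
    return problems

def to_bilingual(text, source, target):
    """Return the TMX text reduced to units that have both languages."""
    root = ET.fromstring(text)
    body = root.find("body")
    for tu in list(body.findall("tu")):
        for tuv in list(tu.findall("tuv")):
            if tuv.get(XML_LANG) not in (source, target):
                tu.remove(tuv)
        if len(tu.findall("tuv")) < 2:
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")

print(check_tmx(GOOD))
print(check_tmx(BROKEN))
en_de = to_bilingual(GOOD, "en", "de")
print(en_de)
```

In this sample, the English-German export keeps only the first unit, because the second one has no German version.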
Conclusion
The TMX format is an essential tool for both professional translators and translation agencies, as it provides a standardised way of storing memories and allows them to be exchanged regardless of the CAT tool each professional uses. However, users must be aware of the implementation levels and possible compatibility issues to avoid loss of information. Furthermore, by knowing how to manipulate and manage TMX files, and understanding all the information they can contain, we can streamline the translation process, save time and effort, and improve our work.
Graduate in Translation and Interpreting from the University of Granada, specializing in French and Chinese. He has worked on several literary translation and web translation projects in Spain and France. Currently, he is a project management assistant and content writer at AbroadLink.