How to perform word counts and interpret them

By Lorena Ramírez, Project Manager

Today we’re going to look at a topic that is not normally given a great deal of attention but which is undoubtedly essential for all freelance translators: interpreting word counts performed with CAT tools.

Sometimes, when you receive a translation assignment, a CAT tool has already been used and you are told how many words you are going to translate or proofread, but you should not simply take that figure at face value. It’s important to be able to interpret this data and check that it reflects reality. Firstly, there may be a mistake in the initial count (project managers handle many projects, and this can occasionally happen); secondly, knowing how to interpret word counts helps you to calculate how much time you will need to spend on the task. Have you ever taken longer than you expected based on the count you received? This can help you to anticipate those situations.

First, we’re going to see how to perform word counts with two of the most common tools that we work with (Studio and Translation Workspace), and then we will see how to interpret them.

Performing word counts

All CAT tools have an option to analyse the files that you are working with and obtain word counts.

If you are working with Translation Workspace, you need to have Translation Workspace Tools installed, which lets you perform the analysis using the Analyze option.

[Image: Analyze option]

Once you have entered your credentials, accept the pop-up window with the default settings and then choose the project’s translation memory and the language combination. On the following screen, drag in the file that you want to analyse, select the path where the word count will be saved and check the Use Analysis TM option.

[Image: Analyze settings]

A .zip file will be created containing the count in various formats; use the .txt file whose name ends in _anatm.

You can also use TWS XLIFF Editor to perform a word count, from the Batch operations option that you will find when you click on Translation Workspace on the toolbar.

If you are working with Studio, open your project, right-click it and choose Batch tasks, followed by Analyze files. Click through each screen until you reach the settings, which you need to configure as follows:

In Language combinations → All language combinations:

→ Translation memory (make sure that the project’s TM is selected) → Search (set the minimum match value for both translation and concordance to 50%)

[Image: Batch processing 1]

→ Batch processing → Analyze files (check the first two boxes)

[Image: Batch processing 2]

You can view the analysis by selecting Reports and save it as an Excel file on your computer.



Interpreting word counts

Once you have your word counts, the first thing you should do is check whether they match the counts that the client gave you. When working with online memories (as is the case with XLIFF Editor), the word counts may differ slightly, but if you find that they differ considerably from the client’s word counts, you should inform the client before starting the translation.

Now we will look at two examples of word counts obtained with the above tools:


[Image: TWS word count]


[Image: Trados Studio word count]

In the latter case, the Report internal fuzzy match leverage option was selected, so each percentage band appears more than once in the report and you need to add up the figures for identical bands to obtain the total for each (e.g. in the 95% – 99% band, the total number of words would be 102).
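As a rough illustration of that addition, here is a minimal Python sketch. The split between TM matches and internal matches is invented for the example (only the 102-word total for the 95% – 99% band comes from the text), and the band labels are simply dictionary keys, not a format exported by Studio.

```python
from collections import Counter

# Invented figures for illustration: only the 102-word total for the
# 95%-99% band comes from the article; the split and the 85%-94% band
# are made up.
tm_matches = Counter({"95%-99%": 60, "85%-94%": 30})
internal_matches = Counter({"95%-99%": 42, "85%-94%": 15})

# Adding two Counters sums the values stored under identical keys (bands).
band_totals = tm_matches + internal_matches

print(band_totals["95%-99%"])  # 102
```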

Generally, the total number of words provides you with information about the size of the file, while the other bands tell you how much you are going to translate.

– Repetitions: these are the words in segments that are repeated within the text you are translating, so you won’t need to spend much time on them: basically just the time it takes to open and close the segment.

– 100%: these are the words that are a 100% match with the TM (exact match); the work here is mostly checking the existing translation, so they don’t require much time either.

– Bands between 75% and 99%: these are the words that match the TM to that degree (high fuzzy), so they should be quick to translate. Take great care, however, not to confirm a segment containing information that does not match the current source segment: a high fuzzy match can mislead you into leaving in information that doesn’t belong in the segment or leaving out additional information that appears in it.

– 50% – 74%: these words are a 50% – 74% match with the TM (low fuzzy); these are low matches, so they will be of only limited help when translating the segment.

– New (no match): you will not find these words in the TM, so you need to translate them from scratch.

It is important to take these considerations into account when starting a translation, as they can help you to anticipate how much time you will need to spend on the task.

But that is not all: there are also tools that help you to make these calculations and obtain a weighted word count, i.e. the number of words that you are going to translate expressed as if they were all new. To do this, a percentage is allocated to each of the above bands, according to the degree of difficulty and amount of time each one requires. These weighting percentages may vary from client to client, as they are also the percentages used to set the price per word (for instance, a 95% match is obviously not treated the same as a new word).
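The arithmetic behind a weighted count can be sketched in a few lines of Python. Both the weighting scheme and the word counts below are hypothetical values chosen for illustration; they are not taken from any client scheme or from the reports discussed here, so substitute the percentages agreed with your own client.

```python
# Hypothetical weighting scheme: the percentage of a "new" word that a word
# in each band counts as. Illustrative values only.
WEIGHTS = {
    "repetitions": 10,
    "100%": 10,
    "95-99%": 30,
    "85-94%": 50,
    "75-84%": 70,
    "50-74%": 100,
    "new": 100,
}


def weighted_word_count(counts, weights=WEIGHTS):
    """Multiply each band's word count by its percentage and add them up."""
    return sum(words * weights[band] / 100 for band, words in counts.items())


# Made-up analysis figures, not taken from a real report.
example_counts = {
    "repetitions": 50,
    "100%": 100,
    "95-99%": 40,
    "85-94%": 20,
    "75-84%": 10,
    "50-74%": 30,
    "new": 250,
}

raw_total = sum(example_counts.values())
weighted_total = weighted_word_count(example_counts)
print(raw_total, weighted_total)  # 500 324.0
```

With these invented figures, a 500-word file counts as 324 weighted words, which is the number you would use for planning.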

One free tool that you can use to obtain weighted word counts is CATCount. Once it is open, click on Scheme → Load and select the file with the percentages that you want to use for each band. If you don’t have one, add the percentages manually and then click on Scheme → Save as… to save them as a template for future word counts.

Then fill in each band with the number of words obtained in the analyses described above, and you will be given a total weighted count (Total CATCount) and the total number of words without taking the TM matches into account (Total wordcount).

[Image: CATCount word count]

Here’s an example with the word count that we got with TWS:

[Image: CATCount word count 2]

In this case, the file contains a total of 918 words, but the weighted count is 807, so you should calculate the time required based on that figure. If you average 350 words/hour, it would normally take you about 2.3 hours (roughly 2 hours and 20 minutes) to translate the file.
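The same estimate can be reproduced with a short Python sketch. The function name estimate_time is just an illustrative helper; the 807 weighted words and the 350 words/hour average are the figures from the example above.

```python
def estimate_time(weighted_words, words_per_hour):
    """Return (hours, minutes) needed, rounded to the nearest minute."""
    total_minutes = round(weighted_words / words_per_hour * 60)
    return divmod(total_minutes, 60)


hours, minutes = estimate_time(807, 350)
print(f"{hours} h {minutes} min")  # 2 h 18 min
```

The exact result is 2 hours and 18 minutes, which is where the rounded figure of about 2 hours and 20 minutes comes from.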

I hope you’ve enjoyed this post. I recommend that you include this task in your workflow, to help you with planning and keep any nasty surprises to a minimum.

See you soon!