How much vocabulary do you need to read French literary novels? Do some novels contain a lot more vocabulary than others, making them harder to read? Let’s look into this by counting the distinct vocabulary words in three well-known French novels: ‘La Bête humaine’ by Zola, ‘Madame Bovary’ by Flaubert, and ‘Les liaisons dangereuses’ by de Laclos.
Counting vocabulary sounds simple: you just write a computer program to count all the different words in a novel, right? Actually, it is a bit more complicated than that. Often, we encounter the same vocabulary word in different forms. For example, the singular and plural forms of a noun (such as baguette and baguettes) should be counted as a single vocabulary word when they both appear in a novel. Likewise, the different conjugated forms of the same verb should count as a single vocabulary word. Finally, French adjectives take different forms depending on whether the noun they describe is singular or plural, masculine or feminine. If several forms of the same adjective appear in a novel (such as joli, jolie, jolis, jolies), we want to count them as a single vocabulary word.
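To make the idea concrete, here is a minimal Python sketch of this kind of counting. It is not the program used for the figures in this article: the LEMMAS table is a tiny hand-made stand-in for a real French lemmatizer, and the names tokenize and vocabulary are invented for the illustration.

```python
import re

# Hypothetical, hand-made table mapping inflected forms to their base form.
# A real analysis would rely on a full French lemmatizer instead.
LEMMAS = {
    "baguettes": "baguette",
    "jolie": "joli",
    "jolis": "joli",
    "jolies": "joli",
    "mange": "manger",
    "mangent": "manger",
}

def tokenize(text):
    """Lowercase the text and return its runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def vocabulary(text, lemmas=LEMMAS):
    """Return the set of distinct base forms; unknown forms map to themselves."""
    return {lemmas.get(token, token) for token in tokenize(text)}

sample = "Elle mange une jolie baguette. Ils mangent de jolies baguettes."
words = tokenize(sample)
vocab = vocabulary(sample)
print(len(words), "words,", len(vocab), "vocabulary words")
```

In this sample, mange and mangent collapse into manger, jolie and jolies into joli, and baguette and baguettes into baguette, so the vocabulary count is smaller than the word count. Note that this simple tokenizer also splits elided forms such as l’homme into two tokens, which a more careful program would need to handle.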
We see that ‘Les liaisons dangereuses’ is the longest of the three novels by word count, but it contains fewer vocabulary words than the other two. In contrast, ‘Madame Bovary’ is the shortest of the three by word count, but it contains by far the most vocabulary. This may mean that ‘Madame Bovary’ is better suited to a very advanced learner of French, while the other two novels might be more approachable for intermediate learners. We all know that reading a novel in a foreign language becomes less fun when one constantly has to look up the definitions of words.
Let’s have a look at the overlaps in vocabulary between the three novels:
There are 1833 vocabulary words common to all three novels. Unsurprisingly, these include pronouns (je, tu, nous, vous), common nouns (victoire, mensonge, plaisanterie, lumière, préoccupation) and many common verbs, such as plaisanter, plaindre, souhaiter, réclamer, déshabiller, écrire, séduire, cacher, éclairer, entreprendre.
There are 1292 vocabulary words found in ‘La Bête humaine’ but not in the other two novels. These include: docilité, griffe, endiablé, effréné, wagon.
There are 908 vocabulary words found in ‘Les liaisons dangereuses’ but not in the other two novels. These include: apens, laurier, déshonneur, déplorable, indisposition.
Finally, there are a full 2443 vocabulary words found in ‘Madame Bovary’ but not in the other two novels. Some of these words are quite uncommon and would be difficult even for many native French speakers. For example:
But not all of these 2443 words are rare and difficult. Many are simple words that just happen not to appear in the other two novels. For example: écharpe, rhum, cathédrale, orgue, serviette.
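Overlaps like these come down to set operations. The sketch below is only an illustration, assuming each novel’s vocabulary has already been reduced to a set of base forms. The function name overlaps is invented for the example, and the three sets are tiny stand-ins built from words quoted above, not the real data.

```python
def overlaps(vocabs):
    """Given {title: set_of_words}, return (common, exclusive_by_title)."""
    # Words present in every novel.
    common = set.intersection(*vocabs.values())
    exclusive = {}
    for title, vocab in vocabs.items():
        # Union of the vocabularies of all the other novels.
        others = set().union(*(v for t, v in vocabs.items() if t != title))
        # Words found in this novel and in none of the others.
        exclusive[title] = vocab - others
    return common, exclusive

# Tiny illustrative stand-ins, not the novels' actual vocabularies.
vocabs = {
    "La Bête humaine": {"je", "écrire", "wagon", "griffe"},
    "Madame Bovary": {"je", "écrire", "rhum", "orgue"},
    "Les liaisons dangereuses": {"je", "écrire", "laurier", "déshonneur"},
}

common, exclusive = overlaps(vocabs)
print(len(common), "words common to all novels")
for title, only_here in exclusive.items():
    print(title, "has", len(only_here), "words of its own")
```

Using sets keeps the comparison independent of how often each word occurs: a word counts once per novel, however many times it is repeated.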
In case you are wondering whether the publication dates have something to do with all this, here they are: ‘La Bête humaine’ and ‘Madame Bovary’ are both 19th century novels, published in 1890 and 1856 respectively. The third novel, ‘Les liaisons dangereuses’, is an 18th century novel published in 1782. The publication dates do not seem to explain the vocabulary differences we have observed: the two 19th century novels differ widely from each other in the amount of vocabulary they contain.
Part of the difficulty in reading literature, particularly in a foreign language, comes from the vocabulary. Looking up words breaks the flow of reading and often reduces the enjoyment the reader gets from the novel. We have seen that the amount of vocabulary varies significantly between classic French novels. While it is always nice to learn new vocabulary, keep in mind that for reading enjoyment it is best to choose novels for which you do not have to look up too many definitions.