
Textual Alignment and Semantic Analysis of the Homeric Poems and selected Italian Translations between the XVIII and the XXI century


Academic year: 2021


The Italian Homer: Evolutions in Translation Patterns between the XVIII and the XXI Centuries


INDEX

Introduction

A brief History of Italian translations of Homer

PART I - THE AUTOMATIC ALIGNER

1 - Alignment background

1.1 - State of the art

1.2 - The choice of anchor words

2 - The algorithm

2.1 - Preprocessing and the phonological function

2.2 - Alignment: Needleman-Wunsch implementation and the score function

2.3 - Postprocessing

2.4 - Performance of the aligner: ancient and recent, faithful and literary translations

2.5 - Performance of the aligner on different languages

PART II - DISTRIBUTIONAL SEMANTICS ANALYSIS

3 - Distributional semantics analysis

3.1 - State of the art

3.2 - Distributional semantics

4 - Term extraction

4.1 - The choice of terms

4.2 - The use of distributional semantics to extract word translations from aligned corpora

5 - Analyses of translations

5.1 - Quantitative analyses

5.2 - Diachronic semantic distances

5.3 - Mind Semantics

5.4 - War Semantics

6 - Discussion and Conclusions

7 - Future Works

8 - Glossary

APPENDIX: PIVOT TEXT

Bibliography


Introduction

“Some of the most important texts in the literary history of the English language, for instance the Bible and the Homeric epics, are translated again and again through the centuries. Hence, it is the need for translation, and the practice of translation, which opens the gateway between the present and history”.

(Weissbort, “Translation: Theory and Practice”, Preface, 2006).

The aim of this work is twofold. The first goal is to build a program that automatically aligns the original Homeric poems with their Italian translations, literary and free translations included, produced between the XVIII and the XXI century. The second is to show what kinds of analysis these alignments allow.

There is no need to stress the success of the Homeric epics1 in non-Greek countries. Nor is there any need to recall that this success was achieved mainly through adaptations, reductions, reformulations and translations. Listing the "media" that helped maintain and spread Homer's success would be a long task, even for the last century alone. They range from literary re-generations of the myth (such as Joyce's 'Ulysses') to multimedia inspirations ('2001: A Space Odyssey'2). In between lie all the more or less hybrid attempts to channel new themes and aesthetics into texts nearly 3000 years old. It is conceivable that the new century will bring an equal, if not greater, number of re-adaptations of those texts. Yet the vast majority of intellectuals, artists and ordinary readers who read Homer do so through a translation. Translators are, first and foremost, tasked with communicating the Homeric texts to new readers.

Over time, translations have changed. They have tried both to respect the text and to adapt to the aesthetic paradigms of their epoch and of the translator. Sometimes the line between translating and re-writing the ancient epics can be difficult to trace. At one end stands Cesarotti's "beautiful and unfaithful"3 XVIII century version of the Iliad, where the author introduces entire new episodes. At the other stands Oswald's recent "fairly irreverent" Iliad translation4, which aims to "translate the atmosphere" of the Iliad by keeping only the lists and catalogues of death. Between them there seems to be a continuum, running from entirely new productions about the Trojan cycle to absolutely faithful translations of the Homeric texts. Have there been constant tendencies and trends in translating Homer? Which features of the original text are most often changed in translation, and which are most frequently respected? Many questions can be asked.

1 Homer was seen as an exemplary, classical text by Ancient Greek audiences themselves (Howie 1995). Knowledge of Homer was considered a prototypical feature of Hellenistic culture until late antiquity (Kaldellis 2007).

2 Perhaps the most striking examples of Homeric reuse are found in science fiction, which very often cites ancient works like the Odyssey. Some modern critics have even attempted to include the Odyssey in the genre of science fiction (Rogers and Stevens 2015).

3 Barreca 1992, citing Mounin 1965.

Identifying which parameters are relevant to our research, without being obvious, is not straightforward. For example, the number of books in complete translations never changes: no translator merges two books into one. Dialogues, likewise, are rarely deleted or inserted. Yet these observations tell us very little about the different ways Homer was translated.

Other parameters, on the other hand, are bound to change, but in a way determined by the target language rather than by the translator. Languages tend to have a fairly stable average word length. The average Homeric word in Ancient Greek is circa 11 characters long5. The average word in Italian and French translations, by contrast, is almost always 5 characters long, while English and German tend to stay around 6 letters.
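Both parameters discussed here reduce to ratios of simple counts: average word length, and the average period length introduced in the next paragraph. The following Python sketch illustrates the calculation under stated assumptions; it is not the code used for this study, and the function names are hypothetical. It assumes the following rules:

- A word is a maximal run of Unicode word characters after NFC normalization, so an elided form such as l'ira yields two tokens.
- A period ends at ., !, ?, ; or the Greek question mark (U+037E).
- The optional stop-word set reflects the choice, discussed in footnote 5, of whether function words are counted.

```python
import re
import unicodedata

# Hypothetical tokenization rules, chosen for illustration only:
# a word is a maximal run of Unicode word characters, so Greek and Italian
# letters both match and an elided form such as "l'ira" yields two tokens.
WORD = re.compile(r"\w+")
# A period is assumed to end at . ! ? ; or the Greek question mark (U+037E).
PERIOD_END = re.compile(r"[.!?;\u037e]+")


def _normalize(text: str) -> str:
    # NFC composes accented Greek letters into single code points,
    # so each letter counts as one character and is matched by \w.
    return unicodedata.normalize("NFC", text)


def average_word_length(text: str, stop_words: frozenset = frozenset()) -> float:
    """Mean length, in characters, of the words not listed in stop_words."""
    words = [w for w in WORD.findall(_normalize(text)) if w.lower() not in stop_words]
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def average_period_length(text: str) -> float:
    """Mean number of words per period."""
    periods = [WORD.findall(chunk) for chunk in PERIOD_END.split(_normalize(text))]
    periods = [p for p in periods if p]  # drop empty segments, e.g. after the last stop
    if not periods:
        return 0.0
    return sum(len(p) for p in periods) / len(periods)


if __name__ == "__main__":
    sample = "Canta, o dea, l'ira di Achille. Fu funesta."
    print(round(average_word_length(sample), 2))                     # 3.44
    print(average_word_length(sample, frozenset({"o", "l", "di"})))  # 4.5
    print(average_period_length(sample))                             # 4.5
```

Different segmentation rules would change the absolute figures, for instance treating the Greek ano teleia as a period boundary. Comparisons between texts are therefore only meaningful if the same rules are applied to all of them.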

Period length, by contrast, is a free and therefore more interesting parameter. The average period length of the Iliad is around 35 words; by my calculations, in the Odyssey it is circa 26 words. Across the whole set of translations we can study, Italian and non-Italian texts alike, average period length varies greatly. It ranges from a minimum of 21 words in Tonna's Iliad to a maximum of 44 in Johann Heinrich Voss' Iliad.

Selecting the Italian translations was not an easy task. Processable texts are hard to find, and many of them are too noisy to be used because they contain too many OCR errors. Moreover, the fame of Monti's Iliad overshadows the other translations on the Web as well. Versions of the Odyssey share a similar fate: very few processable, 'clean' texts exist beyond Pindemonte, Maspero and Delvinotti.

Many Italian translations older than a hundred years are equally hard to find in print. Furthermore, very few studies exist on this subject, so exhaustive lists of editions are rare6.

We used a set of 11 translations, all complete except Casanova's. We also had at our disposal a series of 7 'first books', since many authors translated only the first book of the poem. These are also valuable because OCR noise was easier to control over a single book. Unfortunately, we were not able to find processable versions of translations predating the XVIII century. An OCR version of La Badessa's 1564 Iliade was originally used in the analyses but was finally abandoned as too noisy, and thus unreliable. Here is the final list of translations used for this study:

1723 - Salvini. Iliade d'Omero tradotta dall'original Greco in versi sciolti, book I.
1775 - Casanova. Iliade, books I-XVIII.
1787 - Ceruti. L'Iliade recata dal testo greco in versi toscani, book I.
1788 - Boaretti. Omero in Lombardia, book I.
1795 - Cesarotti. L'Iliade in versi, book I.
1803 - Foscolo. Iliade, book I.
1807 - Natale. La Iliade d'Omero tradotta in verso sciolto italiano, book I.
1810 - Monti. Iliade d'Omero, book I.
1812 - Fiocchi. Iliade d'Omero, nuovamente tradotta in ottava rima, book I.
1822 - Pindemonte. Odissea.
1824 - Mancini. Traduzione epica dell'Iliade d'Omero.
1843 - Delvinotti. Odissea.
1845 - Maspero. Odissea.
1872 - De Giorgi. El primo libro de la Iliade de Omero, book I.
1886 - Codemo. Volgarizzamento in prosa dell'Odissea di Omero.
1973 - Tonna. Iliade.
1990 - Ciani. Iliade.
2005 - Ferrari. Odissea.

5 This calculation also takes function words (stop-words) into account. Without them, the average length of Ancient Greek words would increase considerably.

6 A very useful paper on this subject, which also provides a long list of Italian translations, is Morani

As can be seen, most of our translations belong to a crucial period for the Italian language and for Italy itself. In this transitional phase Italy became a unified country, and Italian became a standard national language spoken throughout its territory (Bricchi 2010). Older versions and most of the single books were taken from the Internet Archive and Google Books. The texts of Monti, Pindemonte, Delvinotti and Maspero were taken from Project Gutenberg. Tonna comes from a processable PDF of the XXIV Garzanti edition (2010). Ciani and Ferrari were acquired by OCR exclusively for this study. In some cases, basic OCR error correction was applied to the texts.

The work opens with a brief history of Italian translations of Homer, a chronological account of the principal Italian translations of the Homeric poems between the XIV and the XXI centuries. I then develop the two main parts of my work.

Part I explains the working principles of the textual aligner. Section 1.1 summarizes the state of the art in textual alignment, and section 1.2 explains why I chose proper names as anchor words. The program's mechanics are then explained in detail in three sections:

- Section 2.1 gives an overview of the algorithm's main steps and describes the preprocessing phase and the phonological function. It explains how the text is segmented and how the anchor words are extracted and paired.
- Section 2.2 summarizes the principles of the Needleman-Wunsch algorithm and describes my implementation and its score function.
- Section 2.3 explains the post-processing phase, where the alignment results are refined and enhanced.

Section 2.4 gives examples of the aligner's behavior on different kinds of translations: ancient and recent, faithful and literary. Section 2.5 gives a very brief account of its performance on translations in different European languages.

Part II is devoted to the analysis of Italian translations of Homer. Sections 3.1 and 3.2 present the state of the art and the fundamental principles of distributional semantics. To analyze the Italian translations, I chose a set of Ancient Greek terms and a set of their Italian translations. I then studied the similarity of those terms in both the Ancient Greek and the Italian texts. Section 4.1 presents the selected terms and explains how the Ancient Greek words were chosen. To find their most common Italian counterparts I used both manual inspection and a method of automatic extraction, to which section 4.2 is dedicated. Chapter 5 presents the results of this analysis. Section 5.1 discusses some quantitative aspects of the Italian translations, such as average period length and semantic distance. Section 5.2 examines in detail the distributional similarities between the selected words in the Ancient Greek and Italian texts. Finally, sections 5.3 and 5.4 examine some polysemy issues related to translation, such as how various polysemous words in Homer were translated over time.

The textual aligner I built to handle these texts is being used in a research project led by Dr. Marianne Reboul of the Sorbonne, Paris. The project is creating a Java interface that will let users scroll through hundreds of French translations of the Odyssey, a highly diachronic corpus that represents nearly every period of French Homeric translation. The translations are aligned both with one another and with the Greek source. Users will be able to view them in parallel and to see word-by-word equivalences between them. The interface should be released next year.

I wrote my original programs in Python. Dr. Angelo Del Grosso, an electronic engineer and PhD researcher at ILC, reimplemented (and enhanced) my aligner in Java for Reboul's interface.

A brief History of Italian translations of Homer

“Text is not merely text. Texts relate to other situations - to parts of a real or fictional world described or discussed or presupposed by the texts”. (Lager 1995).

I adopt an "internal approach" to my corpus, seeking to identify differences between one text and another. Even so, the history and context in which our translations were produced are fundamental to drawing any conclusion. I will consider translations only, excluding re-writings, re-formulations and reconstructions of the Homeric epos.

The earliest translations of Homer of which we have certain knowledge are in Latin. The first known translations of the Odyssey seem to predate those of the Iliad, if a Latin Odyssey was indeed written by Livius Andronicus in the III century BC. The earliest Latin versions of the Iliad, by contrast, are generally dated to the first half of the I century BC. It is fascinating, and perhaps disconcerting, to think that Andronicus produced his translation while Hellenistic scholars were still widely interpolating the Homeric texts7.

In any case, it was during the first Latin translations of the Iliad, in the I century BC, that Roman poetry grew rapidly. It rose from a somewhat archaic and local stylistic dimension to the level of international poetry, the acknowledged successor of the Greek tradition. The years from 70 BC to 18 AD are often called the 'Golden Age' of Latin literature. The Latin authors traditionally credited with the first translations of the Iliad are Gneo Mazio and Ninnio Crasso8.

We have no translators' notes or theoretical statements, and the translations themselves are lost. It is worth noting, however, that the ancient scholia ad Homerum continually praise Homer's ability to vary the details of his scenes. "The notes on Homer successfully varying his battle scenes are so numerous that one cannot help suspecting that they reflect a certain apologetic tendency" (Nünlist 2011). These are precisely the kinds of scenes that critics and translators between the XVI and XVIII centuries would accuse of pointless repetitiveness.

In the I century AD there were several Iliadic translations in Latin. Polibio, an important libertus of Claudio, wrote a "Homeric paraphrasis" in Latin and, it seems, a "Virgilian paraphrasis" in Ancient Greek9. His transpositions were probably quite independent of the original, at least in reproducing its stylistic features, and enjoyed great success in imperial Rome. Labeo Accius (Accio Labeone) took the opposite path. He kept the constraint of the hexameter but attempted a very literal translation of the Homeric poem, which apparently passed into history for being absolutely unreadable10. This experiment in literal translation has not survived. The most notable appearance of Labeone's name in Latin literature is in Persio's famous first Satire:

ne mihi Polydamas et Troiades Labeonem praetulerint?

The last and most famous Iliad translation of the I century AD is the so-called 'Ilias latina'. It is not a true translation but a sort of long synthesis of the poem, counting 'only' 1070 verses. It is also the only Latin version of the Iliad that survived, with great success, through the Middle Ages. Its author, traditionally believed to be Silio Italico, is in fact unknown11. In terms of translation or transposition quality, the Ilias latina is universally judged a very poor work: "Il poema greco [vi] è epitomato con squilibri vistosi" writes Scaffai. "Ai primi 12 libri dell'Iliade sono dedicati oltre 700 versi, mentre agli ultimi 12 neppure 300". Elements of later Roman mythology also seem to have been inserted in the text. It is not certain whether the Ilias latina was written expressly for schools, but it was clearly used as a school version of the poem from late antiquity onwards (Scaffai 1982).

7 See Bolling 1923; Apthorp 1980, "The Manuscript Evidence for Interpolation in Homer"; West 1970, "Ptolemaic Papyri of Homer"; Allen 1923, "Homer".

8 Records of the name of Ninnio Crasso are generally much harder to find than those of Gneo Mazio. Both, however, are listed in old catalogues such as Saverio Quadrio 1749, where Gneo Mazio is considered an epigone of Ninnio Crasso.

9 See Rebaudo 1999.

10 Cesarotti, another translator of Homer, writes about Labeone: "Si rese ridicolo a' suoi coetanei, per la sua sgraziataggine e per la stentatezza servile a cui si assoggettò volendo tradur l'Iliade letteralmente. (...) In tempi posteriori non mancarono ad Omero altri Labeoni in Italia, ma in luogo d'essere derisi riscossero applauso, e fondarono una setta".

Many sources confirm that the Ilias latina passed from Antiquity into the Middle Ages with growing success12. Meanwhile, the Homeric texts and their complete Latin translations were apparently lost to the whole of Western Europe. Homer was known only by reputation, while minor writers like Ditti and Daretes became increasingly successful13. As late as the XIII century, the poet Dante Alighieri had no complete translation of the Iliad at his disposal. He could read it only through the Ilias latina, and it is doubtful whether he even knew that it was not a faithful reproduction14.

Naturally, the European tradition of Homeric translation begins again with the Renaissance. Intellectuals such as Leonardo Bruni, Carlo Marsuppini and Lorenzo Valla wrote partial translations of the Iliad in Latin. A young Angelo Poliziano also attempted a four-book translation in Latin hexameters, which enjoyed relative success in Italian academic studies as well15.

In the following century, alongside new Latin translations such as Andrea Divo di Capodistria's Ilias or Andrea Maffei's Odissea Homeri (1510), the first translations into European 'vulgar' languages start to appear. Italian versions of the Homeric poems emerge in this period. The very first I know of is Gussano's first book of the Iliad, translated in 1544:

"L'ira dannosa o dea canta d'Achille
figliuol di Peleo che infinite doglie
A i Greci porse: e molte anime chiare
Gir fece inanzi al natural destino,
Giù nel caliginoso e cieco inferno,
D' Heroi possenti: e le lor membra diede
Per duro, acerbo, e doloroso scempio
A ingordi cani e a rapaci uccelli"

(Gussano, Il primo libro de la Iliade, 1544, vv.1-8)

During the XVI century another European nation starts to publish its own Homeric translations: Spain. The first Homeric text in standard Spanish is probably Pérez's Odyssey (1550). It long precedes any Spanish translation of the Iliad, which would be accomplished for the first time only at the end of the XVIII century. Italy, unlike Spain, seems to have appreciated the Iliad from the beginning, and its Iliad translations are relatively numerous in the late XVI century. Among others, we still have the versions of La Badessa (1564), Groto (1570), Nevizano (1572) and Leo da Piperno (1573).

Translators of this period are notoriously "free". They change and interpolate lines or even stanzas, and modify at will not only the style but even the secondary facts of the story. Such adaptations do not stem only from a lack of translation theory or from a different translation philosophy: some characteristics of Homeric style were widely criticized. "Wenn sich die italienisch-franzoesische Kritik des 16., 17. Jh. beim Vergleiche Homers mit Vergil für diesen entscheidet, findet sich unter ihren Argumenten haeufig, dass Vergil die viele Wiederholungen Homers vermieden habe" (Arend, 1933).

The Odyssey, although less represented than the Iliad in this period, is translated by Ludovico Dolce (1573) and apparently by Francesco Aretino, though no copy of his work seems to have survived. In 1582 Gerolamo Bacelli publishes the first complete Italian translation of the Odyssey. He later attempts a translation of the Iliad as well, interrupted at book VII.

In the last part of the XVI century Chapman starts work on the first English translation of Homer, choosing a fourteen-syllable verse to render the Greek hexameter. In 1598 he publishes the first books of his translation of Homer in English, which has immediate success in Great Britain. "It was influential - the first books of the Iliad were almost certainly drawn on by Shakespeare in writing Troilus and Cressida - but most importantly it was new-fashioned"16.

12 A commonly used summary of the medieval diffusion of the Ilias latina is Marshal 1983. See also Hexter 2012 on the presence of the Ilias latina in the 'medieval canon' of classical reception.

13 Prosperi 2011.

14 Contini 1976: "Dante ignorava certo che Livio Andronico avesse tradotto l'Odissea, e forse anche il carattere rozzamente compendiario della cosiddetta Ilias latina". On the knowledge of Greek literature Dante could have had, see in general Ziolkowski 2014.

In 1614, Chapman publishes the first English Odyssey:

"The man, O Muse, inform, that many a way
Wound with his wisdom to his wished stay;
That wandered wondrous far, when he the town
Of sacred Troy had sack'd and shivered down;
The cities of a world of nations,
With all their manners, minds, and fashions,
He saw and knew; at sea felt many woes"

(Chapman, Odyssey, vv.1-7)

Chapman's Odyssey is not only in verse but also in rhyme, which makes it a highly redundant rendering of the original:

“The grace of Goddesses,

The reverend nymph Calypso, did detain

Him in her caves, past all the race of men

Enflam'd to make him her lov'd lord and spouse.” (Chapman, Odyssey, vv.22-25).

Although it was the first, it remains one of the best-known and most appreciated English Odysseys. Cristóbal de Mesa writes the first Spanish Aeneid in 1615. The following year, 1616, Chapman's complete English translations of both the Iliad and the Odyssey appear in "The Whole Works of Homer".

During the XVII century, Italian translators of Homer continue to produce full or partial versions. The most important are probably Tebaldi's Iliade in "ottava rima" (1620) and Malipiero's first Iliade in prose (1642). Other interesting translations come from Filippini (1654) and from Velez and Bonanno (1661). In 1674 Thomas Hobbes produces a new Iliad in English. Twenty-five years later, in 1699, Madame Dacier's French Iliade is published.

16 Parker 2000.

The XVIII century saw new translations across Europe, and translators questioned faithfulness to the original more than their predecessors had. In 1700 Dryden translated some parts of the Iliad, but his results are generally judged poor. Well-regarded German versions also begin to appear, with Bodmer and Bürger among the most notable authors. In Italy, after a century of "ottava rima" and prose versions, translators return en masse to the hendecasyllable of the XVI century translations. The last "ottava rima" for a long time is Bugliazzini's Iliade, written in 1703. Between the XVII and XVIII centuries, translations are still fairly free, adapted versions of the Homeric poems. They contain many interpolations and modifications clearly inspired by external literary and aesthetic models (italics are mine):

"Egli però che venne degli Achei
Alle navi veloci, e per la figlia
Redimer, ch' era in potestà di quei
Vergin per anco, e bella a meraviglia,
Seco portando pretiosi, e bei
Doni da far a ogn'uno arcar le ciglia,
E del Dio l'arma nelle mani havendo
Sovra lo Scettro d' or ricco stupendo,
Tutti pregava, & in particolare
Gli Atridi dui de Greci Imperatori" (Bugliazzini, Iliade, vv.33-43)

The XVIII century opens with Desmarais' hendecasyllable translation of eight books in 1708. In 1714 Antoine Houdar de la Motte published an important new French translation of the Iliad. It came with a long "Discours sur Homère" in which, partly in polemic with Madame Dacier, he analyzes the Homeric stylistic traits he considers utterly reprehensible: "Il me semble que c'est ici le lieux de parler des répétitions d'Homére; car, quoi qu'il ait répandu ce défaut partout, aussi bien dans le comparaisons, dans les descriptions et dans les discours, on peut bien dire que c'est un défaut de tout le Poème". He also declares his intentions as a translator, for example: "J'ai taché de n'employer aucune épithète, qui n'exprimât quelque circonstance utile & du sujet. Avec cette attention, on peut quelquefois renfermer dans un mot le sens d'une phrase entière". Clearly, faithfulness had only a relative value in this period: "Quant à l'agrément, la différence du siécle d'Homère & du notre m'a obligé à beaucoup de ménagemens, pour ne point trop altérer mon original, & ne point choquer aussi des lecteurs imbus de moeurs toutes différentes, & disposez à trouver mauvais tout ce qui ne leur ressemble pas. J'ai voulu que ma traduction fut agréable".


While the so-called Quarrel of the Ancients and the Moderns is still very much alive, criticism of Homeric style is direct and harsh. Typical Homeric characteristics such as the formulaic tone, the repetitions and even the "typical scenes" are blamed by many critics. A good number of translators go further and treat them as 'flaws' to be corrected.

We must also acknowledge, however, one difference from earlier practice. The idea of adapting Homer to the stylistic features and requirements of the time, radically changing the text if necessary, is visible in many earlier translations. Here, at least, the operation is programmatically declared and explained.

In 1716 Pope's Iliad appears in England. Like Chapman's, it is a poetic, rhymed transposition, but it is considered quite different in tone and style:

“Achilles' wrath, to Greece the direful spring Of woes unnumber'd, heavenly goddess, sing! That wrath which hurl'd to Pluto's gloomy reign The souls of mighty chiefs untimely slain;” (Pope, Iliad, vv.1-4)

“Then to their starry domes the gods depart, The shining monuments of Vulcan's art: Jove on his couch reclined his awful head, And Juno slumber'd on the golden bed.” (Pope, Iliad, vv.778-91)

We are now entering the period studied in the remainder of this work. In 1723 Salvini produces an Iliad and an Odyssey in hendecasyllables, which meet with both harsh criticism and great approval:

“Lo sdegno canta del Pelide Achille, o dea funesto, che agli Achivi diede infiniti travagli, e molte vite

generose mandò per tempo a Pluto” (Salvini, Iliade, vv.1-4)

Two years later, in England, Alexander Pope publishes his complete Odyssey:

"The man for wisdom's various arts renown'd,
Long exercised in woes, O Muse! resound;
Who, when his arms had wrought the destined fall
Of sacred Troy, and razed her heaven-built wall," (Pope, Odyssey, vv.1-4)

"Calypso in her caves constrain'd his stay,
With sweet, reluctant, amorous delay;
In vain-for now the circling years disclose
The day predestined to reward his woes." (Pope, Odyssey, vv.21-24)

As can be seen, the tendency to change and expand the original is accentuated in Pope. πολύτροπος17 becomes "for wisdom's various arts renown'd", the walls of Troy are "heaven-built", and Calypso tries to keep Odysseus "with sweet, reluctant, amorous delay". Cesarotti would later comment, in a long Introduction to his own translation of the Iliad: "La traduzione di Pope è in qualche senso più pittoresca dell'originale" ("Pope's translation is in some sense more picturesque than the original"; Cesarotti 1799). Pope's translation has often been accused of excessive freedom: "In many obvious, important, and well-documented senses, Pope's Iliad is 'unfaithful' to the letter of Homer's text" (Hopkins 2008). Nevertheless, it remains perhaps the best-known and most widely read English Iliad in the world.

In the following years, Italian men of letters such as Maffei (1736), Barbi (1746), Egizio (1751), Lami (1751) and Rezzonico (1753) produce partial versions of Homer in hendecasyllables. After 1753 the XVI century "ottava rima" enjoys a brief revival with Del Turco, Bozoli and Casanova. Casanova produced two versions of the poem, one in standard Italian and one in Venetian dialect. The standard Italian version, released in 1775, begins with the following lines:

"Canta d'Achille, o Dea, l'orrendo sdegno,
Che fatal danneggiò le greche schiere" (Casanova, Iliade, vv.1-2)

In his Proemio, Casanova makes an interesting statement that clearly shows the common attitude of XVIII century translators:

“Non so se fosse un’illusione, ma crederei di sentire, che Omero stesso approvasse il mio piano, e m’incoraggiasse a seguirlo. Parvemi udirlo a dirmi Conserva in me ciò, ch’è del Genio, e raffazzona quel ch’è dell’uomo”18

Dialect productions enjoy particular fortune in Italy in the XVIII century. After Casanova, it is the turn of Capassi's Neapolitan Iliad, in 1761:

"E chello mmale, che non troppo addora,
Fece pigliare a tanti li scarpune:
Che cane, cuorve, e cient'altre anemale
Se fecero no nuovo Carnevale.
Tanto nne voze Giove, e fo ben fatto,
Da quanno se pegliajeno a pettenare
Grammegnone, che gioca de sbaratto
E Achille, che non sa ngroppa portare" (Capassi, Iliade, vv. 6-13)

17 The translation of πολύτροπος is perhaps one of the most interesting single-word problems in Homeric translation. It owes this to its composite nature, its relative rarity and its position as the opening epithet of the poem's hero. Ancient scholars themselves discussed the problem. "Whether πολύτροπος is a bad or good quality seems to have been a sophistical debating topic" (Ford 1997). See also Bekker 1841: "was das wunderliche wort auch bedeuten mag (...) immer gibt es nur eine vage bezeichnung".

18 Giacomo Casanova, Iliade d'Omero tradotta in veneziano, a cura di Carlo Odo Pavese, Venezia,

In 1773, McPherson gives up the rhyme, but not the verse, in his Iliad: “The wrath of the son of Peleus,—

O goddess of song, unfold! The deadly wrath of Achilles:

To Greece the source of many woes!” (McPherson, Iliad, vv.1-4)

The tendency to expand the original freely is still very evident. Wrath is repeated twice, and the Muse becomes the goddess of song. As a result, the first line of the Iliad is spread over three separate lines here.

After Ridolfi's Iliad of 1776, the first German version of Homer appears: Stolberg's Iliad, in 1778. But it is in 1781 that the most successful German translator of Homer, Johann Heinrich Voss, publishes his German Odyssey, twelve years before his Iliad. It is a remarkably faithful version of the poem, sometimes described as 'scholastic'. It has no rhyme and gives an accurate rendering of each original verse:

"Singe den Zorn, o Göttin, des Peleiaden Achilleus,
Ihn, der entbrannt den Achaiern unnennbaren Jammer erregte,
Und viel tapfere Seelen der Heldensöhne zum Aïs
Sendete, aber sie selbst zum Raub darstellte den Hunden," (Heinrich Voss, Iliad, vv.1-4)

A few years later, Cesarotti (1786) and Ceruti (1787) publish three influential Italian translations of the Iliad. They are followed by another successful dialect version, Boaretti's, in 1788.

These Iliads are generally still fairly free, as samples from Cesarotti's and Ceruti's first books show:

“Del figliuol di Pelèo del divo Achille Al par nell' odio e nell' amore sublime L'opra grande , la memorabil morte Del Troiano campion , morte che a Troia Fu d' eccidio final terribil pegno …” (Cesarotti, Iliade, vv.1-5)

"L' alto voler così di Giove sommo
Compiendo s' iva da quel dì fatale,
Che fra i supremi duci, il divo Achille,
Ed il figlio, l' Atréo, fiera s' accese
Rissa, e contesa, e l' amistà fu spenta.
Musa, e qual fu, qual de' celesti Numi,
Che tant' ira e furor destò nel petto
A due gran Re, gloria e splendor de' Greci?
Tu fosti, o figlio di Latona bella,
Tu d' arco armato, prole alma di Giove" (Ceruti, Iliade, vv.8-17)

The same observation applies to dialect versions. Through their language, they generally aim at a more popular, regional and sometimes comic Homer:

"Canto d'Achille, l'Eroe xe sta
Tra i primi el primo per vigor de brazzi,
Quella Rabbia famosa, che ga dà
Tanti spasemi a i Greghi e tanti impazzi" (Boaretti, Omero in Lombardia, vv.1-4)

Meanwhile, the Odyssey seems to be slowly gaining importance relative to the Iliad, as a notable number of English translations of the poem, such as Cowper’s and Morri’s, seems to show. In Italy too we can notice some interesting new translations of the Odyssey, such as Bozoli’s and Redi’s.

In the XIX century, two interesting but incomplete translations, Natale’s and Foscolo’s Iliads, precede the most influential Italian translation of Homer, Monti’s Iliad, first published in 1810. Debates on the aesthetic and translational qualities of this work began immediately after its publication. "Chi si ponesse a cercare questa traduzione del Monti, con intendimento di conoscere dove sia dissimile dall'originale, lontano, a nostro credere, dal trovare che scarna ella sia, vorrebbe anzi dire che talvolta è concitata più che la pacatissima poesia omerica. Il quale aggiungimento di energia... non è vôto frastuono, ma impeto di un animo passionato che detta secondo sua tempra" [Anyone who set out to examine this translation by Monti in order to find where it differs from the original would be far, in our view, from finding it meagre; he would rather say that at times it is more agitated than the very calm Homeric poetry. This addition of energy... is not empty clamour, but the impetus of a passionate soul that writes according to its own temper] wrote his contemporary Giovita Scalvini19.

Monti’s ‘aggiungimento di energia’ may be judged in several ways by critics and readers, but it is certainly perceptible in the text. Monti’s incipit is too well known to Italian readers to be quoted here, but the author’s style and approach can be detected in many parts of the work, for example in the excipit:

“Questi furo li estremi onor renduti al domatore di cavalli Ettorre.” (Monti, Iliade)

Examples of the ‘impeto di un animo passionato’ abound throughout the text: “Così sclamava lagrimando, e seco”, “Prìamo alla turba, e favellò: Troiani”, etc.

19 Scalvini 1819.

The poet Foscolo keeps trying for twelve years after Monti’s edition to produce an adequate translation of the Iliad, while Leopardi writes some sample translations of the Odyssey. In the decade following Monti’s first edition, a complete Iliad is produced by Eustachio Fiocchi, with a radically different approach:

“Canta, o Diva, d’Achille il fero sdegno, Che pose in tanti guai l’Argiva Gente” (Fiocchi, Iliade, vv.1-2)

We can see how, in those decades, many “national Homers” were produced and eventually established: Germany found its best Iliad in Voss, Italy in Monti. In England, Pope’s version continues to enjoy success, although Keats’s famous 1816 sonnet “On First Looking into Chapman’s Homer” shows how the old translation of the XVI century could still be highly appreciated by a XIX-century intellectual. Twelve years after Monti’s Iliade, Italy also produces its most representative Odyssey, although it will never be as appreciated as Monti’s counterpart: Pindemonte’s Odissea, released in 1822 after many years of labour.

“Musa, quell’uom di multiforme ingegno Dimmi, che molto errò, poich’ebbe a terra Gittate d’Ilïòn le sacre torri”

(Pindemonte, Odissea, vv.1-3)

Pindemonte’s version also shows a slight tendency to expand the original (see for example the “multiforme ingegno”).

Italian translators of Homer in the XIX century are perhaps too many to enumerate here without producing an empty catalogue; important works were Leoni’s 1823 and Mancini’s 1824 Iliads. The latter is a significant attempt to reintroduce rhyme into translations of the Iliad:

“L’ira funesta del Pelide Achille
Cantami, o Dea, che fu sorgente a’ suoi
Di sciagure infinite, e mille e mille…” (Mancini, Iliade, vv.1-3)

Other authors who attempt Homer in the same years are Bianchi and Lampredi.

In Europe, generally, the era of faithful prose translations had begun. In some contrast with the stylistically refined poetic versions of the first half of the XIX century, these translations are more respectful of the content of the poems, but give up trying to reproduce Homeric metre and verse.

In 1843, Bareste writes a famous prose Odyssée. Although not completely literal, it certainly tends toward greater fidelity than the majority of Italian poetic translations of the first half of the XIX century:

“Muse, chante ce héros, illustre par sa prudence, qui longtemps erra sur la terre après avoir détruit la ville sacrée de Troie, qui parcourut de populeuses cités, s'instruisit de leurs mœurs” (Bareste, Odyssée, Chant I)


It may be interesting to compare it with an important Italian Odissea published in the same year, Delvinotti’s:

“Dimmi l’accorto eroe, Musa, che tanto Errò, poiché le sacre a terra sparse Ilìache mura, che di molte genti Visitò la città, l’indol conobbe” (Delvinotti, Odissea, vv.1-4)

The difference is so evident that we might almost wonder whether both translators had the same original at hand. Two years later, another interesting and successful Italian Odissea is written by Maspero:

“Canta, o Musa, l’eroe di vario ingegno, Che gran tempo vagò, poiché distrutta Ebbe la sacra Ilïon; che d’infinite Genti i costumi e le città conobbe…” (Maspero, Odissea, vv.1-4).

Many differences can be noticed between Maspero’s and Delvinotti’s interpretations (for example Maspero’s ‘la sacra Ilïon’, or the choice of ‘costumi’ rather than ‘indole’), but the common ground, when compared with the French translations, is in my opinion evident.

In 1861 Pessonneaux’s Odyssée is published:

“Muse, dis-moi ce guerrier, fécond en ressources, qui erra si longtemps, après avoir renversé la ville sacrée de Troie : il vit les cités de bien des peuples, et s'instruisit de leurs mœurs; sur la mer, il souffrit en son cœur des peines sans nombre, dans le but d'assurer et son salut et le retour de ses compagnons. Mais il ne sauva pas, même à ce prix, ses compagnons, quelque désir qu'il en eût”

(Pessonneaux, Odyssée, Chant I)

In 1863 Sommer produces an Odyssée renowned for being very literal:

“Muse, dis-moi ce sage héros qui erra de longues années après qu'il eut renversé les murs sacrés de Troie, qui visita les cités et apprit les mœurs de tant de peuples”

(Sommer, Odyssée, Chant I).

“Alors, pendant toute la nuit, couvert de la toison d'une brebis, Télémaque songea dans son âme au voyage que lui avait conseillé Minerve”.


“Quand parut la fille du matin, l'Aurore aux doigts de rose” (Sommer, Odyssée, Chant II)

“Télémaque et Pisistrate étaient arrivés dans la profonde vallée de Lacédémone; ils se dirigèrent vers le palais du glorieux Ménélas”.

(Sommer, Odyssée, Chant IV).

“Ils débarrassèrent du joug les coursiers baignés de sueur, les attachèrent aux râteliers, leur apportèrent de l'épeautre mêlé d'orge blanche”

(Sommer, Odyssée, Chant IV).

The fortune of the Odyssey in this period is not only Italian or French. English translations continue to be published at a steady pace, from Barnard’s to Palmer’s versions.

In 1866, Leconte de Lisle’s prose version of the Iliad enjoys great success in France. Italy, by contrast, shows no particular enthusiasm for prose versions. Monti’s Iliad becomes an accepted standard, and no new complete translation of the Iliad appears until the XX century: Manara, De Giorgi, Lanzalone and Ricci produce only partial renditions. In 1886, however, the first significant complete Italian Odyssey in prose appears, Codemo’s:

“Raccontami, o Musa, di quell’uomo scaltrito, il quale andò moltissimo errando, poich’ebbe atterrata la sacra rocca di Troja; che visitò le città di molti popoli, e ne conobbe la mente”

(Codemo, Odissea)

The cultural prominence of the Homeric poems made the Iliad one of the first books translated into the newly invented language Esperanto, by Koffman in 1895:

“Kantu, diino, koleron de la Peleido Aĥilo,
ĝin, kiu al la Aĥajoj kaŭzis mizerojn sennombrajn
Kaj en Aidon deĵetis multegajn animojn kuraĝajn” (Koffman, Iliad, vv.1-3)

Important poets continue to practise their style on the Homeric poems. In England, Butler publishes a translation of the Odyssey. In Italy, great men of letters such as Pascoli, Carducci and later Quasimodo (but also many lesser-known writers, such as Cesareo) show that the concern to find a pattern reproducing the ancient Greek hexameter in Italian still produces important efforts:

“L’ira, o Dea, tu canta del Peleìade Achille
funebre, causa agli Achei già d’infiniti dolori:
ch’anime molte d’eroi si gittò innanzi nell’Hade,
mentre gli eroi lasciava che fossero preda de’ cani,
mensa per gli uccellacci – di Giove era anche la voglia” (Pascoli, Iliade, vv.1-5)


The XX century begins in Italy with translators of futurist or fascist inspiration, who find in the ancient classics a kind of ideal foundation for the new world. In that period there is also, in Italy, a somewhat ‘nostalgic’ factor linked to the appreciation of classical texts, Latin texts first of all and consequently Greek ones too, as the writer and Homeric translator Lipparini states: “Ma in verità, per noi Italiani, il latino non deve essere una lingua morta. È la lingua dei padri Romani, è l'italiano antico dei dominatori del mondo” [But truly, for us Italians, Latin must not be a dead language. It is the language of our Roman fathers, the ancient Italian of the rulers of the world] (Lipparini, Le Foglie dell’Alloro, 1916).

Faggella’s Iliad of 1923 is dedicated to D’Annunzio’s Fiume. In the same year Nicola Festa, who was moving from anti-fascism to convinced fascism, translates the Iliad. Ettore Romagnoli, a convinced fascist and future leading representative of classical studies under the regime, publishes his own translation of Homer in 1924. In those years other European countries also produce new and important translations of Homer: for example, Paul Mazon writes a version of the Iliad still used in France, while Spain produces what are perhaps its most widespread Homeric translations, Segalà y Estella’s Ilìada and Odisea:

“Canta, oh diosa, la cólera del Pelida Aquiles; cólera funesta que causó infinitos males a los Aqueos y precipitó al Hades muchas almas valerosas de héroes”

(Segalà y Estella, Ilìada, 1930)

As can be seen from the first lines, the translation is not entirely faithful (for example, ‘cólera’ is repeated).

The last Italian translation of the Iliad before World War II is Vitale’s (1937).

While these translators already begin to affirm the importance of faithfulness to the Homeric text (Faggella, for example, states that he does not want to “adulate” his own epoch by embellishing the original), the general Western reception of Homer also seems to have undergone important changes.

Direct attacks on Homer like those we saw in the XVIII century become rarer, while stylistic features once considered so unpleasant that they were sometimes altered in translation (such as some unvarnished descriptions of “low life” or the use of crude words by some characters) now seem generally appreciated: “Realism in literature can go no farther than Homer did in imparting life to a picture of the past” (Bassett 1938).

This does not mean that every translator seeks textual fidelity, as shown, for example, by the peculiar translation produced by Jolanda de Blasis during the war (1944), nearly contemporary with the French version by Lasserre (1942), which aimed at the same style that “eruppe dal petto di San Francesco quando modulò i sublimi capitoli della sua lode di Cristo” [burst from the breast of Saint Francis when he intoned the sublime chapters of his praise of Christ]20.

20 Morani 1989.

The first Italian Iliad after World War II is also the first since Monti’s to be still used in Italian schools: Rosa Calzecchi Onesti’s Iliade, produced under the auspices of Pavese and generally considered one of the most literal. New interest in Homeric translation is shown by a number of full or partial translations of the Iliad (Vitali 1950, again in hendecasyllables; Lussignoli 1963; Faggella 1953; Villa 1964; and, the most literarily distinguished, Quasimodo’s edition of selected Homeric episodes)21.

Between 1960 and 1970 the Odyssey is particularly appreciated: the 1960s record the highest number of Italian translations to date (De Caprio 2012). Most Italian Homeric translations of this period (roughly 1950 to 1975) are characterized by declared faithfulness, a tendency to favour economy of expression and sometimes a slight imitation of spoken language, even in the narrator’s voice. We can see these characteristics in Tonna’s 1973 prose translation of the Iliad:

“E la ragione fu che l’Atride non rese onore a Crise, là, sacerdote” (Tonna, Iliade, 1973)

Like many contemporary authors, he translates plainly and literally, but it is also clear that the introduction of ‘là’ carries a stylistic bias toward the spoken language.

The extreme freedom that characterizes Homeric translations from the XVI to the XVIII century is far removed from most contemporary translations of the poems. Authors who aim to transpose Homer with a similar degree of unfaithfulness generally claim to be writing an ‘adaptation’ or to be ‘re-writing’ the text. Translators, and not only Italian ones, are highly concerned with reproducing the style of the original text22. Stylistic features so often criticized in the XVII and XVIII centuries, such as repetitions, are now “in der Homer-Philologie mainstream” [mainstream in Homeric philology] (Bannert 1988) and are openly appreciated by many critics and translators. Elements already appreciated by older commentators, such as the Homeric similes, are greatly extolled and considered both fascinating insights into daily life in the Dark Ages23 and poetic masterpieces. This may be partly due to the spread of the oral theory of the poems’ origins, which led many scholars to apply different standards when analysing Homer, no longer a writer but the name given to the voice of an entire people24. Yet the vitality of Homeric abridgements and adaptations, and the sheer number of literal translations produced in the XX century, suggest that this aesthetic appreciation is sincere. In the last seventy years, Homer has been the classical author with the highest number of new complete translations in Italy: 22 between the Iliad and the Odyssey (De Caprio 2012). Although verse translations are still numerous, prose translations have become a significant share. Most importantly, the majority of verse translations have abandoned the idea of reproducing the metre (rhythm and musicality) of the Homeric verse, becoming in fact another kind of prose translation.
Naturally, this too is a precise aesthetic choice and not necessarily a sign of greater faithfulness to the features of the original: while these translations are more literal in content, they lose the metrical and musical dimension that is so important in the original texts25.

21 In 1973 Garzanti published Tonna’s Iliad, a relatively faithful version that we used extensively as a ‘test’ in building the aligner.

22 Bardollet 1993, for example, worries that his own translation of specific epithets may have been “une trop volumineuse périphrase” [an overly voluminous periphrasis], since it took eight words instead of two.

23 Wilkinson 1993, ‘Introduction’.

24 see Graziosi 2014 about Di Benedetto’s theories. Naturally, an essential text on this theme is


In the nineties, finally, we see another “Iliadic wave” in Italian translations. One of the best known is probably Ciani’s (1990), which, as its incipit shows, keeps a prose-like simplicity of expression (‘figlio di Peleo’ is preferred to ‘Pelide’) but is not always completely faithful to the original (“l’ira” is repeated not twice but three times in the first sentence):

“L’ira canta, dea, l’ira di Achille figlio di Peleo, l’ira funesta” (Ciani, Iliade)

Other important translations are Cerri 1996, Paduano 1997 and Giammarco 1997. Renewed enthusiasm for the classical epics is also perceptible in publishing statistics: between 1990 and 2001 the number of publishers that promoted new Italian translations of the classical epics (Iliad, Odyssey and Aeneid) was the highest in the last fifty years (De Caprio 2012).

In the last 15 years new translations of the Homeric poems have continued to appear (Ventre, Marinari and Mirto are some of the translators), and the versions of the second half of the XX century have often been reprinted and sometimes turned into ebooks. Here is the incipit of Ferrari’s Odyssey, published in 2001:

“L’uomo dai molti percorsi, o Musa, tu canta, colui che molto vagò dopo avere abbattuto la rocca sacra di Troia, e di molti uomini vide le città, scrutò la mente”

(Ferrari, Odissea, 2001)

Homeric vitality naturally does not shine only in Italian: in the last ten years, at least ten significant English translations of Homer have appeared. Here are samples of three of the most recent:

“Remind us, Muse, of that man of many means, sent spinning the length and breadth of the map after bringing the towers of Troy to their knees” (Armitage, Homer’s Odyssey, 2006)

“Goddess, sing me the anger of Achilles, Peleus’ son, that fatal anger that brought countless sorrows on the Greeks, and sent many valiant souls of warriors down to Hades”

(Kline, Ilias, 2009)

“The rage sing, O goddess, of Achilles, son of Peleus, the destructive anger that brought ten-thousand pains to the Achaeans”

(Powell, Ilias, 2013)

The tendency to translate Homer into prose is an international trend. As the examples quoted show, modern translations are not necessarily literal. Furthermore, we can see from the last two examples that a number of translators, whether or not a majority, keep the ‘virtue’ of addressing the θεὰ as ‘goddess’ and the ‘vice’ of doubling Achilles’ μῆνιν.

25 There are several attempts to find a compromise between faithfulness and musicality. One of the most recent is Ferrari 2014, which adopts “una pentapodia accentuale in cui il numero costante dei picchi d'intensità sostituisce il numero costante delle sillabe” [an accentual pentapody in which the constant number of stress peaks replaces the constant number of syllables].


PART I - THE AUTOMATIC ALIGNER

1 – Alignment Background

1.1 – State of the Art

Text segmentation and alignment may seem an old task in Machine Translation, but, as Xu 2010 writes in an interesting PhD dissertation on cross-lingual sequence segmentation, there is still much room for improvement. Articles and theses on this topic continue to be published, together with new suggestions for the “open problem” of word alignment, comparative studies of different aligners, and overviews of the field26. Since segment alignment is widely considered a necessary step toward any kind of word alignment27, a good textual alignment is regarded as a very important prerequisite for many studies of automatic word translation.

In my case, the main problem was the need to align long, unsegmented texts with translations that are often noisy, literary and unfaithful. Very popular alignment tools such as the ParaConc aligner (Barlow 2002) worked poorly with such texts, and usually lost track within the first six segments.

Although the specific object of this work is the Homeric poems, I tried to keep the alignment algorithm as text-independent as possible, so as to enable it, under certain constraints, to align texts by very different authors with no need for external databases, manually made lists, pre-tagged translations or other time-honoured devices. Although sophisticated techniques exist to create lexica and thesauri from corpora28, and it would be interesting to measure the performance of such systems on our translations, I preferred to keep to simpler approaches. A bilingual dictionary is created automatically by the program, which matches similar strings from a selected array, as I will show in detail later.

The task of aligning bilingual textual blocks has been considered a fundamental stepping stone for improvements in machine translation for at least three decades. Naturally, some form of cross-lingual alignment has always been necessary for any attempt to build machine translation systems, but the perspectives from which these alignments have been made have changed over time. Works on machine translation from more than thirty years ago often discuss building abstract representations29 that could, in some way, formalize the common underlying structure of a sentence and its translations, and consequently find the best equivalents of a given input in other languages. In his book “The Core Language Engine”, the summary of a long work started in the mid-1980s, Alshawi writes: “The point of the exercise is that the artificial representations involved should be less vague, ambiguous, and syntactically varied than natural language, making them more suitable for (...) the mappings required by machine translation”. This approach is now criticized by several linguists. As Kustron 2000 writes, “An over-theoretical approach seems to have haunted Machine Translation research since its beginning”.

26 See Bisson and Fluhr 2000; Och and Ney 2003; Koehn et al. 2003; Mermer 2010; Otero and Campos 2013 and Nicolas et al. 2013 for some recent contributions. The related field of sentence translation is also intensely studied (Stanojevic 2014, Kundu et al. 2014).

27 Piao 2002.

28 See for example Hoelter 1999.

29 This was in tune with the formalist perspectives of the 1970s and 1980s, which looked first of all for abstract systems of rules to explain and reproduce language phenomena. Gladkij’s 1972 Éléments de linguistique mathématique does not contain a single word about statistical approaches, although these had already been devised and explored by the 1950s.

Real interest in text alignment consequently arose in the early nineties, in parallel with the new success of heavily statistical and corpus-based machine translation. While corpus linguistics is much older than 25 years (“Der Beginn der Korpuslinguistik wird traditionell auf die 60er Jahre festgelegt, weil damals die ersten elektronischen Korpora entstanden” [The beginning of corpus linguistics is traditionally placed in the 1960s, because the first electronic corpora appeared at that time], writes Rainer 2003), many linguists now speak of a ‘statistical revolution of the late 1980s and early 1990s’30 that put the need for massive corpora before that for logical formalisms, while others, such as Bouchard 1997, describe it as a minimalist shift toward more economical theories. In any case, it was a relatively slow process, and as late as 1991 approaches to Machine Translation were more rule-based or principle-based than statistically based; see for example Berwick et al. 1991, “Principle-Based Parsing”.

The first important works on sentence alignment are generally considered to be, after the pioneering effort by Brown in 1988, Brown 1991 and Gale and Church 1993. In his 1991 work, Brown suggested aligning sentences with a similar number of words. This idea inspired the latter article, Gale and Church 1993, which based its alignment heuristics on the principle that original and translated sentences should be of similar length, measured in characters rather than in words. “This seems uncontroversial and turns out to be sufficient information to do alignment, at least with similar languages and literal translations”31. The length heuristic is still widely used for a number of alignment purposes32.

What would prove problematic for my task, were I to follow this approach, is the need for a very literal translation, a condition hard to satisfy in the Homeric tradition. The need for clean or literal corpora, or both, is common to many alignment techniques, as Singh and Husain 2005 point out. Furthermore, Gale and Church’s first algorithm is designed to work on pre-aligned paragraphs.
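To make the length heuristic concrete, here is a minimal Python sketch of a Gale and Church style aligner. It assumes that both texts are already split into sentences and that each sentence is represented only by its length in characters; the constants (c = 1, s² = 6.8) and the priors for the six bead types are the values usually quoted from Gale and Church 1993, while the function names and the dynamic-programming layout are illustrative, not their original program.

```python
import math

# Bead types (source sentences, target sentences) with prior probabilities,
# as usually quoted from Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of the character-length ratio


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bead_cost(l1, l2, bead):
    """Cost (-log probability) of matching l1 source chars with l2 target chars."""
    mean = max((l1 + l2 / C) / 2.0, 1e-9)
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    p_delta = max(2.0 * (1.0 - norm_cdf(abs(delta))), 1e-12)
    return -math.log(PRIORS[bead]) - math.log(p_delta)


def align(src_lens, tgt_lens):
    """Return (source indices, target indices) beads with minimal total cost."""
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            # Extend the best path ending at (i, j) with every bead type.
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(sum(src_lens[i:ni]),
                                           sum(tgt_lens[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


print(align([20, 30, 25], [21, 29, 26]))
# → [([0], [0]), ([1], [1]), ([2], [2])]
```

Because every bead cost combines a prior with a length-mismatch penalty, a sentence that the translator splits in two can still be matched (a 1-2 bead), but a free rendering that adds or drops whole passages quickly pushes the path off track, which is the weakness noted above for literary translations.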

30 This specific expression is taken from Blackburn & Bos 2005.

31 Manning 1999, “Foundations of Statistical Natural Language Processing”.

32 Singh and Bandyopadhyay 2010 still recommend this straightforward heuristic.

In any case, Church himself showed, in 1993, that methods based on sentence length work well only for clean texts, not for noisy documents or OCR output. He therefore presented another approach, based on finding cognates in the text. Cognates are words that remain relatively similar across languages, such as ‘Germany’ and ‘Germania’. He argues that this method can be useful for any language written in the Roman alphabet, since texts usually contain many names and dates (elements that keep a degree of similarity across languages), and he goes further, arguing that it could even be used for texts written in non-Roman alphabets, provided such texts have a reasonable distribution of numbers and names in Latin characters. The technique we adopted to align Ancient Greek and modern European languages is in some respects inspired by this method.
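A rough idea of cognate detection can be given with Python's standard difflib module. This is a sketch under stated assumptions: the similarity measure (SequenceMatcher.ratio), the 0.75 threshold and the four-character minimum are illustrative choices not taken from Church 1993, and Greek words would first need transliteration into Latin characters.

```python
from difflib import SequenceMatcher


def cognate_pairs(src_words, tgt_words, threshold=0.75, min_len=4):
    """Return (source, target, similarity) for word pairs that look like cognates.

    threshold and min_len are illustrative values, not tuned parameters.
    """
    pairs = []
    for s in sorted(set(src_words)):
        if len(s) < min_len:
            continue  # short words match by chance too easily
        for t in sorted(set(tgt_words)):
            if len(t) < min_len:
                continue
            ratio = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if ratio >= threshold:
                pairs.append((s, t, round(ratio, 2)))
    return pairs


print(cognate_pairs(["Germany", "the", "Achilles"], ["Germania", "il", "Achille"]))
# → [('Achilles', 'Achille', 0.93), ('Germany', 'Germania', 0.8)]
```

Proper names survive translation far better than common words, which is why, in a sketch like this, most of the pairs found in a real text are names.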

Also in 1993, Kay and Röscheisen tried a more sophisticated and computationally intensive technique that alternates sentence and word alignment, repeating two basic operations: first, sentences are aligned; second, words with similar distributions are considered translations of each other and used as new anchor words to refine the first pass. At the outset, the only aligned sentences are the first and the last. My work uses some techniques similar to Kay and Röscheisen’s (for example, treating words with a similar distribution as translation pairs), but their method is highly expensive and probably not very robust for free, unfaithful, literary translations.

Later approaches include the use of a pre-determined bilingual lexicon for very distant languages (Wu 1994) and the use of manually identified cognates, a line of thought I will generally follow in the first part of this work, except that I do not require a pre-compiled bilingual lexicon, since the dictionary of anchor words is generated automatically.

Fung and McKeown 1994 present an interesting algorithm, called DK_vec, designed to find word pairs in completely unrelated languages. The idea is to assign to each term a vector whose components are the distances, in words, from the beginning of the text to the first occurrence of the term and then between its successive occurrences. So if a term appears as the first word of a text and then appears again after 90 words, its vector will be (0,90). Fung and McKeown argue that in parallel corpora, even noisy and linguistically unrelated ones, many words will have vectors similar to those of their translations. This approach was inspired by a technique used in signal processing, and represents one of those cases where speech processing methods are transferred to text-based computational linguistic tasks33. This language-independent approach is very interesting for my purposes, but it assumes that the translations used are relatively faithful. Although it can in fact find word pairs in fairly noisy texts too34, if the translator varies the recurrence of a single term too much, using synonyms or ellipses (if, in other words, he wants to create a translation with a different style), this kind of distribution vector can soon become unreliable.
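The recency vector described above can be sketched in a few lines of Python. The helper names are hypothetical, and the comparison step uses a plain dynamic time warping (DTW) distance, a signal-processing technique chosen here for illustration; it should not be read as a reproduction of Fung and McKeown's exact scoring.

```python
def recency_vector(tokens, word):
    """Position of the first occurrence, then gaps between successive occurrences."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    if not positions:
        return []
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Each cell may come from a diagonal match or from stretching either sequence.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


source = ["Achille"] + ["x"] * 89 + ["Achille"]    # occurrences at 0 and 90
target = ["Achilles"] + ["y"] * 84 + ["Achilles"]  # occurrences at 0 and 85
v1, v2 = recency_vector(source, "Achille"), recency_vector(target, "Achilles")
print(v1, v2, dtw_distance(v1, v2))
# → [0, 90] [0, 85] 5.0
```

A translator who drops one occurrence shortens the target vector, and DTW can still stretch one sequence against the other; if the term is systematically replaced by synonyms, however, the gaps change everywhere and the distance grows, which is the unreliability discussed above.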

A nearly contemporary article (Fung and Church 1994) proposes a method similar to a technique we will use later in this work to find single-word translations. The method counts how many times a word occurs in each part of a corpus split into N parts, and then finds, in the translation (also split into N parts), the words with the most similar distribution. Naturally, this system rests on the hypothesis that the two corpora (original and translation), once split into N parts, will yield roughly similar blocks with roughly similar contents, which is not always the case with our translations.
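The segment-based comparison can be illustrated as follows. The sketch follows the description given here (occurrence counts in N segments, then the most similar target distribution) and uses cosine similarity as the comparison measure; that measure, the function names and the toy Italian and English word lists are assumptions of this example, and the original paper's scoring should not be assumed to match it.

```python
import math


def segment_counts(tokens, n_segments):
    """Map each word type to its occurrence counts in n_segments equal slices."""
    size = math.ceil(len(tokens) / n_segments)
    vectors = {}
    for k in range(n_segments):
        for word in tokens[k * size:(k + 1) * size]:
            vectors.setdefault(word, [0] * n_segments)[k] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_translations(src_tokens, tgt_tokens, n_segments):
    """For each source word, the target word with the most similar distribution."""
    src = segment_counts(src_tokens, n_segments)
    tgt = segment_counts(tgt_tokens, n_segments)
    return {w: max(tgt, key=lambda t: cosine(v, tgt[t])) for w, v in src.items()}


italian = ["ira", "achille", "dea", "achille", "ira", "troia", "nave", "troia"]
english = ["wrath", "achilles", "goddess", "achilles", "wrath", "troy", "ship", "troy"]
print(candidate_translations(italian, english, 4))
# → {'ira': 'wrath', 'achille': 'achilles', 'dea': 'goddess', 'troia': 'troy', 'nave': 'ship'}
```

With real translations the hypothesis of equivalent segments is exactly what fails: if one translator expands a passage, every later segment boundary shifts, and the vectors of genuine translation pairs drift apart.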

More recent alignment strategies, such as Haruno and Yamazaki 1996, tend to rely on tools and resources developed for rich, standard languages such as Japanese or English. In the cited article, the authors first rely on a good part-of-speech tagger to filter potential anchor words and then on large, reliable online dictionaries to create the actual pairs. Chen 1996 bases his deductions on a statistical pre-built translation model, while Melamed 1999 uses a sophisticated token-matching system based on pattern recognition and segment boundary information.

33 It happens more rarely than one might imagine. “There continues to be a division between signal processing courses, normally taught in electronic engineering degrees, and computational linguistics, directed to students of computer science or linguistics” (Coleman 2005).

34 Small differences in the orthography of the same words due to noisy transcriptions could be

Bilingual text alignment opens the way to a vast number of multilingual applications, such as bilingual lexicography (Langlais et al. 1998, also concerned with alignment evaluation; Klavans and Tzoukermann 1995, proposing a system to integrate dictionary and corpus-extracted information about bilingual pairs), cross-language information retrieval (Grefenstette 1998, one of the first to address cross-language information retrieval over Web resources; Hull and Grefenstette 1996, a dictionary-based experiment on multilingual queries; Nie et al. 1998, a probabilistic model trained on parallel corpora and augmented with a dictionary), machine translation (Gildea 2003 uses a syntactic-tree approach to bilingual alignment; Och et al. 1999 and 2003 use both a language model and a translation model), automatic translation verification (Macklovitch and Hannan 1996), corpus construction (Gale and Church 1991, on the construction of sentence-aligned parallel corpora) and terminology research (Dagan et al. 1993 try to find translations of composite terms in parallel corpora)35. Block alignment is considered so important because fine-grained algorithms such as Giza++ work well on small blocks of text with equivalent meaning, but problems appear as soon as they are given long undivided texts.

Using anchor words is an efficient and popular heuristic. Although it is sometimes considered conservative, it is still frequently used for a variety of tasks: for text segmentation, but also for sentence alignment even in already paragraph-aligned corpora, as in Xu et al. 2006. A relatively recent paper that shares some similarities with my work, above all in the use of anchor words for aligning corpora, is Feng and Manmatha 2006, although its main aim (the identification of OCR errors) makes its heuristics and practices of little use for my task. Anchor words also bring their own limitations, such as the different ways in which different authors can translate a single word, and the need to provide the anchor words from external databases or manually constructed lists.

As I will show later in more detail, choosing the right names as anchor words is one of the best ways to overcome the first problem. To overcome the second, which can prove harder to solve, I tried to build an anchor-word dictionary automatically. More sophisticated text alignment techniques, such as the machine learning algorithms used in automated text categorization or document classification, could have been used, but even if they did not need a bilingual set of anchors, they would still need external tools, such as wordnets, thesauri or even treebanks36, to find anchor words and similar patterns. Moreover, they risk becoming computationally heavy when analysing large texts. We neither wanted nor were able to use external thesauri and wordnets in the alignment phase, because we hoped to keep the procedure independent of external tools and because, at that stage, we did not have a complete Ancient Greek WordNet or similar resources at our disposal.

35 For a good synthesis of these applications see Bunt 2004, Chapter 1, “New Developments in Parsing Technology”.

Naturally, many procedures established in machine learning and machine translation assume either the use of at least one rich language as pivot (a rich language, in this sense, being one with a large number of ready-made NLP tools; neither Homeric Greek nor XVIII-century Italian is of this kind) or the availability of vast amounts of data and large corpora. Furthermore, I was looking for something that, with some changes, could easily be reused for other authors, perhaps even for Romance languages other than Italian, without new training: in fact, my aligner requires no training data at all.

The problem of aligning multiple translations to their common original has also been widely studied in the field of automatic paraphrase generation. Barzilay 2003 considered the different translations of a common text as a large source of “naturally occurring paraphrases”. Barzilay and McKeown 2001 extracted paraphrases from sentence-aligned translations using the Gale and Church heuristic, while Pang et al. 2003 used a syntax-based algorithm for the same purpose. An overview of this ‘parallel’ research, together with an interesting method for extracting word-level paraphrases from aligned parallel corpora, is given in Callison-Burch 2007. A more recent contribution on this topic is Barančíková and Tamchyna 2014, which explicitly proposes treating paraphrasing as ‘monolingual translation’.
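Since the Gale and Church heuristic is mentioned here, a compact sketch of their length-based sentence alignment may help. Sentences are aligned by dynamic programming over a few "bead" types (1-1, 1-0, 0-1, 2-1, 1-2, 2-2), with a cost that penalizes length mismatches under a normal model and penalizes rare bead types through a prior. The constants are the values commonly quoted from Gale and Church (c = 1, s² = 6.8, and the bead priors); the function names are hypothetical, and this is a simplified illustration rather than a faithful reimplementation.

```python
import math

# Constants commonly quoted from Gale and Church: expected ratio of target to
# source characters (C) and its variance per source character (S2).
C = 1.0
S2 = 6.8
# Prior probability of each bead type: (source sentences, target sentences).
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def length_cost(len_src: int, len_tgt: int) -> float:
    """-log P(length mismatch), using a two-sided normal tail probability."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2
    z = abs(C * len_src - len_tgt) / math.sqrt(S2 * mean)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return -math.log(p) if p > 0 else math.inf


def gale_church_align(src_lens, tgt_lens):
    """Align sentences from their character lengths; return a list of beads.

    Each bead is a pair (source sentence indices, target sentence indices).
    """
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward DP: every bead moves to a later cell in row-major order.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = (length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                        - math.log(prior))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the final cell.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


beads = gale_church_align([20, 45, 30], [22, 21, 25, 31])
```

Here the second source sentence (45 characters) is matched with two target sentences (21 + 25 characters). Length-based alignment assumes that a sentence and its translation have roughly proportional lengths; free or expansive literary translations may weaken that assumption.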

The issue has also been addressed in the field of Word Sense Disambiguation37, where parallel data are used to disambiguate polysemous words in one language with the aid of their translations. Gale et al. 1992 is a case study of single-word polysemy through parallel translations. They aligned the Canadian Hansards (an English-French parallel corpus) sentence by sentence and tagged the sentences in which a given polysemous English word was translated by different French words: for example, sentences where duty was translated as devoir received a different tag from sentences where duty was translated as droit. They then trained a Naive Bayes classifier to disambiguate English word senses automatically by looking at the context. The method can be described as supervised machine learning in which the ‘labels’ are already present in the training corpus itself: they are the different translations of the polysemous word. Naturally, this approach is not very robust when faced with data sparsity, and it can work only for words frequent enough to provide reasonable contextual distinctions between their different senses (Charniak 1993).
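The labelling trick described above for Gale et al. 1992 can be sketched in a few lines. The French translation of each aligned sentence decides the sense label of duty, and a multinomial Naive Bayes classifier with add-one smoothing is then trained on the English context words. The four sentence pairs are invented toy data, not Hansard text, and the code illustrates the general idea rather than reconstructing their system.

```python
import math
from collections import Counter, defaultdict


def label_by_translation(pairs, sense_map):
    """Turn aligned (English, French) sentence pairs into labelled examples.

    The label comes from the first French word of sense_map found in the
    French sentence; pairs containing none of those words are skipped.
    """
    examples = []
    for english, french in pairs:
        french_tokens = french.split()
        for french_word, label in sense_map.items():
            if french_word in french_tokens:
                examples.append((english.split(), label))
                break
    return examples


class NaiveBayes:
    """Multinomial Naive Bayes over context words, with add-alpha smoothing."""

    def fit(self, examples, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in examples:
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_examples = len(examples)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / self.n_examples)  # log prior
            for token in tokens:
                smoothed = self.word_counts[label][token] + self.alpha
                score += math.log(smoothed / (total + self.alpha * len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


pairs = [
    ("the duty on imported goods", "le droit sur les marchandises importées"),
    ("customs duty was raised", "le droit de douane a été augmenté"),
    ("it is our duty to vote", "c'est notre devoir de voter"),
    ("a moral duty to help", "un devoir moral d'aider"),
]
examples = label_by_translation(pairs, {"droit": "tax", "devoir": "obligation"})
classifier = NaiveBayes().fit(examples)
```

With only two examples per sense, most context words are unseen and the smoothing term dominates their probabilities, a small-scale picture of the data-sparsity problem mentioned above.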

Alignment algorithms and heuristics are also studied in the vast field of Speech Processing38, but its approaches, which generally include Hidden Markov Models and techniques such as Viterbi alignment, seemed to me, although extremely interesting, of little use for my specific purposes. Nonetheless, there are also several cases in which HMMs and the Viterbi algorithm have been used for text alignment.
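For readers unfamiliar with Viterbi alignment, the toy sketch below shows the idea as it is applied in HMM word-alignment models for text. Each target word is generated by a hidden source position, transitions depend only on the jump between consecutive positions, and the Viterbi algorithm recovers the most probable sequence of positions. The probability tables, the floor value for unseen events and the toy Greek and Italian tokens are all invented for illustration.

```python
import math

FLOOR = 1e-6  # assumed probability for unseen word pairs and jumps


def hmm_viterbi_align(src, tgt, trans_prob, jump_prob):
    """Return, for each target word, the index of its most probable source word.

    trans_prob maps (source word, target word) to an emission probability;
    jump_prob maps a jump width (new position - old position) to a probability.
    """
    n = len(src)

    def emit(i, word):
        return math.log(trans_prob.get((src[i], word), FLOOR))

    def jump(prev, i):
        return math.log(jump_prob.get(i - prev, FLOOR))

    # Uniform start distribution over source positions.
    best = [math.log(1.0 / n) + emit(i, tgt[0]) for i in range(n)]
    backpointers = []
    for word in tgt[1:]:
        new_best, pointers = [], []
        for i in range(n):
            prev = max(range(n), key=lambda k: best[k] + jump(k, i))
            new_best.append(best[prev] + jump(prev, i) + emit(i, word))
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final position.
    i = max(range(n), key=lambda k: best[k])
    path = [i]
    for pointers in reversed(backpointers):
        i = pointers[i]
        path.append(i)
    return path[::-1]


src = ["menin", "aeide", "thea"]     # "wrath", "sing", "goddess"
tgt = ["canta", "o", "dea", "ira"]   # toy Italian word order
trans_prob = {("aeide", "canta"): 0.8, ("thea", "o"): 0.1,
              ("thea", "dea"): 0.8, ("menin", "ira"): 0.7}
jump_prob = {-2: 0.1, -1: 0.1, 0: 0.2, 1: 0.4, 2: 0.1}
alignment = hmm_viterbi_align(src, tgt, trans_prob, jump_prob)
```

In this toy run the vocative o is attached to thea, and the reordered ira is linked back to menin because its emission probability outweighs the penalty for a backward jump.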

37 Word sense disambiguation is extremely useful for a number of NLP tasks, such as event extraction and hierarchical clustering, since these are often based on ‘trigger words’ (Mehryary et al. 2014).
