F.Chessa & G.Brelstaff 1
Going beyond Google Translate?
Francesca Chessa
DLS, University of Sassari, Italy
Gavin Brelstaff (gjb@crs4.it)
CRS4 09010 Pula (CA) – Sardinia, Italy
CHItaly2011, Alghero September 2011
CRS4 internal seminar series 2012
29/02/2012
today
http://bioinformatics.oxfordjournals.org/content/25/2/258.full
http://videolectures.net/w3cworkshop2011_brelstaff_interactive/
May 2011
poetry 2011
Healthy Living
Fiction Poetry
poetry corner (UK, 2011)
The Waste Land iPad app earns back its costs in six weeks on the App Store
The Guardian 8 Aug 2011
Alghero app - html
Echo Chamber –
natural resonances
"He was the cat that walked by himself and all places were alike to him." (Kipling)
Internet economy - www
Genius loci: the creative spirits of place, not just geolocation.
Minority language: a seed-bed for poetic expression, not just ICT communication.
Sardinia
Whenever we lose a language, the “genetic basis” for such expression diminishes globally.
This talk:
Alghero
Poetry
HCI
Screenshot
Resulting alignment
Una llavor de mela
és caiguda a un fos
entre la terra negra.
Language Barrier
writer reader
Echo Chamber
(ear,tongue,thought,eye)
Parallel text alignment ↔ to communicate semantics
• standards-based markup (TEI,XML), html delivery
• going beyond GoogleTranslate →
An almond seed into furrow fell
amongst dark earth.
Una llavor de mela és caiguda
a un fos
entre la terra negra.
[Diagram labels: writer, Translator, reader; highlighted pair: és caiguda ↔ fell]
Poetry is an extreme challenge
GoogleTranslate: pitiful when read aloud as poetry
Echo Chamber (ear, tongue, thought, eye)
We are not doing statistical machine translation (SMT) here.
A human translates each poem and marks up the
equivalences at three different levels: word, phrase, idea.
Spatio-visual cues are thus obtained.
HCI and SMT
word
phrase
idea
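The three-level markup described on this slide can be pictured as a small data structure. What follows is a minimal sketch in plain JavaScript, not the authors' code: the names `makeAlignment` and `groupByLevel`, the record fields, and the assignment of sample tokens to levels are all assumptions made for illustration. Only the pairing "és caiguda" ↔ "fell" is taken from the slides.

```javascript
// Hypothetical record of human-marked equivalences (names are illustrative).
const LEVELS = ["word", "phrase", "idea"];

// Create one alignment linking source-text tokens to target-text tokens.
function makeAlignment(level, sourceTokens, targetTokens) {
  if (!LEVELS.includes(level)) {
    throw new Error("unknown level: " + level);
  }
  return { level, source: sourceTokens.slice(), target: targetTokens.slice() };
}

// Group alignments by level so that each level can be highlighted separately.
function groupByLevel(alignments) {
  const groups = { word: [], phrase: [], idea: [] };
  for (const a of alignments) {
    groups[a.level].push(a);
  }
  return groups;
}

// Sample data: the level assigned to each pairing is an assumption.
const alignments = [
  makeAlignment("word", ["llavor"], ["seed"]),
  makeAlignment("phrase", ["és", "caiguda"], ["fell"]),
  makeAlignment("idea", ["entre", "la", "terra", "negra"], ["amongst", "dark", "earth"]),
];
const byLevel = groupByLevel(alignments);
```

Keeping the level on each record, rather than in three separate lists, lets the same alignment list drive different highlight styles per level.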
Spatio-visual cues are thus available.
Q: Can such cues provide an intermediate representation that can be usefully manipulated by SMT?
Q: Might an algorithm compute emergent semantic context from them?
HCI and SMT ?
[Diagram labels: Translator, point & click; Semantic context view; Common spatio-visual representation; Algorithm (In, Out); Emergent semantic context? Example pair: és caiguda ↔ fell]
HCI and SMT?
HCI: design factors
• non-verbal interface – non-linguistic thinking
• facilitate smooth switching of gaze between parallel texts via secondary highlighting
• text selection by point & click on words
• see-through cursor sprite when over text
[Diagram labels: Translator, point & click; Semantic context view]
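The secondary highlighting mentioned in the design factors can be sketched as a lookup: clicking a segment marks it as the primary highlight, and any segments aligned with it in the other text become secondary highlights that guide the eye across. This is an illustrative sketch only. The function `highlightsFor`, the pair-list representation, and the segment ids are assumptions, and the DOM and jQuery parts of the real interface are omitted.

```javascript
// links: array of [sourceId, targetId] pairs; the ids are hypothetical.
// Returns the clicked segment as primary and its aligned partners as secondary.
function highlightsFor(clickedId, links) {
  const secondary = [];
  for (const [src, tgt] of links) {
    if (src === clickedId) {
      secondary.push(tgt);
    } else if (tgt === clickedId) {
      secondary.push(src);
    }
  }
  return { primary: clickedId, secondary };
}

const sampleLinks = [
  ["ca-3", "en-5"],
  ["ca-1", "en-2"],
];
const clickResult = highlightsFor("en-5", sampleLinks);
```

The lookup works in both directions, so a click in either text highlights its counterpart in the other.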
interactive alignment: steps
• atomic segmentation {word,phrase,idea}
• restoring state
• select by click
• merge operation
• alignment between parallel texts
• split operation
• review
jQuery implementation
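The segmentation, merge, split and alignment steps listed above could look roughly like this on plain arrays. It is a sketch under assumptions, not the jQuery implementation itself: the function names, the whitespace-based segmentation, and the index-based links are all hypothetical, and the state-restoring and review steps are left out.

```javascript
// Atomic segmentation: one segment per whitespace-separated word.
function segment(text) {
  return text.trim().split(/\s+/).map((w) => [w]);
}

// Merge the segments at positions i..j (inclusive) into one larger segment.
function merge(segments, i, j) {
  const merged = segments.slice(i, j + 1).flat();
  return [...segments.slice(0, i), merged, ...segments.slice(j + 1)];
}

// Split the segment at position i back into single-word segments.
function split(segments, i) {
  const atoms = segments[i].map((w) => [w]);
  return [...segments.slice(0, i), ...atoms, ...segments.slice(i + 1)];
}

// Align: record that source segment i corresponds to target segment j.
function align(links, i, j) {
  return [...links, { source: i, target: j }];
}

let source = segment("Una llavor de mela és caiguda a un fos");
const target = segment("An almond seed into furrow fell");
source = merge(source, 4, 5); // "és caiguda" becomes one segment at index 4
const stepLinks = align([], 4, 5); // link it to "fell" (target index 5)
```

Each operation returns a new array instead of mutating its input, which would make it straightforward to keep earlier states around for an undo or restore step.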
Demo (desktop browsers: IE 8-9, Firefox 3-10, Opera 11, Chrome, Safari)
Demo:
selection by click
Demo:
selection & alignment
Presentation Content Structure Semantics
eXist XML db
XML CSS
XHTML
not RDF
Unicode
http put
REST/ajax
DOM Javascript
jQuery
Interact
TEI-p5
XMLSchema
XSL
XQL
not w3cRange
now go mobile...
Text Encoding Initiative – the TEI way
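One way alignments might be written down "the TEI way" is as a `<linkGrp>` of `<link>` elements, both of which are defined in TEI P5. Whether the Alghero app stores its alignments in exactly this form is an assumption. The helper names `escapeAttr` and `toLinkGrp` and the segment ids below are hypothetical.

```javascript
// Escape characters that are special inside an XML attribute value.
function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
}

// Serialize [sourceId, targetId] pairs as a TEI <linkGrp> of <link> elements.
// Each <link> points at the two aligned segments by their ids.
function toLinkGrp(type, pairs) {
  const lines = pairs.map(
    ([src, tgt]) => `  <link target="#${escapeAttr(src)} #${escapeAttr(tgt)}"/>`
  );
  return [`<linkGrp type="${escapeAttr(type)}">`, ...lines, "</linkGrp>"].join("\n");
}

const xml = toLinkGrp("phrase", [["ca-s5", "en-s6"]]);
```

A string like this could then be sent to the XML database by HTTP PUT, as the architecture slide suggests. The request itself is not shown here.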
Conclusion
• Alghero html app for aligning parallel texts
• generic multilingual web application
• text highlights = spatio-visual cues
• intermediate representation for semantic context?
• poetry: extreme challenge for SMT
• poetry: a new market?