Friday, October 18, 2013

The Georgetown-IBM Experiment: The Rise Of The Machine Translations

Though a fairly touchy subject amongst professional translators, machine translation is a field that has always interested us here at The Lingua File. Machine translation had been theorised before the 1950s but today we'll be looking at one of the first forays into the field.

In the 1950s Soviet-American relations were poor, as they were during much of the late 20th century. The Russian language was of particular interest to the Americans, and though professional translations were available, there were concerns that human translations were subject to political bias and interference.

The concept of machine translation had been suggested as early as the 17th century by philosophers René Descartes and Gottfried Wilhelm Leibniz. However, it was a discussion between Warren Weaver and Andrew Booth in 1947 that suggested that natural languages could be translated via the use of a computer.

Between the late 1940s and early 1950s, several experiments in machine (or mechanical) translation were conducted. However, these experiments were limited, used punched card systems, and were hardly groundbreaking.

Leon Dostert, who had served as an interpreter for Dwight D. Eisenhower, the future American president, during the war and had acted as a liaison officer to Charles de Gaulle, was invited to a conference on mechanical translation at MIT in 1952.

Though Dostert arrived sceptical of the potential of machine translation, by the end of the conference he was convinced there was a future in the field. He still doubted how far machine translation could go, and preferred experimental methods to theoretical approaches.

Dostert discussed with several other linguists whether machine translation was a viable aim and, encouraged by their feedback that it was, set out to work on it himself.

Convinced that a small-scale experiment could prove fruitful, Dostert contacted IBM founder Thomas J. Watson, a close friend, to collaborate. The experiment used the IBM 701, which had been released the previous year, and the program was written in machine code, the low-level language that gives instructions directly to the machine's central processing unit (CPU). IBM chose Peter Sheridan to write the code for the experiment.

White-Gravenor Hall, Georgetown University.
Russian was the best choice of source language for the experiment, since German was no longer considered the language of the enemy and information coming from Soviet Russia was limited. To work with it, Dostert believed that another language expert was needed.

He found help and a collaborator in the form of Paul Garvin, a lecturer from the Institute of Languages and Linguistics at Georgetown University in Washington D.C., which was in fact set up by Dostert himself.

Garvin was an expert in Russian, as well as many other languages. He was born in Karlsbad, Czechoslovakia and had emigrated to the US in 1941. He and Dostert decided to test various expressions and phrases from organic chemistry and a few general phrases for their machine translation.

As decided by Dostert, the lexical database was very small, containing only 250 words and six grammatical rules. However, the aim was to show the application of machine translation when it came to morphological and grammatical problems, rather than provide vast quantities of word-for-word translations.

Demonstrated publicly in January 1954, the experiment was such a success that it was widely reported in mainstream newspapers such as the Los Angeles Times, the New York Herald Tribune, and the Washington Herald Tribune, as well as in scientific journals and publications. The story later found its way into local and regional newspapers, and excitement was so high that the authors of the experiment claimed that the problem of machine translation would be solved in a matter of three to five years.

Though that estimate proved to be miles from the truth, the Georgetown-IBM Experiment raised expectations that machines could translate natural languages and made machine translation a potential solution to the wonderfully beautiful and complex problem of translating languages.

Monday, October 14, 2013

Fiesta Nacional de España: The Languages of Spain, Part 2

On Friday we started our look at the languages of Spain as Saturday was the Fiesta Nacional de España, Spain's national day. We covered two of Spain's regional languages, Catalan/Valencian and Galician, and today we'll be continuing our linguistic journey with more of the recognised, regional, and immigrant languages found in modern-day Spain.

Regional Languages

Basque

Whilst only spoken by around 1% of Spain's population, the Basque language, known as Euskara in Basque, is interesting as it's a language isolate, a topic we discussed back in May. Being unrelated to any of its neighbours, Basque is also the only non-Romance language with official status in Spain.

Aranese

Aranese is considered to be a dialect of Gascon, a type of Occitan. However, it does hold co-official language status in the Spanish region of Catalonia along with Catalan. It was given co-official status as recently as 2010 and, as a result, is the newest of Spain's regional co-official languages.

Today there are only around 5,000 speakers of Aranese, compared with an estimated 100,000 to 800,000 speakers of Occitan as a whole. Even so, Aranese is not considered endangered, as the language has seen something of a revival since it began to be taught alongside Spanish in schools in 1984.

Aragonese

Though the name may sound similar to Aranese, the Aragonese language is another distinct Romance language with around 10,000 speakers. It doesn't hold co-official language status but it is recognised as a language native to the region of Aragon.

Astur-Leonese

Astur-Leonese is a group of mutually intelligible languages. Central Asturian is the principal dialect of the Astur-Leonese languages and is spoken natively in central Asturias by around 100,000 people and understood by around 450,000 people. Western Asturian, also known as Leonese, is spoken in western Asturias as well as Castile and León, principally in the province of León.

The language is also present in parts of Portugal, where the Mirandese dialect is used. There are around 15,000 people who speak this form of Astur-Leonese.

Due to its geographical proximity to other languages, Astur-Leonese also has various dialects that are considered transitional languages between Astur-Leonese and other Romance languages. As you approach Galicia, you are more likely to encounter Galician-Asturian, which has been proposed to be either a dialect of Galician, a dialect of Astur-Leonese, or its own distinct language, known as Eonavian. Eonavian is spoken by around 45,000 people.

Sede de Caja Cantabria, Santander.
In Cantabria, the transition language between Astur-Leonese and Castilian Spanish is known as montañés, or Cantabrian. Only 3,000 people are considered to be speakers of Cantabrian.

In the region of Extremadura, there is another transitional language between Astur-Leonese and Spanish, but here it is known as Extremaduran or estremeñu. This version of the language is said to have around 200,000 speakers, but the figure is very difficult to measure as there is no clear consensus on the boundary between castúo, the Spanish spoken in Extremadura, and estremeñu.

Immigrant Languages

Spain is also home to speakers of many immigrant languages, principally owing to migration within Europe. Because of Spain's historic empire, there are many speakers of Latin American varieties of Spanish too. Languages such as Arabic, Romanian, English, French, German, Italian, Bulgarian, Chinese, Portuguese, and Javanese are also spoken by immigrant populations and communities.


Friday, October 11, 2013

Fiesta Nacional de España: The Languages of Spain, Part 1

Today we'll be looking at the languages spoken in Spain as tomorrow is the country's national day, known as the Fiesta Nacional de España in Spain's principal official language, Spanish. Since we covered the Spanish language as one of our first language profiles, we felt today would be best suited looking at the other languages spoken in Spain.

Regional Languages

Since Spain is made up of autonomous communities, certain languages, particularly those native to a certain region, can hold co-official status with the national language, Spanish, also known as Castilian Spanish.

Catalan/Valencian

The Catalan language is principally spoken in the autonomous community of Catalonia, known as Catalunya in Catalan. The Catalan language is a descendant of Vulgar Latin, which was spoken in the regions surrounding the Pyrenees during the time of the Roman Empire.

As a relative of Occitan, which is principally spoken in France, Catalan shares more similarities with Gallo-Romance languages such as French, and with Italian, than it does with its geographical neighbours on the Iberian peninsula, such as Spanish and Portuguese.

The Mediterranean coast seen from Vinaròs, a town in the Valencian Community near its border with Catalonia.

In the Valencian Community, the language is known as Valencian, or to use its endonym, valencià. The name has been subject to much debate amongst those in the Valencian Community and Catalonia and, as it stands, Catalan and Valencian are considered both the same language and different languages, depending on whom you ask. Some linguists even believe that they are immensely similar languages that just so happened to evolve identically side-by-side and become mutually intelligible, though we're a bit sceptical of that last one.

The Catalan/Valencian language has a total of 7.2 million native speakers and the regions where it is spoken are home to some of the highest levels of bilingualism in Europe, not to mention being the largest communities where the main spoken language is not a national official language.

Galician

Galician, which is another language that may or may not be a language, is spoken principally in Galicia. As a close relative of Portuguese or, arguably, a dialect of it, Galician shares many qualities with the Portuguese language.

In the 13th century, the language known as Galician-Portuguese began to diverge into what some linguists say are now two languages, Portuguese and Galician. Other linguists believe that the two are part of a dialect continuum that includes Galician, Portuguese, and rural dialects of both, all of which are mutually intelligible.

In Galicia, around 58% of the population are said to speak Galician as their first language, while over 3 million people are said to speak the language natively worldwide.

On Monday we'll be back with more regional languages of Spain and some of the prominent immigrant languages that have shaped the culture, history, and modern lifestyle of the country.


Friday, October 4, 2013

German Unity Day: The Languages Of Germany, Part 2

On Wednesday, we were looking at the events that led up to the reunification of Germany. Though we didn't get onto the languages of Germany, we did enjoy looking at the rich and interesting landscape of contemporary German history. Today we're straight back into languages as we look at the languages of this fascinating nation.

Of course, German is the principal and official language of Germany with over 95% of the population speaking German as their first language. Statistics for Northern Low Saxon are also included as part of Standard German, though Northern Low Saxon is considered a recognised regional language in Germany.

Recognised Minority Languages

Romani

The Romani languages consist of seven distinct varieties: Balkan Romani, Baltic Romani, Carpathian Romani, Finnish Kalo, Sinte Romani, Vlax Romani, and Welsh Romani. In total, Romani languages have around three million speakers.

Sinte Romani is the variety found in Germany, where it is spoken by around 80,000 people. In total, there are estimated to be around 320,000 speakers spread across Germany, France, Austria, and Italy. Interestingly, Sinte Romani is heavily influenced by the German language and is not mutually intelligible with the other varieties of Romani.

Sorbian

The Sorbian languages are spoken by a group of 50,000 Slavic people known as the Sorbs. The two varieties, known as Upper Sorbian and Lower Sorbian, are spoken in Saxony and Brandenburg respectively. 40,000 of the speakers reside in Saxony and speak Upper Sorbian, whereas the remaining 10,000 are speakers of Lower Sorbian in Brandenburg.

Sand dunes on the island of Sylt, one of the North Frisian Islands in Germany's state of Schleswig-Holstein.

Danish

The Danish language can be heard in the northern state of Schleswig-Holstein, which unsurprisingly borders Denmark. Only 0.1% of the population of Germany are speakers of Danish. However, this amounts to around 50,000 people.

North Frisian

The West Germanic language of North Frisian is spoken by around 10,000 people in Germany, principally in the Schleswig-Holstein region where we encountered Danish. Naturally, North Frisian is related to West Frisian, which is spoken mainly in the Netherlands.

Other Regional Languages

There are several other languages that are native to particular regions in Germany, such as Limburgish, Luxembourgish, Alemannic German, Bavarian, and Low German. Many of these are considered to be dialects of either German or Dutch, or precursors to the modern variant of German spoken in the country today.

Immigrant Languages

Due to immigrant populations, Germany has sizeable populations for whom German is not the main language. This includes speakers of Turkish, Kurdish, Russian, Arabic, Greek, Dutch, Igbo, Italian, Polish, Serbo-Croatian, and Spanish.


Wednesday, October 2, 2013

German Unity Day: The Languages Of Germany, Part 1

In preparation for German Unity Day, the day honouring the unification of West Germany and East Germany in 1990, we're going to be looking briefly at the history prior to this event. On Friday we'll be looking at the languages spoken in the European Union's most populous country.

US tanks face-to-face with Soviet tanks at Checkpoint Charlie
during the Berlin Crisis.
Following the Nazi defeat in WWII, the Western Allies occupied a large portion of western Germany. The US held Bavaria and Hesse in the south, France held a portion of the regions in the southwest, and the British Zone of Occupation was in the northwest. The remaining regions were occupied by Soviet forces, and from 1945 to 1949 Germany would remain divided between the Western Allies and the Soviet forces, exacerbating tensions between the West and the Soviet Union.

From 1949 the American, French, and British zones of occupation were unified as what was known as West Germany or the Federal Republic of Germany. The Soviet Zone of Occupation would become East Germany or the German Democratic Republic.

The German capital, Berlin, was divided between the four nations, with East Berlin held by the Soviets and the western parts of the city split between the US, the UK, and France. Though Berlin's location meant that it was entirely within the Soviet Zone, the city as a whole was never considered part of East Germany.

The Brandenburg Gate, 13 August 1961.
The day the Berlin Wall was erected.
From 1949 to 1990, Germany and Berlin remained divided in this way. The erection of the Berlin Wall, which started in 1961, put great strain on relations between East Germany and West Germany, not to mention between the West and the Soviet Union.

The Berlin Wall spanned 155 kilometres (96 miles) around West Berlin. The wall effectively stopped almost all emigration from East Germany to West Germany until 1989 when, though the physical wall still stood, East Germans were allowed to pass into both West Berlin and West Germany.

The German people began chipping away parts of the wall and eventually, in 1990, the physical wall began to be torn down. Amidst the fall of one of the world's most fierce representations of separation and isolation between the West and the East, the movement for German reunification gained great momentum.

As you know, the reunification of East and West Germany was formalised on 3 October 1990. We'll be back on Friday with our look at the languages of Germany.


Friday, September 27, 2013

September 27: World Tourism Day

As today is World Tourism Day, we felt it only apt to recognise one of the greatest benefits of learning languages: travelling the world.

The United Nations World Tourism Organisation has celebrated this day since 1980. The date marks the adoption of the organisation's statutes ten years earlier, on 27 September 1970. World Tourism Day is all about raising awareness of global tourism and promoting the ways in which tourism improves the world. The benefits of global tourism can be seen in many elements of life, be they social, cultural, political, or economic.

Algarrobo, Chile. Sun, sea, sand, and,
most importantly, Spanish.
Over the years, World Tourism Day has had various themes, including world peace, development, education, job creation, ecological sustainability, tourism for sport, heritage preservation, and the tourism industry itself.

Aside from the aforementioned benefits, we can't ignore the huge benefits world tourism lends to learning languages. If, like us, you speak English as your mother tongue, then you will be more than familiar with being spoken to in English whilst on holiday. If you're in a particularly touristy area, then the locals very likely speak to you in English because of the high level of tourism in the area. Add to that the fact that many others in the world speak English as a lingua franca and that many native English speakers, particularly those from the US and the UK, are embarrassingly monolingual.

The benefits run both ways. Many of those who grow up in tourist areas have better linguistic abilities than those who do not, and many of those who have spent long periods of time abroad tend to have better linguistic abilities than those who have never left home.

So celebrate world tourism! Everybody deserves a good holiday, especially if you can learn languages while you're at it!

Monday, September 23, 2013

How To Be An English Language Tourist? by David Crystal

The Lingua File is delighted to have David Crystal as our guest contributor today as he tackles the question, "how to be an English language tourist?":

Hilary and I asked ourselves this question repeatedly when we were planning the tour that we eventually wrote up as Wordsmiths and Warriors: The English-Language Tourist’s Guide to Britain. Where can you find out about the places that influenced the character and study of the English language in Britain? How do you get there? And what do you find when you get there?

Places are often mentioned in textbooks and historical accounts, but you can get only so much out of such drab statements as 'the Anglo-Saxons arrived at Pegwell Bay in 449 AD', or 'King Alfred defeated the Danes at Edington in 878', or 'Dr Johnson compiled his dictionary in the attic of a house in Gough Square in London'. For textbook writers, that is usually the end of the story. For us, it was the beginning. What was that coastline like? What was the battlefield like? What was the attic like?

Pegwell Bay, Edington, Maldon, Lindisfarne, Lichfield, Stratford ... We went to over 50 places where something important happened. Most of the time, we found that the relevance of the language to the place had been forgotten - if it had ever been realised. But there are a few spots where it is remembered. There is even the occasional monument. Our favourite is the memorial to English dialect-writers in Rochdale, Lancashire. A runner-up is the huge monument to Bible-translator William Tyndale, in North Nibley in Gloucestershire - though 'runner-up' is perhaps not the best way of describing it, as it is on the top of a hill which takes some climbing.
The dialect writers' memorial in Broadfield Park, Rochdale. The building to the left is the
town hall. © Hilary Crystal.
That's a point. If you want to be an English-language tourist, you have to be fit, or reasonably so, as some of the places where important things happened involve a bit of a walk, and sometimes over quite muddy and hilly countryside. So you should take boots too. But the outcome is always worth it. Even though I thought I knew some of the places very well, from my past reading and writing about the language, I was never prepared for what we found when we made the actual visit. The photographs often tell the story better than the words, and are an essential part of the narrative. It confirmed me in my feeling that the English language is not only diverse and fascinating, but unpredictable and exciting as well. For instance...

In Jarrow, up in the north-east of England, where Bede worked and wrote, we were not expecting to encounter a class of mini-monks all dressed in tiny habits. In Alloway, Scotland we were not expecting to see the worship of Scots national poet Robert Burns extend to his being portrayed in a mischievous re-creation of Da Vinci's 'Last Supper'. In Old St Pancras churchyard in London, we were not expecting to find piles of gravestones to be part of the story of pronunciation lexicographer John Walker. In York, we were not expecting to find the aftermath of lead-thieves, when we visited the places where Lindley Murray wrote his grammar.

Murray's summerhouse at The Mount School, York. His writing desk and wheeled invalid
chair are preserved in the school. When we visited, the lead from the roof had disappeared
for the third time, hence the temporary tarpaulin flapping dismally here. © Hilary Crystal.
With locations as far apart as the south-east of Kent and the Scottish lowlands, and from the west of Wales to the East Anglian coast, Hilary and I drove several thousand miles to compile what proved to be a somewhat unorthodox combination of English language history and travelogue. It was a hugely rewarding experience, though, which added a strong sense of place to our existing knowledge of language topics and personalities, and we strongly recommend doing the same sort of thing in your own locality, wherever you live, as a powerful way of making language study come alive. Field trips are not just for historians, geographers, and archaeologists. The English language lurks around every corner, in every country in the world, awaiting your call.

David Crystal is known throughout the world as a writer, editor, lecturer and broadcaster on language. ‘Wordsmiths and Warriors: The English-Language Tourist’s Guide to Britain’ by David and Hilary Crystal is published on 26 September 2013 by Oxford University Press.