Showing posts with label Technology. Show all posts

Monday, September 26, 2016

Retronyms: Renaming the Past

Languages evolve over time. The words we use change, as does the way we use them. Today I'd like to take a look at retronyms, which are created when we rename something from the past because a newer thing has become the default meaning of a word. Here are a few of the most common reasons for, and examples of, retronyms.

Technology

A reel-to-reel. It was originally known as a tape recorder,
until modern tape recorders came about.
Technology is often responsible for the creation of retronyms. Nowadays almost everything is digital, while previous technology was analogue (or analog, without the ue, if you're from the US). Before digital technologies, things like clocks and watches were just that: clocks and watches. Now, with the advent of digital clocks and watches, it is common to say an analogue clock or an analogue watch in order to differentiate.

Before email, we simply had mail. Now, you might hear people refer to the sending of letters, cards, and packages as snail mail (as it is much slower than email).

As automatic systems became increasingly common, use of the term manual became necessary. In the UK, most of the cars we drive are manual, but in the US, cars are often automatic, making the distinction necessary.

Landline phones were just phones before we had mobile phones. With smartphones becoming more and more common, are we going to start calling older models dumbphones?

Media

The way we refer to media changes as we develop newer technologies. For example, all films used to have no sound. Once films had sound, those without became silent films or silent movies.

Now that films are almost always in colour, a lot of older films are said to be black and white. Likewise, what was once just animation is often called traditional animation to differentiate it from computer animation.

Numbers

Anything with a sequel or later numbered version often gets a retronym. For example, the first Star Wars film was originally called just Star Wars. Now it's Star Wars Episode IV: A New Hope, as Star Wars became the title for the entire series.

Other examples include video consoles and computers. The original PlayStation is often referred to as the PlayStation 1 or PS1 to differentiate it from the three subsequent versions released, numbered 2, 3, and 4, obviously.

Classics

Newer versions of things often mean we call the first version the classic version. Remember Coca-Cola's failed attempt at New Coke? Me neither. However, when the company's new version of Coca-Cola failed, they were forced to bring back the old version, which became Coca-Cola Classic or Classic Coke.

Historical Events

I remember studying World War I and World War II in school. However, for the poor souls living through the first of these tragic events, it was just referred to as The Great War; it only became known as the First World War after we made the same mistakes again. Let's pray there's never a third.

Languages

In the UK, we speak British English. Previously, this was known simply as English until it became necessary to differentiate between British, American, and other varieties of English.

These are just a few examples of retronyms. Which are your favourites? Can you think of any possible future examples, such as non-virtual reality, for example? Tell us your thoughts in the comments below.

Friday, August 12, 2016

How Crunchyroll Gets Subtitling Right

Last year I wrote a post about the poor quality of subtitling on Netflix and am sorry to say that the same problems and frustrations continue to bug me. I've watched entire shows riddled with subtitles whose content is just nonsense.

It should read "And I even got that award off those feminists"
Netflix's subtitles for the British sitcom The IT Crowd were so awful that I can only imagine that they may have been automatically generated, not checked over, and subsequently just thrown onto the bottom of the screen.

YouTube also deserves a special mention for poor subtitling quality. However, while a lot of YouTube videos use automatically generated subtitles, the platform is at least kind enough to tell you when they are, and you don't have to pay a subscription for the privilege like you do with Netflix.

However, the purpose of today's post isn't to name and shame bad subtitling (even though I just did), it's to praise Crunchyroll, a streaming service for anime, whose subtitles look like they were lovingly created and carefully implemented into shows.

If you don't watch anime, then you're probably not familiar with the platform. Since all its shows are from Japan with Japanese audio, with the exception of a few dubs, a lot of subtitling goes on, and they do it so well.

It's important to remember that Japanese uses a different writing system to English. One of my complaints with Netflix was that the Japanese text in scenes is often left untranslated. On Crunchyroll, not only are the subtitles placed over the Japanese text, but they also use the same colouring as the original Japanese text, which makes everything clearer and makes the shows so much more enjoyable.

Crunchyroll's subtitling is exemplary of how to do it. Netflix should definitely take a page out of their book when it comes to subtitling all their programmes.

Monday, July 4, 2016

Harvard Sentences: Making Every Phoneme Count

Have you ever gone to a concert and heard the sound engineer say "one two, one two"? If you're wondering why, it's because "one" is dominated by lower frequencies, while "two" is characterised by a sibilant (hissing) sound. This allows them to test low and high frequency sounds and adjust the levels accordingly.

While this works for a concert and the audio levels for music, since the focus is on pitch variances and the overall mixing of the song, when it comes to communication, the simple "one two, one two" won't work. In this case, you should consider using Harvard Sentences.

Harvard Sentences are sentences that make use of common English phonemes at the same frequency that they tend to appear in normal speech, making them representative of the language. It won't surprise you to learn that they were developed at Harvard!

During the Second World War, scholars were working tirelessly on the intelligibility of radio communications. During this time, understanding radio messages was of the utmost importance. From this research, the representative Harvard Sentences were created (as well as the NATO Alphabet).

To test the quality of radio communications, researchers at Harvard developed a list of representative sentences. These sample phrases were later published by the Institute of Electrical and Electronics Engineers (IEEE) in their list of Recommended Practices for Speech Quality Measurements.

The published list includes 720 different Harvard Sentences, arranged into 72 lists of 10 (you can find the lists here). These sentences are still used today to test a variety of different technologies, from walkie-talkies and radios to mobile phones and Voice over IP, like Skype.
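To get a feel for what "representative" means here, the sketch below takes two sentences commonly quoted from the Harvard lists and tallies their letter distribution. It's only a crude proxy (the real lists are balanced at the phoneme level, not the letter level), but even two sentences show familiar English frequency patterns:

```python
from collections import Counter

# Two sentences commonly quoted from the Harvard lists.
sentences = [
    "The birch canoe slid on the smooth planks.",
    "Glue the sheet to the dark blue background.",
]

# Tally every alphabetic character across both sentences.
letters = Counter(
    ch for s in sentences for ch in s.lower() if ch.isalpha()
)
total = sum(letters.values())

# 'e' comes out on top, as it does in English at large.
for ch, n in letters.most_common(5):
    print(f"{ch}: {n / total:.1%}")
```

A proper phoneme-level check would need a pronunciation dictionary, but the idea is the same: count units, compare against the language-wide distribution.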

These sentences have helped develop plenty of communication technologies since the mid-1960s and continue to be used today, despite many of them sounding a bit silly!

Friday, April 22, 2016

Chatbots, AI, and Awaiting the Resurrection of Microsoft's Tay

Almost two years ago we dedicated a post to Eugene Goostman, a chatbot that had passed the Turing Test. At the time, we noted that the way the test was conducted seemed a little off and skewed. Nevertheless, we certainly thought that Eugene Goostman's achievements were worth celebrating.

This eagle was not pleased with Tay's behaviour.
Nearly a month ago on 23 March, Microsoft had a day from hell due to the complete debacle involving Tay, their new AI chatbot that took to Twitter and survived only 16 hours. In less than a day, users of the platform managed to corrupt the once innocent machine to a point where her tweets were so inflammatory that Microsoft had to close the AI's Twitter account.

Obviously, Tay's "achievements" are a bit harder to celebrate than Eugene Goostman's. That said, while Microsoft's AI was spouting all sorts of racist and sexist messages, it was Twitter users who taught and raised it.

However, not all Twitter users are responsible for Tay's behaviour. In fact, Microsoft has run Chinese and Japanese chatbots for a couple of years now, and neither has caused any major problems. Maybe it's just that English speakers are worse when it comes to internet behaviour. Who knows?

Personally, I think that the internet should be given a second chance to try and raise its AI baby. However, I think Microsoft probably needs to ground Tay first and teach her a lesson. How did you react to Tay? Did you get over it? Would you like to see more experiments involving AI and chatbots on social media? Tell us your opinions in the comments below!


Friday, March 4, 2016

Emoticons and Emoji: How Pictures Are Worth A Thousand Words

Love them or loathe them, emoticons are becoming more and more commonplace in language, and not just in casual conversations between friends on Facebook or WhatsApp. Due to the immense popularity of mobile phones, texting, the internet, and messaging as forms of communication, emoticons, and now emoji, are almost universally used.

While some of the purists among us may believe that most languages are diverse and varied enough not to need them, emoticons and emojis are everywhere. One use that particularly struck me was when I saw that the BBC had started using them more frequently in their posts on Facebook.

Today I'd like to talk about why emoticons and emoji are so useful in language, and the role they play in communication.

Texts and messages are short and instant forms of communication. Originally, all text messages were written using a traditional phone keypad instead of the keyboard featured on modern smartphones. Since you had to type letters by pressing a number key multiple times, texting could take quite a while.

When it comes to language, if there's a way to make something easier, we tend to do it. Not only were we trying to save time, but we were also trying to save money. The last thing you'd want to do is have your text (which had a limited number of characters) become two texts, costing you double.

SMS language was created as people tried to use fewer characters without any loss in meaning, which is how letters and numbers like "b", "c", "r", "u", "y", "2", and "4", began to be used to refer to the words "be", "see", "are", "you", "why", "to", and "for", respectively.
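As a toy illustration of that character saving, here's a minimal sketch of an SMS expander. The abbreviation table is just the handful of examples above (not a real SMS dictionary), and the function name is my own:

```python
# Mini-dictionary built from the examples above.
SMS_TO_ENGLISH = {
    "b": "be", "c": "see", "r": "are",
    "u": "you", "y": "why", "2": "to", "4": "for",
}

def expand_sms(message: str) -> str:
    """Expand single-token SMS abbreviations, leaving other words alone."""
    return " ".join(SMS_TO_ENGLISH.get(w, w) for w in message.split())

print(expand_sms("r u free 2 talk"))  # are you free to talk
```

Note how the abbreviated form uses 14 characters where the expanded form needs 19: small savings, but they added up when a text was capped at 160 characters.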

The Oxford Dictionary's "Word of the Year" in 2015.
The tone of texts can also be very ambiguous, so you can see how punctuation resembling a face could help set the mood of a message without having to write several long texts, which would take more time and money.

From characters looking like faces, we got emoji, a Japanese term that combines e, meaning "picture", and moji, meaning "character". Once emojis were included on Apple's iPhone, their popularity snowballed. Soon after, they were added to Android phones, and have now become a massive cultural phenomenon.

In fact, Oxford Dictionaries made an emoji its Word of the Year. The "Face with Tears of Joy" got the award in 2015, and is the most popular emoji.

What do you think of emoji and emoticons? Are they useful for communication? Or are they abominations on our once-beautiful languages? Tell us your thoughts in the comments below.

Wednesday, August 12, 2015

The Problems with Dubbing and Subtitling on Netflix

If you like TV shows and movies, Netflix is pretty great. The streaming service is one of the quickest ways to lose hours upon hours of your free time to popular media, and I'm cool with that. Netflix's algorithms always seem to suggest shows I end up liking, but there is one thing I don't like: its dubbing and subtitling.

Origami, another of Japan's fine artistic exports.
In the past, we've discussed dubbing versus subtitling at length (I tend to prefer subtitling over dubbing where possible). However, when watching anime (Japanese animation) I tend to take it on a series-by-series basis.

If the subtitles are good, I will happily watch an entire series with the original Japanese dialogue. However, when anime subtitles are bad, they are really bad! The internet is full of great examples of this.

Before I get into this rant, I need to clarify a couple of terms. For the purposes of this post, I'm taking "closed captioning" (CC) to refer to user-activated text that is generally used by those who are hard of hearing, and "subtitling" to refer to a translation of foreign language dialogue that is not likely to be understood by the viewer. A quick way to tell whether you're watching CC or subtitling is to look for descriptions of sounds that wouldn't be considered dialogue, such as "[Phone rings]".
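That "quick way to tell" can even be written down as a rough heuristic. This is only a sketch: it assumes non-speech cues always appear in square brackets, which real captioning conventions don't guarantee.

```python
import re

# Bracketed, non-speech cues like "[Phone rings]" are a CC convention.
SOUND_CUE = re.compile(r"\[[^\]]+\]")

def looks_like_closed_captions(lines):
    """Guess CC vs subtitles by the presence of bracketed sound cues."""
    return any(SOUND_CUE.search(line) for line in lines)

print(looks_like_closed_captions(["[Phone rings]", "Hello?"]))  # True
print(looks_like_closed_captions(["One week later"]))           # False
```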

Aside from the bad grammar, unnatural syntax, and odd vocabulary choices present in bad anime subtitles, Netflix has a great way of making subtitles completely redundant. Beyond their low linguistic quality, I firmly believe there's also a technical issue at play here.

When I watch anime series on Netflix, I usually have two options for audio and two options for subtitles. The audio is available in either Japanese or English, while the subtitles are only available in English and can be "off" or "on". This is what causes problems.

The subtitles, just like the dubbing, are a translation of the original dialogue in Japanese. However, they are clearly not done simultaneously, nor do they appear to have any relation to each other.

On the one hand, the dubbing tends to alter the original dialogue to fit the timing of the characters' speech better, as well as to make the lines more natural and easier for voice actors to deliver.

On the other hand, the subtitles tend to more strictly follow the meaning and structure of the dialogue. The massive difference between the dubbing and subtitling means that I find it almost impossible to have both dubbing and subtitling active at the same time.

Since you can either have all of the subtitles or none of them, on-screen Japanese text, such as captions explaining that time has passed or where a scene takes place, is left untranslated. This is when I really get annoyed. I have to pause, turn the subtitles on, and rewind to the start of the scene, just for the subtitles to load and tell me something like "One week later".

It should be noted that Netflix has also received criticism from deaf communities for the low quality of its CC. As much as I love the fact that it allows me to binge on watching massive robots and ninjas fight each other, it really needs to work harder on its foreign materials.

What do you think of Netflix's subtitling? Love it or loathe it? Are there better streaming services for subtitling? Or worse? Tell us your thoughts in the comments below.

Wednesday, July 15, 2015

Googlewhacking and Collocations on the Web

While the internet is definitely commonplace nowadays for the majority of the developed world, it's often amusing to think back to the earlier days of the internet. When I was in primary school we had to take a school trip to the local library to see the internet. This endeavour involved walking with your "buddy" and holding hands.

The technology in this data suite is far more advanced than
the clunky beige PC that first showed me the internet.
When we arrived, a solitary PC with a dial-up connection was used to showcase a number of the amazing features of the world wide web.

We were shown a very early example of bbc.co.uk (from 1997, to be precise) and the local weather forecast, which disappointingly but expectedly told us that north-eastern England would be rainy throughout the upcoming week.

Needless to say, we were not very impressed with two features that we could easily get from our TVs instead. The search engine, however, really sparked our imaginations.

We were told that if we typed something, the computer would show us what we were looking for. Fast forward a few years and IT lessons in secondary school were an interesting affair.

The school had just had broadband installed and every single student in the class had their own terminal with a "high-speed" internet connection, which helped us to spend the whole lesson doing anything but the work we were supposed to be doing.

One unproductive time-wasting technique we enjoyed was Googlewhacking. For those too young to remember a time before "Google" was a verb, it was once possible to search using only two English words and receive the message "no results found" from Google's search engine. Finding no results with a two-word search query was the goal of Googlewhacking.

This phenomenon (which seems impossible today) was probably due to there being fewer webpages and the fact that Google was yet to have crawled and indexed the web to the extent it has today. With that said, I do believe that it can tell us about how we use language, which I personally find very interesting.

If you're not familiar with collocations, they're words that naturally go together more frequently than they do with others. I often use Google as a quick and easy way to check how frequently certain words are used together (there are also more advanced search tools for this). By putting quotation marks around a phrase when you search, you can gauge the relative popularity of one expression over another simply by comparing result counts.
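The same result-counting idea can be mimicked offline: count how often two words appear next to each other in a corpus. A minimal sketch (the tiny "corpus" here is invented purely for illustration):

```python
from collections import Counter

# A toy corpus; a real collocation check would use millions of words.
corpus = (
    "strong tea and strong coffee but powerful computers "
    "make strong arguments and powerful engines"
).split()

# Count adjacent word pairs (bigrams).
bigrams = Counter(zip(corpus, corpus[1:]))

print(bigrams[("strong", "tea")])    # 1
print(bigrams[("powerful", "tea")])  # 0
```

This is essentially what corpus-linguistics tools do at scale: "strong tea" turns out to be far more frequent than "powerful tea", even though both adjectives mean much the same thing.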

Googlewhacking (though we didn't know at the time) gave us the opposite of collocations, words that seemingly never go together. As you were not permitted to use quotation marks, Google's results indicated that those particular combinations of words could not be found alongside one another, or even in the same sentence, paragraph, or webpage. Saying the results aloud would quickly tell your brain how rare these combinations were.

The latest examples of Googlewhacks (from 2008) on the now defunct googlewhack.com included "ambidextrous scallywags", "illuminatus ombudsman", "squirreling dervishes", and "assonant octosyllable". Any native speaker will note that these are rarely used words and even rarer combinations of them.

Nowadays Googlewhacking is pretty much impossible, as Google tries to suggest what you were trying to say and you rarely get a page saying there are no results (especially with a two-word phrase). However, Amazon.com has a similar concept, the "statistically improbable phrase", which uses data from indexed books to find out which words are rarely put together in the books they sell. You could always try Googlewhacking on Amazon and hope that your search yields only one result, though I doubt it'd ever be as fun as Googlewhacking.

Did you ever Googlewhack in the past? What are some of the weirdest examples you can remember? Tell us about your experiences in the comments below.

Wednesday, July 1, 2015

Annoying Internet Terms That Shouldn't Be in Spoken Language

I love internet culture and arguably spend most of my time on the internet. It is a truly wonderful thing: at times it's a vibrant, beautiful ecosystem of ideas being exchanged, while at others it's like a dank puddle of murky water. Either way, I love it.

What I don't love about the internet is how some of its language encroaches into spoken language. I'm happy for the language to exist online and consider it almost as its own register. However, when the internet's weird lingo starts entering my ears and not my eyes, that's when I get annoyed. Here are a few of my biggest bugbears (or pet peeves to Americans) when it comes to online language that make me come close to losing my cool.

NASA astronaut Michael Gernhardt embodying "YOLO" in 1995
when dot-coms were just becoming household names.
Because

The term "because" is a bit of a funny one since I have no objection to the common usage of "because". However, the internet has given rise to the construction of "because" plus a noun. For example, "I can talk this way because language". I reckon it's a quick way to make most language purists' blood boil!

.com

Saying something is something.com is just downright stupid. My fury over this stems fully from the fact that saying "dot com" at the end of a word is not only already horrendously dated by about 20 years, it's also the kind of thing that uncool dads say when trying to be cool.

Fail

I wish people would stop using the verb "fail" when they are actually referring to a "failure", which is a noun. I also get fairly annoyed at the overuse of "epic" to describe said "fails". It's now used so often it's been demoted to the status of "moderate". This term is also often combined with the next one.

Hashtag

I like Twitter and understand why we have hashtags. In fact, I'm very happy to use them. Placing the "number sign" (#) before a word can help other users find content related to the word they've marked or to indicate the content is part of a particular conversation.

Using the term as a prefix irritates me beyond belief. Unless you're explaining a particular hashtag, saying hashtag is completely redundant.

LOL

LOL (an acronym for "laugh out loud") has been making the rounds online since people became too lazy to type out the onomatopoeia for laughter or explain that they found something humorous. As funny as it is when parents think "LOL" stands for "lots of love", there's nothing I find funny about using LOL in speech.

I find it annoying enough when people say "that's so funny" without actually laughing. Imagine how enraged I get when someone says "lol" in speech despite it being abundantly clear that they're not laughing out loud!

YOLO

I definitely agree that people should live life to the fullest. However, as a lover of Romance languages and Latin, I wish carpe diem was used instead of this acronym for "you only live once".

You can live your life with "YOLO" as a motto. Just please don't say it to me. Leave it on the internet, where it belongs. Thanks!

What internet terms do you wish people wouldn't vocalise? Tell us in the comments below.

Friday, October 17, 2014

Hatsune Miku: Virtual Vocals and Synthetic Singing

During a recent Facebook scrolling session, an odd link popped up on my news feed. It was this video of a musical performance on the Late Show with David Letterman.


You don't need to be the most observant person in the world to realise that the performer, Hatsune Miku, or 初音ミク, as her name is written in Japanese, is not a real person. Hatsune Miku is not the first virtual performer; other popular virtual acts include Alvin and the Chipmunks, The Archies, and Gorillaz. However, Hatsune Miku can do something that other acts can't do: sing.

You may think that her high-pitched singing is not as good as the sped-up singing of Alvin, Simon, and Theodore, and you may be right. However, the Chipmunks, much like other virtual acts, had their music and their vocals pre-recorded. Hatsune Miku's vocals are synthesised using Yamaha's VOCALOID2 and VOCALOID3 vocal synthesisers.

If you're familiar with Japanese, you may recognise the components of Hatsune Miku's name. In fact, the name translates as "the first sound from the future", with Hatsu (初) meaning "first", Ne (音) meaning "sound", and Miku (ミク) meaning "future".

Sapporo, Japan, the hometown of Hatsune Miku.
While 16-year-old Hatsune Miku could be said to be from Sapporo, the technology that allows her to sing was conceived in Spain as part of a research project at Pompeu Fabra University in Barcelona.

Hatsune Miku's voice isn't purely synthesised; it is in fact generated from phonemes prerecorded by Japanese voice actress Saki Fujita. Initially, only Japanese phonemes were recorded; English phonemes, also taken from Saki Fujita's recordings, were added for a later release. This allows her to sing in both languages, albeit with a Japanese accent when she sings in English.

The process that allows for the manipulation of the phonemes into song is known as concatenative synthesis. Using this process, sound samples (known as units) can be manipulated. This allows the user to modify a range of qualities, including the unit's length, pitch, and timbre.
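A crude flavour of that manipulation can be sketched in a few lines. This is not how VOCALOID works internally; it's just a naive resampling of a "unit" (here, a synthetic sine tone standing in for a recorded phoneme) to change its pitch:

```python
import numpy as np

def resample_unit(unit, pitch_factor):
    """Naively raise or lower a unit's pitch by resampling.

    pitch_factor > 1 raises pitch (and shortens the unit);
    pitch_factor < 1 lowers it. Real vocal synthesisers keep
    length and pitch independent; this sketch does not.
    """
    n_out = int(len(unit) / pitch_factor)
    # Read the old samples at stretched/compressed positions.
    old_idx = np.linspace(0, len(unit) - 1, n_out)
    return np.interp(old_idx, np.arange(len(unit)), unit)

# A 440 Hz sine "unit" sampled at 8 kHz, shifted up an octave.
sr = 8000
t = np.arange(sr) / sr
unit = np.sin(2 * np.pi * 440 * t)
shifted = resample_unit(unit, 2.0)
```

Concatenative synthesis then strings many such units together, smoothing the joins; the hard part is doing that without the tape-sped-up "chipmunk" artefact this naive approach produces.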

Since anyone who owns the software can synthesise speech and vocals, Hatsune Miku is "technically" the performer of thousands of songs. She's not alone, though. There are also other virtual performers available with different language combinations such as Spanish and Chinese. Other languages can also be approximated using preexisting phonemes, with differing levels of success.

Wednesday, July 23, 2014

How Recent is the Expression "Where Are You?"

We heard an interesting fact the other day. It suggested that other than in reference to one's immediate vicinity, nobody would have ever used the expression "Where are you?" prior to the invention of mobile communications such as radio transmission, mobile phones, or the internet.

The logic behind this is that if you were to write somebody a letter you would require an address. If you had somebody's address, would you need to ask them where they were? I think not. Before mobile telephones, you would usually call a fixed line, meaning that you also already knew where somebody was.

Did this device really spawn the phrase "Where are you?"
This supposed fact is probably not true, as communications prior to mobile phones did not guarantee that the sender of a message knew where the recipient was. Imagine sending a message to a soldier on the front lines: you would probably ask where they were right after asking whether they were alive and safe.

Another similar and more probable suggestion is that before answering machines were invented, nobody had uttered "Sorry, I'm not here right now".

While we do not believe that throughout all of human existence these expressions were never uttered, we do believe that their usage was significantly lower prior to the advent of mobile communication.

We did a quick search for the earliest recorded instance of "Where are you?" and found an example in the biblical book of Genesis, albeit a translation. I guess you'd be hard-pressed to find an earlier example, at least if you believe the Old Testament.

Can anyone actually prove this "fact" for us? Share your thoughts, proofs, or just ideas, in the comments below. 

Wednesday, June 18, 2014

How Eugene Goostman Passed the Turing Test

As the headlines have been dominated by the World Cup recently, one of the news stories that fell by the wayside last week was the story of chatbot Eugene Goostman passing the Turing Test. If you aren't familiar with it, the Turing Test is a method for testing artificial intelligence posited by Alan Turing.

Alan Turing was a British mathematician who helped crack the Enigma code during World War II, in addition to being considered one of the most important pioneers of artificial intelligence. To simplify his test: if a human participant cannot tell whether simulated behaviour is that of an AI or a human, the AI is said to have passed.

While there are various ways to adjudicate the Turing Test, it is generally agreed that there are two main rules to follow when conducting it:

1. The participant has five minutes to interact with the AI.

2. At least 30% of respondents must be fooled by the AI into thinking it is a real human.
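Reduced to code, the pass criterion is just a threshold test. A minimal sketch (the function name is my own, not something from Turing or the test's organisers):

```python
def passes_turing_test(judges_fooled, total_judges, threshold=0.30):
    """Rule 2: at least 30% of judges must mistake the AI for a human."""
    return judges_fooled / total_judges >= threshold

# Eugene Goostman reportedly fooled a third of the judges.
print(passes_turing_test(10, 30))  # True (33% >= 30%)
```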

This news particularly interests us because of how the test is conducted. Language is used as the primary indicator for participants as to whether or not they are speaking to an AI. In the most recent test, participants had a conversation with Eugene Goostman via typed chat. However, they were told that Eugene Goostman was a 13-year-old Ukrainian boy. As we said in our earlier post on the Turing Test, a human who does not speak English as their first language may be judged to be an AI by respondents.

The reverse can also be true. If participants are told that they are speaking to a 13-year-old Ukrainian boy, they are likely to be more lenient when judging the responses of the AI, assuming that misunderstandings, odd answers, and incorrect grammar are all failings of the boy's age and mother tongue rather than of the AI.

While we have no doubt that Eugene Goostman is an impressive piece of programming, we can't help but think that these results are exaggerated. Had people thought that they were talking to a native English speaker, would he have passed the Turing Test?

What do you think? Genuine AI or skewed results? Tell us your thoughts in the comments below.

Friday, May 30, 2014

Why We Both Love and Hate Google's Spell Up

As most of my web browsing starts from searching for something on the internet, setting Google as my homepage seemed like an ingenious idea. However, with a lot of stuff on the internet being little more than an aid to procrastination, google.com has become a thorn in the side of my productivity.

This was particularly true yesterday, when I discovered Google Chrome's new experiment, Spell Up. Only after playing for thirty minutes did I see the promotional video explaining its purpose.


While it doesn't explicitly say that the game is for those learning English as a foreign language, it's quite clear that its benefits will be greater for somebody who does not speak English as their first language. Let's start with the reasons why we love the game:

The Good

More Language Video Games

I'm really fond of video games, in all shapes and forms, and while racing fast cars, killing terrorists, or embarking on a mystery quest are all my cup of tea, there are very few language games that I have actually enjoyed and wanted to continue playing.

A Focus on Spoken Language

The game's focus on speaking is an aspect that is often overlooked when learning to speak a language from a book, podcasts, CDs, or, if you can remember that far back, cassettes. Many language learning programs ignore this or add it as an afterthought in a way that means the learner never has their spoken language skills evaluated, and instead just speaks aloud to themselves in public like a lunatic.

A Sense of Achievement

Gamifying language learning is a fantastic way to encourage the continuation of your "quest" for a new tongue. With achievements, levels, power-ups, and bonuses, the player must actually do something. With a book, you can just keep reading whether you understand the concept or not.

The Bad

Understanding Native English Speakers

As a Geordie (a native of the north-eastern English city of Newcastle-upon-Tyne), I certainly do not have the clearest and most easily understood accent when speaking my mother tongue, and I can accept that. However, I do not accept the phonemes I can pronounce well being misunderstood by a machine, forcing me to alter the way I speak just to play the game. This is even more annoying when you're spelling the last letter of "discombobulate", the machine thinks you said "a" when you said "e", and you have to go back to the start of the level.

Bugs

It's very unnatural to spell words as slowly as the program requires, and when it finally catches up, it throws up a guess at the single letter it thinks the four letters it couldn't hear were meant to be.

Aside from the obvious linguistic issues I have with the program, it is still a video game at the end of the day, and it faces the same scrutiny I would apply to any other game. It doesn't run well! The frame rate is poor and jumpy.

Put simply, Spell Up is a good idea, poorly executed.

Have you played Spell Up? If so, tell us about your experience with it in the comments below and whether or not you're a native English speaker. We can't wait to hear from you!

Friday, November 22, 2013

The ALPAC Report: The Failings of Machine Translation

One of the organisations interested in
the potential of machine translation.
Not long ago, we had a look at the birth of machine translation (MT) with the Georgetown-IBM experiment. Following the experiment, optimism was at an all-time high for MT, and the problem was expected to be solved promptly. Today we're looking at the next important milestone in early MT, the ALPAC Report. Unfortunately, our tale includes a lot of government bodies and research groups, so expect a lot of acronyms.

In the US, the Department of Defense, the National Science Foundation, and the Central Intelligence Agency (CIA) were very interested in the prospect of automatically processing languages and MT. In the case of the Department of Defense and the CIA, this was mainly because the US was extremely curious about, and sceptical of, the Russians and wanted to know what they were up to. By 1964 they had promoted and funded work in the field for almost a decade, and together the three organisations founded the Joint Automatic Language Processing Group (JALPG).

In 1964, JALPG set up the Automatic Language Processing Advisory Committee (ALPAC) in order to assess the progress of research. ALPAC was, in essence, founded by the US Government to ensure that funds were being spent wisely.

John R. Pierce, head of ALPAC.
The group was headed by chairman John R. Pierce, an employee of Bell Labs, who was assisted by various MT researchers, linguists, a psychologist, and an artificial intelligence researcher. Together they produced the ALPAC report, which was published in November 1966.

Titled "Languages and machines: computers in translation and linguistics", the report would appear to focus not only on MT but also on computational linguistics as a whole. However, it viewed MT very narrowly: from the perspective of its applications for the US government and military, and exclusively in relation to the Russian language.

The report showed that since most scientific publications were in English, it would actually be quicker, and therefore more cost-effective, to learn and read Russian than to pay for translations into English. It also noted that there was an abundance of translators and that their supply outweighed the demand, meaning there was even less call for research into MT to replace human translators.

While the report evaluated the translation industry in general, it also covered research into MT. It condemned the work done at Georgetown, as there was little evidence of quality translations coming from the very place that had spawned the idea that the MT problem was close to being solved.

In fact, Georgetown's MT project had produced no translations of scientific texts, nor had it any immediate plans to do so. The report had defined MT as a process requiring no human interaction, and the fact that Georgetown's work still required human post-editing led ALPAC to deem it a failure.

One of the criticisms of the MT's unedited output was that, though it could be deciphered by a human reader, it was sometimes inaccurate or completely wrong. The report also criticised Georgetown's work when compared with the 1954 experiment, stating that the output from ten years earlier was not only better, but that the programme had made little progress since.

Though the input for the original experiment was extremely limited and the systems tested by ALPAC were experimental, this did not lead to ALPAC cutting Georgetown any slack. ALPAC did, however, state that MT was not, as the Georgetown-IBM experiment had certainly suggested, a problem with a foreseeable resolution.

Though ALPAC hardly praised MT, it did appear to approve of the idea of "machine-aided translation", which effectively refers to the translation tools that are fairly commonplace in today's translation industry. The report assessed that MT had advanced the field of linguistics more than it had the field of computing, and that MT did not deserve further funding until certain criteria had been met.

In conclusion, ALPAC suggested the following:
  1. practical methods for evaluation of translations; 
  2. means for speeding up the human translation process;
  3. evaluation of quality and cost of various sources of translations;
  4. investigation of the utilization of translations, to guard against production of translations that are never read;
  5. study of delays in the over-all translation process, and means for eliminating them, both in journals and in individual items;
  6. evaluation of the relative speed and cost of various sorts of machine-aided translation;
  7. adaptation of existing mechanized editing and production processes in translation;
  8. the over-all translation process; and
  9. production of adequate reference works for the translator, including the adaptation of glossaries that now exist primarily for automatic dictionary look-up in machine translation
It would be fair to say that given the aim of the report, ALPAC achieved its objective of assessing MT. The downside to the report is that research into MT was effectively suspended for two decades, since all significant government funding was cut.

Perhaps we are a little bitter that the ALPAC report was so damning of the work on MT merely because we can still see failings in modern-day MT, such as our "favourite", Google Translate. However, it would be fascinating to see what MT could have achieved had it been funded with as much fervour during the 60s, 70s, and 80s as it was in the mid-to-late 50s.

Do you feel we would be better off had MT research continued? Or do you think "machine-aided translation" was the correct avenue to pursue? Tell us your thoughts in the comments below. If you wish to read the 1966 ALPAC report, a full copy can be found here.

Saturday, June 1, 2013

Languages In The News: May 2013

Today we've decided to take a look at some of the biggest language stories featured in the news from the past month. We try to share all language news on our Facebook page, but we'll look back at the top stories at the end of each month just in case you missed them. Here's what has been going on in the world of languages throughout the month of May.

The Guardian and The Economist both featured the conlang Dothraki from Game of Thrones in posts at the end of last month. They were published on April 30th, but since this is our first "Languages In The News" post, we'll include them anyway.

The New York Times featured an overly-favourable article on translation apps. Despite calling the piece "The Utility and Drawbacks of Translation Apps", we found there were far too few drawbacks.

Arco della Pace in Milan, the city where writer Dan Brown
had 11 translators working underground for 2 months.
Dan Brown's new novel was covered by a few sources after it was revealed that the translators working on the piece were subjected to fairly "hellish" conditions whilst translating, in order not to reveal any secrets or spoilers from the book. One such article was found in The Telegraph.

The Los Angeles Times informed us that search engine Bing's translation services will now include the Star Trek conlang Klingon as part of a marketing campaign for the franchise's latest film, Star Trek Into Darkness. Trekkies can rejoice at the ability to translate text written in over 40 languages into Klingon, as well as convert it back into a "traditional" language.

In mid-May, we found out from CNET that Google Translate now produces a billion translations per day while helping about 200 million users. The translation service works in 71 languages, but we're still sceptical of the quality of the machine-based translations it provides.

In Franglais, these are called talkie-walkies!
The relationship between French and English was heavily featured in the news this month. The Guardian informed us that the French government has decided to relax a long-time ban on the use of foreign languages in its universities. Since 1994, a French law has banned all teaching in a foreign language except, of course, in the case of language courses. The news inspired the BBC to produce some fun articles on Franglais, including a piece on their readers' favourite Franglais terms and phrases, as well as an amusing post called "How to speak Franglais" that is completely written in Franglais.

Finally, we have the results of two language-related research studies. The first study, done by researchers in Sweden and the US, discovered that foetuses actually listen to and remember their mothers' speech in the final weeks of pregnancy. They can also distinguish foreign languages soon after birth, as discussed in this BBC article. A second study in Britain revealed that the long-debated idea of a Eurasiatic superfamily of languages may actually be a reality. The group of linguists was able to narrow down a list of 23 words found in at least four of the languages thought to belong to the superfamily, including "man", "mother", "worm" and "to spit"!

Was there another language article we missed that really piqued your interest this past month? Let us know below in the comments.

Friday, February 1, 2013

How Google Became a Verb

Many years ago, at least in terms of the internet, a couple of college students at Stanford (which is one of our top language universities in the U.S.) made something that helped change the way people both browse and speak. Their product, or perhaps service, was Google.

Without Google, finding things on the internet
is like finding a needle in a haystack.
As you surely know, Google is a search engine. Its primary function is to direct web users to the appropriate web page based on their search criteria. From its birth as a company all those years ago in 1998, Google has gone from strength to strength. From its humble origins as a white page with a text box, which hasn't changed much over the years, the corporation now includes cloud computing, email and even our much-loathed Google Translate.

The name for Google came from the word googol, which is the number 10^100, written as a one followed by one hundred zeros. Such an enormous number is obviously meant to suggest the prowess of the search engine's capabilities.

The word Google as the name for the company has existed since its inception, but as a verb the first known occurrence came in an email from co-founder Larry Page on 8 July 1998 in which he said "have fun and keep Googling!". Despite the company trying desperately to stop people using the word in this manner, they have only themselves to blame.

It's unlikely "google" was
featured in this dictionary.
The American Dialect Society chose it as their most useful word of 2002, and it was even mentioned in an episode of Buffy the Vampire Slayer way back when.

In popular media it's used more and more frequently, and Google have taken steps to avoid its overuse since they fear it may become a generic trademark. They encourage people to use the verb to google (note the lowercase "g") only when referring specifically to Google's own search engine.

In many dictionaries, Google refers to the company or product, and google refers to the verb meaning "to search for on the internet", whether you use Google or not. So you can google on Google, but you can google using other search engines too!

With the world getting better-connected every day, we can only expect more words like this to find their way into the lexicon. We've heard people using Facebook as a verb too.

Have you heard any good internet neologisms? Tell us about them in the comments below!

Saturday, October 6, 2012

New Technology: New Vocabulary

You couldn't have asked someone in the '70s if they had the internet, nor if they had a PC... (personal computer, not police constable). With new technology comes new vocabulary. Here are some of our favourite words that didn't exist before the internet:

Internet - Where it all began. A word that is mentioned every day, yet would not have been found in conversation twenty years ago.

There are 10 types of people in this world.
Those who understand binary and those who don't.

Retweet, sexting, and cyberbullying - We like the words, not the activities... at least not cyberbullying.

Acronyms such as LOL, OMG, and BTW - Amongst other travesties against the English language...

Unfriend - You can remove someone from your digital life with a single click, but it isn't so easy in real life!

Dot-com - dot and possibly even com would have existed, but not in this context.

Blog - The word came from "web log", so you can thank laziness for it becoming blog... fun to say though. Blog, blog, blog!

Google - We're referring to the verb; the number existed long before that. They had a team counting to it... but many lost their lives in its pursuit.

w00t - This modern expression of excitement has even made it into at least one dictionary! The Concise Oxford English Dictionary added it in 2011... although they did replace the 0's with o's.

facebook - Again, as a verb; the words "face" and "book" clearly existed beforehand. Facebook the social networking site was launched in 2004, and has since stolen countless hours of our lives as we look at amusing images of cats.


How could anyone resist that adorable face
and impressive display of bipedalism? 

Thursday, September 27, 2012

All Your Base: The Importance Of Good Localisation


Localisation, in the gaming industry, is the cultural adaptation and translation of products for sale and use in other markets. It can include translation for use in different countries where other languages are spoken, as well as in areas where the same language is spoken in a different dialect with different idioms (think US vs UK English).

One of the most prominent examples of a bad localisation process comes from the classic game Zero Wing. The game was decent, but it was made famous when rediscovered in 1999 and the English version of the intro spread like wildfire across cyberspace.

The terrible captioning that started it all.

Thus the "All Your Base" meme was born. Although amusing, the main problem was that when the game was made, the importance of high-quality translation and localisation was being overlooked. This resulted in some of the most horrific English you've ever seen.

Nowadays, the gaming industry is big money and games are created worldwide. However, without localisation it's difficult (perhaps impossible) to sell products globally.

Perhaps some of the best games... best localisation, perhaps not.

Once a game is localised, there's still another problem to tackle. Can you translate its cultural setting? Many games now feature detailed narratives. You can translate all the text properly, but can you really localise sentiments felt in one part of the world that may not be felt in another part? The Modern Warfare series probably isn't very popular in the Middle East. You can spend hours playing a Japanese game and still never really understand why any of the characters did what they did.

Although Mario still sells well in Italy...

Tuesday, September 25, 2012

Technology Understanding Languages: Don't Be Siri!

So you've got a new smartphone and you'd rather tell it what to do whilst it's in your hand than touch the screen. You probably decide to use its speech recognition software. Then, you tell it to make an imaginary appointment in your calendar... and it does!

"I'm sorry, I can't do that Dave."

How does it understand language? Well... it doesn't. It simulates it pretty well, that's all. It deciphers which phonemes have been said and puts them together in the most probable order.

If you speak a language, understanding words is quite simple. Your brain should be many times more powerful than the average smartphone. IBM simulated an apparent 4.5% of the human brain with a supercomputer that required 147,456 processors. That's roughly the equivalent of your brain after a night of vodka, and it's still pretty impressive.

It's very difficult to separate individual sounds with just one input. Because of that, some horrific mind-numbing mathematics is involved. To put it simply, the software hears audio and then guesses at the most probable phoneme you may have said. It does this by ruling out impossible combinations or very rare occurrences.

First, the hardware on your smartphone converts the analogue information into digital information. Computers like 1s and 0s.

The software cleans up the digital data, then removes background noise and frequencies beyond our range of hearing. The information is divided into very small sections (hundredths of a second) and sampled by the software in order to process the phonemes.
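As a rough illustration, that slicing step can be sketched in a few lines of Python. The sample rate, frame length, and overlap here are invented for the example, not taken from any particular phone's software:

```python
# A toy sketch of slicing digitised audio into short, overlapping frames
# for analysis. The numbers below are illustrative assumptions only.

SAMPLE_RATE = 16000  # samples per second of digital audio
FRAME_MS = 25        # each frame covers a few hundredths of a second
HOP_MS = 10          # frames overlap so no sound falls between them

def frame_audio(samples):
    """Split a list of digital samples into short frames."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000  # 400 samples per frame
    hop_len = SAMPLE_RATE * HOP_MS // 1000      # start a new frame every 160
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

one_second = [0] * SAMPLE_RATE  # a second of silence stands in for real audio
frames = frame_audio(one_second)
print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Each of those little frames is what the recogniser then examines when guessing which phoneme was said.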

Can you decipher this? Didn't think so...



The phonemes are processed by means of probability. The most likely phonemes are considered first, but if they're followed by unlikely phonemes or expressions, they are disregarded and replaced with the more likely alternative.

An example of a stumbling block for speech recognition would be the following:

"Real eyes realise real lies".

Its output could easily be realise repeated three times, so a speech recognition program would probably get this wrong. With so many examples it could get wrong, how does it occasionally get things right?

"Where are you?" could be "wear are you?" - we know it couldn't be, but a computer doesn't. The only way to avoid this mistake is to have modelled likely and unlikely word combinations beforehand. The best method is to pick the most likely option, but that can be difficult if you don't know what any of the words mean.
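To illustrate, and only to illustrate (the word pairs and probabilities below are entirely made up; real recognisers use language models trained on huge text corpora), here's a toy Python sketch of picking the more likely of two sound-alike word sequences:

```python
# A toy sketch of choosing between sound-alike word sequences using
# made-up word-pair probabilities. These numbers are invented for
# illustration, not taken from any real speech recognition system.

PAIR_PROB = {
    ("where", "are"): 0.2,
    ("are", "you"): 0.3,
    ("wear", "are"): 0.001,  # "wear are" is a very unlikely combination
}

def sequence_score(words):
    """Multiply the probabilities of each adjacent pair of words."""
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= PAIR_PROB.get(pair, 0.0001)  # unseen pairs get a tiny default
    return score

candidates = [["where", "are", "you"], ["wear", "are", "you"]]
best = max(candidates, key=sequence_score)
print(" ".join(best))  # where are you
```

Because "wear are" almost never occurs, its score is ruled out and the sensible sequence wins, even though both sound identical to the machine.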

The phone has as much chance of understanding you as any member of the opposite sex, but that doesn't mean you can do those sorts of things with it, even though it does vibrate.