Robot Technology News  
ROBO SPACE
Translating lost languages using machine learning
by Staff Writers
Boston MA (SPX) Oct 22, 2020

In developing a system to help decipher lost languages, MIT researchers studied the language of Ugaritic, which is related to Hebrew and has previously been analyzed and deciphered by linguists.

Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or "undeciphered" - that is, we don't know enough about their grammar, vocabulary, or syntax to be able to actually understand their texts.

Lost languages are more than a mere academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have such minimal records that scientists can't decipher them by using machine-translation algorithms like Google Translate. Some have no well-researched "relative" language to compare against, and many lack traditional dividers such as white space and punctuation. (To illustrate, imaginetryingtodecipheraforeignlanguagewrittenlikethis.)

However, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) recently made a major advance in this area: a new system that has been shown to automatically decipher a lost language without needing advance knowledge of its relation to other languages. They also showed that their system can itself determine relationships between languages, and they used it to corroborate recent scholarship suggesting that the language of Iberian is not actually related to Basque.

The team's ultimate goal is for the system to be able to decipher lost languages that have eluded linguists for decades, using just a few thousand words.

Spearheaded by MIT Professor Regina Barzilay, the system relies on several principles grounded in insights from historical linguistics, such as the fact that languages generally only evolve in certain predictable ways. For instance, while a given language rarely adds or deletes an entire sound, certain sound substitutions are likely to occur. A word with a "p" in the parent language may change into a "b" in the descendant language, but changing to a "k" is less likely due to the significant pronunciation gap.
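To make the idea concrete, the sketch below scores how plausibly one word could have evolved into another using a weighted edit distance. It is only an illustration of the principle described above, not the team's algorithm: the cost values, the table of "similar" sound pairs, and the function names are all assumptions made up for this example.

```python
# Hypothetical costs chosen for illustration only; they are not from the paper.
INDEL_COST = 2.0        # adding or deleting a whole sound is rare, so it is expensive
SIMILAR_SUB_COST = 0.5  # e.g. "p" -> "b": a small pronunciation gap
DISTANT_SUB_COST = 1.5  # e.g. "p" -> "k": a large pronunciation gap

# Toy list of sound pairs treated as close to each other (an assumption).
SIMILAR_PAIRS = {frozenset("pb"), frozenset("td"), frozenset("kg")}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing sound a with sound b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in SIMILAR_PAIRS:
        return SIMILAR_SUB_COST
    return DISTANT_SUB_COST


def weighted_edit_distance(source: str, target: str) -> float:
    """Cheapest sequence of substitutions, insertions and deletions
    that turns source into target (standard dynamic programming)."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # delete a sound
                dist[i][j - 1] + INDEL_COST,  # insert a sound
                dist[i - 1][j - 1]
                + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


if __name__ == "__main__":
    print(weighted_edit_distance("pat", "bat"))  # likely change, low cost
    print(weighted_edit_distance("pat", "kat"))  # less likely change, higher cost
```

Under these assumed costs, a "p" to "b" change is cheaper than a "p" to "k" change, and both are cheaper than dropping a sound altogether.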

By incorporating these and other linguistic constraints, Barzilay and MIT PhD student Jiaming Luo developed a decipherment algorithm that can handle the vast space of possible transformations and the scarcity of a guiding signal in the input.

The algorithm learns to embed language sounds into a multidimensional space where differences in pronunciation are reflected in the distance between corresponding vectors. This design enables the model to capture pertinent patterns of language change and express them as computational constraints. The resulting model can segment words in an ancient language and map them to counterparts in a related language.
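As a rough picture of what such a space looks like, the sketch below places a handful of sounds in two dimensions and measures the distance between them. The real system learns its vectors from data; here the coordinates are hand-assigned, hypothetical values (one axis for voicing, one for where in the mouth the sound is made), and the names `SOUND_VECTORS`, `sound_distance`, and `nearest_sound` are invented for this example.

```python
import math

# Hand-assigned toy coordinates: (voicing, place of articulation).
# These values are assumptions for illustration, not learned embeddings.
SOUND_VECTORS = {
    "p": (0.0, 0.0),  # unvoiced, lips
    "b": (0.3, 0.0),  # voiced, lips
    "t": (0.0, 1.0),  # unvoiced, tongue tip
    "k": (0.0, 2.0),  # unvoiced, back of tongue
    "g": (0.3, 2.0),  # voiced, back of tongue
}


def sound_distance(a: str, b: str) -> float:
    """Euclidean distance between two sounds in the toy space."""
    return math.dist(SOUND_VECTORS[a], SOUND_VECTORS[b])


def nearest_sound(sound: str, candidates: list[str]) -> str:
    """Return the candidate whose vector lies closest to the given sound."""
    return min(candidates, key=lambda c: sound_distance(sound, c))


if __name__ == "__main__":
    print(sound_distance("p", "b"))         # small gap
    print(sound_distance("p", "k"))         # large gap
    print(nearest_sound("p", ["b", "k"]))   # picks the closer sound
```

In this toy layout "p" sits much closer to "b" than to "k", which is the kind of geometry that lets pronunciation gaps be expressed as numerical constraints.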

The project builds on a paper Barzilay and Luo wrote last year that deciphered the dead languages of Ugaritic and Linear B, the latter of which had previously taken decades for humans to decode. However, a key difference with that project was that the team knew that these languages were related to early forms of Hebrew and Greek, respectively.

With the new system, the relationship between languages is inferred by the algorithm. This question is one of the biggest challenges in decipherment. In the case of Linear B, it took several decades to discover the correct known descendant. For Iberian, scholars still cannot agree on the related language: some argue for Basque, while others reject this hypothesis and claim that Iberian doesn't relate to any known language.

The proposed algorithm can assess the proximity between two languages; in fact, when tested on known languages, it can even accurately identify language families. The team applied their algorithm to Iberian considering Basque, as well as less-likely candidates from Romance, Germanic, Turkic, and Uralic families. While Basque and Latin were closer to Iberian than other languages, they were still too different to be considered related.

In future work, the team hopes to move beyond connecting texts to related words in a known language - an approach referred to as "cognate-based decipherment." This paradigm assumes that such a known language exists, but the example of Iberian shows that this is not always the case. The team's new approach would involve identifying the semantic meaning of the words, even if they don't know how to read them.

"For instance, we may identify all the references to people or locations in the document which can then be further investigated in light of the known historical evidence," says Barzilay. "These methods of 'entity recognition' are commonly used in various text processing applications today and are highly accurate, but the key research question is whether the task is feasible without any training data in the ancient language."

The project was supported, in part, by the Intelligence Advanced Research Projects Activity (IARPA).

Research Report: "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior"


Related Links
Adam Conner-Simons for MIT News

