Specialized Terminology on HLT

June 24, 2007 at 1:13 pm (IST)

Wikipedia offers us precise definitions of specialized terminology related to Human Language Technologies and closely linked topics:

  • Machine translation (MT): a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, it performs simple substitution of words in one natural language for words in another.
  • Computer-assisted translation (CAT): a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process. It is also known as machine-aided translation.

Machine translation software still needs improvement. Computer-assisted translation is therefore the more widely used approach, owing to its efficiency and usefulness.

  • Multilingual content management: the set of processes and technologies that support the evolutionary life cycle of digital information in several languages. This digital information is often referred to as content or, to be precise, digital content. Digital content may take the form of text (such as documents), multimedia files (such as audio or video files), or any other file type that follows a content lifecycle requiring management.
  • Translation technology: in general terms, an umbrella term covering all of the above, as it refers to the full set of translation tools.


Machine Translation Systems

May 18, 2007 at 2:31 pm (IST)

Machine translation is one of the most interesting, useful and complete language technology applications. At its most basic level, MT performs simple substitution of words in one natural language for words in another.
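The word-for-word substitution described above can be sketched in a few lines of Python. This is only a toy illustration: the lexicon entries and the function name `translate_word_for_word` are assumptions made up for the example, not part of any real MT system.

```python
# Toy English-to-Spanish lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "el",
    "cat": "gato",
    "eats": "come",
    "fish": "pescado",
}


def translate_word_for_word(sentence: str, lexicon: dict[str, str]) -> str:
    """Replace each word with its lexicon entry, leaving unknown words as-is."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_for_word("The cat eats fish", LEXICON))
# prints: el gato come pescado
```

A sketch like this ignores word order, agreement and ambiguity, which is why substitution is only the most basic level of MT.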

There are now many software programmes for translating natural language, several of them online, such as SYSTRAN.

The following are some examples of well-known MT systems.

  • ESTeam Translator: supports translations in any direction between all the official European Union languages and Norwegian. It integrates Translation Memory (TM) and Machine Translation (MT) technology.
  • ATA Software: a world leader in the English/Arabic machine translation market.
  • TranExp: offers NeuroTran, a machine translator and dictionary for English, German, French, Spanish, Hungarian, Polish, Croatian, Bosnian and Serbian, and InteractiveTran, also a machine translator and dictionary, which covers Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, English, German, European Portuguese, Finnish, Polish, Russian, Spanish, Swedish and other languages.
  • SMART Translator Software: translates controlled English to a variety of languages, such as Spanish, French, Brazilian Portuguese and Italian.


Characteristics of translation by FEMTI

May 18, 2007 at 1:04 pm (IST)

According to the FEMTI report, “characteristics of the translation task refers to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation”.

FEMTI notes three main characteristics in the translation task:

  • Assimilation: monitoring a large volume of texts produced by people outside the organization, in several languages. Steps: document routing/sorting, information extraction/summarization and search.
  • Dissemination: delivering to others a translation of documents produced inside the organization. Steps: internal/in-house and external publications.
  • Communication: supporting multi-turn dialogues between people who speak different languages. It can be synchronous or asynchronous.

Developers, research sponsors, commercial investors, buying agents, operational managers and end users are the stakeholders of the translation task.

J.C. Sager distinguishes two types of use of MT systems: unedited output and edited output. In the latter case, the cost of revision and editing has to be established and compared with the cost of manual translation. As the type of use is inevitably related to the kind of text, it is necessary to establish text types and take them into account.

Furthermore, Hovy opts for dividing the possible translation tasks into three groups. He claims that in order to make the taxonomization of features useful to beginners in MT, “it is important to articulate its layers and choices in terms they can intuitively understand”. This part of the evaluation taxonomy therefore distinguishes three main types of use. In this way, users can identify the type of work they want, and developers can define in strict terms what their MT system can do.


Meetings On Computational Linguistics

May 18, 2007 at 12:29 pm (IST)

A wide range of meetings is organized worldwide to share and discuss up-to-date information, recent developments and achievements in the field of computational linguistics. As new ideas are constantly being developed, these meetings are an important place to discuss and consolidate them.

These meetings are held with varying frequency. Some are annual, such as the one planned by the Association for Computational Linguistics (2007) or the one organized by the North American Chapter of the Association for Computational Linguistics, which recently took place in New York (2007). In Spain, the annual congress of the Sociedad Española para el Procesamiento del Lenguaje Natural is to be held in Seville (2007).

These meetings combine talks by scholars on the main topics with workshops where participants share their experience and the ideas they have developed. To take part in a workshop, a paper explaining the topic you are going to work on must be submitted in advance.

To conclude, it is important to underline the usefulness of these conferences for consolidating information and broadening everyone’s knowledge in the vast field of computational linguistics and natural language processing.


The Importance Of HLT

May 18, 2007 at 12:28 pm (IST)

The capabilities of Human Language Technologies have grown rapidly in recent years, not only in research and science but also in the commercial marketplace. We currently find a wide range of applications for HLT systems, such as automatic transcription of meetings, translation between languages, automatic answering of questions, text mining and access to information through spoken human-computer dialogue.

As the University of Sheffield points out, it is undoubtedly true that “systems which use HLT are now in everyday use, through technologies such as internet search engines and mobile phones, and most major international computer and telecoms companies now engage in HLT research and development.”

Consequently, “there is strong demand for graduates with the highly-specialised multi-disciplinary skills that are required in HLT, both as practitioners in the development of HLT applications and as researchers into the advanced capabilities required for next-generation HLT systems.”


European Research Centres for HLT

April 1, 2007 at 5:11 pm (Uncategorized)

Many organizations, groups and centres work on the development and evolution of Human Language Technologies.


Hans Uszkoreit’s Contribution

April 1, 2007 at 4:40 pm (IST)

Hans Uszkoreit is one of the most influential and well-known specialists in the field of Human Language Technologies. His contribution over the years is worth highlighting.

Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. He is also a co-opted Professor in the Computer Science Department.

Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas at Austin (1973-1981). He received his Ph.D. in linguistics in 1984.

From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the LILOG project, and also worked as a professor at the University of Stuttgart.

Since 1988 he has worked at Saarland University in the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI.

Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), a Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, a Member of the Executive Board of the European Network of Language and Speech, and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbrücken, acrolinx gmbh, Berlin, and AnswerBus GmbH, Saarbrücken. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.

His most recent publications and a short CV can be found at http://www.coli.uni-saarland.de/%7Ehansu/bio.html.


Basic Notions Of Human Language Technologies

April 1, 2007 at 4:09 pm (IST)

There is no doubt that the development of all technologies depends entirely on language, since communication is essential to the use of these tools. But “what is less obvious is that the development and the evolution of language – its effectiveness in communicating faster, with more people, and with greater clarity – depends more and more on sophisticated tools.” (Language and technology: from the Tower of Babel to the Global Village, 1996)

The main aim of Human Language Technology, which includes activities such as coding, recognition, interpretation and translation, is to enable people to communicate with machines using natural communication skills. In his article What is Language Technology?, Hans Uszkoreit offers a brief, accurate definition of HLT:

“Language technology -sometimes also referred to as human language technology- comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics” (H. Uszkoreit, DFKI 2007).

TC-STAR, for its part, offers a less technical explanation in its publication Human Language Technologies For Europe (2006) to help us understand what HLT involves:

“We want to be able to interface with machines by voice and language, because we use these communication means and we want computers to process this form of information in all the ways that we consider useful. The set of technologies which do this are known as human language technologies. Automatic speech recognition, machine translation and text to speech are the more prominent technologies, but there are many more.”


Group E – Metadata & Metacontents

February 10, 2007 at 3:56 pm (IST)

A piece of information given out of context, for example the number 16086364, is meaningless. Something more must be known about that data in order to understand that it represents an individual’s ID card number, and that additional information is what we call metadata. The concept of metadata is of interest in a variety of fields of computer science.

Some definitions are needed to understand accurately what these terms mean:

  • Data: in a very broad sense, it refers to “numbers, characters, images or other outputs from devices to convert physical quantities into symbols processed by a human or input into a computer or transmitted to another human or computer”. Data processing occurs in stages, from raw data to processed data (Wikipedia, visited: 01/12/2007).
  • Metadata: the most common definition of the term is “data about data”, which can be expanded to “structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities” (Wikipedia, visited: 01/12/2007).
  • Content: in computing, it means “the ‘stuff’ that makes up a website”, such as words, pictures, images or sounds. In other words, “the ‘information’ a website provides” (Web-Designz.com, visited: 01/12/2007).
  • Metacontent: the information relating to a document’s content, such as its title, author, size, date, change history and keywords. Metacontent can be used for searching and filtering information and for administering documents (Joaquín Bravo Montero, visited: 01/12/2007).
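The difference between bare data and data accompanied by metadata can be sketched as follows. The class name `Described` and the metadata keys `type` and `format` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Described:
    """A raw value together with the metadata that gives it meaning."""

    value: str
    # Hypothetical metadata keys; any descriptive fields could be used.
    metadata: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        """Return the value with its type, or flag it as unexplained data."""
        kind = self.metadata.get("type", "unknown data")
        return f"{self.value} ({kind})"


raw = Described("16086364")
labelled = Described("16086364", {"type": "ID card number", "format": "8 digits"})

print(raw.describe())       # 16086364 (unknown data)
print(labelled.describe())  # 16086364 (ID card number)
```

The value is identical in both cases; only the attached description lets a reader or a program interpret it.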

In computing terms, a metalanguage refers here to the markup languages developed so that computers can process data and information. Three examples are:

  1. HTML (http://www.w3.org/MarkUp/).
  2. XML (http://www.w3.org/XML/).
  3. SGML (http://www.w3.org/MarkUp/SGML/).

The new Internet, Web 2.0, offers us endless possibilities, but there is something missing. In the words of the W3C: “a part of the Web which contains information about information – labeling, cataloging and descriptive information structured in such a way that allows Web pages to be properly searched and processed in particular by computers. In other words, what is now very much needed on the Web is metadata”.
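One common place where a web page carries such "information about information" is the `<meta>` tags in its head. The sketch below uses Python's standard `html.parser` module to collect them; the page content, the author name and the class name `MetaCollector` are invented for the example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta name="..." content="..."> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.metadata[name] = content


# A made-up page used purely for illustration.
PAGE = """<html><head>
<title>Basic Notions</title>
<meta name="author" content="Example Author">
<meta name="keywords" content="HLT, metadata">
</head><body><p>Text.</p></body></html>"""

collector = MetaCollector()
collector.feed(PAGE)
print(collector.metadata)
```

A program reading the page this way can catalogue or search it without having to interpret the body text.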


Inside The Markup Languages

February 10, 2007 at 3:53 pm (IST)

The most popular markup language for building web pages is HTML (HyperText Markup Language). The aim of this language is “to describe the structure of the text-based information in a document and to supplement that text with interactive forms, embedded images, and other objects” (Wikipedia, 10-02-07). HTML is written in the form of tags enclosed in “<” and “>” signs. The original HTML was created by Tim Berners-Lee using the NeXTSTEP development environment.

HTML markup consists of elements, attributes, data types and a variety of character references. With regard to the first of these, HTML elements are often classified as follows:

  • Structural markup describes the purpose of the text.
  • Presentational markup describes the appearance of the text, with no regard to its function.
  • Hypertext markup links parts of the document to other documents.
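The three kinds of markup listed above can be illustrated by counting the tags in a small snippet. The tag sets below are deliberately small and are an assumption made for this sketch, not an official classification; `classify` and `MarkupCounter` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Small, illustrative tag sets; a fuller classification would cover more tags.
STRUCTURAL = {"h1", "h2", "h3", "p", "ul", "ol", "li", "em", "strong"}
PRESENTATIONAL = {"b", "i", "u", "font", "center"}
HYPERTEXT = {"a"}


def classify(tag: str) -> str:
    """Map a tag name to one of the three kinds of markup, or 'other'."""
    if tag in STRUCTURAL:
        return "structural"
    if tag in PRESENTATIONAL:
        return "presentational"
    if tag in HYPERTEXT:
        return "hypertext"
    return "other"


class MarkupCounter(HTMLParser):
    """Count opening tags by kind of markup."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[classify(tag)] += 1


SNIPPET = (
    '<h2>Golf</h2><p>A <b>bold</b> word and a '
    '<a href="http://www.w3.org/MarkUp/">link</a>.</p>'
)

counter = MarkupCounter()
counter.feed(SNIPPET)
print(dict(counter.counts))
```

Here `<h2>` and `<p>` say what the text is, `<b>` says only how it should look, and `<a>` connects the document to another one.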

XHTML (Extensible HyperText Markup Language) is also a well-known markup language developed by W3C that “has the same depth of expression as HTML, but a stricter syntax. Whereas HTML is an application of SGML, a very flexible markup language, XHTML is an application of XML, a more restrictive subset of SGML” (Wikipedia, 10-02-07).

Because XHTML documents must be well-formed, they can be processed with a standard XML library, whereas HTML documents cannot.
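This difference can be seen with Python's standard `xml.etree.ElementTree` module. The two markup strings are made up for the example, and the DOCTYPE and namespace a real XHTML document would carry are left out for brevity; `parses_as_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed markup in the XHTML style: every element is closed.
XHTML = "<html><body><p>First line<br/>second line</p></body></html>"

# Typical lenient HTML: <br> and <p> are never closed.
HTML = "<html><body><p>First line<br>second line</body></html>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a standard XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parses_as_xml(XHTML))  # True
print(parses_as_xml(HTML))   # False
```

A browser would display both strings, but only the first can be handed to generic XML tools without a dedicated, more tolerant HTML parser.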

XML (Extensible Markup Language) is a general-purpose markup language recommended by W3C that supports a wide variety of applications. XML languages are easy to design and process, as XML is a simplified subset of SGML. Its main aim is to facilitate “the sharing of data across different information systems”, particularly those connected to the Internet.

XML-based languages (such as XHTML) “allow diverse software to reliably understand information formatted and passed in these languages” (Wikipedia, 10-02-07).
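As a rough sketch of this kind of data sharing, one program can serialise a record to XML and another can read it back with no knowledge of how it was produced. The element name `document`, the record fields and the helper names `to_xml` and `from_xml` are assumptions for illustration; the keys must be valid XML element names.

```python
import xml.etree.ElementTree as ET


def to_xml(record: dict[str, str]) -> str:
    """Serialise a flat record as one XML element per field."""
    root = ET.Element("document")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(payload: str) -> dict[str, str]:
    """Rebuild the flat record from its XML form."""
    root = ET.fromstring(payload)
    return {child.tag: child.text or "" for child in root}


record = {"title": "Inside The Markup Languages", "language": "en"}
payload = to_xml(record)   # what the sending system would transmit
print(payload)
print(from_xml(payload))   # what the receiving system recovers
```

The receiving side needs only an XML parser and agreement on the element names, which is what lets different software exchange the data reliably.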
