Every linguist has probably at some time had a conversation similar to the following. Somebody asks them: “so, what do you do for a living?” and they answer “I’m a linguist” without an extended explanation.
Then a somewhat puzzled look appears on the face of the person who asked. Right, linguists deal with languages, so the next question goes something like this: “so do you speak a lot of languages?”
It’s linguistics, not language
While this comes in handy and is often the case, it’s not always true (ever heard of Chomsky?), because linguists focus on language as a system and as a human ability rather than on specific languages such as Japanese, French or English.
It’s also worth mentioning that linguists are not grammar vigilantes either (actually, quite the opposite). Linguistics as a discipline aims to answer questions like “what are the common features of all human languages?”, “what is the structure of language?”, “why is it so easy to learn a language when you are a child?”, “how do languages change through the years and why?” or “how can we model linguistic phenomena?”.
Human-machine communication
Language is present in almost all human activities, so linguistics naturally aligns itself with many different disciplines, such as philosophy, psychology, sociology, biology, computer science and even criminal investigation. So, what does a linguist actually do for a living? Their job can vary enormously depending on their specialization, from gathering linguistic data from long-lost tribes to finding linguistic features that can help a machine understand a sentence. Tasks like the last one occupy our linguists at Verbio.
For a long time, humans have dreamed of being able to communicate with computers in the same way we do with each other. Over time, human-machine communication has proved to be a harder problem than it was originally thought to be. Mastering a language is not an easy task and this shouldn’t come as a surprise. After all, human language is a wonder: we place together a series of arbitrary sounds, enabling us to communicate our thoughts about things that don’t even exist. How is this even possible?
For a machine to be able to communicate with humans using natural language understanding (NLU), we need to come up with strategies that address this question. Some linguistic theories propose that humans come endowed with a specific ability that enables us to “instinctively” learn languages when we are children.
Machines certainly don’t have such a mechanism or inherent ability to learn. They need to be taught every rule. So, how do linguists keep track of them all?
Machine linguist: training the model with natural language understanding (NLU)
Machines learn models from data, which are samples of reality. Modeling language presents many challenges. For example, there is no fixed inventory of all the possible utterances we can produce or understand. Therefore, we need to break them down into a limited set of meaningful units (words, for example) and feed this information to a computer. However, knowing a list of words does not amount to knowing a language.
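To make the last point concrete, here is a minimal Python sketch (an illustration only, not a description of any production system). It breaks utterances into word units and counts them. The helper names `tokenize` and `bag_of_words` are hypothetical, and the tokenizer is deliberately naive. Two sentences with opposite meanings end up with exactly the same word counts, which is one way to see why a list of words is not enough.

```python
import re
from collections import Counter

def tokenize(utterance):
    """Lowercase an utterance and split it into word tokens (naive, English-only)."""
    return re.findall(r"[a-z']+", utterance.lower())

def bag_of_words(utterance):
    """Count how often each word occurs, ignoring word order."""
    return Counter(tokenize(utterance))

first = "The dog bit the man"
second = "The man bit the dog"

# Same words, same counts, but the word order (and the meaning) differs.
same_words = bag_of_words(first) == bag_of_words(second)
same_order = tokenize(first) == tokenize(second)
print(same_words, same_order)
```

The word counts are identical while the token sequences are not, so any model that only sees the counts cannot tell who bit whom.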
If we’d like to interact with a computer in the way we interact with people, we need computers to understand meaning and be able to reason. Typically, humans interact with computers through commands. Many of the issues we need to overcome are related to ambiguity (e.g. in “Remind me to call my dad at 6”, is it the reminder or the call that should happen at 6?), semantics and pragmatics (e.g. “I won’t do it” may be equivalent to “in your dreams” in some contexts). We also need to incorporate common sense to perform inferences. And we would like the computer to go further, for example by understanding the emotions conveyed in our utterances or summarizing a document for us. In any language.
Verbio’s approach to human-machine communication
Linguists at Verbio work towards improving interactions with computers using natural language in several areas: voice synthesis, speech recognition, voice biometrics, and semantics and reasoning (cognitive system). One of the key tasks that linguists perform, common to all those areas, is to gather and curate linguistic data. As the old saying goes, “a model is only as good as the data fed into it”, and the advent of the deep learning paradigm has made this piece of wisdom more relevant than ever. Therefore, corpora from real interactions are carefully collected and annotated. These corpora are constantly reviewed and updated according to user needs. They may be recordings rather than texts, and, in that case, linguists coordinate transcriptions and decide how to deal with issues such as noise or truncated words. Linguists are also in charge of adapting transcription data so that they can be used for several models. Dictionaries of several types are another important common resource. Linguists develop and curate them, resolving issues with problematic terms such as neologisms, foreign terms and acronyms, among others.
Besides corpora and dictionaries, linguists develop other types of tools, such as ABNF and BNF grammars and regular expressions (regex), which are used to capture relevant pieces of specific data. Technologies such as voice synthesis require our linguists to develop modules that normalize and transcribe utterances, as well as modules that deal with intonation and phoneme length, among other phenomena. For every module, tool or functionality, linguists run tests that control the quality of the output. Their responsibilities also include other types of tasks, such as helping to cast voice talent for voice synthesis and designing dialogues for virtual assistants.
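As a rough illustration of how a regular expression can capture a relevant piece of data, the Python sketch below extracts a clock time from a command. The pattern `TIME_PATTERN` and the function `extract_time` are hypothetical names invented for this example, and the pattern only covers simple English expressions such as “at 6”, “at 6:30” or “at 6 pm”. A real grammar would need to handle far more variation.

```python
import re

# Hypothetical pattern for this example: "at" followed by a 12-hour clock time,
# with optional minutes and an optional am/pm marker.
TIME_PATTERN = re.compile(
    r"\bat\s+(?P<hour>1[0-2]|0?[1-9])(?::(?P<minute>[0-5]\d))?\s*(?P<period>am|pm)?\b",
    re.IGNORECASE,
)

def extract_time(utterance):
    """Return (hour, minute, period) for the first time expression, or None."""
    match = TIME_PATTERN.search(utterance)
    if match is None:
        return None
    hour = int(match.group("hour"))
    minute = int(match.group("minute") or 0)   # missing minutes default to 0
    period = match.group("period")
    period = period.lower() if period else None  # None when am/pm is not stated
    return hour, minute, period

print(extract_time("Remind me to call my dad at 6"))
print(extract_time("Wake me up at 6:30 pm"))
```

Note that the pattern finds the time but says nothing about what it modifies: whether the reminder or the call should happen at 6 is still left open, which is why pattern matching alone does not resolve ambiguity.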
We have mentioned that one of the obstacles to interacting with computers in natural language is getting them to understand what we actually mean. A number of the tasks that linguists perform at Verbio deal more directly with this issue. They train statistical models so that a machine can understand user intentions, even if they are expressed using sentences that are not included in the training sample. To do so, they carefully adjust linguistic variation in the data. In addition, they create ontologies that organize the knowledge associated with a specific domain and enable our cognitive system to perform inferences over it. They also work on other meaning-related issues such as topic extraction and automatic terminology induction.
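The idea of a statistical model that recognizes an intention in a sentence it has never seen can be sketched with a toy example. The Python code below is not Verbio’s method. It is a small multinomial Naive Bayes classifier with add-one smoothing, written from scratch, and the intent labels, training sentences and class name are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Naive whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()

class NaiveBayesIntentClassifier:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word -> count
        self.intent_counts = Counter()           # intent -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        for text, intent in examples:
            self.intent_counts[intent] += 1
            for token in tokenize(text):
                self.word_counts[intent][token] += 1
                self.vocabulary.add(token)

    def predict(self, text):
        total = sum(self.intent_counts.values())
        best_intent, best_score = None, -math.inf
        for intent, count in self.intent_counts.items():
            score = math.log(count / total)  # log prior of the intent
            denom = sum(self.word_counts[intent].values()) + len(self.vocabulary)
            for token in tokenize(text):
                # Smoothing keeps unseen words from zeroing out the score.
                score += math.log((self.word_counts[intent][token] + 1) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent

# Invented training sample with two hypothetical intents.
TRAINING_SAMPLE = [
    ("remind me to call my dad", "set_reminder"),
    ("set a reminder for tomorrow", "set_reminder"),
    ("remind me about the meeting", "set_reminder"),
    ("what is the weather today", "get_weather"),
    ("will it rain tomorrow", "get_weather"),
    ("is it sunny outside", "get_weather"),
]

classifier = NaiveBayesIntentClassifier()
classifier.train(TRAINING_SAMPLE)

# Neither sentence appears in the training sample.
print(classifier.predict("please remind me to buy milk"))
print(classifier.predict("is it going to rain"))
```

The classifier generalizes only as far as the wording of its training sample allows, which is why the amount and kind of linguistic variation in the data matters so much.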
Linguists at Verbio continuously research and test new technologies and tools that enable us to tackle the challenges that natural language poses. There’s still some way to go before human-machine interaction becomes a smooth process, but linguists use their knowledge and expertise to make sure that every stone laid in the path toward machine linguist solutions is solid and secure.