A mobile operator’s guide to natural language processing (NLP)

Written by
GMS

Computer assistance and AI analysis are a growing trend in the communication world. In their 2019 survey of the mobile economy, the GSMA identified artificial intelligence as one of the important emerging technologies for mobile network operators (MNOs).

This takes various forms, but perhaps one of the most interesting, and least discussed, applications is using AI to enhance an operator’s core business by protecting the network itself.

By preventing spam or finding A2P messages entering the network over grey routes, AI can help MNOs gain a better understanding of their network, improve SMS delivery, and secure themselves against fraud. 

There are two interesting, deeply related technologies involved in this network protection effort: machine learning (ML) and natural language processing (NLP). Previously we looked at how machine learning works in an operator context. 

Here we will look at natural language processing – the way in which machines can be made to process and analyze natural languages – and how it can be used to classify and segregate messages passing through the network.

Natural languages 

The basic idea is to have an AI program that can analyze an SMS message and know what kind of content it is. The problem, however, is that AIs often struggle to understand natural languages. 

A natural language is one that has been developed “naturally”, through continual use and development, without conscious planning on the part of those using it. This can be contrasted with designed languages like the formal logic used in philosophy or mathematics, or computer programming languages. 

Most languages spoken by humans – except codes or constructed languages like Esperanto – are natural languages. As a result, a natural language has quirks and irregularities that make it difficult for a computer to recognize, much less understand well enough for us to give a machine spoken or written commands.

Verbs can be conjugated in various ways, concepts are not always clearly defined without reference to a specific context, and grammar is rarely straightforward. In computer science, NLP is the attempt to overcome this complexity to make artificial intelligences more useful.

Examples of natural language processing 

NLP is not a particularly new field. While techniques and technologies have changed, the overall goal – allowing a machine to “read” some text or “hear” a sentence – is almost as old as AI research itself.

It was key to the earliest chatbots, like Eliza, aimed at creating dialogic interfaces for interacting with machines, although Eliza itself was built to mimic the role of a psychotherapist. 

There are, broadly speaking, two kinds of NLP task: syntactic and semantic. 

Syntax describes the grammatical ordering of a sentence. In natural language processing its analysis involves identifying the parts of speech (verb, noun, and so on) and inflected forms of words (recognizing “running” and “ran” as forms of “run”, for example) as well as how these grammatical features work together. It aids interpretation of meaning and sentence function by understanding grammatical rules, roles, and conventions. 

Semantic analysis focuses more directly on extracting the meaning or significance of words, and therefore of whole sentences. It covers things like recognizing nouns denoting “named entities” such as people or places, and identifying positive or negative sentiment, or even sarcasm.

Semantic tasks are critical for creating truly useful, modern chatbots, since these techniques are what allow the machine to formulate and express a meaning in response to the linguistic input it receives. (Although, as we shall see, they are not so critical from an operator’s perspective.) 

Example of Amazon Alexa command. Source: Chatbots Magazine

Modern “smart home” devices and mobile assistants like Siri or Google use these techniques to determine what is being asked of them. In their case they don’t so much distinguish between verbs and nouns, as between commands and services (or as it is termed in the example here, the “invocation name”). 

NLP is also used to sort and organize information for security purposes, distinguishing between threats and ordinary correspondence. 

As early as the 1990s a team at America’s NSA was developing NLP-based approaches to try to avoid the agency “drowning in data.” And at least one cybersecurity company is proposing using NLP to prevent corporate data breaches caused by human error and extortion.

Pattern recognition 

To enhance their security, MNOs do not necessarily need all the functionality of NLP. They are not interested in having a system executing commands from the messages it receives, since their role is to transfer these messages. Nor do mobile operators need to have the system talk to them.

What MNOs require is the ability to identify and classify various types of messages and route them accordingly.

They want their AI to be able to recognize international A2P SMS traffic and terminate it accordingly, avoiding grey routes, and to identify and block spam. (In our previous article we discussed how machine learning can allow a machine to do this in a proactive manner, learning what spam looks like and blocking it or flagging it for review.) 

So, NLP for operators focuses instead on the business of identifying patterns in the messages; having an AI try to understand the content of messages would be a waste of effort, as well as intrusive. 

The approach is simply to sort messages based on patterns found in the data (the text); patterns discovered through analyzing the structure of their content, not the meaning. Consequently, semantic analysis is given less weight in the decision-making process, as are certain forms of syntactic analysis like parsing. 
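As a rough illustration of sorting by structure rather than meaning, the sketch below describes a message purely by its shape: whether it contains a link, whether it contains a standalone numeric code, and what share of its characters are digits or capitals. The regular expressions, the feature names, and the looks_like_a2p rule are all assumptions made for this example; they are not taken from any GMS product, where such patterns would be learned from data rather than written by hand.

```python
import re

# Hypothetical structural patterns; a real system would learn these from data.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
CODE_RE = re.compile(r"\b\d{4,8}\b")  # standalone run of 4-8 digits, e.g. a one-time code


def structural_features(text: str) -> dict:
    """Describe the shape of a message without interpreting its meaning."""
    chars = len(text) or 1  # avoid division by zero on empty messages
    return {
        "has_url": bool(URL_RE.search(text)),
        "has_numeric_code": bool(CODE_RE.search(text)),
        "digit_ratio": sum(c.isdigit() for c in text) / chars,
        "upper_ratio": sum(c.isupper() for c in text) / chars,
    }


def looks_like_a2p(text: str) -> bool:
    """Crude illustrative rule: links or codes suggest application-generated traffic."""
    features = structural_features(text)
    return features["has_url"] or features["has_numeric_code"]
```

Nothing here attempts to work out what the message says, which is the point: the decision rests only on surface patterns in the text.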

How it works 

GMS uses NLP and machine learning to enhance network security by scanning messages in two modes. Both use the same techniques – looking for patterns in an SMS message – but in slightly different ways. 

This allows it to provide the operator with state-of-the-art reporting that identifies message types independent of their source, and to proactively assist their protection efforts by making routing and blocking decisions about these messages. 

Every message that gets sent is scanned in what is called Offline mode. This is a deep inspection of a message after it has been delivered, which creates feedback about the type of messages entering the operator’s network from a particular source. 

Discovering that, for example, a message that should be classified as an international A2P SMS is entering the network as P2P allows the MNO to assign and adjust the sender’s Risk Rating. 

A Risk Rating is a gauge of the potential risk that a given sender will handle mislabeled traffic or send spam. The MNO can then take action to correct this behavior as appropriate, for example by working with the sender to discover the original source of the problem, or by blocking them if the sender is found to be the culprit.  
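The article does not say how a Risk Rating is calculated, so the following is only one plausible sketch: it assumes the rating is the share of a sender’s scanned messages that were either detected as spam or arrived under a different label from the one the Offline scan assigned. The function name, the tuple layout, and the "spam" label are hypothetical.

```python
from collections import defaultdict


def risk_ratings(scans):
    """Return, per sender, the share of scanned messages that were mislabeled or spam.

    `scans` is an iterable of (sender, declared_type, detected_type) tuples,
    an assumed record format for the results of the Offline scan.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for sender, declared, detected in scans:
        totals[sender] += 1
        if detected == "spam" or detected != declared:
            flagged[sender] += 1
    return {sender: flagged[sender] / totals[sender] for sender in totals}
```

For example, a sender with one correctly labeled P2P message and one A2P message declared as P2P would receive a rating of 0.5 under this assumed scheme.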

Alongside this is Inline mode, in which messages from senders deemed high risk are scanned in real time before they are delivered. This provides actionable information about routing and billing and, perhaps even more importantly, enables messages that clearly violate network policy to be blocked before delivery. All this happens in a fraction of a second, so there is no perceptible delay in message transit.
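The relationship between the two modes can be sketched as a simple gate placed in front of delivery. This is an assumption-laden illustration, not a description of the actual system: the threshold value, the handle_message function, the classify callback, and the set of blocked categories are all hypothetical names introduced for the example.

```python
RISK_THRESHOLD = 0.3  # assumed cut-off above which a sender counts as "high risk"


def handle_message(sender, text, ratings, classify,
                   blocked_categories=frozenset({"spam"})):
    """Decide what to do with a message before delivery.

    Returns an (action, category) pair. The category is None when the
    message was not scanned inline.
    """
    if ratings.get(sender, 0.0) < RISK_THRESHOLD:
        # Low-risk senders are not scanned inline; they are inspected after delivery.
        return ("deliver", None)
    category = classify(text)  # inline scan, only for high-risk senders
    if category in blocked_categories:
        return ("block", category)
    return ("deliver", category)  # category can inform routing and billing
```

Limiting the real-time scan to high-risk senders is the design choice that keeps the common path cheap, while every message can still be inspected in depth after delivery.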

Customization 

Mobile operators can define the categories they want an AI to sort messages into. Using guided machine learning, the system learns which kinds of pattern in a text correspond to the categories it has been given, and reacts according to the rules set for it.

For example, spam is just another category, and when the system recognizes certain patterns it has learned are associated with spam it can flag those messages for review, or simply block them. 

The same thing applies to fraud, giving operators another tool for protecting their subscribers. Other rules and categories can be defined, as needed by an operator, and the AI can be trained to use these by being shown examples of the type of message in question. 
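Training by example can be illustrated with a very small classifier. The sketch below is a multinomial naive Bayes model over character trigrams with add-one smoothing, chosen here only because it is compact and needs no external libraries; the article does not state which algorithm GMS uses, and the class name, method names, and category labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Split text into overlapping character trigrams, padded at both ends."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class PatternClassifier:
    """Multinomial naive Bayes over character trigrams with add-one smoothing."""

    def __init__(self):
        self.gram_counts = defaultdict(Counter)  # category -> trigram counts
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()                       # every trigram seen in training

    def train(self, examples):
        """Learn from (text, category) pairs supplied by the operator."""
        for text, category in examples:
            grams = trigrams(text)
            self.gram_counts[category].update(grams)
            self.doc_counts[category] += 1
            self.vocab.update(grams)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.gram_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[category] / total_docs)
            for gram in trigrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best
```

Adding a new category in this sketch requires no code changes: the operator supplies labeled examples for it and calls train again.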

This flexibility and ability to learn extends to the languages that can be used. Because in this case we are not particularly interested in teaching the machine to understand the messages, all it needs to do is recognize the patterns. This makes the job a lot easier than if one were training the machine to give a coherent response to a question or comment.  

While the system comes pre-trained in English and Chinese, almost any language can be taught to the machine in this sense. Given enough examples (about 50,000 unique messages for each category the MNO wants to define) the AI can be trained to recognize the patterns necessary to have “learned” any particular language that the operator’s subscribers and enterprise customers use.
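Before training on a new language, an operator might want to check whether each category has enough unique examples. The helper below is a minimal sketch of such a check using the figure of roughly 50,000 quoted above; the function name and the way duplicates are detected (trimming whitespace and ignoring case) are assumptions made for illustration.

```python
from collections import defaultdict

MIN_UNIQUE_PER_CATEGORY = 50_000  # approximate figure quoted in the article


def categories_needing_data(examples, minimum=MIN_UNIQUE_PER_CATEGORY):
    """Return {category: shortfall} for categories with too few unique messages.

    `examples` is an iterable of (text, category) pairs.
    """
    unique = defaultdict(set)
    for text, category in examples:
        # Assumed normalization: messages differing only in case or
        # surrounding whitespace count as duplicates.
        unique[category].add(text.strip().lower())
    return {
        category: minimum - len(messages)
        for category, messages in unique.items()
        if len(messages) < minimum
    }
```

An empty result means every category present in the data meets the threshold.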

GMS is committed to finding new methods and technologies to enhance its partners’ networks. Artificial intelligence offers new ways to optimize and improve network security. Get in touch with our experts to find out how GMS can use cutting-edge technology to bring progress to your business. 
