What is Natural Language Processing?

Natural Language Processing (NLP) is the branch of Artificial Intelligence that enables machines and humans to interact in a natural language that both can understand. It helps machines understand, process and interpret large amounts of natural-language data. Spell checking is a familiar application of NLP.

Components of NLP

Natural Language Understanding

It maps the given natural-language input into a useful representation. This requires several levels of analysis:

morphological analysis
syntactic analysis
semantic analysis
discourse analysis

Natural Language Generation

It produces natural-language output from some internal representation. This requires several levels of synthesis:

deep planning (what to say)
syntactic generation

Steps in NLP

Lexical Semantics
Syntactic Analysis
Semantic Analysis
Discourse Integration

Lexical Semantics – Lexical semantics is a subfield of linguistic semantics. It studies how the meaning of lexical units relates to the structure, or syntax, of the language; this relationship is referred to as the syntax-semantics interface. Lexical units include not only words but also sub-word units such as affixes, as well as compound words and phrases.

Lexical Analysis – The purpose of lexical processing is to determine the meanings of individual words. The basic method is to look each word up in a database of meanings, the lexicon. Non-words such as punctuation marks should also be identified. Word-level ambiguity means that a word may have several meanings and the correct one cannot be chosen from the word alone. The ambiguity can either be resolved on the spot by part-of-speech (POS) tagging, where possible, or passed on to the other levels.
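The lexicon lookup described above can be illustrated with a minimal sketch. The tiny LEXICON dictionary, its tags and the function names are hypothetical and chosen only for this example; a real system would use a far larger lexical resource.

```python
import re

# Hypothetical toy lexicon: word -> list of (part of speech, meaning).
LEXICON = {
    "i": [("PRON", "the speaker")],
    "robbed": [("VERB", "stole from")],
    "the": [("DET", "definite article")],
    "bank": [
        ("NOUN", "financial institution"),
        ("NOUN", "side of a river"),
    ],
}


def tokenize(text):
    # Runs of letters become word tokens; any other non-space
    # character (punctuation, digits) becomes a token of its own.
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


def lookup(token):
    # Non-words such as punctuation are identified rather than looked up.
    if not token.isalpha():
        return [("NONWORD", token)]
    return LEXICON.get(token.lower(), [("UNKNOWN", None)])


def analyse(text):
    # Pair every token with all the entries the lexicon offers for it.
    return [(tok, lookup(tok)) for tok in tokenize(text)]


def ambiguous(analysis):
    # Tokens with more than one entry must be resolved at a later level.
    return [tok for tok, senses in analysis if len(senses) > 1]


analysis = analyse("I robbed the bank.")
print(ambiguous(analysis))
```

Here the word "bank" is the only token left with more than one entry, so its ambiguity is passed on to the later levels instead of being resolved during lookup.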

Syntactic Analysis – It is the process of parsing the syntax of a sentence: it analyses natural language against grammar rules. Grammar rules apply to groups of words or phrases rather than to individual words.

Semantic Analysis – It assigns meanings to the structures created by syntactic analysis. Semantics can help choose between competing syntactic analyses and eliminate illogical ones. For example, in “I robbed the bank”, the word “bank” could mean a river bank or a financial institution, and the verb “robbed” makes the financial institution the sensible reading.
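One simple way to choose between the two readings of “bank” is to count how many context words each sense shares with the sentence. The sketch below is a simplified overlap heuristic, not a full semantic analyser; the SENSES table and its cue words are assumptions made up for this example.

```python
# Hypothetical sense inventory: each sense lists cue words that tend
# to appear near it. The cue words are invented for this illustration.
SENSES = {
    "bank": {
        "financial institution": {"money", "robbed", "loan", "account", "cash"},
        "river bank": {"river", "water", "fishing", "shore", "mud"},
    }
}


def disambiguate(word, sentence):
    # Lower-case the sentence and strip basic punctuation from each word.
    context = {w.strip(".,!?").lower() for w in sentence.split()}
    # Score each sense by the number of cue words found in the context.
    scores = {
        sense: len(cues & context) for sense, cues in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    # With no overlap at all, the ambiguity stays unresolved.
    return best if scores[best] > 0 else None


print(disambiguate("bank", "I robbed the bank"))
print(disambiguate("bank", "We sat on the bank of the river"))
```

When no cue word appears in the sentence the function returns None, which mirrors the idea of passing the ambiguity on to another level instead of guessing.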

Discourse Integration – It concerns the continuity and flow between consecutive sentences. A paragraph delivers meaningful information only if its sentences follow on from one another. Even if each sentence is correct on its own, a passage with no proper flow between its sentences conveys little useful information. For example, consider the following group of sentences, which describe a single situation.

a. John went to a store to buy his favorite instrument, a piano.
b. He arrived just in time, as the store was closing for the day.
c. He had visited the store many times.
d. He was thrilled with the feeling that he could finally buy a piano.

In this order, the four sentences do not flow logically from one to the next, but if they are rearranged they deliver useful and meaningful information:

a. John went to a store to buy his favorite instrument, a piano.
b. He had visited the store many times.
c. He was thrilled with the feeling that he could finally buy a piano.
d. He arrived just in time, as the store was closing for the day.

Knowledge representation in Natural Language Processing

The choice of representation depends on the application, for example machine translation or a database query system. It requires choosing a representational framework as well as a specific meaning vocabulary (the concepts and the relationships between them, that is, an ontology). The representation must also be computationally effective.

Common representational formalisms:

first-order predicate logic
conceptual dependency graphs
semantic networks
frame-based representations
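As an illustration of one of these formalisms, a semantic network can be sketched as a list of (subject, relation, object) triples. The facts, relation names and helper functions below are assumptions invented for this example and are not taken from any particular system.

```python
# A tiny semantic network stored as (subject, relation, object) triples.
# The facts and relation names are hypothetical.
TRIPLES = [
    ("piano", "is_a", "instrument"),
    ("instrument", "is_a", "object"),
    ("John", "is_a", "person"),
    ("John", "wants", "piano"),
    ("store", "sells", "piano"),
]


def objects(subject, relation):
    # All nodes reachable from `subject` over one edge named `relation`.
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def is_a(entity, category):
    # Follow "is_a" edges transitively, so a piano is also an object.
    seen = set()
    frontier = [entity]
    while frontier:
        node = frontier.pop()
        if node == category:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(objects(node, "is_a"))
    return False


print(objects("John", "wants"))
print(is_a("piano", "object"))
```

The concepts and relation names used in the triples play the role of the ontology: they fix which concepts exist and how they may be related.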

Why is NLP difficult?

Human language is very complex. It is made up not only of words but also of emotions and symbols, and it ties different pieces of information together into a whole. It is a unique signalling system that is difficult to program as a whole. A single sentence can also carry several pieces of contextual information, which is another challenge for NLP. In addition, a sentence can be arranged in a practically unlimited number of ways, which may convey different or similar meanings.

Techniques to understand a text

Parsing – It is a method of dividing a sentence into its components and describing their roles. It produces a parse tree showing how the components relate to one another hierarchically. This can help in further processing and understanding.
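A parse tree can be sketched with a very small hand-written parser. The toy grammar below (S -> NP VP, NP -> PRON | DET NOUN, VP -> VERB NP) and the POS word list are assumptions made for illustration; real parsers use far richer grammars or statistical models.

```python
# Hypothetical part-of-speech table for a handful of words.
POS = {
    "i": "PRON", "he": "PRON",
    "the": "DET", "a": "DET",
    "bank": "NOUN", "piano": "NOUN", "store": "NOUN",
    "robbed": "VERB", "bought": "VERB",
}


def parse_np(tokens, i):
    # NP -> PRON | DET NOUN. Returns (tree, next_index) or None.
    if i < len(tokens) and POS.get(tokens[i]) == "PRON":
        return ("NP", ("PRON", tokens[i])), i + 1
    if (i + 1 < len(tokens)
            and POS.get(tokens[i]) == "DET"
            and POS.get(tokens[i + 1]) == "NOUN"):
        return ("NP", ("DET", tokens[i]), ("NOUN", tokens[i + 1])), i + 2
    return None


def parse_vp(tokens, i):
    # VP -> VERB NP. Returns (tree, next_index) or None.
    if i < len(tokens) and POS.get(tokens[i]) == "VERB":
        result = parse_np(tokens, i + 1)
        if result:
            np_tree, j = result
            return ("VP", ("VERB", tokens[i]), np_tree), j
    return None


def parse(sentence):
    # S -> NP VP. Returns a nested tuple, or None if the grammar fails.
    tokens = sentence.lower().rstrip(".").split()
    subject = parse_np(tokens, 0)
    if not subject:
        return None
    np_tree, i = subject
    predicate = parse_vp(tokens, i)
    if not predicate:
        return None
    vp_tree, j = predicate
    if j != len(tokens):  # leftover words mean the sentence is not covered
        return None
    return ("S", np_tree, vp_tree)


print(parse("I robbed the bank"))
```

The nested tuples are the parse tree: the sentence node S contains a noun phrase and a verb phrase, and the verb phrase in turn contains the verb and its object noun phrase.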

Stemming – It is the process of reducing a word to its root by removing affixes (prefixes and suffixes). For example, stemming the word “riding” gives “ride”.
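The idea of stripping affixes can be shown with a deliberately crude sketch. It handles only a few English suffixes, and the suffix list and length threshold are assumptions chosen for this example; it is not an established stemming algorithm. Note that this crude version reduces “riding” to “rid”; recovering “ride” needs the extra rules that fuller stemmers add.

```python
# Suffixes tried in order; a hypothetical, very incomplete list.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            root = word[: -len(suffix)]
            # Undo consonant doubling: "running" -> "runn" -> "run".
            # Doubled l and s are left alone ("calling" -> "call").
            if (suffix in ("ing", "ed")
                    and root[-1] == root[-2]
                    and root[-1] not in "aeiouls"):
                root = root[:-1]
            return root
    return word


for w in ["riding", "running", "walked", "pianos", "is"]:
    print(w, "->", stem(w))
```

Because the rules work on spelling alone, the result is not always a dictionary word, which is a common trade-off of rule-based stemming.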

Text Segmentation – It is the division of text into words, sentences, topics and other units. It can be difficult for terms that may be written in more than one way, such as “ice box” and “ice-box”.
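Sentence and word segmentation can be sketched with two regular expressions. The rules below are simplifying assumptions (for instance, every full stop followed by a space ends a sentence), so abbreviations such as "Dr. Smith" would be split incorrectly.

```python
import re


def split_sentences(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    # Letters joined by hyphens stay together, so "ice-box" is one token
    # while "ice box" becomes two.
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)


text = "John went to a store. He had visited the store many times."
print(split_sentences(text))
print(split_words("an ice-box"))
print(split_words("an ice box"))
```

The last two calls show the difficulty mentioned above: the same item yields one token when hyphenated and two when written with a space, so a later step would have to recognise them as the same term.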

Author

Anupama kumari

M.Tech (VLSI Design and Embedded system)

BS Abdur Rahman University
