Natural Language Processing (NLP) is the branch of Artificial Intelligence that enables machines and humans to interact with each other in a natural language both can understand. It helps machines understand, process and decipher large amounts of natural language data. Spell checking is a prominent application of NLP.
Components of NLP
Natural Language Understanding
It maps the given natural language input into a useful representation. Different levels of analysis are required:
- morphological analysis
- syntactic analysis
- semantic analysis
- discourse analysis
Natural Language Generation
It produces natural language output from some internal representation. Different levels of synthesis are required:
- deep planning (what to say)
- syntactic generation (how to say it)
Steps in NLP
- Lexical Semantics
- Lexical Analysis
- Syntactic Analysis
- Semantic Analysis
- Discourse Integration
Lexical Semantics – Lexical semantics is a subfield of linguistic semantics. It studies how the meanings of lexical units relate to the structure, or syntax, of the language; this relationship is referred to as the syntax–semantics interface. Lexical units include not only words but also sub-word units such as affixes, as well as compound words and phrases.
Lexical Analysis – The purpose of lexical processing is to determine the meanings of individual words. The basic method is to look them up in a database of meanings, called a lexicon. Non-word tokens such as punctuation marks must also be identified. Word-level ambiguity means a word may have several meanings, and the correct one cannot be chosen based on the word alone. The solution is either to resolve the ambiguity on the spot, for example by part-of-speech (POS) tagging where possible, or to pass the ambiguity on to later levels of analysis.
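As a minimal sketch, word-level ambiguity such as “book” (noun vs. verb) can often be resolved by an off-the-shelf POS tagger. The example below assumes the NLTK library and its tokenizer/tagger data are installed; the sentences are invented for illustration.

```python
import nltk

# One-time data downloads (uncomment on first run;
# package names can vary slightly across NLTK versions):
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

for sentence in ["I will book a flight", "I read a good book"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))

# 'book' is typically tagged as a verb (VB) in the first sentence
# and as a noun (NN) in the second, resolving the ambiguity in context.
```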
Syntactic Analysis – It is a method of parsing the structure of a sentence by analysing natural language against grammar rules. Grammatical rules are applied to groups of words or phrases, not to individual words. Semantic Analysis then assigns meanings to the structures created by syntactic analysis. Semantics can help choose between competing syntactic analyses and eliminate illogical ones. For example, in “I robbed the bank”, “bank” may refer to a river bank or a financial institution.
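Where grammar alone is not enough, a simple word sense disambiguation heuristic such as the Lesk algorithm can pick a meaning from the surrounding context. A rough sketch with NLTK (assuming the WordNet data is installed) follows; it illustrates the idea rather than guaranteeing the intuitively correct sense.

```python
from nltk import word_tokenize
from nltk.wsd import lesk

# One-time downloads (uncomment on first run):
# import nltk; nltk.download('punkt'); nltk.download('wordnet')

sentence = "I robbed the bank"
sense = lesk(word_tokenize(sentence), "bank")
if sense is not None:
    # The overlap-based heuristic may or may not select the
    # financial-institution sense for this short context.
    print(sense.name(), "-", sense.definition())
```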
Discourse Integration – It concerns the continuity and flow between sentences that occur one after the other. A paragraph delivers meaningful information only if there is continuity between its sentences. Even if each sentence is correct on its own, without a proper flow between them the text cannot convey any useful information. For example, let us analyze the following group of sentences about a particular context.
a. John went to a store to buy his favorite instrument piano.
b. He just arrived on time as the store was closing for the day.
c. He had visited the store many times.
d. He was thrilled with the feeling that he could finally buy a piano.
In the above four sentences there is no connection between them, but if the same sentences are rearranged, they deliver useful and meaningful information, as in:
a) John went to a store to buy his favorite instrument piano.
b) He had visited the store many times.
c) He was thrilled with the feeling that he could finally buy a piano.
d) He just arrived on time as the store was closing for the day.
Knowledge representation in Natural Language Processing
Which representation to use depends on the application, for example machine translation or a database query system. It requires choosing a representational framework as well as a specific meaning vocabulary: the concepts and the relationships between these concepts, i.e. an ontology. The representation must also be computationally effective.
Common representational formalisms:
- first-order predicate logic
- conceptual dependency graphs
- semantic networks
- frame-based representations
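As a loose illustration only, not tied to any single formalism above, a frame-based representation and a small semantic network can be sketched with ordinary Python data structures; all frame, slot and relation names here are invented for the example.

```python
# A hypothetical frame for "John bought a piano at the store".
# Slot names (agent, object, location) are illustrative, not a standard.
buy_event = {
    "frame": "Buy",
    "agent": {"frame": "Person", "name": "John"},
    "object": {"frame": "Instrument", "type": "piano"},
    "location": {"frame": "Store"},
}

# A semantic network sketched as labelled edges between concepts.
semantic_network = [
    ("piano", "is-a", "musical-instrument"),
    ("musical-instrument", "is-a", "physical-object"),
    ("John", "agent-of", "buy-event-1"),
    ("buy-event-1", "object", "piano"),
]

def related(node, edges):
    """Return all (relation, target) pairs leaving a node in the network."""
    return [(rel, dst) for src, rel, dst in edges if src == node]

print(related("piano", semantic_network))  # [('is-a', 'musical-instrument')]
```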
Why is NLP difficult?
Human language is complex in itself. It is made not only of words; mixed with emotions and symbols, it can correlate different pieces of information as a whole. It is a unique signalling system that is difficult to program in its entirety. A single sentence can also carry several pieces of contextual information, which is another challenge in NLP. Moreover, there are countless ways to arrange a sentence, which can deliver different or similar meanings.
Techniques to understand a text
Parsing – It is a method of dividing a sentence into its components and describing their roles. It produces a parse tree showing how the components relate to each other hierarchically, which is helpful for further processing and understanding.
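A minimal parsing sketch with NLTK is shown below; the toy grammar and the example sentence are invented for illustration, while real systems use far larger grammars or statistical parsers.

```python
import nltk

# A toy context-free grammar written for this example only.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | 'John'
VP -> V NP
Det -> 'the'
N  -> 'piano' | 'store'
V  -> 'bought'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['John', 'bought', 'the', 'piano']):
    tree.pretty_print()  # prints the hierarchical parse tree
```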
Stemming – It is a process of reducing a word to its root by removing affixes such as prefixes and suffixes. For example, after stemming, the word “riding” becomes “ride”.
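For instance, the Porter stemmer shipped with NLTK reduces related surface forms to a common stem; a small sketch:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["riding", "rides", "ride"]:
    print(word, "->", stemmer.stem(word))

# All three forms typically reduce to the stem "ride".
```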
Text Segmentation – It is the division of text into words, sentences, topics and so on. This can be difficult for words that can be written in different ways, such as “ice box” and “ice-box”.
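A minimal segmentation sketch with NLTK (assuming the sentence tokenizer data is installed) splits text first into sentences and then into words; the sample text is invented for the example.

```python
import nltk

# nltk.download('punkt')  # one-time download of the sentence tokenizer model

text = "John went to a store. He bought his favorite instrument, a piano."
sentences = nltk.sent_tokenize(text)                 # sentence segmentation
words = [nltk.word_tokenize(s) for s in sentences]   # word segmentation
print(sentences)
print(words)
```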
Author
Anupama Kumari
M.Tech (VLSI Design and Embedded system)
BS Abdur Rahman University