Before getting into what an NGRAM is, we must understand where NGRAMs are used.
NGRAM is a term used in NLP (Natural Language Processing).
What is NLP?
NLP, or Natural Language Processing, is a way for an intelligent system to understand natural language, so that people can interact with it in the language they already speak or write.
It can be described as an amalgam of AI (Artificial Intelligence) and linguistics.
There are two types of languages:
- Natural Language (English, Spanish, Hindi, etc.)
- Artificial Language (Java, Python, C, etc.)
What is NGRAM?
An NGRAM is a contiguous (consecutive) sequence of N items taken from a text; here the items are words.
Order matters: the same words in a different order form a different NGRAM.
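To make the definition concrete, here is a minimal Python sketch that lists the NGRAMs of a sentence. The function name `ngrams` and the simple lowercase-and-split-on-whitespace tokenization are assumptions made for illustration; real tokenizers also handle punctuation and other details.

```python
def ngrams(text, n):
    """Return the n-grams of `text` as tuples of words, in reading order."""
    # Assumption: words are separated by whitespace; case is ignored.
    words = text.lower().split()
    # Slide a window of n words over the text, one position at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "I love natural language processing"
print(ngrams(sentence, 1))  # unigrams: one word each
print(ngrams(sentence, 2))  # bigrams: pairs of neighbouring words
print(ngrams(sentence, 3))  # trigrams: triples of neighbouring words
```

Because each NGRAM is an ordered tuple, ("i", "love") and ("love", "i") are treated as two different NGRAMs.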
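The value of N fixes the size of the prefix, that is, the preceding words used as context: an NGRAM model predicts a word from the N-1 words before it.
The simplest case is the 1GRAM or UNIGRAM, which looks at each word on its own with no prefix. The 2GRAM or BIGRAM uses a prefix of one word and follows the first-order Markov assumption: the next word depends only on the current word.
An NGRAM model is a probabilistic model based on the Markov property that predicts the most probable next word in a given sequence.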
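As a rough sketch of how such a prediction can work, the Python below builds a 2GRAM (bigram) model from a tiny made-up corpus and estimates P(next word | previous word) as count(previous, next) / count(previous followed by anything). The corpus and the function names `train_bigram_model` and `next_word_probabilities` are hypothetical and only meant for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """For every word, count how often each other word directly follows it."""
    follower_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follower_counts[prev][nxt] += 1
    return follower_counts


def next_word_probabilities(model, prev):
    """Estimate P(next | prev) from the follower counts of `prev`."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # `prev` was never seen with a following word
    return {word: count / total for word, count in counts.items()}


# Toy corpus, invented for this example.
toy_corpus = [
    "i like green tea",
    "i like black coffee",
    "i drink green tea",
]
model = train_bigram_model(toy_corpus)
probs = next_word_probabilities(model, "i")
print(probs)                       # probability of each word seen after "i"
print(max(probs, key=probs.get))   # the most probable next word
```

In this toy corpus "i" is followed by "like" twice and by "drink" once, so "like" is the predicted next word.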
Why is NGRAM used?
An NGRAM model is used to predict the most probable word that might follow a given sequence.
Google search suggestions are a commonly cited example of this idea: a search is auto-completed based on the words already entered.
NGRAM models are also widely applied to text prediction in email services, text editors and IDEs.
Limitations of NGRAM
Prediction generally improves as the value of N increases, but the number of possible NGRAMs grows rapidly with N, so the model needs more computation and much more memory (RAM) to store the counts.
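The growth can be illustrated with a small Python sketch. For a vocabulary of V distinct words there are V to the power N possible NGRAMs, and the snippet compares that number with the NGRAMs actually observed in a short made-up text. The text and the helper name `distinct_ngram_count` are assumptions for illustration only.

```python
def distinct_ngram_count(words, n):
    """Number of different n-grams that actually occur in `words`."""
    return len({tuple(words[i:i + n]) for i in range(len(words) - n + 1)})


# Toy text, invented for this example.
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()
vocab_size = len(set(words))

for n in (1, 2, 3):
    observed = distinct_ngram_count(words, n)
    possible = vocab_size ** n  # every combination of n vocabulary words
    print(f"N={n}: observed={observed}, possible={possible}")
```

Even with only 8 distinct words the space of possible NGRAMs reaches 512 at N=3; with a realistic vocabulary of tens of thousands of words the table of counts quickly becomes very large.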
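An NGRAM model is built from the probabilities of words that co-occur in the training corpus. Any word or word sequence that never appears in the corpus is assigned zero probability, unless some form of smoothing is applied. Because most possible word sequences are never observed, NGRAM counts give a sparse representation of language.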
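The zero-probability problem is easy to reproduce. The sketch below uses a plain, unsmoothed 2GRAM estimate, count(previous, word) / count(previous), on a tiny made-up corpus; the corpus and the function name `bigram_probability` are hypothetical.

```python
from collections import Counter

# Toy corpus, invented for this example ("." marks the end of a sentence).
corpus = "i like green tea . i like black coffee .".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))


def bigram_probability(prev, word):
    """Unsmoothed estimate of P(word | prev) from raw counts."""
    if unigram_counts[prev] == 0:
        return 0.0  # the prefix itself was never seen
    return bigram_counts[(prev, word)] / unigram_counts[prev]


print(bigram_probability("like", "green"))  # pair seen in the corpus
print(bigram_probability("like", "tea"))    # both words known, pair never seen
print(bigram_probability("like", "juice"))  # "juice" is new to the corpus
```

Both "like tea" and "like juice" get probability zero, even though they are perfectly reasonable phrases, simply because the corpus never contained them.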
Hope it helps!
Happy Learning 🙂