Tag Archives: Natural-language-processing

What is a corpus? A corpus can be defined as a collection of text documents. It can be thought of as just a bunch of text… Read More
How can we use the Tagged Corpus Reader? Customizing the word tokenizer; customizing the sentence tokenizer; customizing the paragraph block reader; customizing the tag separator; converting tags to a… Read More
What are Chunks? These are made up of words, and the kinds of words are defined using part-of-speech tags. One can even define a pattern… Read More
What are Chunks? Chunks are made up of words, and the kinds of words are defined using part-of-speech tags. One can even define a… Read More
If we have a large amount of text data, we can categorize it into separate sections. Code #1: Categorization # Loading brown… Read More
What is Part-of-speech (POS) tagging? It is the process of converting a sentence into forms: a list of words, a list of tuples (where each tuple… Read More
Path-based Similarity: It is a similarity measure based on the length of the shortest path between two synsets. Leacock-Chodorow (LCH)… Read More
RegexpParser and RegexpChunkRule.fromstring() don't support all the RegexpChunkRule classes, so we need to create them manually. This article focuses on three such classes:… Read More
Natural Language Toolkit (NLTK) is a platform used for building programs for text analysis. We can observe that male and female names have some distinctive… Read More
WordNet is a lexical database (that is, a dictionary) for the English language, designed specifically for natural language processing. A Synset is a special kind of a simple… Read More
Why do we need to train a sentence tokenizer? In NLTK, the default sentence tokenizer is general-purpose and works very well. But… Read More
A single token is referred to as a unigram, for example: hello, movie, coding. This article focuses on the unigram tagger. Unigram Tagger: For… Read More
Chunk extraction, or partial parsing, is the process of extracting short, meaningful phrases from a sentence (tagged with Part-of-Speech). Chunks are made up of words and… Read More
How does Wu & Palmer Similarity work? It calculates relatedness by considering the depths of the two synsets in the WordNet taxonomies, along with the depth… Read More
