How can we use the Tagged Corpus Reader?
- Customizing word tokenizer
- Customizing sentence tokenizer
- Customizing paragraph block reader
- Customizing tag separator
- Converting tags to a universal tagset
Code #1 : Customizing word tokenizer
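A minimal sketch of passing a custom word tokenizer to TaggedCorpusReader through its word_tokenizer argument. The file name brown.pos, its one-sentence contents, and the temporary directory are assumptions made so the example is self-contained. SpaceTokenizer stands in for any tokenizer; it splits on single spaces, whereas the default WhitespaceTokenizer splits on any whitespace.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader
from nltk.tokenize import SpaceTokenizer

# Assumed sample data: one tagged sentence in word/tag format.
SAMPLE = (
    "The/at-tl expense/nn and/cc time/nn involved/vbn "
    "are/ber astronomical/jj ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "brown.pos"), "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # word_tokenizer replaces the default whitespace-based tokenizer.
    reader = TaggedCorpusReader(
        root, r".*\.pos", word_tokenizer=SpaceTokenizer()
    )
    print(reader.words())          # lazy corpus view, shown truncated
    words = list(reader.words())   # materialize before the directory is removed
```

For the one-line sample file, the corpus view prints as: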
['The', 'expense', 'and', 'time', 'involved', 'are', ...]
Code #2 : Customizing sentence tokenizer
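A minimal sketch of swapping the sentence tokenizer through the sent_tokenizer argument. LineTokenizer is used here as one possible choice: it treats each non-blank line as a sentence. The file name and sample text are assumptions for illustration.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader
from nltk.tokenize import LineTokenizer

# Assumed sample data: one tagged sentence on a single line.
SAMPLE = (
    "The/at-tl expense/nn and/cc time/nn involved/vbn "
    "are/ber astronomical/jj ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "brown.pos"), "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # sent_tokenizer decides how each paragraph is cut into sentences.
    reader = TaggedCorpusReader(
        root, r".*\.pos", sent_tokenizer=LineTokenizer()
    )
    print(reader.sents())          # one inner list per sentence
    sents = list(reader.sents())   # materialize before the directory is removed
```

For the one-line sample file, the result is a single sentence: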
[['The', 'expense', 'and', 'time', 'involved', 'are', 'astronomical', '.']]
Customizing paragraph block reader
- By default, paragraphs are assumed to be separated by blank lines.
- The splitting is done by the para_block_reader function, which defaults to nltk.corpus.reader.util.read_blankline_block.
- A number of other block readers are available in nltk.corpus.reader.util; each reads blocks of text from a stream.
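The points above can be sketched as follows. The sample text and the helper read_nonblank_line_block are hypothetical and exist only for this illustration; read_blankline_block and read_line_block come from nltk.corpus.reader.util. The sketch compares the default blank-line splitting with a reader that treats every non-blank line as its own paragraph.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader
from nltk.corpus.reader.util import read_blankline_block, read_line_block

# Assumed sample data: two lines, a blank line, then a third line.
SAMPLE = (
    "The/at expense/nn is/bez high/jj ./.\n"
    "Time/nn is/bez short/jj ./.\n"
    "\n"
    "Costs/nns rise/vb ./.\n"
)


def read_nonblank_line_block(stream):
    """Hypothetical block reader: each non-blank line is one paragraph."""
    return [line for line in read_line_block(stream) if line.strip()]


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "sample.pos"), "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # Default behaviour, passed explicitly here: split on blank lines.
    blank_reader = TaggedCorpusReader(
        root, r".*\.pos", para_block_reader=read_blankline_block
    )
    blank_paras = list(blank_reader.paras())

    # Custom behaviour: one paragraph per non-blank line.
    line_reader = TaggedCorpusReader(
        root, r".*\.pos", para_block_reader=read_nonblank_line_block
    )
    line_paras = list(line_reader.paras())

print(len(blank_paras), "paragraphs with read_blankline_block")
print(len(line_paras), "paragraphs with the line-based reader")
```

A block reader only needs to accept a stream and return a list of text blocks, so any function with that shape can be passed as para_block_reader.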
Customizing tag separator
- If '/' is not used as the word/tag separator, an alternative string can be passed to TaggedCorpusReader as sep.
- The default is sep='/'. To split words and tags on '|', as in 'word|tag', pass sep='|'.
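A minimal sketch of reading a file that uses '|' between word and tag. The file name and contents are assumptions for illustration; note that NLTK upper-cases the tags when it splits each token.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# Assumed sample data: the same sentence, but in word|tag format.
SAMPLE = (
    "The|at-tl expense|nn and|cc time|nn involved|vbn "
    "are|ber astronomical|jj .|.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "pipe.pos"), "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # sep tells the reader which string separates a word from its tag.
    reader = TaggedCorpusReader(root, r".*\.pos", sep="|")
    tagged = list(reader.tagged_words())

print(tagged[:3])
```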
Converting tags to a universal tagset
Tagset : A list of part-of-speech (POS) tags used by one or more corpora.
Universal Tagset : A simplified, condensed tagset composed of only 12 part-of-speech tags.
Code #3 : map corpus tags to the universal tagset
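A minimal sketch, assuming the corpus uses Brown-style tags and that NLTK's universal_tagset mapping data is installed (for example via nltk.download('universal_tagset')). The reader is told its own tagset with tagset='en-brown', and tagged_words is then asked for tagset='universal'. The file name and contents are assumptions; the try/except is only there so the sketch degrades gracefully when the mapping data is missing.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# Assumed sample data: one sentence with Brown-style tags.
SAMPLE = (
    "The/at-tl expense/nn and/cc time/nn involved/vbn "
    "are/ber astronomical/jj ./.\n"
)

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "brown.pos"), "w", encoding="utf-8") as f:
        f.write(SAMPLE)

    # tagset names the corpus's own tagset so NLTK can pick a mapping.
    reader = TaggedCorpusReader(root, r".*\.pos", tagset="en-brown")
    original = list(reader.tagged_words())

    try:
        view = reader.tagged_words(tagset="universal")
        print(view)              # mapping is applied lazily, on read
        universal = list(view)
    except LookupError:
        # Raised when the universal_tagset data has not been downloaded.
        universal = None
        print("universal_tagset data not found")
```

With the mapping data installed, the view prints as: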
[('The', 'DET'), ('expense', 'NOUN'), ('and', 'CONJ'), ...]
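Code #4 : Map corpus tags to the universal tagset
The three outputs show, in order, the corpus's original tags, the same words mapped to the universal tagset, and the result of requesting a tagset with no known mapping, where every tag becomes 'UNK'.
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ...]
[('Pierre', 'NOUN'), ('Vinken', 'NOUN'), (',', '.'), ...]
[('Pierre', 'UNK'), ('Vinken', 'UNK'), (',', 'UNK'), ...]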