If we have a large amount of text data, we can organize it into separate categories.
Code #1 : Categorization
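The category list shown for this example matches the categories of NLTK's Brown corpus, so the sketch below assumes that is the corpus being queried. The helper name `brown_categories` is hypothetical, and the corpus data must already be installed (for instance via `nltk.download('brown')`); otherwise the helper returns None.

```python
from nltk.corpus import brown


def brown_categories():
    """Return the Brown corpus category labels, or None if the data is missing."""
    try:
        # categories() lists every category label defined for the corpus.
        return brown.categories()
    except LookupError:
        # The corpus data is not installed; run nltk.download('brown') once.
        return None


categories = brown_categories()
print(categories)
```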
['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']
How to categorize a corpus?
The easiest way is to keep one file per category. This example uses two excerpts from the movie_reviews corpus, saved as movie_pos.txt and movie_neg.txt.
Using these two files, we'll have two categories: pos and neg.
Code #2 : Let’s categorize
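A minimal sketch of this step, assuming NLTK's `CategorizedPlaintextCorpusReader` is the reader in use. The temporary directory and the placeholder review text are assumptions made so the example runs on its own; they are not the original excerpts.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Assumption: placeholder text stands in for the real review excerpts.
corpus_dir = tempfile.mkdtemp()
samples = {
    'movie_pos.txt': 'a placeholder positive review.',
    'movie_neg.txt': 'a placeholder negative review.',
}
for name, text in samples.items():
    with open(os.path.join(corpus_dir, name), 'w', encoding='utf-8') as f:
        f.write(text)

# The second argument is a regex selecting the corpus files.
# The group captured by cat_pattern becomes each file's category.
reader = CategorizedPlaintextCorpusReader(
    corpus_dir, r'movie_.*\.txt', cat_pattern=r'movie_(\w+)\.txt')

print("Categorize :", reader.categories())
print("Negative field :", reader.fileids(categories=['neg']))
print("Positive field :", reader.fileids(categories=['pos']))
```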
Categorize : ['neg', 'pos']
Negative field : ['movie_neg.txt']
Positive field : ['movie_pos.txt']
Code #3 : Using cat_map instead of cat_pattern
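A minimal sketch of the `cat_map` variant, again assuming `CategorizedPlaintextCorpusReader`. Rather than deriving the category from the file name, `cat_map` takes a dictionary mapping each file ID to a list of categories. The temporary directory and placeholder text are assumptions for the sake of a runnable example.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Assumption: placeholder text stands in for the real review excerpts.
corpus_dir = tempfile.mkdtemp()
for name in ('movie_pos.txt', 'movie_neg.txt'):
    with open(os.path.join(corpus_dir, name), 'w', encoding='utf-8') as f:
        f.write('a placeholder review.')

# Each file ID is mapped explicitly to a list of category labels,
# so one file could belong to several categories if needed.
reader = CategorizedPlaintextCorpusReader(
    corpus_dir, r'movie_.*\.txt',
    cat_map={'movie_pos.txt': ['pos'], 'movie_neg.txt': ['neg']})

print("Categorize :", reader.categories())
```

An explicit map is handy when file names do not encode the category in a way a regular expression can capture.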
Categorize : ['neg', 'pos']