What is a corpus?
A corpus is a collection of text documents. It can be thought of as a set of text files in a directory, often alongside many other directories of text files.
How to create a wordlist corpus?
WordListCorpusReader is one of the simplest CorpusReader classes.
- This class provides access to files that contain a list of words, one word per line.
- The wordlist file can be a CSV file or a txt file with one word on each line. Our wordlist file contains: geeks for geeks welcomes you to nlp articles
- The constructor takes two arguments:
- directory path containing the files
- list of filenames
Code #1 : Creating a wordlist corpus
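A minimal sketch of this step, assuming NLTK is installed. The temporary directory and the file written into it are stand-ins for the Desktop path that appears in the output, so the file id printed on your machine will differ.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader

# Assumption: a temporary directory replaces the article's Desktop folder.
corpus_dir = tempfile.mkdtemp()
sample_words = ["geeks", "for", "geeks", "welcomes", "you", "to", "nlp", "articles"]

# Write the wordlist file, one word per line.
with open(os.path.join(corpus_dir, "wordlist.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(sample_words))

# First argument: directory containing the files; second: list of filenames.
reader = WordListCorpusReader(corpus_dir, ["wordlist.txt"])

words = reader.words()
print(words)
print(reader.fileids())
```

Here the directory is passed as the root and the filename as the file id, so `fileids()` returns the bare filename rather than a full path.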
Output :
['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
['C:\\Users\\dell\\Desktop\\wordlist.txt']
Code #2 : Accessing raw.
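A sketch of reading the raw file contents and splitting them into words, again assuming NLTK is installed and using a temporary file in place of the article's wordlist. `raw()` returns the file as a single string, and `line_tokenize` splits it on line breaks. The line separator in the raw string depends on the platform that wrote the file (`\r\n` on Windows, `\n` elsewhere).

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader
from nltk.tokenize import line_tokenize

# Assumption: a temporary directory replaces the article's Desktop folder.
corpus_dir = tempfile.mkdtemp()
with open(os.path.join(corpus_dir, "wordlist.txt"), "w", encoding="utf-8") as f:
    f.write("geeks\nfor\ngeeks\nwelcomes\nyou\nto\nnlp\narticles")

reader = WordListCorpusReader(corpus_dir, ["wordlist.txt"])

# The whole file as one string, line separators included.
raw_text = reader.raw()
print(repr(raw_text))

# Split the raw string into one entry per line.
wordlist = line_tokenize(raw_text)
print("Wordlist :", wordlist)
```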
Output :
'geeks\r\nfor\r\ngeeks\r\nwelcomes\r\nyou\r\nto\r\nnlp\r\narticles'
Wordlist : ['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
Code #3 : Accessing Name Wordlist corpus
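A sketch of this step using the `names` corpus that ships with NLTK's data packages. It assumes the data has already been fetched once with `nltk.download('names')`. The helper `count_names` is a hypothetical name introduced here for illustration.

```python
from nltk.corpus import names


def count_names(corpus):
    """Return (female_count, male_count) for a corpus with female.txt and male.txt."""
    return len(corpus.words("female.txt")), len(corpus.words("male.txt"))


if __name__ == "__main__":
    try:
        print("Path :", names.fileids())
        female, male = count_names(names)
        print("No. of female names :", female)
        print("No. of male names :", male)
    except LookupError:
        # Raised when the corpus data is not installed locally.
        print("names corpus not found; run nltk.download('names') once first")
```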
Output :
Path : ['female.txt', 'male.txt']
No. of female names : 5001
No. of male names : 2943
Code #4 : Accessing English Wordlist corpus
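A sketch of the same idea for the English `words` corpus, assuming its data has been fetched once with `nltk.download('words')`. The helper `word_counts` is a hypothetical name introduced here. It reports the number of entries in each file of the corpus.

```python
from nltk.corpus import words


def word_counts(corpus):
    """Map each file id in a wordlist-style corpus to its number of entries."""
    return {fileid: len(corpus.words(fileid)) for fileid in corpus.fileids()}


if __name__ == "__main__":
    try:
        print("File :", words.fileids())
        for fileid, count in word_counts(words).items():
            print("No. of words in", fileid, ":", count)
    except LookupError:
        # Raised when the corpus data is not installed locally.
        print("words corpus not found; run nltk.download('words') once first")
```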
Output :
File : ['en', 'en-basic']
No. of basic English words ('en-basic') : 850
No. of English words ('en') : 235886