NLP | Wordlist Corpus
What is a corpus?
A corpus can be defined as a collection of text documents. It can be thought of as just a bunch of text files in a directory, often alongside many other directories of text files.
How to create a wordlist corpus?
- WordListCorpusReader is one of the simplest CorpusReader classes.
- This class provides access to files that contain a list of words, one word per line.
- A wordlist file can be a CSV file or a txt file with one word on each line. In our wordlist file we have added the words: geeks, for, geeks, welcomes, you, to, nlp, articles.
- It takes two arguments:
  - the directory path containing the files
  - a list of filenames
Code #1 : Creating a wordlist corpus
Output :
['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
['C:\\Users\\dell\\Desktop\\wordlist.txt']
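A minimal sketch of how output like the above could be produced, assuming NLTK is installed. The temporary directory, the filename `wordlist.txt`, and the variable names are illustrative assumptions; the output above came from a file on a Windows desktop, so the file id printed here will differ.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader

# Sample words taken from the article; the file location is an assumption.
sample_words = ["geeks", "for", "geeks", "welcomes", "you", "to", "nlp", "articles"]

with tempfile.TemporaryDirectory() as corpus_dir:
    # Write the wordlist file with one word per line.
    with open(os.path.join(corpus_dir, "wordlist.txt"), "w", encoding="utf8") as f:
        f.write("\n".join(sample_words))

    # First argument: directory containing the files.
    # Second argument: list of filenames inside that directory.
    reader = WordListCorpusReader(corpus_dir, ["wordlist.txt"])

    loaded_words = reader.words()   # every word from every file
    file_ids = reader.fileids()     # the filenames the reader knows about

print(loaded_words)
print(file_ids)
```

The reader is used inside the `with` block because the temporary directory is deleted when the block exits.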
Code #2 : Accessing raw.
Output :
'geeks\r\nfor\r\ngeeks\r\nwelcomes\r\nyou\r\nto\r\nnlp\r\narticles'
Wordlist : ['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
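As a rough sketch, the raw text and the tokenized wordlist could be obtained as follows, assuming NLTK is installed. `raw()` returns the file contents as a single string, and `line_tokenize()` from `nltk.tokenize` splits that string into lines. The temporary file and variable names are illustrative assumptions; the `\r\n` separators in the output above reflect a file with Windows line endings, so the raw string may look slightly different on other systems.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader
from nltk.tokenize import line_tokenize

# Sample words taken from the article; the file location is an assumption.
sample_words = ["geeks", "for", "geeks", "welcomes", "you", "to", "nlp", "articles"]

with tempfile.TemporaryDirectory() as corpus_dir:
    with open(os.path.join(corpus_dir, "wordlist.txt"), "w", encoding="utf8") as f:
        f.write("\n".join(sample_words))

    reader = WordListCorpusReader(corpus_dir, ["wordlist.txt"])

    raw_text = reader.raw()               # whole file as one string
    reader_words = reader.words()         # the reader's own word list

# Splitting the raw text by line gives the same words as reader.words().
tokenized = line_tokenize(raw_text)

print(repr(raw_text))
print("Wordlist :", tokenized)
```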
Code #3 : Accessing the Names wordlist corpus
Output :
Path : ['female.txt', 'male.txt']
No. of female names : 5001
No. of male names : 2943
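A sketch of how counts like these could be read from NLTK's bundled `names` corpus, which is itself exposed as a wordlist corpus. It assumes NLTK is installed and that the corpus data has been fetched once with `nltk.download('names')`; the helper name `name_counts` is hypothetical, and the fallback branch only exists so the sketch still runs when the data is missing.

```python
from nltk.corpus import names


def name_counts():
    """Return {fileid: number of names}, or None if the corpus data is missing."""
    try:
        return {fileid: len(names.words(fileid)) for fileid in names.fileids()}
    except LookupError:
        # NLTK raises LookupError when the corpus has not been downloaded.
        return None


counts = name_counts()
if counts is None:
    print("names corpus not found; run nltk.download('names') once")
else:
    print("Path :", list(counts))
    print("No. of female names :", counts["female.txt"])
    print("No. of male names :", counts["male.txt"])
```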
Code #4 : Accessing the English wordlist corpus
Output :
File : ['en', 'en-basic']
No. of words in 'en-basic' : 850
No. of words in 'en' : 235886
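The English wordlist can be read in the same way through NLTK's bundled `words` corpus. The sketch below assumes NLTK is installed and that the data has been fetched once with `nltk.download('words')`; the helper name `english_word_counts` is hypothetical. Note that the two numbers are word counts per file, not name counts.

```python
from nltk.corpus import words


def english_word_counts():
    """Return {fileid: number of words}, or None if the corpus data is missing."""
    try:
        return {fileid: len(words.words(fileid)) for fileid in words.fileids()}
    except LookupError:
        # NLTK raises LookupError when the corpus has not been downloaded.
        return None


word_counts = english_word_counts()
if word_counts is None:
    print("words corpus not found; run nltk.download('words') once")
else:
    print("File :", list(word_counts))
    print("No. of words in 'en-basic' :", word_counts["en-basic"])
    print("No. of words in 'en' :", word_counts["en"])
```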