Regular expression matching can be used to tag words. For example, numbers can be matched with \d to assign the tag CD (which refers to a cardinal number). One can also match known word patterns, such as the suffix "ing".
Understanding the concept:
RegexpTagger is a subclass of SequentialBackoffTagger. It can be positioned before a DefaultTagger class so as to tag words that the n-gram tagger(s) missed, and thus can be a useful part of a backoff chain.
- At initialization, the patterns are saved in the tagger. When choose_tag() is then called, it iterates over the patterns and returns the tag of the first expression that matches the current word using re.match().
- So, if two expressions both match a word, the tag of the first one is returned without even trying the second expression.
- If a catch-all pattern such as (r'.*', 'NN') is given, the RegexpTagger class can replace the DefaultTagger class.
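The first-match rule described above can be sketched with the standard re module alone. This is only an illustration of the idea, not NLTK's actual source; the helper name first_matching_tag and the two patterns are assumptions made for the example.

```python
import re

def first_matching_tag(patterns, word):
    """Hypothetical helper: return the tag of the first pattern matching word."""
    for regexp, tag in patterns:
        # re.match() anchors at the start of the word
        if re.match(regexp, word):
            return tag
    # No pattern matched
    return None

# Illustrative patterns: a suffix rule followed by a catch-all
patterns = [
    (r'.*ing$', 'VBG'),
    (r'.*', 'NN'),
]

# 'running' matches both patterns, but the first one wins
print(first_matching_tag(patterns, 'running'))
# 'table' only matches the catch-all pattern
print(first_matching_tag(patterns, 'table'))
```

Because the catch-all pattern is placed last, it only fires when nothing more specific matched, which is the same role a DefaultTagger plays at the end of a backoff chain.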
Code #1 : Python regular expression module and re syntax
RegexpTagger class expects a list of 2-tuples:
-> first element in the tuple is a regular expression
-> second element is the tag
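To illustrate this input format, here is a minimal sketch that builds a small pattern list and tags a few words. The patterns and the sample words are assumptions chosen for the example (they are not necessarily the ones used to obtain the accuracy reported in this article), and no corpus is loaded.

```python
from nltk.tag import DefaultTagger, RegexpTagger

# Illustrative (regular expression, tag) pairs
patterns = [
    (r'^\d+$', 'CD'),    # whole word made of digits -> cardinal number
    (r'.*ing$', 'VBG'),  # words ending in "ing"
    (r'.*ment$', 'NN'),  # words ending in "ment"
    (r'.*ful$', 'JJ'),   # words ending in "ful"
]

words = ['42', 'running', 'payment', 'cat']

# On its own, the tagger returns None for words that match no pattern
regexp_tagger = RegexpTagger(patterns)
print(regexp_tagger.tag(words))

# With a DefaultTagger as backoff, unmatched words fall through to 'NN'
chained_tagger = RegexpTagger(patterns, backoff=DefaultTagger('NN'))
print(chained_tagger.tag(words))
```

The second tagger shows how RegexpTagger fits into a backoff chain: it handles the words its patterns recognize and passes everything else on.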
Code #2 : Using RegexpTagger
Accuracy : 0.037470321605870924
What is Affix tagging ?
AffixTagger is a subclass of ContextTagger. In the case of the AffixTagger class, the context is either the suffix or the prefix of a word. This means the class can learn tags based on fixed-length substrings at the beginning or end of a word.
By default, it uses three-character suffixes. Words must be at least five characters long, and None is returned as the tag if a word has fewer than five characters.
Code #3 : Understanding AffixTagger.
Train data : [('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'), ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','), ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'), ('group', 'NN'), ('.', '.')]
Accuracy : 0.27558817181092166
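As a minimal sketch of the default behaviour, the tagger below is trained only on the single sentence shown in the train data above, so it does not reproduce the reported accuracy. The words being tagged are assumptions picked to show the three-character suffix rule and the five-character minimum length.

```python
from nltk.tag import AffixTagger

# One tagged sentence, used here as a tiny training set
train_sents = [[
    ('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'),
    ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','),
    ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'),
    ('group', 'NN'), ('.', '.'),
]]

# Defaults: three-character suffixes (affix_length=-3)
suffix_tagger = AffixTagger(train_sents)

# 'running' shares the suffix "ing" with 'publishing',
# 'Batman' shares the suffix "man" with 'chairman',
# 'cat' is shorter than five characters, so it gets None
suffix_result = suffix_tagger.tag(['running', 'Batman', 'cat'])
print(suffix_result)
```

With such a small training set, the tagger knows only a handful of suffixes; any word whose suffix was never seen is also tagged None.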
Code #4 : AffixTagger by specifying 3 character prefixes.
Accuracy : 0.23587308439456076
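A positive affix_length makes AffixTagger learn prefixes instead of suffixes. The sketch below assumes a one-sentence training set and a few illustrative test words; it is meant to show the mechanics only and is not expected to match the accuracy figure above.

```python
from nltk.tag import AffixTagger

# Tiny illustrative training set (a single tagged sentence)
train_sents = [[
    ('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'),
    ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','),
    ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'),
    ('group', 'NN'), ('.', '.'),
]]

# affix_length=3 -> use the first three characters of each word
prefix_tagger = AffixTagger(train_sents, affix_length=3)

# 'public' shares the prefix "pub" with 'publishing',
# 'chairs' shares the prefix "cha" with 'chairman',
# 'dog' is too short to be given a context
prefix_result = prefix_tagger.tag(['public', 'chairs', 'dog'])
print(prefix_result)
```

Note that 'public' inherits the tag of 'publishing' purely because of its first three letters, which hints at why prefixes can be a weaker cue than suffixes for English part-of-speech tags.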
Code #5 : AffixTagger by specifying 2-character suffixes
Accuracy : 0.31940427368875457
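For two-character suffixes, affix_length is set to -2. The following is a minimal sketch under the same assumptions as before (a one-sentence training set and made-up test words), so it shows how the parameter works rather than reproducing the accuracy above.

```python
from nltk.tag import AffixTagger

# Tiny illustrative training set (a single tagged sentence)
train_sents = [[
    ('Mr.', 'NNP'), ('Vinken', 'NNP'), ('is', 'VBZ'), ('chairman', 'NN'),
    ('of', 'IN'), ('Elsevier', 'NNP'), ('N.V.', 'NNP'), (',', ','),
    ('the', 'DT'), ('Dutch', 'NNP'), ('publishing', 'VBG'),
    ('group', 'NN'), ('.', '.'),
]]

# affix_length=-2 -> use the last two characters of each word
short_suffix_tagger = AffixTagger(train_sents, affix_length=-2)

# With the default min_stem_length of 2, words need at least
# 2 + 2 = 4 characters, so 'sing' is tagged but 'cat' is not
short_suffix_result = short_suffix_tagger.tag(['sing', 'woman', 'cat'])
print(short_suffix_result)
```

Shorter suffixes cover more words, since the minimum word length drops from five to four characters, at the cost of each suffix being a less specific cue.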