
NLP | Storing Frequency Distribution in Redis

The nltk.probability.FreqDist class is used by many classes throughout NLTK to store and manage frequency distributions. It is useful, but it lives entirely in memory and provides no way to persist its data. A single FreqDist also cannot be shared between multiple processes. Both limitations can be addressed by building a FreqDist on top of Redis.
What is Redis?

Installation :



How does it work?

Code : Explaining how it works




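from rediscollections import RedisHashMap

class RedisHashFreqDist(RedisHashMap):
    """Frequency distribution stored in a Redis hash (sample -> count)."""

    def N(self):
        # Total number of sample outcomes: the sum of all stored counts.
        return int(sum(self.values()))

    def __missing__(self, key):
        # A sample that was never counted has a count of 0.
        return 0

    def __getitem__(self, key):
        # Redis returns stored values as strings/bytes (nothing if absent),
        # so coerce to int and fall back to 0.
        return int(RedisHashMap.__getitem__(self, key) or 0)

    def values(self):
        return [int(v) for v in RedisHashMap.values(self)]

    def items(self):
        return [(k, int(v)) for (k, v) in RedisHashMap.items(self)]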

This class can be used just like a FreqDist. To instantiate it, pass a Redis connection and the name of our hash map. The name should be a unique reference to this particular FreqDist so that it doesn’t clash with any other keys in Redis.



Code:




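from redis import Redis
from redisprob import RedisHashFreqDist  # the class above, saved as redisprob.py

r = Redis()  # connects to localhost:6379 by default
rhfd = RedisHashFreqDist(r, 'test')
print(len(rhfd))      # number of samples stored so far

rhfd['foo'] += 1      # read the current count, then write it back plus one
print(rhfd['foo'])

rhfd.items()          # list of (sample, count) pairs; not printed here
print(len(rhfd))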

Output :

0
1
1
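Because rhfd['foo'] += 1 is a read followed by a separate write, two processes updating the same sample at the same moment could overwrite each other's counts. Redis provides the HINCRBY command, which increments a hash field atomically on the server and creates the field if it is missing. The sketch below shows that idea in isolation. The helper count_samples and the FakeCounterConnection class are hypothetical names introduced here for illustration; the fake connection only exists so the example runs without a Redis server, and with redis-py a Redis() object could be passed in its place, since it offers an hincrby(name, key, amount) method.

```python
class FakeCounterConnection:
    """Hypothetical in-memory stand-in for redis.Redis (two hash commands only).

    A real server returns stored values as bytes/strings; this stub keeps ints.
    """

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        # Mimics HINCRBY: a missing field starts at 0 before incrementing.
        fields = self._hashes.setdefault(name, {})
        fields[key] = fields.get(key, 0) + amount
        return fields[key]

    def hget(self, name, key):
        # Mimics HGET: returns None when the field does not exist.
        return self._hashes.get(name, {}).get(key)


def count_samples(conn, name, samples):
    """Add one to the stored count of each sample with a single command each."""
    for sample in samples:
        conn.hincrby(name, sample, 1)


conn = FakeCounterConnection()
count_samples(conn, 'test', ['foo', 'bar', 'foo'])
print(conn.hget('test', 'foo'))
```

Against a real server, each increment is then one command, so concurrent writers do not lose updates to one another.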

Most of the work is done in the RedisHashMap class, which extends collections.abc.MutableMapping (collections.MutableMapping in Python 2) and overrides every method that needs a Redis-specific command. Outline of each method that uses a specific Redis command:
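One way such a mapping could be laid out is sketched below. It is not the original rediscollections module: the choice of Redis hash command for each method is an assumption based on the commands redis-py exposes (hlen, hexists, hget, hset, hdel, hkeys, delete), and FakeHashConnection is a hypothetical in-memory stand-in so the example runs without a server.

```python
from collections.abc import MutableMapping


class FakeHashConnection:
    """Hypothetical stand-in for redis.Redis; implements only what is used below.

    A real server returns bytes by default; this stub stores plain strings.
    """

    def __init__(self):
        self._hashes = {}

    def hlen(self, name):
        return len(self._hashes.get(name, {}))

    def hexists(self, name, key):
        return key in self._hashes.get(name, {})

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        self._hashes.setdefault(name, {})[key] = str(value)

    def hdel(self, name, key):
        # Returns the number of fields removed, like HDEL.
        removed = self._hashes.get(name, {}).pop(key, None)
        return 0 if removed is None else 1

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))

    def delete(self, name):
        self._hashes.pop(name, None)


class RedisHashMap(MutableMapping):
    """Dict-like view of a single Redis hash (an illustrative sketch)."""

    def __init__(self, r, name):
        self._r = r
        self._name = name

    def __len__(self):
        return self._r.hlen(self._name)            # HLEN

    def __contains__(self, key):
        return self._r.hexists(self._name, key)    # HEXISTS

    def __getitem__(self, key):
        return self._r.hget(self._name, key)       # HGET (None if absent)

    def __setitem__(self, key, value):
        self._r.hset(self._name, key, value)       # HSET

    def __delitem__(self, key):
        if not self._r.hdel(self._name, key):      # HDEL
            raise KeyError(key)

    def __iter__(self):
        return iter(self._r.hkeys(self._name))     # HKEYS

    def clear(self):
        self._r.delete(self._name)                 # DEL removes the whole hash


m = RedisHashMap(FakeHashConnection(), 'test')
m['foo'] = 1
print(len(m), 'foo' in m, m['foo'])
```

Returning None for a missing key in __getitem__, rather than raising KeyError, is a design choice made here so that it matches the `or 0` fallback used in RedisHashFreqDist.__getitem__.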

