NLP | Storing Frequency Distribution in Redis

The nltk.probability.FreqDist class is used in many classes throughout NLTK for storing and managing frequency distributions. It’s quite useful, but it’s all in-memory, and doesn’t provide a way to persist the data. A single FreqDist is also not accessible to multiple processes. All that can be changed by building a FreqDist on top of Redis.
What is Redis?

  • Redis is a data structure server that is one of the more popular NoSQL databases.
  • Among other things, it provides a network-accessible database for storing dictionaries (also known as hash maps).
  • Building a FreqDist interface to a Redis hash map will allow us to create a persistent FreqDist that is accessible to multiple local and remote processes at the same time.

Installation:

  • Install both Redis and redis-py. The Redis website is at http://redis.io/ and includes many documentation resources.
  • Hash maps require Redis 2.0.0 or later. The latest version at the time of this writing is 2.8.9.
  • The Redis Python driver, redis-py, can be installed using pip install redis or easy_install redis. The latest version at this time is 2.9.1.
  • The redis-py home page is at http://github.com/andymccurdy/redis-py/.
  • Once both are installed and a redis-server process is running, you’re ready to go. Let’s assume redis-server is running on localhost on port 6379 (the default host and port).

How does it work?

  • The FreqDist class extends the standard library collections.Counter class, which makes a FreqDist a small wrapper with a few extra methods, such as N().
  • The N() method returns the number of sample outcomes, which is the sum of all the values in the frequency distribution.
  • An API-compatible class is created on top of Redis by extending a RedisHashMap and then implementing the N() method.
  • The RedisHashFreqDist (defined in redisprob.py) sums all the values in the hash map for the N() method.
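
To make the role of N() concrete, here is a minimal in-memory sketch that uses only collections.Counter. The class name TinyFreqDist is hypothetical and stands in for nltk.probability.FreqDist; it is not NLTK's actual implementation, only an illustration of "N() is the sum of all counts".

```python
from collections import Counter


class TinyFreqDist(Counter):
    """Hypothetical stand-in for FreqDist: a Counter plus an N() method."""

    def N(self):
        # Total number of sample outcomes = sum of all stored counts.
        return sum(self.values())


fd = TinyFreqDist("the quick the lazy the".split())
print(fd["the"])      # count for a seen sample
print(fd["missing"])  # Counter returns 0 for unseen samples
print(fd.N())         # total number of outcomes
```

The Redis-backed class described above keeps this same interface but reads the counts from a Redis hash map instead of process memory.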

Code: The RedisHashFreqDist class

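from rediscollections import RedisHashMap

class RedisHashFreqDist(RedisHashMap):
    def N(self):
        # Total number of sample outcomes: the sum of all counts in the hash map.
        return int(sum(self.values()))

    def __missing__(self, key):
        # Unseen samples have a count of 0.
        return 0

    def __getitem__(self, key):
        # Fall back to 0 when the parent lookup returns nothing for the key.
        return int(RedisHashMap.__getitem__(self, key) or 0)

    def values(self):
        # Redis stores hash values as strings, so convert them back to int.
        return [int(v) for v in RedisHashMap.values(self)]

    def items(self):
        return [(k, int(v)) for (k, v) in RedisHashMap.items(self)]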

This class can be used just like a FreqDist. To instantiate it, pass a Redis connection and the name of our hash map. The name should be a unique reference to this particular FreqDist so that it doesn’t clash with any other keys in Redis.



Code:

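from redis import Redis
from redisprob import RedisHashFreqDist

# Redis() connects to localhost on port 6379 by default.
r = Redis()
rhfd = RedisHashFreqDist(r, 'test')
print(len(rhfd))    # number of samples stored so far

rhfd['foo'] += 1    # reads the current count, then writes it back incremented
print(rhfd['foo'])

rhfd.items()        # list of (sample, count) pairs
print(len(rhfd))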


Output:

0
1
1

Most of the work is done in the RedisHashMap class, which extends collections.abc.MutableMapping (collections.MutableMapping in Python 2) and then overrides all methods that require Redis-specific commands. Here is an outline of each method and the Redis command it uses:

  • __len__(): This uses the hlen command to get the number of elements in the hash map
  • __contains__(): This uses the hexists command to check if an element exists in the hash map
  • __getitem__(): This uses the hget command to get a value from the hash map
  • __setitem__(): This uses the hset command to set a value in the hash map
  • __delitem__(): This uses the hdel command to remove a value from the hash map
  • keys(): This uses the hkeys command to get all the keys in the hash map
  • values(): This uses the hvals command to get all the values in the hash map
  • items(): This uses the hgetall command to get a dictionary containing all the keys and values in the hash map
  • clear(): This uses the delete command to remove the entire hash map from Redis
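
The article does not show rediscollections.py itself, so the following is only a sketch of how such a RedisHashMap might look, reconstructed from the method list above. To keep it runnable without a server, it uses FakeRedis, a hypothetical in-memory stand-in that mimics just the redis-py hash methods involved (hlen, hexists, hget, hset, hdel, hkeys, hvals, hgetall, delete). With a real redis.Redis client in its place, values would typically come back as bytes rather than str.

```python
from collections.abc import MutableMapping


class FakeRedis:
    """Hypothetical in-memory stand-in for the redis-py hash methods used below."""

    def __init__(self):
        self._hashes = {}

    def hlen(self, name):
        return len(self._hashes.get(name, {}))

    def hexists(self, name, key):
        return key in self._hashes.get(name, {})

    def hget(self, name, key):
        # Like redis-py, return None when the field does not exist.
        return self._hashes.get(name, {}).get(key)

    def hset(self, name, key, value):
        # Redis stores hash values as strings.
        self._hashes.setdefault(name, {})[key] = str(value)

    def hdel(self, name, *keys):
        for key in keys:
            self._hashes.get(name, {}).pop(key, None)

    def hkeys(self, name):
        return list(self._hashes.get(name, {}))

    def hvals(self, name):
        return list(self._hashes.get(name, {}).values())

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))

    def delete(self, *names):
        for name in names:
            self._hashes.pop(name, None)


class RedisHashMap(MutableMapping):
    """Sketch of a dict-like view over one Redis hash (an assumption, not the original source)."""

    def __init__(self, r, name):
        self._r = r
        self._name = name

    def __len__(self):
        return self._r.hlen(self._name)

    def __contains__(self, key):
        return self._r.hexists(self._name, key)

    def __getitem__(self, key):
        # Returns None for a missing key instead of raising KeyError.
        return self._r.hget(self._name, key)

    def __setitem__(self, key, value):
        self._r.hset(self._name, key, value)

    def __delitem__(self, key):
        self._r.hdel(self._name, key)

    def __iter__(self):
        return iter(self.keys())

    def keys(self):
        return self._r.hkeys(self._name)

    def values(self):
        return self._r.hvals(self._name)

    def items(self):
        return list(self._r.hgetall(self._name).items())

    def clear(self):
        self._r.delete(self._name)


hm = RedisHashMap(FakeRedis(), 'demo')
hm['foo'] = 1
hm['bar'] = 2
print(len(hm), 'foo' in hm, hm['foo'])
```

Returning None for a missing key is a deliberate choice in this sketch: it is what lets a subclass such as RedisHashFreqDist write `RedisHashMap.__getitem__(self, key) or 0` to treat unseen samples as having a count of zero.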
