NLP | Storing Conditional Frequency Distribution in Redis

Last Updated : 12 Jun, 2019

The nltk.probability.ConditionalFreqDist class is a container for FreqDist instances, with one FreqDist per condition. It counts frequencies that depend on a condition, such as a preceding word or a class label. Here it serves as the base for an API-compatible class built on top of Redis, with a RedisHashFreqDist storing the counts for each condition.
The RedisConditionalHashFreqDist class given below extends nltk.probability.ConditionalFreqDist and overrides the __getitem__() method so that each new condition gets a RedisHashFreqDist instance instead of a FreqDist.
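
To make the idea of "one frequency distribution per condition" concrete, here is a minimal sketch in plain Python. It is a stand-in built from collections.Counter and defaultdict, not NLTK's implementation, and the (condition, sample) pairs are made up for illustration.

```python
from collections import Counter, defaultdict

# Stand-in for a conditional frequency distribution: one Counter per condition.
# This mimics the indexing pattern only; it is not NLTK's implementation.
cfd = defaultdict(Counter)

# Hypothetical (condition, sample) pairs: a class label and a word.
pairs = [("pos", "great"), ("pos", "great"), ("pos", "fun"), ("neg", "dull")]
for condition, sample in pairs:
    cfd[condition][sample] += 1

# The conditions seen so far, and the total number of counted samples.
conditions = sorted(cfd)
total = sum(sum(counter.values()) for counter in cfd.values())

print(conditions)
print(cfd["pos"]["great"])
print(total)
```

Indexing first by condition and then by sample is the same access pattern that the Redis-backed class keeps.
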

Code :




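from nltk.probability import ConditionalFreqDist
from rediscollections import encode_key
# RedisHashFreqDist is assumed to be defined in this same module (redisprob).

class RedisConditionalHashFreqDist(ConditionalFreqDist):
    def __init__(self, r, name, cond_samples=None):
        # Set these first: the base constructor may call self[condition].
        self._r = r
        self._name = name
        ConditionalFreqDist.__init__(self, cond_samples)

        # Load every condition that already has a hash under this base name.
        for key in self._r.keys(encode_key('%s:*' % name)):
            # redis-py returns bytes unless decode_responses=True is set.
            if isinstance(key, bytes):
                key = key.decode('utf-8')
            condition = key.split(':', 1)[1]
            # calls self.__getitem__(condition)
            self[condition]

    def __getitem__(self, condition):
        # Test membership on self: with NLTK 3, ConditionalFreqDist is itself
        # the mapping, so there is no separate _fdists attribute.
        if condition not in self:
            key = '%s:%s' % (self._name, condition)
            val = RedisHashFreqDist(self._r, key)
            super(RedisConditionalHashFreqDist, self).__setitem__(
                    condition, val)
        return super(
                RedisConditionalHashFreqDist, self).__getitem__(condition)

    def clear(self):
        # Empty each Redis-backed distribution; the conditions stay registered.
        for fdist in self.values():
            fdist.clear()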


An instance of this class is created by passing in a Redis connection and a base name. After that, it works just like a ConditionalFreqDist, as shown in the code below:
Code :




from redis import Redis
from redisprob import RedisConditionalHashFreqDist

r = Redis()
rchfd = RedisConditionalHashFreqDist(r, 'condhash')

print(rchfd.N())

print(rchfd.conditions())

rchfd['cond1']['foo'] += 1

print(rchfd.N())

print(rchfd['cond1']['foo'])

print(rchfd.conditions())

rchfd.clear()



Output :

0
[]
1
1
['cond1']

How it works?

  • The RedisConditionalHashFreqDist uses name prefixes to reference RedisHashFreqDist instances.
  • The name passed into the RedisConditionalHashFreqDist is a base name that is combined with each condition to create a unique name for each RedisHashFreqDist.
  • For example, if the base name of the RedisConditionalHashFreqDist is ‘condhash’, and the condition is ‘cond1’, then the final name for the RedisHashFreqDist is ‘condhash:cond1’.
  • This naming pattern is used at initialization to find all the existing hash maps using the keys command.
  • By searching for all keys matching ‘condhash:*’, the constructor can identify all the existing conditions and create an instance of RedisHashFreqDist for each.
  • Combining strings with colons is a common naming convention for Redis keys as a way to define namespaces.
  • Each RedisConditionalHashFreqDist instance defines a single namespace of hash maps.
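
The naming scheme described above can be sketched without a running Redis server. In the example below, make_key and conditions_from_keys are hypothetical helper names, the key listing is invented, and fnmatch stands in for the glob-style matching that the Redis keys command performs, so treat it as an approximation of the lookup rather than the class's actual code.

```python
from fnmatch import fnmatchcase


def make_key(base_name, condition):
    """Join the base name and a condition with a colon, e.g. 'condhash:cond1'."""
    return "%s:%s" % (base_name, condition)


def conditions_from_keys(keys, base_name):
    """Recover condition names from keys that match '<base_name>:*'."""
    pattern = "%s:*" % base_name
    found = []
    for key in keys:
        # redis-py returns bytes unless the client uses decode_responses=True.
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        if fnmatchcase(key, pattern):
            # Split on the first colon only, so the condition is kept whole.
            found.append(key.split(":", 1)[1])
    return sorted(found)


# Hypothetical key listing, standing in for the result of the keys command.
existing_keys = [b"condhash:cond1", "condhash:cond2", "otherhash:cond1"]

print(make_key("condhash", "cond1"))
print(conditions_from_keys(existing_keys, "condhash"))
```

Keys under a different base name, such as ‘otherhash:cond1’ here, fall outside the namespace and are ignored.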

RedisConditionalHashFreqDist also defines a clear() method. This is a helper method that calls clear() on every internal RedisHashFreqDist instance, so the counts stored in Redis are removed. ConditionalFreqDist does not define a clear() method of its own.
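
The effect of clear() can be tried out with in-memory stand-ins. FakeHashFreqDist and FakeConditionalFreqDist below are hypothetical classes written for this sketch: a plain dict plays the role of the Redis server, and only the lazy __getitem__() and the clear() helper are modeled.

```python
class FakeHashFreqDist:
    """Hypothetical stand-in for RedisHashFreqDist, backed by a shared dict."""

    def __init__(self, store, key):
        self._store = store  # plays the role of the Redis server
        self._key = key

    def __getitem__(self, sample):
        # Samples that were never counted report a count of 0.
        return self._store.get(self._key, {}).get(sample, 0)

    def __setitem__(self, sample, count):
        self._store.setdefault(self._key, {})[sample] = count

    def clear(self):
        # Drop the whole hash, as deleting a Redis key would.
        self._store.pop(self._key, None)


class FakeConditionalFreqDist(dict):
    """Hypothetical stand-in for RedisConditionalHashFreqDist."""

    def __init__(self, store, name):
        super().__init__()
        self._store = store
        self._name = name

    def __getitem__(self, condition):
        # Create the per-condition distribution on first access.
        if condition not in self:
            key = "%s:%s" % (self._name, condition)
            super().__setitem__(condition, FakeHashFreqDist(self._store, key))
        return super().__getitem__(condition)

    def clear(self):
        # Clear each distribution; the conditions themselves stay registered.
        for fdist in self.values():
            fdist.clear()


store = {}
cfd = FakeConditionalFreqDist(store, "condhash")
cfd["cond1"]["foo"] += 1
cfd["cond2"]["bar"] += 1

keys_before = sorted(store)
cfd.clear()
keys_after = sorted(store)

print(keys_before)
print(keys_after)
```

In this sketch the stored hashes disappear after clear(), while the object still lists both conditions, each now reporting zero counts.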
