Scraping and Finding Ordered Words in a Dictionary Using Python

What are ordered words?

An ordered word is a word in which the letters appear in alphabetical order, for example abbey and dirt. All other words are unordered, for example geeks.
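The definition can be checked directly: a word is ordered exactly when sorting its letters leaves it unchanged. Below is a minimal sketch of that idea. The function name is_ordered is illustrative and not taken from the article's program, and the check compares characters by code point, so it assumes lowercase words like the examples above.

```python
def is_ordered(word: str) -> bool:
    # A word is ordered when sorting its letters leaves it unchanged,
    # i.e. every letter is >= the one before it (by code point).
    return word == "".join(sorted(word))


for w in ["abbey", "dirt", "geeks"]:
    print(w, is_ordered(w))  # abbey True, dirt True, geeks False
```

Sorting costs O(n log n) per word. A single left-to-right pass over adjacent pairs does the same job in linear time and can stop at the first out-of-order pair.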

The task at hand

This task is taken from Rosetta Code, and it is not as mundane as the description above makes it sound. To get a large number of words, we will use an online dictionary available at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then performing file-handling operations on it.

Requirements:

pip install requests

Code

The approach is to traverse each word and compare the ASCII values of adjacent characters, one pair at a time. If any pair is out of order, the word is unordered; if no such pair is found, the word is ordered.
The task is divided into two parts:
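A minimal sketch of this pairwise check, assuming single words as input (the function name is_ordered is hypothetical). It compares ord() values as the article describes; for single characters this is equivalent to comparing the characters directly with >.

```python
def is_ordered(word: str) -> bool:
    # Compare each character with its right-hand neighbour by code point.
    for i in range(len(word) - 1):
        if ord(word[i]) > ord(word[i + 1]):
            return False  # first out-of-order pair found: stop early
    return True  # no pair was out of order


print(is_ordered("abbey"))  # True
print(is_ordered("geeks"))  # False
```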
Scraping

  1. Fetch the data from the given URL using the Python library requests
  2. Store the raw content fetched from the URL
  3. Decode the content, which on the web is usually UTF-8 encoded
  4. Split the long decoded string into a list of words
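The four scraping steps can be sketched as follows. This is a minimal sketch, not the article's code: the names words_from_bytes and fetch_words are hypothetical, and the timeout=10 argument and the raise_for_status() call are additions the original does not use. Decoding is kept in its own function so it can be tried without a network connection, and requests is imported inside fetch_words for the same reason.

```python
from typing import List


def words_from_bytes(raw: bytes) -> List[str]:
    # Steps 3-4: decode the UTF-8 bytes, then split on any whitespace
    # (newlines included) to get a list of words.
    return raw.decode("utf-8").split()


def fetch_words(url: str) -> List[str]:
    # Imported here so words_from_bytes works without requests installed.
    import requests

    response = requests.get(url, timeout=10)  # step 1: fetch the data
    response.raise_for_status()  # fail loudly on HTTP errors
    return words_from_bytes(response.content)  # step 2: raw bytes in .content


# Offline usage with a stand-in for the downloaded content:
print(words_from_bytes(b"abbey\ndirt\ngeeks\n"))  # ['abbey', 'dirt', 'geeks']
```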

Finding the ordered words

  1. Traverse the list of words
  2. Compare the ASCII values of every pair of adjacent characters in each word
  3. Mark the word as unordered as soon as a pair is out of order
  4. Otherwise, print the ordered word
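The four checking steps can be sketched as one function that works on any list of words, so it can be tried on a small sample without scraping. The name find_ordered_words and the min_length parameter are hypothetical; min_length=3 mirrors the original program, which skips 1- and 2-letter strings. Unlike step 4, the sketch returns the ordered words and leaves printing to the caller.

```python
from typing import Iterable, List


def find_ordered_words(words: Iterable[str], min_length: int = 3) -> List[str]:
    ordered = []
    for word in words:  # step 1: traverse the list of words
        if len(word) < min_length:
            continue  # skip very short strings, as the original does
        # steps 2-3: pairwise comparison; any() stops at the first bad pair
        if any(a > b for a, b in zip(word, word[1:])):
            continue
        ordered.append(word)  # step 4: keep the ordered word
    return ordered


for w in find_ordered_words(["ab", "abbey", "dirt", "geeks", "accent"]):
    print(f"{w}: Word is ordered")  # abbey, dirt, accent
```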

# Python program to find ordered words
import requests

# URL of the online dictionary (unixdict.txt)
url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"


# Scrapes the words from the URL above and stores
# them in a list
def getWords():

    # fetches the dictionary page
    fetchData = requests.get(url)

    # extracts the raw (bytes) content of the webpage
    wordList = fetchData.content

    # decodes the UTF-8 encoded text and splits the
    # string to turn it into a list of words
    wordList = wordList.decode("utf-8").split()

    return wordList


# function to find and print the ordered words
def isOrdered():

    # fetching the wordList
    collection = getWords()

    # since the first few elements of the dictionary
    # are numbers, getting rid of them by slicing off
    # the first 16 elements
    collection = collection[16:]

    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1

        if len(word) < 3:  # skips the 1 and 2 lettered strings
            continue

        # traverses through all characters of the word in pairs
        while i < l:
            if ord(word[i]) > ord(word[i + 1]):
                result = 'Word is not ordered'
                break
            else:
                i += 1

        # only printing the ordered words
        if result == 'Word is ordered':
            print(f"{word}: {result}")


# execute isOrdered() function
if __name__ == '__main__':
    isOrdered()



Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................
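The program removes leading non-word entries with a fixed slice, collection[16:], which silently breaks if the word list changes. A minimal alternative sketch, assuming that entries made entirely of letters are the ones worth keeping, is to filter with str.isalpha(). The function name alphabetic_words and the sample list are hypothetical. This is a swapped-in technique, not the article's: it may keep entries the fixed slice removes (and drop later entries containing apostrophes or digits), so its results can differ from the output shown.

```python
from typing import List


def alphabetic_words(words: List[str]) -> List[str]:
    # Keep only entries made entirely of letters, instead of
    # slicing off a fixed number of leading entries.
    return [w for w in words if w.isalpha()]


sample = ["10th", "1st", "a&m", "a's", "aaa", "abbey"]  # hypothetical sample
print(alphabetic_words(sample))  # ['aaa', 'abbey']
```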

References: Rosetta Code

This article is contributed by Palash Nigam.



