
Scraping And Finding Ordered Words In A Dictionary using Python


What are ordered words?

An ordered word is a word whose letters appear in alphabetical order, for example "abbey" and "dirt". Any other word is unordered, for example "geeks".
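The definition can be checked directly: a word is ordered exactly when its letters are unchanged by sorting. The snippet below is a minimal sketch of that idea; the function name `is_ordered_by_sorting` is made up for illustration and does not come from the article.

```python
def is_ordered_by_sorting(word):
    # A word is ordered if sorting its letters leaves them unchanged
    return list(word) == sorted(word)


for w in ("abbey", "dirt", "geeks"):
    print(w, is_ordered_by_sorting(w))
# abbey True
# dirt True
# geeks False
```

Sorting is a compact way to state the definition. The program later in this article uses a pairwise comparison instead, which can stop at the first out-of-order pair.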

The task at hand

This task is taken from Rosetta Code, and it is less mundane than the description above suggests. To get a large number of words, we will use an online dictionary available at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then performing file-handling operations on it.

Requirements:

pip install requests

Code

The approach is to traverse each word and compare the ASCII values of adjacent characters in pairs. If any pair is out of order, the word is unordered; if no such pair is found, the word is ordered.
The task is divided into two parts:
Scraping

  1. Fetching the data from the given URL using the Python library requests
  2. Storing the content fetched from the URL as a byte string
  3. Decoding the content, which is usually UTF-8 encoded on the web
  4. Converting the long string of content into a list of words
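The scraping steps above can be sketched as follows. This is only an illustration under some assumptions: the helper names `parse_words` and `fetch_words` and the `timeout` value are hypothetical, and the sample bytes stand in for the real response body so the decoding step can be tried without network access.

```python
def parse_words(raw):
    # Steps 3 and 4: decode the UTF-8 bytes, then split on whitespace
    return raw.decode("utf-8").split()


def fetch_words(url, timeout=10):
    # Steps 1 and 2: fetch the page and keep its raw bytes.
    # requests is imported here so parse_words works without it installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop early on an HTTP error status
    return parse_words(response.content)


# Hypothetical sample standing in for the downloaded content
sample = b"abbey\ndirt\ngeeks\n"
print(parse_words(sample))  # ['abbey', 'dirt', 'geeks']
```

Keeping the parsing separate from the download makes the decoding logic easy to test on its own.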

Finding the ordered words

  1. Traversing the list of words
  2. Comparing the ASCII values of every pair of adjacent characters in each word
  3. Marking the word as unordered as soon as one pair is out of order
  4. Otherwise printing the ordered word
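As a compact illustration of these steps, the pairwise comparison can also be written with `zip` and `all`. This is a sketch and not the article's program: the names `is_ordered` and `ordered_words`, the `min_length` parameter, and the sample list are assumptions made for the example.

```python
def is_ordered(word):
    # Pair each character with its right-hand neighbour and compare
    # their ASCII values; all() stops at the first out-of-order pair
    return all(ord(a) <= ord(b) for a, b in zip(word, word[1:]))


def ordered_words(words, min_length=3):
    # Keep only the ordered words that have at least min_length letters
    return [w for w in words if len(w) >= min_length and is_ordered(w)]


sample = ["abbey", "dirt", "geeks", "at", "abort"]
print(ordered_words(sample))  # ['abbey', 'dirt', 'abort']
```

Here "geeks" is rejected because "g" comes after "e", and "at" is skipped because it has fewer than three letters.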




# Python program to find ordered words
import requests

# URL of the online dictionary
url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"


# Scrapes the words from the URL above and stores
# them in a list
def getWords():

    # fetches the dictionary, which contains about 25,000 words
    fetchData = requests.get(url)

    # extracts the raw content (bytes) of the webpage
    wordList = fetchData.content

    # decodes the UTF-8 encoded text and splits the
    # string to turn it into a list of words
    wordList = wordList.decode("utf-8").split()

    return wordList


# function that prints every ordered word in the dictionary
def isOrdered():

    # fetching the wordList
    collection = getWords()

    # since the first few elements of the dictionary
    # start with numbers, getting rid of those
    # by slicing off the first 16 elements
    collection = collection[16:]

    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1

        # skips the 1 and 2 lettered strings
        if len(word) < 3:
            continue

        # traverses through all characters of the word in pairs
        while i < l:
            if ord(word[i]) > ord(word[i + 1]):
                result = 'Word is not ordered'
                break
            else:
                i += 1

        # only printing the ordered words
        if result == 'Word is ordered':
            print(word + ': ' + result)


# execute isOrdered() function
if __name__ == '__main__':
    isOrdered()


Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................

References: Rosetta Code



Last Updated : 26 Nov, 2018