Scraping And Finding Ordered Words In A Dictionary using Python
Last Updated: 26 Nov, 2018
What are ordered words?
An ordered word is a word whose letters appear in alphabetical order, for example "abbey" and "dirt". Repeated letters are allowed, as the double "b" in "abbey" shows. Any other word is unordered, for example "geeks".
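The definition can be tried out on the three example words with a minimal sketch. The helper name `is_ordered_word` is hypothetical (it does not appear in the article's code), and the check assumes lowercase words, since characters are compared by their code points.

```python
def is_ordered_word(word):
    """Return True if no letter is followed by an alphabetically smaller one."""
    # zip(word, word[1:]) yields each pair of adjacent characters.
    return all(a <= b for a, b in zip(word, word[1:]))


for w in ("abbey", "dirt", "geeks"):
    print(w, is_ordered_word(w))
# abbey True
# dirt True
# geeks False
```

Using `<=` rather than `<` is what lets words with repeated letters, such as "abbey", count as ordered.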
The task at hand
This task is taken from Rosetta Code, and it is not as mundane as the description above makes it sound. To get a large number of words, we will use the online dictionary at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then doing file handling operations on it.
Requirements:
pip install requests
Code
The approach is to traverse each word and compare the ASCII values of adjacent characters in pairs. If any pair is out of order, the word is not ordered; if no such pair is found, the word is ordered.
The task is divided into two parts:
Scraping
- Using the Python library requests, fetch the data from the given URL
- Store the content fetched from the URL, which arrives as raw bytes
- Decode the content, which on the web is usually encoded as UTF-8, into a string
- Convert the long string of content into a list of words
Finding the ordered words
- Traverse the list of words
- Compare the ASCII values of every pair of adjacent characters in each word
- Mark the word as not ordered as soon as a pair is out of order
- Otherwise print the ordered word
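import requests

# Fetch the dictionary and return its contents as a list of words
def getWords():
    url = 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'
    fetchData = requests.get(url)
    # The response body is raw bytes
    wordList = fetchData.content
    # Decode the bytes as UTF-8 and split on whitespace
    wordList = wordList.decode("utf-8").split()
    return wordList

# Print every ordered word of three or more letters
def isOrdered():
    collection = getWords()
    # Skip the first 16 entries of the word list
    collection = collection[16:]
    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
        # Ignore words shorter than three letters
        if len(word) < 3:
            continue
        # Compare each character with the one that follows it
        while i < l:
            if ord(word[i]) > ord(word[i + 1]):
                result = 'Word is not ordered'
                break
            else:
                i += 1
        if result == 'Word is ordered':
            print(word + ':', result)

if __name__ == '__main__':
    isOrdered()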
Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................
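The two parts can also be kept separate, which makes the checking logic easy to try without a network connection. The sketch below is one possible arrangement, not the article's original code: `parse_words` and `ordered_words` are hypothetical helper names, and the `sample` bytes are made-up data standing in for what `requests.get(url).content` would return.

```python
def parse_words(raw_bytes, encoding="utf-8"):
    """Decode raw response bytes and split them into a list of words."""
    return raw_bytes.decode(encoding).split()


def ordered_words(words, min_length=3):
    """Return the words whose letters are in non-decreasing order."""
    result = []
    for word in words:
        if len(word) < min_length:
            continue  # same cut-off as the article's code: skip short words
        # Compare each character with its right-hand neighbour.
        if all(word[i] <= word[i + 1] for i in range(len(word) - 1)):
            result.append(word)
    return result


# Stand-in for fetched content (assumption: one word per line).
sample = b"abbey\ndirt\ngeeks\nab\naccent\npython\n"
print(ordered_words(parse_words(sample)))
# ['abbey', 'dirt', 'accent']
```

Returning a list instead of printing inside the loop leaves the caller free to print, count, or filter the ordered words further.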
References: Rosetta Code