How to use Vision API from Google Cloud | Set-2

Prerequisite: Create a Virtual Machine and set up an API in Google Cloud

In the previous article, we saw how to use the Facial Detection, Logo Detection, Label Detection, and Landmark Detection features of the Vision API. Now let's look at a few more features, such as Optical Character Recognition, handwritten text detection, and Image Properties Detection.

Text Detection (Optical Character Recognition):

It detects and extracts text from within an image.

import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
from matplotlib import patches as pch
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
response = client.text_detection(image = image)
texts = response.text_annotations
  
a = plt.imread(f)
fig, ax = plt.subplots(1)
ax.imshow(a)
  
for text in texts:
    print(text.description)
      
    vertices = ([(vertex.x, vertex.y) 
                 for vertex in text.bounding_poly.vertices]) 
    
    print('Vertices covering text: {}\n\n'.format(vertices))
    # Draw a red rectangle over the detected text, using the
    # first three vertices of its bounding polygon.
    rect = pch.Rectangle(vertices[0],
                         (vertices[1][0] - vertices[0][0]),
                         (vertices[2][1] - vertices[0][1]),
                         linewidth = 1, edgecolor = 'r',
                         facecolor = 'none')
    ax.add_patch(rect)
      
plt.show()

The above code extracts the text from a given image and prints the coordinates of the vertices of the rectangle containing each piece of text. Note that the first annotation covers the entire block of detected text; the annotations that follow cover the individual words.
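
One detail worth adding (it is not in the original snippet): the Vision API reports request failures inside the response object rather than raising an exception, so it is good practice to check the response's error field before using the annotations:

# Surface API-level failures explicitly before reading annotations.
if response.error.message:
    raise Exception('Vision API error: {}'.format(response.error.message))
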
For example, when the following image is given as input:

Output:

MY MORNING
ROUTINE
How Successful People Start
Every Day Inspired
BENJAMIN SPALL and MICHAEL XANDER

Vertices covering text: [(38, 71), (348, 71), (348, 602), (38, 602)]


MY
Vertices covering text: [(46, 71), (108, 82), (100, 128), (38, 117)]


MORNING
Vertices covering text: [(129, 79), (348, 118), (338, 170), (120, 131)]


ROUTINE
Vertices covering text: [(96, 135), (292, 170), (283, 219), (87, 184)]


How
Vertices covering text: [(68, 200), (101, 205), (98, 221), (65, 216)]


Successful
Vertices covering text: [(104, 207), (196, 222), (193, 238), (101, 224)]


People
Vertices covering text: [(202, 222), (257, 231), (254, 251), (199, 242)]


Start
Vertices covering text: [(265, 232), (311, 239), (309, 255), (262, 248)]


Every
Vertices covering text: [(112, 238), (155, 246), (152, 265), (109, 258)]


Day
Vertices covering text: [(160, 246), (189, 251), (185, 271), (157, 266)]


Inspired
Vertices covering text: [(194, 251), (262, 263), (258, 283), (191, 271)]


BENJAMIN
Vertices covering text: [(57, 534), (118, 546), (115, 561), (54, 549)]


SPALL
Vertices covering text: [(122, 550), (160, 558), (157, 572), (119, 564)]


and
Vertices covering text: [(165, 560), (185, 564), (182, 577), (162, 573)]


MICHAEL
Vertices covering text: [(190, 564), (250, 576), (247, 590), (187, 578)]


XANDER
Vertices covering text: [(254, 575), (311, 587), (308, 602), (251, 591)]

Document/handwritten text detection:

This feature also performs Optical Character Recognition, and it works on dense documents and handwritten text.

import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
response = client.document_text_detection(image = image)
  
a = plt.imread(f)
plt.imshow(a)
  
# Walk the OCR hierarchy: pages -> blocks -> paragraphs -> words -> symbols.
txt = []
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        print('\nBlock confidence: {}%\n'.format(block.confidence * 100))
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([symbol.text for symbol in word.symbols])
                txt.append(word_text)

print(txt)

The above code identifies handwritten text in an image, extracts it word by word, and prints the result.
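
If you only need the raw transcript rather than per-block confidences, the same response also carries the assembled text. A minimal sketch, reusing the response object and txt list from the code above:

# The transcript is available pre-assembled on the annotation.
print(response.full_text_annotation.text)

# Alternatively, join the words collected above into a single string.
print(' '.join(txt))
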
For example, when we give the following image as input:

Output:

Block confidence: 97.00000286102295%

['Geeks', 'for', 'Geeks', 'A', 'computer', 'science', 'portal', 'for', 'Geeks', '.']

Image Properties Detection:

This feature detects general attributes of an image, such as its dominant colors.

import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpeg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
response = client.image_properties(image = image)
properties = response.image_properties_annotation
  
a = plt.imread(f)
plt.imshow(a)
  
for color in properties.dominant_colors.colors:
    print('fraction: {}'.format(color.pixel_fraction))
    print('\tr: {}'.format(color.color.red))
    print('\tg: {}'.format(color.color.green))
    print('\tb: {}'.format(color.color.blue))

The code takes an image as input and prints its dominant colors, i.e. the red, green, and blue components of each color along with the fraction of the image's pixels that color covers. For example, when the following image is given as input:

Output:

fraction: 0.036332178860902786
        r: 5.0
        g: 185.0
        b: 6.0
fraction: 0.03337658569216728
        r: 131.0
        g: 207.0
        b: 13.0
fraction: 0.029988465830683708
        r: 253.0
        g: 169.0
        b: 5.0
fraction: 0.0262399073690176
        r: 254.0
        g: 123.0
        b: 5.0
fraction: 0.03553921729326248
        r: 253.0
        g: 248.0
        b: 12.0
fraction: 0.02104959636926651
        r: 249.0
        g: 36.0
        b: 6.0
fraction: 0.024581892415881157
        r: 3.0
        g: 35.0
        b: 188.0
fraction: 0.03424163907766342
        r: 6.0
        g: 122.0
        b: 200.0
fraction: 0.027032872661948204
        r: 140.0
        g: 32.0
        b: 185.0
fraction: 0.029411764815449715
        r: 10.0
        g: 177.0
        b: 217.0
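
As an optional extra (not part of the original snippet), the dominant colors can also be visualized as swatches. A minimal matplotlib sketch, reusing the properties object from the code above:

from matplotlib import pyplot as plt

# Draw one swatch per dominant color, scaling the RGB values
# from the API's 0-255 range down to matplotlib's 0-1 range.
colors = properties.dominant_colors.colors
fig, axes = plt.subplots(1, len(colors), squeeze = False)
for ax, color in zip(axes[0], colors):
    ax.set_facecolor((color.color.red / 255.0,
                      color.color.green / 255.0,
                      color.color.blue / 255.0))
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()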

Safe Search Properties Detection:

Detects explicit content such as adult content or violent content within an image. This feature uses five categories (“adult”, “spoof”, “medical”, “violence”, and “racy”) and returns the likelihood that each is present in a given image.

import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
a = plt.imread(f)
plt.imshow(a)
  
response = client.safe_search_detection(image = image)
safe = response.safe_search_annotation
  
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY',
                   'POSSIBLE', 'LIKELY', 'VERY_LIKELY')
  
print('Adult: {}'.format(likelihood_name[safe.adult]))
print('Medical: {}'.format(likelihood_name[safe.medical]))
print('Spoofed: {}'.format(likelihood_name[safe.spoof]))
print('Violence: {}'.format(likelihood_name[safe.violence]))
print('Racy: {}'.format(likelihood_name[safe.racy]))

Given an image, the code reports how likely it is to contain adult, medical, spoofed, violent, or racy content. The safe_search_annotation fields hold integer likelihood values, which the likelihood_name tuple above maps to human-readable names.
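
As a hypothetical usage sketch (not from the original article), these likelihood values can drive a simple content filter: the enum values grow with likelihood, so a threshold comparison on the annotation returned above works:

# Treat POSSIBLE (index 3 in likelihood_name) or higher as unsafe.
POSSIBLE = 3
if safe.adult >= POSSIBLE or safe.violence >= POSSIBLE or safe.racy >= POSSIBLE:
    print('Image flagged as potentially unsafe.')
else:
    print('Image passed the safe-search check.')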

Object Detection:

Detects and extracts multiple objects from an image: it localizes each object and returns its coordinates (a sketch that prints them follows the example output below).

import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
a = plt.imread(f)
plt.imshow(a)
  
response = client.object_localization(image = image)
objects = response.localized_object_annotations
  
print('Number of objects found: ', len(objects))
for object_ in objects:
    print('Object: ', object_.name)
    print('Confidence: ', object_.score)

For example, when we input the following image:

Output:

Number of objects found:  1
Object:  Scissors
Confidence:  0.540185272693634
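
The coordinates mentioned above are exposed as normalized vertices, i.e. x and y values between 0 and 1 relative to the image's width and height. A small sketch, reusing the objects list from the code above:

# Each localized object carries a bounding polygon with normalized vertices.
for object_ in objects:
    box = [(v.x, v.y) for v in object_.bounding_poly.normalized_vertices]
    print('{} bounding box: {}'.format(object_.name, box))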

For more information, visit the Cloud Vision API documentation.


