
Train a model using Vertex AI and the Python SDK

Vertex AI is an end-to-end, fully managed platform for machine learning and data science on Google Cloud. It lets you use Google Cloud's infrastructure and services to build, train, deploy, and manage machine learning models. The Vertex AI SDK for Python is a high-level library that helps you automate data ingestion, model training, and prediction on Vertex AI. Most of the tasks that can be performed in the Google Cloud console can also be done programmatically against the Vertex AI API using Python code. In this article, we will learn how to use the Vertex AI SDK for Python to train a model on Vertex AI.

We will cover the following topics: installing the Vertex AI SDK for Python, setting up the environment and a Cloud Storage bucket, copying the dataset into Cloud Storage, creating a managed tabular dataset, launching an AutoML training job, deploying the model to an endpoint, making predictions, and cleaning up the resources.



What are the main components of Vertex AI?

Before utilizing Vertex AI, you should be familiar with its main components and concepts. Here are a few of the important ones:

Project: the Google Cloud project that owns your Vertex AI resources and billing.
Dataset: a managed dataset (tabular, image, text, or video) that Vertex AI reads from Cloud Storage or BigQuery.
Training job: an AutoML or custom job that trains a model on a dataset.
Model: the trained model resource produced by a training job.
Endpoint: the resource to which a model is deployed so it can serve online predictions.
Prediction: the result returned when you send instances to an endpoint.
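To make these concepts concrete, here is a rough sketch of how they map to classes and calls in the Vertex AI SDK for Python. The display names, paths, and column names below are placeholders; each call is explained step by step in the rest of the article.

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # project

ds = aiplatform.TabularDataset.create(                               # dataset
    display_name="example-dataset", gcs_source="gs://your-bucket/data.csv"
)
job = aiplatform.AutoMLTabularTrainingJob(                           # training job
    display_name="example-job", optimization_prediction_type="classification"
)
model = job.run(dataset=ds, target_column="label")                   # model
endpoint = model.deploy(machine_type="e2-standard-4")                # endpoint
print(endpoint.predict([{"feature": "value"}]))                      # prediction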

Step 1: Install the Vertex AI SDK for Python

To use the Vertex AI SDK for Python, you need to install the google-cloud-aiplatform package. This package contains the Vertex AI SDK for Python as well as the lower-level Vertex AI Python client library, which offers more fine-grained control over the Vertex AI API calls. You can use both libraries together if necessary.



In your virtual environment, run the following commands to install the google-cloud-aiplatform package.

The first snippet is optional and only needed if you are using a Vertex AI Workbench (Google Cloud Notebook) instance, where packages have to be installed with the --user flag:




# Setup your dependencies
import os
 
# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")
 
USER_FLAG = ""
# Google Cloud Notebook requires dependencies to be installed with '--user'
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

Install the latest version of the Vertex AI SDK for Python.

Run the following command in your virtual environment (the commented commands upgrade the packages if they are already installed):




! pip install {USER_FLAG} google-cloud-aiplatform
# If the package is already installed in your system or notebook, upgrade it instead:
# ! pip install {USER_FLAG} --upgrade google-cloud-aiplatform
# The google-cloud-storage package can be upgraded the same way:
# ! pip install {USER_FLAG} --upgrade google-cloud-storage

Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.




# Automatically restart kernel after installs
import os
 
if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython
 
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Step 2: Setting Up the Environment

Before we dive into the code, we need to set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. Make sure you have the Google Cloud SDK (gcloud) installed and configured with your GCP project.
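If gcloud is not yet configured, the following commands authenticate and select the project (a minimal sketch; the project ID shown is a placeholder):

! gcloud auth login
! gcloud config set project your-project-id
! gcloud config list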

If you don’t know your project ID, you may be able to get it using gcloud:




import os
 
PROJECT_ID = ""
 
# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

When you run the above command, it will print your project ID, which will look like this:

Project ID: qwiklabs-gcp-04-c846c60XXXX

Copy the project ID and set it as the PROJECT_ID variable here:




if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "qwiklabs-gcp-04-c846b6079446"  # @param {type:"string"}

Create a timestamp for uniqueness:




# Import necessary libraries
from datetime import datetime
 
# Use a timestamp to ensure unique resources
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
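Before creating the bucket, define the region and the bucket name. The values below are only an example; any supported region and globally unique bucket name will work. Combining the project ID with the timestamp is one simple way to keep the name unique:

# Example values -- adjust the region and bucket name to your own project
REGION = "us-central1"  # @param {type:"string"}
BUCKET_NAME = "gs://" + PROJECT_ID + "aip-" + TIMESTAMP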

Create a Cloud Storage bucket:




! gsutil mb -l $REGION $BUCKET_NAME

Replace REGION and BUCKET_NAME according to your project requirements.

Output :

Creating gs://qwiklabs-gcp-06-c846b60794346aip-20210826051667/...

Finally, validate access to your Cloud Storage bucket by examining its contents:




! gsutil ls -al $BUCKET_NAME

Step 3: Copying Dataset into Cloud Storage

In this step, we’ll copy the PetFinder toy dataset (a CSV file) from a public Cloud Storage location into our own bucket. The BUCKET_NAME variable defined earlier determines where the file lands, and gcs_source points to the copied file so it can be used when creating the dataset in Step 5.




IMPORT_FILE = "petfinder-tabular-classification_toy.csv"
! gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/pdtrust_toy_datasets/{IMPORT_FILE} {BUCKET_NAME}/data/
 
gcs_source = f"{BUCKET_NAME}/data/{IMPORT_FILE}"
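As an optional sanity check, you can read the first few rows of the copied file straight from Cloud Storage (this assumes pandas and the gcsfs filesystem package are available in your environment):

import pandas as pd

# Read only the first rows to confirm the copy succeeded
df = pd.read_csv(gcs_source, nrows=5)
print(df)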

Step 4: Importing the Vertex SDK for Python

We need to import the Vertex SDK and initialize it using our project ID and location:




# Import necessary libraries
import os
 
from google.cloud import aiplatform
 
aiplatform.init(project=PROJECT_ID, location=REGION)

Step 5: Creating a Managed Tabular Dataset

To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:




ds = aiplatform.TabularDataset.create(
    display_name="petfinder-tabular-dataset",
    gcs_source=gcs_source,
)
 
ds.resource_name

Output:


INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/1075205415941/locations/us-central1/datasets/1945247175768276992/operations/1110822578768838656
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/1075205415941/locations/us-central1/datasets/1945247175768276992
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/1075205415941/locations/us-central1/datasets/1945247175768276992')
'projects/1075205415941/locations/us-central1/datasets/1945247175768276992'

This creates a managed tabular dataset from the CSV file stored in your Cloud Storage bucket.
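As the INFO log above points out, the dataset can be reused in a later session by constructing it from its resource name instead of creating it again:

# Reload the same managed dataset by its full resource name
ds = aiplatform.TabularDataset(ds.resource_name)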

Step 6: Launching a Training Job

Now, we are ready to create and train our AutoML tabular model:




# Construct an AutoML tabular training job
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-petfinder-automl-1",
    optimization_prediction_type="classification",
    column_transformations=[
        {"categorical": {"column_name": "Type"}},
        {"numeric": {"column_name": "Age"}},
        {"categorical": {"column_name": "Breed1"}},
        {"categorical": {"column_name": "Color1"}},
        {"categorical": {"column_name": "Color2"}},
        {"categorical": {"column_name": "MaturitySize"}},
        {"categorical": {"column_name": "FurLength"}},
        {"categorical": {"column_name": "Vaccinated"}},
        {"categorical": {"column_name": "Sterilized"}},
        {"categorical": {"column_name": "Health"}},
        {"numeric": {"column_name": "Fee"}},
        {"numeric": {"column_name": "PhotoAmt"}},
    ],
)
 
 
# Create and train the model object
# This will take around two and a half hours to run
model = job.run(
    dataset=ds,
    target_column="Adopted",
    # Define training, validation and test fractions for the split
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    model_display_name="adopted-prediction-model",
    disable_early_stopping=False,
)

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during the transform in `preprocessing_exc_tuple` in IPython 7.17 and above.
  and should_run_async(code)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
  app.launch_new_instance()
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
... (the current state is logged repeatedly until the pipeline completes)

It takes more than 2 hours to complete the training.
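Because the job runs for hours, the notebook session may time out before it finishes. In that case the trained model can be looked up again by its display name (a sketch, assuming the display name used above):

# Find the trained model by display name after reconnecting
models = aiplatform.Model.list(filter='display_name="adopted-prediction-model"')
if models:
    model = models[0]
    print(model.resource_name)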

Step 7: Deploying the Model

Before making predictions, we need to deploy the model to an endpoint:




# Deploy the model resource to the serving endpoint resource
endpoint = model.deploy(
    machine_type="e2-standard-4",
)

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/7965582686603444224
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1075205415941/locations/us-central1/endpoints/7467372802459303936')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/2903536705439006720
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
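As with the dataset, the log shows how to reattach to the endpoint from another session using its resource name; Endpoint.list() enumerates the endpoints in the project and location:

# Reconnect to the deployed endpoint without redeploying the model
endpoint = aiplatform.Endpoint(endpoint.resource_name)
print(aiplatform.Endpoint.list())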

Step 8: Making Predictions

With the model deployed, you can now make predictions. Here’s an example of how to send data for prediction. This sample instance is taken from an observation in which Adopted = Yes

Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI how to convert each input to its declared type.




# Make a prediction using the sample values
prediction = endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
            "Gender": "Male",
            "Color1": "Black",
            "Color2": "White",
            "MaturitySize": "Small",
            "FurLength": "Short",
            "Vaccinated": "No",
            "Sterilized": "No",
            "Health": "Healthy",
            "Fee": "100",
            "PhotoAmt": "2",
        }
    ]
)
 
print(prediction)

Output:

Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)
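To pick out the predicted class, take the entry with the highest score from the response above (a small sketch based on the output format shown):

# Each prediction contains parallel 'classes' and 'scores' lists
result = prediction.predictions[0]
best_class, best_score = max(
    zip(result["classes"], result["scores"]), key=lambda pair: pair[1]
)
print(f"Predicted: {best_class} (score {best_score:.3f})")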

Step 9: (Optional) Undeploy the model




# Undeploy the model resource
endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)

Step 10: (Optional) Cleaning up




delete_training_job = True
delete_model = True
delete_endpoint = True

# Warning: Setting this to true will delete everything in your bucket
delete_bucket = False

# Delete the training job
if delete_training_job:
    job.delete()

# Delete the model
if delete_model:
    model.delete()

# Delete the endpoint
if delete_endpoint:
    endpoint.delete()

if delete_bucket and "BUCKET_NAME" in globals():
    ! gsutil -m rm -r $BUCKET_NAME
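The managed dataset created in Step 5 is not removed by the code above; if you no longer need it, it can be deleted the same way (assuming the ds variable from Step 5 is still in scope):

# Delete the managed tabular dataset
ds.delete()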

Conclusion

In this article, we learned how to use the Vertex AI SDK for Python to train a model on Vertex AI. We covered the following steps: installing the SDK, setting up the environment and a Cloud Storage bucket, copying the dataset into Cloud Storage, creating a managed tabular dataset, launching an AutoML training job, deploying the model to an endpoint, making predictions, and cleaning up the resources.

We also learned about some of the main components and concepts of Vertex AI, such as project, dataset, training job, model, endpoint, and prediction. We used a tabular classification example (predicting pet adoption) to demonstrate the Vertex AI SDK for Python, but you can apply the same steps to other types of datasets and models as well.

