
Train a model using Vertex AI and the Python SDK

Vertex AI is an end-to-end, fully managed platform for machine learning and data science on Google Cloud. It lets you use Google Cloud's infrastructure and services to build, train, deploy, and manage machine learning models. The Vertex AI SDK for Python is a high-level library that helps you automate data ingestion, model training, and prediction on Vertex AI. Most of the tasks that can be performed in the Google Cloud console can also be done programmatically against the Vertex AI API using Python code. In this article, we will learn how to use the Vertex AI SDK for Python to train a model on Vertex AI.

We will cover the following topics: installing the Vertex AI SDK for Python, setting up the environment and a Cloud Storage bucket, copying the dataset into Cloud Storage, creating a managed tabular dataset, launching an AutoML training job, deploying the model to an endpoint, making predictions, and cleaning up the resources.



What are the main components of Vertex AI?

Before utilizing Vertex AI, you should be familiar with its main components and concepts. Here are a few of the important ones:

Project: the Google Cloud project that owns your Vertex AI resources and billing.
Dataset: a managed dataset (tabular, image, text, or video) that Vertex AI reads from Cloud Storage or BigQuery.
Training job: an AutoML or custom job that trains a model on a dataset.
Model: the trained model resource produced by a training job.
Endpoint: the resource to which a model is deployed so it can serve online predictions.
Prediction: the result returned when you send instances to an endpoint.
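To make these concepts concrete, here is a rough sketch of how they map to classes and calls in the Vertex AI SDK for Python. The display names, paths, and column names below are placeholders; each call is explained step by step in the rest of the article.

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # project

ds = aiplatform.TabularDataset.create(                               # dataset
    display_name="example-dataset", gcs_source="gs://your-bucket/data.csv"
)
job = aiplatform.AutoMLTabularTrainingJob(                           # training job
    display_name="example-job", optimization_prediction_type="classification"
)
model = job.run(dataset=ds, target_column="label")                   # model
endpoint = model.deploy(machine_type="e2-standard-4")                # endpoint
print(endpoint.predict([{"feature": "value"}]))                      # prediction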

Step 1: Install the Vertex AI SDK for Python

To use the Vertex AI SDK for Python, you need to install the google-cloud-aiplatform package. This package contains the Vertex AI SDK for Python as well as the lower-level Vertex AI Python client library, which offers more fine-grained control over the Vertex AI API calls. You can use both libraries together if necessary.



In your virtual environment, run the following commands to install the google-cloud-aiplatform package.

The first snippet is optional and only needed if you are using a Vertex AI Workbench (Google Cloud Notebook) instance, where packages have to be installed with the --user flag:




# Setup your dependencies
import os
 
# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")
 
USER_FLAG = ""
# Google Cloud Notebook requires dependencies to be installed with '--user'
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

Install the latest version of the Vertex AI SDK for Python.

Run the following command in your virtual environment (the commented commands upgrade the packages if they are already installed):




! pip install {USER_FLAG} google-cloud-aiplatform
# If the package is already installed in your system or notebook, upgrade it instead:
# ! pip install {USER_FLAG} --upgrade google-cloud-aiplatform
# The google-cloud-storage package can be upgraded the same way:
# ! pip install {USER_FLAG} --upgrade google-cloud-storage

Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.




# Automatically restart kernel after installs
import os
 
if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython
 
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Step 2: Setting Up the Environment

Before we dive into the code, we need to set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. Make sure you have the Google Cloud SDK (gcloud) installed and configured with your GCP project.
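If gcloud is not yet configured, the following commands authenticate and select the project (a minimal sketch; the project ID shown is a placeholder):

! gcloud auth login
! gcloud config set project your-project-id
! gcloud config list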

If you don’t know your project ID, you may be able to get it using gcloud:




import os
 
PROJECT_ID = ""
 
# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

When you run the above command, it will print your project ID, which will look like this:

Project ID: qwiklabs-gcp-04-c846c60XXXX

Copy the project ID and set it as the PROJECT_ID variable here:




if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "qwiklabs-gcp-04-c846b6079446"  # @param {type:"string"}

Create a timestamp for uniqueness:




# Import necessary libraries
from datetime import datetime
 
# Use a timestamp to ensure unique resources
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
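Before creating the bucket, define the region and the bucket name. The values below are only an example; any supported region and globally unique bucket name will work. Combining the project ID with the timestamp is one simple way to keep the name unique:

# Example values -- adjust the region and bucket name to your own project
REGION = "us-central1"  # @param {type:"string"}
BUCKET_NAME = "gs://" + PROJECT_ID + "aip-" + TIMESTAMP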

Create a Cloud Storage bucket:




! gsutil mb -l $REGION $BUCKET_NAME

Replace REGION and BUCKET_NAME according to your project requirements.

Output :

Creating gs://qwiklabs-gcp-06-c846b60794346aip-20210826051667/...

Finally, validate access to your Cloud Storage bucket by examining its contents:




! gsutil ls -al $BUCKET_NAME

Step 3: Copying Dataset into Cloud Storage

In this step, we’ll copy the PetFinder toy dataset (a CSV file) from a public Cloud Storage location into our own bucket. The BUCKET_NAME variable defined earlier determines where the file lands, and gcs_source points to the copied file so it can be used when creating the dataset in Step 5.




IMPORT_FILE = "petfinder-tabular-classification_toy.csv"
! gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/pdtrust_toy_datasets/{IMPORT_FILE} {BUCKET_NAME}/data/
 
gcs_source = f"{BUCKET_NAME}/data/{IMPORT_FILE}"
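As an optional sanity check, you can read the first few rows of the copied file straight from Cloud Storage (this assumes pandas and the gcsfs filesystem package are available in your environment):

import pandas as pd

# Read only the first rows to confirm the copy succeeded
df = pd.read_csv(gcs_source, nrows=5)
print(df)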

Step 4: Importing the Vertex SDK for Python

We need to import the Vertex SDK and initialize it using our project ID and location:




# Import necessary libraries
import os
 
from google.cloud import aiplatform
 
aiplatform.init(project=PROJECT_ID, location=REGION)

Step 5: Creating a Managed Tabular Dataset

To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:




ds = aiplatform.TabularDataset.create(
    display_name="petfinder-tabular-dataset",
    gcs_source=gcs_source,
)
 
ds.resource_name

Output:


INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/1075205415941/locations/us-central1/datasets/1945247175768276992/operations/1110822578768838656
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/1075205415941/locations/us-central1/datasets/1945247175768276992
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/1075205415941/locations/us-central1/datasets/1945247175768276992')
'projects/1075205415941/locations/us-central1/datasets/1945247175768276992'

This creates a managed tabular dataset from the CSV file stored in your Cloud Storage bucket.
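As the INFO log above points out, the dataset can be reused in a later session by constructing it from its resource name instead of creating it again:

# Reload the same managed dataset by its full resource name
ds = aiplatform.TabularDataset(ds.resource_name)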

Step 6: Launching a Training Job

Now, we are ready to create and train our AutoML tabular model:




# Construct an AutoML tabular training job
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-petfinder-automl-1",
    optimization_prediction_type="classification",
    column_transformations=[
        {"categorical": {"column_name": "Type"}},
        {"numeric": {"column_name": "Age"}},
        {"categorical": {"column_name": "Breed1"}},
        {"categorical": {"column_name": "Color1"}},
        {"categorical": {"column_name": "Color2"}},
        {"categorical": {"column_name": "MaturitySize"}},
        {"categorical": {"column_name": "FurLength"}},
        {"categorical": {"column_name": "Vaccinated"}},
        {"categorical": {"column_name": "Sterilized"}},
        {"categorical": {"column_name": "Health"}},
        {"numeric": {"column_name": "Fee"}},
        {"numeric": {"column_name": "PhotoAmt"}},
    ],
)
 
 
# Create and train the model object
# This will take around two and a half hours to run
model = job.run(
    dataset=ds,
    target_column="Adopted",
    # Define training, validation and test fractions for the split
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    model_display_name="adopted-prediction-model",
    disable_early_stopping=False,
)

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during the transform in `preprocessing_exc_tuple` in IPython 7.17 and above.
  and should_run_async(code)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
  app.launch_new_instance()
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
... (the current state is logged repeatedly until the pipeline completes)

It takes more than 2 hours to complete the training.
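Because the job runs for hours, the notebook session may time out before it finishes. In that case the trained model can be looked up again by its display name (a sketch, assuming the display name used above):

# Find the trained model by display name after reconnecting
models = aiplatform.Model.list(filter='display_name="adopted-prediction-model"')
if models:
    model = models[0]
    print(model.resource_name)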

Step 7: Deploying the Model

Before making predictions, we need to deploy the model to an endpoint:




# Deploy the model resource to the serving endpoint resource
endpoint = model.deploy(
    machine_type="e2-standard-4",
)

Output:

/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/7965582686603444224
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1075205415941/locations/us-central1/endpoints/7467372802459303936')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/2903536705439006720
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
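As with the dataset, the log shows how to reattach to the endpoint from another session using its resource name; Endpoint.list() enumerates the endpoints in the project and location:

# Reconnect to the deployed endpoint without redeploying the model
endpoint = aiplatform.Endpoint(endpoint.resource_name)
print(aiplatform.Endpoint.list())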

Step 8: Making Predictions

With the model deployed, you can now make predictions. Here’s an example of how to send data for prediction. This sample instance is taken from an observation in which Adopted = Yes

Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI how to convert each input to its declared type.




# Make a prediction using the sample values
prediction = endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
            "Gender": "Male",
            "Color1": "Black",
            "Color2": "White",
            "MaturitySize": "Small",
            "FurLength": "Short",
            "Vaccinated": "No",
            "Sterilized": "No",
            "Health": "Healthy",
            "Fee": "100",
            "PhotoAmt": "2",
        }
    ]
)
 
print(prediction)

Output:

Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)
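To pick out the predicted class, take the entry with the highest score from the response above (a small sketch based on the output format shown):

# Each prediction contains parallel 'classes' and 'scores' lists
result = prediction.predictions[0]
best_class, best_score = max(
    zip(result["classes"], result["scores"]), key=lambda pair: pair[1]
)
print(f"Predicted: {best_class} (score {best_score:.3f})")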

Step 9: (Optional) Undeploy the model




# Undeploy the model resource
endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)

Step 10: (Optional) Cleaning up




delete_training_job = True
delete_model = True
delete_endpoint = True

# Warning: Setting this to true will delete everything in your bucket
delete_bucket = False

# Delete the training job
if delete_training_job:
    job.delete()

# Delete the model
if delete_model:
    model.delete()

# Delete the endpoint
if delete_endpoint:
    endpoint.delete()

if delete_bucket and "BUCKET_NAME" in globals():
    ! gsutil -m rm -r $BUCKET_NAME
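The managed dataset created in Step 5 is not removed by the code above; if you no longer need it, it can be deleted the same way (assuming the ds variable from Step 5 is still in scope):

# Delete the managed tabular dataset
ds.delete()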

Conclusion

In this article, we learned how to use the Vertex AI SDK for Python to train a model on Vertex AI. We covered the following steps: installing the SDK, setting up the environment and a Cloud Storage bucket, copying the dataset into Cloud Storage, creating a managed tabular dataset, launching an AutoML training job, deploying the model to an endpoint, making predictions, and cleaning up the resources.

We also learned about some of the main components and concepts of Vertex AI, such as project, dataset, training job, model, endpoint, and prediction. We used a tabular classification example (predicting pet adoption) to demonstrate the Vertex AI SDK for Python, but you can apply the same steps to other types of datasets and models as well.

