Vertex AI is Google Cloud's end-to-end, fully managed platform for machine learning and data science. It lets you use Google Cloud infrastructure and services to create, train, deploy, and manage machine learning models. The Vertex AI SDK for Python is a high-level library that helps you automate data ingestion, model training, and prediction on Vertex AI. Most tasks that can be performed in the Google Cloud console can also be done programmatically through the Vertex AI API using Python code. In this article, we will learn how to use the Vertex AI SDK for Python to train a model on Vertex AI.
We will cover the following topics:
- What are the main components and concepts of Vertex AI
- How to install and import the Vertex AI SDK for Python
- How to create a dataset and upload data to Vertex AI
- How to define a custom training job and run it on Vertex AI
- How to deploy the trained model and get predictions on Vertex AI
What are the main components of Vertex AI?
Before using Vertex AI, you should be familiar with its main components and concepts. Here are some of the important ones:
- Project: A Google Cloud project is a container for all of your resources and settings. Before using Vertex AI, you must create a project and enable the Vertex AI API.
- Dataset: A dataset is a collection of data used for training or prediction. On Vertex AI, you can create a variety of dataset types, including tabular, image, text, video, and custom datasets. You can also import data from a variety of sources, including local files, BigQuery, and Google Cloud Storage.
- Training job: A training job is a process that trains a machine learning model on your dataset. Vertex AI supports several kinds of training jobs, including custom training, hyperparameter tuning, and AutoML. You can also configure parameters for your training job, such as machine type, region, scale tier, and budget.
- Model: A model is the output of a training job. It represents the patterns and rules learned from your data. A model can be applied to new data to make predictions, or it can be evaluated.
- Endpoint: An endpoint is a service that hosts one or more models for prediction. You can deploy your models to an endpoint and send requests to it to get predictions. Vertex AI also lets you monitor and manage your endpoints.
- Prediction: A prediction is the output of applying a model to an instance of input data. Your models can serve predictions online or in batch. Online prediction means sending requests to an endpoint and receiving results in real time. Batch prediction means processing large volumes of data asynchronously and writing the results to files.
Step 1: Install the Vertex AI SDK for Python
To use the Vertex AI SDK for Python, you need to install the google-cloud-aiplatform package. This package contains both the Vertex AI SDK for Python and the Vertex AI Python client library. The client library is a lower-level library that gives you finer-grained control over Vertex AI API calls; you can use both libraries together if necessary.
Run the commands below in your virtual environment to install the google-cloud-aiplatform package.
The following snippet is only needed if you are using a Vertex AI Workbench notebook:
```python
# Set up your dependencies
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

USER_FLAG = ""
# Google Cloud Notebook requires dependencies to be installed with '--user'
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"
```
Install the latest version of the Vertex AI client library.
Run the following command in your virtual environment to install the Vertex SDK for Python:
```python
! pip install google-cloud-aiplatform

# If the package is already installed in your system or notebook, run the commands below instead
# Upgrade the specified package to the newest available version
# ! pip install {USER_FLAG} --upgrade google-cloud-aiplatform
# Upgrade the specified package to the newest available version
# ! pip install {USER_FLAG} --upgrade google-cloud-storage
```
Restart the kernel
After you install the additional packages, you need to restart the notebook kernel so it can find the packages.
```python
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
```
Step 2: Setting Up the Environment
Before we dive into the code, we need to set up our GCP project, create a Cloud Storage bucket, and install the necessary Python libraries. Make sure you have the Google Cloud SDK (gcloud) installed and configured with your GCP project.
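If gcloud is not yet configured, the following commands are one common way to authenticate, select a project, and enable the Vertex AI API. The project ID here is a placeholder, so substitute your own:

```shell
# Assumed setup commands; replace your-project-id with your actual GCP project ID
gcloud auth login                                 # authenticate with your Google account
gcloud config set project your-project-id         # point gcloud at your project
gcloud services enable aiplatform.googleapis.com  # enable the Vertex AI API
```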
If you don’t know your project ID, you may be able to get your project ID using gcloud.
```python
import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)
```
When you run the command above, it prints your project ID, which will look something like this:
Project ID: qwiklabs-gcp-04-c846c60XXXX
Copy the project ID and set it as the PROJECT_ID variable here:
```python
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "qwiklabs-gcp-04-c846b6079446"  # @param {type:"string"}
```
Create a timestamp for uniqueness:
```python
# Import necessary libraries
from datetime import datetime

# Use a timestamp to ensure unique resources
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
```
Create a Cloud Storage bucket:
```python
! gsutil mb -l $REGION $BUCKET_NAME
```
Replace REGION and BUCKET_NAME according to your project requirements.
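For example, you could define these two variables before running the bucket-creation command. The region and bucket-naming scheme below are only illustrative assumptions, so adjust them to your project:

```python
from datetime import datetime

# Assumed values for illustration; replace with your own project ID and region
PROJECT_ID = "your-project-id"
REGION = "us-central1"  # any region where Vertex AI is available

# Bucket names must be globally unique; a timestamp suffix helps avoid collisions
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET_NAME = f"gs://{PROJECT_ID}-aip-{TIMESTAMP}"

print(BUCKET_NAME)
```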
Output :
Creating gs://qwiklabs-gcp-06-c846b60794346aip-20210826051667/...
Finally, validate access to your Cloud Storage bucket by examining its contents:
```python
! gsutil ls -al $BUCKET_NAME
```
Step 3: Copying Dataset into Cloud Storage
In this step, we’ll copy the dataset from a public source location into the Cloud Storage bucket we created earlier (referenced by the BUCKET_NAME variable).
```python
IMPORT_FILE = "petfinder-tabular-classification_toy.csv"
! gsutil cp gs://cloud-training/mlongcp/v3.0_MLonGC/pdtrust_toy_datasets/{IMPORT_FILE} {BUCKET_NAME}/data/

gcs_source = f"{BUCKET_NAME}/data/{IMPORT_FILE}"
```
Step 4: Importing the Vertex SDK for Python
We need to import the Vertex SDK and initialize it using our project ID and location:
```python
# Import necessary libraries
import os

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)
```
Step 5: Creating a Managed Tabular Dataset
To create a dataset from a CSV file stored in Cloud Storage, use the Vertex SDK:
```python
ds = aiplatform.TabularDataset.create(
    display_name="petfinder-tabular-dataset",
    gcs_source=gcs_source,
)

ds.resource_name
```
Output:
INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/1075205415941/locations/us-central1/datasets/1945247175768276992/operations/1110822578768838656
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/1075205415941/locations/us-central1/datasets/1945247175768276992
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/1075205415941/locations/us-central1/datasets/1945247175768276992')
'projects/1075205415941/locations/us-central1/datasets/1945247175768276992'
This will create a dataset from a CSV file stored on your GCS bucket.
Step 6: Launching a Training Job
Now, we are ready to create and train our AutoML tabular model:
```python
# Construct an AutoML Tabular Training Job
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-petfinder-automl-1",
    optimization_prediction_type="classification",
    column_transformations=[
        {"categorical": {"column_name": "Type"}},
        {"numeric": {"column_name": "Age"}},
        {"categorical": {"column_name": "Breed1"}},
        {"categorical": {"column_name": "Color1"}},
        {"categorical": {"column_name": "Color2"}},
        {"categorical": {"column_name": "MaturitySize"}},
        {"categorical": {"column_name": "FurLength"}},
        {"categorical": {"column_name": "Vaccinated"}},
        {"categorical": {"column_name": "Sterilized"}},
        {"categorical": {"column_name": "Health"}},
        {"numeric": {"column_name": "Fee"}},
        {"numeric": {"column_name": "PhotoAmt"}},
    ],
)

# Create and train the model object
# This will take around two and a half hours to run
model = job.run(
    dataset=ds,
    target_column="Adopted",
    # Define training, validation, and test fractions for training
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    model_display_name="adopted-prediction-model",
    disable_early_stopping=False,
)
```
Output:
opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:16: DeprecationWarning: consider using column_specs instead. column_transformations will be deprecated in the future.
app.launch_new_instance()
INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1715908841423503360?project=1075205415941
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/1075205415941/locations/us-central1/trainingPipelines/1715908841423503360 current state:
PipelineState.PIPELINE_STATE_RUNNING
It takes more than 2 hours to complete the training.
Step 7: Deploying the Model
Before making predictions, we need to deploy the model to an endpoint:
```python
# Deploy the model resource to the serving endpoint resource
endpoint = model.deploy(
    machine_type="e2-standard-4",
)
```
Output:
/opt/conda/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/7965582686603444224
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/1075205415941/locations/us-central1/endpoints/7467372802459303936')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936/operations/2903536705439006720
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/1075205415941/locations/us-central1/endpoints/7467372802459303936
Step 8: Making Predictions
With the model deployed, you can now make predictions. Here’s an example of how to send data for prediction. This sample instance is taken from an observation in which Adopted = Yes.
Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your AutoMLTabularTrainingJob tell Vertex AI to convert the inputs to their defined types.
```python
# Make a prediction using the sample values
prediction = endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
            "Gender": "Male",
            "Color1": "Black",
            "Color2": "White",
            "MaturitySize": "Small",
            "FurLength": "Short",
            "Vaccinated": "No",
            "Sterilized": "No",
            "Health": "Healthy",
            "Fee": "100",
            "PhotoAmt": "2",
        }
    ]
)

print(prediction)
```
Output:
Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.527707576751709, 0.4722923934459686]}], deployed_model_id='3521401492231684096', explanations=None)
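The response pairs each class label with a score. As a sketch of how you might extract the most likely label, here is the parsing logic applied to a hard-coded copy of the response payload shown above (in practice you would read it from `prediction.predictions` on the returned object):

```python
# Hard-coded copy of the response payload shown above, for illustration
predictions = [{"classes": ["Yes", "No"], "scores": [0.527707576751709, 0.4722923934459686]}]

# Pair each class with its score and take the highest-scoring one
result = predictions[0]
label, score = max(zip(result["classes"], result["scores"]), key=lambda pair: pair[1])
print(label, round(score, 3))  # → Yes 0.528
```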
Step 9: (Optional) Undeploy the model
```python
# Undeploy the model resource
endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)
```
Step 10: (Optional) Cleaning up
```python
delete_training_job = True
delete_model = True
delete_endpoint = True
# Warning: Setting this to True will delete everything in your bucket
delete_bucket = False

if delete_training_job:
    job.delete()
if delete_model:
    model.delete()
if delete_endpoint:
    endpoint.delete()
if delete_bucket and "BUCKET_NAME" in globals():
    ! gsutil -m rm -r $BUCKET_NAME
```
Conclusion
In this article, we learned how to use the Vertex AI SDK for Python to train a model on Vertex AI. We covered the following steps:
- How to create a dataset and upload data to Vertex AI
- How to define a custom training job and run it on Vertex AI
- How to deploy the trained model and get predictions on Vertex AI
We also learned about some of the main components and concepts of Vertex AI, such as projects, datasets, training jobs, models, endpoints, and predictions. We used a tabular classification example to demonstrate how to use the Vertex AI SDK for Python, but you can apply the same steps to other types of datasets and models as well.