Create a Vertex AI tabular dataset

Last Updated : 25 Sep, 2023

Vertex AI is a Google Cloud service for building, deploying, and managing machine learning models. Its AutoML feature automatically trains and tunes models for a variety of data types and tasks, including classification on tabular data. In this article, we will explore how to create a tabular dataset on the Vertex AI platform.

Role of Tabular Dataset

Tabular data is made up of many rows, and every row has the same columns, or features. Each feature has a source data type that depends on where the data comes from (BigQuery or a CSV file in Cloud Storage). When you use the data to train a model, Vertex AI examines the source data type and the feature values and infers how it will use that feature in training. This inferred treatment is called the transformation for that feature. If necessary, you can specify a different supported transformation for any feature.
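
To make the idea of an inferred transformation concrete, here is a minimal local sketch in Python. It is not Vertex AI's actual logic, which this article does not describe. The function name `guess_transformations`, the `max_categories` threshold, and the three labels it returns are all assumptions chosen for illustration.

```python
import csv
import io


def guess_transformations(text, max_categories=10):
    """Guess a transformation name for each column of CSV text.

    Hypothetical rule of thumb (NOT Vertex AI's real algorithm):
    all-numeric columns are "numeric", other columns with few distinct
    values are "categorical", and everything else is "text".
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    guesses = {}
    for name in reader.fieldnames or []:
        # Skip missing and empty cells when looking at a column.
        values = [row.get(name) for row in rows if row.get(name)]
        if values and all(is_number(v) for v in values):
            guesses[name] = "numeric"
        elif len(set(values)) <= max_categories:
            guesses[name] = "categorical"
        else:
            guesses[name] = "text"
    return guesses
```

Running a check like this on a small sample before importing can highlight columns whose inferred treatment you may want to review or override.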

The first step in producing good tabular training data is making sure that your problem is well defined and can yield the prediction results you need. This walkthrough uses a classification model, which examines your tabular data and returns a list of categories that describe it. For instance, you might train a model to predict whether a customer’s purchase history indicates that they would buy a subscription.

To use AutoML on tabular data, you must create a Vertex AI dataset from your training data and specify the target column. A Vertex AI dataset is a collection of data that can be used to train one or more models. You can build a dataset from either a BigQuery table or a CSV file in Cloud Storage. This article shows how to create a Vertex AI tabular dataset from a CSV file in Cloud Storage.

Steps to create a Vertex AI tabular dataset

Step 1:

Create a CSV file of your training data and upload it to Cloud Storage. Make sure the file has a header row that names each column, and that the target column is either numerical (for regression) or categorical (for classification). For illustration, I’m using an online sample dataset called “bank-marketing.csv” here.
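
The checks in this step can be sketched in Python before uploading. This is only a minimal example under stated assumptions: the helper names `check_training_csv` and `upload_to_cloud_storage` are hypothetical, and the upload helper assumes the `google-cloud-storage` package is installed and that Application Default Credentials are configured.

```python
import csv
import io


def check_training_csv(text, target_column):
    """Check that CSV text has a usable header row and a target column.

    Returns "regression" if every non-empty target value parses as a
    number, otherwise "classification". This is only a hint: numeric
    codes such as 1 and 2 can still be category labels.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header or any(not name.strip() for name in header):
        raise ValueError("CSV needs a header row that names every column")
    if len(set(header)) != len(header):
        raise ValueError("column names in the header must be unique")
    if target_column not in header:
        raise ValueError(f"target column {target_column!r} not found in header")

    index = header.index(target_column)
    values = [row[index] for row in reader if len(row) > index and row[index] != ""]
    if not values:
        raise ValueError("target column has no values")

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    return "regression" if all(is_number(v) for v in values) else "classification"


def upload_to_cloud_storage(local_path, bucket_name, blob_name):
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    # Imported here so the validation helper above works even when the
    # google-cloud-storage package is not installed.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"
```

For example, you could read your file's contents, pass them to `check_training_csv` with the name of your own target column, and call `upload_to_cloud_storage` only if no error is raised. The bucket and object names are yours to choose.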

Step 2:

Go to the Google Cloud Console. In the Navigation menu (≡) on the left, locate and click Vertex AI, and then click Dashboard.

Fig 1: Navigating console

Step 3:

Click Enable all Recommended APIs.

Fig 2: Enable APIs

Step 4:

After you have enabled the APIs, go to the left menu panel and locate the Data subsection. Under Data, click Datasets.

Fig 3: Navigate to Dataset

Step 5:

After you click Datasets, you will see an empty workspace. Next to the Datasets heading, click Create to make your own dataset.

Fig 4: Create Dataset

Step 6:

For the dataset name, enter “Structured_AutoML_Tutorial” or any other name you like, and then select the Tabular tab.

Step 7:

Select the Regression/Classification objective. Leave the Region set to us-central1, or choose a different region if you need one.

Step 8:

Click the Create button to create the dataset.

Fig 5: Selecting Tabular Dataset type

Step 9:

To select a data source, click Select CSV files from Cloud Storage, or choose whichever option matches where your training data is stored.

Fig 6: Select importing method

Step 10:

In the Import file path field, enter the path to your dataset (here I am using “cloud-ml-tables-data/bank-marketing.csv”) and click Continue.

Note:

Review the schema and make sure the data types and roles of each column are correct. You can change the data type by clicking on the drop-down menu under Type, and change the role by clicking on the drop-down menu under Role. The role can be either Feature (input variable), Target (output variable), or Unused (ignored variable).

Select the column that you want to use as the target for your model. You can only select one column as the target. If you want to train multiple models with different targets, you need to create multiple datasets with different target columns.

Click Create Dataset and wait for Vertex AI to create your dataset.

Vertex AI checks the data type and feature values of each column in the dataset and then decides how those features will be used in training the model. (Note: Review the data type of each column so that the data is interpreted correctly later on.) If required, you can specify a different transformation for any feature.
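
The console steps above can also be sketched with the Vertex AI SDK for Python. This is a minimal example, not the only way to do it: the helper names `to_gcs_uri` and `create_tabular_dataset` are hypothetical, the project ID is a value you must supply, and the code assumes the `google-cloud-aiplatform` package is installed, the Vertex AI API is enabled, and credentials are configured.

```python
def to_gcs_uri(path):
    """Turn "bucket/object.csv" into "gs://bucket/object.csv"."""
    path = path.strip()
    if not path:
        raise ValueError("path is empty")
    return path if path.startswith("gs://") else "gs://" + path.lstrip("/")


def create_tabular_dataset(project_id, display_name, import_path,
                           location="us-central1"):
    """Create a Vertex AI tabular dataset from a CSV file in Cloud Storage."""
    # Imported here so to_gcs_uri can be used without the SDK installed.
    from google.cloud import aiplatform

    # Select the project and region, as in the console's Region setting.
    aiplatform.init(project=project_id, location=location)

    # Equivalent of choosing the Tabular tab and a Cloud Storage CSV source.
    return aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=[to_gcs_uri(import_path)],
    )
```

With your own project ID, a call such as `create_tabular_dataset("your-project-id", "Structured_AutoML_Tutorial", "cloud-ml-tables-data/bank-marketing.csv")` should return a dataset object. Its `resource_name` and `column_names` attributes let you confirm that the import picked up the columns you expect.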

Conclusion

In this article, we learned how to create a Vertex AI tabular dataset from a CSV file in Cloud Storage. We also covered the concepts of tabular data, features, targets, and roles. With a Vertex AI tabular dataset, you can start training AutoML models for tasks such as classification or regression. You can also use the same dataset to train custom models with your own code or pre-built containers.

