Google Cloud Platform – Schematized tags and IAM in Data Catalog
Finding your data assets is one thing; managing both your technical and business metadata while meeting security requirements is a whole other beast. Data Catalog can be of great help in scenarios like this. In this article, we will look at the process of adding schematized tags to your data assets, as well as the integration with Cloud Identity and Access Management (IAM).
Data Catalog is a fully managed and scalable data discovery and metadata management service that empowers organizations to quickly discover, understand, and manage all their data in Google Cloud. Here we will tackle the concept of understanding your data assets with schematized tags in a safe and secure way. Just like sticky notes, Data Catalog tags act as annotations. Whereas other data catalogs capture tags as simple text strings, Data Catalog captures business metadata in a schematized format via tags. These tags can be created through the UI as well as through the API. Users can define each field of a business tag as a string, double, boolean, or enum, making it easy to catalog and find data assets.
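To make the difference between free-text tags and schematized tags concrete, here is a minimal sketch in plain Python. It is a conceptual model only and does not call Data Catalog; the names `FieldSpec`, `validate_tag`, and the example fields are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


# Hypothetical, simplified model of a schematized tag template.
# Each field is declared with one of the four types named above.
@dataclass(frozen=True)
class FieldSpec:
    kind: str                      # "string", "double", "boolean", or "enum"
    allowed: Tuple[str, ...] = ()  # permitted values, used only for "enum"


def validate_tag(template: Dict[str, FieldSpec], values: Dict[str, Any]) -> None:
    """Raise ValueError if `values` does not conform to `template`."""
    for name, value in values.items():
        spec = template.get(name)
        if spec is None:
            raise ValueError(f"unknown field: {name}")
        if spec.kind == "string":
            ok = isinstance(value, str)
        elif spec.kind == "double":
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif spec.kind == "boolean":
            ok = isinstance(value, bool)
        elif spec.kind == "enum":
            ok = value in spec.allowed
        else:
            raise ValueError(f"unsupported field type: {spec.kind}")
        if not ok:
            raise ValueError(f"field {name!r} expects {spec.kind}, got {value!r}")


# Example template (assumed field names, not taken from a real template).
template = {
    "owner": FieldSpec("string"),
    "quality_score": FieldSpec("double"),
    "has_pii": FieldSpec("boolean"),
    "classification": FieldSpec("enum", ("PUBLIC", "SENSITIVE")),
}

# A conforming tag passes silently; a free-text value such as
# {"quality_score": "high"} would be rejected with ValueError.
validate_tag(template, {"owner": "finance", "quality_score": 0.97, "has_pii": False})
```

Because every value is checked against a declared type, a search over such tags can rely on, say, `quality_score` always being numeric, which is not possible when tags are arbitrary strings.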
Technical metadata is data that’s already stored in GCP services, such as table names, column descriptions, and creation dates. This data is automatically ingested from the data source into Data Catalog without any involvement on the user’s part. For example, if you add a new table to BigQuery, it will show up in Data Catalog in a matter of seconds. Business metadata, on the other hand, consists of tags that provide additional business information that is valuable to a customer. This could include:
- delete-by dates
- business logic
- data quality scores
- governance tags
By adding business metadata, data assets become more searchable and usable for your team. Let’s take a look at how you can add rich information to your data assets with tag templates. On the Data Catalog home screen, scroll to the tag templates section and click on create a tag template.
Here you can define the template ID and display name, add attributes, and define the type of each attribute.
Each attribute type can be string, double, boolean, or enum.
Once you’ve added all your attributes, save the template.
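The same template definition can be expressed as a JSON request body. The sketch below only builds that body as a Python dictionary and sends nothing. The field layout (`displayName`, `fields`, `primitiveType`, `enumType.allowedValues`) follows the v1 `TagTemplate` REST resource as we understand it, so check it against the API reference before relying on it. The helper `build_tag_template` and the example field names are hypothetical.

```python
import json
from typing import Dict, List, Union

# A field type is either a primitive type name ("STRING", "DOUBLE", "BOOL")
# or a list of allowed enum values.
FieldType = Union[str, List[str]]


def build_tag_template(display_name: str, fields: Dict[str, FieldType]) -> dict:
    """Build a TagTemplate-shaped JSON body (assumed v1 REST layout)."""
    body: dict = {"displayName": display_name, "fields": {}}
    for field_id, field_type in fields.items():
        if isinstance(field_type, list):
            # Enum attribute: list every allowed value by display name.
            type_spec = {
                "enumType": {
                    "allowedValues": [{"displayName": v} for v in field_type]
                }
            }
        else:
            # String, double, or boolean attribute.
            type_spec = {"primitiveType": field_type}
        body["fields"][field_id] = {
            # Derive a readable display name from the field ID.
            "displayName": field_id.replace("_", " ").title(),
            "type": type_spec,
        }
    return body


# Hypothetical template with one attribute of each type.
template_body = build_tag_template(
    "Demo Template",
    {
        "owner": "STRING",
        "quality_score": "DOUBLE",
        "has_pii": "BOOL",
        "retention": ["30_DAYS", "1_YEAR"],
    },
)
print(json.dumps(template_body, indent=2))
```

In a real call, a body like this would be sent to the `tagTemplates` collection of a project and location, with the template ID passed separately. The project, location, and template ID are left out here because they depend on your environment.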
Creating a tag template can also be done programmatically using the API. Next, let’s learn more about an existing tag template. On the Data Catalog home screen, scroll to the tag templates section and click on explore tag templates.
We’ll look at the Data Governance tag template.
This template has tags to help the data governance team certify specific data assets for use. It also lets approved users know which data assets they can use. You’ll notice that each attribute has a type as well.
For data classification, the type is enum. The data governance team can use it to record whether a data asset is public, sensitive, confidential, or regulatory, and then apply the appropriate controls around it.
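As an illustration of how an enum attribute constrains a tag, the sketch below builds a tag body that sets a classification value. It assumes the v1 REST `Tag` layout (`template`, `fields`, `enumValue.displayName`, optional `column`). The field ID `data_classification`, the upper-case value names, and the template path are hypothetical placeholders, not the identifiers of the real template.

```python
from typing import Optional

# Assumed display names for the four classification levels described above.
CLASSIFICATIONS = ("PUBLIC", "SENSITIVE", "CONFIDENTIAL", "REGULATORY")


def build_governance_tag(
    template_name: str, classification: str, column: Optional[str] = None
) -> dict:
    """Build a Tag-shaped JSON body carrying one enum field."""
    if classification not in CLASSIFICATIONS:
        # An enum field accepts only values declared in the template.
        raise ValueError(f"unknown classification: {classification!r}")
    tag = {
        "template": template_name,
        "fields": {
            # Hypothetical field ID for the data classification attribute.
            "data_classification": {"enumValue": {"displayName": classification}}
        },
    }
    if column is not None:
        # Attach the tag to a single column instead of the whole asset.
        tag["column"] = column
    return tag


# Placeholder template path; replace with your own project and location.
tag_body = build_governance_tag(
    "projects/my-project/locations/us-central1/tagTemplates/data_governance",
    "CONFIDENTIAL",
    column="email",
)
```

Rejecting undeclared values before the request is built mirrors what the enum type is for: the governance team chooses from a fixed vocabulary rather than typing free text.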
Now that we’ve explored a tag template, let’s look at how the integration with IAM helps with access control of your data assets. Instead of requiring you to set separate permissions, Data Catalog auto-ingests technical metadata and honors the access controls that already exist on the source. This means that if a user already has read access to all data assets in BigQuery, then they’ll be able to discover and have read access to those data assets in Data Catalog.
Similarly, this IAM integration allows further access controls to be set. With Data Catalog, you can grant read access, metadata read-only access, or neither data nor metadata read access. Depending on these permission settings, users’ search results and access can be controlled, keeping highly sensitive data limited to those with the necessary access.
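The effect of the three access levels on search can be sketched with a small model. This is a conceptual illustration in plain Python and does not use real IAM roles or the Data Catalog search API; the `Access` enum, the helper functions, and the entry names are all hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Access(Enum):
    NONE = 0           # neither data nor metadata can be read
    METADATA_ONLY = 1  # the asset is discoverable, but its data is not readable
    DATA_READ = 2      # both metadata and data are readable


def search(entries: List[str], grants: Dict[str, Access]) -> List[str]:
    """Return only the entries whose metadata the user is allowed to see."""
    # Entries without an explicit grant are treated as Access.NONE.
    return [e for e in entries if grants.get(e, Access.NONE) is not Access.NONE]


def can_read_data(entry: str, grants: Dict[str, Access]) -> bool:
    """True only when the user holds full read access to the entry."""
    return grants.get(entry, Access.NONE) is Access.DATA_READ


# Hypothetical assets and one user's grants.
entries = ["sales.orders", "hr.salaries", "finance.ledger"]
grants = {"sales.orders": Access.DATA_READ, "hr.salaries": Access.METADATA_ONLY}

visible = search(entries, grants)
```

Here `finance.ledger` never appears in the user's results because no grant exists for it, while `hr.salaries` is discoverable but its contents stay unreadable.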