Google Cloud Platform – A High-Level Overview of the Data Catalog Service
Discovering enterprise data sources is time-consuming and often depends on tribal knowledge. Data Catalog makes data discovery simpler. It is a fully managed, scalable data discovery and metadata management service that helps organizations quickly discover, understand, and manage all their data in Google Cloud. In this article, we look at how to tackle data discovery with Data Catalog.
Data discovery usually starts with a question, such as: What’s the slowest link in our supply chain? Users spend a great deal of time and effort discovering, validating, and getting access to the right datasets and sources before they can answer such questions. Built on the same search technology that powers Gmail and Drive, Data Catalog provides a simple user interface with powerful, structured search capabilities for finding data assets quickly.
So now, let’s see it in action. Using the search bar at the top of the Data Catalog, you can conduct a search of all your data assets across BigQuery and Cloud Pub/Sub.
Google plans to add more data sources in the future, including on-premises ones. Data Catalog also provides faceted search, allowing users to filter by type, column, and tag across millions of data assets.
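The faceted search described above can also be written as a query string and sent programmatically. The following is a minimal sketch, assuming Data Catalog's search qualifiers (type=, system=, column:, tag:) and the google-cloud-datacatalog Python client. The helper names build_query and search_assets, the keyword "orders", and the column name "ship_date" are hypothetical and used only for illustration.

```python
def build_query(keywords="", asset_type=None, system=None, column=None, tag=None):
    """Combine free-text keywords with facet qualifiers into one search string."""
    parts = []
    if keywords:
        parts.append(keywords)
    if asset_type:
        parts.append(f"type={asset_type}")    # e.g. table, dataset, data_stream
    if system:
        parts.append(f"system={system}")      # e.g. bigquery, cloud_pubsub
    if column:
        parts.append(f"column:{column}")      # match on a column name
    if tag:
        parts.append(f"tag:{tag}")            # match on an attached tag
    return " ".join(parts)


def search_assets(project_id, query):
    """Run the query against Data Catalog.

    Assumption: the google-cloud-datacatalog package is installed and
    application default credentials are configured. Not called below.
    """
    from google.cloud import datacatalog_v1  # imported lazily on purpose

    client = datacatalog_v1.DataCatalogClient()
    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)
    results = client.search_catalog(scope=scope, query=query)
    return [result.relative_resource_name for result in results]


# Build a query for BigQuery tables that mention "orders" and have a ship_date column.
query = build_query("orders", asset_type="table", system="bigquery", column="ship_date")
print(query)
```

Keeping the query builder separate from the API call makes the facets easy to inspect before any request is sent. Calling search_assets with a real project ID should return only the entries the caller is allowed to see.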
Beyond the search bar, you can view popular tables, which are the most queried and viewed BigQuery tables of the past 30 days. You can also explore Pub/Sub topics, as well as tables and views, and get started with creating tag templates. The beauty of Data Catalog is that its built-in access controls honor the ACLs of the source systems, so users can start exploring data seamlessly and more securely.
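Tag templates can be created from code as well as from the UI. Here is a minimal sketch, assuming the google-cloud-datacatalog Python client. The project ID, location, template ID, and the single "source" field are all hypothetical placeholders rather than values from this walkthrough.

```python
def template_parent(project_id, location):
    """Resource name of the project and location that will own the tag template."""
    return f"projects/{project_id}/locations/{location}"


def create_demo_template(project_id, location="us-central1", template_id="demo_tag_template"):
    """Create a tag template with one string field.

    Assumption: the google-cloud-datacatalog package is installed and
    application default credentials are configured. Not called below.
    """
    from google.cloud import datacatalog_v1  # imported lazily on purpose

    client = datacatalog_v1.DataCatalogClient()

    template = datacatalog_v1.types.TagTemplate()
    template.display_name = "Demo Tag Template"

    # One hypothetical field that records where a data asset came from.
    template.fields["source"] = datacatalog_v1.types.TagTemplateField()
    template.fields["source"].display_name = "Source of data asset"
    template.fields["source"].type_.primitive_type = (
        datacatalog_v1.types.FieldType.PrimitiveType.STRING
    )

    return client.create_tag_template(
        parent=template_parent(project_id, location),
        tag_template_id=template_id,
        tag_template=template,
    )


# Show the parent path the template would be created under.
print(template_parent("my-project", "us-central1"))
```

Once a template exists, tags based on it can be attached to tables or topics, and those tags become searchable through the tag facet.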