Introduction to Google Cloud Bigtable

Last Updated : 30 Mar, 2023

Prerequisite: GCP

Google Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, letting you store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key, and it is the table's only index. Google Cloud Bigtable provides low-latency storage for massive amounts of single-keyed data. Because it supports high read and write throughput at low latency, it is an ideal data source for MapReduce operations.

Applications can access Google Cloud Bigtable through a variety of client libraries, including a supported Java extension to the Apache HBase library. As a result, it integrates with the existing Apache ecosystem of open-source big data software.
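
Besides the HBase-compatible Java client, Google provides idiomatic clients for other languages. As a minimal, hedged sketch (the project, instance, table, and column-family names below are placeholders for resources you would have created already), connecting and writing a single row with the Python client might look like this:

# pip install google-cloud-bigtable
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("my-table")

# Write one cell: row key -> column family "stats", qualifier "cpu".
row = table.direct_row(b"server-001#20230330")
row.set_cell("stats", "cpu", b"0.75")
row.commit()

# Read it back; read_row returns None if the row does not exist.
data = table.read_row(b"server-001#20230330")
if data is not None:
    cell = data.cells["stats"][b"cpu"][0]    # newest version first
    print(cell.value, cell.timestamp)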

Google Cloud Bigtable's powerful backend servers offer several key advantages over a self-managed HBase installation:

  • Exceptional scalability. Google Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is reached; Google Cloud Bigtable has no such bottleneck, so you can grow your cluster to handle more reads and writes.
  • Ease of administration. Google Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. Just define your table schemas, and Google Cloud Bigtable handles the rest for you; no more managing replication or regions.
  • Cluster resizing without downtime. You can increase a Google Cloud Bigtable cluster's size for a few hours to handle a large load, then reduce the cluster's size again, all without any downtime (see the resizing sketch after this list). After you change a cluster's size, it typically takes just a few minutes under load for Google Cloud Bigtable to balance performance across all of the nodes in your cluster.
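
As an illustration of that kind of programmatic scaling, here is a minimal sketch using the Python admin client; the project, instance, and cluster IDs are hypothetical, and in recent client versions cluster.update() returns a long-running operation you can wait on.

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Load the cluster's current state, raise the node count, and apply it.
cluster = instance.cluster("my-instance-c1")
cluster.reload()                # fetch the current serve_nodes and location
cluster.serve_nodes = 6         # scale up for a heavy batch workload
operation = cluster.update()    # long-running operation in recent client versions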

Why use Bigtable?

Applications that need high throughput and scalability for key/value data, where each value is typically no larger than 10 MB, are a good fit for Google Cloud Bigtable. Bigtable also excels as a storage engine for batch MapReduce operations, stream processing, and machine learning.

Google Cloud Bigtable can store and query all of the following types of data:

  • Time-series data, such as CPU and memory usage over time for multiple servers.
  • Marketing data, such as purchase histories and customer preferences.
  • Financial data, such as transaction histories, stock prices, and currency exchange rates.
  • Internet of Things data, such as usage reports from energy meters and home appliances.
  • Graph data, such as information about how users are connected to one another.

Bigtable storage concept

Each massively scalable table in Google Cloud Bigtable is a sorted key/value map that holds the data. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped into a column family. Each column is identified by a combination of the column family and a column qualifier, a unique name within the column family.

Each row/column intersection can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column; storing multiple cells in a column keeps a record of how the stored data for that row and column has changed over time. Google Cloud Bigtable tables are sparse: if a column is not used in a particular row, it takes up no space in that row. A few points to remember: columns can be unused in any given row, and the cells at a given row and column each carry their own timestamp (t).
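
To make the row, column family, qualifier, and timestamped-cell model concrete, here is a hedged sketch using the Python client (the table and column-family names are invented for illustration). It creates a table whose column family keeps up to three versions of each cell, writes two versions of the same cell, and then reads the history back:

import datetime
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Column family "metrics" that retains at most 3 timestamped versions per cell.
table = instance.table("weather")
table.create(column_families={"metrics": column_family.MaxVersionsGCRule(3)})

# Two writes to the same row and column produce two timestamped cells.
row_key = b"WashingtonDC#201803061617"
for value in (b"20.1", b"20.4"):
    row = table.direct_row(row_key)
    row.set_cell("metrics", "temperature", value,
                 timestamp=datetime.datetime.utcnow())
    row.commit()

# Read the full history for that row and column (newest cell first).
data = table.read_row(row_key)
for cell in data.cells["metrics"][b"temperature"]:
    print(cell.timestamp, cell.value)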

In the Google Cloud Bigtable architecture, all client requests go through a frontend server before they are sent to a Google Cloud Bigtable node. The nodes are organized into a Google Cloud Bigtable cluster, which belongs to a Google Cloud Bigtable instance, a container for the cluster.

[Figure: Google Cloud Bigtable architecture]

Each node in the cluster handles a subset of the requests to the cluster. By adding nodes, you can increase the number of simultaneous requests the cluster can handle as well as its maximum throughput. If you enable replication by adding additional clusters, you can also send different types of traffic to different clusters, and you can fail over to another cluster if one cluster becomes unavailable.

It is important to note that data is never actually stored on Google Cloud Bigtable nodes themselves; each node holds pointers to a set of tablets that are stored on Colossus. Because the actual data is not copied, rebalancing tablets from one node to another is very fast. When a Google Cloud Bigtable node fails, no data is lost, and recovery is fast because only metadata needs to be migrated to the replacement node; Bigtable simply updates the pointers for each node.

Load balancing

A primary process balances workload and data volume within each Google Cloud Bigtable zone. This process splits busier/larger tablets in half, merges less-accessed/smaller tablets together, and redistributes tablets between nodes as needed. If a tablet gets a spike of traffic, Google Cloud Bigtable splits it in two and then moves one of the new tablets to another node. Because Bigtable handles the splitting, merging, and rebalancing automatically, you do not have to manage your tablets manually.

To get the best write performance from Google Cloud Bigtable, it is important to distribute writes as evenly as possible across nodes. One way to achieve this is to use row keys that do not follow a predictable order.

At the same time, grouping related rows so they are adjacent to one another makes it much more efficient to read several rows at once. For example, if you were storing different types of weather data over time, your row key might be the location where the data was collected followed by a timestamp (for example, WashingtonDC#201803061617). This kind of row key groups all of the data from one location into a contiguous range of rows. With many locations collecting data at the same rate, writes would still be spread evenly across tablets, because rows for other locations would start with a different identifier.
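
A hedged sketch of that row-key pattern with the Python client (reusing the invented table and column-family names from the earlier sketch): location#timestamp keys spread writes across locations while keeping each location's readings contiguous, so one location's history can be read with a single row-range scan.

import time
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("weather")

def make_row_key(location: str) -> bytes:
    # e.g. b"WashingtonDC#201803061617"
    return f"{location}#{time.strftime('%Y%m%d%H%M')}".encode()

# Writes for different locations land on different tablets.
for location, temperature in [("WashingtonDC", b"20.4"), ("Seattle", b"11.2")]:
    row = table.direct_row(make_row_key(location))
    row.set_cell("metrics", "temperature", temperature)
    row.commit()

# Read every row for one location by scanning the contiguous key range
# ["WashingtonDC#", "WashingtonDC$"); "$" is the byte right after "#".
for row in table.read_rows(start_key=b"WashingtonDC#", end_key=b"WashingtonDC$"):
    print(row.row_key, row.cells["metrics"][b"temperature"][0].value)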

Supported data types

For most purposes, Google Cloud Bigtable treats all data as raw byte strings. The only time Bigtable tries to determine the type is for increment operations, where the target must be a 64-bit integer encoded as an 8-byte big-endian value.
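
For example (a sketch only; the counter table and column-family names are hypothetical), an increment through the Python client treats the cell as a 64-bit big-endian signed integer, which is the same encoding you would produce yourself with struct:

import struct
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("counters")

# Increment: Bigtable interprets the cell as an 8-byte big-endian integer.
row = table.append_row(b"page#home")
row.increment_cell_value("stats", "views", 1)
row.commit()

# Writing such a value yourself means packing it the same way.
encoded = struct.pack(">q", 42)              # 8-byte big-endian signed integer
assert struct.unpack(">q", encoded)[0] == 42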

Memory and disk usage

The following sections describe how several features of Google Cloud Bigtable affect how much memory and disk space your instance uses.

Unused columns

Columns that are not used in a Google Cloud Bigtable row do not take up any space in that row. Each row is essentially a collection of key/value entries, where the key is a combination of the column family, column qualifier, and timestamp. If a row does not include a value for a specific column, the key/value entry is simply not present.

Column qualifiers

Column qualifiers take up space in a row, since each column qualifier used in a row is stored in that row. As a result, it is often efficient to use column qualifiers as data, as shown in the sketch below.
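
A common illustration of this pattern (hedged; the table and family names here are invented) is a follows-graph in which each followed user's ID is stored as a column qualifier itself, with only a tiny placeholder value:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("follows")

# Row key = the follower; each followed user becomes a column qualifier
# in the "f" family, so the qualifier itself carries the data.
row = table.direct_row(b"user123")
for followed in (b"user456", b"user789"):
    row.set_cell("f", followed, b"1")
row.commit()

# Listing who user123 follows means listing that row's column qualifiers.
data = table.read_row(b"user123")
print(sorted(data.cells["f"].keys()))        # [b'user456', b'user789']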

Compactions

Google Cloud Bigtable periodically rewrites your tables to remove deleted entries and to reorganize your data so that reads and writes are more efficient. This process is known as a compaction. Bigtable compacts your data automatically; there are no tuning settings.

Mutations and deletions

Mutations, or changes, to a row take up extra storage space, because Google Cloud Bigtable stores mutations sequentially and compacts them only periodically. When Bigtable compacts a table, it removes values that are no longer needed. If you update the value in a cell, both the original value and the new value are stored on disk until the data is compacted.
Deletions also take up extra storage space, at least in the short term, because a deletion is really a specialized type of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
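
For completeness, here is a minimal sketch (reusing the invented weather table from above) of the delete mutations exposed by the Python client; each one is recorded as a mutation, and the space is reclaimed only at the next compaction:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("weather")

row = table.direct_row(b"WashingtonDC#201803061617")
row.delete_cell("metrics", b"temperature")       # delete one column's cells
# row.delete_cells("metrics", row.ALL_COLUMNS)   # or an entire column family
# row.delete()                                   # or the entire row
row.commit()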

Data compression

Google Cloud Bigtable compresses your data automatically using an intelligent algorithm; you cannot configure compression settings for your table. However, it is useful to know how to store data so that it can be compressed efficiently:

  • Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as the page you are reading right now. Compression works best when identical values are near each other, either in the same row or in adjoining rows; if you arrange your row keys so that rows with similar chunks of data are next to each other, the data can be compressed efficiently.
  • Compress values larger than 1 MiB before storing them in Google Cloud Bigtable (see the sketch after this list). Doing so saves network bandwidth, server memory, and CPU cycles, because Bigtable automatically turns compression off for values larger than 1 MiB.
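
A hedged sketch of that last point (the blob table and family names are made up): values above the 1 MiB threshold can be compressed client-side, for example with zlib, before they are written.

import zlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("blobs")

payload = b"..." * 500_000              # pretend this is a large document
if len(payload) > 1024 * 1024:          # Bigtable stops compressing past 1 MiB,
    payload = zlib.compress(payload)    # so compress it client-side instead

row = table.direct_row(b"doc#42")
row.set_cell("content", "body", payload)
row.commit()

# Readers must remember to zlib.decompress() the value after fetching it.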

Data durability

When you use Google Cloud Bigtable, your data is stored on Colossus, Google's internal, highly durable file system, using storage devices in Google's data centers. You do not need to run an HDFS cluster or any other file system to use Google Cloud Bigtable.

Google uses proprietary storage methods to achieve data durability above and beyond what standard HDFS three-way replication provides. In addition, replicated copies of your data are kept to protect against catastrophic events and to enable disaster recovery.

Consistency model

Single-cluster Google Cloud Bigtable instances provide strong consistency.

Security

Access is controlled through IAM roles, which prevent individual users from, for example, creating new instances, reading from tables, or writing to tables. Anyone who does not have access to your project, or who does not have an IAM role with the appropriate Google Cloud Bigtable permissions, cannot access any of your tables.

Security can be managed at the project, instance, and table levels. Google Cloud Bigtable does not support row-level, column-level, or cell-level security restrictions.

Encryption

By default, all data stored within Google Cloud, including the data in Google Cloud Bigtable tables, is encrypted at rest using the same hardened key-management systems that Google uses for its own encrypted data.

Customer-managed encryption keys (CMEK) give you more control over the keys used to protect your Google Cloud Bigtable data at rest.

Backups

Google Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time. Backups can help you recover from operator errors, such as accidentally deleting a table, and from application-level data corruption.


