How Does YouTube Store And Analyze Such a Huge Amount of Data?
Do you love watching stand-up comedy? Or maybe you like watching cute cat videos? Or even the latest Bollywood songs and trailers? Whatever your interests, I am sure you use YouTube to watch videos. And who knows? You might even have a popular channel on YouTube!
In either case, YouTube is an integral part of your life, and the same is true for a huge number of people around the world! The numbers bear this out: more than 400 hours of video content are uploaded to YouTube every minute, and approximately 1 billion hours of YouTube videos are watched every day. This makes YouTube the 2nd most popular social media platform in the world, with 1.9 billion users (the 1st is Facebook!).
This is an insane amount of data for YouTube to store and manage. So the natural question is “How do they do it?” How does YouTube store and retrieve its content? How does it know which video to recommend to you next? How does it know what you want to watch? The answer to these questions lies in the complicated database management systems behind YouTube. So let’s try to understand that now!
What is the Basic YouTube Data Storage Mechanism?
YouTube is the go-to platform for watching and sharing videos, so it’s obvious that it has to manage a large volume of video content daily. This is done by using MySQL together with various other database management systems at different places to keep YouTube up and running.
Most of the YouTube data is stored in Google Modular Data Centers. A modular data center is portable and can be placed wherever data storage capacity is required. Since Google bought YouTube in 2006, it stands to reason that YouTube data lives in Google’s data centers. YouTube mainly uses 5 or 6 Google data centers, along with its own content distribution network (CDN), to make sure data is constantly available to end-users.
The more popular videos are moved to the CDN, which replicates them to various locations. This means users can access them much faster, with fewer network hops required. Less popular videos, on the other hand, are kept on the YouTube servers, where they are accessed on demand. Also, there is no hard and fast rule that videos are stored in the data center closest to the geographical region they were uploaded from. For example, if you upload some videos to YouTube from India, your data may be stored in a data center in the UK. YouTube also makes use of cloud storage in addition to all these methods.
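To make the popular-versus-less-popular split concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how YouTube actually does it: the popularity threshold, the `Video` and `Storage` classes, and the data center and edge location names are all made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off: videos with at least this many daily views count as "popular".
POPULAR_VIEWS_PER_DAY = 10_000


@dataclass
class Video:
    video_id: str
    views_per_day: int


@dataclass
class Storage:
    origin: dict = field(default_factory=dict)  # video_id -> data center name
    cdn: dict = field(default_factory=dict)     # video_id -> list of edge locations


def place_video(storage, video, data_center, edge_locations):
    """Always keep a copy in a data center; replicate popular videos to the CDN."""
    storage.origin[video.video_id] = data_center
    if video.views_per_day >= POPULAR_VIEWS_PER_DAY:
        storage.cdn[video.video_id] = list(edge_locations)


def locate(storage, video_id, user_region):
    """Return where a request is served from: a nearby CDN edge if one exists, else the origin."""
    edges = storage.cdn.get(video_id, [])
    if user_region in edges:
        return ("cdn", user_region)
    if edges:
        return ("cdn", edges[0])
    return ("origin", storage.origin[video_id])


storage = Storage()
# Both videos were uploaded from India but happen to be stored in a UK data center.
place_video(storage, Video("cute-cat", 50_000), "uk-datacenter", ["in", "us"])
place_video(storage, Video("my-holiday", 12), "uk-datacenter", ["in", "us"])

print(locate(storage, "cute-cat", "in"))    # served from the nearby CDN edge
print(locate(storage, "my-holiday", "in"))  # served on demand from the origin
```

The popular video is answered from an edge close to the viewer, while the rarely watched one makes the longer trip to the data center that holds it.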
Originally, MySQL was used in the YouTube databases to store most of the data, ranging from the videos to metadata like users, tags, and descriptions. The varbinary data type was used, which allowed the storage of videos and images like thumbnails as well! However, a disadvantage of MySQL is that on its own it offers little scope for scaling out across many machines, and scalability is a very important factor in an ever-expanding company like YouTube. YouTube cannot let go of MySQL completely, though, so Vitess is used in conjunction with MySQL. Vitess is a database clustering system that combines many of the important features of MySQL with the scalability that is a trademark of a NoSQL database. Vitess spreads the data across many MySQL instances (shards) and routes each query to the instances that hold the relevant data, so every instance deals with a smaller, more manageable load. It also creates backups and scales as much as is required.
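The idea of splitting one big database into shards can be sketched in a few lines of Python. This is a simplified illustration of hash-based sharding in general, not Vitess’s actual API or routing logic. The shard names and the `shard_for` and `route` helpers are assumptions made up for this example.

```python
import hashlib

# Hypothetical shard names; each would be a separate MySQL instance.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user ID to a shard by hashing it, so the same ID always lands on the same shard."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def route(user_ids, shards=SHARDS):
    """Group a batch of lookups by shard, so each shard only receives its own share."""
    batches = {}
    for user_id in user_ids:
        batches.setdefault(shard_for(user_id, shards), []).append(user_id)
    return batches


# Each shard gets a smaller list of IDs to look up instead of the whole batch.
for shard, ids in sorted(route(range(1, 21)).items()):
    print(shard, ids)
```

Because the shard is derived from the ID itself, the application never needs to remember where a row lives. The catch with this naive modulo scheme is that adding a shard moves most keys, which is one reason real systems use more careful key-range schemes.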
How is YouTube Data Analyzed for Advertisement?
The data collected by YouTube is also analyzed to display personalized advertisements. Are you wondering how this is done? Well, this is where Google lends a hand! Google uses algorithms to collect user information like browser and search history, geographical information, etc. These algorithms then analyze the information to understand what kind of products or services the users might be interested in. Companies pay to advertise their products on YouTube through AdWords and AdSense, which monitor the number of clicks on these advertisements. For example, suppose that you are a football enthusiast who continuously watches games and player interviews on YouTube. You will mostly be shown sports ads, since those are the products you are most likely to buy! With this approach, users get targeted ads according to their preferences, advertisers can ensure that their products reach those who might be interested in buying them, and YouTube earns money. So it’s a win-win situation!
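The football example can be sketched as a toy program in Python. This is only an illustration of interest-based targeting under very simple assumptions, not Google’s real algorithm: the ad catalogue, the category labels, and the `top_interest` and `pick_ad` functions are all made up here.

```python
from collections import Counter

# Hypothetical ad catalogue: one ad per interest category.
ADS = {
    "sports": "Football boots sale",
    "music": "Concert tickets",
    "cooking": "Kitchen knife set",
}


def top_interest(watch_history):
    """Return the category that appears most often in the user's watch history."""
    return Counter(watch_history).most_common(1)[0][0]


def pick_ad(watch_history, ads=ADS, default="Generic ad"):
    """Pick the ad that matches the user's strongest interest, or a fallback ad."""
    if not watch_history:
        return default
    return ads.get(top_interest(watch_history), default)


# A football enthusiast who occasionally watches a music video.
history = ["sports", "sports", "music", "sports"]
print(pick_ad(history))  # Football boots sale
```

A real system would weigh many more signals (search history, location, and so on), but the principle is the same: infer what the user cares about, then match it against what advertisers are offering.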