
AWS Athena

Last Updated : 17 Mar, 2023

Pre-requisites: AWS

Amazon Web Services (AWS) provides its account holders with on-demand IT resources on a pay-as-you-go basis, with no upfront costs. This model is flexible because you pay only for the services you actually use.

What is AWS Athena

AWS Athena (Amazon Athena) is a serverless, interactive query service that lets you analyze data in Amazon S3 using standard SQL. Athena is built on Presto, a distributed SQL query engine (newer Athena engine versions are based on Trino, a fork of Presto), and it can query data where it sits in Amazon S3 quickly using conventional SQL syntax. Because there is no infrastructure to manage, you can focus on analyzing data at scale. To understand AWS Athena better, let us look at its architecture first.

AWS Athena Architecture

Athena's query engine is built on Presto, an open-source distributed SQL query engine. When a user submits a query, Athena parses it into a query plan, and the engine distributes the work across many worker nodes that process the data in parallel. The partial results are then combined, returned to the user, and written to a query result location in Amazon S3. Athena keeps table and partition metadata in a managed, Hive-compatible catalog (by default the AWS Glue Data Catalog). When a query runs, Athena reads this metadata to determine the data's location, format, and schema. Athena also integrates with AWS Glue, a fully managed extract, transform, and load (ETL) service, which customers can use to build and maintain data catalogs and ETL jobs. Next, we will go through the main components of AWS Athena.
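The submit, execute, and fetch cycle described above is asynchronous from the caller's point of view: a query is started, its state is polled, and results are read once it finishes. The sketch below illustrates that cycle. It assumes `client` behaves like a boto3 Athena client (`boto3.client("athena")`), whose `start_query_execution`, `get_query_execution`, and `get_query_results` methods it calls; the database name and S3 output location are placeholders you would supply. It is a minimal illustration, not production code (no timeout or retry handling).

```python
import time
from typing import Any


def run_athena_query(client: Any, sql: str, database: str,
                     output_location: str, poll_seconds: float = 1.0) -> dict:
    """Submit a query, wait for it to finish, and return the first page of results.

    `client` is assumed to behave like boto3.client("athena").
    """
    # 1. Submit the SQL text. Athena returns an execution ID right away
    #    and runs the query asynchronously on its distributed engine.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # 2. Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Query {query_id} ended in state {state}: {reason}")

    # 3. Read the result set (Athena also writes it as a file under output_location).
    return client.get_query_results(QueryExecutionId=query_id)
```

In real use you would pass `boto3.client("athena")` as `client`; keeping the client as a parameter also makes the function easy to exercise with a stub.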

  • Amazon S3: Athena queries data stored in Amazon S3, an object storage service that is highly durable, highly available, and scales to virtually unlimited amounts of data.
  • AWS Glue: Athena uses the AWS Glue Data Catalog to store table definitions for data in S3, and AWS Glue, a fully managed extract, transform, and load (ETL) service, can crawl and transform that data so it can be cataloged and queried.
  • Presto: Presto is the distributed SQL query engine underlying Athena. It is well suited to querying data stored in distributed systems and can handle queries that join data from multiple sources.
  • Amazon CloudWatch: Athena publishes query metrics to Amazon CloudWatch, a monitoring service that offers metrics and logs for the resources in your AWS account. You can use CloudWatch to track the performance of your Athena queries and to create alarms, for example when a workgroup scans more data than expected.
  • Amazon VPC: Athena can be reached from within an Amazon Virtual Private Cloud (VPC) through an interface VPC endpoint (AWS PrivateLink), so traffic between your VPC and Athena stays on the AWS network; access to the endpoint can be restricted with security groups and endpoint policies.
  • Encryption: Athena can query S3 data protected with server-side encryption using Amazon S3-managed keys (SSE-S3) or AWS Key Management Service keys (SSE-KMS), can encrypt the query results it writes to S3, and uses SSL/TLS to encrypt data in transit.
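To make the encryption component concrete, the sketch below builds the `ResultConfiguration` argument that a boto3 Athena client's `start_query_execution` accepts, choosing SSE-S3 or SSE-KMS for the query results Athena writes to S3. It only covers result encryption; encryption of the source data is configured on the S3 bucket or objects themselves. The function name `result_configuration` is hypothetical, and the bucket and key ARN in the usage note are placeholders.

```python
from typing import Optional


def result_configuration(output_location: str, kms_key_arn: Optional[str] = None) -> dict:
    """Build a ResultConfiguration dict for start_query_execution (hypothetical helper)."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")

    if kms_key_arn is None:
        # SSE-S3: results are encrypted with Amazon S3-managed keys; no key ARN needed.
        encryption = {"EncryptionOption": "SSE_S3"}
    else:
        # SSE-KMS: results are encrypted with the given AWS KMS key.
        encryption = {"EncryptionOption": "SSE_KMS", "KmsKey": kms_key_arn}

    return {"OutputLocation": output_location, "EncryptionConfiguration": encryption}
```

Usage, assuming a boto3 Athena client: `client.start_query_execution(QueryString="SELECT 1", ResultConfiguration=result_configuration("s3://my-bucket/athena-results/", my_kms_key_arn))`.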

 

Features of AWS Athena

  • Serverless architecture – Athena is a fully managed service that does not require any infrastructure setup, management, or scaling.
  • Standard SQL support – Since Athena supports ANSI SQL, users can query data in S3 using their existing SQL knowledge and tools.
  • Integration with the AWS ecosystem – Athena works with other AWS services such as Amazon S3, AWS Glue, and AWS Lambda (which runs the connectors for federated queries), so customers can catalog, transform, and query data from a variety of sources.
  • Pay-per-query pricing – Athena charges customers based on the amount of data their queries scan, which makes it cost-effective for ad hoc and exploratory queries.
  • Integration with BI tools – Athena connects to major business intelligence tools such as Tableau, Power BI, and Amazon QuickSight, allowing users to build visualizations and reports.
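Because the bill depends on data scanned, a query's cost can be estimated from the `DataScannedInBytes` value that `get_query_execution` reports under `QueryExecution.Statistics`. The sketch below applies Athena's published billing rules of rounding up to the nearest megabyte with a 10 MB minimum per query. The per-terabyte price is a parameter because it varies by Region, and treating MB and TB as binary units (1024-based) is an assumption of this sketch; it is an estimate, not a billing calculator.

```python
# Assumption: binary units for MB and TB.
MB = 1024 ** 2
TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * MB  # 10 MB minimum charge per query


def estimate_query_cost(data_scanned_bytes: int, price_per_tb: float) -> float:
    """Rough cost of one query, given bytes scanned and a per-TB price."""
    # Round the scanned bytes up to a whole number of megabytes (ceiling division).
    billed_mb = -(-data_scanned_bytes // MB)
    # Apply the per-query minimum.
    billed_bytes = max(billed_mb * MB, MIN_BILLED_BYTES)
    return billed_bytes / TB * price_per_tb
```

For example, with an illustrative rate of 5.0 per TB, a query scanning exactly 1 TB is estimated at 5.0, while a tiny query is billed as 10 MB. Scanning less data (see partitioning and columnar formats) lowers the estimate proportionally.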

Advantages of AWS Athena

  1. No infrastructure setup – Athena is a serverless service that eliminates the need for users to set up and manage infrastructure, making data querying easier and faster.
  2. Cost-effective – Athena charges customers only for the amount of data their queries scan, making it an affordable solution for ad hoc and exploratory queries.
  3. Scalability – Athena is a fully managed service that automatically scales to handle large amounts of data and many concurrent queries.
  4. SQL support – Since Athena supports ANSI SQL, users can query data in S3 using their existing SQL knowledge and tools.

Disadvantages of AWS Athena

  1. Query performance depends on data layout – Large volumes of scanned data and complex queries can slow Athena down, especially when the data is not partitioned or stored in a columnar format.
  2. No real-time querying – Athena is designed for interactive analysis of data at rest in S3, so it is not well suited to low-latency or real-time workloads.
  3. Limited data types – Compared with some other database systems, Athena supports a more restricted set of data types.
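The usual way to ease the performance limitation above is to reduce how much data each query reads: store data in a columnar format such as Parquet and partition it, so a filter on the partition column lets Athena skip whole S3 prefixes. The sketch below generates Athena (Hive-style) `CREATE EXTERNAL TABLE` DDL for such a table. The database `logs_db`, table `requests`, its columns, and the bucket path are hypothetical. After creating the table, partitions laid out as Hive-style prefixes such as `dt=2023-03-17/` still have to be registered, for example with `MSCK REPAIR TABLE logs_db.requests`.

```python
def create_partitioned_table_ddl(database: str, table: str, columns: dict,
                                 partition_column: str, s3_location: str) -> str:
    """Return CREATE EXTERNAL TABLE DDL for a Parquet table partitioned by one string column."""
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # Backticks quote column names in Athena DDL.
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        f")\n"
        f"PARTITIONED BY (`{partition_column}` string)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# A query that filters on the partition column only reads the matching prefix.
PRUNED_QUERY = "SELECT status, url FROM logs_db.requests WHERE dt = '2023-03-17'"
```

Usage: `create_partitioned_table_ddl("logs_db", "requests", {"request_time": "timestamp", "status": "int", "url": "string"}, "dt", "s3://my-log-bucket/requests/")` returns the DDL string, which can then be submitted like any other Athena query.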

Use Cases of AWS Athena

  • Ad hoc and exploratory querying: Athena is well suited for ad hoc and exploratory querying, where users need to quickly assess data without the need to set up and manage infrastructure.
  • Log analysis: Athena is extensively used for log analysis, allowing customers to query massive amounts of log data stored in S3.
  • Business intelligence: Athena can power business intelligence applications by querying data stored in S3 and visualizing the results in popular BI tools such as Tableau and Power BI.
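For log analysis, a typical query aggregates many log records into a small summary, and the results come back from `get_query_results` in pages linked by `NextToken`. The sketch below shows a hypothetical log query (the `logs_db.requests` table with `status` and `dt` columns is assumed, not given by the text) and a helper that walks all result pages of a finished SELECT query, turning each row into a dictionary. It assumes `client` behaves like a boto3 Athena client; for SELECT queries, the first row of the first page holds the column names.

```python
from typing import Any, Iterator

# Hypothetical table and columns: count requests per HTTP status for one day.
LOG_QUERY = """
SELECT status, count(*) AS hits
FROM logs_db.requests
WHERE dt = '2023-03-17'
GROUP BY status
ORDER BY hits DESC
"""


def iter_result_rows(client: Any, query_id: str) -> Iterator[dict]:
    """Yield each row of a finished SELECT query as a {column: value} dict."""
    header = None
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # Values arrive as strings; a missing VarCharValue means NULL.
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if header is None:
                header = values  # first row of a SELECT result = column names
                continue
            yield dict(zip(header, values))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # request the next page
```

Because all values are returned as strings, numeric columns such as `hits` still need converting (for example with `int`) before further analysis.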

In conclusion, Amazon Athena is a serverless query service that lets customers run standard SQL queries to analyze data in Amazon S3. Its primary characteristics are its serverless design, standard SQL support, integration with the AWS ecosystem, pay-per-query pricing, and integration with BI tools. Its architecture is built on Presto and integrates with AWS Glue.

