How to Install Apache Airflow?
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Because workflows are defined in Python, Airflow can be extended to integrate with virtually any technology, and a web interface lets you manage them. Airflow can be deployed in many ways, from a single process running on a laptop to a distributed setup that supports very large data flows.
Why Choose Airflow?
Airflow is a batch workflow orchestration platform. If your workflows have a clear start and end and run at regular intervals, they can be expressed as Airflow DAGs, and the framework can be extended to connect to new technologies as needed.
Features:
- Easy to Use: If you know the basics of Python, Airflow is easy to pick up.
- Open Source: The software is free and open-source, with a large user community.
- Rollback: Workflows are defined in code, so previous versions can be restored using version control.
- Integrations: Ready-to-use operators are provided for Google Cloud Platform, Amazon Web Services, Microsoft Azure, and more.
- Amazing User Interface: Track and manage your workflows with ease through the status interface.
Advantages:
- The entire Airflow model is built around time-based schedules.
- To build a pipeline using Airflow, you can choose from a variety of operators.
- The Apache Airflow UI lets you check DAG status, runtimes, and logs.
- Raw data is stored separately from processed data, which keeps the raw inputs immutable.
- Tasks can be written to be idempotent: rerunning them with the same inputs always produces the same outputs.
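The last two points can be sketched in plain Python (the function and file names here are illustrative, not part of Airflow itself): the task reads an immutable raw file and writes its result to a separate output path, so rerunning it yields the identical result.

```python
import json
import tempfile
from pathlib import Path


def process(raw_path: Path, out_dir: Path) -> Path:
    """Read raw data and write processed output to a separate file.

    The raw file is never mutated, and rerunning with the same input
    overwrites the output with identical content (idempotence).
    """
    records = json.loads(raw_path.read_text())
    cleaned = sorted({r.strip().lower() for r in records})
    out_path = out_dir / f"processed_{raw_path.stem}.json"
    out_path.write_text(json.dumps(cleaned))
    return out_path


# Demo: running the task twice produces the same output both times.
tmp = Path(tempfile.mkdtemp())
raw = tmp / "2022-11-23.json"
raw.write_text(json.dumps(["Alice ", "bob", "ALICE"]))
first = json.loads(process(raw, tmp).read_text())
second = json.loads(process(raw, tmp).read_text())  # rerun: same result
```

Idempotent tasks are what make Airflow's retries and backfills safe: a task can be rerun for a past interval without corrupting data.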
Disadvantages:
- Raw data pipelines make it extremely difficult to write test cases.
- Changing the schedule requires renaming your DAG.
- Running Airflow natively on Windows is not straightforward.
Installing Apache Airflow:
To install Apache Airflow, you need pip installed first.
Step 1: Install pip. If pip is already installed, skip to Step 2.
$ sudo apt-get install python3-pip
Step 2: Set the location of the Airflow home directory
$ export AIRFLOW_HOME=~/airflow
Step 3: Install Apache Airflow using pip
$ pip3 install apache-airflow
Step 4: Initialize the backend database that maintains workflow state (in Airflow 2.x this command replaces the older airflow initdb)
$ airflow db init
Step 5: Run the command below to start the web server, which serves the Apache Airflow user interface
$ airflow webserver -p 8080
Step 6: Start the Airflow scheduler, which monitors and triggers your workflows
$ airflow scheduler
Last Updated: 23 Nov, 2022