How to Install Apache Airflow?


Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Because it is extensible through Python, you can integrate Airflow with virtually any technology, and workflows can be managed through a web interface. Airflow can be deployed in many ways, from a single process running on a laptop to a distributed setup that handles very large volumes of data.

Why Choose Airflow?

Airflow is a batch workflow orchestration platform. If your workflows have a clear start and end and run at regular intervals, they can be defined as Airflow DAGs, and the framework can easily be extended to connect to new technologies.
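
As an illustration, here is a minimal sketch of such a DAG. The dag_id, task names, and bash commands are placeholders chosen for this example, not anything defined by Airflow itself:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal batch workflow: a clear start date, a regular schedule,
# and two tasks that run one after the other.
with DAG(
    dag_id="example_batch_workflow",   # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",        # run once per day
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extract data'")
    load = BashOperator(task_id="load", bash_command="echo 'load data'")

    extract >> load  # load runs only after extract has succeeded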

Features:

  1. Easy to Use: If you know the basics of Python, Airflow is easy to pick up.
  2. Open Source: The software is free and open source, with a large user community.
  3. Version Rollback: Previous versions of workflows can be rolled back using version control.
  4. Integrations: It provides ready-to-use operators for working with Google Cloud Platform, Amazon AWS, Microsoft Azure, and more.
  5. Amazing User Interface: Track and manage your workflows with ease from the status interface.

Advantages:

  1. The entire Airflow model is built around time-based scheduling.
  2. You can choose from a wide variety of operators to build a pipeline.
  3. The Apache Airflow UI lets you check DAG status, run times, and logs.
  4. Raw data is stored and kept separate from processed data, which keeps pipelines immutable.
  5. Tasks should aim to be idempotent, so the same inputs always produce the same outputs (see the sketch after this list).
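
A common way to achieve that idempotence is to key each task's output to the logical run date that Airflow passes into the task. The sketch below is illustrative only; the DAG id, callable name, and output path are made up for this example:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def export_partition(ds, **kwargs):
    # 'ds' is the logical date (YYYY-MM-DD) Airflow supplies for the run.
    # Writing to a date-stamped path means re-running the task for the
    # same date overwrites the same output instead of duplicating it.
    output_path = f"/tmp/exports/{ds}.csv"  # hypothetical location
    print(f"would write the {ds} partition to {output_path}")

with DAG(
    dag_id="idempotent_export",             # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="export_partition", python_callable=export_partition)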

Disadvantages:

  1. Writing test cases for raw data pipelines is extremely difficult.
  2. Changing a DAG's schedule effectively requires renaming the DAG.
  3. Running Airflow natively on Windows is not straightforward.

Installing Apache Airflow:

To install Apache Airflow, you need to have pip installed first.

Step 1: Install pip. If it is already installed, you can move on to Step 3.

$ sudo apt-get install python3-pip

Step 2: Set the Airflow home directory (this is where Airflow keeps its configuration, logs, and DAGs; ~/airflow is also the default if the variable is not set)

$ export AIRFLOW_HOME=~/airflow

Step 3: Install Apache Airflow using pip

$ pip3 install apache-airflow

Output:

(Screenshot: pip downloading and installing apache-airflow and its dependencies)

Step 4: Initialize the metadata database that Airflow uses to track workflows

$ airflow db init

(On older Airflow 1.x installations, the equivalent command was airflow initdb.)

Step 5: Run the command below to start the web server, which serves the Airflow user interface

$ airflow webserver -p 8080

Step 6: Start the Airflow scheduler, which monitors and triggers your workflows

$ airflow scheduler
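
With the web server and scheduler running, the UI should be reachable at http://localhost:8080, and DAG files placed under $AIRFLOW_HOME/dags (here, ~/airflow/dags) will be picked up by the scheduler. Assuming an Airflow 2.x CLI, you can list the DAGs Airflow has discovered with:

$ airflow dags list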

Last Updated : 23 Nov, 2022