
How to Create First DAG in Airflow?

A Directed Acyclic Graph (DAG) is a collection of all the individual tasks that we run in an ordered fashion. In other words, a DAG is a data pipeline in Airflow.

Key Terminology:

A DAG file is a Python file that specifies the structure as well as the code of the DAG.



Steps To Create an Airflow DAG

  1. Importing the right modules for your DAG
  2. Create default arguments for the DAG 
  3. Creating a DAG Object
  4. Creating tasks
  5. Setting up dependencies for the DAG 

Now, let’s discuss these steps one by one in detail and create a simple DAG.

Step 1: Importing the right modules for your DAG



To create a DAG, we first import all the modules that our code will use to define the DAG's structure. The first and most important import is the DAG class from the airflow package, which lets us instantiate the DAG object. Next, we import the datetime and timedelta modules used for scheduling. Finally, we import the operators we will use in the DAG file; here, that is just the DummyOperator.

# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator

Step 2: Create default arguments for the DAG 

default_args is a dictionary that we pass to the DAG object; it contains metadata of the DAG and is applied as the default set of arguments for every operator in the DAG.

Let’s create a dictionary named default_args:

# Initiating the default_args
default_args = {
        'owner' : 'airflow',
        'start_date' : datetime(2022, 11, 12)
}

We can add more such parameters to our arguments, as per our requirement.
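For example, a slightly extended default_args might set retry behaviour for every task. This is a sketch; the values chosen here (two retries, a five-minute delay) are illustrative, not required:

```python
from datetime import datetime, timedelta

# An extended default_args dictionary; each key is a standard operator
# argument that Airflow applies to every task in the DAG by default.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2022, 11, 12),
    'retries': 2,                          # retry a failed task twice
    'retry_delay': timedelta(minutes=5),   # wait 5 minutes between retries
    'email_on_failure': False,             # do not send failure emails
}
```

Any task can still override one of these defaults by passing the same argument directly to its operator.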

Step 3: Creating DAG Object

After defining the default_args, we create the DAG object by passing it a unique identifier called the “dag_id“. Here we name it DAG-1.

So, let’s create a DAG Object.

# Creating DAG Object
dag = DAG(dag_id='DAG-1',
        default_args=default_args,
        schedule_interval='@once', 
        catchup=False
    )

Here, schedule_interval='@once' tells Airflow to run the DAG only once, and catchup=False prevents Airflow from backfilling runs between the start_date and the current date.
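As a side note, a common alternative style is to create the DAG with a context manager, so that tasks defined inside the with block are attached to it automatically without passing dag=dag to each one. A minimal sketch, assuming the same imports as above:

```python
from airflow import DAG
from datetime import datetime

# Equivalent DAG definition using a context manager; operators created
# inside the block pick up this DAG implicitly.
with DAG(
    dag_id='DAG-1',
    default_args={'owner': 'airflow', 'start_date': datetime(2022, 11, 12)},
    schedule_interval='@once',
    catchup=False,
) as dag:
    pass  # tasks would be defined here
```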

Step 4: Create tasks

A task is an instance of an operator. It has a unique identifier called a task_id. There are many operators available, but here we will use the DummyOperator. Different operators let us create different kinds of tasks; here we will create two simple ones:

# Creating first task
start = DummyOperator(task_id='start', dag=dag)

If you go to the graph view in the UI, you can see that the task “start” has been created.


# Creating second task
end = DummyOperator(task_id='end', dag=dag)

Now, the two tasks start and end have been created.


Step 5: Setting up dependencies for the DAG

Dependencies describe the relationships between operators, i.e., the order in which the tasks in a DAG execute. We set the order of execution using the bitshift operators: >> makes the task on its right run downstream of the task on its left, while << makes it run upstream.

Now, let’s set the order of execution between the start and end tasks. Suppose we want start to run first and end to run after it.

# Setting up dependencies 
start >> end 
# We can also write it as start.set_downstream(end) 
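For DAGs with more tasks, the same bitshift notation chains and also accepts lists. As a sketch, assuming two hypothetical extra tasks task_a and task_b added to the DAG built above:

```python
from airflow.operators.dummy_operator import DummyOperator

# Hypothetical extra tasks, for illustration only
task_a = DummyOperator(task_id='task_a', dag=dag)
task_b = DummyOperator(task_id='task_b', dag=dag)

# A list fans out: task_a and task_b both run after start,
# and end runs only after both of them finish
start >> [task_a, task_b] >> end
```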

Now, the graph view shows start and end connected after setting up the dependency.


Putting all our code together:

# Step 1: Importing Modules
# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator

# Step 2: Initiating the default_args
default_args = {
        'owner': 'airflow',
        'start_date': datetime(2022, 11, 12),
}

# Step 3: Creating DAG Object
dag = DAG(dag_id='DAG-1',
        default_args=default_args,
        schedule_interval='@once',
        catchup=False
    )

# Step 4: Creating tasks
# Creating first task
start = DummyOperator(task_id='start', dag=dag)
# Creating second task
end = DummyOperator(task_id='end', dag=dag)

# Step 5: Setting up dependencies
start >> end

Now, we have successfully created our first DAG. We can move on to the webserver to see it in the UI.


Now, you can click on the DAG and explore its different views in the Airflow UI.
