This post is about the new major version of Airflow. You can go back and read the tutorial I wrote about Airflow 1 below.
From my reading of the changelog, these are the main changes that matter to us as users.
- Of course, it comes with a new, cleaner user interface and a more understandable history page.
- We can now put the @task decorator on top of a function to turn it into a Python operator task. This feature is new to me as well, and it makes the code cleaner and easier to read. I will write about it later.
- And more.
A little gimmick here. Twisting fan.
You can visit the official page here to read all the changes.
Installation in Docker
Back then I spent some time building an Airflow 1 container on my MacBook, but I found that the official image at the time wasn't good enough. The website lacked information and there was a lot of configuration to get through, so I ended up using Puckel's image instead.
Now that the official Docker Compose file for Airflow 2 has been released here, I no longer need to look for another, more reliable one.
I have assembled the steps from the Airflow documentation page into a single repo below. You can clone it and try it yourself.
Details in the repo
- The installation starts from docker-compose.yaml. If you want to get more familiar with it, you can visit my latest blog post below.
- The original docker-compose.yaml relies on the default image, but I wanted the ability to add additional Python packages, so I created a simple dockerfile. You can add yours at
- Disable sample DAGs at line 59 of
- Prepare all necessary directories:
/plugins. Mostly use
0. Prepare all dependencies
Make sure all the dependencies our work needs are listed in
constraints.txt (if any) before moving on.
For example, I put the package pysftp in the file, so when we build the image, the package is installed in the worker instance and ready to use.
You can find the available packages at https://pypi.org.
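Once the image is built, a small check like the one below can confirm that the packages you listed are really importable inside the worker. This is only a sketch: missing_packages is a helper name I made up, and it expects import names, which are not always the same as the package names on PyPI.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pysftp is the example package from this post; swap in your own list.
    missing = missing_packages(["pysftp"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with python inside the worker container. An empty result means every listed module was found.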
1. docker-compose up
Now we are ready to go. Run this to get the ball rolling.
A few seconds later, the output shows the necessary images being downloaded.
Next, it creates the scheduler, which schedules our jobs, and the worker, which executes them.
The last one is the webserver. Once we see it, we are ready to view the Airflow web page.
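Instead of watching the log, we can also ask the webserver whether it is ready. The sketch below polls the /health endpoint of the Airflow 2 webserver and assumes the default address http://localhost:8080 used in this post. The function names is_healthy and check_airflow are my own.

```python
import json
import urllib.request


def is_healthy(payload):
    """True when both the metadatabase and the scheduler report 'healthy'."""
    if not isinstance(payload, dict):
        return False
    return all(
        isinstance(payload.get(part), dict)
        and payload[part].get("status") == "healthy"
        for part in ("metadatabase", "scheduler")
    )


def check_airflow(url="http://localhost:8080/health", timeout=5):
    """Fetch the health endpoint; False if unreachable, invalid, or unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(json.load(response))
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts,
        # ValueError covers a body that is not valid JSON.
        return False


if __name__ == "__main__":
    print("Airflow is up" if check_airflow() else "Airflow is not ready yet")
```

If it prints that Airflow is not ready yet, give the containers a little more time and run it again.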
2. Logging in
Open a browser and go to http://localhost:8080. Use airflow as both the username and the password to log in.
The login should succeed, and we can now see the Airflow home page.
3. Try adding a DAG
Back to the editor. I took the example DAG from https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html and saved it in the /dags folder like this.
It won't show up in the DAGs list instantly. We can make it appear with these steps:
1. Access the worker instance with this command.
docker exec -it <airflow-worker-container-name> /bin/bash
<airflow-worker-container-name> can be found by running docker ps -a and picking the container with worker in its name.
2. List the DAGs with this command.
airflow dags list
It automatically parses the DAG files, and they appear on the web page if everything succeeds.
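Picking the worker container by eye gets tedious, so the lookup above can be sketched in a few lines of Python. It assumes the docker CLI is on your PATH, and the helper names pick_worker and container_names are made up for this example.

```python
import subprocess


def pick_worker(names):
    """Return the container names that contain 'worker'."""
    return [name for name in names if "worker" in name]


def container_names():
    """Ask Docker for the names of all containers, running or stopped."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    try:
        workers = pick_worker(container_names())
    except (OSError, subprocess.CalledProcessError) as error:
        workers = []
        print(f"could not query Docker: {error}")
    for name in workers:
        # Print the command from step 1 with the real name filled in.
        print(f"docker exec -it {name} /bin/bash")
```

It only prints the command, so you can copy it and run it yourself.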
4. Break a DAG to see errors
In some cases, the web page shows errors, for example when our code contains a syntax error or a bad import.
We can check the failed ones with this command.
airflow dags list-import-errors
It prints a table with the error messages of every DAG file that failed to import.
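Plain syntax errors can also be caught before Airflow ever sees the file. The sketch below uses only the Python standard library. It does not replace the Airflow command above, because it cannot detect bad imports, and the ./dags path is an assumption about where the DAG files live on the host.

```python
import ast
from pathlib import Path


def check_source(source, filename="<dag>"):
    """Return None if the source parses, else a short 'line N: message' string."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as error:
        return f"line {error.lineno}: {error.msg}"
    return None


def syntax_errors(dags_dir):
    """Map every .py file under dags_dir that fails to parse to its error."""
    errors = {}
    for path in sorted(Path(dags_dir).rglob("*.py")):
        message = check_source(path.read_text(encoding="utf-8"), str(path))
        if message is not None:
            errors[str(path)] = message
    return errors


if __name__ == "__main__":
    # "./dags" is an assumed location; point it at your own DAG folder.
    for path, message in syntax_errors("./dags").items():
        print(f"{path}: {message}")
```

No output means every file parsed cleanly, though a DAG can still fail at import time inside Airflow.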
5. Let the DAG run
Say all the DAGs are good. When we click one, we can see that the DAG history UI has improved. It's more modern, clearer, and neater.
The DAG history is much easier to read, and it shows some basic stats about the DAG at the side.
The graph isn't much different, yet it's better, right?
6. docker-compose down
Stop the running terminal (control + c on Mac) and bring the containers down using this command.
# stop all containers
docker-compose down
# stop all containers and remove everything
docker-compose down --volumes --rmi all
This is just an introduction to developing with Airflow 2 on Docker. I don't recommend this setup for production, but it is great for local development.
If you are looking for a convenient way to work with Airflow jobs, you can try this approach. I hope it is useful for you.