Introduction
Nowadays most software applications are moving towards the SaaS (Software as a Service) model, where a single application serves multiple tenants whose data stores are separated either by schema or by database instance. At the same time, data separation between tenants is a critical requirement for every SaaS platform. I still believe many clients find it hard to understand and trust that their data is completely secure, fully separated from other tenants' data on the SaaS application, and will never get mixed with it.
It is also a great challenge for data engineers and data architects to plan and design data pipelines for a SaaS platform.
How a data pipeline for SaaS differs from a normal application pipeline:
Apache Airflow, offered as GCP Cloud Composer, provides a solution for managing data pipelines for SaaS applications.
Details are in the GCP documentation: Google Cloud Composer Operators — apache-airflow-providers-google Documentation.
Cloud Composer is a fully managed workflow orchestration service, enabling you to create, schedule, monitor, and manage workflows that span across clouds and on-premises data centers.
Cloud Composer is built on the popular Apache Airflow open source project and operates using the Python programming language.
By using Cloud Composer instead of a local instance of Apache Airflow, you can benefit from the best of Airflow with no installation or management overhead. Cloud Composer helps you create Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command-line tools, so you can focus on your workflows and not your infrastructure.
I am assuming the reader already knows the basics of Airflow and DAG creation; now I will explain how to adapt it for SaaS.
How to make Airflow work for SaaS
Write most of your logic in a separate .py file, organized as a function.
The function should accept the tenant as a parameter.
Create the list of tenants in the DAG, or read it from either an Airflow Variable or configuration. I recommend reading it from an Airflow Variable.
Call that function directly in the DAG and pass the tenant parameter.
Import the task decorator: from airflow.decorators import task
Use @task in the DAG to decorate the function before calling it.
Use dynamic task mapping to create a task at runtime for each tenant: result = data_load.expand(tenant=tenant_list), followed by something like all_success(result) to gather the results, as shown in the sketch below.
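To make the steps above concrete, here is a minimal sketch of such a DAG, assuming Airflow 2.4 or later (as in a current Composer 2 environment). The Variable name tenant_list, the DAG id saas_data_load, and the function names are illustrative assumptions, not fixed names from this post; the full working code will come in the next post.

import json
from datetime import datetime

from airflow.decorators import dag, task
from airflow.models import Variable


# In practice this function would live in a separate .py module so the
# tenant-specific logic stays independent of the DAG definition.
def load_tenant_data(tenant):
    """Run the load for a single tenant's schema or database instance."""
    print(f"Loading data for tenant: {tenant}")
    return tenant


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def saas_data_load():

    @task
    def data_load(tenant):
        # Thin wrapper around the shared, tenant-parameterized logic
        return load_tenant_data(tenant)

    @task
    def all_success(results):
        # Runs only after every mapped tenant task has succeeded
        # (all_success is Airflow's default trigger rule)
        print(f"Finished loading tenants: {list(results)}")

    # Read the tenant list from an Airflow Variable stored as JSON,
    # e.g. ["tenant_a", "tenant_b", "tenant_c"]
    tenant_list = json.loads(Variable.get("tenant_list", default_var="[]"))

    # Dynamic task mapping: one task instance per tenant at runtime
    result = data_load.expand(tenant=tenant_list)
    all_success(result)


saas_data_load()

With this layout, onboarding a new tenant only requires updating the tenant_list Variable; the DAG code does not change, and each tenant's load runs as its own task instance that can succeed, fail, and be retried independently.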
In the next blog post, I will share the complete working code for this solution.
I hope you have enjoyed learning; your feedback and comments will be highly appreciated.