Using SQLExecuteQueryOperator to connect to MSSQL
This guide shows how to define tasks that interact with an MSSQL database using SQLExecuteQueryOperator.
Use SQLExecuteQueryOperator to execute SQL commands in an MSSQL database.
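Warning
Previously, MsSqlOperator was used to perform this kind of operation. MsSqlOperator is now deprecated and will be removed in a future release of the provider. Consider switching to SQLExecuteQueryOperator as soon as possible.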
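Common database operations with SQLExecuteQueryOperator
To execute SQL queries against an MSSQL database with SQLExecuteQueryOperator, two parameters are required: sql and conn_id. Both are eventually passed to the MSSQL hook object, which interacts directly with the MSSQL database.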
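The hand-off from operator to hook can be pictured with a minimal sketch. It does not use Airflow: `SketchHook` and `SketchOperator` are hypothetical classes written only for this illustration, and the standard-library sqlite3 module stands in for MSSQL so the snippet runs anywhere. It is not how the provider is implemented.

```python
import sqlite3


class SketchHook:
    """Hypothetical stand-in for a database hook: it owns the connection."""

    def __init__(self, conn_id: str):
        # A real hook would look up credentials by conn_id; this sketch
        # simply opens an in-memory SQLite database instead.
        self.conn_id = conn_id
        self._conn = sqlite3.connect(":memory:")

    def run(self, sql: str):
        # Execute the statement and return every row it produced.
        return self._conn.execute(sql).fetchall()


class SketchOperator:
    """Hypothetical stand-in for an operator: it only carries sql and conn_id."""

    def __init__(self, task_id: str, conn_id: str, sql: str):
        self.task_id = task_id
        self.conn_id = conn_id
        self.sql = sql

    def execute(self):
        # The operator does not talk to the database itself; it builds a
        # hook from conn_id and passes the SQL on to it.
        hook = SketchHook(self.conn_id)
        return hook.run(self.sql)


task = SketchOperator(task_id="demo", conn_id="airflow_mssql", sql="SELECT 1 + 1")
result = task.execute()
```

The point of the split is that the task definition stays declarative (what to run, and where), while connection handling lives in one place.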
Creating an MSSQL database table
The code snippets below are based on Airflow-2.2.
An example of using SQLExecuteQueryOperator to connect to MSSQL:
# Example of creating a task to create a table in MsSql
create_table_mssql_task = SQLExecuteQueryOperator(
    task_id="create_country_table",
    conn_id="airflow_mssql",
    sql=r"""
    CREATE TABLE Country (
        country_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
        name TEXT,
        continent TEXT
    );
    """,
    dag=dag,
)
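You can also keep the SQL commands in an external file. The script folder must be at the same level as the DAG.py file. This makes it easy to maintain the SQL queries separately from the code.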
# Example of creating a task that calls an sql command from an external file.
create_table_mssql_from_external_file = SQLExecuteQueryOperator(
    task_id="create_table_from_external_file",
    conn_id="airflow_mssql",
    sql="create_table.sql",
    dag=dag,
)
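The file lookup can be sketched roughly as follows, on the assumption that a `sql` value ending in `.sql` is treated as a file name resolved relative to the DAG folder rather than as literal SQL. `resolve_sql` is a hypothetical helper written for this illustration, not Airflow's own code, and the temporary directory stands in for the `dags/` folder.

```python
import tempfile
from pathlib import Path


def resolve_sql(sql: str, dag_folder: Path) -> str:
    """Return the file's contents if sql names a .sql file, else sql itself."""
    if sql.endswith(".sql"):
        return (dag_folder / sql).read_text(encoding="utf-8")
    return sql


with tempfile.TemporaryDirectory() as tmp:
    dag_folder = Path(tmp)  # stand-in for the dags/ folder
    # Placeholder file content, used only to show the lookup.
    (dag_folder / "example.sql").write_text("SELECT 1;", encoding="utf-8")

    from_file = resolve_sql("example.sql", dag_folder)
    inline = resolve_sql("SELECT 2;", dag_folder)
```

Under this assumption, a file placed somewhere other than next to the DAG would not be found by name alone, which is why the location requirement above matters.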
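Your dags/create_table.sql should create the Users table, including the username and description columns that are populated in the next section.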
Inserting data into an MSSQL database table
We can then create a SQLExecuteQueryOperator task that populates the Users table.
populate_user_table = SQLExecuteQueryOperator(
    task_id="populate_user_table",
    conn_id="airflow_mssql",
    sql=r"""
    INSERT INTO Users (username, description)
    VALUES ( 'Danny', 'Musician');
    INSERT INTO Users (username, description)
    VALUES ( 'Simone', 'Chef');
    INSERT INTO Users (username, description)
    VALUES ( 'Lily', 'Florist');
    INSERT INTO Users (username, description)
    VALUES ( 'Tim', 'Pet shop owner');
    """,
)
Fetching records from your MSSQL database table
Fetching records from your MSSQL database table can be as simple as:
get_all_countries = SQLExecuteQueryOperator(
    task_id="get_all_countries",
    conn_id="airflow_mssql",
    sql=r"""SELECT * FROM Country;""",
)
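A query like this comes back as a list of rows. The sketch below shows, with the standard-library sqlite3 module standing in for MSSQL, the kind of result handler that turns a database cursor into rows. `fetch_all` and `fetch_first` are hypothetical helpers written for this illustration; how the operator's own handler and result passing behave should be checked against the provider documentation.

```python
import sqlite3


def fetch_all(cursor):
    """Return every remaining row as a list of tuples."""
    return cursor.fetchall()


def fetch_first(cursor):
    """Return only the first row, or None if the result set is empty."""
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Country (country_id INTEGER PRIMARY KEY, name TEXT, continent TEXT)"
)
conn.executemany(
    "INSERT INTO Country (name, continent) VALUES (?, ?)",
    [("India", "Asia"), ("Ghana", "Africa")],
)

query = "SELECT * FROM Country ORDER BY country_id"
all_rows = fetch_all(conn.execute(query))
first_row = fetch_first(conn.execute(query))
```

Each row is a tuple whose positions follow the column order of the SELECT, so `SELECT *` ties downstream code to the table layout; naming the columns explicitly is usually more robust.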
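Passing parameters into SQLExecuteQueryOperator
SQLExecuteQueryOperator provides the parameters attribute, which makes it possible to inject values into your SQL requests dynamically at runtime. The example that follows uses params instead: those values are rendered into the SQL text by Jinja templating before the statement is sent to the database. The column is wrapped in CONVERT(VARCHAR, ...) because SQL Server does not allow a TEXT column to be compared with =.
To find the countries on the Asian continent: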
get_countries_from_continent = SQLExecuteQueryOperator(
    task_id="get_countries_from_continent",
    conn_id="airflow_mssql",
    sql=r"""SELECT * FROM Country where {{ params.column }}='{{ params.value }}';""",
    params={"column": "CONVERT(VARCHAR, continent)", "value": "Asia"},
)
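Pasting a value into the SQL text and binding it as a query parameter are different mechanisms, and the difference matters once values come from outside the DAG. The sketch below illustrates this with the standard-library sqlite3 module as a stand-in for MSSQL. `render` is a hypothetical helper that only mimics template substitution (it is not Jinja), and the `?` placeholder is SQLite's style; the placeholder syntax for MSSQL depends on the driver in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO Country (name, continent) VALUES (?, ?)",
    [("India", "Asia"), ("Germany", "Europe"), ("Japan", "Asia")],
)


def render(template: str, params: dict) -> str:
    """Hypothetical stand-in for templating: paste values into the SQL text."""
    for key, value in params.items():
        template = template.replace("{{ params.%s }}" % key, str(value))
    return template


TEMPLATE = (
    "SELECT name FROM Country "
    "WHERE {{ params.column }} = '{{ params.value }}' ORDER BY name"
)
BOUND = "SELECT name FROM Country WHERE continent = ? ORDER BY name"

# Both approaches agree for a well-behaved value.
templated_rows = conn.execute(
    render(TEMPLATE, {"column": "continent", "value": "Asia"})
).fetchall()
bound_rows = conn.execute(BOUND, ("Asia",)).fetchall()

# A value containing a quote escapes the templated string literal and
# changes the query, while the bound version treats it as plain data.
hostile = "Asia' OR '1'='1"
leaky_rows = conn.execute(
    render(TEMPLATE, {"column": "continent", "value": hostile})
).fetchall()
safe_rows = conn.execute(BOUND, (hostile,)).fetchall()
```

Templating remains the only option for parts of a statement that cannot be bound, such as the column expression in the example above, so it is best reserved for values the DAG author controls.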
The complete SQLExecuteQueryOperator DAG to connect to MSSQL
When we put everything together, our DAG should look like this:
import os
from datetime import datetime

import pytest

from airflow import DAG

try:
    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
    from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook
except ImportError:
    pytest.skip("MSSQL provider not available", allow_module_level=True)

ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
DAG_ID = "example_mssql"

with DAG(
    DAG_ID,
    schedule="@daily",
    start_date=datetime(2021, 10, 1),
    tags=["example"],
    catchup=False,
) as dag:
    # Example of creating a task to create a table in MsSql
    create_table_mssql_task = SQLExecuteQueryOperator(
        task_id="create_country_table",
        conn_id="airflow_mssql",
        sql=r"""
        CREATE TABLE Country (
            country_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
            name TEXT,
            continent TEXT
        );
        """,
        dag=dag,
    )

    @dag.task(task_id="insert_mssql_task")
    def insert_mssql_hook():
        mssql_hook = MsSqlHook(mssql_conn_id="airflow_mssql", schema="airflow")
        rows = [
            ("India", "Asia"),
            ("Germany", "Europe"),
            ("Argentina", "South America"),
            ("Ghana", "Africa"),
            ("Japan", "Asia"),
            ("Namibia", "Africa"),
        ]
        target_fields = ["name", "continent"]
        mssql_hook.insert_rows(table="Country", rows=rows, target_fields=target_fields)

    # Example of creating a task that calls an sql command from an external file.
    create_table_mssql_from_external_file = SQLExecuteQueryOperator(
        task_id="create_table_from_external_file",
        conn_id="airflow_mssql",
        sql="create_table.sql",
        dag=dag,
    )

    populate_user_table = SQLExecuteQueryOperator(
        task_id="populate_user_table",
        conn_id="airflow_mssql",
        sql=r"""
        INSERT INTO Users (username, description)
        VALUES ( 'Danny', 'Musician');
        INSERT INTO Users (username, description)
        VALUES ( 'Simone', 'Chef');
        INSERT INTO Users (username, description)
        VALUES ( 'Lily', 'Florist');
        INSERT INTO Users (username, description)
        VALUES ( 'Tim', 'Pet shop owner');
        """,
    )

    get_all_countries = SQLExecuteQueryOperator(
        task_id="get_all_countries",
        conn_id="airflow_mssql",
        sql=r"""SELECT * FROM Country;""",
    )

    get_all_description = SQLExecuteQueryOperator(
        task_id="get_all_description",
        conn_id="airflow_mssql",
        sql=r"""SELECT description FROM Users;""",
    )

    get_countries_from_continent = SQLExecuteQueryOperator(
        task_id="get_countries_from_continent",
        conn_id="airflow_mssql",
        sql=r"""SELECT * FROM Country where {{ params.column }}='{{ params.value }}';""",
        params={"column": "CONVERT(VARCHAR, continent)", "value": "Asia"},
    )

    (
        create_table_mssql_task
        >> insert_mssql_hook()
        >> create_table_mssql_from_external_file
        >> populate_user_table
        >> get_all_countries
        >> get_all_description
        >> get_countries_from_continent
    )
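The insert_mssql_task step in the DAG above passes rows as tuples together with a list of target column names. What such a call roughly amounts to can be sketched with the standard-library sqlite3 module as a stand-in for MSSQL. The `insert_rows` function below is a hypothetical helper written for this illustration, not the hook's implementation, and the `?` placeholder is SQLite's style.

```python
import sqlite3


def insert_rows(conn, table, rows, target_fields):
    """Build one parameterized INSERT and execute it once per row."""
    columns = ", ".join(target_fields)
    placeholders = ", ".join("?" for _ in target_fields)
    # Table and column names come from the DAG author, not from user input;
    # only the row values are passed as bound parameters.
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Country (country_id INTEGER PRIMARY KEY, name TEXT, continent TEXT)"
)

rows = [
    ("India", "Asia"),
    ("Germany", "Europe"),
    ("Argentina", "South America"),
    ("Ghana", "Africa"),
    ("Japan", "Asia"),
    ("Namibia", "Africa"),
]
insert_rows(conn, table="Country", rows=rows, target_fields=["name", "continent"])

total = conn.execute("SELECT COUNT(*) FROM Country").fetchone()[0]
asian = conn.execute(
    "SELECT name FROM Country WHERE continent = ? ORDER BY name", ("Asia",)
).fetchall()
```

Listing target_fields explicitly lets the database fill in country_id on its own, which mirrors why the DAG supplies only name and continent for a table whose key is an identity column.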