airflow.providers.google.cloud.triggers.bigquery¶

类¶

`BigQueryInsertJobTrigger`	BigQueryInsertJobTrigger 在触发器 Worker 上运行以执行插入操作。
`BigQueryCheckTrigger`	BigQueryCheckTrigger 在触发器 Worker 上运行。
`BigQueryGetDataTrigger`	BigQueryGetDataTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。
`BigQueryIntervalCheckTrigger`	BigQueryIntervalCheckTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。
`BigQueryValueCheckTrigger`	BigQueryValueCheckTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。
`BigQueryTableExistenceTrigger`	使用所需参数初始化 BigQuery Table Existence Trigger。
`BigQueryTablePartitionExistenceTrigger`	使用所需参数初始化 BigQuery Table Partition Existence Trigger。

模块内容¶

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryInsertJobTrigger(conn_id, job_id, project_id, location, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None, cancel_on_kill=True)[source]¶

基类: airflow.triggers.base.BaseTrigger

BigQueryInsertJobTrigger 在触发器 Worker 上运行以执行插入操作。

参数:

conn_id (str) – Google Cloud 连接 ID 的引用
job_id (str | None) – 作业 ID。将附加作业配置的哈希值作为后缀
project_id (str) – 作业正在运行的 Google Cloud 项目
location (str | None) – 数据集位置。
dataset_id (str | None) – 请求的表的数据集 ID。（模板化）
table_id (str | None) – 请求的表的表 ID。（模板化）
poll_interval (float) – 检查状态的轮询周期，单位为秒。（模板化）
impersonation_chain (str | collections.abc.Sequence[str] | None) – 用于使用短期凭据模拟（impersonate）的可选服务帐号，或者获取列表中最后一个帐号的 access_token 所需的帐号链列表，该帐号将在请求中被模拟。如果设置为字符串，则该帐号必须授予发起帐号 Service Account Token Creator IAM 角色。如果设置为序列，则列表中的身份必须向直接前一个身份授予 Service Account Token Creator IAM 角色，列表中的第一个帐号向发起帐号授予此角色。（模板化）

conn_id[source]¶

job_id[source]¶

dataset_id = None[source]¶

project_id[source]¶

location[source]¶

table_id = None[source]¶

poll_interval = 4.0[source]¶

impersonation_chain = None[source]¶

cancel_on_kill = True[source]¶

serialize()[source]¶

序列化 BigQueryInsertJobTrigger 参数和类路径。

get_task_instance(session)[source]¶

safe_to_cancel()[source]¶

是否可以安全地取消由此触发器执行的外部作业。

这是为了避免触发器自身停止时调用 asyncio.CancelledError 的情况。因为在这些情况下，我们不应该取消外部作业。

async run()[source]¶

获取当前作业执行状态并生成 TriggerEvent。

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryCheckTrigger(conn_id, job_id, project_id, location, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None, cancel_on_kill=True)[source]¶

基类: BigQueryInsertJobTrigger

BigQueryCheckTrigger 在触发器 Worker 上运行。

serialize()[source]¶

序列化 BigQueryCheckTrigger 参数和类路径。

async run()[source]¶

获取当前作业执行状态并生成 TriggerEvent。

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryGetDataTrigger(as_dict=False, selected_fields=None, **kwargs)[source]¶

基类: BigQueryInsertJobTrigger

BigQueryGetDataTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。

参数:: as_dict (bool) – 如果为 True，则将结果作为字典列表返回，否则作为列表列表返回（默认值：False）。

as_dict = False[source]¶

selected_fields =None[source]¶

serialize()[source]¶

序列化 BigQueryInsertJobTrigger 参数和类路径。

async run()[source]¶

获取当前作业执行状态，并生成包含响应数据的 TriggerEvent。

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryIntervalCheckTrigger(conn_id, first_job_id, second_job_id, project_id, table, metrics_thresholds, location=None, date_filter_column='ds', days_back=-7, ratio_formula='max_over_min', ignore_zero=True, dataset_id=None, table_id=None, poll_interval=4.0, impersonation_chain=None)[source]¶

基类: BigQueryInsertJobTrigger

BigQueryIntervalCheckTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。

参数:

conn_id (str) – Google Cloud 连接 ID 的引用
first_job_id (str) – 执行的作业 1 的 ID
second_job_id (str) – 执行的作业 2 的 ID
project_id (str) – 作业正在运行的 Google Cloud 项目
dataset_id (str | None) – 请求的表的数据集 ID。（模板化）
table (str) – 表名
metrics_thresholds (dict[str, int]) – 按指标索引的比率字典
location (str | None) – 数据集位置。
date_filter_column (str | None) – 列名。（模板化）
days_back (SupportsAbs[int]) – ds 与我们要检查的 ds 之间的天数。（模板化）
ratio_formula (str) – 比率公式。（模板化）
ignore_zero (bool) – 是否考虑零的布尔值。（模板化）
table_id (str | None) – 请求的表的表 ID。（模板化）
poll_interval (float) – 检查状态的轮询周期，单位为秒。（模板化）
impersonation_chain (str | collections.abc.Sequence[str] | None) – 用于使用短期凭据模拟（impersonate）的可选服务帐号，或者获取列表中最后一个帐号的 access_token 所需的帐号链列表，该帐号将在请求中被模拟。如果设置为字符串，则该帐号必须授予发起帐号 Service Account Token Creator IAM 角色。如果设置为序列，则列表中的身份必须向直接前一个身份授予 Service Account Token Creator IAM 角色，列表中的第一个帐号向发起帐号授予此角色。（模板化）

conn_id[source]¶

first_job_id[source]¶

second_job_id[source]¶

project_id[source]¶

table[source]¶

metrics_thresholds[source]¶

date_filter_column = 'ds'[source]¶

days_back = -7[source]¶

ratio_formula = 'max_over_min'[source]¶

ignore_zero = True[source]¶

serialize()[source]¶

序列化 BigQueryCheckTrigger 参数和类路径。

async run()[source]¶

获取当前作业执行状态并生成 TriggerEvent。

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryValueCheckTrigger(conn_id, sql, pass_value, job_id, project_id, tolerance=None, dataset_id=None, table_id=None, location=None, poll_interval=4.0, impersonation_chain=None)[source]¶

基类: BigQueryInsertJobTrigger

BigQueryValueCheckTrigger 在触发器 Worker 上运行，继承自 BigQueryInsertJobTrigger 类。

参数:

conn_id (str) – Google Cloud 连接 ID 的引用
sql (str) – 要执行的 SQL
pass_value (int | float | str) – 通过值
job_id (str | None) – 作业 ID
project_id (str) – 作业正在运行的 Google Cloud 项目
tolerance (Any) – 用于容差的某些指标。（模板化）
dataset_id (str | None) – 请求的表的数据集 ID。（模板化）
table_id (str | None) – 请求的表的表 ID。（模板化）
location (str | None) – 数据集位置
poll_interval (float) – 检查状态的轮询周期，单位为秒。（模板化）
impersonation_chain (str | collections.abc.Sequence[str] | None) – 用于使用短期凭据模拟（impersonate）的可选服务帐号，或者获取列表中最后一个帐号的 access_token 所需的帐号链列表，该帐号将在请求中被模拟。如果设置为字符串，则该帐号必须授予发起帐号 Service Account Token Creator IAM 角色。如果设置为序列，则列表中的身份必须向直接前一个身份授予 Service Account Token Creator IAM 角色，列表中的第一个帐号向发起帐号授予此角色（模板化）。

sql[source]¶

pass_value[source]¶

tolerance = None[source]¶

serialize()[source]¶

序列化 BigQueryValueCheckTrigger 参数和类路径。

async run()[source]¶

获取当前作业执行状态并生成 TriggerEvent。

类 airflow.providers.google.cloud.triggers.bigquery.BigQueryTableExistenceTrigger(project_id, dataset_id, table_id, gcp_conn_id, hook_params, poll_interval=4.0, impersonation_chain=None)[source]¶

基类: airflow.triggers.base.BaseTrigger

使用所需参数初始化 BigQuery Table Existence Trigger。

参数:

project_id (str) – 作业正在运行的 Google Cloud 项目
dataset_id (str) – 请求的表的数据集 ID。
table_id (str) – 请求的表的表 ID。
gcp_conn_id (str) – Google Cloud 连接 ID 的引用
hook_params (dict[str, Any]) – Hook 参数
poll_interval (float) – 检查状态的轮询周期，单位为秒
impersonation_chain (str | collections.abc.Sequence[str] | None) – 用于使用短期凭据模拟（impersonate）的可选服务帐号，或者获取列表中最后一个帐号的 access_token 所需的帐号链列表，该帐号将在请求中被模拟。如果设置为字符串，则该帐号必须授予发起帐号 Service Account Token Creator IAM 角色。如果设置为序列，则列表中的身份必须向直接前一个身份授予 Service Account Token Creator IAM 角色，列表中的第一个帐号向发起帐号授予此角色。（模板化）

dataset_id[source]¶

project_id[source]¶

table_id[source]¶

gcp_conn_id: str[source]¶

poll_interval = 4.0[source]¶

hook_params[source]¶

impersonation_chain = None[source]¶

serialize()[source]¶

序列化 BigQueryTableExistenceTrigger 参数和 classpath。

async run()[source]¶

将一直运行直到 Google Big Query 中存在该表。

class airflow.providers.google.cloud.triggers.bigquery.BigQueryTablePartitionExistenceTrigger(partition_id, **kwargs)[source]¶

基类: BigQueryTableExistenceTrigger

使用所需参数初始化 BigQuery Table Partition Existence Trigger。

参数:

partition_id (str) – 要检查其是否存在的分区的名称。
project_id – 运行作业的 Google Cloud 项目
dataset_id – 请求表的数据集 ID。
table_id – 请求表的表 ID。
gcp_conn_id – Google Cloud 连接 ID 的引用
hook_params – hook 的参数
poll_interval – 检查状态的轮询周期（秒）
impersonation_chain – 可选的服务账号，用于使用短期凭据进行模拟，或者获取列表中最后一个账号的 access_token 所需的账号链，该账号将在请求中被模拟。如果设置为字符串，该账号必须授予原始账号 Service Account Token Creator IAM 角色。如果设置为序列，列表中的身份必须授予紧接其前的身份 Service Account Token Creator IAM 角色，列表中第一个账号将此角色授予原始账号。（模板化）

partition_id[source]¶

serialize()[source]¶

序列化 BigQueryTablePartitionExistenceTrigger 参数和 classpath。

async run()[source]¶

将一直运行直到 Google Big Query 中存在该表。