Azure Data Factory Plugin

The plugin provides functionality to interact with Azure Data Factory.

Installation

  1. Copy the line below to the dependencies section of the project build.gradle file.

    Please make sure to use the same version for all VIVIDUS dependencies.
    Example 1. build.gradle
    implementation(group: 'org.vividus', name: 'vividus-plugin-azure-data-factory', version: '0.6.0')
  2. If the project was imported into the IDE before the new dependency was added, regenerate the IDE configuration files and then refresh the project in the IDE.

Configuration

Authentication

Authentication is configured through environment variables.

See the official "Azure identity" guide for details on the authentication types that can be used.

Azure environment selection

The Azure environment can optionally be specified using the property azure.environment, which sets the environment for all Azure plugins. The default value is AZURE.

Only the following environments are supported:

  • AZURE

  • AZURE_CHINA

  • AZURE_GERMANY

  • AZURE_US_GOVERNMENT

Azure subscription selection

The Azure subscription must be configured via the AZURE_SUBSCRIPTION_ID environment variable.
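
Because both authentication and subscription selection depend on environment variables, a quick pre-flight check can catch a missing value before any step runs. The following is a minimal sketch, not part of the plugin. AZURE_SUBSCRIPTION_ID comes from this page, while the three service principal variables are an assumption about the chosen authentication type (client secret); other authentication types rely on different variables. The function name is hypothetical.

```python
import os

# Assumption: authentication uses a service principal with a client secret,
# which Azure Identity reads from these three variables.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")
# Named on this page as the way to select the subscription.
SUBSCRIPTION_VAR = "AZURE_SUBSCRIPTION_ID"


def missing_azure_variables(env=None):
    """Return the names of the expected variables that are unset or empty."""
    env = os.environ if env is None else env
    expected = SERVICE_PRINCIPAL_VARS + (SUBSCRIPTION_VAR,)
    return [name for name in expected if not env.get(name)]
```

Calling missing_azure_variables() without arguments inspects the current process environment; an empty list means every expected variable is present.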

Steps

Run pipeline

Runs a pipeline in Data Factory, waits until it completes or the timeout is reached, and validates that the run status is equal to the expected one.

When I run pipeline `$pipelineName` in Data Factory `$factoryName` from resource group `$resourceGroupName` with wait timeout `$waitTimeout` and expect run status to be equal to `$expectedPipelineRunStatus`
  • $pipelineName - The name of the pipeline to run.

  • $factoryName - The name of the factory.

  • $resourceGroupName - The name of the resource group of the factory.

  • $waitTimeout - The maximum time to wait for the pipeline to complete, in ISO-8601 duration format.

  • $expectedPipelineRunStatus - The expected pipeline run status, e.g. Succeeded.
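
To make the ISO-8601 duration format of $waitTimeout concrete, here is a minimal sketch that converts such a value into a time span. It is an illustration only and not how the plugin parses the value: it covers just the day and time components (for example PT30S, PT1H30M, P1DT2H), and the function name is hypothetical.

```python
import re
from datetime import timedelta

# Day/time subset of ISO-8601 durations: P[nD][T[nH][nM][nS]]
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_wait_timeout(value):
    """Convert an ISO-8601 duration such as 'PT30S' into a timedelta."""
    match = _DURATION.match(value)
    # Reject non-matching input and bare designators such as 'P'.
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"Not a supported ISO-8601 duration: {value!r}")
    parts = {name: float(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)
```

For instance, parse_wait_timeout("PT30S") yields a span of 30 seconds, and parse_wait_timeout("PT1H30M") yields one and a half hours.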

Example 2. Run pipeline
When I run pipeline `vividus-pipeline` in Data Factory `vividus-data-factory` from resource group `vividus-resource-group-ingestion` with wait timeout `PT30S` and expect run status to be equal to `Succeeded`

Run pipeline with parameters

Runs a pipeline with the provided input parameters in Data Factory, waits until it completes or the timeout is reached, and validates that the run status is equal to the expected one.

When I run pipeline `$pipelineName` in Data Factory `$factoryName` from resource group `$resourceGroupName` with wait timeout `$waitTimeout` and with input parameters `$inputParametersJson` and expect run status to be equal to `$expectedPipelineRunStatus`
  • $pipelineName - The name of the pipeline to run.

  • $factoryName - The name of the factory.

  • $resourceGroupName - The name of the resource group of the factory.

  • $waitTimeout - The maximum time to wait for the pipeline to complete, in ISO-8601 duration format.

  • $inputParametersJson - The input parameters of the pipeline run in JSON format.

  • $expectedPipelineRunStatus - The expected pipeline run status, e.g. Succeeded.
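
When the parameter values come from test data, it can be safer to generate the $inputParametersJson text than to write it by hand. The sketch below assumes the step expects a JSON object whose keys are the pipeline parameter names, as in the example in this section; the function name and the parameter names are hypothetical.

```python
import json


def build_input_parameters(parameters):
    """Serialize pipeline parameters to JSON text suitable for the step."""
    # Assumption: the pipeline run expects a name-to-value mapping.
    if not isinstance(parameters, dict):
        raise TypeError("Pipeline parameters are expected to be a name-to-value mapping")
    return json.dumps(parameters, indent=2)


# Hypothetical parameter names and values.
print(build_input_parameters({"param1": "abc", "param2": 2022}))
```

Serializing through a JSON library guarantees correct quoting and escaping, which is easy to get wrong in hand-written multi-line values.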

Example 3. Run pipeline with parameters
When I run pipeline `vividus-pipeline` in Data Factory `vividus-data-factory` from resource group `vividus-resource-group-ingestion` with wait timeout `PT30S` and with input parameters `
{
  "param1": "abc",
  "param2": 2022
}
` and expect run status to be equal to `Succeeded`

Collect pipeline runs

Collects pipeline runs in Data Factory based on the input filter conditions.

When I collect runs of pipeline `$pipelineName` filtered by:$filters in Data Factory `$factoryName` from resource group `$resourceGroupName` and save them as JSON to $scopes variable `$variableName`
  • $pipelineName - The name of the pipeline whose runs are collected.

  • $filters - The ExamplesTable with filters to be applied to the pipeline runs to limit the resulting set.

    Table 1. The supported filter types
    |Type               |Alias              |Description                                                              |
    |LAST_UPDATED_AFTER |last updated after |The time at or after which the run event was updated in ISO-8601 format. |
    |LAST_UPDATED_BEFORE|last updated before|The time at or before which the run event was updated in ISO-8601 format.|

    The filters can be combined in any order and in any combination.

    Example 4. The combination of filters
    |filterType         |filterValue              |
    |last updated after |2021-11-15T00:00:00+03:00|
    |last updated before|2021-11-15T00:00:00+03:00|
  • $factoryName - The name of the factory.

  • $resourceGroupName - The name of the resource group of the factory.

  • $scopes - The comma-separated set of variable scopes.

  • $variableName - The variable name to store the pipeline runs in JSON format.
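
The relationship between filter types, their aliases, and the ISO-8601 values can be sketched as follows. This is an illustration under assumptions, not the plugin's parsing logic: it treats an alias as the type name in lower case with spaces instead of underscores, which holds for the two filter types listed above, and the function name is hypothetical.

```python
from datetime import datetime

# Filter types from the table of supported filter types.
FILTER_TYPES = ("LAST_UPDATED_AFTER", "LAST_UPDATED_BEFORE")


def normalize_filter(filter_type, filter_value):
    """Map a type or alias to its filter type and parse the ISO-8601 value."""
    # 'last updated after' -> 'LAST_UPDATED_AFTER'
    normalized = filter_type.strip().upper().replace(" ", "_")
    if normalized not in FILTER_TYPES:
        raise ValueError(f"Unsupported filter type: {filter_type!r}")
    # fromisoformat accepts values carrying a UTC offset such as +03:00.
    return normalized, datetime.fromisoformat(filter_value.strip())
```

Both normalize_filter("last updated after", "2021-11-15T00:00:00+03:00") and the same call with LAST_UPDATED_AFTER resolve to the same filter type and an offset-aware timestamp.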

The client should have permission to perform the action Microsoft.DataFactory/factories/pipelineruns/read on the scope /subscriptions/{subscription ID}/resourceGroups/{resource group name}/providers/Microsoft.DataFactory.
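
As a small illustration of how the scope template above is filled in, the following sketch substitutes the two placeholders. The function name is hypothetical, and any subscription ID or resource group passed to it is the caller's own value.

```python
def pipeline_runs_read_scope(subscription_id, resource_group_name):
    """Fill in the scope template for the pipelineruns/read permission."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group_name}"
        "/providers/Microsoft.DataFactory"
    )
```

The resulting string can be compared with the scope shown in a permission error message to confirm that the role assignment targets the right resource group.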

Example 5. Find pipeline runs from the last day
When I collect runs of pipeline `vividus-pipeline` filtered by:
|filterType          |filterValue                                      |
|LAST_UPDATED_AFTER  |#{generateDate(-P1D, yyyy-MM-dd'T'HH:mm:ssXXX)} |
|LAST_UPDATED_BEFORE |#{generateDate(P, yyyy-MM-dd'T'HH:mm:ssXXX)}    |
in Data Factory `vividus-data-factory` from resource group `vividus-resource-group-ingestion` and save them as JSON to scenario variable `pipeline-runs`
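
To show what kind of values the two date expressions in the example above resolve to, here is a minimal sketch that computes the same window outside of a story. It is an approximation under assumptions: it treats one day as exactly 24 hours and prints a zero offset as +00:00, which may differ from what the date expressions produce. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_day_window(now=None):
    """Return (after, before) ISO-8601 strings spanning the last 24 hours."""
    # Default to the local time zone so the values carry a UTC offset.
    now = datetime.now(timezone.utc).astimezone() if now is None else now
    after = now - timedelta(days=1)
    return after.isoformat(timespec="seconds"), now.isoformat(timespec="seconds")
```

The first returned value corresponds to the LAST_UPDATED_AFTER filter and the second to LAST_UPDATED_BEFORE.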