Azure Data Factory

This post is intended for users who are new to Azure Data Factory (ADF) and want a crash course.

What is Azure Data Factory?

Courtesy of Microsoft:

Azure Data Factory is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF.

https://docs.microsoft.com/en-us/azure/data-factory/

ELT or ETL

Comparing ADF to SSIS at this point can add confusion. SSIS is designed to Extract, Transform and then Load the data (ETL), whereas ADF is designed to Extract, Load and then Transform it (ELT). ADF is built with big data in mind, where transforming the data in memory is not feasible; instead, it leverages the power of the underlying data technology to perform the transformation.
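To make the ELT ordering concrete, here is a minimal sketch in plain Python. It does not use ADF at all: an in-memory SQLite database stands in for the destination data store, and the table names and sample rows are made up for illustration. The point is only that the raw data is loaded first and the transformation is then pushed down to the database engine.

```python
import sqlite3

# Stand-in for the destination data store (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Extract: rows exactly as they arrive from a hypothetical source, all strings.
raw_rows = [
    ("2024-01-01", "widget", "3", "2.50"),
    ("2024-01-01", "gadget", "1", "10.00"),
    ("2024-01-02", "widget", "2", "2.50"),
]

# Load: land the data untransformed in a staging table.
conn.execute(
    "CREATE TABLE staging_sales (sale_date TEXT, product TEXT, qty TEXT, unit_price TEXT)"
)
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?)", raw_rows)

# Transform: the database engine does the casting and aggregation,
# so the rows never have to be reshaped in the client's memory.
conn.execute(
    """
    CREATE TABLE sales_by_product AS
    SELECT product,
           SUM(CAST(qty AS INTEGER)) AS total_qty,
           SUM(CAST(qty AS INTEGER) * CAST(unit_price AS REAL)) AS revenue
    FROM staging_sales
    GROUP BY product
    """
)

result = conn.execute(
    "SELECT product, total_qty, revenue FROM sales_by_product ORDER BY product"
).fetchall()
print(result)
```

In an ETL tool the casting and aggregation would happen in the tool's own memory before the load; here the client only ships rows and issues SQL.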

Activity

Activities in a pipeline define the actions to perform on the data (e.g. copy, transform).

Pipeline

A pipeline is a logical grouping of activities that together perform a task.
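The relationship between a pipeline and its activities can be sketched as a Python dictionary modeled on the JSON that ADF uses for pipeline definitions. Treat it as an illustration rather than a deployable definition: the pipeline, activity and dataset names are hypothetical, and a real Copy activity would also need datasets and linked services defined elsewhere.

```python
import json

# A pipeline is a named container; its "activities" list holds the actions.
pipeline = {
    "name": "CopySalesPipeline",  # hypothetical name
    "properties": {
        "activities": [
            {
                "name": "CopyRawSales",
                "type": "Copy",
                # Hypothetical dataset references.
                "inputs": [{"referenceName": "SourceBlobDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "StagingBlobDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
            },
            {
                "name": "WaitBeforeNextStep",
                "type": "Wait",
                # Runs only after the copy activity has succeeded.
                "dependsOn": [
                    {"activity": "CopyRawSales", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {"waitTimeInSeconds": 30},
            },
        ]
    },
}


def activity_names(definition):
    """Return the names of the activities grouped by a pipeline definition."""
    return [activity["name"] for activity in definition["properties"]["activities"]]


print(activity_names(pipeline))
print(json.dumps(pipeline, indent=2))
```

The second activity's `dependsOn` entry is what chains the two activities into a single task.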

Starting a Pipeline

A pipeline can be started in two ways:

  • Manual (on-demand)
  • Trigger (schedule or event)
    • Schedule trigger
    • Tumbling window trigger that operates on a periodic interval
    • Event-based trigger
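The idea behind a tumbling window, a series of fixed-size, contiguous, non-overlapping time intervals, can be sketched in a few lines of Python. This is not ADF code; the function name `tumbling_windows` and the sample dates are assumptions made for the illustration.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, interval, until):
    """Yield (window_start, window_end) pairs for every window that has
    fully elapsed between `start` and `until`."""
    window_start = start
    while window_start + interval <= until:
        yield window_start, window_start + interval
        window_start += interval  # the next window begins where this one ends


windows = list(
    tumbling_windows(
        start=datetime(2024, 1, 1, 0, 0),
        interval=timedelta(hours=1),
        until=datetime(2024, 1, 1, 3, 30),
    )
)

for window_start, window_end in windows:
    # Each window would correspond to one pipeline run over that time slice.
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Because the windows never overlap and leave no gaps, each slice of time is processed exactly once, which is what distinguishes this from a plain schedule.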

Azure PowerShell

Azure PowerShell is a set of cmdlets for managing Azure resources directly from the PowerShell command line. It lets you run Data Factory operations, such as bulk and incremental data movement, from the command line.

SDKs

Pipelines can be authored, managed, and monitored programmatically via the following SDKs and command-line tools:

  • Python SDK
  • PowerShell CLI
  • C# SDK

Integration Runtime

The integration runtime is a component that enables:

  • Data movement between the source and destination data stores in a scalable manner, with support for built-in connectors, format conversion and column mapping.
  • Dispatching activities that execute SSIS packages.
  • Native execution of SSIS packages in a managed Azure compute environment.
  • Dispatching and monitoring of transformation activities running on a variety of compute services (e.g. Azure HDInsight, Azure Machine Learning, SQL Server).
