Azure Synapse 101

The following post is intended for users that are new to Synapse and wish to get a crash course.

Sales

Some notes for the sales guys:

  • Synapse was formerly SQL Data Warehouse
  • It comprises:
    • Synapse SQL: T-SQL based analytics
      • SQL pool (pay per DWU provisioned)
      • SQL on-demand (pay per TB processed) (preview)
    • Spark: Apache Spark
    • Synapse Pipelines: Data integration
    • Studio: Unified development environment
  • It is intended to:
    • Simplify the build of data pipelines by providing a single environment to work in
    • Consolidate enterprise data warehousing and big data analytics
  • Comparable to :
    • SnowFlake
    • BigQuery
    • Redshift

Platform

  • Apache Spark is part of the underlying platform, as is the case with Databricks
  • Both Databricks cluster and the Azure Synapse access a common Blob storage container to exchange data between these two systems
  • Microsoft decided not to integrate Databricks into Synapse, instead they included Spark Pools

Features and Concepts

  • Languages supported include SQL, Python, .NET, Java, Scala and R
  • As with Databricks, a key feature is the concept of a Notebook. A notebook is a browser-based interface to a document (i.e. notebook) that contains runnable code, visualizations, etc. This is the primary location where a user would work.
  • There are two SQL options with Synapse:
    • SQL On Demand
    • SQL Pool
  • Choosing between Azure Synapse and Azure SQL Database is assisted by considering the following:
    • Azure SQL DB has a size limit for 8TB (General Purpose Tier) or 4TB (Business-critical tier). Azure Synapse is a good fit for 1 TB and above.
    • Azure SQL DB suits and OLTP workload, Synapse is a great fit for the OLAP workloads.
    • Azure SQL DB supports 6400 concurrent workers, Azure Synapse only supports 128.
    • Azure SQL DB has a set service tier configured for the compute instance, Azure Synapse scales in a linear fashion via SQL pool.
    • Azure SQL DB supports Always Encrypted while Azure Synapse does not.
    • Azure Synapse supports PolyBase to allow querying from external sources, Azure SQL DB doesn’t. 

Synapse SQL Options

The two options are:

SQL On Demand

SQL on-demand allows you to query files in your Azure storage accounts. It doesn’t have local storage or ingestion capabilities. So all files that the query targets are external to SQL on-demand. Everything related to reading files from storage might have an impact on query performance

SQL Pools

SQL pool offers T-SQL based compute and storage capabilities. After creating a SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered.

Loading Data

First, load your data into Azure Data Lake Storage or Azure Blob Storage. Next, use PolyBase to load your data into staging tables.