Just a few months ago, I had the pleasure of presenting on Sitecore Virtual Developer Day on how to use Azure Data Factory to Transform, Load, and Analyze Sitecore Web Analytics data. In today’s blog, I want to expand on that and cover the usage of the relatively new Analytics Platform from Microsoft – called Azure Synapse.
Sitecore Analytics is a powerful and feature-rich analytics platform designed to capture vast amounts of customer experience data. It stores Interactions and Contact data in the Sitecore Experience Database(xDB ) and relays on xConnects service layer for data access. Other Sitecore Cloud-first platforms are coming out with the Composability-first principle, making data access a core component of the Sitecore Ecosystem. As these systems and all other organizational platforms (CRM, ERP, etc), generate large amounts of operational and historical data in various formats, the data warehousing solution becomes a core necessity for any organization striving to utilize data in their decision-making. Choosing the right Data Warehousing solution could be a challenging process, considering the number of products available on the market today, but the Microsoft ecosystem is a formidable choice and possibly the best choice for most. For the fourth year running Microsoft was selected as a leader in the 2022 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms. Naturally, Synapse Analytics is a great choice for not just Data Warehousing but also for Data Lake, Data Engineering, and Data Transformation tasks, blending multiple tools into a single unified service that provides end-to-end analytics with infinite scale.
Azure Synapse Studio enables data engineers and data scientists to work side by side to ingest and analyze data through a single web-based user interface. Even though the Azure Synapse was introduced in 2020, some of the underlying systems have been used by data engineers and scientists for decades. Apache Spark, an industry standard for high-performance data analysis, is seamlessly integrated into the tool and allows the application of advanced data engineering concepts and ML algorithms. Synapse SQL can be used in a dedicated or serverless way to enable a cost-efficient approach to data access while ensuring availability and performance under high loads. Similarly to Azure Data Factory, Synapse has best in class codeless ETL engine designed to orchestrate data transformation, notebooks, Spark jobs, and more. Finally, Synapse can be easily integrated with Power BI for data visualization.
Now, let’s have a quick overview of how to connect, normalize and visualize Sitecore XP Analytics Data.
Connecting Sitecore to Synapse
Configure Self Hosted Integration runtime using Synapse Studio, validate configuration. More information here
Create Orchestration Pipeline and Normalize Sitecore data
Integrating Sitecore xConnect to Synapse starts with constructing OData URL against xConnect APIs(1) and configuring Synapse Pipeline to run the ETL process(3). One of the known complexities of integrating with xConnnect is configuring the pipeline to process large amounts of data, often returned through multiple and subsequent requests. Fortunately, the OData response from Sitecore contains the “next” page iterator, which we just need to capture and loop(1) through until no URL is coming in the “@odata.nextLink” field of the OData response. Additionally, as of this writing, the Sitecore OData feed is not compatible with the OData input of Azure ADF or Synapse; thus, an additional step is required to cleanse OData using the basic Azure Function. Finally, we use Synapse Dataset(4) to flatten the semi-structured data we get from Sitecore and route events that are coming as an array of dynamic objects to the appropriate data sink. We use data sets to additionally enhance data by applying formatting to Sitecore Datatypes.
Accessing Sitecore xDB Analytics Data in Azure
The external table, created using “CREATE EXTERNAL TABLE AS SELECT” of synapses, allows us to query Sitecore data stored on Azure storage in parquet format using a SELECT statement. This is also a vital component for integrating Sitecore Data with Power BI. I used Serverless SQL Pool to enable SQL on-demand functionality here at an extremely affordable cost of only ~$ 5 USD per 1 TB of data processed.
Connect Synapse to Power BI
There are two core ways to connect Synapse with PowerBI:
Power BI Synapse Linked Service
This is the most straightforward way to integrate Power BI with Synapse. Using this method developers can work with power BI reports and datasets directly from the Synapse interface. Unfortunately as of this moment, the Power BI can only connect to the same account under which the Synapse runs- this was not an option for my demo, as I am relying on VS Enterprise for Synapse and a corporate account for Power BI.
Serverless/Dedicated SQL pool
This can be called a classic approach and requires only SQL Connection String details, such as Host and SQL Server Credentials. In this way, we need to use SA password, and the integration is done outside of Synapse. We can choose an option to use Direct Query for light loads or import data into Power BI for better performance of large datasets.
More information can be found here
Finally, once Power BI is integrated, visualization can be started. I used the in-preview feature of Power BI called Insight. This feature allows us to find and explore insights such as anomalies, trends, and KPI analysis without any setup or configuration
As Sitecore Ecosystem moves to Composable, data integration between different systems and platform becomes an essential task for any data-driven organization. In addition to operational data integration, a data lake and data warehouse solution would be needed to establish an analysis of historical data. Selecting Synapse as a Data Analytics platform is the best, if not the only cost-efficient and robust option for many organizations looking to implement data warehousing solutions.