Real-Time Data Streaming and Search Indexing with Azure - Part 2 - Setup Resources
In the first part of this series, I designed and explained the high-level architecture to stream database-level changes to other components. Now, we are going to focus on the Azure Infrastructure needed to take steps towards the solution. Let's get going!
Azure Resources Needed
To start, we need to set up several key components in Azure. First, we will need a resource group that will act as a container for all the resources we create in this project.
To follow along with our experiment, you can find all the code in my GitHub repository.
To keep costs down, some workloads can be shared across multiple solutions. Consider these the core building blocks:
- Azure Event Hub: Provides the capability to stream a high volume of events.
- SQL Server: Hosts our database.
- Container Apps Environment: Hosts various container apps.
- Azure Container Registry (ACR): Allows pushing and pulling custom Docker images for the application code we will write.
- Azure AI Search: Optimizes our data for search operations.
Next, we have applications that utilize these shared workloads. Think of these as the application(s)/system:
- SQL Database: Serves as the source. CRUD operations against it produce the changes that CDC captures.
- Event Hub: Provides a hub to stream changes to.
- Container App: Debezium: Runs Debezium as a container app that connects to the SQL Server change stream, converts the changes into events, and pushes them to the Event Hub.
- Container App: Azure Function: A custom Azure Function application that handles change stream events from the Event Hub. Along with the Azure Function, a Storage Account is provisioned.
All these resources will be set up using Bicep and Azure Deployment Stacks. Read my article on Deployment Stacks to get started and learn about the awesome features provided.
First, I like to write Bicep modules for the components. This keeps the main infrastructure file readable and makes it easy to adhere to specific naming conventions.
These modules are a minimal approach for my specific solution. Always consider using Azure Verified Modules or creating your organization's own version of verified modules.
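For comparison, here is a minimal sketch of what consuming an Azure Verified Module from the public Bicep registry could look like, using the storage account as an example. The version tag (0.9.1) and the parameter values are assumptions on my part and are not taken from the repository; check the registry for the current version and the module's full parameter list before using it.

```bicep
// Sketch only: consumes the AVM storage account module from the public registry.
// The version tag below is an assumption; pin it to whatever is current.
param name string
param location string

module storage 'br/public:avm/res/storage/storage-account:0.9.1' = {
  name: 'avm-storage-${uniqueString(deployment().name)}'
  params: {
    name: 'st${name}'
    location: location
    skuName: 'Standard_LRS'
    kind: 'StorageV2'
  }
}

// AVM resource modules expose the resource name as an output.
output storageAccountName string = storage.outputs.name
```

The trade-off is the usual one: a verified module brings many more options and sensible defaults, while a hand-written minimal module like the ones below is easier to read in an article.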
Event Hub Namespace
@minLength(4)
param name string
param environmentName string
param location string
param sku string = 'Standard'

var eventHubNamespaceName = 'evhns-${name}-ns-${environmentName}'

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2023-01-01-preview' = {
  name: eventHubNamespaceName
  location: location
  sku: {
    name: sku
    tier: sku
    capacity: 1
  }
  properties: {
    isAutoInflateEnabled: false
    maximumThroughputUnits: 0
  }
}

output name string = eventHubNamespace.name
SQL Server
param serverName string
param environmentName string
param location string
param administratorLogin string
@secure()
param administratorLoginPassword string

var sqlServerName = 'sql-${serverName}-${environmentName}'

resource sqlServer 'Microsoft.Sql/servers@2022-05-01-preview' = {
  name: sqlServerName
  location: location
  properties: {
    administratorLogin: administratorLogin
    administratorLoginPassword: administratorLoginPassword
  }
}

output id string = sqlServer.id
output name string = sqlServer.name
Container App Environment
param name string
param environmentName string
param location string

var caeName = 'cae-${name}-${environmentName}'

resource environment 'Microsoft.App/managedEnvironments@2024-03-01' = {
  name: caeName
  location: location
  properties: {
    zoneRedundant: false
    workloadProfiles: [
      {
        workloadProfileType: 'Consumption'
        name: 'Consumption'
      }
    ]
  }
}

output name string = environment.name
ACR
@minLength(2)
param name string
param environmentName string
param location string
param acrSku string = 'Basic'

var acrName = 'acr${name}${environmentName}'

resource acrResource 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: acrName
  location: location
  sku: {
    name: acrSku
  }
  properties: {
    adminUserEnabled: true
  }
}

output loginServer string = acrResource.properties.loginServer
output name string = acrResource.name
AI Search
param name string

@allowed([
  'free'
  'basic'
  'standard'
  'standard2'
  'standard3'
  'storage_optimized_l1'
  'storage_optimized_l2'
])
param sku string = 'basic'

@minValue(1)
@maxValue(12)
param replicaCount int = 1

@allowed([
  1
  2
  3
  4
  6
  12
])
param partitionCount int = 1

param location string

resource search 'Microsoft.Search/searchServices@2020-08-01' = {
  name: name
  location: location
  sku: {
    name: sku
  }
  properties: {
    replicaCount: replicaCount
    partitionCount: partitionCount
  }
}
SQL Database
param serverName string
param name string
param environmentName string
param location string

var sqlDbName = 'sqldb${name}${environmentName}'

resource sqlServer 'Microsoft.Sql/servers@2023-08-01-preview' existing = {
  name: serverName
}

resource sqlDB 'Microsoft.Sql/servers/databases@2023-08-01-preview' = {
  parent: sqlServer
  name: sqlDbName
  location: location
  sku: {
    name: 'GP_S_Gen5'
    tier: 'GeneralPurpose'
    family: 'Gen5'
    capacity: 1
  }
  properties: {
    collation: 'SQL_Latin1_General_CP1_CI_AS'
    catalogCollation: 'SQL_Latin1_General_CP1_CI_AS'
    maxSizeBytes: 1073741824
    zoneRedundant: false
    readScale: 'Disabled'
    autoPauseDelay: 60
    requestedBackupStorageRedundancy: 'Local'
    #disable-next-line BCP036
    minCapacity: '0.5'
  }
}

output name string = sqlDB.name
Event Hub
param namespace string
param name string
param environmentName string
param messageRetentionInDays int = 7
param partitionCount int = 1

var eventHubName = 'evh-${name}-${environmentName}'

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2023-01-01-preview' existing = {
  name: namespace
}

resource eventHub 'Microsoft.EventHub/namespaces/eventhubs@2023-01-01-preview' = {
  parent: eventHubNamespace
  name: eventHubName
  properties: {
    messageRetentionInDays: messageRetentionInDays
    partitionCount: partitionCount
  }
}

output name string = eventHub.name
Container App
param name string
param location string
param environmentName string
param containerAppEnvironmentName string
param containerRegistryName string
param containerImageName string
param containerImageTag string
param replicaSizeCpu string
param replicaSizeMemory string
param minReplicas int = 1
param maxReplicas int = 1
param targetPort int = 80
param environmentVariables array = []

resource cr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' existing = {
  name: containerRegistryName
}

resource cae 'Microsoft.App/managedEnvironments@2024-03-01' existing = {
  name: containerAppEnvironmentName
}

var standardVariables = []
var containerVariables = union(environmentVariables, standardVariables)

resource ca 'Microsoft.App/containerApps@2024-03-01' = {
  name: 'ca-${name}-${environmentName}'
  location: location
  properties: {
    environmentId: cae.id
    configuration: {
      secrets: [
        {
          name: 'acr-password'
          value: cr.listCredentials().passwords[0].value
        }
      ]
      activeRevisionsMode: 'Single'
      ingress: {
        allowInsecure: false
        external: true
        targetPort: targetPort
        transport: 'auto'
      }
      registries: [
        {
          server: cr.properties.loginServer
          passwordSecretRef: 'acr-password'
          username: cr.listCredentials().username
        }
      ]
    }
    workloadProfileName: 'Consumption'
    template: {
      containers: [
        {
          env: containerVariables
          image: '${containerImageName}:${containerImageTag}'
          name: 'main'
          resources: {
            cpu: json(replicaSizeCpu)
            memory: replicaSizeMemory
          }
        }
      ]
      scale: {
        minReplicas: minReplicas
        maxReplicas: maxReplicas
      }
    }
  }
}

output domain string = cae.properties.defaultDomain
output name string = ca.name
output outboundIpAddresses array = ca.properties.outboundIpAddresses
Storage Account
@minLength(3)
param name string
param location string
param skuName string = 'Standard_LRS'

var storageAccountName = 'st${name}'

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: skuName
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
  }
}

output storageAccountId string = storageAccount.id
output storageAccountName string = storageAccount.name
Using these modules, we can create an infrastructure boilerplate for our solution. Notice that we are targeting a subscription-wide deployment. This approach allows us to create the resource group within the template, along with all the other necessary resources. Modules expose outputs, and passing those outputs as parameters to other modules is what chains the dependencies between them.
Note that I've also created a dedicated module, debezium-app, mostly for readability but also to be able to list the Event Hub namespace connection strings and pass them to the container app.
Solution infrastructure
targetScope = 'subscription'

param environmentName string
param location string = 'swedencentral'
param sqlServerUser string
@secure()
param sqlServerPassword string
@minLength(4)
@maxLength(4)
param uniqueId string = take(uniqueString('core'), 4)

var deploymentName = uniqueString(deployment().name)
var coreName = 'core-${uniqueId}'

resource rg 'Microsoft.Resources/resourceGroups@2023-07-01' = {
  name: 'rg-product-${environmentName}'
  location: location
}

module m_eventHubNamespace 'modules/event-hub-namespace/main.bicep' = {
  scope: rg
  name: 'm_hub_namespace-${deploymentName}'
  params: {
    name: coreName
    environmentName: environmentName
    location: location
  }
}

module m_eventHub 'modules/event-hub-namespace/hub/main.bicep' = {
  scope: rg
  name: 'm_ns_hub-${deploymentName}'
  params: {
    namespace: m_eventHubNamespace.outputs.name
    name: 'product'
    environmentName: environmentName
  }
}

module m_sqlServer 'modules/sql/server/main.bicep' = {
  scope: rg
  name: 'm_sql_server-${deploymentName}'
  params: {
    administratorLogin: sqlServerUser
    administratorLoginPassword: sqlServerPassword
    environmentName: environmentName
    location: location
    serverName: coreName
  }
}

module m_sql_db 'modules/sql/db/main.bicep' = {
  scope: rg
  name: 'm_sql_db-${deploymentName}'
  params: {
    name: 'products'
    environmentName: environmentName
    location: location
    serverName: m_sqlServer.outputs.name
  }
}

module m_acr 'modules/container-registry/main.bicep' = {
  scope: rg
  name: 'm_acr-${deploymentName}'
  params: {
    name: replace(coreName, '-', '')
    location: location
    environmentName: environmentName
  }
}

module m_cae 'modules/container-app-environment/main.bicep' = {
  scope: rg
  name: 'm_cae-${deploymentName}'
  params: {
    name: coreName
    location: location
    environmentName: environmentName
  }
}

module m_search_ai 'modules/search-ai/main.bicep' = {
  scope: rg
  name: 'm_search_ai-${deploymentName}'
  params: {
    name: coreName
    location: location
  }
}

module m_ca_debezium 'debezium-app.bicep' = {
  scope: rg
  name: 'm_ca_debezium-${deploymentName}'
  params: {
    location: location
    containerAppEnvironmentName: m_cae.outputs.name
    containerRegistryName: m_acr.outputs.name
    environmentName: environmentName
    eventHubNameSpaceName: m_eventHubNamespace.outputs.name
    name: 'debezium'
  }
}

module m_st_account 'modules/storage-account/main.bicep' = {
  scope: rg
  name: 'm_st_account-${deploymentName}'
  params: {
    name: 'products${uniqueId}'
    location: location
  }
}

module m_ca_function 'modules/function/main.bicep' = {
  scope: rg
  name: 'm_ca_func-${deploymentName}'
  params: {
    name: 'change-processor'
    location: location
    containerImageName: 'mcr.microsoft.com/azure-functions/dotnet8-quickstart-demo:1.0'
    environmentName: environmentName
    managedEnvironmentName: m_cae.outputs.name
    storageAccountName: m_st_account.outputs.storageAccountName
  }
}

output resourceGroupName string = rg.name
output eventhubName string = m_eventHub.outputs.name
output eventhubNamespaceName string = m_eventHubNamespace.outputs.name
output sqlServerId string = m_sqlServer.outputs.id
output sqlServerName string = m_sqlServer.outputs.name
output sqlServerUser string = sqlServerUser
output sqlDatabaseName string = m_sql_db.outputs.name
output debeziumEndpoint string = m_ca_debezium.outputs.appEndpoint
output debeziumOutboundIps array = m_ca_debezium.outputs.outboundIps
Debezium module
param environmentName string
param name string
param location string
param containerAppEnvironmentName string
param containerRegistryName string
param eventHubNameSpaceName string

var deploymentName = uniqueString(deployment().name)

resource accessKeys 'Microsoft.EventHub/namespaces/authorizationRules@2024-01-01' existing = {
  name: '${eventHubNameSpaceName}/RootManageSharedAccessKey'
}

module m_ca_debezium 'modules/container-app/main.bicep' = {
  name: 'm_ca_debezium-${deploymentName}'
  params: {
    name: name
    location: location
    containerAppEnvironmentName: containerAppEnvironmentName
    containerImageName: 'debezium/connect'
    containerImageTag: '2.7'
    containerRegistryName: containerRegistryName
    environmentName: environmentName
    replicaSizeCpu: '1'
    replicaSizeMemory: '2Gi'
    targetPort: 8083
    environmentVariables: [
      {
        name: 'BOOTSTRAP_SERVERS'
        value: '${eventHubNameSpaceName}.servicebus.windows.net:9093'
      }
      {
        name: 'GROUP_ID'
        value: '1'
      }
      {
        name: 'CONFIG_STORAGE_TOPIC'
        value: 'debezium_configs'
      }
      {
        name: 'OFFSET_STORAGE_TOPIC'
        value: 'debezium_offsets'
      }
      {
        name: 'STATUS_STORAGE_TOPIC'
        value: 'debezium_statuses'
      }
      {
        name: 'CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE'
        value: 'false'
      }
      {
        name: 'CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE'
        value: 'true'
      }
      {
        name: 'CONNECT_REQUEST_TIMEOUT_MS'
        value: '60000'
      }
      {
        name: 'CONNECT_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
      {
        name: 'CONNECT_PRODUCER_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_PRODUCER_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_PRODUCER_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
      {
        name: 'CONNECT_CONSUMER_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_CONSUMER_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_CONSUMER_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
    ]
  }
}

output appEndpoint string = 'https://${m_ca_debezium.outputs.name}.${m_ca_debezium.outputs.domain}'
output outboundIps array = m_ca_debezium.outputs.outboundIpAddresses
You may also notice several outputs in the solution infrastructure. They are there so that an automated deployment script for this project can pick up the values it needs.
Deployment
In a real-world scenario, deployments are usually managed through various pipelines. However, for simplicity in our experiment, we will use a single PowerShell script to handle the deployment.
The deployment script will be thoroughly covered in the next part of this series. For now, to deploy the infrastructure, you only need to deploy the solution infrastructure .bicep file, either as a Deployment Stack or as a regular subscription deployment.
For those following along with this article, it's a good idea to change the seed in param uniqueId string = take(uniqueString('core'), 4) to something other than core, to avoid naming conflicts with others.
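Instead of editing the default in the template, the value can also be overridden from the parameter file. The deployment commands in this article reference main.bicepparam, but its contents are not shown here, so treat the following as a minimal sketch of what it might contain. All values (dev, sqladmin, a1b2) and the environment variable name SQL_SERVER_PASSWORD are assumptions chosen for illustration.

```bicep
// infrastructure/main.bicepparam (sketch; every value here is an assumption)
using './main.bicep'

param environmentName = 'dev'
param sqlServerUser = 'sqladmin'

// Must be exactly 4 characters to satisfy @minLength(4)/@maxLength(4) in main.bicep.
// Pick your own value so resource names do not collide with other readers' deployments.
param uniqueId = 'a1b2'

// A parameter file is expected to assign every parameter that has no default.
// Reading from an environment variable keeps the secret out of source control;
// the inline sqlServerPassword=... argument on the command line can still override it.
param sqlServerPassword = readEnvironmentVariable('SQL_SERVER_PASSWORD', '')
```

Since uniqueId ends up in names such as the container registry and the storage account, stick to lowercase letters and digits.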
Deployment Stacks
param(
    [string]$sqlServerDbPassword = "p@ssW0rd"
)

az stack sub create --name ChangeCapture --location swedencentral --template-file ".\infrastructure\main.bicep" --parameters ".\infrastructure\main.bicepparam" sqlServerPassword=$sqlServerDbPassword --dm none --yes
Deployment
param(
    [string]$sqlServerDbPassword = "p@ssW0rd"
)

az deployment sub create --name ChangeCapture --location swedencentral --template-file ".\infrastructure\main.bicep" --parameters ".\infrastructure\main.bicepparam" sqlServerPassword=$sqlServerDbPassword
For real-world implementations, always use strong passwords or other secure authentication methods instead of hardcoded strings and passwords. I can't stress this enough.
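One way to avoid a hardcoded password is to let the parameter file pull it from Azure Key Vault. The sketch below is not part of this project: it assumes a vault that already exists, has template deployment access enabled, and holds a secret I've named sqlServerPassword. The angle-bracket values are placeholders to fill in, and the other values are illustrative. I have not verified this against Deployment Stacks specifically, so test it with your own setup.

```bicep
// Parameter file sketch: the SQL admin password comes from Key Vault.
// Vault, resource group, subscription and secret names are placeholders/assumptions.
using './main.bicep'

param environmentName = 'dev'
param sqlServerUser = 'sqladmin'

// getSecret can only be assigned to a parameter marked @secure() in the template,
// which sqlServerPassword is.
param sqlServerPassword = az.getSecret('<subscription-id>', '<key-vault-resource-group>', '<key-vault-name>', 'sqlServerPassword')
```

With this in place, the inline sqlServerPassword=... argument and the default value in the PowerShell script are no longer needed.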
Wrapping up
In this part, we defined modules and integrated them to create the necessary solution infrastructure. We also briefly explored the deployment aspect of the infrastructure.
This was a short one, but in the next part, we will set up the CDC process and further refine our deployment script. That one will have somewhat more content to digest.
In the meantime, happy coding!