Real-Time Data Streaming and Search Indexing with Azure - Part 2 - Setup Resources

In the first part of this series, I designed and explained the high-level architecture for streaming database-level changes to other components. Now we will focus on the Azure infrastructure that the solution needs. Let's get going!

Azure Resources Needed

To start, we need to set up several key components in Azure. First, we will need a resource group that will act as a container for all the resources we create in this project.

To follow along with our experiment, you can find all the code in my GitHub repository.

To keep costs down, some workloads can be shared across multiple solutions. Think of these as the core building blocks:

  1. Azure Event Hub: Provides the capability to stream a high volume of events.
  2. SQL Server: Hosts our database.
  3. Container Apps Environment: Hosts various container apps.
  4. Azure Container Registry (ACR): Allows pushing and pulling custom Docker images for the application code we will write.
  5. Azure AI Search: Optimizes our data for search operations.

Next, we have the resources that use these shared workloads. Think of these as the application or system itself:

  1. SQL Database: Serves as the source. CRUD operations produce changes, which are captured through CDC (change data capture).
  2. Event Hub: Provides a hub to stream changes to.
  3. Container App: Debezium: Runs Debezium as a container app that connects to the SQL Server change stream, converts the changes into events, and pushes them to the Event Hub.
  4. Container App: Azure Function: A custom Azure Function application that handles change stream events from the Event Hub. Along with the Azure Function, a Storage Account is provisioned.

All these resources will be set up using Bicep and Azure Deployment Stacks. Read my article on Deployment Stacks to get started and learn about the awesome features provided.

First, I like to write Bicep modules for the components. This keeps the main infrastructure file readable and makes it easier to adhere to specific naming conventions.

These modules are a minimal approach for my specific solution. Always consider using Azure Verified Modules or creating your organization's own set of verified modules.

Event Hub Namespace

@minLength(4)
param name string
param environmentName string
param location string
param sku string = 'Standard'

var eventHubNamespaceName = 'evhns-${name}-ns-${environmentName}'

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2023-01-01-preview' = {
  name: eventHubNamespaceName
  location: location
  sku: {
    name: sku
    tier: sku
    capacity: 1
  }
  properties: {
    isAutoInflateEnabled: false
    maximumThroughputUnits: 0
  }
}

output name string = eventHubNamespace.name

SQL Server

param serverName string
param environmentName string
param location string
param administratorLogin string
@secure()
param administratorLoginPassword string

var sqlServerName = 'sql-${serverName}-${environmentName}'

resource sqlServer 'Microsoft.Sql/servers@2022-05-01-preview' = {
  name: sqlServerName
  location: location
  properties: {
    administratorLogin: administratorLogin
    administratorLoginPassword: administratorLoginPassword
  }
}

output id string = sqlServer.id
output name string = sqlServer.name

Container App Environment

param name string
param environmentName string
param location string

var caeName = 'cae-${name}-${environmentName}'

resource environment 'Microsoft.App/managedEnvironments@2024-03-01' = {
   name: caeName
   location: location
   properties: {
      zoneRedundant: false
      workloadProfiles: [
         {
            workloadProfileType: 'Consumption'
            name: 'Consumption'
         }
      ]
   }
}

output name string = environment.name

ACR

@minLength(2)
param name string
param environmentName string
param location string
param acrSku string = 'Basic'

var acrName = 'acr${name}${environmentName}'

resource acrResource 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: acrName
  location: location
  sku: {
    name: acrSku
  }
  properties: {
    adminUserEnabled: true
  }
}

output loginServer string = acrResource.properties.loginServer
output name string = acrResource.name

Azure AI Search

param name string

@allowed([
  'free'
  'basic'
  'standard'
  'standard2'
  'standard3'
  'storage_optimized_l1'
  'storage_optimized_l2'
])
param sku string = 'basic'

@minValue(1)
@maxValue(12)
param replicaCount int = 1

@allowed([
  1
  2
  3
  4
  6
  12
])
param partitionCount int = 1

param location string

resource search 'Microsoft.Search/searchServices@2020-08-01' = {
  name: name
  location: location
  sku: {
    name: sku
  }
  properties: {
    replicaCount: replicaCount
    partitionCount: partitionCount
  }
}

SQL Database

param serverName string
param name string
param environmentName string
param location string

var sqlDbName = 'sqldb${name}${environmentName}'

resource sqlServer 'Microsoft.Sql/servers@2023-08-01-preview' existing = {
  name: serverName
}

resource sqlDB 'Microsoft.Sql/servers/databases@2023-08-01-preview' = {
  parent: sqlServer
  name: sqlDbName
  location: location
  sku: {
    name: 'GP_S_Gen5'
    tier: 'GeneralPurpose'
    family: 'Gen5'
    capacity: 1
  }
  properties: {
    collation: 'SQL_Latin1_General_CP1_CI_AS'
    catalogCollation: 'SQL_Latin1_General_CP1_CI_AS'
    maxSizeBytes: 1073741824
    zoneRedundant: false
    readScale: 'Disabled'
    autoPauseDelay: 60
    requestedBackupStorageRedundancy: 'Local'
    #disable-next-line BCP036
    minCapacity: '0.5'
  }
}

output name string = sqlDB.name

Event Hub

param namespace string
param name string
param environmentName string
param messageRetentionInDays int = 7
param partitionCount int = 1

var eventHubName = 'evh-${name}-${environmentName}'

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2023-01-01-preview' existing = {
  name: namespace
}

resource eventHub 'Microsoft.EventHub/namespaces/eventhubs@2023-01-01-preview' = {
  parent: eventHubNamespace
  name: eventHubName
  properties: {
    messageRetentionInDays: messageRetentionInDays
    partitionCount: partitionCount
  }
}

output name string = eventHub.name

Container App

param name string
param location string
param environmentName string
param containerAppEnvironmentName string
param containerRegistryName string
param containerImageName string
param containerImageTag string
param replicaSizeCpu string
param replicaSizeMemory string
param minReplicas int = 1
param maxReplicas int = 1
param targetPort int = 80
param environmentVariables array = []

resource cr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' existing = {
  name: containerRegistryName
}

resource cae 'Microsoft.App/managedEnvironments@2024-03-01' existing = {
  name: containerAppEnvironmentName
}

var standardVariables = []

var containerVariables = union(environmentVariables, standardVariables)

resource ca 'Microsoft.App/containerApps@2024-03-01' = {
  name: 'ca-${name}-${environmentName}'
  location: location
  properties: {
    environmentId: cae.id
    configuration: {
      secrets: [
        {
          name: 'acr-password'
          value: cr.listCredentials().passwords[0].value
        }
      ]
      activeRevisionsMode: 'Single'
      ingress: {
        allowInsecure: false
        external: true
        targetPort: targetPort
        transport: 'auto'
      }
      registries: [
        {
          server: cr.properties.loginServer
          passwordSecretRef: 'acr-password'
          username: cr.listCredentials().username
        }
      ]
    }
    workloadProfileName: 'Consumption'
    template: {
      containers: [
        {
          env: containerVariables
          image: '${containerImageName}:${containerImageTag}'
          name: 'main'
          resources: {
            cpu: json(replicaSizeCpu)
            memory: replicaSizeMemory
          }
        }
      ]
      scale: {
        minReplicas: minReplicas
        maxReplicas: maxReplicas
      }
    }
  }
}

output domain string = cae.properties.defaultDomain
output name string = ca.name
output outboundIpAddresses array = ca.properties.outboundIpAddresses

Storage Account

@minLength(3)
param name string
param location string
param skuName string = 'Standard_LRS'

var storageAccountName = 'st${name}'

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
   name: storageAccountName
   location: location
   sku: {
      name: skuName
   }
   kind: 'StorageV2'
   properties: {
      accessTier: 'Hot'
   }
}

output storageAccountId string = storageAccount.id
output storageAccountName string = storageAccount.name

Using these modules, we can create an infrastructure boilerplate for our solution. Notice that the deployment targets the subscription scope. This allows us to create the resource group within the template, along with all the other necessary resources. Module outputs are passed as parameters to dependent modules, which also lets Bicep work out the deployment order.

You will also see that I've created a separate module for debezium-app, mostly for readability, but also so that it can list the Event Hub namespace connection strings and pass them to the container app.

Solution infrastructure

targetScope = 'subscription'

param environmentName string
param location string = 'swedencentral'
param sqlServerUser string
@secure()
param sqlServerPassword string
@minLength(4)
@maxLength(4)
param uniqueId string = take(uniqueString('core'), 4)

var deploymentName = uniqueString(deployment().name)
var coreName = 'core-${uniqueId}'

resource rg 'Microsoft.Resources/resourceGroups@2023-07-01' = {
  name: 'rg-product-${environmentName}'
  location: location
}

module m_eventHubNamespace 'modules/event-hub-namespace/main.bicep' = {
  scope: rg
  name: 'm_hub_namespace-${deploymentName}'
  params: {
    name: coreName
    environmentName: environmentName
    location: location
  }
}

module m_eventHub 'modules/event-hub-namespace/hub/main.bicep' = {
  scope: rg
  name: 'm_ns_hub-${deploymentName}'
  params: {
    namespace: m_eventHubNamespace.outputs.name
    name: 'product'
    environmentName: environmentName
  }
}

module m_sqlServer 'modules/sql/server/main.bicep' = {
  scope: rg
  name: 'm_sql_server-${deploymentName}'
  params: {
    administratorLogin: sqlServerUser
    administratorLoginPassword: sqlServerPassword
    environmentName: environmentName
    location: location
    serverName: coreName
  }
}

module m_sql_db 'modules/sql/db/main.bicep' = {
  scope: rg
  name: 'm_sql_db-${deploymentName}'
  params: {
    name: 'products'
    environmentName: environmentName
    location: location
    serverName: m_sqlServer.outputs.name
  }
}

module m_acr 'modules/container-registry/main.bicep' = {
  scope: rg
  name: 'm_acr-${deploymentName}'
  params: {
    name: replace(coreName, '-', '')
    location: location
    environmentName: environmentName
  }
}

module m_cae 'modules/container-app-environment/main.bicep' = {
  scope: rg
  name: 'm_cae-${deploymentName}'
  params: {
    name: coreName
    location: location
    environmentName: environmentName
  }
}

module m_search_ai 'modules/search-ai/main.bicep' = {
  scope: rg
  name: 'm_search_ai-${deploymentName}'
  params: {
    name: coreName
    location: location
  }
}

module m_ca_debezium 'debezium-app.bicep' = {
  scope: rg
  name: 'm_ca_debezium-${deploymentName}'
  params: {
    location: location
    containerAppEnvironmentName: m_cae.outputs.name
    containerRegistryName: m_acr.outputs.name
    environmentName: environmentName
    eventHubNameSpaceName: m_eventHubNamespace.outputs.name
    name: 'debezium'
  }
}

module m_st_account 'modules/storage-account/main.bicep' = {
  scope: rg
  name: 'm_st_account-${deploymentName}'
  params: {
    name: 'products${uniqueId}'
    location: location
  }
}

module m_ca_function 'modules/function/main.bicep' = {
  scope: rg
  name: 'm_ca_func-${deploymentName}'
  params: {
    name: 'change-processor'
    location: location
    containerImageName: 'mcr.microsoft.com/azure-functions/dotnet8-quickstart-demo:1.0'
    environmentName: environmentName
    managedEnvironmentName: m_cae.outputs.name
    storageAccountName: m_st_account.outputs.storageAccountName
  }
}

output resourceGroupName string = rg.name
output eventhubName string = m_eventHub.outputs.name
output eventhubNamespaceName string = m_eventHubNamespace.outputs.name
output sqlServerId string = m_sqlServer.outputs.id
output sqlServerName string = m_sqlServer.outputs.name
output sqlServerUser string = sqlServerUser
output sqlDatabaseName string = m_sql_db.outputs.name
output debeziumEndpoint string = m_ca_debezium.outputs.appEndpoint
output debeziumOutboundIps array = m_ca_debezium.outputs.outboundIps

Debezium module

param environmentName string
param name string
param location string
param containerAppEnvironmentName string
param containerRegistryName string
param eventHubNameSpaceName string

var deploymentName = uniqueString(deployment().name)

resource accessKeys 'Microsoft.EventHub/namespaces/authorizationRules@2024-01-01' existing = {
  name: '${eventHubNameSpaceName}/RootManageSharedAccessKey'
}

module m_ca_debezium 'modules/container-app/main.bicep' = {
  name: 'm_ca_debezium-${deploymentName}'
  params: {
    name: name
    location: location
    containerAppEnvironmentName: containerAppEnvironmentName
    containerImageName: 'debezium/connect'
    containerImageTag: '2.7'
    containerRegistryName: containerRegistryName
    environmentName: environmentName
    replicaSizeCpu: '1'
    replicaSizeMemory: '2Gi'
    targetPort: 8083
    environmentVariables: [
      {
        name: 'BOOTSTRAP_SERVERS'
        value: '${eventHubNameSpaceName}.servicebus.windows.net:9093'
      }
      {
        name: 'GROUP_ID'
        value: '1'
      }
      {
        name: 'CONFIG_STORAGE_TOPIC'
        value: 'debezium_configs'
      }
      {
        name: 'OFFSET_STORAGE_TOPIC'
        value: 'debezium_offsets'
      }
      {
        name: 'STATUS_STORAGE_TOPIC'
        value: 'debezium_statuses'
      }
      {
        name: 'CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE'
        value: 'false'
      }
      {
        name: 'CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE'
        value: 'true'
      }
      {
        name: 'CONNECT_REQUEST_TIMEOUT_MS'
        value: '60000'
      }
      {
        name: 'CONNECT_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
      {
        name: 'CONNECT_PRODUCER_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_PRODUCER_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_PRODUCER_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
      {
        name: 'CONNECT_CONSUMER_SECURITY_PROTOCOL'
        value: 'SASL_SSL'
      }
      {
        name: 'CONNECT_CONSUMER_SASL_MECHANISM'
        value: 'PLAIN'
      }
      {
        name: 'CONNECT_CONSUMER_SASL_JAAS_CONFIG'
        value: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="${accessKeys.listKeys().primaryConnectionString}";'
      }
    ]
  }
}

output appEndpoint string = 'https://${m_ca_debezium.outputs.name}.${m_ca_debezium.outputs.domain}'
output outboundIps array = m_ca_debezium.outputs.outboundIpAddresses

You may also notice several outputs in the solution infrastructure. They are there so that an automated deployment script for this project can pick up the values it needs.

Deployment

In a real-world scenario, deployments are usually managed through various pipelines. However, for simplicity in our experiment, we will use a single PowerShell script to handle the deployment.

The deployment script will be thoroughly covered in the next part of this series. For now, to deploy the infrastructure, you simply need to deploy the solution infrastructure .bicep file with one of the two commands shown in this section.

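For those following along with this article, it's a good idea to change the seed in param uniqueId string = take(uniqueString('core'), 4) from core to something else. Several resource names derived from it, such as the container registry and the storage account, must be globally unique, so keeping the default risks conflicts with other readers.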
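
One way to avoid picking a seed by hand is to mix the subscription ID into the hash. This is only a sketch of that idea: the 'core' literal is kept from the original parameter, while adding subscription().id is my suggestion and not part of the repository. With only four characters, collisions remain possible, just less likely.

```bicep
// Sketch: derive the 4-character suffix from the subscription ID plus the
// original 'core' seed, so readers deploying into different subscriptions
// will most likely end up with different resource names.
@minLength(4)
@maxLength(4)
param uniqueId string = take(uniqueString(subscription().id, 'core'), 4)
```

Because uniqueString is deterministic, repeated deployments into the same subscription keep producing the same suffix, so existing resources are updated rather than duplicated.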

Deployment Stacks

param(
   [string]$sqlServerDbPassword = "p@ssW0rd"
)

az stack sub create --name ChangeCapture --location swedencentral --template-file ".\infrastructure\main.bicep" --parameters ".\infrastructure\main.bicepparam" sqlServerPassword=$sqlServerDbPassword --dm none --yes

Deployment (without stacks)

param(
   [string]$sqlServerDbPassword = "p@ssW0rd"
)

az deployment sub create --name ChangeCapture --location swedencentral --template-file ".\infrastructure\main.bicep" --parameters ".\infrastructure\main.bicepparam" sqlServerPassword=$sqlServerDbPassword
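
Both commands read most of their parameters from main.bicepparam, which is not shown in this article. A minimal sketch of what such a file could look like is given here. The values and the environment variable name are assumptions of mine, not the contents of the file in the repository.

```bicep
// main.bicepparam (hypothetical contents, placed next to main.bicep)
using './main.bicep'

// Required parameters of the solution infrastructure file.
param environmentName = 'dev'     // assumed environment name
param sqlServerUser = 'sqladmin'  // assumed administrator login

// Placeholder read from an environment variable (assumed name). The inline
// sqlServerPassword=... argument on the command line is expected to
// override this value.
param sqlServerPassword = readEnvironmentVariable('SQL_SERVER_PASSWORD', '')
```

The location and uniqueId parameters are left out because the template already provides defaults for them.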

Always use strong passwords or other secure authentication methods instead of hardcoded strings and passwords in real-world implementations. I can't stress this enough.
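
As one possible alternative, the password could be read from an existing Key Vault at deployment time instead of being passed on the command line. The fragment below is a sketch of how the m_sqlServer module call in the solution infrastructure file might look in that case. The vault name, its resource group, and the secret name are hypothetical, and the vault would need to be enabled for template deployment. The fragment reuses rg, deploymentName, sqlServerUser, environmentName, location, and coreName from the solution infrastructure file.

```bicep
// Hypothetical names: an existing vault 'kv-shared-secrets' in resource group
// 'rg-shared' that holds a secret called 'sqlServerPassword'.
resource kv 'Microsoft.KeyVault/vaults@2023-07-01' existing = {
  scope: resourceGroup('rg-shared')
  name: 'kv-shared-secrets'
}

module m_sqlServer 'modules/sql/server/main.bicep' = {
  scope: rg
  name: 'm_sql_server-${deploymentName}'
  params: {
    administratorLogin: sqlServerUser
    // getSecret can only be assigned to a module parameter marked @secure(),
    // which administratorLoginPassword in the SQL Server module is.
    administratorLoginPassword: kv.getSecret('sqlServerPassword')
    environmentName: environmentName
    location: location
    serverName: coreName
  }
}
```

With this approach the secret value never appears in the script, the parameter file, or the shell history.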

Wrapping up

In this part, we defined modules and integrated them to create the necessary solution infrastructure. We also briefly explored the deployment aspect of the infrastructure.

This was a short one. In the next part, we will set up the CDC process and further refine our deployment script, so there will be somewhat more content to digest.

In the meantime, happy coding!