The Azure Functions Fan Out/Fan In pattern allows Azure Functions to scale and spread a workload across multiple worker functions that run in parallel to process a batch of work. The results from each worker function are then consolidated into a final result in a separate Function.
In this blog I write about using Azure Functions' Fan Out/Fan In pattern to apply this cloud-scale parallel processing to a collection of my voice utterances, turning them into text using Azure Speech To Text🛠⚡. These voice utterances in my case are a collection of 765 .wav files stored as blobs in an Azure Storage account.
Pre-Requisites
Azure Speech Services Resource (Free Tier)
Empty Blob Storage container to store result (in my case, this is called "outcontainer")
Blob Storage container to store voice utterance files (in my case, this is called "voice")
Visual Studio 2022 for development and for deploying the Azure Function App to Azure
A number of voice utterances as .wav files uploaded to the target storage container in Azure (uploaded to "voice" container)
The Architecture
The Code
First, an HTTP request is sent to the Client Function by using the Function URL by a Client Application, in my case a web browser:
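A minimal sketch of such a Client (starter) Function, assuming the in-process Durable Functions .NET API (Microsoft.Azure.WebJobs.Extensions.DurableTask); the function names here ("HttpStart", "SpeechOrchestrator") are illustrative, not necessarily the ones used in my project:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class HttpStart
{
    // HTTP-triggered starter: kicks off the orchestration and returns
    // a status-check response to the browser.
    [FunctionName("HttpStart")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        string instanceId = await starter.StartNewAsync("SpeechOrchestrator", null);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");

        // Returns URLs the client can poll for orchestration status/output.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}
```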
The Client Function is triggered by the web browser request and calls the Orchestrator Function. This Orchestrator Function's job is to collect the batch of work items or jobs. The batch in this case contains all the filenames of the .wav audio files from the Azure blob storage container.
The Orchestrator Function then creates (and manages) an "orchestra" 🎼 of worker Activity Functions that first collect the batch of items ("GetBlobNamesBatch"), and then process the items in the batch by creating multiple instances of the worker Activity Function ("ProcessAudio"):
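The fan out/fan in itself lives in the Orchestrator. A sketch, again assuming the in-process Durable Functions .NET API: one activity call collects the batch, a loop schedules one ProcessAudio task per file, and `Task.WhenAll` fans the results back in:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class SpeechOrchestrator
{
    [FunctionName("SpeechOrchestrator")]
    public static async Task<List<string>> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Step 1: collect the batch of work items (the .wav blob names).
        var blobNames = await context.CallActivityAsync<List<string>>("GetBlobNamesBatch", null);

        // Step 2 (fan out): schedule one ProcessAudio activity per file.
        // The tasks run in parallel across workers.
        var tasks = new List<Task<string>>();
        foreach (var name in blobNames)
        {
            tasks.Add(context.CallActivityAsync<string>("ProcessAudio", name));
        }

        // Step 3 (fan in): wait for every recognition result,
        // then hand the aggregate to the final summary step.
        string[] results = await Task.WhenAll(tasks);
        await context.CallActivityAsync("ResultSummary", results);

        return results.ToList();
    }
}
```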
The Orchestrator Function will first call the batch collection GetBlobNamesBatch Activity Function here:
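GetBlobNamesBatch simply lists the blobs in the "voice" container and returns their names. A sketch using the Azure.Storage.Blobs SDK; the app setting name "StorageConnectionString" is illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class GetBlobNamesBatch
{
    [FunctionName("GetBlobNamesBatch")]
    public static async Task<List<string>> Run([ActivityTrigger] object input)
    {
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("StorageConnectionString"),
            "voice");

        // Enumerate every .wav blob name in the container —
        // this list becomes the batch of work items to fan out over.
        var names = new List<string>();
        await foreach (BlobItem blob in container.GetBlobsAsync())
        {
            names.Add(blob.Name);
        }
        return names;
    }
}
```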
Next, audio files are processed in parallel with the ProcessAudio Activity Function:
The ProcessAudio Function receives its work item in the form of the name of a .wav file in the known Blob storage container (in my example, the container in my storage account is called "voice"). The .wav file is read as a Stream object and passed to a SpeechRecognizer object that is hydrated with an AudioConfig (hydrated with the audio stream details) and a SpeechConfig (hydrated with details of the Speech Services resource in Azure).
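A sketch of that flow using the Microsoft.CognitiveServices.Speech SDK: the blob stream is pumped into a push audio stream, which feeds the AudioConfig and SpeechRecognizer. The app setting names ("SpeechKey", "SpeechRegion", "StorageConnectionString") are illustrative:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.Extensions.Logging;

public static class ProcessAudio
{
    [FunctionName("ProcessAudio")]
    public static async Task<string> Run(
        [ActivityTrigger] string blobName, ILogger log)
    {
        // Open the .wav blob as a stream.
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("StorageConnectionString"),
            "voice");
        using Stream blobStream = await container.GetBlobClient(blobName).OpenReadAsync();

        // SpeechConfig: the Speech Services resource details.
        var speechConfig = SpeechConfig.FromSubscription(
            Environment.GetEnvironmentVariable("SpeechKey"),
            Environment.GetEnvironmentVariable("SpeechRegion"));

        // Copy the blob bytes into a push stream the Speech SDK can consume.
        using var pushStream = AudioInputStream.CreatePushStream();
        byte[] buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = blobStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            pushStream.Write(buffer, bytesRead);
        }
        pushStream.Close();

        // AudioConfig hydrated with the audio stream, then recognize.
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

        log.LogInformation($"{blobName}: {result.Text}");
        return result.Text;
    }
}
```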
As processing continues, the logs show something like this, with different instances of the ProcessAudio Activity running in parallel:
When all audio files have been recognised and all text results returned (the Orchestrator Function automatically manages the aggregation of the results), the aggregated result is sent to a final Function for processing, shown here as ResultSummary:
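A sketch of what such a ResultSummary activity could look like, writing the aggregated transcriptions into the "outcontainer" blob container; the output blob name "results.txt" is illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class ResultSummary
{
    [FunctionName("ResultSummary")]
    public static async Task Run(
        [ActivityTrigger] string[] results, ILogger log)
    {
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("StorageConnectionString"),
            "outcontainer");

        // Consolidate all transcriptions into a single text blob.
        string summary = string.Join(Environment.NewLine, results);
        await container.UploadBlobAsync("results.txt", BinaryData.FromString(summary));

        log.LogInformation($"Wrote {results.Length} transcriptions to outcontainer.");
    }
}
```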
The Results!!
For Reference, my package versions are as follows:
Now this makes it possible to simply focus on recording a large number of utterances without having to worry about transcribing them manually myself. As long as I speak clearly with no background noise in my recordings, the Fan Out/Fan In Azure Function will do that very efficiently for me 😃!! When Functions work together like this, you can almost hear the harmony of it all!!
N.B. Azure Speech To Text did have some difficulty with about 3 utterances and produced blank outputs, most likely because the recording quality was not the best at the time I recorded those ones. That might be an easy fix with a better recording environment with less echo on those entries!!!