Build Your Own Audio/Video Analytics App With HPE Haven OnDemand – Part 1

In this first part of a two-part tutorial, learn how to leverage HPE Haven OnDemand's Machine Learning APIs to build an audio/video analytics app with minimal time and effort.

Get ready for text indexing


We can use the Text Index Services form on the Haven OnDemand website to create our text index with the following name and fields:

Index name: speechindex
Parametric fields: mediatype

That is enough information to create our text index on Haven OnDemand. The rest of the index fields will be created automatically when we add content to the text index in JSON format. To prepare the text content for indexing, we create a C# class named ContentIndex as shown below:

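using System.Collections.Generic;

// Note: the generic type arguments were lost in the source;
// the element types below are inferred from the surrounding text.
public class ContentIndex {
    public List<Document> document { get; set; }
    public class Document {
        public string content { get; set; }         // full transcript text
        public List<int> offset { get; set; }       // per-word timestamps
        public List<string> text { get; set; }      // recognized words
        public List<string> concepts { get; set; }  // key concepts
        public List<int> occurrences { get; set; }  // occurrence count per concept
        public string mediatype { get; set; }
        public string filename { get; set; }
        public string medianame { get; set; }
        public string language { get; set; }
    }
}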

When the response from the Speech Recognition API arrives, we parse it and fill in a ContentIndex.Document object with the corresponding values:

SpeechRecognitionResponse resp = parser.ParseSpeechRecognitionResponse(ref response);
if (resp != null) {
    var indexItem = new ContentIndex.Document();
    foreach (SpeechRecognitionResponse.Document doc in resp.document) {
        // Append each recognized word to build the full transcript.
        indexItem.content += doc.content + " ";
        // indexItem.offset and indexItem.text are also filled here, one entry per word.
    }
    // Copy the metadata collected from the media file.
    indexItem.medianame = mediaMetadata.contentName;
    indexItem.mediatype = mediaMetadata.contentType;
    indexItem.filename = mediaMetadata.fileName;
    indexItem.language = mediaMetadata.mediaLanguage;
}

Note that when we called the Speech Recognition API, the “interval” parameter was set to 0, so the response is an array of words, each with an offset value and a confidence score. For this demo, we ignore the confidence score. We join the words from the response into a single text string and store it in the “content” data member of the indexItem. Then we fill the indexItem.offset and indexItem.text arrays with the timestamps and words from the response.
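The joining step described above can be sketched in isolation. This is a minimal illustration, not the tutorial's actual parser: RecognizedWord and TranscriptBuilder are hypothetical names, and the offset is assumed to be an integer timestamp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one entry of the Speech Recognition response
// (one recognized word plus its timestamp).
public class RecognizedWord
{
    public string Content { get; }
    public int Offset { get; }

    public RecognizedWord(string content, int offset)
    {
        Content = content;
        Offset = offset;
    }
}

public static class TranscriptBuilder
{
    // Returns the joined transcript together with parallel lists of
    // timestamps and words, so that Offsets[i] belongs to Words[i].
    public static (string Content, List<int> Offsets, List<string> Words) Build(
        IEnumerable<RecognizedWord> words)
    {
        var offsets = new List<int>();
        var texts = new List<string>();
        foreach (var word in words)
        {
            offsets.Add(word.Offset);
            texts.Add(word.Content);
        }
        // string.Join avoids the trailing space that repeated "+=" leaves behind.
        return (string.Join(" ", texts), offsets, texts);
    }
}
```

Keeping the offsets and words in parallel lists means a word found in the transcript can later be mapped back to its position in the media by index.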

We also need some additional media metadata: the media type, the file name and the media name, which are taken from the media file, plus the language code selected for the speech.

This is already enough information to add the text content and the media’s metadata to our text index. However, we also want to find the key concepts of the media content and store them in the text index, so we don’t need to call the API every time we need them. We will discuss how the key concepts are used later in this blog. To get the key concepts, we call the Concept Extraction API with the code below:

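var Params = new Dictionary<string, object>();
// Remove the "<Music/Noise>" markers before sending the text for concept extraction.
var purecontent = indexItem.content.Replace("<Music/Noise>", "");
Params.Add("text", purecontent);
hodClient.PostRequest(ref Params, hodApp, HODClient.REQ_MODE.SYNC);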

The text returned from the Speech Recognition API may contain “<Music/Noise>” markers (as the media may contain music or noise), which is why we clean the text by removing those unwanted terms.
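The cleaning step can be sketched as a small helper. TranscriptCleaner is a hypothetical name, and collapsing the leftover whitespace is an addition of this sketch rather than something the tutorial's code does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TranscriptCleaner
{
    // Marker that the Speech Recognition API may insert for music or noise.
    public const string NoiseMarker = "<Music/Noise>";

    public static string RemoveNoiseMarkers(string content)
    {
        if (string.IsNullOrEmpty(content))
        {
            return string.Empty;
        }
        // Drop every marker, then collapse the whitespace runs left behind
        // (the collapsing is an assumption of this sketch).
        string withoutMarkers = content.Replace(NoiseMarker, "");
        return Regex.Replace(withoutMarkers, @"\s+", " ").Trim();
    }
}
```

Collapsing the whitespace is optional, but it keeps the text sent to the Concept Extraction API free of double spaces where markers used to be.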

The only parameter we need to specify for the Concept Extraction API is the input data source. In this case, we use the “text” keyword and provide the value of the clean text string.

When we receive the response from the Concept Extraction API, we will parse it and continue to fill out the indexItem.concepts and indexItem.occurrences arrays with values:

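var result = parser.ParseConceptExtractionResponse(ref response);
if (result != null) {
    foreach (ConceptExtractionResponse.Concepts item in result.concepts) {
        // Each concept and its occurrence count is added to indexItem.concepts and indexItem.occurrences here.
    }
}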

Now let’s add the information to our text index by calling the Add to Text Index API:

var contentIndex = new ContentIndex();
contentIndex.document = new List<ContentIndex.Document>();
contentIndex.document.Add(indexItem);
string jsonVal = JsonConvert.SerializeObject(contentIndex);
var Params = new Dictionary<string, object>();
Params.Add("index", "speechindex");
Params.Add("json", jsonVal);
hodClient.PostRequest(ref Params, hodApp, HODClient.REQ_MODE.SYNC);

First, we create a contentIndex object and add the indexItem object to its document list. Then we convert the contentIndex object to a JSON string, and finally we call the Add to Text Index API.
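To make the shape of the JSON payload concrete, here is a minimal, self-contained sketch. It assumes System.Text.Json instead of the Json.NET serializer used in the tutorial, and IndexDocument, IndexPayload and IndexPayloadBuilder are hypothetical, trimmed-down stand-ins for the ContentIndex class.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed-down stand-in for ContentIndex.Document (only three fields kept).
public class IndexDocument
{
    public string content { get; set; }
    public string mediatype { get; set; }
    public string filename { get; set; }
}

// Stand-in for ContentIndex: a wrapper whose "document" list holds the items.
public class IndexPayload
{
    public List<IndexDocument> document { get; set; } = new List<IndexDocument>();
}

public static class IndexPayloadBuilder
{
    // Wraps a single item in a payload and serializes it to a JSON string.
    public static string ToJson(IndexDocument item)
    {
        var payload = new IndexPayload();
        payload.document.Add(item);
        return JsonSerializer.Serialize(payload);
    }
}
```

The property names are lower case on purpose: the serializer uses them as the JSON keys, so they end up as the field names in the text index.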

That is all we need to extract text from a media file and add the text and metadata to the text index. We can repeat the process every time a new media file is uploaded to our online media gallery.

Editor's note: The conclusion of this tutorial will be posted tomorrow on KDnuggets.

Original. Reposted with permission.