Yay! Some time for myself and some ideas about what to write about.
I have wrestled with Azure Data Explorer lately. It has been… A journey! I won’t go into all those details, but I think some tips and trix are prudent to publish.
I had some data coming in from Event Hub being captured to a Storage Account. At first we thought we wouldn’t need the data, but storage is cheap and we saved it. For now… Fast forward a couple of weeks and, yes, the data was needed. No problem! I can just ingest it from the Storage Account! Said and done. I clicked my way through the ingestion wizard (I am NOT fluent enough in KQL to do that kind of exercise in a query window. Yet…) and started an ingestion of ALL the data. Noob mistake! Took forever!
[!NOTE] Lesson 1
Try out ingestion on smaller data batches! It can take some time to do an ingestion, and if the data is not as you wanted it will NOT be fun to redo the operation.
When the data finally was in the database and in the table I designated for it there was another small problem: The Avro (capture from Event Hub to Storage comes in three different flavours: Avro, Parquet and Delta Lake) format. Avro is supported for ingestion and I thought it would actually interpret it all in a neat table with the Body column being set to Dynamic, where I hoped that the ingestion would store the Event Message Body in JSON format.
Nope. All I got was a lot of comma separated values?
How the…? Ok, I’ve been here long enough to recognise an array of ASCII values in decimal form. 91 = [ and 123 = {. So this is my JSON in ASCII? HOW do I translate that in KQL???
// convert AVRO-ASCII to Separate columns
DEMO-Table
| extend BodyJSON = parse_json(todynamic(make_string(['Body'])))
|mv-apply BodyJSON on (
project BodyJSON,
['TagName'] = tostring(BodyJSON.TagName),
['Value'] = todecimal(BodyJSON.Value),
['TimeStamp'] = todatetime(BodyJSON.TimeStamp))
| project-away Body, Offset, SequenceNumber, SystemProperties, Properties, id
So, the real magic happens on row three in my example. I take the Body-element (with ASCII-codes), Make that into a string, Make that string Dynamic and then I parse it to JSON…. I do not know if this is the most efficient way to do it, so do not hold that against me!
I saw some references to being able to choose the number of levels to ingest in the wizard, but I did not find that in my experimental environment. Maybe I will find it and return to this post to update it?