Azure Blob tags: Working with numbers
Azure Blob tags are an absolutely excellent feature. They are used to index files in Blob storage according to custom-defined metadata. Blobs can subsequently be queried on this metadata using basic comparison operators such as =, >, >=, <, and <=. If only my local file system supported such a feature, finding files on my computer would be a whole lot easier.
To get a rough idea, this is ultimately what it comes down to in code using the .NET Azure Storage SDK:
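The same idea can also be sketched in Python rather than .NET. This is a minimal sketch, not the original snippet: it assumes the azure-storage-blob package and relies on its `BlobClient.set_blob_tags` and `ContainerClient.find_blobs_by_tags` methods. The helper names, the "price" tag, and the client objects passed in are hypothetical.

```python
def tag_invoice(blob_client, price: str) -> None:
    """Attach a 'price' index tag to one blob.

    blob_client is assumed to be an azure.storage.blob.BlobClient.
    Tag keys and values are always stored as strings.
    """
    blob_client.set_blob_tags({"price": price})


def price_filter(low: str, high: str) -> str:
    """Build a tag filter: keys in double quotes, values in single quotes."""
    return f"\"price\" > '{low}' AND \"price\" < '{high}'"


def find_invoices(container_client, low: str, high: str) -> list[str]:
    """Return the names of blobs whose 'price' tag lies between low and high.

    container_client is assumed to be an azure.storage.blob.ContainerClient.
    """
    matches = container_client.find_blobs_by_tags(
        filter_expression=price_filter(low, high)
    )
    return [blob.name for blob in matches]
```

The clients are passed in as parameters, so the sketch itself has no dependency on how they are created or authenticated.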
Using this technology has allowed me to rapidly develop an application for uploading, searching, updating, and exporting photos of invoices as an extension for an accountancy application. Experiencing how well Azure Blob tags work made me want to tell everybody. Hence this blog post.
Azure Blob tags have been available for quite some time. The documentation is sufficient to start experimenting joyfully and successfully. The restrictions can be quite unclear at first but will grow on you in a short while.
I will not elaborate further on what Azure Blob tags are or how they are used at a basic level. I assume that if you found this page, you are already familiar with the concept. In case you are not, I suggest you first give this blog about Azure Blob tags a quick read.
This article is especially helpful if you have been struggling for a while to find Azure Blobs using numeric Blob tags, yet still experience invalid query errors or unexpected search results. It may also help you to get acquainted with some of the implementation details of Azure Blob tags and their consequences for usability.
Range operators
All tag keys and values are text. To support relational (range) operators, Microsoft has chosen to implement lexicographic ordering. This ordering works excellently for dates and text, yet poses challenges when finding numbers.
Soon after experimenting with range operations on a wide variety of numbers, the maleficent “invalid range” query value error will test any ambitious engineer who aims to weaponize numeric blob tags. Let me give you an example, and if, like me, you experienced it before, you will immediately recognize the inevitable frustration that comes with it.
Encountering problems
Since I was working with invoices for an accountancy application, I was required to provide the price as metadata for each invoice image. I had to offer the ability to search invoices by price. So that is what I did, faithfully tagging all new uploads with a price; just a boring regular numeric value. I encountered two problems:
Problem 1: Invalid Range Query Error
What seems to us like an ordinary, simple query turned out to be impossible for Microsoft:
Azure Blob Tag Search Expression:
"price" > '400' AND "price" < '1000'
By contrast, here is an example of a range that is not invalid:
Azure Blob Tag Search Expression:
"price" > '200' AND "price" < '800'
Problem 2: Range Query Returns Unexpected Results
As the numeric range of ‘price’ values started to increase when tagging blobs in my use case, I noticed that valid numeric range queries started to return incorrect results. For example, I was querying for a price between 2500 and 4100 and the query returned numbers that were way out of range, such as 260 and 310. Querying blobs tagged with a numeric value, without applying any kind of strategy, simply returns unexpected results.
Lexicographical Ordering
First, let me try to explain why this turns out to be such an unexpected ordeal. After that, I will propose a fully validated hack that allows you to accurately find blobs tagged with numbers, decimals included.
In the Dutch language, we say: a warned man counts as two. In English, one tells a warned man afterward: “I told you so”. And so yes, I cannot be angry with Microsoft as they told me so:
“Be familiar with lexicographical ordering when setting and querying on tags.” - Microsoft Docs
Allow me to quickly illustrate how this lexicographical ordering works with numbers:
Numbers before, in a regular ascending sequence:
The same sequence of numbers after arranging them lexicographically
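The two sequences can be reproduced with a few lines of Python. The numbers below are illustrative values chosen for this sketch, not the original figures. Sorting them as integers gives the regular order, while sorting their string forms gives the lexicographic order that tag comparisons follow.

```python
# Illustrative prices (hypothetical sample values).
prices = [9, 80, 260, 310, 400, 1000, 2500, 4100]

# Regular ascending order: compare as numbers.
numeric_order = sorted(prices)

# Lexicographic order: compare character by character, as tag values are.
lexicographic_order = sorted(str(p) for p in prices)

print(numeric_order)
# [9, 80, 260, 310, 400, 1000, 2500, 4100]
print(lexicographic_order)
# ['1000', '2500', '260', '310', '400', '4100', '80', '9']
```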
Now, it becomes apparent why my extremely complex query did not work. In the world of lexicographic ordering, 400 does not occur before 1000. At this point, I realized the invalid range query error was not a wildly strange phenomenon, but a completely expected outcome when trying to find blobs by numeric tags.
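Both problems can be reproduced with plain string comparison. The sketch below assumes, as the behaviour described earlier suggests, that a range is rejected when its lower bound sorts after its upper bound. The function names are hypothetical and nothing here calls Azure.

```python
def is_valid_range(low: str, high: str) -> bool:
    """A range is only meaningful if low sorts before high as text."""
    return low < high


def matches(value: str, low: str, high: str) -> bool:
    """Mimic "price" > 'low' AND "price" < 'high' using string comparison."""
    return low < value < high


# Problem 1: '400' sorts after '1000', so nothing can lie between them.
print(is_valid_range("400", "1000"))  # False
print(is_valid_range("200", "800"))   # True

# Problem 2: 260 and 310 fall inside the textual range 2500..4100.
tagged = ["260", "310", "3000", "5000"]
print([v for v in tagged if matches(v, "2500", "4100")])
# ['260', '310', '3000']
```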
Solution: ‘Work-with-Dates-Hack’
As crazy as it may sound, tagging blobs with dates works exceptionally well. Just make sure you tag the dates in the right text format: 2022-12-31 20:45:12Z. When you want to find the one blob tagged with this date, I can guarantee you the following tag query will be everything you need:
"uploadTime" > '2022-12-31 20:45:11Z' AND "uploadTime" < '2022-12-31 20:45:13Z'
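Timestamps in this format behave well because every field is zero-padded to a fixed width, so character-by-character order coincides with chronological order. A small sketch with plain string comparison (hypothetical sample timestamps, no Azure call) illustrates this:

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%SZ"  # same layout as 2022-12-31 20:45:12Z

stamps = [
    "2022-12-31 20:45:12Z",
    "2022-01-05 09:00:00Z",
    "2021-11-30 23:59:59Z",
    "2022-12-31 20:45:13Z",
]

# Sorting the strings gives the same order as sorting the parsed dates.
by_text = sorted(stamps)
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))
print(by_text == by_time)  # True

# The range from the query above selects exactly one timestamp.
low, high = "2022-12-31 20:45:11Z", "2022-12-31 20:45:13Z"
selected = [s for s in stamps if low < s < high]
print(selected)  # ['2022-12-31 20:45:12Z']
```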
So, what is the hack I propose here? Simply convert your numeric value to a date-time value, and query away. As simple and strange as it sounds, it does work seamlessly and solved my headache.
Now, I can tag invoices with prices specified to the decimal, such as 283.26. Whenever I need to create a tag for a decimal value, I invoke a helper method. See the following example:
This illustrates three basic steps for creating an Azure Blob tag from a decimal value:
- Multiply the value by a large number, large enough that a difference of 0.01 translates to at least one second. A .NET tick is 100 nanoseconds, so one second is 10,000,000 ticks and a factor of 1,000,000,000 is the smallest that qualifies.
- Construct an instance of DateTime with this multiplied decimal value. This multiplied value represents the total number of ticks.
- Format the date time to a string value according to the format in the example above.
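The three steps can be sketched in Python as well. This is an analogue under stated assumptions, not the original .NET helper: Python's datetime has no ticks, so the sketch scales the value so that 0.01 equals one whole second and counts from year 1, which is also where .NET ticks start. The function name and the scale constant are hypothetical.

```python
from datetime import datetime, timedelta
from decimal import Decimal

# Hypothetical scale: one cent (0.01) maps to exactly one second.
UNITS_PER_SECOND = 100


def decimal_to_tag(value) -> str:
    """Encode a non-negative price with up to two decimals as a date-time tag."""
    # Step 1: multiply, using Decimal to avoid binary floating-point drift.
    scaled = Decimal(str(value)) * UNITS_PER_SECOND
    if scaled < 0 or scaled != scaled.to_integral_value():
        raise ValueError("value must be non-negative with at most two decimals")
    # Step 2: build a date-time offset from 0001-01-01 00:00:00.
    moment = datetime.min + timedelta(seconds=int(scaled))
    # Step 3: format as fixed-width text; isoformat zero-pads the year.
    return moment.isoformat(sep=" ", timespec="seconds") + "Z"


print(decimal_to_tag("283.26"))  # 0001-01-01 07:52:06Z
print(decimal_to_tag(400) < decimal_to_tag(1000))  # True
```

With this encoding, the range 400 to 1000 that failed earlier becomes a range between two date-time strings whose lower bound sorts first. Query bounds need to be encoded with the same function as the tags.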
Job done. Have fun hacking numbers with Azure Blob tags.