I came across the following weird behaviour in an ISABOUT query in SQL SERVER, that involves weight keyword and the final rank of the results. I want to describe that behaviour here, just in case some has a good explanation over it!
This posts assumes some basic knowledge of querying with full text search
The following bullets, are the sum up of the behaviour. Notice how the results are reversed as the weight value goes down!
- weight(1): RANK of KEY 1 is 249
(results order 1,2,3) - weight(0.8): RANK of KEY 1 is 321
(weight down => rank up, results order 1,2,3) - weight(0.2): RANK of KEY 1 is 998
(weight down => rank up, results order 1,2,3) - weight(0.17): RANK of KEY 1 is 802
(weight down => rank down, results order 2,3,1) - weight(0.16): RANK of KEY 1 is 935
(weight down => rank up, results order 2,1,3) - weight(0.01): RANK of KEY 1 is 50
(weight down => rank down, results order 3,2,1)
As you can see, from 0.2 to 0.17 ranking decreases and results are messed up! From 0.16 results are inverted (the weight values that reproduce this behaviour depend on terms, columns searched, etc).
Microsoft states here that the actual value of RANK is meaningless, but I am sure the results order isn’t!
Reproducing the behaviour
These are the exact queries that I used to reproduce this behaviour:
QUERY 1 (weight 1): (Initial ranking)
Results
KEY RANK 1 249 2 156 3 114
QUERY 2 (weight 0.8): (Ranking increases, initial order is preserved)
Results
KEY RANK 1 321 2 201 3 146
QUERY 3 (weight 0.2): (Ranking increases, initial order is preserved)
Results
KEY RANK 1 998 2 877 3 692
QUERY 4 (weight 0.17): (Ranking decreases, best match is now last, inverted behavior for these terms begin at 0.17)
Results
KEY RANK 2 960 3 958 1 802
QUERY 5 (weight 0.16): (Ranking increases, best match is now second)
Results
KEY RANK 2 978 1 935 3 841
QUERY 6 (weight 0.01): (Ranking decreases, best match is last again)
Results
KEY RANK 3 105 2 77 1 50
This of course causes major problems when you use a custom “word-breaker”, creating something like this:
But for now, and until a better solution is found, I changed the algorithm of the custom word-breaker to always use weights above 0.2!