Performing queries in .Net with NEST

First of all, I created a document model in C# named EsOrganisation with some basic fields:

    [ElasticsearchType(Name = "organisation")]
    public class EsOrganisation
    {
        public Guid Id { get; set; }
        public DateTimeOffset CreatedDate { get; set; }
        public DateTimeOffset? UpdatedDate { get; set; }
        public int OrganisationTypeId { get; set; }
        public string OrganisationName { get; set; }
        public List<string> OrganisationAliases { get; set; }
        public List<string> OrganisationKeywords { get; set; }
        public List<int> Products { get; set; }
    }

I also created a factory that returns the Nest.ElasticClient. To keep the examples short, assume that whenever I call client.SearchAsync() the client has already been instantiated and configured.
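The factory itself is not shown in this post. A minimal sketch of what it could look like, assuming a single node at http://localhost:9200 and a default index called organisations (the class name, URL and index name are all placeholders of mine):

```csharp
using System;
using Nest;

// Hypothetical helper: class name, URL and index name are assumptions.
public static class EsClientFactory
{
    public static ElasticClient Create(string url = "http://localhost:9200",
                                       string defaultIndex = "organisations")
    {
        var settings = new ConnectionSettings(new Uri(url))
            .DefaultIndex(defaultIndex); // index used when a request doesn't name one
        return new ElasticClient(settings);
    }
}
```

Creating the client doesn't open a connection by itself, so something like var client = EsClientFactory.Create(); can run once at start-up and the instance can be reused for every search.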

Structured vs Unstructured Search

Structured and unstructured search differ in how the filters are applied. Structured search deals with data such as dates, times or numbers, which are compared against a range or an exact value: a document either matches or it doesn't, never partially. Strings can be structured too, as with post labels: either the post has the label or it doesn't. Unstructured search is about partial matches, and that's where the score comes into play to determine how relevant each match is.

Adding pagination

            var skipAmount = 20;
            var takeAmount = 10;
            var q1 = await client.SearchAsync<EsOrganisation>(s => s
                    .From(skipAmount)
                    .Size(takeAmount)
            );
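From() takes a zero-based offset, so page-based navigation needs a small conversion first. A sketch of that arithmetic, where the Paging class and its one-based page parameter are my own naming and not part of NEST:

```csharp
using System;

// Hypothetical helper (not part of NEST): converts a one-based page number
// into the offset expected by From().
public static class Paging
{
    public static int SkipFor(int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (page - 1) * pageSize;
    }
}
```

With a page size of 10, page 3 gives Paging.SkipFor(3, 10) == 20, which is the skipAmount used above. The documents of that page are then available in q1.Documents and the overall number of hits in q1.Total.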

Filtering by integer fields

            // Search for documents that have a certain productId
            var q2 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Term(c => c.Field(p => p.Products).Value(3)))
            );

            // Search for documents included in an array of productIds (1,2,3,4)
            var q3 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(1, 2, 3, 4)))
            );

            // or
            var myList = new List<int>() {1, 2, 3, 4};
            var q4 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
                    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(myList)))
            );

Filtering by dates

            // Date range: year 2017
            var d1 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.DateRange(r => r
                            .Field(f => f.CreatedDate)
                            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                            .LessThan(new DateTime(2018, 01, 01))
                    ))
            );
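Ranges don't have to be fixed dates. As a sketch of a relative range, the helper below (DateFilters is a name I made up, and it relies on the EsOrganisation model from the top of this post) uses an Elasticsearch date math string, which NEST converts implicitly to its DateMath type:

```csharp
using Nest;

// Hypothetical helper: assumes the EsOrganisation model defined earlier.
public static class DateFilters
{
    // Organisations created in the last 30 days, rounded down to the start of the day.
    public static QueryContainer CreatedInLast30Days() =>
        new QueryContainerDescriptor<EsOrganisation>()
            .DateRange(r => r
                .Field(f => f.CreatedDate)
                .GreaterThanOrEquals("now-30d/d")); // string -> Nest.DateMath
}
```

It can be passed to a search like any other extracted query: client.SearchAsync<EsOrganisation>(sr => sr.Query(q => DateFilters.CreatedInLast30Days())).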

More on date queries.

Filtering strings – Unstructured queries

Unstructured queries allow partial matches, and how well a document matches counts towards its score, which decides the ranking. Match(), Prefix() and MatchPhrasePrefix() are all unstructured queries.

            // Match whole words: a document matches if it contains one or more of the searched words
            var t1 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.Match(m => m.Field(f => f.OrganisationName)
                            .Query("one two three")))
            );
            
            // Starts with: accepts a single term only, doesn't work if supplied with more than one word
            var t3 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.Prefix(m => m.Field(f => f.OrganisationName)
                        .Value("one")
                        //.Value("one two") <- doesn't work
                        ))
            );

            // Phrase match: the words must appear in this order, the last one is treated as a prefix
            var t4 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .Query("one two thr")))
            );

            // With Slop() the words may be separated or reordered, up to the given number of position moves
            var t5 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .Slop(5)
                        .Query("three one two")))
            );

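            // MaxExpansions() limits how many terms the trailing prefix may expand to (default 50).
            // It is not the same as Size() and doesn't cap the number of hits, but a lower value can make the query cheaper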
            var t6 = await client.SearchAsync<EsOrganisation>(s => s
                    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
                        .MaxExpansions(takeAmount)
                        .Query("one two three")))
            );

Boolean queries

Boolean queries are compound queries that combine more than one criterion using AND, OR and NOT operators.

When creating Boolean queries we can add filters to them. A filter is essentially the same as a Must() query, except that it doesn't contribute to the score, which makes the score calculation quicker and the search less resource-hungry. So try to put structured conditions into a filter and unstructured ones into a Must(), where a score can be calculated.

Operators
&&: AND
||: OR
!: NOT
+: filter. Marks the criterion as a filter (it is not taken into account when calculating the score)

Using ANDs and ORs inside a Query with operators:

            var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => +q.Terms(c => c.Field(p => p.Products).Terms(products)) && (
                                q.Match(m => m.Field(f => f.OrganisationName).Query(query)) ||
                                q.Match(m => m.Field(f => f.OrganisationAliases).Query(query))) && 
                                !q.Match(m => m.Field(f => f.OrganisationKeywords).Query(query))
                    )
            );

Extracting the search filters

Extracting the filters into variables is useful when you want to reuse them or build a query dynamically:

            var productFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Terms(c => c.Field(p => p.Products).Terms(products));
            var matchNameFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationName).Query(query));
            var matchAliasFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationAliases).Query(query));
            var matchKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
                  .Match(m => m.Field(f => f.OrganisationKeywords).Query(query));

            var s3 = await client.SearchAsync<EsOrganisation>(sr => sr
                  .Query(q => +productFilter && (matchNameFilter || matchAliasFilter) && !matchKeywordFilter)
            );
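As for building a query dynamically, one possible approach is to start from an empty QueryContainer and AND criteria onto it only when the caller supplied them. This is a sketch rather than a recipe: OrganisationQueries and Build are hypothetical names, and the code relies on the EsOrganisation model from the top of this post.

```csharp
using System.Collections.Generic;
using Nest;

// Hypothetical helper: adds each criterion only when it was supplied.
public static class OrganisationQueries
{
    public static QueryContainer Build(List<int> products, string text)
    {
        var q = new QueryContainerDescriptor<EsOrganisation>();
        var combined = new QueryContainer(); // empty, contributes nothing by itself

        if (products != null && products.Count > 0)
            combined &= +q.Terms(c => c.Field(p => p.Products).Terms(products)); // "+" = filter, no scoring

        if (!string.IsNullOrWhiteSpace(text))
            combined &= q.Match(m => m.Field(f => f.OrganisationName).Query(text))
                     || q.Match(m => m.Field(f => f.OrganisationAliases).Query(text));

        return combined;
    }
}
```

The result is used like the extracted filters above: client.SearchAsync<EsOrganisation>(sr => sr.Query(q => OrganisationQueries.Build(products, query))). When neither criterion is supplied the container stays empty, which as far as I can tell NEST treats as "no query", so every document is returned.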

Using the Bool method

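Warning: Still finding out how it works, work-in-progress.
Note: productFilter is the one extracted in “Extracting the search filters”. The phrasePrefix* filters are their MatchPhrasePrefix() counterparts (see phrasePrefixKeywordFilter in “Boosting a field”).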

            // This works, productFilter doesn't affect score but filters
            var b2 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
                        .Filter(productFilter)
                        ))
            );

            // All three are required as a MUST
            var b3 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        // This works as an AND
                        .Must(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        ))
            );

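            // Should clauses are non-exclusive: each one that matches adds to the score.
            // With only Should clauses at least one has to match by default; next to a Must or Filter
            // the default drops to 0 and everything is included, so set MinimumShouldMatch explicitly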
            var b4 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1) //match at least one, then sort by relevancy
                        ))
            );
            // If we add a filter it filters without affecting score
            var b4B = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1)
                        .Filter(productFilter)
                        ))
            );

            // Works as ORs
            var listOfFilters = new QueryContainer[] {phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter};
            var boolQuery1 = new BoolQuery {
                Name = "boolQuery",
                Should = listOfFilters,
                MinimumShouldMatch = 1,
                Filter = new QueryContainer [] { productFilter }
            };
            var b8 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => boolQuery1));

More about boolean queries.

Queries that won’t work

Just some query attempts that won’t work, useful to know what you can’t do:

            // Attempting an OR/AND between queries -> Fails
            var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
                .Query(q => productFilter) // WARNING: This one is overridden by the second, DON'T DO THIS
                .Query(q => phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
            );

            // Attempting an OR between Fields -> Fails
            var phrasePrefixInAllFields = new QueryContainerDescriptor<EsOrganisation>()
                .MatchPhrasePrefix(m => m
                    .Field(f => f.OrganisationName)
                    .Field(f => f.OrganisationAliases)
                    // Again, this last Field method overrides the two previous ones, so THIS CAN'T BE DONE
                    .Field(f => f.OrganisationKeywords)
                    .Slop(2)
                    .Query(query)
            );
            var s6 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => phrasePrefixInAllFields));

            // WARNING: This won't work, second Must overrides the first!!
            var b1 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Must(productFilter)
                        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
                        ))
            );

            // This doesn't work as last Should overrides previous ones
            var b6 = await client.SearchAsync<EsOrganisation>(sr => sr
                    .Query(q => q.Bool(b => b
                        .Should(phrasePrefixNameFilter)
                        .Should(phrasePrefixAliasFilter)
                        .Should(phrasePrefixKeywordFilter)
                        .MinimumShouldMatch(1)
                        .Filter(productFilter)
                        ))
            );
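For two of the failed attempts above there are alternatives that should give the intended behaviour. The sketch below is untested against a live cluster and uses names of my own (WorkingAlternatives and its two methods) on top of the EsOrganisation model from the top of this post. It searches several fields at once with MultiMatch() of type phrase prefix, and it passes every required clause to a single Must() call instead of chaining Must() twice.

```csharp
using System.Collections.Generic;
using Nest;

// Hypothetical helpers: assume the EsOrganisation model defined earlier.
public static class WorkingAlternatives
{
    // Phrase-prefix search across several fields in one query.
    public static QueryContainer PhrasePrefixInAllFields(string query) =>
        new QueryContainerDescriptor<EsOrganisation>()
            .MultiMatch(m => m
                .Fields(f => f
                    .Field(p => p.OrganisationName)
                    .Field(p => p.OrganisationAliases)
                    .Field(p => p.OrganisationKeywords))
                .Type(TextQueryType.PhrasePrefix)
                .Slop(2)
                .Query(query));

    // Several required clauses: hand them all to ONE Must() call.
    public static QueryContainer ProductsAndText(List<int> products, string query)
    {
        var q = new QueryContainerDescriptor<EsOrganisation>();
        return q.Bool(b => b
            .Must(
                q.Terms(c => c.Field(p => p.Products).Terms(products)),
                PhrasePrefixInAllFields(query)));
    }
}
```

Usage follows the same pattern as the other extracted queries: client.SearchAsync<EsOrganisation>(sr => sr.Query(q => WorkingAlternatives.ProductsAndText(products, query))). Since the product condition is structured, moving it from Must() to Filter() would keep it out of the score.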

Boosting a field

When performing unstructured queries we can decide which fields are more relevant than others: use Boost() to multiply the score contribution of a match on that field.

            var phrasePrefixKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
                .MatchPhrasePrefix(m => m
                .Boost(3) // make this field three times more important when calculating score
                .Field(f => f.OrganisationKeywords)
                .Slop(2)
                .Query(query)
                );

