willkg edited this page Apr 17, 2013 · 5 revisions

EEP1: Fixing queries

Status

2013-04-17: Big revision based on thoughts from Erik and Rob
2013-03-18: Draft

Summary

  1. It's hard to compose and operate on queries as units.

  2. It's impossible to build readable and intuitive bool queries that use 'must', 'should', and 'must_not' clauses.

    The current implementation of query behaves unintuitively with regard to bool queries. For example, if you chain two .query() calls, it puts them together into a bool query 'must' clause, which isn't obvious from the code.

    http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html

  3. Because we specify which query to use via field actions in kwargs, it's impossible to pass arguments to queries.

Problem examples

Example 1

I want to do a match query on a bunch of fields plus some startswith queries, all grouped in a 'should' bool query.

querytext = 'some text the user wrote'

qs = S()

queries = {}
for fieldname in ['title', 'summary', 'description']:
    queries['{}__match'.format(fieldname)] = querytext

queries['creatorname__startswith'] = querytext

qs = qs.query(or_=queries)

Both Kitsune and Zamboni do this but have far more than 3 fields involved.

The good thing here is that I can incrementally compose things and they fit together. The bad thing is that I've broken out of the metaphor with a different syntax that doesn't look particularly nice.

Also, in terms of ES, wtf does or_ mean here? If I read the code, it translates to "should", but that's:

  1. not obvious, and
  2. not quite the same thing, since and/or/not imply boolean algebra and must/should/must_not are not boolean algebra

I think that's both confusing and misleading.

Example 2

I want to do a query that elasticutils doesn't have an action for.

You effectively can't! Haha!

Example 3

I want to do a match query and change the default behavior by passing fuzziness and operator values.

You can't! Haha!

Talk about possibilities

Add a Q class and do bool query things

I think the first thing I want to do is add a Q class. Then we can more easily operate on query units.

For example:

if some_thing:  # placeholder for whatever condition applies
    q = Q(foo__match='bar')
else:
    q = Q(foo__match='bar', author__startswith='all')
s = S().query(q)

We could support __add__, allowing you to join Q objects cumulatively:

q = Q(foo__match='bar')
q += Q(author__startswith='all')

This would be the same as:

q = Q(foo__match='bar', author__startswith='all')

Unlike F, we won't support |, ~ and & because queries don't involve those kinds of operations whereas filters do.
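As a minimal sketch of what such a Q could look like (this is a hypothetical stand-in, not the ElasticUtils implementation; the `queries` attribute and the sort-by-key ordering are assumptions made so the example is deterministic):

```python
class Q(object):
    """Hypothetical query unit holding (key, value) pairs."""

    def __init__(self, **queries):
        # Sort by key so two Q objects with the same kwargs compare equal.
        self.queries = sorted(queries.items(), key=lambda item: item[0])

    def __add__(self, other):
        # Joining two Q objects yields a new Q holding the clauses of both.
        if not isinstance(other, Q):
            return NotImplemented
        combined = Q()
        combined.queries = sorted(self.queries + other.queries,
                                  key=lambda item: item[0])
        return combined

    def __eq__(self, other):
        return isinstance(other, Q) and self.queries == other.queries

    def __repr__(self):
        args = ', '.join('{}={!r}'.format(k, v) for k, v in self.queries)
        return 'Q({})'.format(args)


q = Q(foo__match='bar')
q += Q(author__startswith='all')  # falls back to __add__
```

Since no `__iadd__` is defined, `+=` builds a new Q rather than mutating the old one, which keeps Q objects safe to share between S instances.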

Add should, must and must_not special flags

Add a "special flag" to query and Q that specifies how the queries are connected.

For example:

q = Q(foo__match='bar', author__startswith='all', should=True)

will create a bool query with a should clause that has the match and prefix queries.

Example:

s = S().query(foo__match='bar', author__startswith='all', must=True)

will create a bool query with a must clause that has the match and prefix queries.
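To make the flag idea concrete, here is a rough sketch of how the flags could map to bool clauses. The `build_query` and `to_clause` helpers and the `ACTION_TO_QUERY` table are hypothetical names for illustration only. The clauses go in a list because a Python dict can't repeat keys, and Elasticsearch accepts a list of clauses under must/should/must_not:

```python
# Assumption: only a small subset of field actions, for illustration.
ACTION_TO_QUERY = {'match': 'match', 'startswith': 'prefix'}


def to_clause(key, value):
    """Turn 'field__action' into a single Elasticsearch query clause."""
    field, _, action = key.partition('__')
    query_type = ACTION_TO_QUERY[action] if action else 'term'
    return {query_type: {field: value}}


def build_query(should=False, must=False, must_not=False, **queries):
    """Hypothetical helper: group the queries under the flagged clause."""
    if sum([should, must, must_not]) > 1:
        raise ValueError('pass at most one of should, must, must_not')
    clauses = [to_clause(k, v) for k, v in sorted(queries.items())]
    if should:
        occur = 'should'
    elif must_not:
        occur = 'must_not'
    else:
        occur = 'must'  # no flag behaves like must=True
    if occur == 'must' and len(clauses) == 1:
        return clauses[0]  # a lone query needs no bool wrapper
    return {'bool': {occur: clauses}}


body = build_query(foo__match='bar', author__startswith='all', should=True)
```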

Reimplement process_query like process_filter

process_filter now allows you to subclass S and add additional filter types and override the behavior of existing filter types by adding methods to your S subclass.

We'll do the same for process_query. This would let you add new query types that ElasticUtils doesn't support, override the behavior of existing query types, and provide additional arguments.
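A sketch of the dispatch idea, using a toy S rather than the real one. The `process_query_<action>` method names and their `(field, value, action)` signature are assumptions for illustration, and `MyS` is a made-up subclass:

```python
class S(object):
    """Toy stand-in for S; only the per-action dispatch is sketched."""

    def process_query(self, key, value):
        field, _, action = key.partition('__')
        action = action or 'term'
        # Look for a method named after the action, e.g. process_query_match.
        handler = getattr(self, 'process_query_' + action, None)
        if handler is None:
            raise ValueError('unknown query action: {}'.format(action))
        return handler(field, value, action)

    def process_query_term(self, field, value, action):
        return {'term': {field: value}}

    def process_query_match(self, field, value, action):
        return {'match': {field: value}}


class MyS(S):
    """Hypothetical subclass that overrides one action and adds another."""

    def process_query_match(self, field, value, action):
        # Override: pass an extra argument to the match query.
        return {'match': {field: {'query': value, 'operator': 'and'}}}

    def process_query_wildcard(self, field, value, action):
        # New query type the base class does not know about.
        return {'wildcard': {field: value}}


clause = MyS().process_query('foo__match', 'bar')
```

Other match arguments such as fuzziness could be supplied the same way in the overriding method.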

Add a query_raw

If the developer uses query_raw, then we ignore all the other query bits and use the raw query verbatim.
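Roughly, that could behave like the following toy sketch. The `build_search` method, the `raw` and `clauses` attributes, and the term-only `query` are simplifications assumed for this example, not the real S:

```python
class S(object):
    """Toy S that only tracks term queries and an optional raw query."""

    def __init__(self):
        self.clauses = []
        self.raw = None

    def _clone(self):
        new = S()
        new.clauses = list(self.clauses)
        new.raw = self.raw
        return new

    def query(self, **queries):
        new = self._clone()
        for field, value in sorted(queries.items()):
            new.clauses.append({'term': {field: value}})
        return new

    def query_raw(self, raw):
        new = self._clone()
        new.raw = raw
        return new

    def build_search(self):
        if self.raw is not None:
            # The raw query wins; everything added via .query() is ignored.
            return {'query': self.raw}
        if not self.clauses:
            return {}
        if len(self.clauses) == 1:
            return {'query': self.clauses[0]}
        return {'query': {'bool': {'must': self.clauses}}}


raw = {'match': {'title': {'query': 'some text', 'operator': 'and'}}}
search = S().query(foo='bar').query_raw(raw).build_search()
```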

Some examples

Example 1:

# This works currently
s = S().query(foo__match='bar')

'query': {
    'match': ...
}


s = S().query(foo__match='bar', another__startswith='all')

'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}


s = S().query(foo__match='bar').query(another__startswith='all')

'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}

Example 2:

# This is equivalent to example 1, but uses a Q.
q = Q(foo__match='bar')
s = S().query(q)

'query': {
    'match': ...
}

Example 3:

# .query can take any number of Qs and joins them in a must by default
s = S().query(Q(foo__match='bar'), Q(another__startswith='all'))

'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}

# Similarly

s = S().query(Q(foo__match='bar')).query(Q(another__startswith='all'))

'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}

Example 4:

# .query with the should flag.
s = S().query(Q(foo__match='bar'), Q(another__startswith='all'), should=True)

'query': {
    'bool': {
        'should': {
            'match': ...,
            'prefix': ...
        }
    }
}

Example 5:

# chaining two .query with should flags
s = S().query(Q(foo__match='bar'), should=True)
s = s.query(another__startswith='all', should=True)


# Those get combined into a single should clause
'query': {
    'bool': {
        'should': {
            'match': ...,
            'prefix': ...
        }
    }
}

Example 6:

# More chaining and flags

# chaining three .query calls with a mix of should and must flags
s = S()
s = s.query(foo__match='bar', should=True)
s = s.query(another__startswith='all', must=True)
s = s.query(baz='bat', should=True)

# Those get combined into a single bool query
'query': {
    'bool': {
        'should': {
            'match': ...,  # foo__match
            'term': ...    # baz
        },
        'must': {
            'prefix': ...  # another__startswith
        }
    }
}
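Here is one way the chained calls above could be accumulated into a single bool query. This is a sketch with a toy S. The `clauses` attribute, the `build_query` method and the `ACTIONS` table are assumptions for illustration, and clauses are kept in lists since a dict can't repeat keys:

```python
import copy

# Assumption: a small subset of field actions; '' means no action given.
ACTIONS = {'': 'term', 'match': 'match', 'startswith': 'prefix'}


class S(object):
    """Toy S that groups chained queries by must/should/must_not."""

    def __init__(self):
        self.clauses = {'must': [], 'should': [], 'must_not': []}

    def query(self, must=False, should=False, must_not=False, **queries):
        occur = 'should' if should else 'must_not' if must_not else 'must'
        new = S()
        new.clauses = copy.deepcopy(self.clauses)
        for key, value in sorted(queries.items()):
            field, _, action = key.partition('__')
            new.clauses[occur].append({ACTIONS[action]: {field: value}})
        return new

    def build_query(self):
        # Drop the clause types that were never used.
        used = {occur: cl for occur, cl in self.clauses.items() if cl}
        if not used:
            return {}
        if list(used) == ['must'] and len(used['must']) == 1:
            return used['must'][0]  # a lone query needs no bool wrapper
        return {'bool': used}


s = S()
s = s.query(foo__match='bar', should=True)
s = s.query(another__startswith='all', must=True)
s = s.query(baz='bat', should=True)
query = s.build_query()
```

Each call returns a new S, so the two should calls end up in the same should list regardless of the must call between them.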

Example 7:

# the example from the top re-written
querytext = 'some text the user wrote'

qs = S()
queries = {}

for fieldname in ['title', 'summary', 'description']:
    queries['{}__match'.format(fieldname)] = querytext

queries['creatorname__startswith'] = querytext

qs = qs.query(should=True, **queries)


# Those get combined into a single should clause
'query': {
    'bool': {
        'should': {
            'match': ...,  # title__match
            'match': ...,  # summary__match
            'match': ...,  # description__match
            'prefix': ...  # creatorname__startswith
        }
    }
}

This supports most of the old behavior, but nixes the or_.