EEP1: fix queries
| Date | Change |
|---|---|
| 2013-04-17 | Big revision based on thoughts from Erik and Rob |
| 2013-03-18 | Draft |
- It's hard to compose and operate on queries as units.
- It's impossible to build readable and intuitive bool queries with 'must', 'should', and 'must not' clauses.
The current implementation of query behaves unintuitively with bool queries. What happens when we chain two .query() calls? They get put together into a BoolQuery 'must'. See http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html
- Because we pick the query type through field actions passed as kwargs, there's no way to pass arguments to the queries themselves.
I want to do a match query on a bunch of fields and some startswith queries and they should be grouped in a 'should' bool query.
querytext = 'some text the user wrote'
qs = S()
queries = {}
for fieldname in ['title', 'summary', 'description']:
    queries['{}__match'.format(fieldname)] = querytext
queries['creatorname__startswith'] = querytext
qs = qs.query(or_=queries)
Both Kitsune and Zamboni do this but have far more than 3 fields involved.
The good thing here is that I can incrementally compose things and it fits together, but the bad thing is that I've broken out of the metaphor with a different syntax which doesn't look particularly nice.
Also, in terms of ES, wtf does or_ mean here? If I read the code, it translates to "should", but that's:
- not obvious, and
- not quite the same thing, since and/or/not imply boolean algebra and must/should/must-not are not boolean algebra
I think that's both confusing and misleading.
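To make the difference concrete, here is a rough model of bool query matching. It assumes the documented defaults with minimum_should_match left unset, and it ignores scoring entirely. The function names and the set-of-clause-names representation are invented for this sketch.

```python
def bool_matches(satisfied, must=(), should=()):
    """Return True if a document matches a bool query.

    `satisfied` is the set of clause names the document satisfies.
    This is a simplified model, not Elasticsearch code.
    """
    # Every must clause is required.
    if not all(clause in satisfied for clause in must):
        return False
    # With no must clause, at least one should clause is required.
    if should and not must:
        return any(clause in satisfied for clause in should)
    # Otherwise should clauses are optional and only affect scoring.
    return True


def or_matches(satisfied, clauses):
    """Plain boolean OR over the same clauses, for comparison."""
    return any(clause in satisfied for clause in clauses)


# A document that satisfies clause 'a' but neither 'b' nor 'c'.
doc = {'a'}
only_should = bool_matches(doc, should=['b', 'c'])
with_must = bool_matches(doc, must=['a'], should=['b', 'c'])
as_or = or_matches(doc, ['b', 'c'])
```

In this model `only_should` and `as_or` are both False, but `with_must` is True: once a must clause is present, the should clauses stop behaving like an OR.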
I want to do a query that elasticutils doesn't have an action for.
You effectively can't! Haha!
I want to do a match query and change the default behavior by passing fuzziness and operator values.
You can't! Haha!
I think the first thing I want to do is add a Q class. Then we can more easily operate on query units.
For example:
# 'something' stands in for whatever condition applies
if something:
    q = Q(foo__match='bar')
else:
    q = Q(foo__match='bar', author__startswith='all')
s = S().query(q)
We could support __add__ allowing you to join Q classes cumulatively:
q = Q(foo__match='bar')
q += Q(author__startswith='all')
This would be the same as:
q = Q(foo__match='bar', author__startswith='all')
Unlike F, we won't support |, ~ and & because queries don't
involve those kinds of operations whereas filters do.
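A minimal sketch of what such a Q class could look like, assuming it only needs to hold the keyword arguments and support `+`. This is not the ElasticUtils implementation; the `queries` attribute and the "right-hand side wins on a key clash" rule are assumptions made for illustration.

```python
class Q(object):
    """Hypothetical query unit that holds field-action kwargs."""

    def __init__(self, **queries):
        # Keys look like 'field__action'; values are the query text.
        self.queries = dict(queries)

    def __add__(self, other):
        # Return a new Q holding the queries of both operands. On a key
        # clash the right-hand operand wins (an assumption of this sketch).
        combined = dict(self.queries)
        combined.update(other.queries)
        return Q(**combined)

    def __eq__(self, other):
        return isinstance(other, Q) and self.queries == other.queries

    def __repr__(self):
        args = ', '.join('{0}={1!r}'.format(key, value)
                         for key, value in sorted(self.queries.items()))
        return 'Q({0})'.format(args)


# += falls back to __add__ because no __iadd__ is defined.
q = Q(foo__match='bar')
q += Q(author__startswith='all')
```

Returning a new object from `__add__` keeps each Q usable on its own after it has been combined with another one.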
Add a "special flag" to query and Q that specifies how the queries are connected.
For example:
q = Q(foo__match='bar', author__startswith='all', should=True)
will create a bool query with a should clause that has the match and prefix queries.
Example:
s = S().query(foo__match='bar', author__startswith='all', must=True)
will create a bool query with a must clause that has the match and prefix queries.
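One way the flag could be turned into a bool query is sketched below. `build_query`, `ACTION_TO_QUERY`, and the exact clause bodies are hypothetical and only cover the actions used in this document. The clauses are emitted as a list, since a bool clause can hold several queries of the same type.

```python
# Assumed mapping from field actions to Elasticsearch query types.
ACTION_TO_QUERY = {'match': 'match', 'startswith': 'prefix', 'term': 'term'}


def build_query(must=False, should=False, **queries):
    """Build a query dict from field-action kwargs and a connection flag."""
    if must and should:
        raise ValueError('pass either must or should, not both')

    clauses = []
    for key, value in sorted(queries.items()):
        # 'foo__match' -> ('foo', 'match'); a bare 'foo' means a term query.
        field, sep, action = key.partition('__')
        query_type = ACTION_TO_QUERY[action if sep else 'term']
        clauses.append({query_type: {field: value}})

    # A single query without the should flag needs no bool wrapper.
    if len(clauses) == 1 and not should:
        return clauses[0]

    occur = 'should' if should else 'must'
    return {'bool': {occur: clauses}}


should_query = build_query(foo__match='bar', author__startswith='all',
                           should=True)
must_query = build_query(foo__match='bar', author__startswith='all',
                         must=True)
```

Leaving both flags off behaves like must=True here, which matches the current default of joining queries in a must.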
process_filter now allows you to subclass S and add additional filter types and override the behavior of existing filter types by adding methods to your S subclass.
Similarly, we'll do the same for process_queries. This would let you add new query types that ElasticUtils doesn't support and also override the behavior of existing query types. This would let you provide additional arguments, too.
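The subclassing idea could work through name-based dispatch, roughly as sketched here. The classes `BaseS` and `FuzzyS`, the `process_query_<action>` naming, and the method signatures are all assumptions made for illustration, not the ElasticUtils API.

```python
class BaseS(object):
    """Hypothetical stand-in for S that dispatches on the field action."""

    def process_query(self, key, value):
        # 'title__match' -> field 'title', action 'match'.
        field, sep, action = key.partition('__')
        action = action if sep else 'term'
        handler = getattr(self, 'process_query_{0}'.format(action), None)
        if handler is None:
            raise ValueError('unknown query action: {0}'.format(action))
        return handler(field, value)

    def process_query_term(self, field, value):
        return {'term': {field: value}}

    def process_query_match(self, field, value):
        return {'match': {field: value}}


class FuzzyS(BaseS):
    """Subclass that overrides one query type and adds a new one."""

    def process_query_match(self, field, value):
        # Override: pass extra arguments to the match query.
        return {'match': {field: {'query': value,
                                  'operator': 'and',
                                  'fuzziness': 0.5}}}

    def process_query_regexp(self, field, value):
        # New action that the base class does not know about.
        return {'regexp': {field: value}}


default_match = BaseS().process_query('title__match', 'cookie')
fuzzy_match = FuzzyS().process_query('title__match', 'cookie')
regexp_query = FuzzyS().process_query('title__regexp', 'coo.*')
```

With this shape, adding a query type is just adding a method, and the base class never needs to know which actions exist.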
If the developer uses query_raw, then we ignore all the other query bits and just use that verbatim.
Example 1:
# This works currently
s = S().query(foo__match='bar')
'query': {
    'match': ...
}

s = S().query(foo__match='bar', another__startswith='all')
'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}

s = S().query(foo__match='bar').query(another__startswith='all')
'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}
Example 2:
# This is equivalent to example 1, but uses a Q.
q = Q(foo__match='bar')
s = S().query(q)
'query': {
    'match': ...
}
Example 3:
# .query can take any number of Qs and joins them in a must by default
s = S().query(Q(foo__match='bar'), Q(another__startswith='all'))
'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}

# Similarly
s = S().query(Q(foo__match='bar')).query(Q(another__startswith='all'))
'query': {
    'bool': {
        'must': {
            'match': ...,
            'prefix': ...
        }
    }
}
Example 4:
# .query with the should flag.
s = S().query(Q(foo__match='bar'), Q(another__startswith='all'), should=True)
'query': {
    'bool': {
        'should': {
            'match': ...,
            'prefix': ...
        }
    }
}
Example 5:
# chaining two .query with should flags
s = S().query(Q(foo__match='bar'), should=True)
s = s.query(another__startswith='all', should=True)

# Those get combined into a single should clause
'query': {
    'bool': {
        'should': {
            'match': ...,
            'prefix': ...
        }
    }
}
Example 6:
# More chaining, mixing should and must flags
s = S()
s = s.query(foo__match='bar', should=True)
s = s.query(another__startswith='all', must=True)
s = s.query(baz='bat', should=True)

# Those get combined into a single bool query
'query': {
    'bool': {
        'should': {
            'match': ...,  # foo__match
            'term': ...  # baz
        },
        'must': {
            'prefix': ...  # another__startswith
        }
    }
}
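The merging across chained calls in Example 6 could be implemented along these lines. `MiniS` is a toy stand-in invented for this sketch, not the real S: it always wraps the result in a bool query and only knows the three actions listed in `ACTION_TO_QUERY`.

```python
# Assumed mapping from field actions to Elasticsearch query types.
ACTION_TO_QUERY = {'match': 'match', 'startswith': 'prefix', 'term': 'term'}


class MiniS(object):
    """Toy stand-in for S that records each .query() call."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def query(self, should=False, must=False, **queries):
        # must is the default, so the flag is accepted only for readability.
        occur = 'should' if should else 'must'
        # Return a new object so chaining never mutates the original.
        return MiniS(self.steps + [(occur, queries)])

    def build_query(self):
        bool_body = {}
        for occur, queries in self.steps:
            for key, value in sorted(queries.items()):
                field, sep, action = key.partition('__')
                query_type = ACTION_TO_QUERY[action if sep else 'term']
                # Clauses with the same flag land in the same list.
                bool_body.setdefault(occur, []).append(
                    {query_type: {field: value}})
        return {'bool': bool_body}


s = MiniS()
s = s.query(foo__match='bar', should=True)
s = s.query(another__startswith='all', must=True)
s = s.query(baz='bat', should=True)
result = s.build_query()
```

Here `result` has a should list holding the match and term queries and a must list holding the prefix query, in the order the calls were chained.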
Example 7:
# the example from the top re-written
querytext = 'some text the user wrote'
s = S()
queries = {}
for fieldname in ['title', 'summary', 'description']:
    queries['{}__match'.format(fieldname)] = querytext
queries['creatorname__startswith'] = querytext
s = s.query(should=True, **queries)

# Those get combined into a single should clause
'query': {
    'bool': {
        'should': {
            'match': ...,  # title__match
            'match': ...,  # summary__match
            'match': ...,  # description__match
            'prefix': ...  # creatorname__startswith
        }
    }
}
This supports most of the old behavior, but nixes the or_.