Search

exists_query(field)

A convenience function which returns an exists query for the given field.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| field | str | the field path | required |

Returns:

| Type | Description |
| --- | --- |
| Query | an exists query on the field using the full parsed path |

Source code in splitgill/search.py
def exists_query(field: str) -> Query:
    """
    A convenience function which returns an exists query for the given field.

    :param field: the field path
    :return: an exists query on the field using the full parsed path
    """
    return Q('exists', field=parsed_path(field, parsed_type=None, full=True))

has_geo()

Create an exists query which filters for records which have geo data. Currently, this uses ALL_POINTS, but it could just as easily use ALL_SHAPES; the choice makes no difference.

Returns:

| Type | Description |
| --- | --- |
| Query | an exists Query object |

Source code in splitgill/search.py
def has_geo() -> Query:
    """
    Create an exists query which filters for records which have geo data. Currently,
    this uses ALL_POINTS, but it could just as easily use ALL_SHAPES, it doesn't matter.

    :return: an exists Query object
    """
    return Q('exists', field=DocumentField.ALL_POINTS)

id_query(record_id)

Returns a term query on the _id field in the record's data with the record_id value passed. This uses the data's _id, not the document's root ID field.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| record_id | str | the record's ID | required |

Returns:

| Type | Description |
| --- | --- |
| Query | a term query |

Source code in splitgill/search.py
def id_query(record_id: str) -> Query:
    """
    Returns a term query on the _id field in the record's data with the record_id value
    passed. This uses the data's _id not the documents ID root field.

    :param record_id: the record's ID
    :return: a term query
    """
    return term_query(DATA_ID_FIELD, record_id, ParsedType.KEYWORD)

index_specific_version_filter(indexes_and_versions)

Creates the elasticsearch-dsl Bool object necessary to query the given indexes at the given specific versions. If multiple indexes require the same version, a single terms query covering the group is created rather than a separate term query for each index. This is probably no different in terms of performance, but it does keep the size of the query down when large numbers of indexes are queried.

If all indexes require the same version, a single term query is returned (using version_query) which has no index filtering in it at all.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| indexes_and_versions | Dict[str, int] | a dict of index names -> versions | required |

Returns:

| Type | Description |
| --- | --- |
| Query | an elasticsearch-dsl Query object |

Source code in splitgill/search.py
def index_specific_version_filter(indexes_and_versions: Dict[str, int]) -> Query:
    """
    Creates the elasticsearch-dsl Bool object necessary to query the given indexes at
    the given specific versions. If multiple indexes require the same version, a single
    terms query covering the group is created rather than a separate term query for
    each index. This is probably no different in terms of performance, but it does keep
    the size of the query down when large numbers of indexes are queried.

    If all indexes require the same version, a single term query is returned (using
    version_query) which has no index filtering in it at all.

    :param indexes_and_versions: a dict of index names -> versions
    :return: an elasticsearch-dsl Query object
    """
    # flip the dict we've been given to group by the version
    by_version = defaultdict(list)
    for index, version in indexes_and_versions.items():
        by_version[version].append(index)

    if len(by_version) == 1:
        # there's only one version, just use it in a single meta.versions check with no
        # indexes
        return version_query(next(iter(by_version.keys())))
    else:
        filters = []
        for version, indexes in by_version.items():
            version_filter = version_query(version)
            if len(indexes) == 1:
                # there's only one index requiring this version so use a term query
                filters.append(
                    Bool(filter=[Q('term', _index=indexes[0]), version_filter])
                )
            else:
                # there are a few indexes using this version, query them using terms as
                # a group
                filters.append(
                    Bool(filter=[Q('terms', _index=indexes), version_filter])
                )
        return Bool(should=filters, minimum_should_match=1)
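The grouping logic above may be easier to follow as plain dictionaries. The following is a minimal sketch, not the library's implementation: it builds raw query bodies instead of elasticsearch-dsl objects, and `VERSIONS_FIELD` and `version_filter_sketch` are hypothetical names (the field name is only an assumption taken from the `meta.versions` comment in the source).

```python
from collections import defaultdict
from typing import Dict

# assumed field name for this sketch only; the real code uses
# DocumentField.VERSIONS via version_query
VERSIONS_FIELD = "meta.versions"


def version_filter_sketch(indexes_and_versions: Dict[str, int]) -> dict:
    # flip the mapping so that indexes are grouped by the version they need
    by_version = defaultdict(list)
    for index, version in indexes_and_versions.items():
        by_version[version].append(index)

    if len(by_version) == 1:
        # every index wants the same version: no index filtering is needed
        return {"term": {VERSIONS_FIELD: next(iter(by_version))}}

    filters = []
    for version, indexes in by_version.items():
        if len(indexes) == 1:
            index_clause = {"term": {"_index": indexes[0]}}
        else:
            # one terms clause covers the whole group of indexes
            index_clause = {"terms": {"_index": indexes}}
        version_clause = {"term": {VERSIONS_FIELD: version}}
        filters.append({"bool": {"filter": [index_clause, version_clause]}})
    # a document only has to satisfy one of the per-version groups
    return {"bool": {"should": filters, "minimum_should_match": 1}}
```

With three indexes where two share a version, the sketch produces two `should` clauses: one `terms` clause for the shared version and one `term` clause for the remaining index.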

infer_parsed_type(value)

Given a value, infer the ParsedType based on the type of the value.

If no ParsedType can be matched, a ValueError is raised.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | Union[int, float, str, bool, date, datetime] | the value | required |

Returns:

| Type | Description |
| --- | --- |
| ParsedType | a ParsedType |

Source code in splitgill/search.py
def infer_parsed_type(
    value: Union[int, float, str, bool, datetime.date, datetime.datetime],
) -> ParsedType:
    """
    Given a value, infer the ParsedType based on the type of the value.

    If no ParsedType can be matched, a ValueError is raised.

    :param value: the value
    :return: a ParsedType
    """
    if isinstance(value, str):
        return ParsedType.KEYWORD
    elif isinstance(value, bool):
        return ParsedType.BOOLEAN
    elif isinstance(value, (int, float)):
        return ParsedType.NUMBER
    elif isinstance(value, (datetime.date, datetime.datetime)):
        return ParsedType.DATE
    else:
        raise ValueError(f'Unexpected type {type(value)}')
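The order of the isinstance checks matters here: in Python, bool is a subclass of int, so the boolean check has to come before the numeric one. The sketch below reproduces the same branching with a stand-in enum so that it runs without splitgill installed. `SketchParsedType` and `infer_sketch` are hypothetical names, and the enum values are placeholders rather than splitgill's real ones.

```python
import datetime
from enum import Enum


class SketchParsedType(Enum):
    # hypothetical stand-in for splitgill's ParsedType; values are placeholders
    KEYWORD = "keyword"
    BOOLEAN = "boolean"
    NUMBER = "number"
    DATE = "date"


def infer_sketch(value):
    # same check order as infer_parsed_type
    if isinstance(value, str):
        return SketchParsedType.KEYWORD
    elif isinstance(value, bool):
        # must precede the int check because bool is a subclass of int
        return SketchParsedType.BOOLEAN
    elif isinstance(value, (int, float)):
        return SketchParsedType.NUMBER
    elif isinstance(value, (datetime.date, datetime.datetime)):
        return SketchParsedType.DATE
    raise ValueError(f"Unexpected type {type(value)}")


# True is an int as far as isinstance is concerned, but is still inferred as boolean
print(isinstance(True, int), infer_sketch(True))
```

If the bool and int branches were swapped, `True` and `False` would be inferred as numbers.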

match_query(query, field=None, **match_kwargs)

Create and return a match query using the given query and the optional field name. If the field name is not specified, all text data is searched instead using the ALL_TEXT field.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | str | the query to match | required |
| field | Optional[str] | the field to query, or None if all fields should be queried | None |
| match_kwargs | | additional options for the match query | {} |

Returns:

| Type | Description |
| --- | --- |
| Query | a Query object |

Source code in splitgill/search.py
def match_query(query: str, field: Optional[str] = None, **match_kwargs) -> Query:
    """
    Create and return a match query using the given query and the optional field name.
    If the field name is not specified, all text data is searched instead using the
    ALL_TEXT field.

    :param query: the query to match
    :param field: the field to query, or None if all fields should be queried
    :param match_kwargs: additional options for the match query
    :return: a Query object
    """
    if field is None:
        path = ALL_TEXT
    else:
        path = text(field)
    return Q('match', **{path: {'query': query, **match_kwargs}})
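To illustrate the shape of the query this produces, here is a minimal sketch that builds the equivalent raw body as a plain dict. The names `ALL_TEXT_SKETCH`, `text_path_sketch`, and `match_body_sketch` are hypothetical, and the path strings they produce are placeholders: the real values come from splitgill's ALL_TEXT constant and text helper.

```python
from typing import Optional

# placeholder for the ALL_TEXT constant; the real value is defined by splitgill
ALL_TEXT_SKETCH = "all_text"


def text_path_sketch(field: str) -> str:
    # hypothetical stand-in for splitgill's text(field) helper
    return f"{field}.text"


def match_body_sketch(query: str, field: Optional[str] = None, **match_kwargs) -> dict:
    # fall back to the catch-all text field when no field is given
    path = ALL_TEXT_SKETCH if field is None else text_path_sketch(field)
    # extra keyword arguments sit alongside "query" inside the field's options
    return {"match": {path: {"query": query, **match_kwargs}}}


print(match_body_sketch("banana"))
print(match_body_sketch("banana", "name", operator="and"))
```

Any keyword arguments, such as the match query's `operator` option, are merged into the same inner dict as the query text.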

range_query(field, gte=None, lt=None, gt=None, lte=None, parsed_type=None, **range_kwargs)

Create and return a range query using the given parameters to specify the extent. At least one of the gte/lt/gt/lte parameters must be specified, otherwise a ValueError is raised. If the parsed_type parameter is not specified, it will be inferred from the first non-None parameter, checked in the order gte, lt, gt, lte.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| field | str | the field to query | required |
| gte | Union[int, float, str, date, datetime] | the greater than or equal to value | None |
| lt | Union[int, float, str, date, datetime] | the less than value | None |
| gt | Union[int, float, str, date, datetime] | the greater than value | None |
| lte | Union[int, float, str, date, datetime] | the less than or equal to value | None |
| parsed_type | Optional[ParsedType] | the parsed type of the field to use, or None to infer from value | None |
| range_kwargs | | additional options for the range query | {} |

Returns:

| Type | Description |
| --- | --- |
| Query | a Query object |

Source code in splitgill/search.py
def range_query(
    field: str,
    gte: Union[int, float, str, datetime.date, datetime.datetime] = None,
    lt: Union[int, float, str, datetime.date, datetime.datetime] = None,
    gt: Union[int, float, str, datetime.date, datetime.datetime] = None,
    lte: Union[int, float, str, datetime.date, datetime.datetime] = None,
    parsed_type: Optional[ParsedType] = None,
    **range_kwargs,
) -> Query:
    """
    Create and return a range query using the given parameters to specify the extent. At
    least one of the gte/lt/gt/lte parameters must be specified otherwise a ValueError
    is raised. If the parsed_type parameter is not specified, it will be inferred from
    the first non-None gte/lt/gt/lte parameter.

    :param field: the field to query
    :param gte: the greater than or equal to value
    :param lt: the less than value
    :param gt: the greater than value
    :param lte: the less than or equal to value
    :param parsed_type: the parsed type of the field to use, or None to infer from value
    :param range_kwargs: additional options for the range query
    :return: a Query object
    """
    range_inner = {}
    for_inference = None
    for key, value in zip(['gte', 'lt', 'gt', 'lte'], [gte, lt, gt, lte]):
        if value is None:
            continue
        if for_inference is None:
            for_inference = value
        # date is the parent class of datetime so this check is ok
        if isinstance(value, datetime.date):
            range_inner[key] = to_timestamp(value)
        else:
            range_inner[key] = value

    if not range_inner:
        raise ValueError('You must provide at least one of the lt/lte/gt/gte values')

    if parsed_type is None:
        parsed_type = infer_parsed_type(for_inference)

    range_inner.update(range_kwargs)

    return Q(
        'range', **{parsed_path(field, parsed_type=parsed_type, full=True): range_inner}
    )
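The bound collection and type inference steps can be sketched in isolation. This is an illustrative sketch only: `range_bounds_sketch` and `to_timestamp_sketch` are hypothetical names, and the timestamp conversion assumes epoch milliseconds in UTC, which may not match what splitgill's to_timestamp actually does.

```python
import datetime


def to_timestamp_sketch(value):
    # hypothetical stand-in for to_timestamp; assumes epoch milliseconds in UTC
    if not isinstance(value, datetime.datetime):
        value = datetime.datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        value = value.replace(tzinfo=datetime.timezone.utc)
    return int(value.timestamp() * 1000)


def range_bounds_sketch(gte=None, lt=None, gt=None, lte=None):
    bounds = {}
    for_inference = None
    # bounds are always visited in this fixed order, whatever order the caller used
    for key, value in zip(["gte", "lt", "gt", "lte"], [gte, lt, gt, lte]):
        if value is None:
            continue
        if for_inference is None:
            for_inference = value
        # datetime is a subclass of date, so one check covers both
        if isinstance(value, datetime.date):
            bounds[key] = to_timestamp_sketch(value)
        else:
            bounds[key] = value
    if not bounds:
        raise ValueError("You must provide at least one of the lt/lte/gt/gte values")
    return bounds, for_inference


# lt is passed first, but gte is still the value used for type inference
print(range_bounds_sketch(lt=10, gte=4))
```

Because only the first bound found is used for inference, passing parsed_type explicitly is likely the safer choice when the bounds have mixed types.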

rebuild_data(parsed_data)

Rebuild the original data from the parsed version of the data created by the parse function above.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| parsed_data | dict | the parsed dict | required |

Returns:

| Type | Description |
| --- | --- |
| dict | the rebuilt data dict |

Source code in splitgill/search.py
def rebuild_data(parsed_data: dict) -> dict:
    """
    Rebuild the original data from the parsed version of the data created by the parse
    function above.

    :param parsed_data: the parsed dict
    :return: the rebuilt data dict
    """
    # this doesn't need _ checks because you can't currently have parsed types at the
    # root level of the data dict
    return {key: rebuild_dict_or_list(value) for key, value in parsed_data.items()}

rebuild_dict_or_list(value)

Rebuild a dict or a list inside the parsed dict.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | Union[dict, list] | a dict which can either be for structure or a value, or a list of either value or structure dicts | required |

Returns:

| Type | Description |
| --- | --- |
| Union[int, str, bool, float, dict, list, None] | a dict, list, or value |

Source code in splitgill/search.py
def rebuild_dict_or_list(
    value: Union[dict, list],
) -> Union[int, str, bool, float, dict, list, None]:
    """
    Rebuild a dict or a list inside the parsed dict.

    :param value: a dict which can either be for structure or a value, or a list of
        either value or structure dicts
    :return: a dict, list, or value
    """
    if isinstance(value, dict):
        if ParsedType.UNPARSED in value:
            # this is a value dict, return the original value
            return value[ParsedType.UNPARSED]
        else:
            # this is a structural dict, pass each value through this function but
            # filter out fields that start with an underscore, unless they are the
            # special _id field
            return {
                key: rebuild_dict_or_list(value)
                for key, value in value.items()
                if not key.startswith('_') or key == DATA_ID_FIELD
            }
    elif isinstance(value, list):
        # pass each element of the list through this function
        return [rebuild_dict_or_list(element) for element in value]
    else:
        # failsafe: just return the value. This should only really happen with lists
        # containing Nones (which is technically allowed)
        return value
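A small worked example may help show how the recursion unwinds a parsed document. The sketch below is self-contained and uses hypothetical key names: `UNPARSED_KEY` (`"_u"`) and the other underscore-prefixed keys in the sample data are placeholders for whatever ParsedType actually defines, while `"_id"` follows the description of DATA_ID_FIELD in id_query.

```python
# hypothetical key names for this sketch; splitgill uses ParsedType.UNPARSED
# and DATA_ID_FIELD instead
UNPARSED_KEY = "_u"
ID_KEY = "_id"


def rebuild_sketch(value):
    if isinstance(value, dict):
        if UNPARSED_KEY in value:
            # a value dict: hand back the original, unparsed value
            return value[UNPARSED_KEY]
        # a structural dict: recurse, dropping underscore keys other than the ID
        return {
            key: rebuild_sketch(child)
            for key, child in value.items()
            if not key.startswith("_") or key == ID_KEY
        }
    elif isinstance(value, list):
        return [rebuild_sketch(element) for element in value]
    # anything else (such as a None inside a list) is returned untouched
    return value


parsed = {
    "_id": {"_u": "abc-1", "_k": "abc-1"},
    "name": {"_u": "Banana", "_k": "banana"},
    "_internal": {"_u": "hidden"},
    "sizes": [{"_u": 4, "_n": 4.0}, None],
    "location": {"city": {"_u": "London"}},
}
rebuilt = rebuild_sketch(parsed)
# → {'_id': 'abc-1', 'name': 'Banana', 'sizes': [4, None], 'location': {'city': 'London'}}
print(rebuilt)
```

The `_internal` key is dropped because it starts with an underscore, while `_id` survives because of the explicit exception for the ID field.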

term_query(field, value, parsed_type=None)

Create and return a term query which will find documents that have an exact value match in the given field. If the parsed_type parameter is not specified, it will be inferred based on the value type.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| field | str | the field to match | required |
| value | Union[int, float, str, bool, date, datetime] | the value to match | required |
| parsed_type | Optional[ParsedType] | the parsed type of the field to use, or None to infer from value | None |

Returns:

| Type | Description |
| --- | --- |
| Query | a Q object |

Source code in splitgill/search.py
def term_query(
    field: str,
    value: Union[int, float, str, bool, datetime.date, datetime.datetime],
    parsed_type: Optional[ParsedType] = None,
) -> Query:
    """
    Create and return a term query which will find documents that have an exact value
    match in the given field. If the parsed_type parameter is not specified, it will be
    inferred based on the value type.

    :param field: the field to match
    :param value: the value to match
    :param parsed_type: the parsed type of the field to use, or None to infer from value
    :return: a Q object
    """
    if parsed_type is None:
        parsed_type = infer_parsed_type(value)

    # date is the parent class of datetime so this check is ok
    if parsed_type == ParsedType.DATE and isinstance(value, datetime.date):
        value = to_timestamp(value)

    return Q('term', **{parsed_path(field, parsed_type=parsed_type, full=True): value})

version_query(version)

Creates the elasticsearch-dsl term necessary to find the correct data from some searched records given a version. You probably want to use the result of this function in a filter, for example, to find all the records at a given version.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| version | int | the requested version | required |

Returns:

| Type | Description |
| --- | --- |
| Query | an elasticsearch-dsl Query object |

Source code in splitgill/search.py
def version_query(version: int) -> Query:
    """
    Creates the elasticsearch-dsl term necessary to find the correct data from some
    searched records given a version. You probably want to use the result of this
    function in a filter, for example, to find all the records at a given version.

    :param version: the requested version
    :return: an elasticsearch-dsl Query object
    """
    return Q('term', **{DocumentField.VERSIONS: version})