Options
ParsingOptionsBuilder
Builder for the ParsingOptions class.
Source code in splitgill/indexing/options.py
build()
Builds a new ParsingOptions object using the internal state of the builder's options and returns it.
Returns:
| Type | Description |
|---|---|
| ParsingOptions | a new ParsingOptions object |
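The builder pattern described here can be illustrated with a small, self-contained sketch. SketchOptions and SketchOptionsBuilder are hypothetical stand-ins written for this example, not Splitgill's classes, and the choice to copy the builder state into frozen sets inside build() is an assumption about how a built object might be kept independent of later builder changes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical immutable options object (a stand-in for ParsingOptions)."""

    true_values: FrozenSet[str]
    false_values: FrozenSet[str]


class SketchOptionsBuilder:
    """Hypothetical builder used only to illustrate the pattern."""

    def __init__(self) -> None:
        self._true_values: Set[str] = set()
        self._false_values: Set[str] = set()

    def with_true_value(self, value: Optional[str]) -> "SketchOptionsBuilder":
        if value:
            self._true_values.add(value.lower())
        # returning self is what allows calls to be chained
        return self

    def with_false_value(self, value: Optional[str]) -> "SketchOptionsBuilder":
        if value:
            self._false_values.add(value.lower())
        return self

    def build(self) -> SketchOptions:
        # assumption: snapshot the state so that later changes to the
        # builder do not affect objects that have already been built
        return SketchOptions(
            frozenset(self._true_values), frozenset(self._false_values)
        )


builder = SketchOptionsBuilder().with_true_value("Yes").with_false_value("No")
options = builder.build()
builder.with_true_value("y")  # does not change the already built options
```

Each call to build() in this sketch produces a fresh object, so one builder can be reused to create several variations of the options.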
clear_date_formats()
Clears out the date formats in this builder. Note that this also removes the default formats, which are what allow Splitgill to handle datetime and date objects all the way from ingest to indexing.
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
clear_false_values()
Clear out all false values in this builder.
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
clear_geo_hints()
Clear out all geo hints in this builder.
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
clear_true_values()
Clear out all true values in this builder.
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
reset_date_formats()
Reset the date formats in this builder back to the default set.
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
with_date_format(date_format)
Add the given date format to the set of date formats to parse and return self (for easy chaining). The date format should be one that datetime.strptime can use to parse a string.
If the date format is None or the empty string, nothing happens. If the date format is already in the set of date formats, nothing happens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_format | str | a date format string | required |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
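As a rough illustration of what a set of strptime-compatible formats enables, the sketch below tries each format in turn until one parses the value. The helper parse_with_formats and the two format strings are hypothetical and chosen for the example; they are not Splitgill's defaults, and the way Splitgill applies its formats may differ.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_with_formats(value: str, formats: Iterable[str]) -> Optional[datetime]:
    """Try each format in turn and return the first successful parse."""
    for date_format in formats:
        try:
            return datetime.strptime(value, date_format)
        except ValueError:
            # this format does not match the value, try the next one
            continue
    return None


# illustrative formats only, not Splitgill's default set
formats = ["%Y-%m-%d", "%d/%m/%Y"]
parsed = parse_with_formats("25/12/2021", formats)
unparsed = parse_with_formats("not a date", formats)
```

Here parsed is a datetime for 25 December 2021 (matched by the second format) and unparsed is None because no format applies.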
with_false_value(value)
Add the given value to the set of strings that means False and return self (for easy chaining). The value is lowercased before adding it to the set of accepted values.
If the value is None or the empty string, nothing happens. If the value is already in the set of false values, nothing happens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | the string value representing False | required |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
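The rules described for this method (lowercase before adding, ignore None and the empty string, ignore duplicates) can be sketched with a plain set. The add_value helper below is hypothetical and only mirrors the documented behaviour; it is not the library's code.

```python
from typing import Optional, Set


def add_value(values: Set[str], value: Optional[str]) -> Set[str]:
    """Add a lowercased value to the set, ignoring None and the empty string."""
    if value:
        # a set silently ignores values it already contains
        values.add(value.lower())
    return values


false_values: Set[str] = set()
add_value(false_values, "No")
add_value(false_values, "NO")  # same as "no" once lowercased, so no change
add_value(false_values, "")    # ignored
add_value(false_values, None)  # ignored
```

After these calls false_values contains the single entry "no".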
with_float_format(float_format)
Sets the format string to use when converting a float to a string for indexing. The string will have its format() method called during indexing with the float value passed as the only parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| float_format | str | the format string | required |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
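Since the format string has its format() method called with the float as the only parameter, its effect can be previewed with plain Python. The two format strings below are examples picked for illustration and are not necessarily the default Splitgill uses.

```python
# hypothetical format strings, chosen only to show the effect of format()
fixed_format = "{0:.4f}"      # always four decimal places
general_format = "{0:.15g}"   # up to 15 significant digits, no trailing zeros

fixed = fixed_format.format(3.14159265)
general = general_format.format(0.1 + 0.2)
```

With these inputs fixed is "3.1416" and general is "0.3", which shows how the chosen format controls both precision and how floating point noise appears in the indexed string.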
with_geo_hint(latitude_field, longitude_field, radius_field=None, segments=16)
Add the given latitude/longitude/radius field combination as a hint that a record contains geo-parsable data. The radius field name is optional. The segments parameter sets the number of segments used to create the circle around the point when a radius is specified.
Latitude fields must be unique across hints, so if a hint is added with a latitude field that already exists in this options builder, the existing hint is replaced. Only the latitude field is considered for a hint's uniqueness because the geo shape and geo point data are stored on the latitude field, and only one of these values is stored per field so that searches can target just that one field.
When a record's data is parsed, the latitude, longitude, and (if provided) radius fields named here are checked to see if they exist in the record. If they do, their values are validated and, if valid, combined into either a Point (if only latitude and longitude are provided) or a Polygon (if the radius is provided as well, in which case it is used to create a circle around the latitude and longitude point).
When matching, if a radius_field is provided but is not found in a record's data while the latitude and longitude fields are, the hint still matches the record and produces a precise point.
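The uniqueness rule can be pictured as a mapping keyed by the latitude field name. The GeoHint tuple and add_hint function below are hypothetical names used for this sketch, and the dictionary is an assumption about one way to get the described replacement behaviour rather than a description of Splitgill's internals.

```python
from typing import Dict, NamedTuple, Optional


class GeoHint(NamedTuple):
    """Hypothetical container for one hint's settings."""

    latitude_field: str
    longitude_field: str
    radius_field: Optional[str] = None
    segments: int = 16


hints: Dict[str, GeoHint] = {}


def add_hint(hint: GeoHint) -> None:
    # keyed by latitude field, so a later hint with the same latitude
    # field replaces the earlier one
    hints[hint.latitude_field] = hint


add_hint(GeoHint("lat", "lon"))
add_hint(GeoHint("lat", "lon", "uncertainty", 32))  # replaces the first hint
add_hint(GeoHint("site_lat", "site_lon"))           # different latitude field
```

Two hints remain after these calls: the second "lat" hint, which carries the radius field, and the "site_lat" hint.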
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| latitude_field | str | the name of the latitude field | required |
| longitude_field | str | the name of the longitude field | required |
| radius_field | Optional[str] | the name of the radius field (optional) | None |
| segments | int | the number of segments to use when creating the circle (optional, defaults to 16) | 16 |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
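The matching behaviour described for geo hints can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the treatment of the radius as metres, the simple flat-earth circle approximation, and the fallback to a point when the radius value is invalid. Splitgill's actual validation and geometry code may work differently.

```python
import math
from typing import List, Optional, Tuple

Coordinate = Tuple[float, float]  # (longitude, latitude)


def circle_ring(
    lat: float, lon: float, radius_m: float, segments: int = 16
) -> List[Coordinate]:
    """Approximate a circle as a closed ring with `segments` edges.

    Uses a simple local approximation that ignores the poles and the
    antimeridian, which is enough for a sketch.
    """
    earth_radius_m = 6_371_000.0
    d_lat = math.degrees(radius_m / earth_radius_m)
    d_lon = d_lat / math.cos(math.radians(lat))
    ring = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        ring.append((lon + d_lon * math.cos(angle), lat + d_lat * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


def match_hint(
    record: dict,
    latitude_field: str,
    longitude_field: str,
    radius_field: Optional[str] = None,
    segments: int = 16,
) -> Optional[Tuple[str, object]]:
    """Return ("point", coordinate), ("polygon", ring), or None if no match."""
    try:
        lat = float(record[latitude_field])
        lon = float(record[longitude_field])
    except (KeyError, TypeError, ValueError):
        return None  # fields missing or values not numeric
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    if radius_field is not None and radius_field in record:
        try:
            radius = float(record[radius_field])
        except (TypeError, ValueError):
            radius = None  # assumption: an invalid radius falls back to a point
        if radius is not None and radius > 0:
            return ("polygon", circle_ring(lat, lon, radius, segments))
    # radius not configured, or not present in this record: precise point
    return ("point", (lon, lat))


with_radius = match_hint({"lat": "51.5", "lon": "-0.1", "radius": "100"}, "lat", "lon", "radius")
without_radius = match_hint({"lat": 51.5, "lon": -0.1}, "lat", "lon", "radius")
no_match = match_hint({"lat": 51.5}, "lat", "lon")
```

A higher segments value gives a smoother circle at the cost of more vertices per shape. In this sketch a ring built with 16 segments holds 17 coordinates, because the first vertex is repeated to close it.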
with_keyword_length(keyword_length)
Sets the maximum keyword length which will be used when indexing. Any strings longer than this value will be trimmed down before they are sent to Elasticsearch.
Elasticsearch provides an ignore_above feature we could use on keywords to limit the length entered, however, this means that anything longer is completely ignored and not indexed rather than just being truncated. Truncating the data before it goes into Elasticsearch to ensure it is indexed no matter what seems more appealing.
This method raises an error if the length is below 1 or above 32766. If the data uses full 4-byte UTF-8 characters, the limit needs to be reduced to 8191 (32766 bytes divided by 4, rounded down), but to avoid being restrictive when it may not be necessary, 32766 is used as the upper bound. Relevant documentation, though it is not especially detailed: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyword_length | int | the maximum keyword length | required |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
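The bounds check and truncation described above can be sketched in a few lines. The function names are hypothetical, and the use of ValueError is an assumption, since the text only says that the method errors for out-of-range lengths.

```python
MAX_KEYWORD_LENGTH = 32766    # upper bound given in the text above
MAX_UTF8_BYTES_PER_CHAR = 4   # worst case size of one UTF-8 character


def check_keyword_length(keyword_length: int) -> int:
    """Validate the length; the exception type is an assumption."""
    if not 1 <= keyword_length <= MAX_KEYWORD_LENGTH:
        raise ValueError(
            f"keyword length must be between 1 and {MAX_KEYWORD_LENGTH}"
        )
    return keyword_length


def truncate_keyword(value: str, keyword_length: int) -> str:
    """Trim the string rather than dropping it, so that it is still indexed."""
    return value[: check_keyword_length(keyword_length)]


# a length that stays within the byte limit even if every character
# needs four bytes in UTF-8
safe_length = MAX_KEYWORD_LENGTH // MAX_UTF8_BYTES_PER_CHAR
```

Integer division gives a safe_length of 8191, the reduced limit mentioned above for data made up of 4-byte characters.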
with_true_value(value)
Add the given value to the set of strings that means True and return self (for easy chaining). The value is lowercased before adding it to the set of accepted values.
If the value is None or the empty string, nothing happens. If the value is already in the set of true values, nothing happens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | the string value representing True | required |
Returns:
| Type | Description |
|---|---|
| ParsingOptionsBuilder | self |
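To show why the accepted values are stored lowercased, the sketch below checks a candidate string against the true and false sets. The parse_bool helper is hypothetical, and lowercasing the candidate before the lookup is an assumption about how the sets are consulted during parsing.

```python
from typing import Optional, Set


def parse_bool(
    value: str, true_values: Set[str], false_values: Set[str]
) -> Optional[bool]:
    """Return True/False if the value is a known boolean string, else None."""
    lowered = value.lower()  # assumption: comparison is case-insensitive
    if lowered in true_values:
        return True
    if lowered in false_values:
        return False
    return None


# example sets, as they might look after with_true_value("Yes") and so on
true_values = {"yes", "true"}
false_values = {"no", "false"}

as_true = parse_bool("YES", true_values, false_values)
as_false = parse_bool("False", true_values, false_values)
unknown = parse_bool("maybe", true_values, false_values)
```

Because both the stored values and the candidate are lowercased, "YES", "Yes", and "yes" are all treated the same way.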