The Elasticsearch Query Builder project is designed to convert a string input into a valid Elasticsearch (ES) query. It handles various query types like search, count, and aggregation, and supports nested fields, date manipulations, and validation against an Elasticsearch mapping. The project consists of multiple JavaScript files, each serving a specific purpose.
/project-directory
|-- mapping.json
|-- main.js
|-- console.js
|-- nestedQueryParser.js
|-- utils
|---- dateUtils.js
|---- queryUtils.js
|---- mappingUtils.js
- mapping.json: Contains the Elasticsearch mapping, which defines the fields and their types.
- main.js: Main entry point for converting string input to ES query.
- console.js: Console-based interface for interacting with the query Builder.
- nestedQueryParser.js: Core logic for parsing and converting queries.
- utils: Directory containing utility functions.
- dateUtils.js: Functions for date manipulation and validation.
- queryUtils.js: Functions for tokenizing input and validating query components.
- mappingUtils.js: Functions for loading and processing the mapping file.
The mapping.json
file contains the Elasticsearch index mapping, which defines the schema for your index, including field names, types, and indexing options. This file is essential for validating queries and ensuring that they conform to the structure of your Elasticsearch index.
-
Retrieve the Mapping from Elasticsearch:
- Use the Elasticsearch
_mapping
API to retrieve the mapping for your index. Replaceyour_index_name
with the name of your index.
curl -X GET "localhost:9200/your_index_name/_mapping?pretty"
- This command will output the mapping of your index in JSON format.
- Use the Elasticsearch
-
Save the Mapping to a File:
- Copy the output of the
_mapping
API call. - Create a new file named
mapping.json
in your project directory. - Paste the copied JSON content into the
mapping.json
file and save it.
- Copy the output of the
{
"YOUR_ELASTIC_INDEX": {
"mappings": {
"properties": {
"mobile": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"gender": {
"type": "keyword"
}
}
}
}
}
The main.js
file serves as the main entry point for the project. It provides functions to parse a query string and convert it into an Elasticsearch query.
const {createQuery } = require('./nestedQueryParser');
const input = 'profession.status = "active" and createDate >= "today-2d"' + // query string
' ; '+ // ';' is concatenated to seperate query string and options
'queryType = "aggregation", '+ // options are separated by comma ','
'aggregationType = "date_histogram", '+
'aggregationField = "createDate", '+
'timeZone = "+05:30", '+
'fixed_interval = "1d", '+
'size = 0 '+
'';
const result = createQuery(input);
if (result.errorCode) {
console.error(`Error Code: ${result.errorCode}`);
console.error(`Message: ${result.message}`);
} else {
console.log("Parsed JSON:");
console.log(JSON.stringify(result.parsedJSON, null, 2));
console.log("\nElasticsearch Query:");
console.log(JSON.stringify(result.query, null, 2));
}
The console.js
file provides a command-line interface for interacting with the query Builder. It prompts the user for input and processes the query in a loop until the user exits.
Run the script using Node.js:
node console.js
The script will prompt for a query string, query type, and other parameters, and then output the parsed JSON and Elasticsearch query.
The nestedQueryParser.js
file contains the core logic for parsing query strings and converting them into Elasticsearch queries. It includes functions for tokenizing, parsing, and building ES queries.
createQuery(queryString)
: Calls all the required functions to convert query string into Elastic query format. Developers only have to call this function in their projects.
tokenize(input)
: Tokenizes the input query string into individual tokens.parseQuery(tokens, timeZone)
: Parses the tokenized query string into an expression tree.jsonToESQuery(parsedJSON, options)
: Converts the parsed JSON expression tree into an Elasticsearch query.isAggregationAllowed(fieldName, aggregationType)
: Checks if the specified aggregation type is allowed on the given field.
The dateUtils.js
file contains functions for date manipulation and validation.
parseRelativeDate(dateString, timeZone)
: Parses relative date strings likenow
,today
,now-2h
, etc.formatDateString(dateString, timeZone)
: Formats date strings into the required format with the specified time zone.
The queryUtils.js
file contains utility functions for tokenizing input and validating query components.
validateFixedInterval(interval)
: Validates the fixed interval for date histogram aggregations.validateSize(size)
: Validates the size parameter for search queries.tokenize(input)
: Tokenizes the input query string into individual tokens.
The mappingUtils.js
file contains functions for loading and processing the mapping file.
loadMapping()
: Loads the mapping frommapping.json
and extracts fields and indexed fields.
- Clone the Repository
git clone https://github.com/Sarv/Elastic_Query_Builder.git
cd Elastic_Query_Builder
- Install Dependencies
npm install
- Run Main Script
node main.js
- Run Console Interface
node console.js
(gender = "female" or age != 30)
(profession.empCode = "183" and address.country = "India")
(createDate <= "now" and createDate >= "2023-10-30 11:45:00.432")
(marks = 189)
(Field not present in mapping)
---Error
: Field marks not found in the mapping.
=
Equality!=
Inequality>=
Greater than or equal to<=
Less than or equal to>
Greater than<
Less than:
This operator is used to check if a field exists in the document~
This operator is particularly useful when you need to perform size-based conditions on a specific field
Query String:
'age > 31 and gender = "male"'
Resulting Elasticsearch Query:
{
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": "31"
}
}
},
{
"term": {
"gender": {
"value": "male"
}
}
}
]
}
},
"size": 10
}
-
queryType
: search, count, aggregation (default: search) -
aggregationType
: avg, sum, min, max, terms, etc. (default: null) -
fixedInterval
: Interval for date histogram (default: 1d) -
size
: Size parameter for search queries (default: 10) -
timeZone
: Time zone offset (default: Z)
Date fields can support various formats as values in the query string. The project provides functionality to handle these formats:
- Exact Date : A specific date and time.
--- Format
: YYYY-MM-DD HH:mm:ss.SSS
--- Example
: "2024-05-13 12:00:00.432"
- Relative Date : Relative to the current date and time.
--- Format
: now, today, now-2h, today+2d, etc.
--- Example
: "now-2h" (2 hours ago from now)
- Time Zone Support : Adjusts dates based on the specified time zone.
--- Format
: +05:30, -02:00, etc.
--- Example
: "2024-05-13 12:00:00.432+05:30"
Query String:
'createDate <= "now" and createDate >= "2023-10-30 11:45:00.432"'
Time Zone:
const timeZone = '+05:30';
Resulting Elasticsearch Query:
{
"query": {
"bool": {
"must": [
{
"range": {
"createDate": {
"lte": "2024-05-25T17:04:35.395+05:30"
}
}
},
{
"range": {
"createDate": {
"gte": "2023-10-30T11:45:00.432+05:30"
}
}
}
]
}
},
"size": 10
}
'(profession.status = "active" and createDate >= "2023-10-30 11:45:00.432") or (gender = "male" and age > 30) or (address.country = "India" or durations > 9782 ) '
{
"conditions": [
{
"conditions": [
{
"field": "profession.status",
"operator": "=",
"value": "active"
},
{
"field": "createDate",
"operator": ">=",
"value": "2023-10-30T11:45:00.432+05:30"
}
],
"logicGate": "and"
},
{
"conditions": [
{
"field": "gender",
"operator": "=",
"value": "male"
},
{
"field": "age",
"operator": ">",
"value": "30"
}
],
"logicGate": "and"
},
{
"conditions": [
{
"field": "address.country",
"operator": "=",
"value": "India"
},
{
"field": "durations",
"operator": ">",
"value": "9782"
}
],
"logicGate": "or"
}
],
"logicGate": "or"
}
'(profession.status = "active" and createDate >= "2023-10-30 11:45:00.432") or (gender = "male" and age > 30) or (address.country = "India" or durations > 9782 ) '
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"nested": {
"path": "profession",
"query": {
"bool": {
"must": [
{
"term": {
"profession.status": {
"value": "active"
}
}
}
]
}
}
}
},
{
"range": {
"createDate": {
"gte": "2023-10-30T11:45:00.432+05:30"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"gender": {
"value": "male"
}
}
},
{
"range": {
"age": {
"gt": "30"
}
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "address",
"query": {
"bool": {
"must": [
{
"term": {
"address.country": {
"value": "India"
}
}
}
]
}
}
}
},
{
"range": {
"durations": {
"gt": "9782"
}
}
}
]
}
}
]
}
},
"size": 10
}
Nested fields are fields that are objects containing other fields. To query nested fields, use the dot notation to specify the path to the nested field.
'profession.status = "active"'
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "profession",
"query": {
"bool": {
"must": [
{
"term": {
"profession.status": {
"value": "active"
}
}
}
]
}
}
}
}
]
}
},
"size": 10
}
Our Elastic Query Builder library offers robust support for a wide range of aggregations, enabling comprehensive data analysis capabilities. The supported aggregations include:
- avg: Calculate the average value of a numeric field.
- sum: Compute the sum of numeric field values.
- min: Determine the minimum value of a numeric field.
- max: Identify the maximum value of a numeric field.
- stats: Provide a collection of statistical values (min, max, avg, sum, count).
- extended_stats: Extend basic statistics with additional metrics (variance, standard deviation).
- value_count: Count the number of non-null values for a field.
- percentiles: Calculate percentiles on numeric field values.
- cardinality: Estimate the number of distinct values in a field.
- terms: Aggregate data based on unique terms in a field.
- date_histogram: Group data by date intervals.
- nested: Perform aggregations within nested documents.
- histogram: Group data into numeric intervals.
This extensive support ensures that users can perform detailed and versatile data analysis efficiently.
'profession.status = "active" and createDate >= "today-2d"'
'date_histogram'
'createDate'
'1h'
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "profession",
"query": {
"bool": {
"must": [
{
"term": {
"profession.status": {
"value": "active"
}
}
}
]
}
}
}
},
{
"range": {
"createDate": {
"gte": "2024-05-23T00:00:00.000+05:30"
}
}
}
]
}
},
"aggs": {
"agg_name": {
"date_histogram": {
"field": "createDate",
"fixed_interval": "1d"
}
}
}
}
In Elasticsearch, fields in a document can be indexed or non-indexed. Indexed fields are searchable and can be used in queries, while non-indexed fields are not searchable. The ability to query a field is crucial for ensuring efficient and accurate searches. This project includes a mechanism to check whether a field is indexed before allowing it to be used in a query.
-
Loading the Mapping:
- The mapping of an Elasticsearch index is loaded from a
mapping.json
file. This file contains the schema of the index, defining each field and its type, along with whether it is indexed.
- The mapping of an Elasticsearch index is loaded from a
-
Parsing the Mapping:
- The mapping is parsed to extract the necessary information about each field. This includes checking the
index
property of each field to determine if it is indexed. If theindex
property istrue
, the field is considered indexed. If theindex
property isfalse
or the field has theenabled
property set tofalse
, the field is considered non-indexed.
- The mapping is parsed to extract the necessary information about each field. This includes checking the
-
Validation:
- During query parsing, each field referenced in the query is checked against the parsed mapping. If a field is not found in the mapping or is found to be non-indexed, an error is returned, and the field is not allowed to be used in the query.
Mapping Example :
{
"YOUR_ELASTIC_INDEX": {
"mappings": {
"properties": {
"gender": {
"type": "keyword",
"index": true
},
"activityLog": {
"type": "object",
"enabled": false
}
}
}
}
}
In this example:
gender
is an indexed field ("index": true
).activityLog
is a non-indexed field ("enabled": false
).
Query Validation:
- If a query attempts to use
gender
, it is allowed becausegender
is indexed. - If a query attempts to use
activityLog
, an error is returned becauseactivityLog
is not indexed.
Query String :
'(activityLog = "value")'
Error :
{
"errorCode": "FIELD_NOT_INDEXED",
"message": "Field activityLog is present but not indexed."
}
This mechanism ensures that only indexed fields are queried, preventing errors and improving the performance of Elasticsearch queries. By validating fields against the mapping, we maintain the integrity and efficiency of search operations.
This document provides a comprehensive list of error codes and corresponding error messages used in the Elasticsearch Query Builder project.
Error Code | Error Message |
---|---|
INVALID_FIELD | Invalid field {field} found. |
FIELD_NOT_INDEXED | Field {field} is present but not indexed. |
OPERATOR_NOT_ALLOWED | Operator {operator} is not allowed for field type {fieldType} . |
INVALID_OPERATOR | Invalid operator {operator} found. |
INVALID_DATE_FORMAT | Date format should be YYYY-MM-DD HH:mm:ss.SSS . |
INVALID_TIMEZONE_FORMAT | Time zone format should be Z or ±HH:MM |
INVALID_FIXED_INTERVAL | Fixed interval format should be a number followed by s (seconds), m (minutes), h (hours), d (days), w (weeks), M (months), or y (years). |
AGGREGATION_NOT_ALLOWED | Aggregation {aggregationType} is not allowed on field type {fieldType} . |
INVALID_AGGREGATION_TYPE | Invalid aggregation type {aggregationType} specified. |
MISSING_FIELD | Field is missing before operator {operator} . |
FIELD_TYPE_NOT_DEFINED | Field type for {field} is not defined. |
FIELD_NOT_FOUND | Field {fieldName} not found in the mapping. |
INVALID_QUERY_TYPE | Invalid query type |
MISSING_PARAMETERS | Missing aggregation type or field |
INVALID_SIZE | Size must be a positive integer. |
INVALID_INTERVAL | Interval must be a positive integer and greater than 0. |
INVALID_QUERY_FORMAT | Only one ";" is allowed to separate filter and options. |
INVALID_OPTION | Option ( ${option} ) is not valid |
INVALID_VALUE_FOR_OPERATOR | Operator ${currentOperator} is not allowed with values containing '/ ' for field '${currentKey} ' |
Operator ${currentOperator} is not allowed with values other than 'true ' or 'false ' for field '${currentKey} ' |
|
INVALID_SIZE_VALUE | The value "${currentValue} " cannot be converted to an integer for size comparison |
In the table above:
{field}
will be replaced by the name of the field causing the error.{operator}
will be replaced by the operator causing the error.{fieldType}
will be replaced by the type of the field.{aggregationType}
will be replaced by the type of aggregation.{fieldName}
will be replaced by the name of the field.{option}
will be replaced by the invalid option passed in the query string.
The options part of the query string allows for customizing the behavior of the Elasticsearch query, such as setting the query type, aggregation type, aggregation field, time zone, fixed interval, and size. This flexibility enables fine-tuning of the queries based on specific needs.
The options are specified after a semicolon (;
) in the query string. Each option is separated by a comma (,
), and the key-value pairs are separated by an equal sign (=
). For example:
queryType = "aggregation", aggregationType = "sum", aggregationField = "duration.queue", timeZone = "+05:30", fixed_interval = "1d", size = 100
-
queryType:
- Specifies the type of the query.
- Possible values:
"search"
,"aggregation"
- Default value:
"search"
-
aggregationType:
- Specifies the type of aggregation to perform.
- Possible values:
"sum"
,"avg"
,"min"
,"max"
,"terms"
,"date_histogram"
, etc. - Required if
queryType
is"aggregation"
-
aggregationField:
- Specifies the field on which the aggregation is performed.
- Example:
"duration.queue"
- Required if
queryType
is"aggregation"
-
timeZone:
- Specifies the time zone for date calculations.
- Format:
±[hh]:[mm]
(e.g.,"+05:30"
,"-04:00"
) - Default value:
"Z"
(UTC)
-
fixed_interval:
- Specifies the fixed interval for
date_histogram
aggregation. - Possible values: A number followed by
s
(seconds),m
(minutes),h
(hours),d
(days),w
(weeks),M
(months), ory
(years). - Example:
"1d"
(one day) - Required if
aggregationType
is"date_histogram"
- Default value:
"1d"
- Specifies the fixed interval for
-
size:
- Specifies the number of results to return.
- Example:
10
- Default value:
10
-
interval:
- Specifies the interval for
histogram
aggregation. - Required if
aggregationType
is"histogram"
- Example:
10
- Default value:
10
- Specifies the interval for
Here’s an example of a full query string with both filter conditions and options:
'profession.status = "active" and createDate >= "today-2d" ; queryType = "aggregation", aggregationType = "date_histogram", aggregationField = "createDate", timeZone = "+05:30", fixed_interval = "1d", size = 0 '
If you want to optimise your query and hence want to use terms
instead of term
then this library supports querying fields with multiple values using the /
separator. This feature is especially useful when you need to match any of several possible values for a field.
=
(Equals): Matches any of the specified values.!=
(Not Equals): Excludes any of the specified values.
const queryString = 'job = "teacher / doctor / scientist" or status = "active"';
In this example, the fieldName field will match any of the values "teacher", "doctor", or "scientist". The generated Elasticsearch query will use the terms
query for job and term
for status.
{
"query": {
"bool": {
"should": [
{
"terms": {
"job": ["teacher", "doctor", "scientist"]
}
},
{
"term": {
"status": {
"value": "active"
}
}
}
]
}
},
"size": 10
}
The /
separator can only be used with the =
and !=
operators. If you attempt to use /
with any other operator (e.g., >, <, >=, <=), an error will be returned.
If an invalid operator is used with the /
separator, the system will return the following error:
{
"errorCode": "INVALID_OPERATOR_FOR_VALUE",
"message": "Operator > is not allowed with values containing '/' for field age."
}
This ensures that only valid operators are used in conjunction with multiple values.
'person.assets = "home/shop/gold" or tax != "paid"'
The :
operator has been introduced to allow checking the existence of a field in your documents. This operator is specifically designed for boolean-like queries where you want to check whether a field exists (true
) or does not exist (false
).
The operator accepts only true
or false
(case-insensitive).
const queryString = 'status : "true" or category : "false"';
In this example, the status field will be checked to see if it exists in the document. The category field will be checked to see if it does not exist in the document.
{
"query": {
"bool": {
"should": [
{
"exists": {
"field": "status"
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "category"
}
}
}
}
]
}
},
"size": 10
}
The recent updates introduce the ~
operator for handling size queries in your Elasticsearch queries. This operator is particularly useful when you need to perform size-based conditions on a specific field.
~=
Checks if the size of a field is equal to a specified value.~!=
Checks if the size of a field is not equal to a specified value.~<
Checks if the size of a field is less than a specified value.~<=
Checks if the size of a field is less than or equal to a specified value.~>
Checks if the size of a field is greater than a specified value.~>=
Checks if the size of a field is greater than or equal to a specified value.
- Simple Size Query:
'fieldName ~= "5"'
This will generate a query to check if the size of the fieldName is exactly 5.
{
"script": {
"script": {
"source": "doc['fieldName'].size() == params.size",
"params": {
"size": 5
}
}
}
}
- Size Not Equal Query:
'fieldName ~!= "10"'
This will generate a query to check if the size of the fieldName
is not 10
.
{
"script": {
"script": {
"source": "doc['fieldName'].size() != params.size",
"params": {
"size": 10
}
}
}
}
- Other Examples:
'nested.field ~>= "3"'
'fieldName ~<= "5"'
- The
~
operator expects a numeric value as the size. If the value provided cannot be converted to an integer, an error will be returned with an appropriate error code and message (INVALID_SIZE_VALUE
). - Only valid operators combined with
~
(like~=
,~!=
, etc.) are allowed. Using an invalid combination will result in an error.
This project provides a flexible and powerful way to convert query strings into Elasticsearch queries, with support for various query types, nested fields, and date manipulations. The structure and utility functions are designed to be modular and easy to extend for additional features.