Configuration¶
The Elasticsearch client in Invenio is configured using the two configuration variables SEARCH_CLIENT_CONFIG and SEARCH_ELASTIC_HOSTS.
Invenio-Search relies on the following two Python packages to integrate with Elasticsearch:
- elasticsearch
- elasticsearch-dsl
Hosts¶
The hosts which the Elasticsearch client in Invenio should use are configured using the configuration variable:
- invenio_search.config.SEARCH_ELASTIC_HOSTS = None¶
  Elasticsearch hosts.
  By default, Invenio connects to localhost:9200.
  The value of this variable is a list of dictionaries, where each dictionary represents a host. The available keys in each dictionary are determined by the connection class:
  - elasticsearch.connection.Urllib3HttpConnection (default)
  - elasticsearch.connection.RequestsHttpConnection
  You can change the connection class via SEARCH_CLIENT_CONFIG. If you specify the hosts key in SEARCH_CLIENT_CONFIG, this configuration variable will have no effect.
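The precedence between the two variables can be summarized in a small sketch. The helper `effective_hosts` is hypothetical and is not part of Invenio-Search; it only mirrors the rule stated above. Representing the default as a single `localhost`/`9200` dictionary is an assumption made for illustration.

```python
def effective_hosts(config):
    """Return the hosts the client would end up using.

    Hypothetical helper for illustration only; not Invenio-Search's own code.
    """
    client_config = config.get('SEARCH_CLIENT_CONFIG') or {}
    # A `hosts` key in SEARCH_CLIENT_CONFIG takes precedence.
    if 'hosts' in client_config:
        return client_config['hosts']
    # Otherwise use SEARCH_ELASTIC_HOSTS, falling back to the default host.
    # (The dict form of the default is an assumption of this sketch.)
    return config.get('SEARCH_ELASTIC_HOSTS') or [
        dict(host='localhost', port=9200),
    ]
```

With this rule, setting both variables means only the `hosts` entry of `SEARCH_CLIENT_CONFIG` is used.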
Clusters¶
Normally in a production environment, you will run an Elasticsearch cluster on one or more dedicated nodes. Following is an example of how you configure Invenio to use such a cluster:
SEARCH_ELASTIC_HOSTS = [
    dict(host='es1.example.org'),
    dict(host='es2.example.org'),
    dict(host='es3.example.org'),
]
The Elasticsearch client will manage a connection pool to all of these hosts and will automatically take nodes out of the pool if they fail.
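If the list of nodes differs between deployments, one option is to build it from an environment variable rather than hard-coding it. The following is a minimal sketch under stated assumptions: the helper `hosts_from_string` and the variable name `ES_HOSTS` are hypothetical and not part of Invenio.

```python
import os


def hosts_from_string(value, default_port=9200):
    """Parse 'host1:9201,host2' into a list of host dictionaries.

    Hypothetical helper for illustration; not provided by Invenio.
    """
    hosts = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        host, _, port = item.partition(':')
        hosts.append(dict(host=host, port=int(port) if port else default_port))
    return hosts


# ES_HOSTS is an assumed environment variable name.
SEARCH_ELASTIC_HOSTS = hosts_from_string(
    os.environ.get(
        'ES_HOSTS',
        'es1.example.org,es2.example.org,es3.example.org',
    )
)
```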
Basic authentication and SSL¶
By default, all traffic to Elasticsearch is via unencrypted HTTP because Elasticsearch does not come with built-in support for SSL unless you pay for the enterprise X-Pack addition. A cheaper alternative to X-Pack is to simply set up a proxy (e.g. nginx) on each node with SSL and HTTP basic authentication support.
Following is an example of how you configure Invenio to use SSL and Basic authentication when connecting to Elasticsearch:
params = dict(
    port=443,
    http_auth=('myuser', 'mypassword'),
    use_ssl=True,
)
SEARCH_ELASTIC_HOSTS = [
    dict(host='node1', **params),
    dict(host='node2', **params),
    dict(host='node3', **params),
]
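To keep credentials out of the configuration file, the same settings can be read from the environment. This is only a sketch: the environment variable names `ES_USER` and `ES_PASSWORD` are assumptions, and the fallback values are the placeholders from the example above.

```python
import os

params = dict(
    port=443,
    # ES_USER / ES_PASSWORD are assumed variable names, not an Invenio convention.
    http_auth=(
        os.environ.get('ES_USER', 'myuser'),
        os.environ.get('ES_PASSWORD', 'mypassword'),
    ),
    use_ssl=True,
)

# Build one host dictionary per node, sharing the same connection parameters.
SEARCH_ELASTIC_HOSTS = [
    dict(host=name, **params) for name in ('node1', 'node2', 'node3')
]
```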
Self-signed certificates¶
In case you are using self-signed SSL certificates on proxies in front of Elasticsearch, you will need to provide the ca_certs option:
params = dict(
    port=443,
    http_auth=('myuser', 'mypassword'),
    use_ssl=True,
    ca_certs='/etc/pki/tls/mycert.pem',
)
SEARCH_ELASTIC_HOSTS = [
    dict(host='node1', **params),
    # ...
]
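Before pointing ca_certs at a file, it can help to check that the file actually loads as a CA bundle. The following sketch uses only the Python standard library; the helper name `ca_bundle_loads` is hypothetical and the check is independent of Invenio and of the Elasticsearch client.

```python
import ssl


def ca_bundle_loads(path):
    """Return True if `path` can be loaded as a CA bundle, else False.

    Hypothetical helper for illustration; not part of Invenio.
    """
    try:
        ssl.create_default_context(cafile=path)
    except OSError:
        # Covers a missing or unreadable file as well as ssl.SSLError,
        # which is raised for data that is not a valid certificate.
        return False
    return True
```

For example, `ca_bundle_loads('/etc/pki/tls/mycert.pem')` could be run once at deployment time to catch a wrong path early.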
Disabling SSL certificate verification
Warning
We strongly discourage you from using this method. Instead, use the method with the ca_certs option documented above.
Disabling verification of SSL certificates will, for example, allow man-in-the-middle attacks and give you a false sense of security (you could just as well use plain unencrypted HTTP instead).
If you are using a self-signed certificate, you may also disable verification of the SSL certificate using the verify_certs option:
import urllib3
urllib3.disable_warnings(
    urllib3.exceptions.InsecureRequestWarning
)
params = dict(
    port=443,
    http_auth=('myuser', 'mypassword'),
    use_ssl=True,
    verify_certs=False,
    ssl_show_warn=False,  # only from 7.x+
)
SEARCH_ELASTIC_HOSTS = [
    dict(host='node1', **params),
    # ...
]
The above example will also disable the two warnings (InsecureRequestWarning and a UserWarning) using the ssl_show_warn option and the urllib3 feature.
Again, we strongly discourage you from using this method. The warnings are there for a reason!
Other host options¶
For a full list of options for configuring the hosts, see the connection classes documentation:
- elasticsearch.connection.Urllib3HttpConnection (default)
- elasticsearch.connection.RequestsHttpConnection
Other options include e.g.:
- url_prefix
- client_cert
- client_key
Client options¶
More advanced options for the Elasticsearch client are configured via the configuration variable:
- invenio_search.config.SEARCH_CLIENT_CONFIG = None¶
  Dictionary of options for the Elasticsearch client.
  The value of this variable is passed to elasticsearch.Elasticsearch as keyword arguments and is used to configure the client. See the available keyword arguments in the two following classes:
  - elasticsearch.Elasticsearch
  - elasticsearch.Transport
  If you specify the key hosts in this dictionary, the configuration variable SEARCH_ELASTIC_HOSTS will have no effect.
Timeouts¶
If you are running Elasticsearch on a smaller/slower machine (e.g. for development or CI) you might want to be a bit more relaxed in terms of timeouts and failure retries:
SEARCH_CLIENT_CONFIG = dict(
    timeout=30,
    max_retries=5,
)
Connection class¶
You can change the default connection class by setting the connection_class key (e.g. to use the requests library instead of urllib3):
from elasticsearch.connection import RequestsHttpConnection
SEARCH_CLIENT_CONFIG = dict(
    connection_class=RequestsHttpConnection
)
Note that the default urllib3 connection class is more lightweight and performant than the requests library. Only use the requests library for advanced features such as custom authentication plugins.
Connection pooling¶
By default, urllib3 will open up to 10 connections to each node. If your application calls for more parallelism, use the maxsize parameter to raise the limit:
SEARCH_CLIENT_CONFIG = dict(
    # allow up to 25 connections to each node
    maxsize=25,
)
Hosts via client config¶
Note that you may also use SEARCH_CLIENT_CONFIG instead of SEARCH_ELASTIC_HOSTS to configure the Elasticsearch hosts:
SEARCH_CLIENT_CONFIG = dict(
    hosts=[
        dict(host='es1.example.org'),
        dict(host='es2.example.org'),
        dict(host='es3.example.org'),
    ]
)
Other client options¶
For a full list of options for configuring the client, see the transport class documentation (elasticsearch.Transport).
Other options include e.g.:
- url_prefix
- client_cert
- client_key
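Each example in this section assigns SEARCH_CLIENT_CONFIG from scratch, so pasting several of them into one config file would leave only the last assignment in effect. A combined configuration might look like the following sketch, which reuses only the keys and values shown in the examples above.

```python
# One dictionary holding all client options; the values are the
# illustrative ones used earlier in this section.
SEARCH_CLIENT_CONFIG = dict(
    # be more relaxed about timeouts and retries
    timeout=30,
    max_retries=5,
    # allow up to 25 connections to each node
    maxsize=25,
    # configure hosts here instead of via SEARCH_ELASTIC_HOSTS
    hosts=[
        dict(host='es1.example.org'),
        dict(host='es2.example.org'),
        dict(host='es3.example.org'),
    ],
)
```

Because the `hosts` key is present, SEARCH_ELASTIC_HOSTS would have no effect with this configuration.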
Index prefixing¶
Elasticsearch does not provide the concept of virtual hosts, and thus the only way to use a single Elasticsearch cluster with multiple Invenio instances is via prefixing index, alias and template names. This is defined via the configuration variable:
Warning
Note that index prefixing is only prefixing. Multiple Invenio instances sharing the same Elasticsearch cluster all have access to each other’s indexes unless you use something like https://readonlyrest.com or the commercial X-Pack from Elasticsearch.
- invenio_search.config.SEARCH_INDEX_PREFIX = ''¶
  Any index, alias and template will be prefixed with this string.
  Useful to host multiple instances of the app on the same Elasticsearch cluster, for example on one app you can set it to dev- and on the other to prod-, and each will create non-colliding indices prefixed with the corresponding string.
  Usage example:
  # in your config.py
  SEARCH_INDEX_PREFIX = 'prod-'
  For templates, ensure that the prefix __SEARCH_INDEX_PREFIX__ is added to your index names. This pattern will be replaced by the prefix config value.
  Usage example in your template.json:
  { "index_patterns": ["__SEARCH_INDEX_PREFIX__myindex-name-*"] }
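The effect of the prefix can be illustrated with a short sketch. The helpers `prefixed` and `render_template` are hypothetical and only imitate the behaviour described above (string prefixing and placeholder replacement); they are not Invenio-Search's own implementation.

```python
import json

PLACEHOLDER = '__SEARCH_INDEX_PREFIX__'


def prefixed(name, prefix):
    """Prepend the configured prefix to an index or alias name."""
    return prefix + name


def render_template(template_text, prefix):
    """Replace the placeholder in a template body and parse it as JSON."""
    return json.loads(template_text.replace(PLACEHOLDER, prefix))


template_text = '{ "index_patterns": ["__SEARCH_INDEX_PREFIX__myindex-name-*"] }'
rendered = render_template(template_text, 'prod-')
```

Here `rendered['index_patterns']` holds the pattern with `prod-` in place of the placeholder, while a second instance configured with `dev-` would produce names that do not collide with it.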
Index creation¶
Invenio will by default create all aliases and indexes registered into the invenio_search.mappings entry point. If this is not desirable for some reason, you can control which indexes are being created via the configuration variable:
- invenio_search.config.SEARCH_MAPPINGS = None¶
  List of aliases for which search mappings should be created.
  - If None, all aliases (and their search mappings) defined through the invenio_search.mappings entry point in setup.py will be created.
  - Provide an empty list [] if no aliases (or their search mappings) should be created.
  For example, if you don’t want to create aliases and their mappings for authors:
  # in your `setup.py` you would specify:
  entry_points={
      'invenio_search.mappings': [
          'records = invenio_foo_bar.mappings',
          'authors = invenio_foo_bar.mappings',
      ],
  }
  # and in your config.py
  SEARCH_MAPPINGS = ['records']
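The three cases for SEARCH_MAPPINGS (None, an empty list, and an explicit list) can be sketched as a simple filter. The function `aliases_to_create` is hypothetical and only illustrates the selection rule described above; it is not how Invenio-Search is implemented internally.

```python
def aliases_to_create(registered, search_mappings=None):
    """Select which registered aliases should be created.

    Hypothetical helper for illustration; not part of Invenio-Search.
    """
    if search_mappings is None:
        # None means: create everything registered via the entry point.
        return list(registered)
    # Otherwise keep only the aliases that are explicitly listed;
    # an empty list therefore selects nothing.
    return [alias for alias in registered if alias in search_mappings]


registered = ['records', 'authors']
selected = aliases_to_create(registered, ['records'])
```

With the entry points from the example above, `selected` contains only `records`, so no alias or mapping would be created for `authors`.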