Miscellaneous docs / pre-emptive justification¶
Search Dispatchers, Resolvers and Indexers¶
Search Dispatchers¶
Search dispatchers abstract the interface of a particular search engine and provide an asynchronous interface to submit some query and retrieve the results. At present the interface is unstable and highly subject to change.
Search Dispatcher Logging¶
It can be useful to see exactly what Solr search is being dispatched.
This can be enabled by setting the eu.ehri.project.search.solr
logger to DEBUG
by adding the following like to
conf/logback-play-dev.xml
:
<configuration>
<!-- Lots of stuff... -->
<logger name="eu.ehri.project.search.solr" level="DEBUG" />
<!-- Lots more stuff... -->
</configuration>
Search Resolvers¶
When we get some search results from the dispatcher the first thing we
often want to do is look them up in the database for a fully-formed
instance of the items returned. The Resolver
type handles this task.
It is a separate 'thing' for several reasons:
- there are several ways to do bulk lookups depending on the characteristics of the backend
- we might want to use a different bulk lookup strategy in testing
The simplest way to look up a set of, say, 20 search results in the DB
would be to iterate over them and issue a separate call for each one
depending on its type and id. This, however, would be incredibly slow. A
better approach would be to look them all up in one go in a manner
analogous to an SQL WHERE id IN ('foo', 'bar')
clause. The EHRI REST
backend provides a way to do this with both synthetic string identifiers
(the ones EHRI derives ourselves) and the internal (long) graph
identifier.
Doing bulk lookups with the synthetic string identifiers is simple but has the disadvantage with our current REST backend that because there is a single global index bulk lookups get slower as the graph is populated with more material.
Bulk lookups using native graph identifiers are much faster since these - in Neo4j at least - are essentially pointers to a position on disk. Bulk lookups therefore stay more or less 0(1) regardless of how much stuff the graph contains (assuming it can all fit into memory.) However, this is an implementation detail, and, moreover, since native graph IDs are not stable, it cannot reliably be used during testing.
At runtime, the application therefore uses a Resolver
implementation
that uses native graph IDs mapped from the gid
field from the search
result. While testing we use an implementation that uses synthetic
string IDs.
Search Indexers¶
Search indexers provide an asynchronous interface to instruct the search engine that some data has changed in the database and that it should re-index the relevant items.