Loading JSON Data Dumps¶
A common scenario is exporting data from a production DB to a local testing instance. There are a few ways to do this:
- using a Neo4j hot (online) backup of the graph.db directory
- using a GraphSON dump
Exporting GraphSON into a local instance¶
First, open an SSH port-forward from the local machine to the remote server, e.g.:
ssh [EHRISERVER] -L7777:localhost:7474
Neo4j typically runs on port 7474, so the remote instance will then be reachable on port 7777 of the local machine.
Now, make sure the local Neo4j instance is stopped and that the local
graph.db directory is either non-existent or completely empty. We
can then combine the
export-graphson web service method with the
graphson command-line tool to pipe data from the (online) remote
instance to the (offline) local one, like so:
curl -H "X-User:admin" http://localhost:7777/ehri/admin/export-graphson | ./scripts/cmd graphson --load - --buffer-size 100000 && ./scripts/cmd gen-schema
The options are:
- --load - (load from stdin)
- --buffer-size 100000 (flush the transaction every 100000 primitives loaded, to avoid running out of memory)
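The effect of --buffer-size can be illustrated with a generic batching sketch. This is a Python analogy, not the tool's actual implementation: the loader accumulates primitives and commits the transaction every N items, so the whole dump never has to be held in a single in-memory transaction.

```python
def load_in_batches(primitives, buffer_size, commit):
    """Flush (commit) every `buffer_size` items, analogous to how
    --buffer-size periodically commits the graph transaction.
    `commit` is a stand-in for a Neo4j transaction commit."""
    pending = []
    flushes = 0
    for item in primitives:
        pending.append(item)
        if len(pending) >= buffer_size:
            commit(pending)   # bound memory by flushing a full batch
            pending = []
            flushes += 1
    if pending:               # commit any remainder at the end
        commit(pending)
        flushes += 1
    return flushes
```

A larger buffer means fewer commits (faster) at the cost of more memory per transaction; 100000 is a trade-off between the two.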
The follow-on command
./scripts/cmd gen-schema is necessary to
instantiate the Neo4j indexes and constraints.
Neo4j's lax read isolation can cause problems with GraphSON dumps, since the dump may contain edges that refer to non-existent nodes (this can happen if the data changes while the export is taking place). In this case, loading the dump will fail with the following error:
Error: Given element was not created in this baseGraph
If in doubt, prevent graph modification from taking place, or use online backup instead.
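If you suspect a dump is affected, you can check it for dangling edge references before attempting the load. A minimal sketch, assuming a TinkerPop 1.x-style GraphSON layout (a top-level "graph" object with "vertices" and "edges" lists, where edges reference vertex ids via _inV and _outV):

```python
import json

def find_dangling_edges(graphson_text):
    """Return the edges whose _inV or _outV references a vertex id
    that is not present in the dump itself."""
    graph = json.loads(graphson_text)["graph"]
    vertex_ids = {v["_id"] for v in graph.get("vertices", [])}
    return [e for e in graph.get("edges", [])
            if e["_inV"] not in vertex_ids or e["_outV"] not in vertex_ids]
```

Any edge this returns would make the load fail with the "Given element was not created in this baseGraph" error above; re-exporting with writes paused (or switching to an online backup) avoids the problem at the source.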