Persistence to Neo4j graph datastores

Sunday, July 29, 2012
Whilst DataNucleus JDO/JPA already supported persistence and querying of objects to/from RDBMS (all variants), ODBMS (NeoDatis), Documents (XML, Excel, ODF), Web (JSON), Document-based (MongoDB), Map-based (HBase, AppEngine, Cassandra), as well as others like LDAP and VMForce, it was clear that we didn't yet have a plugin for any of the nice new graph datastores like Neo4j. To this end, we now provide a new store plugin supporting persistence to Neo4j.

Usage

Just as with all of the other store plugins, we aim to make its usage as seamless and transparent as possible so that you, the user, have a high level of portability for your application. In simple terms, you mark your model classes with JDO or JPA metadata (annotations or XML) just as you would for RDBMS (or any other datastore), and write your JDO or JPA persistence code in the normal way. The only difference is that the data is persisted into Neo4j transparently. I've not had time to write up a tutorial yet, but the model and persistence code would be identical to persisting to any other datastore; only the datastore "URL" changes, to something like
datanucleus.ConnectionURL=neo4j:{my_datastore_location}

Refer to the DataNucleus docs for more details. Note that the plugin is not yet released, but is available as a nightly build for anyone wishing to give it a try.
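
As a rough sketch of the configuration side only (not a tutorial), the connection properties could be assembled as below. The class name `Neo4jConnectionProps`, its `build` helper and the `testDB` location are made up for illustration, and the factory class name is my assumption of the usual DataNucleus 3.x JDO setting; check the DataNucleus docs for the definitive property list.

```java
import java.util.Properties;

class Neo4jConnectionProps {
    /** Builds properties for a JDO PersistenceManagerFactory pointing at Neo4j (illustrative helper). */
    static Properties build(String datastoreLocation) {
        Properties props = new Properties();
        // Standard JDO property naming the factory implementation.
        // Assumption: the DataNucleus 3.x JDO factory class.
        props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        // The only Neo4j-specific setting: the "neo4j:" prefix in the datastore URL.
        props.setProperty("datanucleus.ConnectionURL", "neo4j:" + datastoreLocation);
        return props;
    }

    public static void main(String[] args) {
        // "testDB" is a placeholder datastore location.
        Properties props = build("testDB");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
        // With the JDO API and DataNucleus jars on the classpath, these properties
        // would then be handed to javax.jdo.JDOHelper.getPersistenceManagerFactory(props).
    }
}
```

Everything after obtaining the factory (annotated model classes, `makePersistent`, queries) would be ordinary JDO code with nothing Neo4j-specific in it.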


Currently supported

  • Each object of a class becomes a Neo4j Node.
  • Supports datastore identity, application identity, and nondurable identity
  • Supports versioned objects
  • Fields of all primitive and primitive wrapper types can be persisted
  • Fields of many other standard Java types can be persisted (Date, URL, URI, Locale, Currency, JodaTime, javax.time, plus many more)
  • 1-1, 1-N, M-N and N-1 relations are persisted as Neo4j Relationships (Map fields are not currently supported)
  • JDOQL/JPQL queries can be performed, and the operators &&, ||, ==, !=, >, >=, <, <= are processed using Cypher, with any remaining syntax handled in-memory currently.
  • Supports using the Neo4j-assigned "node id" for the "identity" value strategy
  • Checks for duplicate object identity
  • Embedded (and nested embedded) 1-1 fields, and querying of these fields
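
To picture the relation bullet above: as explained in the comments below, a collection field on A holding B1 and B2 gives two Relationships, A->B1 and A->B2, and nothing is added for the inverse side even when the relation is bidirectional. The following is only an in-memory illustration of that rule using plain strings. It is not the plugin's code, and the names `OwningSideEdges` and `edgesFor` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OwningSideEdges {
    /**
     * Returns one "owner->element" edge per collection element.
     * No "element->owner" edge is produced, mirroring the owning-side-only rule.
     */
    static List<String> edgesFor(String owner, List<String> elements) {
        List<String> edges = new ArrayList<>();
        for (String element : elements) {
            edges.add(owner + "->" + element);
        }
        return edges;
    }

    public static void main(String[] args) {
        // A owns a collection containing B1 and B2.
        System.out.println(edgesFor("A", Arrays.asList("B1", "B2"))); // prints [A->B1, A->B2]
    }
}
```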


Likely supported soon

  • Processing of more JDOQL/JPQL syntax in Cypher to minimise any in-memory processing
  • Support for backed SCO collection wrappers allowing more efficient Relationship management.

Feedback is welcome (over on the DataNucleus Forum, or below in the comments). Additionally, if you have more experience with Neo4j and would like this plugin's capabilities to be enhanced, why not get involved? You could contribute a few patches, for example; the source code is available here, and the issue tracker is a good place to start.
Enjoy!

7 comments:

  1. Are 1:M and N:M relations stored as many relationships in Neo4j, or just one?

    Replies
    1. If an object A has a collection field (a 1:N) with elements B1, B2 then this is represented as 2 Relationship objects
      A->B1
      A->B2
      i.e. it doesn't also have B1->A, B2->A. Relationship objects are added only for the owning side of the relation (even if the relation is bidirectional).

  2. Andy,

    What are your thoughts on the seeming impedance mismatch wrt Neo4j's relationships? We see them as first-class citizens: http://static.springsource.org/spring-data/data-graph/snapshot-site/reference/html/#tutorial_relationships

    Replies
    1. Lasse,
      the vast majority of (Java) applications have relationships without props hence I'm (currently) catering for (my) main use-case. Obviously if the user wants an 'attributed relation' between class A and B, and introduces an 'intermediate class' AB to contain the properties of the relation then this would currently get persisted as a Node.

      That doesn't mean that this plugin can't be updated in a future release to allow the user to tag each 'intermediate class' (with two object references) as representing a relation, with the related objects being marked; I wouldn't consider that a major amount of work. Obviously JDO allows @Extension annotations for just such a thing (whereas JPA is lacking, but then it's designed for RDBMS only anyway).

  3. Hi Andy,

    Unless I'm missing something, it's a little misleading to imply that DataNucleus JDO/JPA supports persistence to Cassandra. Whatever support there was, was based on a very old version of DataNucleus and seems to have been discontinued a long time ago now. Given the popularity of Cassandra, are there any plans to develop a Cassandra plugin in-house?

    Replies
    1. Fail to see anything misleading in that the DN docs list Cassandra as *working with DN2.2*, as per http://www.datanucleus.org/products/accessplatform_3_2/datastores/index.html All "inactive" plugins are listed with the version of DN they work against.

      I personally have no plans (if that's what you mean by "in-house" ... "in-house" to me only applies to companies and I'm not a company). This project is open-source, open to anyone to get involved, add plugins, add features, add tests etc, and always has been. If you want a plugin you could easily get involved, take that existing plugin and upgrade it, publicise it so others can help. Quite a long time ago I asked the person who has the most recent version of the DN2.2 Cassandra plugin what help he needed in getting it to a recent version ... and got no response. Yes, it would be nice to have a plugin for it, but then why should I spend my spare time doing it when I'd have to spend it getting used to Cassandra itself first, and users of said plugin would give nothing back? You or your company could sponsor such work, but without payback of some form my time won't be involved in it (other than answering questions on how to implement things if somebody gets serious about doing it).

      That's the state of open source, sadly, with 95+% only interested in using other people's work, not thinking of the benefits of getting involved and what could be achieved.

  4. Andy,

    Aloha.

    Can you please explain how this works:

    datanucleus.ConnectionURL=neo4j:{my_datastore_location}

    When I put something like:

    datanucleus.ConnectionURL=neo4j:testDB

    Then a local directory tree called "testDB" is created on my machine. However, is there a way to connect to a running neo4j DB server? If so, can you please provide a concrete example?
