Scaling out with Neo4J?

 
chris webster
Bartender
Posts: 2407
I've seen a fair amount of evidence online (videos, blogs, etc.) describing how Neo4j performs very well when scaling up data volumes on a single machine, partly because relationships (edges) between data items can be followed directly instead of indirectly via foreign keys as in relational systems, and Neo4j's ACID transaction support is a real bonus in the NoSQL world. But to what extent is it amenable to scaling out across multiple servers in the way that Hadoop or MongoDB allow? How far is it possible to achieve high performance across multiple servers, given that your data will inevitably have dependencies that cross machine boundaries, affecting how your transactions are managed?

I can see potential for Neo4j in some applications in my workplace, where we currently have a lot of relationships between tables in a strictly 3NF (unfortunately!) relational database that require lots of joins; those relationships are expensive to traverse, and Neo4j would make that process much more efficient. But I'm not clear on how far it would be possible to scale that solution out if the need arose, especially as the general trend is away from scaling up on big expensive kit and towards scaling out on cheaper servers.
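To illustrate the kind of traversal I mean, here's a rough sketch using Neo4j's embedded Java API (the employee/REPORTS_TO data and the store path are made up for illustration, and the exact factory methods vary between Neo4j versions):

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class ReportsToExample {

    // Hypothetical relationship type, just for this sketch
    private enum RelTypes implements RelationshipType { REPORTS_TO }

    public static void main(String[] args) {
        GraphDatabaseService graphDb =
                new GraphDatabaseFactory().newEmbeddedDatabase("target/example-db");

        try (Transaction tx = graphDb.beginTx()) {
            // Two nodes and one relationship, instead of two rows linked by a foreign key
            Node alice = graphDb.createNode();
            alice.setProperty("name", "Alice");
            Node bob = graphDb.createNode();
            bob.setProperty("name", "Bob");
            alice.createRelationshipTo(bob, RelTypes.REPORTS_TO);

            // Follow the relationship directly - no join, no foreign-key lookup
            for (Relationship r : alice.getRelationships(RelTypes.REPORTS_TO, Direction.OUTGOING)) {
                System.out.println(alice.getProperty("name")
                        + " reports to " + r.getEndNode().getProperty("name"));
            }
            tx.success();
        }
        graphDb.shutdown();
    }
}

In a relational schema the equivalent would typically be a self-join via a foreign key, with one extra join per level of the reporting chain.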
 
Author
Posts: 10
Chris,

This is a great question.

Scaling graphs horizontally in terms of storage (data sharding) is a hard problem. Traversing a graph where following a relationship may incur the cost of a network hop can quickly become too expensive.
So in those terms, Neo4j will not compete with the likes of Hadoop in the near future.

What Neo4j does provide is a very intelligent caching engine (both JVM heap caching and off-heap caching), which should speed things up significantly if the entire data set can fit into memory.
For cases where the data is larger than the amount of memory available, you can run Neo4j in a clustered master-slave setup. Each node (master and slaves) holds a full copy of the data, but with smart routing within your application you can keep different parts of the data set cached on different nodes, making traversals scalable on large data sets.
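To make that routing idea a bit more concrete, here is a rough sketch of what application-level routing might look like (the slave URLs and the choice of a user id as the routing key are placeholder assumptions; in practice you could also do this with a load balancer in front of the cluster):

import java.util.List;

// Sketch of "cache sharding": always send a given user's requests to the same
// Neo4j slave, so that user's part of the graph stays warm in that slave's cache.
public class CacheShardingRouter {

    private final List<String> slaveUrls;

    public CacheShardingRouter(List<String> slaveUrls) {
        this.slaveUrls = slaveUrls;
    }

    // Consistently map a user id to one slave, so repeated requests hit a warm cache
    public String serverFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), slaveUrls.size());
        return slaveUrls.get(index);
    }
}

Every instance still holds the full data set, but each slave's cache effectively ends up holding one "shard" of the graph.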
In addition, the core Neo4j data structures are very small - 9 bytes for a node and 33 bytes for a relationship - which means that you can fit billions of nodes and relationships in the RAM of a standard server, for example. It is the properties that add size to the data - so if you are after core traversal performance, you can store only a minimal amount of information in Neo4j and use another storage system to persist the rest - this is what polyglot persistence is all about.
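As a back-of-the-envelope check on those numbers (the record sizes are the ones quoted above; the one-billion counts are just an example):

public class GraphSizeEstimate {

    public static void main(String[] args) {
        long nodeRecordBytes = 9L;           // node record size quoted above
        long relationshipRecordBytes = 33L;  // relationship record size quoted above

        long nodes = 1_000_000_000L;         // one billion nodes
        long relationships = 1_000_000_000L; // one billion relationships

        long totalBytes = nodes * nodeRecordBytes + relationships * relationshipRecordBytes;
        double gigabytes = totalBytes / (1024.0 * 1024.0 * 1024.0);

        // Roughly 39 GB of node/relationship records before any properties,
        // which fits comfortably in RAM on a well-specced server.
        System.out.printf("Approx. %.1f GB of core graph records%n", gigabytes);
    }
}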

Aleksa
 
chris webster
Bartender
Posts: 2407
Thanks again, Aleksa. It sounds like server-based Neo4j would probably be fine for some of our applications, and I like the idea of polyglot persistence, using the right DB for the right kind of data. Just need to persuade my bosses...
 