
Neo4j - analytics?

 
Ranch Hand
Jonas Partner, Aleksa Vukotic, and Nicki Watt,

Thanks for taking part in the Ranch book giveaway.

I am experienced using traditional SQL databases, but I don't have a strong background in the related theory, and my knowledge of graph theory is even weaker. I'm trying to understand how you would run analytical queries on Neo4j, and how well it supports them. To use the social network as an example, is it easy to say, "give me a list of all users and the number of friends they have, ordered by the number of friends, descending" (which of course is trivial to do with an SQL select)? Does it perform reasonably well for that? If the Neo4j graph stores the date a relationship was established, could I write a query equivalent to "How many people on the network added 30 friends in March 2014?"

That's my main question. As a minor question, I'm curious which are the largest Neo4j data stores in production that you're aware of.

Thanks for your time.
-Mike

 
Author
Hi Mike,

Thanks for taking part in this Q&A session.

Doing the same query in Cypher (Neo4j's query language) is equally simple:

MATCH (user:USER)-[rel:IS_FRIEND_OF]-() RETURN user, count(rel) AS friends ORDER BY friends DESC

One caveat is that a query like this scans the entire graph (all users and all of their IS_FRIEND_OF relationships) and has to hold the counts in memory in order to sort them.
For best performance, a query should start from as few nodes as possible and touch as few properties as possible. Even so, this query would still perform reasonably well on a database of, say, a few million users.
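To make that caveat concrete, here is a rough in-memory Java analogue of the work the ranking query has to do. It is only a sketch under assumptions: the `Friendship` record and `FriendRanking` class are hypothetical, friendships sit in a plain list rather than in Neo4j, and each undirected friendship credits both endpoints, which is how the undirected Cypher pattern counts them. Every relationship is visited once and the whole count table is sorted in memory, so the cost grows with the size of the graph, not with the size of the answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: one undirected IS_FRIEND_OF edge per record.
record Friendship(String a, String b) {}

class FriendRanking {
    // Visit every friendship once, credit both endpoints,
    // then sort the complete count table in memory.
    static List<Map.Entry<String, Integer>> rankByFriendCount(List<Friendship> edges) {
        Map<String, Integer> counts = new HashMap<>();
        for (Friendship f : edges) {
            counts.merge(f.a(), 1, Integer::sum);
            counts.merge(f.b(), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Descending by friend count; ties broken by name so the order is deterministic.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        List<Friendship> edges = List.of(
                new Friendship("alice", "bob"),
                new Friendship("alice", "carol"),
                new Friendship("bob", "carol"),
                new Friendship("alice", "dave"));
        for (Map.Entry<String, Integer> e : rankByFriendCount(edges)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With the sample data above, alice comes first with 3 friends, followed by bob and carol with 2 each, and dave with 1.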

Neo4j's sweet spot is real-time analytics starting from a few nodes (for example, what customers buy at the same time as products A and B).
As for the time-based query, that is possible as well, but it is up to the application to store the relevant timestamps as a relationship property. The query would then look like this:
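By contrast, a query anchored on a few start nodes only touches their neighbourhood. The Java sketch below is not Neo4j code; it assumes a hypothetical bipartite model (orders linked to the products they contain, kept as two `HashMap` adjacency maps) and a hypothetical `CoPurchase` class. It intersects the orders of products A and B, then counts the other products in those shared orders.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bipartite model: each order lists its products,
// and each product keeps the set of orders it appears in.
class CoPurchase {
    private final Map<String, Set<String>> ordersByProduct = new HashMap<>();
    private final Map<String, Set<String>> productsByOrder = new HashMap<>();

    void addOrder(String orderId, List<String> products) {
        productsByOrder.computeIfAbsent(orderId, k -> new HashSet<>()).addAll(products);
        for (String p : products) {
            ordersByProduct.computeIfAbsent(p, k -> new HashSet<>()).add(orderId);
        }
    }

    // Start from products a and b only: keep the orders containing both,
    // then count the other products in those orders. Nothing outside that
    // neighbourhood is visited, regardless of how large the data set is.
    Map<String, Integer> boughtWith(String a, String b) {
        Set<String> shared = new HashSet<>(ordersByProduct.getOrDefault(a, Set.of()));
        shared.retainAll(ordersByProduct.getOrDefault(b, Set.of()));
        Map<String, Integer> counts = new HashMap<>();
        for (String order : shared) {
            for (String p : productsByOrder.get(order)) {
                if (!p.equals(a) && !p.equals(b)) {
                    counts.merge(p, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The work done here is bounded by the orders that contain both A and B, which is the property that makes this shape of query a good fit for real-time use.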

MATCH (user:USER)-[rel:IS_FRIEND_OF]-()
WHERE rel.timestamp >= '20140301'
AND rel.timestamp >= '20140301'
WITH user, count(rel) AS newFriends
WHERE newFriends > 30
RETURN count(user)

The largest Neo4j setup I have been involved with had ~1 TB of data, ran on 3 cluster nodes, and held approximately 50 million nodes and a few billion relationships.

Aleksa
 
Michael Swierczek
Ranch Hand
Aleksa Vukotic,

Thank you for the quick and detailed response.
 
A Vukotic
Author
Just realized I had a typo in the second query. It should be:
MATCH (user:USER)-[rel:IS_FRIEND_OF]-()
WHERE rel.timestamp >= '20140301'
AND rel.timestamp <= '20140331'
WITH user, count(rel) AS newFriends
WHERE newFriends > 30
RETURN count(user)
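The corrected query filters relationships by date, aggregates per user, and then filters on the aggregate (Cypher does not allow an aggregate such as count(rel) directly in a WHERE that follows MATCH, hence the WITH). A minimal Java sketch of the same three steps, assuming a hypothetical `DatedFriendship` record and `NewFriendCounter` class, and swapping the 'YYYYMMDD' strings for `java.time.LocalDate` (plain string comparison does work for that fixed-width format, but a real date type is harder to get wrong):

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: an undirected friendship stamped with its creation date.
record DatedFriendship(String a, String b, LocalDate since) {}

class NewFriendCounter {
    // Counts users who gained more than `threshold` friends within [from, to], inclusive.
    static long usersWithMoreThan(List<DatedFriendship> edges,
                                  LocalDate from, LocalDate to, int threshold) {
        Map<String, Integer> perUser = new HashMap<>();
        for (DatedFriendship f : edges) {
            // Step 1: date filter, like the WHERE on rel.timestamp.
            if (f.since().isBefore(from) || f.since().isAfter(to)) {
                continue;
            }
            // Step 2: per-user aggregation, like WITH user, count(rel).
            perUser.merge(f.a(), 1, Integer::sum);
            perUser.merge(f.b(), 1, Integer::sum);
        }
        // Step 3: filter on the aggregate, then count the matching users.
        return perUser.values().stream().filter(n -> n > threshold).count();
    }
}
```

Calling `usersWithMoreThan(edges, LocalDate.of(2014, 3, 1), LocalDate.of(2014, 3, 31), 30)` mirrors the `> 30` condition in the query; a friendship dated April 1 is ignored.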
 
Do not set lab on fire. Or this tiny ad:
create, convert, edit or print DOC and DOCX in Java
https://products.aspose.com/words/java
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!