Into the Matrix with Neo4j: 3 - Why Graphs

Category: Data
Tags: neo4j graph
Last Modified: May 2 2017
Oct 28 2014

In this blog post I am going to return to my investigation of the graph database Neo4j.  So far I have given a brief introduction to Neo4j and shown how easy it is to install Neo4j in an Azure VM.

Let's now turn our attention to what a graph database is.  Nowadays we tend to think of social graphs when we think of graphs (who is my friend, who is following me, who do I follow), but in reality most of today's problems can be modelled as graphs.

Actually, as developers we tend to do our initial whiteboard modelling in the form of graphs.  If we are trying to model a shopping cart, we will draw a product on the board, often as a circle, and then an order, also as a circle.  We then define a relationship (an order has a product) by drawing a line between the two.

[Image: Graph_WhiteBoard]

This is a basic graph.

Once we have finished our design, we try to convert this conceptual model into a database design by creating a Products table and an Orders table.  Because an Order can have many Products AND a Product can belong to many Orders, we model the relationship with a third table, ProductOrders, which in relational database terms is the join table between the two.
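To make the relational version concrete, here is a minimal sketch using Python's built-in sqlite3 module.  The table names come from the design above; the column names, the sample rows and the helper products_in_order are my own assumptions for illustration only.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Orders   (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    -- The join table: one row per (order, product) pair.
    CREATE TABLE ProductOrders (
        order_id   INTEGER NOT NULL REFERENCES Orders(id),
        product_id INTEGER NOT NULL REFERENCES Products(id),
        PRIMARY KEY (order_id, product_id)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(1, "Keyboard"), (2, "Mouse")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, "alice"), (11, "bob")])
conn.executemany("INSERT INTO ProductOrders VALUES (?, ?)", [(10, 1), (10, 2), (11, 2)])


def products_in_order(order_id):
    """Return the product names in an order by going through the join table."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM ProductOrders AS po
        JOIN Products AS p ON p.id = po.product_id
        WHERE po.order_id = ?
        ORDER BY p.name
        """,
        (order_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Note that the single line we drew on the whiteboard has become a whole table, and answering "what is in this order?" requires a join through it.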

But why should we force our graph into a relational database, which is based on set theory?  What if we could use a database designed to support relationships as a first-class citizen, i.e. a graph database?

Native graph databases essentially solve the problem by storing two different pieces of data.  The Nodes store the information about the various nodes in the system (the Product and the Order), while the Relationships store information about the relationships that connect the nodes (has a).  In addition, each node stores the IDs of all the relationships it is involved in, and each relationship stores its source and destination nodes.
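The two kinds of record described above can be sketched in a few lines of Python.  This is only a conceptual model and not Neo4j's actual storage format; the class names Node, Relationship and GraphStore, and all of their fields, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    label: str
    properties: dict
    # IDs of every relationship this node takes part in.
    relationship_ids: list = field(default_factory=list)


@dataclass
class Relationship:
    id: int
    type: str
    source_id: int  # node the relationship starts from
    target_id: int  # node the relationship points to


class GraphStore:
    """Toy store that keeps nodes and relationships as separate records."""

    def __init__(self):
        self.nodes = {}
        self.relationships = {}

    def add_node(self, label, **properties):
        node = Node(len(self.nodes), label, properties)
        self.nodes[node.id] = node
        return node

    def relate(self, source, rel_type, target):
        rel = Relationship(len(self.relationships), rel_type, source.id, target.id)
        self.relationships[rel.id] = rel
        # Both end nodes remember the relationship, so it can be followed
        # from either side without searching.
        source.relationship_ids.append(rel.id)
        target.relationship_ids.append(rel.id)
        return rel


store = GraphStore()
order = store.add_node("Order", number=10)
keyboard = store.add_node("Product", name="Keyboard")
has = store.relate(order, "HAS", keyboard)
```

The relationship is a record in its own right here, rather than a row in a join table that has to be looked up.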

Nodes are indexed (using Lucene), so once you have found the starting node it is just an exercise in traversing the relationships.  In relational database terms there is one index seek/scan and then a number of relatively fast traversals.  The traversals are relatively immune to scale because we don't need to do any further lookups (we just follow links), so the size of the data mainly affects the initial index lookup.
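The lookup-then-traverse pattern can be sketched with plain Python dictionaries.  This is a toy illustration under my own assumptions, not how Neo4j or Lucene are implemented; the data and the names index, links and orders_sharing_a_product are hypothetical.

```python
# The "index": maps a key we can search on to a node ID (the one lookup).
index = {"order-10": 0, "order-11": 1}

# Display names for each node ID.
node_names = {0: "order-10", 1: "order-11", 2: "Keyboard", 3: "Mouse"}

# Each node ID maps to the node IDs it is directly linked to.
links = {0: [2, 3], 1: [3], 2: [0], 3: [0, 1]}


def orders_sharing_a_product(order_key):
    """Find other orders that contain at least one of the same products."""
    start = index[order_key]          # the single index lookup
    found = []
    for product in links[start]:      # hop 1: order -> its products
        for other in links[product]:  # hop 2: product -> orders containing it
            name = node_names[other]
            if other != start and name not in found:
                found.append(name)
    return found
```

After the first line of the function, the work done depends only on how many links are followed from the starting node, not on how many nodes exist in total.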

This explains why graph databases are becoming more popular.  Many of today's problems (social websites, recommendation engines, security models) are best modelled using a graph database, and these databases handle complicated relationships at scale much better.

In the next blog in this series I will start to dive deeper into how to use Neo4j.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
