Look Mom, NoSQL! - 10. An Introduction to Graph Databases

Category: Data
Last Modified: May 2 2017
May 19 2013

Last year, when I blogged on NoSQL databases I briefly described the types of databases that are usually encompassed by the term NoSQL Databases.  In that series of blogs I reviewed three types of databases, Key-Value Databases, Column-Oriented Databases and Document Databases, before focusing my attention on RavenDB – a document database.

In this blog, I will introduce a fourth class of NoSQL Database – Graph Databases.  While Graph Databases are NoSQL Databases they are significantly different from the other three classes of NoSQL Databases.  Graph Databases are based on the mathematical concepts of graph theory. 

Many of the business problems that we try to solve today are easily modeled as graphs.  In fact as developers we probably use graphs for most of our modeling.  We then try and force these “graphs” to be stored in relational databases or in other NoSQL stores.

Lets look at an example – access control lists or permissions.  When building apps we invariably need to apply a security model to the various resources within our application.  

DotNetNuke has a rich permissions model, so lets take that model as an example.

Figure 1: The DotNetNuke Permission Model

Access_LIst_Graph

Figure 1 shows how we would diagram the DotNetNuke permission model.  In simple terms, Users are members of Roles which are given permissions (View, Edit, Add) to access objects (pages, modules, files).  

This diagram is a directed Graph.  In Graph terminology, User, Role, Permission and Object are nodes, and “Member Of” “Has” and “Can Access” are edges.  Most of us are used to modeling this in a relational database.  We start by creating entities (tables) to represent the 4 nodes in the diagram (Figure 2).

Figure 2: The Permission Model Entities in a Relational Database

Access_List_Entities

That part was easy.  A node in a graph models nicely to a table/entity in a relational model.  But how do we model the “relationships”.  Figure 3 shows the traditional approach to add the relationships as many-to-many or intersect tables.

Figure 3: Adding Relationships in a Relational Database

Access_List_Entities_Relationships

Ok, so in a relational database we model the nodes in a graph with tables and we model the edges (or relationships) with many-to-many or intersect tables, as everything in a relational database is a table. 

This is a lot of tables !!

Most of the time we are going to query what pages our user has permission to view.  This would need a query like the one shown in Figure 4.

Figure 4: SQL to find the Pages that John Doe has Permission to View

Access_List_SQL

In SQL terms joins are relatively expensive and we have 6 Joins in this query, so we would expect this to be a relatively slow query especially in large sites with a large number of users and a large number of pages.  Indexes help but we need to do 6 Clustered Index Seeks/Scans in order to return the list of pages which John Doe can View. (Note: that we (DotNetNuke) don’t actually use a query like that in Figure 4, but have optimized the permission lookups by caching some of the data in memory.)

The real problem here is that we are trying to solve a Graph problem by using Sets (Relational Databases are an implementation of mathematical  Set Theory).  What we really need is a way to store the Graph data natively, and this is where Graph Databases come in. 

Native Graph Databases essentially solve the problem by storing two different pieces of data. The Nodes store the information about the various nodes in the system while the Relationships store information about the relationships that connect the nodes.  In addition the nodes also store all the “ids” of the relationships that each node is involved in and each relationship stores the source and destination node. 

Nodes are indexed so once you find the starting node (in this case the User with the User Name John Doe), it is then just an exercise in traversing the “relationships”.  In relational database terms there is one Index seek/scan and then a number of relatively “fast” traversals.  The traversals are relatively immune to scale as we don’t need to do any lookups (we just have to follow links) so the only impact on scale is the initial lookup of the User in the Index, so Graph Databases tend to scale much better with complicated models like this.

This explains why Graph Databases are becoming more popular.  Many of todays problems – social websites, recommendation engines, security models are best modeled by using a Graph Database and these databases scale complicated relationships much better.  In future blogs I will take a closer look at Graph Databases – Neo4j in particular.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

Tags