How to Represent Real-World Relationships in a Graph Database

At CyberGRX we use several database management systems, each chosen to solve specific problems we encounter. One of them structures its stored data according to what is called the property graph model. We'll use some arbitrary data to illustrate what this means:

Let's say we have two employees, each with a name and an employee ID number, and a company that also has a name. We know that these two employees work together at this company, but that relationship is not actually stored with the data, and we'd like it to be.

The property graph model aims to recover the design simplicity that relational storage models can lose as systems grow larger and more complex, while preserving the inherent relatedness of the data being stored.

In our employee and company example, we could group each employee's information (a name and an employee ID) into a record, the basic unit of storage in a property graph database. The company could likewise be grouped into a record containing its name. The grouped pieces of information, the names and IDs, are called properties, which give the property graph model its name.
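As a rough illustration, a record can be pictured as an identifier plus a dictionary of properties. The sketch below is not any particular database's storage format; the `Node` class, the `label` field (a common but not universal property graph feature), and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A record in a property graph: an ID, a label, and its properties."""
    node_id: int
    label: str  # hypothetical: distinguishes employees from companies
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data mirroring the example: two employees and a company.
alice = Node(1, "Employee", {"name": "Alice", "employee_id": 1001})
bob = Node(2, "Employee", {"name": "Bob", "employee_id": 1002})
acme = Node(3, "Company", {"name": "Acme Corp"})

for node in (alice, bob, acme):
    print(node.label, node.properties)
```

At this point the three records exist, but nothing yet says the employees work for the company.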

It is important to note that the term 'graph' in this context comes from graph theory, a branch of mathematics. A basic premise of graph theory is that one node can be connected to another node by an edge, much like two dots joined by a line.

In this scenario, it is appropriate to think of each employee record as a node, the company record as another node, and the edge as the relationship between them. Within the property graph model, we can also name the relationship; here, the name conveys that the employee 'works for' the company. In this way, we have stored the fact that each employee record works for the company record.
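To make the node/edge picture concrete, here is a minimal sketch in which a relationship is its own object holding a source node, a target node, and a type name. The `Edge` class and the `WORKS_FOR` type string are hypothetical naming choices, not taken from the database we use.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A named, directed relationship from one node to another."""
    source: Node
    target: Node
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node(1, {"name": "Alice", "employee_id": 1001})
bob = Node(2, {"name": "Bob", "employee_id": 1002})
acme = Node(3, {"name": "Acme Corp"})

# The relationship is now stored explicitly instead of being merely implied.
edges = [
    Edge(alice, acme, "WORKS_FOR"),
    Edge(bob, acme, "WORKS_FOR"),
]

for e in edges:
    print(f"{e.source.properties['name']} -[{e.rel_type}]-> {e.target.properties['name']}")
```

Because edges can carry their own `properties`, details such as a start date could later be attached to the relationship itself rather than to either record.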

An interesting benefit of storing data this way is that the graph also lets us infer relationships between the stored data. If each of our employee records 'works for' the company record, we can infer that the employees most likely know one another, and a new relationship can be created to persist this inference in the stored data.
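The inference described above can be sketched as a small function over an edge list: group employees by the company they work for, then create a `KNOWS` edge for every pair within a group. The tuple-based edge list, the relationship names, and the third employee are hypothetical; a real graph database would usually express this as a query rather than application code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edge list: (source, relationship type, target).
edges = [
    ("Alice", "WORKS_FOR", "Acme Corp"),
    ("Bob", "WORKS_FOR", "Acme Corp"),
    ("Carol", "WORKS_FOR", "Globex"),
]


def infer_knows(edges):
    """Return a KNOWS edge for every pair of people who work for the same company."""
    employees_by_company = defaultdict(list)
    for source, rel_type, target in edges:
        if rel_type == "WORKS_FOR":
            employees_by_company[target].append(source)

    inferred = []
    for employees in employees_by_company.values():
        # One edge per unordered pair; KNOWS is treated as symmetric.
        for a, b in combinations(sorted(employees), 2):
            inferred.append((a, "KNOWS", b))
    return inferred


# Persist the inferred relationships alongside the original ones.
new_edges = infer_knows(edges)
edges.extend(new_edges)
print(new_edges)
```

Carol is the only employee of Globex, so no `KNOWS` edge is created for her. Since 'most likely know one another' is an assumption rather than a fact, a real system might mark such edges as inferred.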

Some property graph databases go so far as to store the data natively as a graph, with nodes held as records in storage and relationships represented by direct pointers between those records. Traversing a relationship then amounts to following a pointer rather than searching an index, so connected data can be retrieved quickly and intuitively.
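The pointer-based layout, often called index-free adjacency, can be loosely imitated with in-memory object references: each node keeps direct references to its own relationships, so finding a neighbour never consults a global lookup table. This is only an analogy. Real native graph engines use their own on-disk record formats, and the `connect` and `neighbours` helpers below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through node <-> relationship cycles
class Node:
    name: str
    # Direct references to outgoing relationships, stored on the node itself.
    outgoing: list[Relationship] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node
    end: Node


def connect(start: Node, rel_type: str, end: Node) -> Relationship:
    """Create a relationship and attach it to its start node."""
    rel = Relationship(rel_type, start, end)
    start.outgoing.append(rel)
    return rel


def neighbours(node: Node, rel_type: str) -> list[Node]:
    # Traversal just follows the references held by the node; no index is searched.
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


alice, bob, acme = Node("Alice"), Node("Bob"), Node("Acme Corp")
connect(alice, "WORKS_FOR", acme)
connect(bob, "WORKS_FOR", acme)
print([n.name for n in neighbours(alice, "WORKS_FOR")])
```

The cost of a traversal here depends on how many relationships a node has, not on the total size of the graph, which is the property the paragraph above alludes to.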

As a system grows in complexity, the graph can take on new relationships between connected data, and the grouped properties can change along with the system, while relationships and patterns that weren't apparent at first are allowed to emerge. Discovering well-connected employees and influential companies through node centrality becomes a matter of querying relationships that are already stored. The larger and more complex the dataset, the more valuable this kind of insight becomes.
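Degree centrality is the simplest of these measures: a node's score is its number of relationships divided by the largest number it could have, n - 1. The sketch below computes it over a hypothetical edge list, treating every edge as undirected. It counts edges, so duplicate edges between the same pair would inflate a score. Other measures such as betweenness or PageRank capture different notions of influence, and many graph databases provide them through their own tooling.

```python
from collections import Counter

# Hypothetical edge list: (source, relationship type, target).
edges = [
    ("Alice", "WORKS_FOR", "Acme Corp"),
    ("Bob", "WORKS_FOR", "Acme Corp"),
    ("Carol", "WORKS_FOR", "Globex"),
    ("Alice", "KNOWS", "Bob"),
    ("Alice", "KNOWS", "Carol"),
]


def degree_centrality(edges):
    """Normalised degree centrality: edge count / (n - 1), ignoring direction."""
    degree = Counter()
    for source, _rel_type, target in edges:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in degree.items()}


scores = degree_centrality(edges)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

In this small sample Alice, who both works for a company and knows two colleagues, comes out as the best-connected node.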

One reason CyberGRX continues to provide an excellent product is our desire to leverage technologies like graph databases, which give us meaningful insights into risk and help us implement new features that better serve our growing community.