Graph databases explained
For a long time, ‘big data’ has been a byword for our society’s ongoing digitalization. The wide availability of large quantities of data also poses challenges, however: rapidly growing, fast-changing and weakly structured volumes of data require high-performance IT solutions if they are to be analyzed and used effectively.
One database model that can handle highly interconnected information is the graph database. It also provides an answer to the problems of the classical, relational database, which quickly comes up against its limits when handling large and complex data sets. Graph databases, then, rank among the modern database alternatives that are free of the traditional, relational approach, and that are brought together under the umbrella term NoSQL (‘Not only SQL’). But how exactly does a graph database work, and what advantages does its structure offer?
What is a graph database?
As its name suggests, a graph database is modeled on graphs. These graphs represent complex, interconnected information and the relationships within it in a clear way, and they store this data as one large, coherent data set. The graphs are made up of nodes (clearly labeled and identifiable data entities and objects) and edges, which represent the relationships between those objects. The two components are represented visually as points and lines. Each edge has a start and an end point, while each node has a certain number of relationships to other nodes, whether incoming, outgoing, or undirected.
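The node-and-edge structure described above can be sketched in a few lines of Python. This is only a minimal illustration and not how any particular graph database stores its data; the node names, the `FOLLOWS` relationship type and the `degree` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: each node is identified by a unique name.
nodes = {"alice", "bob", "carol"}

# Each directed edge is (start node, relationship type, end node).
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "alice"),
    ("alice", "FOLLOWS", "carol"),
]

# Index the edges by start node and by end node, so that the
# relationships of a node can be looked up directly.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for start, rel, end in edges:
    outgoing[start].append((rel, end))
    incoming[end].append((rel, start))


def degree(node):
    """Return (number of incoming, number of outgoing) relationships."""
    return len(incoming[node]), len(outgoing[node])


print(degree("alice"))  # (1, 2): followed by carol, follows bob and carol
```

Keeping both an incoming and an outgoing index is a design choice made for this sketch: it allows relationships to be followed in either direction without scanning the full edge list.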
Established concepts for constructing such graph databases are the labeled property graph and the Resource Description Framework (RDF). In a labeled property graph, properties are assigned to both the nodes and the edges. In RDF, meanwhile, the graph is modeled using triples and quads. Triples consist of three elements in the pattern node-edge-node (in RDF terms: subject, predicate, object). Quads extend triples with additional contextual information, which makes it easier to organize the triples into groups.
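To make the difference between the two concepts more concrete, the following sketch models the same small fact ("Alice knows Bob") in both styles using plain Python data structures. It does not use a real RDF library; the `ex:` identifiers, the property names and the `objects` helper are assumptions made purely for illustration.

```python
# Labeled property graph: nodes AND edges carry properties.
lpg_nodes = {
    1: {"label": "Person", "properties": {"name": "Alice"}},
    2: {"label": "Person", "properties": {"name": "Bob"}},
}
lpg_edges = [
    {"start": 1, "end": 2, "type": "KNOWS", "properties": {"since": 2015}},
]

# RDF style: every statement is a triple in the pattern node-edge-node
# (subject, predicate, object).
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]

# A quad adds a fourth element that names the context of the triple,
# which allows triples to be organized into groups.
quads = [(s, p, o, "ex:socialGraph") for (s, p, o) in triples]


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


print(objects("ex:alice", "ex:knows"))  # ['ex:bob']
```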
How do queries work in a graph database?
Graph databases can be queried in many different ways, mainly because there is no uniform query language. Unlike traditional models, graph databases rely on specialized algorithms to fulfill their essential function: simplifying and speeding up complicated data queries.
Two of the most important algorithms are depth-first search and breadth-first search. Depth-first search follows one path as deep as it can go before backtracking, while breadth-first search works through the graph layer by layer, visiting all nodes at one distance from the start before moving on to the next. These algorithms make it possible to find graph patterns as well as directly and indirectly adjacent nodes. Other algorithms calculate the shortest path between two nodes, or identify cliques (subsets of nodes in which every node is connected to every other) and hotspots (information that is particularly highly interconnected). One of the strengths of the graph database is that relationships are stored in the database itself, so they do not need to be calculated at query time. This results in high performance, even for complicated queries.
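The traversal algorithms mentioned above can be sketched as follows. This is a textbook-style illustration on a small, hypothetical adjacency list, not the implementation used by any specific graph database; the shortest-path function assumes unweighted edges and is built on breadth-first search.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def depth_first(graph, start):
    """Visit nodes by following each path as deep as possible first."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Reversed so that neighbours are explored in their listed order.
        stack.extend(reversed(graph[node]))
    return order


def breadth_first(graph, start):
    """Visit nodes layer by layer, nearest nodes first."""
    visited, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


def shortest_path(graph, start, goal):
    """Return a path with the fewest edges, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(depth_first(graph, "A"))         # ['A', 'B', 'D', 'E', 'C']
print(breadth_first(graph, "A"))       # ['A', 'B', 'C', 'D', 'E']
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Both traversals only ever look at the neighbours of the node currently being processed, which is why the work grows with the number of relationships actually followed.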
Differentiation from relational databases and other NoSQL databases
Relational databases have been the standard database model since the relational approach was first described in 1970. Unlike graph databases, they are based on tables: each row holds one data record, called a tuple, and each column holds one attribute, whose value varies from row to row. Apart from structure and composition, the way they work also differs fundamentally from a graph representation. To represent and store relationships in highly interconnected information, several tables have to be laboriously linked (joined) with one another at query time. With large quantities of data, this can often prove time-consuming and expensive.
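The linking of tables described above can be illustrated with Python's built-in `sqlite3` module. The sketch below answers the question "who are the friends of Alice's friends?" once with table joins and once with a plain adjacency list standing in for a graph. The table names, column names and sample rows are hypothetical.

```python
import sqlite3

# In-memory relational database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO friendship VALUES (1, 2), (2, 3);
""")

# Relational approach: every hop along a relationship needs another join.
rows = conn.execute("""
    SELECT p2.name
    FROM person AS p1
    JOIN friendship AS f1 ON f1.person_id = p1.id
    JOIN friendship AS f2 ON f2.person_id = f1.friend_id
    JOIN person AS p2 ON p2.id = f2.friend_id
    WHERE p1.name = 'Alice'
""").fetchall()
via_joins = [name for (name,) in rows]

# Graph-style approach: relationships are stored with the node itself,
# so each hop is a direct lookup.
adjacency = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": []}
via_graph = [fof for friend in adjacency["Alice"] for fof in adjacency[friend]]

print(via_joins)  # ['Carol']
print(via_graph)  # ['Carol']
```

Each additional hop (friends of friends of friends, and so on) would add two more joins to the SQL query, whereas the graph-style lookup simply follows one more relationship.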
While table-based databases typically use the query language SQL (Structured Query Language), the more modern NoSQL databases are increasingly moving away from this query language and the relational concept associated with it, an approach that graph databases, as members of the NoSQL family, also follow. Alongside graph databases, many other models belong to this family, such as key-value databases, column-oriented databases and document-oriented databases. These principally process and store data sets that are less interconnected.
What are graph databases used for?
Graph databases can be used for many different sectors and purposes. They allow interconnected information to be analyzed, and processes and connections to be understood, evaluated and made useful.
A typical use of graph databases is analyzing user relationships in social networks or users’ buying behavior in online shops. Targeted product and friend suggestions can be made based on different data and relationships, for example, allowing individual personal and product networks to be built up. Businesses also benefit from being able to create comprehensive customer profiles based on information from search queries, click histories and other sources. In supply chain management, graph databases are used to track all processes, from design right through to sales. Finally, they are used for risk assessment, fraud detection and debugging.
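As a rough illustration of how friend suggestions can be derived from stored relationships, the following sketch ranks people a user is not yet connected to by the number of mutual friends. It is a simplified heuristic on hypothetical sample data, not the method used by any particular social network or product; the `friends` mapping and the `suggest_friends` function are names chosen for this example.

```python
from collections import Counter

# Hypothetical undirected friendship graph: user -> set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def suggest_friends(user):
    """Rank friends of friends by how many friends they share with user."""
    mutual = Counter()
    for friend in friends[user]:
        for candidate in friends[friend]:
            # Skip the user and people who are already friends.
            if candidate != user and candidate not in friends[user]:
                mutual[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


print(suggest_friends("alice"))  # [('dave', 2), ('erin', 1)]
```

The same two-hop pattern can be applied to product suggestions by treating purchases as relationships between customers and products.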
The advantages and disadvantages of graph databases
The strength of a database can be measured using four principal factors: integrity, performance, efficiency and scalability. The main purpose of graph databases can be roughly summarized as making data queries quicker and simpler. Where relational databases reach their limits, the graph-based model is particularly agile, because query speed depends mainly on the number of relationships involved rather than on the total quantity of data.
In addition, the graph database model allows real-world facts to be stored in a natural way. The structure closely resembles human thinking, which is why the links are so easy to follow. Graph databases are not a complete solution, though. They are limited, for example, where scalability is concerned: as they are principally designed for a one-tier architecture, growth represents a (mathematical) challenge. Moreover, there is still no uniform query language.
An overview of the advantages and disadvantages of graph databases:
|Advantages|Disadvantages|
|---|---|
|Query speed only dependent on the number of concrete relationships, and not on the amount of data|Difficult to scale, as designed as one-tier architecture|
|Results in real time|No uniform query language|
|Clear and manageable representation of relationships||
|Flexible and agile structures||
Graph databases should not generally be regarded as a strictly better replacement for conventional databases. Relational structures remain entirely reasonable standard models that guarantee high data integrity and stability and permit flexible scaling. As so often, it all depends on the intended purpose.
An overview of the best-known graph databases
- Neo4j: Neo4j is widely regarded as the most popular graph database and is conceived as an open-source model.
- Amazon Neptune: This graph database is offered in the Amazon Web Services public cloud and was released in 2018 as a high-performance database.
- SAP HANA Graph: With SAP HANA, the developer SAP has created a platform that builds upon a relational database management system and is complemented by the integrated, graph-oriented model SAP HANA Graph.
- OrientDB: This graph database is one of the quickest models currently available.