G-Store: A Storage Manager for Graph Data

Many modern applications are based on graph data. Social networks, for instance, are based on graphs that describe relationships among people. Paths of disease outbreaks form a graph, as do airline routes, and citations among academic papers. These graphs often contain a massive amount of data.

Relational databases are today’s system of choice for storing and querying large amounts of data. Their use has been backed up by decades of research and commercially successful database management systems such as Oracle, IBM DB2, and PostgreSQL. Relational databases are extremely fast in finding, filtering, inserting, deleting, and updating information in a database table. Joining information from several tables, on the other hand, often takes orders of magnitude longer.

From a theoretical point of view, the relational database model is not a good fit for representing highly interconnected data. Regardless, as businesses and processes around the world get more interconnected, relational databases are increasingly used for storing such data. Perhaps one reason is the lack of a suitable, stable, and well-supported alternative.

G-Store is a prototype of a storage manager for large vertex-labeled graphs. G-Store exploits the structure of the graph to derive a data placement on disk that is optimized for access patterns found in graph queries. The placement strategy is based on a multilevel algorithm that partitions the graph into pages and arranges these pages on disk to minimize the distance on disk between adjacent vertices. G-Store has a built-in query engine that supports depth-first traversal, reachability testing, shortest path search, and shortest path tree search.

System Prototype

G-Store consists of approximately 12,000 lines of C/C++ code and runs on Windows. Visual C++ 2010 or this redistributable package must be installed. The latest version of the G-Store executable (version 0.11c) can be downloaded here. An experimental 64-bit version is available here. G-Store is alpha software and currently requires administrator privileges to run. Source code can be downloaded here. It would be great if you could let us know how you plan to use G-Store.

You can download this dataset to try out G-Store on the Simple English Wikipedia graph. The graph has 0.1 million vertices (articles including redirects) and 1.3 million edges (links between articles). Each vertex is labeled with the title, the first sentence, the creation date, the creation user, and several IDs for an article. To store the graph with G-Store, extract the archive into the folder with the G-Store executable and enter RUN "schema.g"; into G-Store’s command line interface. G-Store is now ready to answer queries. The following query finds eight paths with a length of four or less from article “Adam Smith” to either article “Bread” or article “Butter”:

G-Store> SELECT LEVEL, PATH(title,'-')

G-Store> WHERE (title = "Bread" OR title = "Butter") AND LEVEL <= 4 AND ROWNUM <= 8

G-Store> START WITH title = "Adam Smith";

↓

LEVEL;PATH(title,'-')

3;-Adam Smith-Economics-Inflation-Bread

4;-Adam Smith-Sympathy-Like-Sandwich-Bread

4;-Adam Smith-Book-Glue-Wheat-Bread

4;-Adam Smith-Book-Glue-Milk-Butter

4;-Adam Smith-Ethics-Economics-Inflation-Bread

4;-Adam Smith-Scotland-Human-Cooking-Bread

4;-Adam Smith-Scotland-Human-Cooking-Butter

4;-Adam Smith-Scotland-Human-Continent-Bread

Found 8 records.

This query finds a path with minimum length from article “Alps” to article “Oxford” that does not go through article “France” or article “City”:

G-Store> SELECT title, abstract IN SHORTEST PATH

G-Store> START WITH title = "Alps"

G-Store> END WITH title = "Oxford"

G-Store> THROUGH title != "France" AND title != "City";

↓

title;abstract

Alps;The Alps are one of the great mountain range systems of Europe

Cold;Cold is a relative term used in comparison with the adjective warm

Mars;Mars is the fourth planet from the Sun in our Solar System

Pluto;Pluto is the second-largest dwarf planet in the Solar System

Oxford;Oxford is a city in England on the River Thames

Found path of length 4. Found 5 records.

Learn more about how to use G-Store.

Documentation

• Emanuel Ferm: Block-aware Query Evaluation in Graph Databases Using G-Store, MSc in CS, Oxford, 2011

• G-Store Changelog

• Robin Steinhaus, Dan Olteanu, and Tim Furche: G-Store: A Storage Manager for Graph Data, Technical report, Oct 2010

• Robin Steinhaus: G-Store: A Storage Manager for Graph Data, MSc in CS, Oxford, 2010

Current Team

• Dan Olteanu*

• Robin Steinhaus†

• Tim Furche*

• Emanuel Ferm‡