This thesis introduced G-Store, a prototype of a storage manger for graph data. It described the theoretical foundation of G-Store’s multilevel storage algorithm and showed how G-Store can solve queries that traditional database management systems struggle with. Initial experimental results are promising and confirm the usefulness of the underlying ideas of G-Store.
A tool like G-Store is potentially useful in various applications, which is why we intend to make G-Store version 0.2 available through a source repository. G-Store version 0.1 has been developed under a tight schedule and lacks some features that a user might expect or find useful:
• G-Store uses the Windows API to implement unbuffered disk access and therefore works only in a Windows environment. It should not be difficult to transfer the functions to UNIX.
• The current storage algorithm requires that the input graph (without labels) can be represented in main memory once. On a machine with 3 GB memory, we successfully tested G-Store’s storage algorithm for a directed input graph with 4 million vertices and 17 million edges. The algorithm ran out of memory for a directed input graph with 5 million vertices and 70 million edges. We plan to implement an option to run the storage algorithm successively on parts of a large input graph.
• In the current version, G-Store’s query engine must be restarted to load a graph in a different working directory. We intend improve the support for multiple graphs.
• A storage manager should support the insertion and deletion of data. An important question is how the insertion or deletion of a vertex or an edge should affect the placement of other vertices.
• Finally, although this is not a priority, we will think about implementing parts of the algorithm for a parallel infrastructure.