Steve Dekorte and Rich Collins 2009
Creative Commons Attribution License 3.0



The Vertex Database


Introduction

    VertexDB is a high performance graph database server that supports automatic garbage collection. It uses the HTTP protocol for requests and JSON for it's response data format and the API is inspired by the FUSE filesystem API plus a few extra methods for queries and queues.

    Performance

    Requests per second are typically network i/o bound and should be in the thousands on OSX, and possibly tens of thousands on Linux (on a typical PC of circa 2008).

    Integrity

    On startup, VertexDB makes a backup of the database (if it exists). When shut down properly, it creates a file that marks the database as being safely stored. When started, if this file is not found, the database file is replaced by it's old copy. There is also an admin backup request that can be sent to the database to store a backup at any time.

    Security

    The current way to secure the server is by running it on an isolated network. HTTP auth support will be added in the future.

    Applications

    Graph databases aren't ideal for all uses, but can be more suitable than other models for dealing with changing or informal schemas and data that is more naturally represented as a graph. It's atomic queue APIs can also make vertexdb useful as a high performance persistent queueing server.

    Implementation

    The current implementation is in C and built on top of:

    • TokyoCabinet, a b-tree key/value disk store
    • libevent, for asynchronous sockets and HTTP
    • Yajl, for JSON generation

    VertexDB uses a single OS thread and handles all requests serially, but uses async sockets so no request is blocked on socket i/o (though it can still block on disk i/o).

    Requests should be generally designed such that responses don't spend too much time blocking on disk i/o or using too much memory in queued response buffers. This can be done by "paging" requests by limiting their results count and repeatedly using a query that selects the next set of matches.

    Future

    Possible future features:

    • FUSE interface
    • distributed redundant servers
    • incremental garbage collection
    • distributed synced in-memory transactions
    • automatic index creation
    • a NodeJS implementation which uses JS for queries?


Basics

    Keys, Values and Terminology

      VertexDB is composed of nodes which are folders of key/value pairs. Keys are stored in lexical ordering and can be any string not containing a forward slash character (as these are used as path separators).

      Keys beginning with an underscore tell the database that the value is a string, otherwise, the value is a pointer to another node. In this way, the database's garbage collector knows how to traverse the nodes. There is a root node that corresponds to the / path that serves as the root reference of garbage collection scan. Each node also tracks it's size (number of keys it contains) so this can be returned quickly.


    Requests

      The HTTP protocol is used for requests. The path part of the query specifies a path in the database's graph of nodes and the query parameters are used to specify actions on that path (read, write, search, etc).

    Responses

      Responses are sent in JSON format in the HTTP content. On error, a HTTP response status of 500 is sent with the HTTP content containing the error message in JSON format.

    URLs

      For brevity, the "http://host:port" parts of the request URLs fields below are left out and the word "path" is used to mean any forward slash delimited path.


Node Operations

    mkdir

      Creates a node at the specified path. If the path already exists, no operation is performed. Returns null on success.
      sample request: /path?action=mkdir
      sample response: null
      

    rm

      Removes a node at the specified path. Returns an error if the path does not exist. Returns null on success.
      sample request: /path?action=rm&key=foo
      sample response: null
      

    size

      Returns the number of slots at the node in the specified path. Returns an error if the path doesn't exist.
      sample request: /path?action=size
      sample response: 423
      

    link

      Links a node at a source path to a key on a node at the destination path. Returns an error if the path doesn't exist.
      sample request: /sourcePath/?action=link&key=k&toPath=destinationPath
      sample response: 423
      


Value Operations

    read

      Read a value at a given path. Returns the value on success.
      sample request: /path?action=read 
      sample response: "foo"
      

    write

      Sets or appends a value at a specified path. Returns null on success.

      set

      sample request: /path?key=k&action=write&mode=set
      with post: foo
      sample response: null
      

      append

      sample request: /path?action=write&mode=append
      with post: bar
      sample response: null
      


Select Operations

    select

      optional parameters:
        before=key
        after=key
        count=max
        whereKey=k&whereValue=v
        

      op=pairs

        sample request: /path?action=select&op=pairs
        sample response: [["a","1"], ["b","2"], ["c","3"]]
        

      op=keys

        Returns a list of keys whose nodes match the query.
        sample request: /path?action=select&op=keys
        sample response: ["keyA", "keyB", "keyC"]
        

      op=values

        Returns a list of values matching the query.
        sample request: /path?action=select&op=values
        sample response: ["value1", "value2", "value3"]
        

      op=object

        Returns a list of the matching objects in JSON dict.
        sample request: /path?action=select&op=object
        sample response: {"a":"1"; "b":"2"; "c":"3" } ???
        

      op=counts

        sample request: /path?action=select&op=counts
        sample response: ???
        

      op=rm

        Removes keys matching selection. Returns the number of removed keys on success.
        sample request: /path?action=select&op=rm
        sample response: ???
        

Queue Operations

    queuePopTo

      Atomically moves the first sub-node found at the source path into the destination path (using the same key it was under in the source path). Returns the key that the node was under.

      sample request: /sourcePath/?action=queuePopTo&toPath=/destinationPath
      sample response: akey123
      

      optional parameters:

      whereKey=k&whereValue=v 
      ttl=numberOfSeconds
      
      The ttl attribute is a time-to-live value as a number of seconds. When the server pops the node to the new location, it computes the expiration date and creates a _qexpire and _qtime slots on the node which are used by the server when queueExpireTo requests are sent. Expirations on a node are only performed when the queueExpireTo request is sent to it.

    queueExpireTo

      Expires items in the queue by looking at their _expire attribute and moves expired items to the specified path. The _qexpire and _qtime keys are removed after the node is moved. Returns the number of items expired.
      sample request: /sourcePath/?action=queueExpireTo&toPath=/destinationPath
      sample response: 42
      

Transactions

    transaction

      Allows a set of actions to be done in a single request. Returns null on success.
      sample request: /frompath/?action=transaction
      with post: [list of request urls]
      sample response: null