Steve Dekorte and Rich Collins 2009
Creative Commons Attribution License 3.0

The Vertex Database


    VertexDB is a high performance graph database server that supports automatic garbage collection. It uses the HTTP protocol for requests and JSON for it's response data format and the API is inspired by the FUSE filesystem API plus a few extra methods for queries and queues.


    Requests per second are typically network i/o bound and should be in the thousands on OSX, and possibly tens of thousands on Linux (on a typical PC of circa 2008).


    On startup, VertexDB makes a backup of the database (if it exists). When shut down properly, it creates a file that marks the database as being safely stored. When started, if this file is not found, the database file is replaced by it's old copy. There is also an admin backup request that can be sent to the database to store a backup at any time.


    The current way to secure the server is by running it on an isolated network. HTTP auth support will be added in the future.


    Graph databases aren't ideal for all uses, but can be more suitable than other models for dealing with changing or informal schemas and data that is more naturally represented as a graph. It's atomic queue APIs can also make vertexdb useful as a high performance persistent queueing server.


    The current implementation is in C and built on top of:

    • TokyoCabinet, a b-tree key/value disk store
    • libevent, for asynchronous sockets and HTTP
    • Yajl, for JSON generation

    VertexDB uses a single OS thread and handles all requests serially, but uses async sockets so no request is blocked on socket i/o (though it can still block on disk i/o).

    Requests should be generally designed such that responses don't spend too much time blocking on disk i/o or using too much memory in queued response buffers. This can be done by "paging" requests by limiting their results count and repeatedly using a query that selects the next set of matches.


    Possible future features:

    • FUSE interface
    • distributed redundant servers
    • incremental garbage collection
    • distributed synced in-memory transactions
    • automatic index creation
    • a NodeJS implementation which uses JS for queries?


    Keys, Values and Terminology

      VertexDB is composed of nodes which are folders of key/value pairs. Keys are stored in lexical ordering and can be any string not containing a forward slash character (as these are used as path separators).

      Keys beginning with an underscore tell the database that the value is a string, otherwise, the value is a pointer to another node. In this way, the database's garbage collector knows how to traverse the nodes. There is a root node that corresponds to the / path that serves as the root reference of garbage collection scan. Each node also tracks it's size (number of keys it contains) so this can be returned quickly.


      The HTTP protocol is used for requests. The path part of the query specifies a path in the database's graph of nodes and the query parameters are used to specify actions on that path (read, write, search, etc).


      Responses are sent in JSON format in the HTTP content. On error, a HTTP response status of 500 is sent with the HTTP content containing the error message in JSON format.


      For brevity, the "http://host:port" parts of the request URLs fields below are left out and the word "path" is used to mean any forward slash delimited path.

Node Operations


      Creates a node at the specified path. If the path already exists, no operation is performed. Returns null on success.
      sample request: /path?action=mkdir
      sample response: null


      Removes a node at the specified path. Returns an error if the path does not exist. Returns null on success.
      sample request: /path?action=rm&key=foo
      sample response: null


      Returns the number of slots at the node in the specified path. Returns an error if the path doesn't exist.
      sample request: /path?action=size
      sample response: 423


      Links a node at a source path to a key on a node at the destination path. Returns an error if the path doesn't exist.
      sample request: /sourcePath/?action=link&key=k&toPath=destinationPath
      sample response: 423

Value Operations


      Read a value at a given path. Returns the value on success.
      sample request: /path?action=read 
      sample response: "foo"


      Sets or appends a value at a specified path. Returns null on success.


      sample request: /path?key=k&action=write&mode=set
      with post: foo
      sample response: null


      sample request: /path?action=write&mode=append
      with post: bar
      sample response: null

Select Operations


      optional parameters:


        sample request: /path?action=select&op=pairs
        sample response: [["a","1"], ["b","2"], ["c","3"]]


        Returns a list of keys whose nodes match the query.
        sample request: /path?action=select&op=keys
        sample response: ["keyA", "keyB", "keyC"]


        Returns a list of values matching the query.
        sample request: /path?action=select&op=values
        sample response: ["value1", "value2", "value3"]


        Returns a list of the matching objects in JSON dict.
        sample request: /path?action=select&op=object
        sample response: {"a":"1"; "b":"2"; "c":"3" } ???


        sample request: /path?action=select&op=counts
        sample response: ???


        Removes keys matching selection. Returns the number of removed keys on success.
        sample request: /path?action=select&op=rm
        sample response: ???

Queue Operations


      Atomically moves the first sub-node found at the source path into the destination path (using the same key it was under in the source path). Returns the key that the node was under.

      sample request: /sourcePath/?action=queuePopTo&toPath=/destinationPath
      sample response: akey123

      optional parameters:

      The ttl attribute is a time-to-live value as a number of seconds. When the server pops the node to the new location, it computes the expiration date and creates a _qexpire and _qtime slots on the node which are used by the server when queueExpireTo requests are sent. Expirations on a node are only performed when the queueExpireTo request is sent to it.


      Expires items in the queue by looking at their _expire attribute and moves expired items to the specified path. The _qexpire and _qtime keys are removed after the node is moved. Returns the number of items expired.
      sample request: /sourcePath/?action=queueExpireTo&toPath=/destinationPath
      sample response: 42



      Allows a set of actions to be done in a single request. Returns null on success.
      sample request: /frompath/?action=transaction
      with post: [list of request urls]
      sample response: null