The Sponge Guide

Welcome to the Sponge Guide!

Sponge is a real-time change data capture and integration service.

While Sponge can be used to build a traditional extract-transform-load (ETL) pipeline between services, it approaches data integration differently. For Sponge, the outcome of data integration is that one datastore (the source) notifies another (the destination) of changes in real time, so that at any point in time the destination holds all of the source’s data and data history.

Sponge does allow data to be transformed between the source and the destination (some transformations are performed automatically using machine learning), but its main emphasis is on letting a source keep a destination in sync by incrementally streaming updates on demand.

Getting Started

Using Sponge

TBA

Development

Developers can use the sponge-framework to build custom integrations for any datastore.

The sponge-framework is published in Cinchapi’s maven-enterprise repository, which requires your Cinchapi marketplace credentials. The Gradle configuration below reads them from the CINCHAPI_MARKETPLACE_USER and CINCHAPI_MARKETPLACE_KEY environment variables.

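repositories {
  maven {
    url "http://cinchapi.bintray.com/maven-enterprise"
    credentials {
      // Cinchapi marketplace credentials, read from environment variables
      username System.getenv('CINCHAPI_MARKETPLACE_USER')
      password System.getenv('CINCHAPI_MARKETPLACE_KEY')
    }
  }
}

dependencies {
  // 'implementation' replaces the deprecated 'compile' configuration
  implementation group: 'com.cinchapi', name: 'sponge-framework', version: '0.2.0'
}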

Important Concepts

Datastore

A Sponge is a one-way connection between two datastores that facilitates real-time syncing of data changes.

Sponge’s notion of a Datastore is very flexible: you can sync data between any combination of databases, APIs and filesystems. In short, any service that can return and/or receive data is compatible with Sponge.
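To make the "return and/or receive data" idea concrete, here is a minimal Java sketch of a one-way connection. The interfaces ReadableStore and WritableStore and the classes InMemoryStore and OneWaySync are hypothetical names invented for this illustration, not the sponge-framework API, and objects are keyed by a plain String id purely for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: NOT the sponge-framework API. Anything that can
// return data can act as a source; anything that can receive data can act
// as a destination.

/** Something that can return data (usable as a source). */
interface ReadableStore {
    Optional<Map<String, Object>> read(String id);
}

/** Something that can receive data (usable as a destination). */
interface WritableStore {
    void write(String id, Map<String, Object> object);
}

/** An in-memory store that can play either role. */
class InMemoryStore implements ReadableStore, WritableStore {
    private final Map<String, Map<String, Object>> data = new HashMap<>();

    @Override
    public Optional<Map<String, Object>> read(String id) {
        return Optional.ofNullable(data.get(id));
    }

    @Override
    public void write(String id, Map<String, Object> object) {
        // Store a defensive copy so later changes to the caller's map are not shared.
        data.put(id, new HashMap<>(object));
    }
}

/** One-way connection: data only ever flows from source to destination. */
class OneWaySync {
    private final ReadableStore source;
    private final WritableStore destination;

    OneWaySync(ReadableStore source, WritableStore destination) {
        this.source = source;
        this.destination = destination;
    }

    /** Copies the current state of a changed object to the destination, if it exists. */
    void onChange(String id) {
        source.read(id).ifPresent(obj -> destination.write(id, obj));
    }
}
```

Because OneWaySync only reads from its source and only writes to its destination, changes never flow backwards, which mirrors the one-way nature of a Sponge.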

Parameters

Each Datastore requires a different set of information in order to function properly. To accommodate this, Sponge assumes that every Datastore can be constructed from two parameters: a LoggerConfig object that describes the runtime logging settings, and a Parameters object that maps configuration keys to their values.

At runtime, Sponge collects the information provided for connecting to a Datastore and passes it as a Parameters object to the appropriate Datastore class. Each Datastore class is responsible for validating these parameters itself to ensure that all required information is present.
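The following minimal sketch shows what constructor-time validation might look like. LoggerConfig and Parameters here are simplified stand-ins rather than the real sponge-framework types, and the HttpApiDatastore class with its baseUrl and timeoutSeconds keys is entirely hypothetical.

```java
// Hypothetical stand-ins: the real sponge-framework LoggerConfig and
// Parameters types are not reproduced here.
final class LoggerConfig {
    final String level;

    LoggerConfig(String level) {
        this.level = level;
    }
}

interface Parameters {
    /** Returns the raw value for a configuration key, or null if absent. */
    String get(String key);
}

/** A hypothetical Datastore that validates its own parameters. */
final class HttpApiDatastore {
    private final String baseUrl;
    private final int timeoutSeconds;

    HttpApiDatastore(LoggerConfig logging, Parameters params) {
        // A real Datastore would configure its logger from 'logging'; omitted here.

        // Required parameter: fail fast with a descriptive message.
        String url = params.get("baseUrl");
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("Missing required parameter: baseUrl");
        }
        this.baseUrl = url;

        // Optional parameter with a default; a non-numeric value throws
        // NumberFormatException (a subclass of IllegalArgumentException).
        String timeout = params.get("timeoutSeconds");
        this.timeoutSeconds = (timeout == null) ? 30 : Integer.parseInt(timeout);
    }

    String baseUrl() {
        return baseUrl;
    }

    int timeoutSeconds() {
        return timeoutSeconds;
    }
}
```

Failing fast in the constructor means a misconfigured Datastore is rejected before any syncing begins.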

Because configuration keys follow widely varying naming conventions, the Parameters interface provides a getIgnoreCaseFormat method that looks up a configuration key across well-known case formats (e.g. camelCase, UpperCamelCase, underscore_case, UPPER_UNDERSCORE_CASE, hyphen-case).
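One way to implement a case-format-insensitive lookup is to normalize every key by removing separators and lower-casing it before comparing. The sketch below assumes that technique; the actual getIgnoreCaseFormat implementation may work differently (for example, by generating each case-format variant of the requested key), and CaseFormatLookup is a hypothetical helper class.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a lookup in the spirit of
// Parameters.getIgnoreCaseFormat; NOT the sponge-framework source.
final class CaseFormatLookup {
    private CaseFormatLookup() {}

    /**
     * Reduces a key to a canonical form by dropping '_' and '-' separators
     * and lower-casing, so "apiKey", "ApiKey", "api_key", "API_KEY" and
     * "api-key" all normalize to "apikey".
     */
    static String normalize(String key) {
        return key.replace("_", "").replace("-", "").toLowerCase(Locale.ROOT);
    }

    /** Returns the value of a key whose normalized form matches, if any. */
    static <V> Optional<V> getIgnoreCaseFormat(Map<String, V> params, String key) {
        // An exact match always wins, so it is never shadowed by a variant.
        if (params.containsKey(key)) {
            return Optional.ofNullable(params.get(key));
        }
        String target = normalize(key);
        for (Map.Entry<String, V> entry : params.entrySet()) {
            if (normalize(entry.getKey()).equals(target)) {
                return Optional.ofNullable(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

A trade-off of this normalization is that keys differing only by separators (say, re_sort and resort) collide; an approach that converts between specific case formats would keep them apart.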

Object

The Sponge data model is built on the Object abstraction.

An Object is simply a mapping from a key (e.g. an attribute, property, or column) to a collection of values. This closely mirrors the JSON data format.

Sponge assumes that each Datastore supports this data model.
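A minimal Java sketch of the Object abstraction follows. It assumes the collection of values is an ordered list (the guide does not specify the collection type), and SpongeObject is a hypothetical class name, not part of the sponge-framework.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: each key maps to a collection of values.
final class SpongeObject {
    // LinkedHashMap keeps keys in insertion order, like a JSON document.
    private final Map<String, List<Object>> data = new LinkedHashMap<>();

    /** Appends a value under a key; a key may hold several values. */
    SpongeObject add(String key, Object value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        return this;
    }

    /** Returns a read-only view of the values under a key (empty if none). */
    List<Object> get(String key) {
        return Collections.unmodifiableList(data.getOrDefault(key, Collections.emptyList()));
    }

    /** Returns a read-only view of the keys present in this object. */
    Set<String> keys() {
        return Collections.unmodifiableSet(data.keySet());
    }
}
```

For example, the JSON document {"name": "Jeff", "tags": ["a", "b"]} would be expressed as new SpongeObject().add("name", "Jeff").add("tags", "a").add("tags", "b"), with the single-valued field stored as a one-element list.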

Id

An Id is used to uniquely identify objects within a Datastore.

Every object must be associated with an Id so that Sponge can accurately detect changes and sync information between the two datastores. Because datastores use different kinds of ids internally, Sponge represents an Id as a simple byte array of at most 16 bytes.

As long as a Datastore can express its internal id in this format, Sponge can accurately track changes.
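The sketch below shows how common internal ids could be expressed within the 16-byte limit: a 64-bit numeric key fits in 8 bytes, a UUID takes exactly 16, and a short string key fits if its UTF-8 encoding is 16 bytes or fewer. The Id class and its factory methods are hypothetical, not the sponge-framework's own Id type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical Id wrapper; the real sponge-framework Id type may differ.
final class Id {
    static final int MAX_BYTES = 16;
    private final byte[] bytes;

    private Id(byte[] bytes) {
        if (bytes.length > MAX_BYTES) {
            throw new IllegalArgumentException(
                    "Id must be at most " + MAX_BYTES + " bytes, got " + bytes.length);
        }
        this.bytes = bytes.clone(); // defensive copy keeps the Id immutable
    }

    static Id of(byte[] bytes) {
        return new Id(bytes);
    }

    /** A 64-bit numeric key (e.g. an auto-increment column) takes 8 bytes. */
    static Id of(long value) {
        return new Id(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
    }

    /** A UUID takes exactly 16 bytes: most- then least-significant bits. */
    static Id of(UUID uuid) {
        return new Id(ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array());
    }

    /** A string key is accepted only if its UTF-8 encoding fits in 16 bytes. */
    static Id of(String value) {
        return new Id(value.getBytes(StandardCharsets.UTF_8));
    }

    byte[] toByteArray() {
        return bytes.clone();
    }

    // byte[] uses identity equality in Java, so compare contents explicitly.
    @Override
    public boolean equals(Object o) {
        return o instanceof Id && Arrays.equals(bytes, ((Id) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Wrapping the array matters in Java because byte[] compares by identity; overriding equals and hashCode lets Ids be compared and used as map keys when tracking changes.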