Getting started with Gnocchi

Jonathan Mathews

Gnocchi is an open source time series database created in 2014, when OpenStack was looking for a highly scalable, fault-tolerant time series database that did not depend on a specialized database such as Hadoop or Cassandra.

Gnocchi was originally built inside OpenStack but later moved out of the project because it was designed to be platform-agnostic. Even so, OpenStack still uses Gnocchi as its main time series backend; for example, OpenStack Ceilometer relies on Gnocchi’s scalability and high availability to keep its telemetry available and fast.

The problem that Gnocchi solves is the storage and indexing of time series data and resources at large scale. Modern cloud platforms are not only huge, but also dynamic and potentially multi-tenant. Gnocchi takes all of that into account.

Aggregation

Gnocchi takes a unique approach to time series storage: rather than storing raw data points, it aggregates them before storing them. This sets it apart from most other time series databases, which usually offer aggregation as an option and compute the aggregates (average, minimum, etc.) at query time.

Because Gnocchi computes all the aggregations at ingestion, getting the data back is extremely fast, as it just needs to read back the pre-computed results.

The way data points are aggregated is configurable on a per-metric basis, using an archive policy. An archive policy defines which aggregations to compute and how many aggregates to keep. Gnocchi supports a wide range of aggregation methods, such as minimum, maximum, average, Nth percentile, and standard deviation. These aggregations are computed over a period of time (called the granularity) and are kept for a defined timespan. Aggregates are stored in a compressed format, so the data points take as little space as possible.
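
To make the idea of an archive policy concrete, here is a minimal sketch in plain Python. It is not Gnocchi's code or API: the `ArchivePolicy` class, the `aggregate` function, and the sample measures are all hypothetical names invented for this illustration. It only shows the principle of grouping measures by granularity, computing several aggregation methods up front, and keeping a limited number of aggregates.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical stand-in for an archive policy (not Gnocchi's real class)."""
    granularity: int  # width of each aggregation period, in seconds
    points: int       # number of aggregates to keep (timespan = granularity * points)
    methods: tuple = ("mean", "min", "max")


# Aggregation methods supported by this sketch only.
AGGREGATORS = {"mean": mean, "min": min, "max": max}


def aggregate(measures, policy):
    """Aggregate (timestamp, value) pairs into fixed-width periods.

    Returns {method: [(period_start, aggregate), ...]}, keeping only the
    most recent `policy.points` periods.
    """
    buckets = defaultdict(list)
    for timestamp, value in measures:
        # Round the timestamp down to the start of its period.
        period_start = timestamp - (timestamp % policy.granularity)
        buckets[period_start].append(value)

    # Retention: keep only the newest `points` periods.
    kept = sorted(buckets)[-policy.points:]
    return {
        method: [(start, AGGREGATORS[method](buckets[start])) for start in kept]
        for method in policy.methods
    }


# One-minute granularity, keeping the two most recent aggregates.
policy = ArchivePolicy(granularity=60, points=2)
measures = [(0, 10.0), (30, 20.0), (60, 5.0), (90, 7.0), (120, 1.0), (150, 3.0)]
result = aggregate(measures, policy)
print(result["mean"])  # [(60, 6.0), (120, 2.0)]
```

With these sample measures, the oldest period (starting at 0) falls outside the retained timespan and is dropped, and a later read is just a lookup in `result`. This sketch recomputes everything from the raw list on each call and skips compression, so it illustrates the concept only.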
