Data-Driven Decisions: How to Select the Right Database for Your Metrics

Roman Glushach
10 min readJun 3, 2023

--

Time-series databases

Data is the crucial of any business, and metrics are the vital signs that show how well it is performing. Collecting data based on your company’s key performance indicators (KPIs) and transforming that data into actionable insights can help you make objective decisions based on facts instead of biases.

However, to make the most of your data, you need to choose the right database to store and analyze it. There are many factors to consider, such as scalability, performance, security, and cost.

The Importance of Choosing the Right Database

The data storage system is the heart of the design, and it’s crucial to choose the right database for your metrics.

A general-purpose database, such as MySQL, is not recommended for storing time-series data.

While a relational database can support time-series data in theory, it would require expert-level tuning to make it work at scale.

Moreover, a general-purpose relational database does not perform well under constant heavy write load.

On the other hand, while a few NoSQL databases on the market could handle time-series data effectively, using a general-purpose NoSQL database is not appealing

Therefore, it’s essential to choose a database that is optimized for time-series data.

How to Choose the Right Database

Data Access Pattern

Understanding your data access pattern is crucial when selecting a database.

The write load is heavy, and there can be many time-series data points written at any moment. At the same time, the read load is spiky, and depending on the access patterns of the graphs and alerts, the read volume could be bursty.

Therefore, you need a database that can handle heavy write loads and bursty read loads.

Data Storage System

A database optimized for time-series data is essential to ensure that you can store and analyze your data effectively.

Scalability

Your database should be scalable to handle your growing data needs. As your business grows, your data needs will increase, and you need a database that can scale with your business.

Ease of Use

Your database should be easy to use and maintain. You don’t want to spend a lot of time and resources on database administration.

Cost

The cost of the database is also an important factor to consider. You want a database that provides good value for money.

Possible Problems and Solutions

When using a database for your metrics, you may encounter some problems

Heavy Write Load

If you have a heavy write load, you need a database that can handle it. One solution is to use a database that supports sharding.

Sharding is the process of splitting a database into smaller, more manageable pieces called shards. Each shard can handle a portion of the write load, allowing you to scale horizontally.

Bursty Read Load

If you have a bursty read load, you need a database that can handle it. One solution is to use a database that supports caching. Caching is the process of storing frequently accessed data in memory, allowing you to serve read requests faster.

Scalability

If you need to scale your database, you need a database that can scale horizontally. One solution is to use a database that supports sharding.

Ease of Use

If you want a database that is easy to use and maintain, you may want to consider a managed database service. Managed database services take care of database administration tasks, such as backups and software updates, allowing you to focus on your business.

Use Cases

Monitoring and Alerting

Time-series databases are ideal for monitoring and alerting systems. You can store your metrics in a time-series database and use it to generate alerts when certain thresholds are exceeded.

IoT Data

Time-series databases are also ideal for storing IoT data. IoT devices generate a lot of data, and time-series databases can handle the heavy write load.

Financial Data

Time-series databases are also used for storing financial data. Financial data is time-sensitive, and time-series databases can handle the high frequency of data.

Time-based Databases

InfluxDB is the most popular and most used time-series database in the world. It is an open-source database that provides functionality for cheaply evicting large sets of data and constantly summarizing that data at scale. Timescale is an open-source relational database that makes SQL scalable for time-series data. Prometheus is another popular time series database that is widely used in systems like Kubernetes. Other databases like DataStax, QuestDB, AWS Timestream, Aerospike, Raima Database Manager (RDM), Graphite, and OpenTSDB are also commonly used for time series data management.

InfluxDB

Use cases

  • InfluxDB is used as a time-series database for business intelligence, system statistics, and infrastructure monitoring
  • is used for monitoring applications, storing large amounts of metrics, and analyzing data in real-time, which allows developers to build more reliable software and give their users a better experience
  • InfluxDB is used for monitoring virtual machines, containers, services, and applications
  • is used for financial market data and machine learning

Advantages

  • InfluxDB is a purpose-built time series database, which means it is optimized for time-stamped data
  • It is built from the ground up to be a time series database, which makes it unique
  • is part of a comprehensive platform that supports the collection, storage, monitoring, visualization, and alerting of time series data
  • is open-source and has a large community of contributors
  • has a high-speed read-and-write capability

Disadvantages

  • InfluxDB is not a relational database, which may limit how it can be used by developers
  • non-SQL strategy isolates the database and limits how it can be used by developers
  • has limited tools for backup, restore, and internal monitoring
  • may not be suitable for applications that require complex queries or transactions

Timescale

Use cases

  • monitoring software systems: virtual machines, containers, services, applications
  • monitoring physical systems: equipment
  • analyzing real-time flight data
  • building platforms for smarter cities
  • analyzing blockchain data

Advantages

  • TimescaleDB can scale to billions of rows on PostgreSQL while maintaining high, constant insert rates
  • allows users to store their relational metadata and time-series together in the same database, query them together using time-series-optimized SQL, and continue to use SQL for complex queries
  • is built on top of PostgreSQL, which means it inherits all of PostgreSQL’s features, including reliability, versatility, and security
  • provides new time-oriented SQL functions so that any user of SQL can work with time-series data without having to learn a new language
  • is open-source and has a large developer community

Disadvantages:

  • TimescaleDB is not as widely used as other databases, which means there may be a smaller pool of developers with experience in it
  • may require more setup and configuration than other databases
  • may not be the best choice for data warehousing

Prometheus

Use Cases

  • monitoring virtual machines, containers, services, and applications
  • gathering numeric metrics from a service that runs constantly
  • monitoring distributed, cloud-native applications

Advantages

  • multi-dimensional data model
  • flexible query language called PromQL to query the metrics data collected
  • pull model data collection over HTTP
  • alerting system
  • good for monitoring any kind of time-series data

Disadvantages

  • Subpar UI compared to other monitoring tools
  • requires additional instrumentation in the code to monitor applications and services
  • not as popular as InfluxDB for monitoring

DataStax

Use cases

  • Data Warehousing: wide-column databases, which are a type of time-based database, are optimized for data warehousing and business intelligence applications, where large amounts of data need to be analyzed and aggregated
  • Real-time Analytics: time-based databases can handle high-volume, high-velocity data and a high number of concurrent users, which makes them a choice for real-time analytics applications
  • Big Data: time-based databases can handle large datasets and provide efficient storage and retrieval of data, therefore, can be used for big data applications
  • Metrics and Events: time-based databases are built specifically for handling metrics and events or measurements that are time-stamped

Advantages

  • Optimized for Time-Stamped Data: time-based databases are built specifically for handling time-stamped data, which makes them more efficient and effective for certain use cases
  • Efficient Storage and Retrieval: time-based databases are optimized for measuring change over time and can efficiently store and retrieve large amounts of time-stamped data

Disadvantages

  • Limited Functionality: time-based databases are optimized for handling time-stamped data, which means they may not be as effective for other types of data
  • Scaling Challenges: some time-based databases may have scaling challenges, which can make it difficult to handle large amounts of data

QuestDB

Use cases

  • monitoring infrastructure, real-time analytics, dashboards, sensor data, application metrics, and financial market data

Advantages

  • High performance: QuestDB has been praised for its high performance, with the ability to ingest large amounts of data and query it
  • SQL support: QuestDB implements SQL and exposes a Postgres wire protocol, making it easy to use for those familiar with SQL
  • Column-based storage model: QuestDB implements a highly optimized column-based storage model, which allows for high throughput ingestion and faster query speeds
  • Relational modeling for timeseries data: QuestDB supports relational modeling for timeseries data, meaning you can write joins and use SQL queries to read your data

Disadvantages

  • Lack of replication: At the time of writing, QuestDB lacks replication, which can be a drawback for some use cases
  • Difficulty in setup: Compared to relational or NoSQL databases, time series databases like QuestDB can be more difficult to set up and use

AWS Timestream

Use cases

  • IoT applications: Timestream is designed to store and analyze trillions of events per day, making it ideal for IoT applications that generate large amounts of time-series data
  • Operational applications: Timestream can also be used for operational applications that require real-time analytics, such as monitoring and troubleshooting systems

Advantages

  • High performance at low cost: Timestream is designed to be fast and scalable, with a purpose-built query engine that allows for quick access to data. It is also serverless, which means that users only pay for what they use
  • Simplified data access: allows users to access recent and previous data using a single tool, eliminating the need for multiple tools
  • Purpose-built for time series: has built-in time series functions for smoothing, approximation, and interpolation, making it easier to analyze time-series data
  • Storage tiering: simplifies data lifecycle management with storage tiers, including a memory store for recent data and a magnetic store for historical data

Disadvantages

  • Limited functionality: Timestream is designed specifically for time-series data and does not support some of the features of traditional relational databases, such as joins and transactions
  • Slow performance: in a comparison with TimescaleDB, Timestream was found to be slower and more expensive
  • Missing key database capabilities: lacks some key database capabilities such as backups, restores, updates, and deletes

Aerospike

Use Cases

  • Real-time bidding: Aerospike can be used to store and retrieve data records that are part of a “time series,” making it ideal for real-time bidding applications
  • Ad targeting: Aerospike can be used to store and retrieve user data in real-time, allowing for more accurate ad targeting
  • Fraud detection: Aerospike can be used to store and retrieve data related to financial transactions, making it ideal for fraud detection
  • IoT: Aerospike can be used to store and retrieve data from IoT devices in real-time, allowing for more efficient and effective management of these devices

Advantages

  • High throughput and low latency: Aerospike’s high throughput and low latency make it an ideal cache replacement
  • Reduced server footprint: Aerospike can reduce server footprint by up to 80%
  • Real-time data: Aerospike provides real-time data, making it ideal for applications that require up-to-date information
  • Scalability: Aerospike is highly scalable, allowing it to handle large amounts of data

Disadvantages

  • Complexity: Aerospike can be complex to set up and manage, requiring specialized knowledge and expertise
  • Cost: Aerospike can be expensive, especially for large-scale deployments
  • Limited functionality: Aerospike is designed for specific use cases, and may not be suitable for applications that require more complex functionality
  • Limited community support: Aerospike has a smaller community than some other databases, which may make it more difficult to find support and resources

Raima Database Manager (RDM)

Use cases

  • is designed for use in embedded systems applications
  • can be used to access real-time information from sensors and devices to improve assembly inventory management and automate preventive maintenance reports

Advantages

  • decreased time to market
  • faster and more performant application overall
  • lower maintenance costs
  • RDM is an ACID-compliant embedded database management system designed for use in embedded systems applications
  • it has been designed to utilize multi-core computers, networking (local or wide area), and on-disk or in-memory storage management
  • provides support for query and report writing utility for a network model database
  • is fully customizable, lightweight, and reliable

Disadvatantages

  • lack of UI for monitoring REST API
  • costly

Graphite

Use cases

  • Graphite is useful for monitoring computer system performance
  • can be used to track and display metrics such as access metrics, failure metrics, process behavior, workload monitoring, and other types of analytics data
  • is also used for monitoring server metrics, application performance monitoring, sensor data, network data, and click rates

Advantages

  • Graphite is free and open-source
  • it is easy to install and configure
  • can handle large amounts of data and can sort through complex data
  • it is optimized for measuring change over time

Disadvantages

  • Graphite is not ideal for managing time series data that requires fast insertion and retrieval, as it takes longer to insert and retrieve time series data than other databases
  • does not have built-in collection, storage, visualization, and exporting capabilities like other time-series databases such as Prometheus
  • is not a purpose-built time-series database and may not be as efficient as other databases for managing time series data

OpenTSDB

Use cases

  • Software Systems Monitoring: OpenTSDB can be used to monitor virtual machines, containers, services, and applications
  • Time Series Analysis: OpenTSDB can provide a probability value on potential future outcomes rather than the actual project value
  • Concurrent Inputs: OpenTSDB can manage concurrent inputs

Advantages

  • Scalability: OpenTSDB is a distributed database that can scale horizontally
  • Compatibility: is compatible with HBase and can be used with Hadoop
  • User Interface: has a built-in user interface and allows direct access to time series data via a REST interface

Disadvantages

  • Complexity: OpenTSDB can be complex to set up and maintain
  • Limited Query Flexibility: allows a fixed query structure, which can limit query flexibility
  • Limited Data Access: is not designed for storing non-time series data

Final words

Choosing the right database for your metrics is crucial to ensure that you can store and analyze your data effectively. When selecting a database, consider your data access pattern, the data storage system, scalability, ease of use, and cost. Time-series databases are ideal for monitoring and alerting systems, IoT data, and financial data.

--

--

Roman Glushach
Roman Glushach

Written by Roman Glushach

Senior Software Architect & Engineer Manager at Freelance

No responses yet