


Parallel aggregations

This document describes the design of a mechanism that parallelizes execution of aggregation queries.

Architecture overview

A new level of coordination has been added. A node called the super-coordinator splits an aggregation query into sub-queries and distributes them across a group of coordinators. The super-coordinator is also responsible for merging the results.
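The merge step for a count is a plain sum of the partial results. The sketch below illustrates only that idea; `merge_counts` is a hypothetical name, and the sketch assumes each coordinator has already returned its partial count as an integer.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: merges the partial counts returned by the
// coordinators into the final count(*) result by summing them.
std::uint64_t merge_counts(const std::vector<std::uint64_t>& partial_counts) {
    return std::accumulate(partial_counts.begin(), partial_counts.end(),
                           std::uint64_t{0});
}
```

Summation is associative and commutative, so the order in which sub-query results arrive does not affect the merged value.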

Detection

Speeding up count(*) queries first requires detecting which queries have a count(*) selector. Because this pull request is a proof of concept, detection is rudimentary: it catches only the simplest count(*) queries (those with exactly one selector and no column name specified).
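As a rough illustration of that check, the sketch below tests a simplified select clause for the single count(*) case. The `selector` struct and `is_simple_count_star` are hypothetical and do not correspond to the real cql3 classes; they only model "one selector, no column name".

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of one entry in a SELECT clause.
struct selector {
    std::string function_name;          // e.g. "count"; empty for a plain column
    std::optional<std::string> column;  // std::nullopt models the "*" argument
};

// Returns true only for the simplest case: exactly one selector,
// which is count with no column name.
bool is_simple_count_star(const std::vector<selector>& select_clause) {
    if (select_clause.size() != 1) {
        return false;
    }
    const selector& s = select_clause.front();
    return s.function_name == "count" && !s.column.has_value();
}
```

Under this model, a clause such as count(v) or a count(*) listed next to another selector is rejected and would follow the regular execution path.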

Delegation

After a query is detected as a count(*), it is split into sub-queries that are sent to other coordinators. Splitting was the easy part: it is achieved by limiting the original query’s partition ranges. Sending the modified query to another node was much harder. The easiest approach would be to send the whole cql3::statements::select_statement. Unfortunately, cql3::statements::select_statement can’t be [de]serialized, so sending it was out of the question. Worse, some of its non-[de]serializable members are required to start executing the statement. In the end, I decided to send a query::read_command paired with the required [de]serializable members. Objects that cannot be [de]serialized (such as the query’s selector) are mocked on the receiving end.
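The pattern of sending only serializable data and rebuilding the rest on the receiver can be sketched as follows. Every name here (`sub_query_payload`, `count_selector`, `serialize`, `deserialize`, `execute_on_receiver`) is hypothetical, the text encoding is a toy one that assumes names contain no whitespace, and none of it reflects the actual query::read_command layout or Scylla's serialization.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token range, treated as (start, end].
using token_range = std::pair<std::int64_t, std::int64_t>;

// Hypothetical wire payload: only plain, serializable data travels.
struct sub_query_payload {
    std::string keyspace;
    std::string table;
    std::vector<token_range> ranges;
};

// Hypothetical stand-in for the non-serializable selector.
struct count_selector {
    std::uint64_t rows = 0;
    void add_row() { ++rows; }
};

// Toy whitespace-separated encoding (an assumption, not Scylla's format).
std::string serialize(const sub_query_payload& p) {
    std::ostringstream out;
    out << p.keyspace << ' ' << p.table << ' ' << p.ranges.size();
    for (const auto& [start, end] : p.ranges) {
        out << ' ' << start << ' ' << end;
    }
    return out.str();
}

sub_query_payload deserialize(const std::string& wire) {
    std::istringstream in(wire);
    sub_query_payload p;
    std::size_t n = 0;
    in >> p.keyspace >> p.table >> n;
    p.ranges.resize(n);
    for (auto& [start, end] : p.ranges) {
        in >> start >> end;
    }
    return p;
}

// Receiver side: the selector is constructed locally instead of being sent.
// row_tokens stands in for the tokens of rows stored on the receiving node.
std::uint64_t execute_on_receiver(const std::string& wire,
                                  const std::vector<std::int64_t>& row_tokens) {
    const sub_query_payload p = deserialize(wire);
    count_selector selector;  // mocked on the receiving end
    for (std::int64_t t : row_tokens) {
        for (const auto& [start, end] : p.ranges) {
            if (t > start && t <= end) {
                selector.add_row();
                break;
            }
        }
    }
    return selector.rows;
}
```

Mocking works in this sketch because the receiver already knows the query is a count(*), so the selector carries no query-specific state that would need to cross the wire.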

Distributing

When a super-coordinator receives a count(*) query, it splits it into sub-queries. It does so by splitting the original query’s partition ranges into a list of vnodes, grouping them by owner, and creating sub-queries whose partition ranges are set to the successive results of that grouping. Once created, each sub-query is sent to the owner of its partition ranges. The owner dispatches the received sub-query to all of its shards. Each shard slices the sub-query’s partition ranges so that it queries only the data it owns. Each shard then becomes a coordinator and executes the sub-query prepared this way.
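The grouping step can be sketched as below. It covers only the grouping of vnode ranges by owner, not the per-shard slicing. The `vnode` struct and `group_by_owner` are hypothetical, and owners are modelled as plain strings rather than real node addresses.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token range.
using token_range = std::pair<std::int64_t, std::int64_t>;

// Hypothetical vnode: a token range plus the node that owns it.
struct vnode {
    token_range range;
    std::string owner;
};

// Groups vnode ranges by owner. Each map entry holds the partition
// ranges of one sub-query, to be sent to that owner.
std::map<std::string, std::vector<token_range>>
group_by_owner(const std::vector<vnode>& vnodes) {
    std::map<std::string, std::vector<token_range>> sub_queries;
    for (const auto& v : vnodes) {
        sub_queries[v.owner].push_back(v.range);
    }
    return sub_queries;
}
```

For a ring where node A owns two vnodes and node B owns one, this yields two sub-queries: one for A carrying two ranges and one for B carrying a single range.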

© 2022, ScyllaDB. All rights reserved.
Last updated on 28 June 2022.