mongoDB sharding

mongoDB Sharding

Why - 横向拓展

Sharding in MongoDB is designed to do just that: partition your database into smaller pieces so that no single machine has to store all the data or handle the entire load

Overview

MongoDB uses sharding to support deployments with very large data sets and high throughput operations

scaling , 但是会有administrative and performance overhead。

  • storage distribution
  • load distribution
Screen Shot 2020-11-08 at 9.19.44 PM

主要的组件:

  • shards
  • mongos routers
  • config servers

shards用来存储数据;只有mongos router可以直接连接shards,mongos router 缓存了cluster metadata 这样可以route write/read 到正确到的shard 上;config server 作用 - persist shard的 meta data

Sharding Strategy

  • Hashed Sharding
Screen Shot 2020-11-08 at 10.57.29 PM

​ hash based shard 能够确保evenly distributed,但是range based query 很少能落到同一个shard,意味着需要更多的 cluster wider broadcast

  • ranged sharding

    Screen Shot 2020-11-08 at 11.09.11 PM

    A range of shard keys whose values are “close” are more likely to reside on the same chunk.

    因此要注意sharding key的选取策略。

Sharding Type

  • 在database 层面
  • 在collection层面 (常用)

collection sharding:

根据sharding key 来讲doc 进行逻辑分组。

// database name cloud-docs
sh.enableSharding("cloud-docs")

// cloud-docs.spreadsheets collection , 以username+_id compound key作为sharding key
sh.shardCollection("cloud-docs.spreadsheets", {username: 1, _id: 1})
Screen Shot 2020-11-08 at 9.29.36 PM

splits and migration

保持cluster balanced

Splitting is the process of dividing a chunk into two smaller chunks. This happens when a chunk exceeds the maximum chunk size, currently 64 MB by default.

Migrating is the process of moving chunks between shards. When some shards have significantly more chunks than others, this triggers something called a migration round.

query routing

如果query 包含了 shard key, mongos 能够非常容易地得出对应的shards 信息;如果query 没有包含shard key, 则需要遍历所有shard

Screen Shot 2020-11-08 at 9.50.10 PM

Indexing

Each shard maintains its own indexes.

Choose shard key

主要是针对range sharding 的策略。

主要是防止hotspots,也就是防止数据聚集到同一shard

One is how reads are targeted, the second is how well writes are distributed, and the last is how effectively MongoDB can split and migrate chunks in your collection.

比如objectID, ascending 这样new doc 都会到最大的trunk, 比如username ,粒度太粗可能导致很多doc 聚集到同一shard。 另外要根据application的需求,也就是query 会通常包含那个key,如果query key 不包含sharding key 会需要到所有的shards 上来check。

一个比较合理的shard key: username + id, 通常read query 会包含username,并且username alphabet 排序可以较为平均地写分布;id是uniquer 因此保证shard 粒度


https://docs.mongodb.com/manual/sharding/