System Design: CAP


A distributed system can satisfy at most two of the following three properties at the same time: Consistency, Availability, and Partition tolerance.

Consistency

all nodes see the same data at the same time

Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’
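A minimal sketch of this synchronous-replication rule, assuming a hypothetical in-memory `Replica` class (no networking, no failures): `write()` returns only after every replica has applied the value, so a read from any node sees it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a copy of the data (hypothetical in-memory model)."""
    name: str
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class SyncReplicatedStore:
    """Strongly consistent writes: a write succeeds only after every replica applied it."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Forward the write to all replicas before reporting success.
        for replica in self.replicas:
            replica.apply(key, value)
        return True  # the write is deemed 'successful' only at this point

    def read(self, key, replica_index):
        # A client may connect to any node; every node already holds the new value.
        return self.replicas[replica_index].read(key)


nodes = [Replica("n1"), Replica("n2"), Replica("n3")]
store = SyncReplicatedStore(nodes)
store.write("balance", 100)
print([store.read("balance", i) for i in range(3)])  # [100, 100, 100]
```

In a real system, if one replica cannot be reached the write under this rule cannot complete, which is exactly where the cost in availability comes from.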

Consistency only becomes a problem when there are multiple copies of the data and they are read and written concurrently.

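  • Strong consistency

    Relational databases require that once data has been updated, every subsequent access sees the update; this is strong consistency.

  • Weak consistency (not recommended): after a write, subsequent reads are not guaranteed to see the new value.

  • Eventual consistency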

Eventual consistency

  1. Your data is replicated on multiple servers
  2. Your clients can access any of the servers to retrieve the data
  3. Someone writes a piece of data to one of the servers, but it wasn't yet copied to the rest
  4. A client accesses the server with the data, and gets the most up-to-date copy
  5. A different client (or even the same client) accesses a different server (one which didn't get the new copy yet), and gets the old copy

Basically, because it takes time to replicate the data across multiple servers, requests to read the data might go to a server with a new copy, and then go to a server with an old copy. The term "eventual" means that eventually the data will be replicated to all the servers, and thus they will all have the up-to-date copy.
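The five steps above can be simulated with asynchronous replication. This is a minimal sketch, assuming a hypothetical `EventuallyConsistentStore` in which each write is acknowledged by one replica and queued for the others; `replicate_all()` stands in for background replication catching up.

```python
from collections import deque


class EventuallyConsistentStore:
    """Writes land on one replica; copies reach the others later (hypothetical model)."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # replication messages not yet delivered

    def write(self, key, value, replica_index):
        # Acknowledge as soon as the local replica has the value.
        self.replicas[replica_index][key] = value
        for i in range(len(self.replicas)):
            if i != replica_index:
                self.pending.append((i, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def replicate_all(self):
        # Deliver queued updates in order; afterwards every replica has the latest copy.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value


store = EventuallyConsistentStore(3)
store.write("x", "old", 0)
store.replicate_all()
store.write("x", "new", 0)   # step 3: written to server 0 only
print(store.read("x", 0))    # step 4: new
print(store.read("x", 1))    # step 5: old (stale copy)
store.replicate_all()        # "eventually": replication catches up
print([store.read("x", i) for i in range(3)])  # ['new', 'new', 'new']
```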

Eventually consistent databases can implement their read/write operations with lower latency than strongly consistent databases, because a write can be acknowledged before it has reached every replica.

https://distributedthoughts.com/2013/09/08/eventual-consistency/

Availability

Availability means that any client making a request for data gets a response, even if one or more nodes are down. Another way to state this: all working nodes in the distributed system return a valid response for any request, without exception.

Reads and writes always succeed; that is, the service remains available and responds within a normal response time.

Partition Tolerance

A partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.

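"The system continues to operate despite arbitrary message loss or failure of part of the system": that is, when a node fails or the network is partitioned, the distributed system can still provide service that satisfies consistency or availability.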


The CAP theorem tells us that consistency, availability, and partition tolerance cannot all be guaranteed at the same time.

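Systems that aim for high availability and high concurrency while tolerating partitions (AP) may have to give up C and settle for eventual consistency.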

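For anything involving money, however, C comes first. Since partitions cannot be ruled out, this effectively means giving up A rather than P (CP rather than CA): for example, during a network failure the system allows reads but rejects writes.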
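The trade-off can be sketched as the choice a node makes when a partition leaves it unable to reach a majority of the cluster. This is a minimal model under stated assumptions: the strict-majority check, the `Mode` names, and the read-only middle mode are hypothetical illustrations, not any particular database's behavior. A CP node refuses requests, an AP node keeps answering with possibly stale data, and the read-only mode mirrors the example of allowing reads but not writes during a network failure.

```python
from enum import Enum


class Mode(Enum):
    CP = "cp"                # refuse requests rather than risk inconsistent data
    AP = "ap"                # always answer, possibly with stale data
    READ_ONLY = "read_only"  # hypothetical middle ground: reads allowed, writes rejected


class Unavailable(Exception):
    """Raised when a node refuses a request during a partition."""


class Node:
    def __init__(self, cluster_size, mode):
        self.cluster_size = cluster_size
        self.mode = mode
        self.reachable = cluster_size  # nodes this one can contact, itself included
        self.data = {}

    def has_majority(self):
        # Assumption: a node trusts its data only if it can reach a strict majority.
        return self.reachable > self.cluster_size // 2

    def write(self, key, value):
        if not self.has_majority() and self.mode in (Mode.CP, Mode.READ_ONLY):
            raise Unavailable("partitioned: write rejected")
        self.data[key] = value  # an AP node accepts the write even on the minority side

    def read(self, key):
        if not self.has_majority() and self.mode is Mode.CP:
            raise Unavailable("partitioned: read rejected")
        return self.data.get(key)  # may be stale on the minority side


def attempt(fn, *args):
    """Call fn and turn a refusal into a readable result."""
    try:
        return fn(*args)
    except Unavailable as exc:
        return f"error: {exc}"


nodes = {mode: Node(5, mode) for mode in Mode}
for node in nodes.values():
    node.write("k", 1)
    node.reachable = 2  # a partition leaves this node on the minority side (2 of 5)

for mode, node in nodes.items():
    print(mode.name, attempt(node.read, "k"), attempt(node.write, "k", 2))
```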


CAP and NoSQL

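NoSQL databases are a fairly natural choice for distributed systems, since they are easier to scale horizontally than relational databases.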

CP

consistency and partition tolerance at the expense of availability

When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.

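e.g. MongoDB

Single master: by default the master (primary) accepts both read and write requests, so reads are strongly consistent by default.

(A read preference can also be configured so that data is read from secondary nodes, in which case reads may be stale.)

Sacrificed availability: when the primary becomes unavailable, a new primary is elected from the secondary nodes (preferring the one with the most recent operation log). Once the remaining secondaries have synced with the new primary, the cluster becomes available again. From the moment the old primary goes down until a new primary is elected, however, the database is unavailable (at least for writes).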
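A simplified sketch of this failover, assuming a hypothetical `ReplicaSet` class: the healthy member with the most recent oplog position wins, a majority of members must be healthy, and writes are rejected while there is no primary. This is not MongoDB's actual election protocol (which involves votes, terms, and member priorities); it only illustrates why the cluster cannot accept writes during the election window.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    oplog_position: int  # index of the last operation this member has applied
    healthy: bool = True


class ReplicaSet:
    """Hypothetical single-primary replica set; not MongoDB's real election protocol."""

    def __init__(self, members):
        self.members = members
        # Start with the most up-to-date member as primary.
        self.primary = max(members, key=lambda m: m.oplog_position)

    def write(self):
        if self.primary is None:
            raise RuntimeError("no primary: writes unavailable")
        # Only the primary's log advances here; replication to secondaries is omitted.
        self.primary.oplog_position += 1

    def fail_primary(self):
        self.primary.healthy = False
        self.primary = None  # the window in which the cluster cannot accept writes

    def elect(self):
        candidates = [m for m in self.members if m.healthy]
        if len(candidates) <= len(self.members) // 2:
            raise RuntimeError("no majority: cannot elect a primary")
        # Prefer the member with the most recent operation log.
        self.primary = max(candidates, key=lambda m: m.oplog_position)
        # Remaining secondaries catch up to the new primary before service resumes.
        for m in candidates:
            m.oplog_position = self.primary.oplog_position
        return self.primary.name


members = [Member("a", 10), Member("b", 9), Member("c", 7)]
rs = ReplicaSet(members)
print(rs.primary.name)  # a
rs.fail_primary()
try:
    rs.write()
except RuntimeError as exc:
    print(exc)  # no primary: writes unavailable
print(rs.elect())  # b
rs.write()  # accepted again by the new primary
```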

AP

availability and partition tolerance at the expense of consistency

When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)

e.g. CouchDB: eventual consistency, allowing clients to write to any node at any time and reconciling inconsistencies as quickly as possible.
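A minimal sketch of "keep accepting writes, then resync when the partition heals". It uses a simple last-writer-wins rule with a logical counter, which is a swapped-in simplification: CouchDB itself tracks document revisions and records conflicts rather than silently discarding the losing write. `APReplica` and `heal` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class APReplica:
    """Accepts writes at any time; each value carries a (counter, node_id) version."""
    node_id: str
    data: dict = field(default_factory=dict)  # key -> (version, value)
    clock: int = 0

    def write(self, key, value):
        self.clock += 1
        self.data[key] = ((self.clock, self.node_id), value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Last-writer-wins: keep the entry with the larger (counter, node_id) version.
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[0] > mine[0]:
                self.data[key] = entry
        self.clock = max(self.clock, other.clock)


def heal(a, b):
    """Resync two replicas in both directions once they can talk again."""
    a.merge_from(b)
    b.merge_from(a)


left, right = APReplica("left"), APReplica("right")
left.write("cart", ["book"])
heal(left, right)

# Partition: both sides stay available and accept conflicting writes.
left.write("cart", ["book", "pen"])
right.write("cart", ["book", "lamp"])
print(left.read("cart"), right.read("cart"))  # divergent copies

heal(left, right)
print(left.read("cart"), right.read("cart"))  # ['book', 'lamp'] ['book', 'lamp']
```

Both sides converge, but the left side's write is lost; that is the price of this naive rule and the reason real AP databases tend to keep conflicting versions for later resolution.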

CA

consistency and availability, without fault tolerance (i.e. no partition tolerance)

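In reality, however, partitions are unavoidable in a distributed system, so in theory a true CA system generally does not exist. In practice, relational databases are often regarded as CA, with multiple nodes deployed and data synchronized between them.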


Understanding the CAP theorem can help you choose the best database when designing a microservices-based application running from multiple locations. For example, if the ability to quickly iterate the data model and scale horizontally is essential to your application, but you can tolerate eventual (as opposed to strict) consistency, an AP database like Cassandra or Apache CouchDB can meet your requirements and simplify your deployment. On the other hand, if your application depends heavily on data consistency—as in an eCommerce application or a payment service—you might opt for a relational database like PostgreSQL.


https://www.ibm.com/cloud/learn/cap-theorem

https://robertgreiner.com/cap-theorem-revisited/

https://medium.com/@bikas.katwal10/mongodb-vs-cassandra-vs-rdbms-where-do-they-stand-in-the-cap-theorem-1bae779a7a15

https://towardsdatascience.com/cap-theorem-and-distributed-database-management-systems-5c2be977950e
