CAP Paper

纯属个人学习时翻译，英语水平有限,可能会有用词语意不准的情况，可看原文。

Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Abstract

When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

当设计分布式web服务时，通常有三个特性是被渴望的:一致性，可用性以及分区容错性。同时实现这三个特性是不可能的，我们在异步网络模型中证明这个猜想，然后在部分同步模型中，讨论这种窘境的解决方案。

Introduction

At PODC 2000, Brewer , in an invited talk , made the following conjecture: it is impossible for a web service to provide the following three guarantees:

Consistency

Availability

Partition-tolerance

All three of these properties are desirable – and expected – from real-world web services. In this note, we will ﬁrst discuss what Brewer meant by the conjecture; next we will formalize these concepts and prove the conjecture;ﬁnally, we will describe and attempt to formalize some real-world solutions to this practical diﬃculty.

Most web services today attempt to provide strongly consistent data. There has been signiﬁcant research designing ACID databases, and most of the new frameworks for building distributed web services depend on these databases. Interactions with web services are expected to behave in a transactional manner: operations commit or fail in their entirety (atomic), committed transactions are visible to all future transactions (consistent), uncommitted transactions are isolated from each other (isolated), and once a transaction is committed it is permanent (durable). It is clearly important, for example, that billing information and commercial transaction records be handled with this type of strong consistency.

Web services are similarly expected to be highly available. Every request should succeed and receive a response. When a service goes down, it may well create signiﬁcant real-world problems; the classic example of this is the potential legal diﬃculties should the E-Trade web site go down. This problem is exacerbated by the fact that a web-site is most likely to be unavailable when it is most needed. The goal of most web services today is to be as available as the network on which they run: if any service on the network is available, then the web service should be accessible.

Finally, on a highly distributed network, it is desirable to provide some amount of fault-tolerance. When some nodes crash or some communication links fail, it is important that the service still perform as expected. One desirable fault tolerance property is the ability to survive a network partitioning into multiple components. In this note we will not consider stopping failures, though in some cases a stopping failure can be modeled as a node existing in its own unique component of a partition.

在 PODC 2000 ,Brewer在一次受邀讲演中做了如下的猜想:对于一个web服务来说提供下面三个保证是不可能的

一致
可用
分区容错

所有的这三个特性都被现实世界中的web服务所渴求并期望。在这篇论文中，我们首先讨论Brewer猜想的含义，然后形式化这些概念并且证明这个猜想，最后我们会描述并尝试对这一实际的困难给出一个现实世界中的解决方案。

现在的大部分web服务都试图提供强一致的数据，在设计ACID数据库方面已经有了重要的研究，而且大部分新的构建分布式web服务的框架依赖于这些数据库。web服务之间的交互被期望以事务的形式表现:操作要么全部提交要么全部失败(原子性)，已提交的事务对将来的事务来说都是可见的(一致性)，未提交的事务和其他的每个事务都隔离(隔离性),一旦事务已提交，就不变化(持久性)。这些特性显然很重要，举个例子，数亿信息和商业交易记录都在这种强一致的情况下被处理。

Web服务同样被期望高可用。任意一个请求都应该成功并接收到响应。一个服务当机可能会产生有显著影响的现实世界问题，典型例子是电子贸易网站宕机可能带来的法律问题。当一个网站在最需要web服务的时候，它很可能是不可用的，这一事实加剧了这个问题。当今大多数web服务的目标是与它们运行的网络一样可用:如果网络上有任何服务可用，那么web服务应该是可访问的。

最后，在一个高度分布的网络中，提供一定程度容错是理想的。当一些结点当机或者一些通信链路失效，服务的如期运行是重要的。一个理想的容错特性是在网络划分多个组件时生存下来的能力。在这篇论文中，我们不考虑stopping failures,通过一些情景一个stopping failure可以建模为节点存在在分区中唯一的一个组件中。

Formal Model

In this section, we will formally deﬁne what is meant by the terms consistent, available, and partition tolerant.

这一部分我们正式定义术语consistent,available和partition tolerant的含义

Atomic Data Objects

The most natural way of formalizing the idea of a consistent service is as an atomic data object. Atomic , or linearizable, consistency is the condition expected by most web services today. Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time. One important property of an atomic read/write shared memory is that any read operation that begins after a write operation completes must return that value, or the result of a later write operation. This is the consistency guarantee that generally provides the easiest model for users to understand, and is most convenient for those attempting to design a client application that uses the distributed service. See for a more complete deﬁnition of atomic consistency.

把一致服务的概念作为 atomic data object，是最自然的形式化方式。原子性或可串行性，一致性被大部分现今的web服务所期望。在一致性的保证下，there must exist a total order on all operations such that each operation looks as if it were completed at a single instant。这等价于要求分布式共享内存中的请求表现的就像在一个结点上执行一样,一次响应一个操纵。共享内存中一个原子的读/写的一个重要的特性是任意一个在写操作完成后的读操作都必须返回那个值或者后续的写操作的结果。这是一致性保证，通常为用户提供了最容易理解的模型，对于那些试图设计使用分布式服务的客户端应用程序的人来说，这是最方便的。

Available Data Objects

For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate. In some ways this is a weak deﬁnition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation. On the other hand, when qualiﬁed by the need for partition tolerance, this can be seen as a strong deﬁnition of availability: even when severe network failures occur, every request must terminate.

对于一个持续可用的分布式系统，每一个被没有故障结点接收到的请求必须有响应结果。意思就是任何服务使用的算法最终都能终止。在某些方面，这是可用性的弱定义，它没有定义算法在终止之前的运行时间的边界，因此允许无边界的计算。在其他方面，当符合分区容错要求时，这可以视为可用性的强定义，甚至当发生服务的网络失败时，每个请求必须终止。

Partition Tolerance

The above deﬁnitions of availability and atomicity are qualiﬁed by the need to tolerate partitions. In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost. (And any pattern of message loss can be modeled as a temporary partition separating the communicating nodes at the exact instant the message is lost.)The atomicity requirement ( § 2.1) therefore implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered. The availability requirement ( § 2.2) implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost. Note that this is similar to wait-free termination in a pure shared-memory system: even if every other node in the network fails (i.e. the node is in its own unique component of the partition), a valid (atomic) response must be generated. No set of failures less than total network failure is allowed to cause the system to respond incorrectly.5

上述定义的可用性和原子性符合分区容错的要求。为了建模分区容错，网络被允许任意的丢弃从一个节点发送到另一个节点的多个消息。当网络是分区的，一个

Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web