Quantum Nonlocality without Entanglement
Abstract
We exhibit an orthogonal set of product states of two threestate particles that nevertheless cannot be reliably distinguished by a pair of separated observers ignorant of which of the states has been presented to them, even if the observers are allowed any sequence of local operations and classical communication between the separate observers. It is proved that there is a finite gap between the mutual information obtainable by a joint measurement on these states and a measurement in which only local actions are permitted. This result implies the existence of separable superoperators that cannot be implemented locally. A set of states are found involving three twostate particles which also appear to be nonmeasurable locally. These and other multipartite states are classified according to the entropy and entanglement costs of preparing and measuring them by local operations.
pacs:
03.67.Hk, 03.65.Bz, 03.67.a, 89.70.+cI Introduction
The most celebrated manifestations of quantum nonlocality arise from entangled states—states of a compound quantum system that admit no description in terms of states of the constituent parts. Entangled states, by their experimentally confirmed violations of Belltype inequalities, provide strong evidence for the validity of quantum mechanics, and they can be used for novel forms of information processing, such as quantum cryptography [ekert], entanglementassisted communication [bfs, cleve98], and quantum teleportation [bbcjpw], and for fast quantum computations[PS, Grover], which pass through entangled states on their way from a classical input to a classical output. A related feature of quantum mechanics, also giving rise to nonclassical behavior, is the impossibility of cloning[WZ] or reliably distinguishing nonorthogonal states. Quantum systems that for one reason or another behave classically (e.g., because they are of macroscopic size or are coupled to a decohering environment) can generally be described in terms of a set of orthogonal, unentangled states.
In view of this, one might expect that if the states of a quantum system were limited to a set of orthogonal product states, the system would behave entirely classically, and would not exhibit any nonlocality. In particular, if a compound quantum system, consisting of two parts and held by separated observers (Alice and Bob), were prepared by another party in one of several mutually orthogonal, unentangled states, unknown to Alice and Bob, then it ought to be possible to reliably discover which state the system was in by locally measuring the separate parts. Also, it ought to be possible to clone the state of the whole by separately duplicating the state of each part. We show that this is not the case, by exhibiting sets of orthogonal, unentangled states of twoparty and threeparty systems such that

the states can be reliably distinguished by a joint measurement on the entire system, but not by any sequence of local measurements on the parts, even with the help of classical communication between the observers holding the separate parts;

the cloning operation cannot be implemented by any sequence of local operations and classical communication.
Some of the features of this new kind of nonlocality appeared first in [PV], which presented a set of orthogonal states of a bipartite system that cannot be cloned if Alice and Bob cannot communicate at all. However, the states in [PV] can be cloned if Alice and Bob use oneway classical communication.
Many more of the nonlocal properties considered in the present work were anticipated by the measurement protocol introduced by Peres and Wootters[PW]. Their construction indicates the existence of a nonlocality dual to that manifested by entangled systems: entangled states must be prepared jointly, but exhibit anomalous correlations when measured separately; the PeresWootters states are unentangled, and can be prepared separately, but exhibit anomalous properties when measured jointly. We note that such anomalies are at the heart of recent constructions for attaining the highest possible capacity of a quantum channel for the transmission of classical data[Holevo79, Holevo96, Schumacher97a, Haus].
In the PeresWootters scheme, the preparator chooses one of three linear polarization directions 0, 60, or 120 degrees, and gives Alice and Bob each one photon polarized in that direction. Their task is to determine which of the three polarizations they have been given by a sequence of separate measurements on the two photons, assisted by classical communication between them, but they are not allowed to perform joint measurements, nor to share entanglement, nor to exchange quantum information.
Of course, because the three twophoton states are nonorthogonal, they cannot be cloned or reliably distinguished, even by a joint measurement. However, Peres and Wootters performed numerical calculations which provided evidence (more evidence on an analogous problem was provided by the work of Massar and Popescu[MP]) indicating that a single joint measurement on both particles yielded more information about the states than any sequence of local measurements. Thus unentangled nonorthogonal states appear to exhibit a kind of quantitative nonlocality in their degree of distinguishibility. The discovery of quantum teleportation, incidentally, grew out of an attempt to identify what other resource, besides actually being in the same place, would enable Alice and Bob to make an optimal measurement of the PeresWootters states.
Another antecedent of the present work is a series of papers[GV, Asher, Tal98] resulting in the conclusion[Tal98] that several forms of quantum key distribution[BB84] can be viewed as involving orthogonal states of a seriallypresented bipartite system. These states cannot be reliably distinguished by an eavesdropper because she must let go of the first half of the system before she receives the second half. In this example, the serial timeordering is essential: if, for example, the two parts were placed in the hands of two separate classicallycommunicating eavesdroppers, rather than being serially presented to one eavesdropper, the eavesdroppers could easily cooperate to identify the state and break the cryptosystem.
In this paper we report a form of nonlocality qualitatively stronger than either of these antecedents. We extensively analyze an example in which Alice and Bob are each given a threestate particle, and their goal is to distinguish which of nine product states, the composite quantum system was prepared in. Unlike the PeresWootters example, these states are orthogonal, so the joint state could be identified with perfect reliability by a collective measurement on both particles. However, the nine states are not orthogonal as seen by Alice or Bob alone, and we prove that they cannot be reliably distinguished by any sequence of local measurements, even permitting an arbitrary amount of classical communication between Alice and Bob. We call such a set of states “locally immeasurable” and give other examples, e.g., a set of two mixed states of two twostate particles (qubits), and sets of four or eight pure states of three qubits, which apparently cannot be reliably distinguished by any local procedure despite being orthogonal and unentangled.
In what sense is a locally immeasurable set of states “nonlocal?” Surely not in the usual sense of exhibiting phenomena inexplicable by any local hidden variable (LHV) model. Because the are all product states, it suffices to take the local states and , on Alice’s and Bob’s side respectively, as the local hidden variables. The standard laws of quantum mechanics (e.g. Malus’ law), applied separately to Alice’s and Bob’s subsystems, can then explain any local measurement statistics that may be observed. However, an essential feature of classical mechanics, not usually mentioned in LHV discussions, is the fact that variables corresponding to real physical properties are not hidden, but in principle measurable. In other words, classical mechanical systems admit a description in terms of local unhidden variables. The locally immeasurable sets of quantum states we describe here are nonlocal in the sense that, if we believe quantum mechanics, there is no local unhidden variable model of their behavior. Thus a measurement of the whole can reveal more information about the system’s state than any sequence of classically coordinated measurements of the parts.
The inverse of local measurement is local preparation, the mapping from a classicallyprovided index to the designated state , by local operations and classical communication. If the states are unentangled, local preparation is always possible, but for any locallyimmeasurable set of states this preparation process is necessarily irreversible in the thermodynamic sense, i.e., possible only when accompanied by a flow of entropy into the environment. Of course if quantum communication or global operations were allowed during preparation, the preparation could be done reversibly, provided that the states being prepared are orthogonal.
By eliminating certain states from a locallyimmeasurable set (such as in Eq. (3) below), we obtain what appears to be a weaker kind of nonlocality, in which the remaining subset of states is both locally preparable and locally measurable, but in neither case (so far as we have been able to discover) by a thermodynamically reversible process. Curiously, in these situations, the entropy of preparation (by the best protocols we have been able to find) exceeds the entropy of measurement.
Besides entropies of preparation and measurement we have explored other quantitative measures of nonlocality for unentangled states. One obvious measure is the amount of quantum communication that would be needed to render an otherwise local measurement process reliable. Another is the mutual information deficit when one attempts to distinguish the states by the best local protocol. Finally one can quantify the amount of advice, from a third party who knows , that would be sufficient to guide Alice and Bob through an otherwise local measurement procedure.
The results of this paper also have a bearing on, and were directly motivated by, a question which arose recently in the context of a different problem in quantum information processing. This is the problem of entanglement purification, in which Alice and Bob have a large collection of identical bipartite mixed states that are partially entangled. Their object is to perform a sequence of operations locally, i.e., by doing quantum operations on their halves of the states and communicating classically, and end up with a smaller number of pure, maximally entangled states. Recently, bounds on the efficiency of this process have been studied by Rains[Rains] and by Vedral and Plenio[VP]; other constraints on entanglement purification by separable superoperators have recently been studied by Horodecki et al.[Horod].
In their work, they represent the sequence of operations using the theory of superoperators, which can describe any combination of unitary operations, interactions with an ancillary quantum system or with the environment, quantum measurement, classical communication, and subsequent quantum operations conditioned on measurement results. In the operatorsum representation of superoperators developed by Kraus and others, the general final state of the density operator of the system is written as a function of the initial state as:
(1) 
The operators appearing in this equation will be referred to as “operation elements.” A tracedecreasing superoperator satisfies the condition and is appropriate for describing the effect of arbitrary quantum measurements on the system ([Bar], Sec. III), while a tracepreserving superoperator specified by describes a general time evolution of the density operator if a measurement is not made or its outcomes are ignored[supops]. Reference [Niels] has a useful general review of the superoperator formalism.
To impose the constraint that Alice and Bob act only locally, Rains, and Vedral and Plenio, restricted their attention to separable superoperators, in which the operation elements have a direct product form involving an Alice operation and a Bob operation:
(2) 
We will show in Sec. II.2 (see also [Bar], Sec. IX.C) that all operations that Alice and Bob can perform during entanglement purification bilocally, in which they can perform local quantum operations and communicate classically, can be written in this separable form. This was enough for the derivation of valid upper bounds on the efficiency of entanglement purification. But the natural question which this led to is the converse, that is, can all separable superoperators be implemented by bilocal operations?
The answer to this question is definitely no, as a result of the examples which we analyze in this paper. Quantum measurements are a subset of the superoperators, and measurements involving only product states are separable superoperators. Thus, our proof that some unentangled states cannot be distinguished locally shows that some separable superoperators cannot be implemented by only separate operations by Alice and Bob with classical communication between them. This indicates that any further investigations of entanglement purification protocols involving separable superoperators will have to be performed with some caution.
This paper is organized as follows: Section II presents the example and sketches the proof that these states cannot be distinguished by local measurements. Appendix LABEL:appaa gives many of the important details of this proof, and Appendix LABEL:peter supplies a crucial technical detail, that all superoperators can be decomposed into a sequence of very weak measurements. Section III shows how the measurement can be done locally if some states are excluded, and presents the best measurement strategy we have found for distinguishing (imperfectly) all nine states. Section IV shows how the measurement can be done for the example if entanglement is supplied. Section V analyzes the thermodynamics of local state measurement, studying the heat generated in measurement and in state preparation; Appendix LABEL:appd gives some details. Section VI analyzes a threeparty example involving 8 pure states. Section VII gives other compact examples (4 pure states in a system, 2 mixed states in a system) and poses some questions for the future (Appendix LABEL:appc gives details of a specific problem considered there).
Ii A separable measurement which is not bilocal
ii.1 The ensemble of states in a 33 Hilbert space
We will consider the following complete, orthonormal set of product states . They live in a ninedimensional Hilbert space, with Alice and Bob each possessing three dimensions. We will use the notation , , and for the bases of Alice’s and Bob’s Hilbert spaces. The orthonormal set is
(3) 
Here stands for , etc. Figure 1 shows a suggestive graphical way to depict the 9 states of Eq. (3) in the Hilbert space of Alice and Bob. The four dominoes represent the four pairs of states that involve superpositions of the basis states. State is clearly special in that it involves no such superposition.
ii.2 The measurement
We will show that the separable superoperator consisting of the projection operators
(4) 
cannot be performed by local operations of Alice and Bob, even allowing any amount of classical communication between them. In Eq. (4), the output Hilbert space is different from the input; it is a space in which both Alice and Bob separately have a complete and identical record of the outcome of the measurement. See Sec. VII for a discussion of why we use the particular form of Eq. (4) for the operator; note that the input state need not be present at the output in Eq. (4).
Since this superoperator corresponds to a standard von Neumann measurement, we can equally well consider the problem in the form of the following game: Alice and Bob are presented with one of the nine orthonormal product states (for the time being, with equal prior probabilities, let us say—this is not important, it is only important that the prior probabilities of states through be nonzero). Their job is to agree on a measurement protocol with which they can determine, with vanishingly small error, which of the nine states it is, adhering to a bilocal protocol.
Let us characterize bilocal protocols a little more explicitly. Our discussion will apply both to bilocal measurements and to bilocal superoperators (in which the measurement outcomes may be traced out). By prior agreement one of the parties, let us say Alice, initiates the sequence of operations. The most general operation that she can perform locally is specified by the set of operation elements
(5) 
We will immediately specialize to the case where each value labels a distinct “round 1” measurement outcome which she will report to Bob, since no protocol in which she withheld any of this information from Bob could have greater power. She cannot act on Bob’s state, so her operators are always the identity on his Hilbert space. can also include any unitary operation that Alice may perform before or after the measurement. Note also that the operator may not be a square matrix; the final Hilbert space dimension may be smaller (but this would never be useful) or larger (because of the introduction of an ancilla) than the original.
After the record is reported to Bob, he does his own operation
(6) 
The only change from round 1 is that Bob’s operations can be explicit functions of the measurements reported in that round. Now, the process is repeated. The overall set of operation elements specifying the net operation after rounds is given by multiplying out a sequence of these operations:
(7)  
(8)  
(9) 
Here the label can be thought of as a concatenation of all the data collected through the rounds of measurement:
(10) 
Equations (79) demonstrate the fact that all bilocal operations are also separable operations. It is the converse statement that we are about to disprove for the operator corresponding to the ninestate measurement, Eq. (4).
We can get some intuitive idea of why it will be hard for Alice and Bob to perform Eq. (4) by local operations by noting the result if Alice and Bob perform simple, local von Neumann measurements in any of their rounds. These measurements can be represented on the “tictactoe” board of Fig. 1 as simple horizontal or vertical subdivisions of the board. The fact that any such subdivision cuts apart one of the dominoes shows very graphically that after such an operation the distinguishability of the states is spoiled. This spoiling occurs in any local bases, and is more formally just a reflection of the fact that the ensemble of states as seen by Alice alone, or by Bob alone, is nonorthogonal.
However, it is not sufficient to show the impossibility of performing Eq. (4) using a succession of local von Neumann measurements, as Alice and Bob have available to them an infinite set of weak measurement strategies[Aharonov]. Much more careful reasoning is required to rule out any such strategy. In the remainder of this section we present the details of this proof, which also results in a computation of an upper bound on the amount of information Alice can Bob can obtain when attempting to perform the ninestate measurement bilocally.
ii.3 Summary of the proof
We assume that Alice and Bob have settled on a bilocal protocol with which they will attempt to complete the measurement as well as possible. We identify the moment in the execution of this measurement when Alice and Bob have accumulated a specific amount of partial information. We will have to show that it is always possible to identify this moment, either in Alice and Bob’s protocol or in an equivalent protocol which can always be derived from theirs. We then show, based on the specific structure of the nine states, that at this moment the nine possible input states must have become nonorthogonal by a finite amount. We then present an informationtheoretic analysis of the mutual information obtainable in the complete measurement, and show, using an accessibleinformation bound, that the mutual information obtainable by Alice and Bob twolocally is less, by a finite amount, than the information obtained from a completely nonlocal measurement.
Now we present the steps of this proof in detail.
ii.4 Information accumulation and the modified continuous protocol
If the measurement has proceeded to a point where measurement record has been obtained, an inference can be made using Bayes’ theorem of the probability that the input state was :
(11) 
We take all prior probabilities to be equal to , so they will drop out of this equation. The measurement probabilities are given by the standard formula
(12) 
Here is the operation element of Eq. (7); the quantum state in Alice’s and Bob’s possession has been transformed to
(13) 
We imagine monitoring these prior probabilities every time a new round is added to the measurement record in Eq. (10). We will divide the entire measurement into two stages, I and II; “stage I” of the measurement is declared to be complete when , for some , equals a particular value (the choice of this value is discussed in detail in the next subsection). “Stage II” is defined as the entire operation from the end of stage I to the completion of the protocol.
There is a problem with this, however: the measurement record changes by discrete amounts on each round, and it is quite possible for these probabilities to jump discontinuously when a new datum is appended to this measurement record of Eq. (10). Thus, it is likely that the probabilities will never attain any particular value, but will jump past it at some particular round. The probabilities would evolve continuously only if Alice and Bob agree on a protocol involving only weak measurements, for which all the and of Eqs. (8,9) are approximately proportional to the identity operator. But, in an attempt to thwart the proof about to be given, Alice and Bob may agree on a protocol which has both weak measurements and strong measurements (for which the operators of Eqs. (8,9) are not approximately proportional to the identity).
However, such a strategy will never be helpful for Alice and Bob, because, for any bilocal measurement protocol which they formulate involving any combination of weak and strong measurements, a modified measurement protocol exists that involves only weak measurements for which the amount of information extracted by the overall measurement is exactly the same. For this modified protocol an appropriate completion point for “stage I” of the measurement can always be identified. Thus we can prove, by the steps described below, that the modified protocol cannot be completed successfully by bilocal operations, and we give a bound on the attainable mutual information of such a measurement. But, since the modified protocol is constructed to have the same measurement fidelity as the original one, this proves that any protocol, involving any combination of weak and strong measurements, also cannot attain perfect measurement fidelity.
The modified protocol is created in a very simple way: it proceeds through exactly the same steps as the original protocol, except that at the point where the result of a strong measurement is about to be reported to the other party by transmission through the classical channel, the strong measurement record, treated as a quantummechanical object, is itself subjected to a long sequence of very weak measurements. The outcomes of these weak measurements are reported, one at a time, to the other party and appended to the measurement record in Eq. (10).
The precise construction of this weakmeasurement sequence is described in Appendix LABEL:peter. The weak measurements are designed so that in their entirety they give almost perfect information about the outcome of the strong measurement (the strong measurement outcome itself can be reported at the end of this sequence as a confirmation). So, the recipient of this steam of reports from the outcomes of the weak measurements need only wait until they are done to know the actual (strong) measurement outcome in order to proceed with the next step of the original protocol. But, except in cases with vanishingly small probability, the information contained in the accumulating measurement record grows continuously.
To conclude this discussion of the modified measurement protocol, we can show how Alice and Bob can be duped into being unwitting participants in the modified protocol, and also give an illuminating if colloquial view of how the “continuumization” of the measurement can take place. What is required is a modification of the makeup of the classical channel between Alice and Bob. We imagine that when Alice transmits the results of a measurement, thinking that it is going directly into the classical channel to Bob, it is actually intercepted by another party (Alice’), who performs the necessary sequence of weak measurements. Here is a way that Alice’ can implement this operation: She examines the bit transmitted by Alice. If the bit is a 0, she selects a slightly headbiassed coin, flips it many times, each time transmitting the outcome into the classical channel. If the bit is a 1, she does the same thing with a slightly tailbiassed coin. At the other end of the channel there is another intercepting agent (Bob’) who, after studying a long enough string of coin flips sent by Alice’, can with high confidence deduce the coin bias and report the result to Bob. Alice and Bob are oblivious to this whole intervening process; nevertheless, as measured by the data actually passing through the channel, the modified protocol with nearly continuous evolution of the available information has been achieved.
ii.5 The state of affairs after stage I of the measurement
Having established that no matter what Alice and Bob’s measurement protocol, we can view the probabilities as evolving continuously in time, we can declare that stage I of the measurement is complete when
(14) 
that is, after the probabilities have evolved by a small but finite amount away from their initial value of . It should be noted that since some measurement outcomes might be much more informative than others, the time of completion of stage I is not fixed; it will in general require a greater number of rounds for one measurement record than for another.
The in Eq. (14) should be some definite, small, but noninfinitesimal number. Moreover, we will require that all posterior probabilities be nonzero. For this any value smaller than will be acceptable, since
(15) 
We now rewrite Bayes’ theorem from Eq. (11):
(16) 
Here we have introduced an abbreviated notation for several operators which will come up repeatedly in the upcoming derivations:
(17) 
Where there is no risk of confusion we will drop the index from , , and .
It is easy to bound the greatest possible spread in the probability distribution:
(18) 
An important technical consequence of declaring stage I complete at this point is that it is guaranteed that all the matrix elements and are nonzero; this condition will be used repeatedly in the analysis of Appendix LABEL:appaa (to be described shortly). The more crucial condition from Eq. (18) is that either the following equation is true,
(19) 
or the corresponding equation for is true. This says that either the operator or differs from being proportional to the identity operator by a finite amount. This will be the key fact in the analysis we are about to report.
The basic idea is that at the completion of stage I, from Alice’s and Bob’s point of view there is a nonzero probability that the initial state was any one of the nine. In order for Alice and Bob to complete the job of identifying which state they have been given—with a reliability approaching 100%—it is necessary that the nine states remaining after stage I, Eq. (13), still be almost perfectly distinguishable. That is, the states must still be nearly orthogonal. But we can show that, because of Eq. (19), these residual states cannot be sufficiently orthogonal to complete the task. In fact, we will be able to compute exactly to what extent they must be nonorthogonal. For we can show that, if we assume that the overlap of any two of these residual states is or less, i.e.,
(20) 
then both and will both be almost proportional to the identity operator, with relative corrections proportional to . This is done in Appendix LABEL:appaa where these corrections are derived precisely. The important consequence of this is that
(21) 
and the same for . Equations (19) and (21) cannot be both satisfied unless , that is, unless the residual states are nonorthogonal by a finite amount.
So, at this point we can conclude that the measurement Eq. (4) cannot be done bilocally, except with less that 100% accuracy; this is the main result that we set out to prove. We now proceed to a more quantitative analysis of bilocal approximations to this measurement.
ii.6 Informationtheoretic analysis of the twostage measurement
We can now perform an analysis of the precise effects of this nonorthogonality, and derive an upper bound on the information attainable by Alice and Bob from any bilocal protocol. We will use the standard classical quantifier of information, the mutual information[Abr], which gives the amount of knowledge of one random variable (in our case, the identity of quantum state ) gained by having a knowledge of another (here, the outcome of the measurement).
Recall that we have broken the measurement by Alice and Bob into two stages. We will call the random variable describing the stageI outcomes . The outcomes of all subsequent (stageII) measurements will be denoted by random variable . Alice and Bob’s object is to deduce perfectly the label of one of the nine states (Eq. (3)); we will use the symbol for this random variable (for “which wavefunction”). We quantify the information attainable in the measurement by the mutual information between and the composite measurement outcomes and . For a perfect measurement, the attainable mutual information is ; we will show that must be less than this. We first use the additivity property of mutual information ([Abr], p. 125) to write:
(22) 
This expression introduces the mutual information between and conditional on , which can be written as an average over all the possible outcomes of the measurement in stage I:
(23) 
Now, combining Eqs. (22,23) with the definition of the mutual information:
(24) 
and using the fact that the entropy of the initial distribution , we obtain
(25) 
To show that Eq. (25) must be less than it will be sufficient to show that each member of the sum is strictly positive. The conditions at the end stage I make it possible for us to do this.
To make things explicit, let us suppose that at the end of stage I the residual quantum states (recall Eq. (13)) occur with probabilities from Eq. (16). (There will be no confusion from leaving out the label.) Moreover, let us suppose that the measurement to be performed in stage II corresponds to a positive operatorvalued measure fixed by measurement outcome . Then the explicit expression for the mutual information becomes
(26)  
where . Note that .
Without loss of generality for the present set of manipulations, let us take and to be the two states assured to have a nonvanishing overlap (recall Eq. (20)). We may partition the density operator according to the two states that interest us most as follows. Let
(27) 
where and . We can think of this partition as generating two new “which wavefunction” random variables and —the probabilities associated with these random variables are just the renormalized ones appearing in Eq. (27). Note that . Then, by the classic converse to the concavity of the Shannon entropy ([OhyaPetz], p. 21), it follows that
(28) 
where is the binary entropy function. Hence, if we write
(29)  
(30) 
it follows that
(31) 
We can further bound this, so as to remove all dependence on states through , by noting that
(32) 
Combining Eqs. (31) and (32) gives
(33) 
Equation (33) can be further bounded so as to remove any explicit dependence on and by noting that, for fixed , the first term in the expression on the righthand side is minimized when . (One can verify this simply by taking a derivative respect to one of the free variables.) Making that restriction, one can see furthermore that the resultant term is monotonically increasing in . Thus the bound we are looking for can be found by taking to be its minimal allowed value, namely (recall Eq. (15)). With all that in place, we have that
(34) 
where now .
Finally it is a question of removing all dependence on the quantum measurement . This can be gotten by noting that the two rightmost terms in the righthand side Eq. (34) simply correspond to the mutual information given by the measurement about the two equiprobable nonorthogonal quantum states and (cf. Eq. (29)). Optimizing over all quantum measurements, we obtain the accessible information of those two states [LevitinFuchs]. Inserting that into Eq. (34) and recalling Eq. (20) we finally find,
(35) 
where is again the binary entropy.
The last bound can be made useful by establishing a quantitative link between and in Eq. (35). To do this, we must identify the value of for which, given all the constraints derived in Appendix LABEL:appaa, it is first possible to satisfy Eq. (19) for some values of and . It is this value of which must be used in the bound Eq. (35). We have exhaustively examined all pairs to determine which one allows the greatest ratio of (or ) matrix elements for a given value of . We find this to be the case for and in Eq. (3) (or other symmetryequivalent ones). For this choice we can write
(36) 
This ratio attains its maximum value when
(37) 
These are the extremal values permitted by Eqs. (LABEL:AsiaMinor) and (LABEL:thebiggie). The value this gives is
(38) 
The smallest value of for which Eqs. (19) and (38) are consistent is given by the solution to the equation
(39) 
Using Mathematica, we have found the choice of and consistent with Eq. (39) that gives the strongest bound on the mutual information in Eq. (35). We obtain:
(40) 
where the mutualinformation deficit . This upper bound is attained when , corresponding to a nonorthogonality parameter and a minimumprobability parameter . Thus, we bound the information attainable by bilocal operations by Alice and Bob away from that attainable in a fully nonlocal measurement by a minute but finite amount.
Iii Searching for Optimal Local Measurements
Equation (40) gives our upper bound on the mutual information one can obtain by means of local operations and classical communication. However, it is unlikely that this bound is a close approximation to the actual optimal mutual information accessible in this way; most likely the optimal value is significantly lower. In this section we explore specific measurement strategies for our ninestate ensemble in order to get a sense of how well one can in fact distinguish the states by local means. We will thereby obtain a lower bound on the mutual information.
We begin by considering a simpler problem, namely, distinguishing only eight of the nine states from each other. That is, we consider the case where the prior probability of one of the states is zero.
As we noted earlier, state from Eq. (3) is special. In fact, it is never used in the analysis of Appendix LABEL:appaa; thus, its presence or absence is irrelevant to the nonorthogonality conditions which we have derived. This means that this state is not necessary to make the measurement undoable bilocally. Thus, even if we take the prior probabilities of the states such that , we will still reach the conclusion that the full mutual information is unattainable by a bilocal procedure (the quantitative analysis will be different than that given above).
The same is not true for the other states: if the prior probability of any of the states  is zero, then the measurement can be completed successfully by Alice and Bob. Figures 2 and 3 illustrate this for the case when the state is left out. One way of explaining the strategy is that since the 45 domino of Fig. 2 is no longer complete, it can be cut by a von Neumann measurement, which will disturb state but still leave it distinguishable from all the other eight states. Thus, the protocol can begin with cut 1 of Fig. 2, which corresponds to an incomplete vonNeumann measurement by Bob which distinguishes his state from states or (but does not distinguish between and ). The next step to be taken by Alice depends on the reported outcome as received by her from Bob, as indicated by the tree of Fig. 3; likewise all four rounds of the measurement are similarly contingent on the measurement outcomes of preceding rounds. The object at every round is to move towards isolating a domino so that its pair of states can be distinguished by a measurement in the rotated basis.
We now turn to our original problem of distinguishing optimally among all nine states, assumed to have equal prior probabilities. The measurement strategy just described is a reasonable one to pursue even when all nine states are present. It accurately distinguishes states and , and it distinguishes these states from and ; it fails only to distinguish these last two states from each other. (In applying Fig. 3 to this case, one should imagine replacing “5” with “4 or 5.”) Thus if Alice and Bob use this measurement, then with probability they obtain the full bits of information, and with probability they are left one bit short; so the mutual information is = 2.9477 bits. One can, however, do better, and we now present a series of improvements over the above strategy.
We may express the improved measurements as sequences of positive operatorvalued measures (POVMs). For example, Bob could start with a POVM consisting of elements (these are 33 matrices that must satisfy the constraint ), after which Alice will perform a measurement , and so on. As it happens, all of our improved measurements can be represented in terms of POVMs whose elements are diagonal in the standard bases for Alice and Bob. It is therefore convenient to represent these POVM elements by their diagonal values. For example, in the measurement described above, Bob’s opening POVM (in this case a von Neumann measurement), which distinguishes his state from and , has two elements which we represent as and .
Our first improvement is to replace this von Neumann measurement by a more symmetric POVM whose elements are and . (If Bob were to perform this measurement when his part of the system was in the central state , the outcome would be random.) Note that each outcome of this measurement rules out one of the columns of Fig. 1; that is, it rules out one of Bob’s states or . Once this has been done, Alice may freely cut either the 67 domino or the 89 domino, and from this point Bob and Alice may proceed as above to find out (with no further damage) in which domino the actual state lies. However, Bob’s initial measurement damages both the 23 domino and the 45 domino, so that at the end, he will not be able to distinguish perfectly between and or between and . Thus, in order to evaluate the mutual information obtainable via this strategy, we need to know the effect of Bob’s initial POVM on these four states. This effect depends on what operation element we choose to associate with the POVM element . Any satisfying is allowed, but it is simplest to let be , where is the classical record of the outcome. To see how this measurement affects the states, let us suppose that the actual state is , so that Bob’s part of the system begins in the state . Then if Bob gets the outcome , the final state of Bob’s part of the system (not including the classical record) is , and if he gets the outcome the final state is and .) If the initial state had been , then the results would have been the same but with replaced by . Thus the first outcome renders and completely indistinguishable, while the second merely makes them nonorthogonal. In the latter case Bob can, at the end, try to determine whether the original state was or by performing the optimal measurement for distinguishing two equally likely nonorthogonal states [LevitinFuchs]. In this case the optimal measurement is simply the orthogonal measurement whose outcomes are B() and B(). Similar considerations apply to the states or . One finds that this strategy yields a mutual information of 2.9964 bits, which is an improvement over the strategy of Fig. 3. . (These states are automatically subnormalized so that their squared norms are the probabilities of the corresponding outcomes, namely,
A further improvement is gained by replacing Bob’s initial POVM by a less informative and less destructive one whose elements are and , where . The rest of the measurement is left unchanged. Optimizing over , one finds that this strategy can yield 3.009 bits of mutual information. Note, however, that in this case Bob’s initial measurement does not rule out any column of Figure 1, so that when Alice later cuts a domino, she may be cutting the actual state, in which case her action will cost them one bit. One may suspect that Alice should be more careful, and indeed the mutual information is improved if she makes a weaker measurement. In fact, the best strategy we have found delays until the fourth round a measurement that guarantees the complete cutting of a domino.
This best strategy consists of the following steps, in which the values of the parameters and are to be determined by optimization:

Bob: vs . Let us assume that Bob gets the first outcome. (In the other case all the POVM elements appearing in the succeeding steps have their diagonal values reversed; that is, the roles of states and are interchanged.)

Alice: vs . The first outcome cuts the 89 domino, and we go directly to step 5. The second outcome makes it safer for Bob to risk cutting the 45 domino, so we proceed to step 3.

Bob: vs . The first outcome cuts the 45 domino, and we go directly to step 5. The second outcome makes it safer for Alice to cut the 67 domino, so we proceed to step 4.

Alice: vs . Either outcome cuts the 67 domino.

At this point some domino has been cut, so that Alice and Bob can proceed as above to determine in which domino the actual state lies. If this domino contains two states that have not been collapsed into the same state, Alice and Bob then perform a measurement to try to distinguish them.
Optimizing over the values of the parameters, we find that the mutual information is bits. (One set of parameter values giving this result is .) Moreover, numerical evidence indicates that no further advantage is gained by allowing another round before making a firm cut (it would be a cut of the 23 domino, as we proceed clockwise around the grid). Thus it is conceivable that this value of the mutual information is indeed optimal, though we cannot rule out an entirely different strategy that does better.
Summarizing the results of this Section and the preceding one, we have
(41) 
Note that the results presented in this section can be seen as a realization of the ideas behind our proof in Section II. Alice and Bob begin by performing a sequence of POVMs aimed at determining in which domino the actual state lies; this sequence can be thought of as stage I of the measurement. At this point, just as in our proof, the states remaining to be distinguished have become nonorthogonal, so that the final mutual information must fall short of bits.
Iv A realization of the twoparty separable superoperator with shared qubits
Having established that the measurement can only be done approximately if Alice and Bob only communicate classically, it is natural to ask what quantum resources would permit them to complete the measurement. It is obvious that they can do it if Alice ships her entire 3state system to Bob and he performs the full operation in his lab, reporting the result classically back to Alice. In the case of all 9 states having equal prior probability, this requires the transmission of qubits. If state is left out and the other 8 states are equiprobable, the density matrix of the state held by Alice has less than maximal entropy, in fact it has
However, in the ninestate case we can exhibit a protocol for completing the measurement which requires a smaller overall number of qubits transmitted. It starts with the imperfect protocol involving only classical communication just discussed (Fig. 4), and adds a part to permit states 4 and 5 to be perfectly distinguished. This will require only qubits of transmission. In neither case do we know that the procedures which we discuss here are optimal. qubits (over many repetitions of the measurement). For the 8state case the protocol will actually be worse than the straightforward one, requiring
The modified protocol for the 9state case begins with Alice transmitting the component of her Hilbert space to Bob. It is obvious that she could do this by sending 1 qubit, if she adopts a 3qubit unary encoding of her Hilbert space, i.e., , , and . In fact the third qubit in this representation has less than maximal entropy, having entropy (it has higher entropy, , for the 8state case). Thus, again using Schumacher’s theorem[BS], the transmission can be compressed over many realizations of the measurement so that only of a qubit per measurement needs to be transmitted.
As indicated by the tree in Fig. 4, Bob’s possession of permits him to immediately do a measurement which distinguishes whether the state is , , or is one of the others. After this has been done the sequence of measurements proceeds identically as in the classical protocol (Fig. 4), except that some possibilities can be pruned off as they correspond to and cases which have already been distinguished. Before completing round 4, Alice must be again in possession of , which requires a qubit transmission back from Bob. This qubit is not compressible, but this transmission will only be required if the state is or , which will only happen of the time, and will count as qubits of transmission ( for the 8state case).
Adding up the qubit transmissions at the beginning and the end of Fig. 4 gives bbcjpw]. Note that even with the assistance of qubit transmissions, this protocol requires several rounds of classical transmission; it is a true “twoway” protocol, that is, requiring bidirectional classical communication[BDSW]. qubits as mentioned above. This transmission can be made unidirectional, since a qubit sent in one direction, if it is entangled with a qubit left behind, may always be used to teleport a qubit in the opposite direction[
V Thermodynamics of nonlocal measurements and state preparation
v.1 Irreversibility of measurement
We now explore another informationtheoretic feature of our twoparty measurement that illustrates in another way the nonlocality of this orthogonal measurement. If the parts of the quantum states are assembled in one location, then a measurement in any orthogonal basis, in addition to being doable with 100% fidelity, can be done reversibly. That is, the quantum state can be converted into classical data without any discarding of information to the environment. Therefore by Landauer’s principle [Landauer] no heat is generated during the measurement. The reversible method can be illustrated by a simple qubit example: if the measurement is to distinguish from , and the classical record of the bit is to be stored in the macro states and (containing, say, qubits), then the procedure involves starting the macro system in a standard state (so that the initial states of the system to be measured is either or ), then performing repeated quantum XOR operations[BDSW] with the qubit to be measured as the source and all the qubits of the macro state as the targets. In the end, the measured qubit may as well be considered to be part of the macro system containing the classical answer. Note that no interaction with any other environment is necessary to complete this, or any other, local orthogonal measurement.
The situation is rather different for our twoparty orthogonal measurement. Suppose that we consider a case in which the measurement can be achieved by Alice and Bob, for example the case in which state is promised not to be present. Although Alice and Bob can perform this measurement, they clearly cannot do so reversibly, i.e., as a finite sequence of local reversible operations and classical communications. In the protocol described in Fig. 2, the irreversibility arises in the first step, where, if the state is , it is irreversibly transformed to either state or to . Thus in this case 1 bit of entropy is produced. If each of the eight permitted states occurs with equal probability, then the average entropy generated is of a bit. We cannot prove this entropy of measurement is minimal, though we have found no more efficient protocol. Many other cases can be easily worked out; for example, if it is promised that the state is only one of four (say, , , , and ), then of a bit of entropy will be generated by the obvious protocol.
It appears that reversible measurements are only possible if the set of states can be progressively dissected by Alice and Bob without breaking any dominoes. To formalize this notion, we introduce a few definitions. Let be a set of pure product states shared between Alice and Bob, where . Given such a set, we define a splitting of by Alice as a partition of into two nonempty disjoint subsets such that for all and for all in , . A splitting by Bob is defined similarly. A set is dissectible if there is a tree each of whose interior nodes is a splitting by Alice or Bob and whose leaves are singletons. For example, using the numbering of Eq. (3) and Fig. 1, the set is dissectible, but is not. The dissectibility of an arbitrary set can be determined by examining finitely many possible splitting trees. Clearly any subset of a dissectible set is dissectible. It is evident that if an ensemble of states has support only on a dissectible set, then both its entropy of preparation and entropy of measurement are zero. It is tempting to argue that, conversely, nondissectible sets, if they are locally measurable at all, have positive entropies of measurement, but to be sure of this, one would have to exclude the (unlikelyseeming) possibility of multistep measurement procedures which, while not strictly reversible for any finite , would succeed in identifying each of the states in the nondissectible set with error probability and entropy production both tending to 0 in the limit of large .
A further analysis of this irreversibility reveals that it can be thought of as originating in the necessity for classical communication between Alice and Bob. In order to assure that the channel between them can convey only classical and no quantum information, the channel itself must possess a quantum environment (in order to dephase the data passing though it). This raises the possibility that Alice or Bob will be obliged to become entangled with the environment of the channel in the course of communicating the necessary classical information, thereby causing themselves to have a finite amount of entropy. Exactly the same amount must also appear in the channel environment. When, for example, Alice and Bob have been given state , and Bob sends the result of his first measurement in Fig. 3 (collapsing his state to a mixture of and ) to Alice, he has created entanglement between the measurement outcome and the environment, so that the joint system of message and environment is left an entangled state of the form , where and are two orthogonal states of the environment.
Note that measurement protocols requiring classical communication are not inevitably irreversible. For example, for the dissectible set considered previously, a bit of communication from Bob to Alice is required to complete the measurement; still no entropy is generated. This is so because this bit is guaranteed to be in one of the computational basis states, precisely the states with which the dephasing channel does not entangle. It is the necessity, in the above example, of delivering a bit to the channel which is in a superposition of basis states which leads to the entanglement and the irreversibility.
v.2 Irreversibility of state preparation
For dissectible sets of states, such as , the mapping
(42) 
(using the notation of Eq. (3)) between classical instructions and the state described is locally reversible and can be performed in either direction without the generation of waste information. Conversely, nondissectible sets, such as , cannot be prepared by any finite sequence of reversible operations, and we conjecture that even asymptotic multistep protocols could not reduce either the heat of preparation or the heat of measurement to zero. Perhaps surprisingly, the heats of preparation and measurement, by the best protocols we have been able to discover, are unequal.
To give an example of irreversible state preparation, consider the following method for the preparation for the nondissectible set mentioned above. The protocol, which is the best we know, will produce bits of entropy, considerably more than the entropy of measurement. The procedure works as follows: First, Bob computes a function of the preparation instruction which records whether the state to be synthesized is () or one of the others (), saving the result in a work bit. Then Alice and Bob reversibly prepare the modified four states of Fig. 5; that is, if the instruction is to prepare , is prepared, and in the other three cases exactly the desired state is produced.
This preparation can be carried out reversibly because the modified set is dissectible. Next, Bob performs a Hadamard rotation on his state (, , ) conditional upon the state of , which transforms into and leaves the other three states unchanged as desired. Finally Bob erases his work bit , which requires discarding bits of entropy into the environment. Similar reasoning shows that the equiprobable ninestate ensemble can be prepared at a cost of , and the equiprobable eightstate ensemble (without the center state) at a cost of bits of entropy.
It should perhaps be noted that the local preparation and measurement protocols we have described, while irreversible from the viewpoint of Alice and Bob, become reversible when viewed from a global perspective, including Bob, Alice and the environment. In the preparation protocol we have just described this global reversibility arises because the waste classical information discarded into the environment in the last step is not random, but instead is entirely determined by the joint state of Alice and Bob. Therefore discarding it, though it increases the entropy of the environment, does not increase the entropy of the universe. The global reversibility of the measurement protocol for this same set of four states arises because the information discarded into the environment in the final stage is merely the other half of the entanglement created at an earlier stage of the protocol, when one of the dominoes might have been collapsed. Thus the final act of discarding restores the environment to a pure state.
When speaking of the thermodynamic costs of local preparation and local measurement, it should be recalled that, although any set of product states can be locally prepared, not all sets can be locally measured. The full set of nine states of Eq. (3), for example, is not locally measurable at all, no matter how much heat generation is allowed. Conversely, there are sets of pure bipartite states that cannot be prepared locally, even with the generation of heat, because one or more states in the set is entangled. The concepts of entropy of preparation and entropy of measurement can nevertheless be extended to such sets, indeed to any orthogonal set of pure bipartite states, by allowing Alice and Bob to draw on a reservoir of prior entanglement (e.g., standard singlets shared between them) to help perform actions, such as teleportation[bbcjpw], that could not otherwise be done locally. In this fashion one can define an entanglementassisted entropy of local preparation, and an entanglementassisted entropy of local measurement. In entanglementassisted measurement, an otherwise immeasurable set like the original set of nine states is rendered measurable by teleporting quantum information as required, say, in the protocol of Fig. 4. However, each teleportation generates two bits of waste classical information per qubit teleported, thereby contributing to the entropy of measurement. Again we can calculate the amounts of entanglement consumed and entropy produced by simple protocols, without knowing whether more efficient ones exist. The protocols described earlier give an entanglementassisted entropy of measurement of 2.28304 bits for the equiprobable ninestate ensemble, and 2.40886 bits for the eightstate ensemble (omitting the central state), in each case twice the amount of entanglement consumed, because the protocols generate no other waste information aside from that associated with the teleportations. Turning now to entanglementassisted preparation, a typical set of states requiring entanglement to prepare from classical directions is set of four Bell states[BDSW] . The entropy of preparation by the obvious protocol in this case is two bits per state prepared (Bob reads the classical directions, applies an appropriate Pauli rotation to the standard to make the desired Bell state, then throws away the classical directions).
Finally suppose Alice and Bob are given an unknown member of the 9state set (or some other locallyimmeasurable set) and wish to determine which state they have without the help of entanglement, but with some hints from a person who knows which state they have been given. We define the “advice of measurement” as the minimal amount of advice needed (in conjunction with their own local actions) to guide Alice and Bob to the right answer. As we have seen above, a negative hint like “the state is not ” is sufficient. This might appear to be a lot of advice (as much as a totally informative positive hint like “the state is ,”) but in fact such negative hints are highly compressed by classical hashing techniques, asymptotically requiring only
We note, however, that the nonvon Neumann measurements discussed at the end of section III allow an even more efficient form of advice. There it was shown that an appropriate POVM yields bits of information about the unknown state in the 9state case; therefore, after Alice and Bob have performed their POVM, only 0.1575 bits of additional information need be provided asymptotically for them to identify the state exactly.
As an aside, we note that the value of advice, and the amount needed, may depend on its timing. Although in the 9state measurement problem the most efficient advice we know of can safely be given at the end, after the POVM has been completed, there are other situations in quantum information theory, not to mention in everyday life, when early advice is more useful than late advice. In the BB84 quantum key distribution protocol[BB84], for example, the basis information may be regarded as a form of advice that is delayed to make it less useful to the eavesdropper. In a deterministic setting, where the adviser can foresee all future events, nothing is lost by giving all necessary advice at the beginning. But when unforeseen events are possible, the most efficient kind of advice, better than prior or posterior advice, may be asneeded or concurrent advice. Suppose Alice and Bob are about to begin a long car trip. They ask their more experienced friend Eve which route to take. A few days later they telephone again, asking her how to repair a flat tire. To be helpful, the route advice must be given at the beginning, but it would be wasteful to give the repair advice then because the flat tire might not have happened. The prominent role of measurements, whether von Neumann or POVM, with unpredictable outcomes, in our analysis of the 9state problem suggests that asneeded advice might be the optimal kind here also.
The notion of advice of measurement can be extended to sets of entangled states as well, for example the set of four Bell states. Here one bit of advice is sufficient (e.g., whether the unknown Bell state is of the or type), since the other bit ( vs. ) can be learned by comparing the results of local measurements in the basis. The table summarizes the various measures of nonlocality for some of the ensembles we have been considering.
Ensemble 
9state  2468  246  4Bell  2Bell 

Locally Preparable  Yes  Yes  Yes  No  No 
Locally Measurable  No  Yes  Yes  No  Yes 
Dissectible  No  No  Yes  No  No 
Entropy of Prep.  0.764  0.811  0  2  1 
Entropy of Meas.  2.283  0.250  0  2  1 
Entanglement of Prep.  0  0  0  1  1 
Entanglement of Meas.  1.142  0  0  1  0 
Advice of Meas.  0.1575  0  0  1  0 
Notes: Entropies, entanglements, and advice for nonBell ensembles are upper bounds from known protocols—actual values could be less. Entropy of measurement for 9state and 4Bell ensembles are for entanglementassisted measurement, since these ensembles are otherwise not locally measurable. The ninestate ensemble consists of 9 equiprobable states of Eq. (3) and Fig. 1. The 2468 and 246 ensembles are equiprobable distributions over and respectively. The 4Bell ensemble consists of four equiprobable Bell states , and the 2Bell ensemble of two equiprobable Bell states, e.g., .
Vi A threeparty separable superoperator
We shall now show another example of a separable von Neumann measurement, this time involving three parties, Alice, Bob and Carol, each holding just a qubit (twostate system). While we have not performed a full analysis of this case, it appears to have the same properties as the 9state measurement above—that partial measurement causes indistinguishability of the residual states—suggesting that this is another case in which the measurement cannot be done locally by the three parties, even if the three can partake in any amount of classical communication among themselves. The superoperator involves a complete orthonormal set of eight product states living in the eightdimensional Hilbert space. This appears to be the smallest possible Hilbert space which still presents such behavior (it is easy to show, using a simple elimination process, that a qubittrit system or a qubitqubit system is not sufficient). The eight states are:
(43) 
(leaving out normalizations). On the right side of these equations we introduce an obvious shorthand for these states which we will use in the Discussion. We will indicate the evidence that the separable superoperator consisting of the projection operators
(44) 
cannot be performed by 3local operations, in which Alice, Bob, and Carol can only perform local quantum operations and broadcast classical information to each other.
The arguments are equivalent to those in the twotrit example, and again rely on considering any measurement as a twostage process. In the case where all prior probabilities are equal ( in this case), we declare stage I to be complete when
(45) 
with some positive smaller than , It is again simple to bound the greatest possible spread of the probability distribution:
(46) 
As before, this equation guarantees that all diagonal matrix elements of , , are nonzero, and it also guarantees that the maximum and minimum matrix elements are different. Also as before, we can show that the states after stage I become nonorthogonal, which should permit us to derive a definite mutualinformation deficit. We will not develop this proof here, but we will give a simple sketch of how we prove that the states are nonorthogonal. We will just show here that the states cannot be exactly orthogonal:
(47) 
This proof can be generalized step by step into a full analysis as in Appendix LABEL:appaa.
1) Writing the orthogonality condition for and gives condition that
(48) 
Since diagonal matrix elements of and must be nonzero by the arguments from Eq. (46), the factor must be zero; taking the real part gives
(49) 
2) Taking taking and and applying the same reasoning gives
(50) 
3) And taking and gives
(51) 
4) Now we write the four orthogonality conditions coming from all combinations of and :
(52) 
Adding these four equations gives