INFORMATION THEORY

MARCELO S. ALENCAR

Communications and Signal Processing Collection
Orlando R. Baiocchi, Editor

Information Theory covers the historical evolution of information theory, along with the basic concepts linked to information. It discusses the information associated with a certain source and the usual types of source codes, information transmission, joint information, conditional entropy, mutual information, and channel capacity.

The hot topic of multiple access systems for cooperative and noncooperative channels is also discussed, along with code division multiple access (CDMA), the basic block of most cellular and personal communication systems, and the capacity of a CDMA system. Also presented are the information theoretic aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet.

To help the reader understand the theory, the text also includes a review of probability theory, solved problems, illustrations, and graphics.

Marcelo Sampaio de Alencar received his bachelor's degree in Electrical Engineering from the Federal University of Pernambuco (UFPE), Brazil, in 1980; his master's degree from the Federal University of Paraiba (UFPB), Brazil, in 1988; and his PhD from the University of Waterloo, Canada, in 1993. He is an emeritus member of the Brazilian Telecommunications Society (SBrT), an IEEE senior member, and chair professor at the Department of Electrical Engineering, Federal University of Campina Grande, Brazil. He is founder and president of the Institute for Advanced Studies in Communications (Iecom), and has been awarded several scholarships and grants, including three scholarships and several research grants from the Brazilian National Council for Scientific and Technological Research (CNPq), two grants from the IEEE Foundation, a scholarship from the University of Waterloo, a scholarship from the Federal University of Paraiba, and many others. He has published over 350 engineering and scientific papers and 15 books.
ISBN: 978-1-60650-528-1
DOI: 10.5643/9781606505298
KEY WORDS
Code division multiple access, coding theory, cryptography, information
theory, multiple access systems, network security
CONTENTS

List of Figures
List of Tables
Preface
Acknowledgments
1. Information Theory
   1.1 Information Measurement
   1.2 Requirements for an Information Metric
2. Sources of Information
   2.1 Source Coding
   2.2 Extension of a Memoryless Discrete Source
   2.3 Prefix Codes
   2.4 The Information Unit
3. Source Coding
   3.1 Types of Source Codes
   3.2 Construction of Instantaneous Codes
   3.3 Kraft Inequality
   3.4 Huffman Code
4. Information Transmission
   4.1 The Concept of Information Theory
   4.2 Joint Information Measurement
   4.3 Conditional Entropy
   4.4 Model for a Communication Channel
   4.5 Noiseless Channel
   4.6 Channel with Independent Output and Input
   4.7 Relations Between the Entropies
   4.8 Mutual Information
   4.9 Channel Capacity
References
Index
PREFACE
and engineers who could afford to change and learn, as the market changes.
The market needs professionals who can model and design.
Few books have been published covering the subjects needed to understand the very fundamental concepts of Information Theory. Most books that deal with the subject are aimed at very specific audiences.
The more mathematically oriented books are seldom used by people with engineering, economics, or statistics backgrounds, because the authors are more interested in theorems and related conditions than in fundamental concepts and applications. The books written for engineers usually lack the required rigor, or skip some important points in favor of simplicity and conciseness.
The idea is to present a seamless connection between the more abstract advanced set theory, the fundamental concepts from measure theory, integration, and probability, filling in the gaps left by previous books and leading to an interesting, robust, and, hopefully, self-contained exposition of Information Theory.
CHAPTER 1
INFORMATION THEORY
Hartley found several reasons why the logarithm should be used to measure information.
The choice of the logarithm base defines the information unit. If base 2 is used, the unit is the bit, an acronym suggested by John W. Tukey for binary digit, which is also a play on words, since a bit can mean a piece of information.
The quantity of transmitted information is informally given in bits, but a unit has been proposed to pay tribute to the scientist who developed the concept; it is called the shannon, or [Sh] for short. This has a direct correspondence with the unit for frequency, the hertz [Hz], for cycles per second, which was adopted by the International System of Units (SI).1
Aleksandr Yakovlevich Khinchin (1894–1959) put information theory on a solid basis, with a more precise and unified mathematical discussion of the entropy concept, which supported Shannon's intuitive and practical view (Khinchin 1957).
The books by Robert B. Ash (1965) and Amiel Feinstein (1958) give
the mathematical reasons for the choice of the logarithm to measure infor-
mation, and the book by J. Aczél and Z. Daróczy (1975) presents several
of Shannon’s information measures and their characterization, as well as
Alfréd Rényi’s (1921–1970) entropy metric.
A discussion on generalized entropies can be found in the book edited
by Luigi M. Ricciardi (1990). Lotfi Asker Zadeh introduced the concept of the fuzzy set, an efficient tool to represent the behavior of systems that depend on the perception and judgment of human beings, and applied it to information measurement (Zadeh 1965).
X = \{x_1, x_2, \ldots, x_N\}, \quad \text{in which} \quad \bigcup_{k=1}^{N} x_k = \Omega,  (1.1)

P = \{p_1, p_2, \ldots, p_N\}, \quad \text{in which} \quad \sum_{k=1}^{N} p_k = 1.  (1.2)
Example: suppose the sample space is partitioned into two equally probable spaces. Then
I(x_1) = I(x_2) = -\log_2 \frac{1}{2} = 1 bit,
that is, the choice between two equally probable events requires one unit of information, when a base 2 logarithm is used.
Considering the occurrence of 2^N equiprobable symbols, the self-information of each event is given by
I(x_i) = \log_2 2^{N} = N bits.
The entropy of the source is the average of the self-information over all symbols,
H(X) = E[I(x_i)] = -\sum_{i=1}^{N} p_i \log p_i.  (1.6)
Table 1.1.
Symbol     Probability
x1         1/4
x2         3/4
Example: consider a source that emits two symbols, with unequal proba-
bilities, given in Table 1.1.
The source entropy is calculated as
H(X) = -\frac{1}{4}\log\frac{1}{4} - \frac{3}{4}\log\frac{3}{4} = 0.81 bits per symbol.
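This calculation is easy to reproduce numerically. The short Python sketch below is only an illustration (the function name is not from the text); it evaluates the entropy of the source of Table 1.1.

```python
import math

def entropy(probabilities, base=2):
    """Shannon entropy H(X) = -sum p_i log(p_i), ignoring zero-probability symbols."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Source of Table 1.1: symbols x1 and x2 with probabilities 1/4 and 3/4
print(entropy([0.25, 0.75]))   # approximately 0.811 bit per symbol
```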
• H(p_1, p_2, \ldots, p_N) is continuous in p_k, k = 1, 2, \ldots, N, with 0 \le p_k \le 1.  (1.7)
• The maximum of the entropy is obtained when the events are equally
probable. That is, when nothing is known about the set of events,
or about what message has been produced, the assumption of a
uniform distribution gives the highest information quantity, that
corresponds to the highest level of uncertainty.
\max H(p_1, p_2, \ldots, p_N) = H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right).  (1.9)
Table 1.2.
Symbol     Probability
x1         1/4
x2         1/4
x3         1/4
x4         1/4

Table 1.3.
Symbol     Probability
x1         1/2
x2         1/4
x3         1/8
x4         1/8
Example: consider two sources that emit four symbols. The first
source symbols, shown in Table 1.2, have equal probabilities, and
the second source symbols, shown in Table 1.3, are produced with
unequal probabilities.
The mentioned property indicates that the first source attains the
highest level of uncertainty, regardless of the probability values of
the second source, as long as they are different.
• Consider that an adequate measure of the average uncertainty, H(p_1, p_2, \ldots, p_N), has been found for a set of events. Assume that event \{x_N\} is divided into M disjoint sets, with probabilities q_k, such that
p_N = \sum_{k=1}^{M} q_k,  (1.10)
Then, the creation of new events from the original set modifies the entropy to
H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = H(p_1, \ldots, p_{N-1}, p_N) + p_N H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right),  (1.12)
with
p_N = \sum_{k=1}^{M} q_k.
Notice that, for all independent random variables, the probabilities p_1, p_2, \ldots, p_{N-1}, as well as (1 - p_1 - p_2 - \cdots - p_{N-1}), are contained in [0, 1], and that the logarithm of a continuous function is also continuous. The entropy is clearly symmetric.
The maximum value property can be demonstrated, if one considers
that all probabilities are equal and that the entropy is maximized by that
condition
p1 = p2 = · · · = pN . (1.14)
Taking into account that according to intuition the uncertainty is
maximum for a system of equiprobable states, it is possible to arbitrar-
ily choose a random variable with probability pN depending on pk , and
k = 1, 2, . . . , N − 1. Taking the derivative of the entropy in terms of each
probability
\frac{dH}{dp_k} = \sum_{i=1}^{N} \frac{\partial H}{\partial p_i}\frac{\partial p_i}{\partial p_k} = -\frac{d}{dp_k}(p_k \log p_k) - \frac{d}{dp_N}(p_N \log p_N)\frac{\partial p_N}{\partial p_k}.  (1.15)
But, probability pN can be written as
pN = 1 − (p1 + p2 + · · · + pk + · · · + pN −1 ). (1.16)
Since \partial p_N/\partial p_k = -1, it follows that
\frac{dH}{dp_k} = -(\log_2 e + \log p_k) + (\log_2 e + \log p_N),  (1.17)
\frac{dH}{dp_k} = -\log \frac{p_k}{p_N}.  (1.18)
But, setting
\frac{dH}{dp_k} = 0 \quad \text{gives} \quad p_k = p_N.  (1.19)
As p_k was chosen in an arbitrary manner, one concludes that, to obtain a maximum for the entropy function, one must have
p_1 = p_2 = \cdots = p_N = \frac{1}{N}.  (1.20)
H (1, 0, 0, . . . , 0) = 0. (1.21)
H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M)
= -\sum_{k=1}^{N-1} p_k \log p_k - \sum_{k=1}^{M} q_k \log q_k
= -\sum_{k=1}^{N} p_k \log p_k + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k
= H(p_1, p_2, \ldots, p_N) + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k.  (1.23)
But, the second part of the last term can be written in a way that displays the importance of the entropy in the derivation,
p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k = \sum_{k=1}^{M} q_k \log p_N - \sum_{k=1}^{M} q_k \log q_k
= -p_N \sum_{k=1}^{M} \frac{q_k}{p_N} \log \frac{q_k}{p_N}
= p_N H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right),  (1.24)
and this demonstrates the mentioned property.
The entropy is non-negative, which guarantees that the partitioning of one event into several other events does not reduce the system entropy, as shown in the following,
H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) \ge H(p_1, p_2, \ldots, p_N),
that is, if one splits a symbol into two or more symbols, the entropy always increases, and that is the physical origin of the word.
Example: for the binary source, consider that the symbol probabilities are p = 1/8 and q = 7/8, and compute the entropy of the source. The average information per symbol is given by
H(X) = -\frac{1}{8}\log_2\frac{1}{8} - \frac{7}{8}\log_2\frac{7}{8} \approx 0.544 bit per symbol.
(Figure: the entropy H(p) of the binary source, plotted as a function of p, for 0 \le p \le 1.)
Symbol     Probability
x1         1/2
x2         1/4
x3         1/8
x4         1/8
NOTES
1 The author of this book proposed the adoption of the shannon [Sh]
unit during the IEEE International Conference on Communications
(ICC 2001), in Helsinki, Finland, shortly after Shannon’s death.
CHAPTER 2
SOURCES OF INFORMATION
L = \sum_{k=1}^{K} p_k l_k.  (2.1)
The parameter L represents the average number of bits per source symbol used in the source coding process. Let L_{min} denote the smallest possible value of L.
(Figure: a discrete memoryless source emits symbols x_k, which the source encoder converts into a binary sequence b_k.)
L ≥ H (X ).
H (X N ) = NH (X ). (2.5)
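The multiplicative property of Equation 2.5 can be checked numerically. The following Python sketch (illustrative, not from the text) enumerates the second extension of a two-symbol source and compares the entropies.

```python
from itertools import product
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

p = [0.25, 0.75]                        # memoryless source with two symbols
p2 = [a * b for a, b in product(p, p)]  # joint probabilities of the second extension X^2
print(entropy(p), entropy(p2))          # about 0.811 and 1.623, that is, H(X^2) = 2 H(X)
```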
X = {x1 , x2 , x3 }.
X^2 = \{x_1x_1, x_1x_2, x_1x_3, x_2x_1, x_2x_2, x_2x_3, x_3x_1, x_3x_2, x_3x_3\}.
η1 = 0.811.
and
η4 = 0.991.
Symbol Code
x1 1
x2 01
x3 001
x4 000
Symbol Code
x1 1
x2 10
x3 100
x4 1000
Figure 2.2 illustrates the decision tree for the prefix code pointed in
Table 2.5.
The tree has one initial state and four final states, which correspond to the symbols x_0, x_1, x_2, and x_3. From the initial state, for each received bit, the decoder searches the tree until a final state is found.
The decoder then emits the corresponding decoded symbol and returns to the initial state. Therefore, from the initial state, after receiving a 1, the source decoder decodes symbol x_0 and returns to the initial state. If it receives a 0, the decoder moves to the lower part of the tree; then, after receiving another 0, the decoder moves further down the tree and, after receiving a 1, it retrieves x_2 and returns to the initial state.
Considering the code from Table 2.5, with the decoding tree from
Figure 2.2, the binary sequence 011100010010100101 is decoded into the
output sequence x1 x0 x0 x3 x0 x2 x1 x2 x1 .
By construction, a prefix code is always unequivocally decodable, which
is important to avoid any confusion at the receiver.
Consider a code that has been constructed for a discrete source with
alphabet {x1 , x2 , . . . , xK }. Let {p1 , p2 , . . . , pK } be the source statistics,
and lk be the codeword length for symbol xk , k = 1, . . . , K. If the binary
code constructed for the source is a prefix one, then one can use the
(Figure 2.2. Decision tree for the prefix code: from the initial state, the bit 1 leads to x_0, and the sequences 01, 001, and 000 lead to x_1, x_2, and x_3, respectively.)
Kraft-McMillan inequality
\sum_{k=1}^{K} 2^{-l_k} \le 1,  (2.6)
H (X ) ≤ L < H (X ) + 1. (2.7)
The left hand side equality is obtained on the condition that symbol xk be
emitted from the source with probability pk = 2−lk , in which lk is the length
of the codeword assigned to symbol xk .
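The Kraft-McMillan condition is simple to verify numerically. The sketch below (an illustration, with an assumed function name) evaluates the sum of Equation 2.6 for a binary and a ternary set of codeword lengths.

```python
def kraft_sum(lengths, r=2):
    """Kraft-McMillan sum for a list of codeword lengths over an r-ary code alphabet."""
    return sum(r ** (-l) for l in lengths)

# Lengths of the binary prefix code {1, 01, 001, 000}
print(kraft_sum([1, 2, 3, 3]))                   # 1.0, satisfies the inequality with equality
# Lengths 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3 over a ternary alphabet
print(kraft_sum([1] + [2] * 6 + [3] * 4, r=3))   # about 1.148 > 1, no instantaneous code exists
```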
Consider the N th order extension of a memoryless discrete source. For
the N th order extension of a code, the encoder operates on blocks of N
symbols from the original source, instead of individual ones, and the source
alphabet X N has an entropy that is N times the entropy of the original
source.
Let LN be the average length for the extended prefix code. For an
unequivocally decodable code, LN is as small as possible, from Equa-
tion 2.7 it follows that
H (X N ) ≤ LN < H (X N ) + 1, (2.8)
therefore,
NH (X ) ≤ LN < NH (X ) + 1, (2.9)
Dividing by N,
H(X) \le \frac{L_N}{N} < H(X) + \frac{1}{N}.  (2.10)
In the limit, as N goes to infinity, the lower and upper bounds converge, and therefore
\lim_{N\to\infty} \frac{1}{N} L_N = H(X).  (2.11)
CHAPTER 3
SOURCE CODING
Example: a binary block code is presented in Table 3.1, and a ternary block
code is shown in Table 3.2.
Let a block code map symbols from a source alphabet S into fixed symbol
sequences of a code alphabet X . The source can be an extension of another
source, which is composed of symbols from the original alphabet. The
n-ary extension of a block code that maps symbols xi into codewords Xi is
the block code that maps symbol sequences from the source (xi1 xi2 . . . xin )
into the codeword sequences (Xi1 Xi2 . . . Xin ) (Abramson 1963).
From the previous definition, the n-ary extension of a block code is also
a block code. The second order extension of the block code presented in
Table 3.4 is the block code of Table 3.5.
A block code is said to be uniquely decodable if and only if the n-ary
extension of the code is nonsingular for all finite n.
Example: if the bit stream 100000 is received, for example, it is not pos-
sible to decide if it corresponds to symbol x5 , unless the next symbol is
available. If the next symbol is 1, then the sequence is 100000, but if it is 0,
then it is necessary to inspect one more symbol to know if the sequence
corresponds to x6 (1000000) or x7 (10000000).
A uniquely decodable code is instantaneous if it is possible to decode
each codeword in a sequence with no reference to subsequent symbols
(Abramson 1963). Codes A and B are instantaneous, and C is not.
It is possible to devise a test to indicate when a code is instantaneous.
Let Xi = xi1 xi2 . . . xim be a word from a certain code. The sequence of sym-
bols (xi1 xi2 . . . xij ), with j ≤ m, is called the prefix of the codeword Xi .
Example: the codeword 10000 has five prefixes: 1, 10, 100, 1000, and
10000. A necessary condition for a code to be instantaneous is that no
codeword is a prefix of another codeword.
(Figure: classification of codes into block and nonblock, nonsingular and singular, uniquely and nonuniquely decodable, and instantaneous and noninstantaneous.)
x3 ← 1110
and
x4 ← 1111.
In the previously constructed code, note that if one begins the code construction by making x0 correspond to 0, this restricts the available number of codewords, because the remaining codewords must, necessarily, begin with 1.
On the other hand, if a two-digit word had been chosen to represent x0 ,
there would be more freedom to choose the others, and there would be no
need to assign very long codewords to the last ones.
A binary instantaneous code can be constructed to represent the five
symbols (Abramson 1963). The first assignment is
x0 ← 00.
x1 ← 01
and two unused prefixes of length two are saved to the following codeword
assignment:
x2 ← 10
x3 ← 110
x4 ← 111.
The question of which code is the best is postponed for the next section,
because it requires the notion of average length of a code, that depends on
the symbol probability distribution.
S = {x0 , x1 , . . . , xK−1 }
X = {x0 , x1 , . . . , xM −1 }.
\sum_{i=0}^{3} 2^{-l_i} = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = 1.
Therefore, the codeword lengths of this code are acceptable for an instan-
taneous code. But, the Kraft inequality does not tell if A is an instanta-
neous code. It is only a necessary condition that has to be fulfilled by the
lengths.
For the example, the inequality states that there is an instantaneous
code with four codewords of length 2. In this case, it is clear that the
binary codewords of code A satisfy the Kraft inequality and also form an
instantaneous code.
For code B,
\sum_{i=0}^{3} 2^{-l_i} = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = 7/8 \le 1.
x0 ← 0
x1 ← 10
x2 ← 11
x3 ← 20
x4 ← 21
x5 ← 220
x6 ← 221
x7 ← 222
For a source with 11 symbols, if the codeword lengths are 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, it is not possible to obtain a ternary instantaneous code, because
\sum_{i=0}^{10} 3^{-l_i} = \frac{1}{3} + 6\cdot\frac{1}{9} + 4\cdot\frac{1}{27} = \frac{31}{27} > 1.
1. The most frequent symbols, those with higher probability, are rep-
resented by shorter codewords.
2. The least frequent symbols are assigned codewords of the same length.
Symbols Probabilities
x0 0.4
x1 0.2
x2 0.2
x3 0.1
x4 0.1
(Figure content: in Phase I, the two least probable symbols, x3 and x4, each with probability 0.1, are combined into a single symbol with probability 0.2, while x0 = 0.4, x1 = 0.2, and x2 = 0.2 are carried over.)
Figure 3.3. Huffman code. At each phase, the two least probable symbols
are combined.
Figure 3.4. (a) Example of the Huffman coding algorithm to obtain the
codewords. (b) Resulting code.
For the example, the average codeword length for the Huffman code is given by
L = \sum_{k=0}^{4} p_k l_k = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 bits.
The source entropy is
H(X) = \sum_{k=0}^{4} p_k \log_2 \frac{1}{p_k}
H(X) = 0.4 \log_2 \frac{1}{0.4} + 0.2 \log_2 \frac{1}{0.2} + 0.2 \log_2 \frac{1}{0.2} + 0.1 \log_2 \frac{1}{0.1} + 0.1 \log_2 \frac{1}{0.1},
or
H(X) = 2.12193 bits.
The code efficiency is then
\eta = \frac{H(X)}{L} = \frac{2.12193}{2.2} \approx 0.96.
The Huffman procedure is not unique, and the resulting set of codewords depends
on the way the bits are assigned. But, in spite of how the probabilities are
positioned, the average length is always the same, if the rules are followed.
The difference is the variance of the codeword lengths, defined as
V[L] = \sum_{k=0}^{K-1} p_k (l_k - L)^2,  (3.3)
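The average length and the variance discussed above can be explored by building the Huffman code programmatically. The Python sketch below uses the standard heap formulation (names and tie-breaking are illustrative, so the individual codeword lengths may differ from Figure 3.4, although the average length does not); it computes the codeword lengths for the source of the example.

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given symbol probabilities."""
    # Each heap entry: (subtree probability, unique id, list of symbol indices in the subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:              # every symbol in the two merged subtrees gains one bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = huffman_lengths(probs)
average = sum(p * l for p, l in zip(probs, lengths))
variance = sum(p * (l - average) ** 2 for p, l in zip(probs, lengths))
print(lengths, average, variance)      # for example [2, 2, 2, 3, 3], 2.2 bits, 0.16
```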
CHAPTER 4
INFORMATION TRANSMISSION
X = x1 , x2 , . . . , xN ,
Y = y1 , y2 , . . . , yM . (4.1)
The events from X may jointly occur with events from Y. Therefore, the following matrix contains the whole set of events in the product space,
[XY] = \begin{bmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_M \\ x_2y_1 & x_2y_2 & \cdots & x_2y_M \\ \cdots & \cdots & \cdots & \cdots \\ x_Ny_1 & x_Ny_2 & \cdots & x_Ny_M \end{bmatrix}  (4.2)
Figure 4.2 shows the relation between the input and output alphabets,
which are connected by the joint probability distribution matrix [P(X , Y )].
The joint entropy between the random variables from sources X and Y
is given by
H(X, Y) = -\sum_{k=1}^{N}\sum_{j=1}^{M} p_{k,j} \log p_{k,j},  (4.4)
and
H(Y) = -\sum_{Y} p(y) \log p(y).  (4.7)
(Figure 4.2. The input alphabet X and the output alphabet Y are connected by the joint probability matrix [P(X, Y)].)
The expected value of the conditional entropy, for all possible values of y, provides the average conditional entropy of the system,
H(X|Y) = E[H(X|y)] = \sum_{Y} p(y) H(X|y)
= -\sum_{Y} p(y) \sum_{X} p(x|y) \log p(x|y),  (4.9)
or
H(X|Y) = -\sum_{Y}\sum_{X} p(x, y) \log p(x|y).  (4.11)
In the same way, the mean conditional entropy of source Y , given the
information about source X , is
H(Y|X) = -\sum_{X}\sum_{Y} p(x)\, p(y|x) \log p(y|x),  (4.12)
or
H(Y|X) = -\sum_{X}\sum_{Y} p(x, y) \log p(y|x).  (4.13)
transmits the information to the destination using a certain channel. The system may be described by a joint probability matrix, which gives the joint probability of occurrence of a transmitted symbol and a received one,
[P(X, Y)] = \begin{bmatrix} p(x_1, y_1) & p(x_1, y_2) & \cdots & p(x_1, y_N) \\ p(x_2, y_1) & p(x_2, y_2) & \cdots & p(x_2, y_N) \\ \cdots & \cdots & \cdots & \cdots \\ p(x_M, y_1) & p(x_M, y_2) & \cdots & p(x_M, y_N) \end{bmatrix}  (4.14)
[P(X|Y)] = [P(Y|X)] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}  (4.16)
H(X, Y) = H(X) = H(Y) = -\sum_{i=1}^{N} p(x_i, y_i) \log p(x_i, y_i),  (4.17)
H (Y |X ) = H (X |Y ) = 0. (4.18)
It is possible to show, using Bayes rule for the conditional probability, that
the joint entropy can be written in terms of the conditional entropy, in the
following way
H (X , Y ) = H (X |Y ) + H (Y ), (4.26)
H (X , Y ) = H (Y |X ) + H (X ). (4.27)
H (X ) ≥ H (X |Y ), (4.28)
But, the right-hand side of the inequality is zero, as shown in the following,
\sum_{Y}\sum_{X} \left(p(x)\, p(y) - p(x, y)\right) \log e = \sum_{Y} \left(p(y) - p(y)\right) \log e = 0.  (4.30)
Therefore,
H (X ) ≥ H (X |Y ). (4.31)
H (Y ) ≥ H (Y |X ). (4.32)
I(x_i) = I(x_i; x_i) = \log \frac{1}{p(x_i)},  (4.35)
I (X ; Y ) = H (X ) + H (Y ) − H (X , Y ), (4.40)
I (X ; Y ) = H (X ) − H (X |Y ), (4.41)
I (X ; Y ) = H (Y ) − H (Y |X ). (4.42)
Put that way, the average mutual information gives a measure of the
information that is transmitted by the channel. Because of this, it is called
transinformation, or information transferred by the channel. It is always
non-negative, even if the individual information quantities are negative for
certain pairs of symbols.
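These relations can be verified numerically from a joint probability matrix. The Python sketch below is only illustrative (the matrix and function names are assumptions, not taken from the text); it computes the transinformation as I(X; Y) = H(X) + H(Y) − H(X, Y).

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability matrix p(x_i, y_j)."""
    px = [sum(row) for row in joint]              # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]        # marginal distribution of Y
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy

# Arbitrary joint distribution for a noisy binary channel
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(mutual_information(joint))   # about 0.53 bit
```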
For a noiseless channel, the average mutual information equals the joint
entropy.
I (X ; Y ) = H (X ) = H (Y ), (4.43)
I (X ; Y ) = H (X , Y ). (4.44)
On the other hand, for a channel in which the output is independent of the
input, the average mutual information is null, implying that no information
is transmitted by the channel.
I (X ; Y ) = H (X ) − H (X |Y ),
= H (X ) − H (X ) = 0. (4.45)
m(A) ←→ H (X ), (4.46)
m(B) ←→ H (Y ). (4.47)
In the same way, one can associate the joint and conditional entropies with the union and difference of sets, respectively,
m(A \cup B) \longleftrightarrow H(X, Y),  (4.48)
m(A - B) \longleftrightarrow H(X|Y),  (4.49)
m(B - A) \longleftrightarrow H(Y|X).  (4.50)
The mutual information corresponds to the measure of the intersection,
m(A \cap B) \longleftrightarrow I(X; Y).  (4.51)
Those relations can be seen in Figure 4.3, which also serve as a means
to memorize the entropy and mutual information properties.
Therefore, the fundamental inequalities can be written as a result of set
operations (Reza 1961),
and, finally,
which is equivalent to
H (X , Y ) = H (X |Y ) + H (Y |X ) + I (X ; Y ). (4.56)
(Figure 4.3. Venn diagram with sets A and B and intersection A ∩ B, illustrating the relations between the entropies and the mutual information.)
For a noiseless channel, the two sets coincide, and the relations can be
written as:
and, finally,
For a channel with output symbols independent from the input symbols,
the sets A and B are considered mutually exclusive, therefore:
H (X , Y , Z) ≤ H (X ) + H (Y ) + H (Z), (4.65)
H (Z|X , Y ) ≤ H (Z|Y ). (4.66)
I (X ; Y , Z) = I (X ; Y ) + I (X ; Z|Y ), (4.67)
I (Y , Z; X ) = I (Y ; X ) + I (Z; X |Y ). (4.68)
can be associated to the input symbol alphabet, that is, for all memoryless
sources,
C = max I (X ; Y ) = max [H (X ) − H (X |Y )]. (4.69)
Example: the entropy attains a maximum when all symbols are equiprob-
able. Therefore, for the memoryless discrete channel, the capacity is
C = \max\left[-\sum_{i=1}^{N} \frac{1}{N} \log \frac{1}{N}\right],
which gives
C = log N bits per symbol. (4.71)
The channel capacity can also be expressed in bits per second, or shannon
(Sh), and corresponds to the information transmission rate of the channel,
for symbols with duration T seconds,
C_T = \frac{C}{T} \text{ bits per second, or Sh.}  (4.72)
C_T = \frac{C}{T} = \frac{1}{T}\log N \text{ bits per second, or Sh.}  (4.73)
The ratio between the absolute redundancy and the channel capacity is
defined as the system relative redundancy,
Relative redundancy for a noiseless channel, D = \frac{\log N - H(X)}{\log N} = 1 - \frac{H(X)}{\log N}.  (4.75)
Efficiency of the noiseless channel, E = \frac{I(X; Y)}{\log N} = \frac{H(X)}{\log N} = 1 - D.  (4.76)
When the transmitted symbols do not occupy the same time interval, it
is still possible to define the average information transmission rate for the
noiseless channel, as
R_T = \frac{-\sum_{i=1}^{N} p(x_i)\log p(x_i)}{\sum_{i=1}^{N} p(x_i)\, T_i},  (4.77)
in which the maximum is over p(x_i). It must be noticed that the maximization in relation to the input probabilities does not always lead to an admissible set of source probabilities.
Bayes’ rule defines the relation between the marginal probabilities p(yj )
and the a priori probabilities p(xi ),
p(y_j) = \sum_{i=1}^{N} p_1(x_i)\, p(y_j|x_i),  (4.79)
p(xi ) ≥ 0 i = 1, 2, . . . , N , (4.80)
\sum_{i=1}^{N} p_1(x_i) = 1.
C = \max I(X; Y),
I(X; Y) = \sum_{i=0}^{1}\sum_{j=0}^{1} p_{ij} \log \frac{p_{ij}}{p_i q_j}.  (4.81)
p00 = r(1 − p), p01 = rp, p10 = (1 − r)p, p11 = (1 − r)(1 − p).
(Figure: the binary symmetric channel, in which each input bit is received correctly with probability 1 − p and inverted with probability p.)
that can be put in the following way, after simplification with logarithm properties,
I(X; Y) = p \log p + (1 - p) \log (1 - p)
- [r(1 - p) + (1 - r)p] \log [r(1 - p) + (1 - r)p]
- [rp + (1 - r)(1 - p)] \log [rp + (1 - r)(1 - p)].
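A small numerical sketch (Python, with assumed function names) makes the maximization explicit: scanning the input probability r shows that the maximum occurs at r = 1/2, which gives the familiar capacity C = 1 + p log p + (1 − p) log(1 − p) of the binary symmetric channel.

```python
import math

def bsc_mutual_information(r, p):
    """I(X;Y) for a binary symmetric channel with input probability P(X=0) = r and crossover p."""
    def h(q):                          # the term -q log2 q, with the convention h(0) = 0
        return -q * math.log2(q) if q > 0 else 0.0
    q0 = r * (1 - p) + (1 - r) * p     # P(Y = 0)
    q1 = 1 - q0                        # P(Y = 1)
    return h(q0) + h(q1) - h(p) - h(1 - p)

p = 0.1
capacity = max(bsc_mutual_information(r / 1000, p) for r in range(1001))
print(capacity)                        # about 0.531, which equals 1 + p*log2(p) + (1-p)*log2(1-p)
```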
I(X; Y) = \sum_{i=0}^{1}\sum_{j=0}^{2} p_{ij} \log \frac{p_{ij}}{p_i q_j},  (4.83)
and the probabilities pij are the following, for p0 = r and p1 = v, with
r + v = 1,
p00 = rp, p01 = r(1 − p), p10 = 0, p11 = (1 − r)(1 − p), p02 = 0,
p12 = (1 − r)p.
Figure 4.5. Graph for the capacity of the memoryless binary symmetric channel.
(Figure: the binary erasure channel, in which each input symbol is received correctly with probability p and erased with probability 1 − p.)
Figure 4.7. Graph of the capacity for the binary erasure channel.
CHAPTER 5
MULTIPLE ACCESS SYSTEMS
5.1 INTRODUCTION
The multiple access channel can be analyzed through the use of an infor-
mation theoretic approach. In this type of communications system, several
users transmit simultaneously to a common receiver. The main multiplex-
ing strategies that fall in that category include frequency division multiple
access (FDMA), time division multiple access (TDMA), and code division
multiple access (CDMA).
The main objective of the information theoretic approach of the multi-
ple access channel is to find its capacity region, that is, the set of information
rates at which the simultaneous reliable communication of the messages
of each user is possible (Verdú 1989b).
The problem of finding the capacity region of a multiple access chan-
nel can be traced to a pioneering work on two-way communication chan-
nels by Shannon, in 1961 (Shannon 1961). A similar approach led to
the multi-way solution for the discrete memoryless channel (Ahlswede
1971). The related problem of simultaneous communication from one
source to several receivers was proposed in 1971, and solutions were
given to the cases of orthogonal, Gaussian, and compound channels (Cover
1972).
The usual assumption for the Shannon channel was a strategic coopera-
tion between the senders and the receiver. This approach changed in 1981,
when the lack of synchronization was proven not to reduce the capacity
region for the multiple access channel (Cover, McEliece, and Posner 1981).
Later it was shown, for the two-user case, that the only effect of frame asyn-
chronism on the discrete memoryless channel is the removal of the convex
hull operation from the expression of the capacity region.
The same result could be extended to the multiuser case and implies that
the maximum achievable rate sum, or total capacity, of the memoryless mul-
tiple access channel is never decreased by the lack of frame synchronism.
This is a direct implication of the convexity property, because the rate sum
of any convex combination of rate pairs is equal to the convex combination
of the respective rate sums (Hui and Humblet 1985).
Further study on the asynchronous Gaussian multiple access channel
established the nonoptimality of the conventional receiver (independent
detection of the signals), which is a result of the additive component of
multiple access interference (MAI). Fortunately for CDMA systems, in
which the designer is allowed to choose a signal constellation with large
bandwidth, the cross-correlations between the signals can be kept low for
all relative delays and an acceptable performance can be achieved (Verdú
1986).
A paper published in 1975 showed, by means of an example, that the
use of feedback could increase the capacity region of the multiple access
memoryless channel. But the problem of determining the actual capac-
ity region, when feedback is available, remained unsolved (Gaarder and
Wolf 1975). A point to be stressed here is that, despite the efforts of many
researchers in the field, the capacity region of the multiterminal commu-
nication channel remains, in general, unknown. A general formula for the
capacity of such channels is expressed by (Mandell and McEliece 1991).
The role of the memory in the multiple access environment was stressed
in the article (Verdú 1989a). In fact, the study of asynchronous multiple
access channels can be enriched with the theory of multiple access channels
with memory, in which the effect on the current symbol depends on the
previous state of the channel. The usual approach to the design of codes
for channels with memory has been the assumption of interleaving, which
provides an interesting trade-off between decoding complexity and storage
capacity.
In one of the first works to suggest this technique, interleaving appeared
as a type of time division multiplexing (Elliott 1965). The issue there was
that interleaving could enhance the error-control effectiveness of error-
correcting codes, when the separation between bits of a code word is on
the order of several hundred bits. The purpose of interleaving is to provide
randomization, in order to mitigate potential burst errors.
There are many different types of channels that can be characterized by
the nature of errors that are generated between transmitter and receiver.
On a bursty channel, given a burst of symbols, an interleaving scheme
can be used to spread the errors to many codewords. A suitable error-
correcting code could then be utilized to correct the remaining errors in
each codeword. An optimum interleaver is then one which utilizes the least
amount of memory and has the smallest average delay between the input
and output of a symbol from the interleaver, besides providing the neces-
sary randomization of the data.
The symbol-asynchronous Gaussian multiple access channel encom-
passes many interesting models. The direct sequence spread spectrum mul-
tiple access channel represents a possible application for some of those
models. The capacity region of the symbol-asynchronous Gaussian mul-
tiple access channel was obtained recently (Verdú 1989b). However, fur-
ther study is necessary to obtain the capacity region for the asynchronous
Gaussian multiple access channel with memory. This could be helpful in
modeling the wireless digital communications channel as introduced in the
last chapter.
Moreover, for channels with memory, frame asynchronism may drasti-
cally reduce the capacity region and, in particular, the maximum achiev-
able rate sum. A useful model for a channel with memory will be
introduced in the last section of this chapter and analyzed in some detail.
(Figure: the multiple access channel, in which M inputs X1, X2, ..., XM share a common channel that produces the output Y.)
I (X1 , · · · , XM ; Y )
= H (Y ) − H (Y |X1 , · · · , XM ) (5.1)
= H (Y ) − H (Z). (5.2)
V[Y] = \sum_{i=1}^{M} V[X_i] + V[Z]  (5.3)
\le \sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2.  (5.4)
Furthermore,
which is bounded as
in which s_i^2 is
s_i^2 = \frac{\sigma_i^2}{\sigma_Z^2}.  (5.11)
A multiuser discrete multiple access channel satisfies
Example: the set of all achievable rates must lie inside the region defined by
the inequalities R1 + · · · + RM ≤ C and Ri ≤ Ci , as depicted in Figure 5.2,
for the case of two users.
For the case of three users, the set of equations above define an upper
bound on the capacity, because other constraints can be imposed on the
partial sums of transmission rates.
The channel capacity, or the maximum possible transmission rate, can
only be attained on the convex hull, or contour surface. Thus time-sharing
could be necessary to achieve some of the points in the capacity region, for
Figure 5.2. Capacity region for the Gaussian multiple access channel, M = 2.
p_R(r) = \frac{r}{\gamma^2}\, e^{-\frac{r^2}{2\gamma^2}}\, u(r).  (5.13)
in which the units will be bits per channel use and the parameter s2 will
be referred to as SNR. The equation results from the consideration of the
parametric equations defining the reliability function for random coding
for this channel and is valid for any positive valued random variable.
It is of interest here to know how this expression is related to the equation
C(s^2) = \frac{1}{2}\log(1 + \gamma^2 s^2),  (5.15)
which might be easier to use in many situations.
From the convexity of the logarithm function and Jensen’s inequality
(Blahut 1987) it is clear that for all SNRs, s2 , C(s2 ) ≤ C(s2 ). The magnitude
of the difference C(s2 ) − C(s2 ) will be shown, graphically, to be small
for the case of a Rayleigh fading distribution. Implicit in the form of the
capacity expression is the fact that Y should be a Gaussian random variable
to achieve capacity. Consequently U = RX should be a Gaussian random
variable.
The question of determining the distribution of X , for a given distribu-
tion of R to yield U a Gaussian distribution is considered. It is noted that
since U and R are correlated with unknown joint distribution, this prob-
lem is quite different from the usual transformation problem for random
variables. It is solved for the case of R Rayleigh distributed. The general
problem seems difficult.
From the form of the equation for the capacity of the channel, it is
argued that to achieve capacity, Y must be a zero mean Gaussian random
variable. For a given fading random variable R, the signal random variable X
should be chosen so that U = RX is Gaussian. Since U and R are correlated
it does not seem a simple matter to compute the distribution of X = U /R,
using the standard techniques.
Likewise, it does not seem a simple matter to compute I (X , Y ) and
the maximizing distribution to achieve capacity, making the technique of
(Ericson 1970) the more interesting.
E[R^{2k}] = k!\,\gamma^{2k}, \qquad E[U^{2k}] = \frac{(2k)!}{2^k k!}\,\sigma_U^{2k},  (5.16)
and so
E[X^{2k}] = \frac{E[U^{2k}]}{E[R^{2k}]} = \binom{2k}{k}\left(\frac{\sigma}{\sqrt{2}\,\gamma}\right)^{2k}, \quad k = 1, 2, 3, \ldots.  (5.17)
Let a = \sigma/(\sqrt{2}\,\gamma). Let p_X(x) be the density of X and \phi_X(v) the characteristic function. The series expansion of \phi_X(v) is
\phi_X(v) = \sum_{k=0}^{\infty} (-1)^k \frac{E[X^{2k}]}{(2k)!}\, v^{2k}
= \sum_{k=0}^{\infty} (-1)^k \binom{2k}{k} \frac{(av)^{2k}}{(2k)!}  (5.18)
= \sum_{k=0}^{\infty} (-1)^k \frac{(av)^{2k}}{(k!)^2}
Example: introducing the Rayleigh pdf, from Equation 5.13, into the pre-
vious formula gives
C = \frac{1}{2}\int_{-\infty}^{\infty} \log(1 + r^2 s^2)\, \frac{r}{\gamma^2}\, e^{-\frac{r^2}{2\gamma^2}}\, u(r)\, dr.  (5.21)
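Equation 5.21 has no simple closed form, but it can be evaluated numerically. The Python sketch below (a simple midpoint rule with assumed parameter values, not from the text) computes the average capacity for a given γ and SNR s².

```python
import math

def rayleigh_average_capacity(s2, gamma, steps=20000):
    """Numerical evaluation of Equation 5.21 with a midpoint rule, in bits per channel use."""
    r_max = 10 * gamma                       # the Rayleigh density is negligible beyond this point
    dr = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        p_r = (r / gamma ** 2) * math.exp(-r ** 2 / (2 * gamma ** 2))
        total += 0.5 * math.log2(1 + r ** 2 * s2) * p_r * dr
    return total

print(rayleigh_average_capacity(s2=10.0, gamma=1.0))
```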
Figure 5.3. Average and actual capacity, for γ = 0.5, 1.0, and 2.0.
C = \sum_{i=1}^{M} C_i  (5.34)
or
C = \frac{1}{2} \log \prod_{i=1}^{M} \frac{S^2 + 1}{S_i^2 + 1}  (5.35)
Example: the last expression is maximized when all the transmitters carry
the same power, that is, σi2 = σ 2 . This assumption simplifies the expression
of the INR to
S_i^2 = \frac{M - 1}{M}\, S^2  (5.37)
and also simplifies the formula for the channel capacity
C = \frac{M}{2} \log \frac{S^2 + 1}{\frac{M-1}{M} S^2 + 1}  (5.38)
or
C = \frac{M}{2} \log \frac{M(S^2 + 1)}{M(S^2 + 1) - S^2}.  (5.39)
Example: an interesting result can be obtained by taking the limit of the
capacity when the number of sources goes to infinity. First, the formula is
rearranged to
C = \frac{1}{2} \log \left(1 - \frac{S^2}{M(S^2 + 1)}\right)^{-M}  (5.40)
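A brief numerical sketch (Python; the variable names are illustrative) evaluates Equation 5.39 for a growing number of equal-power users and compares it with the limiting value implied by Equation 5.40, namely (S²/(S² + 1)) (log₂ e)/2.

```python
import math

def sum_capacity(M, S2):
    """Equation 5.39: sum capacity of the Gaussian multiple access channel with M equal-power users."""
    return 0.5 * M * math.log2(M * (S2 + 1) / (M * (S2 + 1) - S2))

S2 = 10.0                                     # total signal-to-noise ratio
for M in (1, 2, 10, 100, 1000):
    print(M, sum_capacity(M, S2))
# limiting value as M grows without bound
print(0.5 * (S2 / (S2 + 1)) * math.log2(math.e))   # about 0.656 bit per channel use
```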
(Figure: capacity region bounded by R1 + R2 = C, with individual rate limits C1 and C2 and enlarged limits C'1 and C'2.)
and
C_2 = \frac{1}{2}\log\left(1 + \frac{\sigma_2^2}{\sigma_1^2 + \sigma_Z^2}\right), \qquad C_2' = \frac{1}{2}\log\left(1 + \frac{\sigma_2^2}{\sigma_Z^2}\right).  (5.46)
(Figure: birth-death Markov chain with states k − 1, k, and k + 1, birth parameters α_{k−1} and α_k, and death parameters β_k and β_{k+1}.)
C = \sup_{\{\pi_k\}} \sum_{k=1}^{M} \pi_k\, C(k).  (5.48)
or
C_k = \frac{1}{2} \log \prod_{i=1}^{k} \frac{S^2 + 1}{S_i^2 + 1}  (5.50)
in which S² represents the total SNR. The expression can be put in the form
C_k = \frac{1}{2} \sum_{i=1}^{k} \log \frac{S^2 + 1}{S_i^2 + 1}.  (5.51)
The last expression reaches a maximum when all the transmitters carry
the same power, that is, σi2 = σX2 . This assumption simplifies the formula
of the interference to noise ratio (INR) to
S_i^2 = \frac{k - 1}{k}\, S^2  (5.52)
and also simplifies the formula for the channel conditional capacity, giving
C_k = \frac{k}{2} \log \frac{S^2 + 1}{\frac{k-1}{k} S^2 + 1}  (5.53)
or
C_k = \frac{k}{2} \log \frac{k(S^2 + 1)}{k(S^2 + 1) - S^2},  (5.54)
which can be further simplified by noticing that the total SNR is S 2 = ks2 ,
in which s2 = σX2 /σZ2 .
Two special cases for the AB model are treated here. First, the case
in which the birth and death parameters are considered constant for any
state, and the access is unrestricted. Second, the case in which the users
are discouraged by the amount of noise in the system. This occurs because
every new user can be thought to degrade the system to a certain extent,
by decreasing the SNR. For the first case, αk = λ and βk = μ. For the
second case, αk = λ/(k + 1) and βk = μ, in which λ and μ are fixed probabilities.
Example: the first case produces a Geometric distribution for the probabilities. This distribution shows the likelihood of being in a certain state in the long run, after the transient dies out, and is given by the equation
p_k = (1 - \rho)\rho^k, \quad k = 0, 1, 2, \ldots, \quad \rho = \lambda/\mu.
(Figure: capacity as a function of the SNR, for utilization factors from 0.0 to 0.999; as the utilization factor grows, the curves approach the limit log e/2.)
more than L users at a time in the system is given by ρ^{L+1}. Using Equation 5.48, and substituting the appropriate terms, the capacity for this case is defined as the maximum, over the set of utilization factors,
C = \max_{\rho} \sum_{k=1}^{\infty} (1 - \rho)\rho^k \log \left[\frac{(k + 1)\sigma_X^2 + \sigma_N^2}{k\sigma_X^2 + \sigma_N^2}\right]^{k},  (5.56)
Example: the solution for the second case seems to be a better approach
to reality. A Poisson distribution is obtained for the stationary probabilities
of the ISC model, given by
(Figure: capacity as a function of the utilization factor, for SNR values from −10 dB to 30 dB; the curves are bounded by log e/2.)
p_k = \frac{\rho^k}{k!}\, e^{-\rho}, \quad k = 0, 1, 2, \ldots  (5.57)
with mean and variance equal to ρ.
The capacity, therefore, is given by the maximum, over the range of
utilization factors, of the following formula
C = \max_{\rho} \sum_{k=1}^{\infty} e^{-\rho}\, \frac{\rho^k}{k!} \log \left[\frac{(k + 1)\sigma_X^2 + \sigma_N^2}{k\sigma_X^2 + \sigma_N^2}\right]^{k}.  (5.58)
The results are shown in Figures 5.8 and 5.9 as a function of the SNR
and utilization factor, respectively. Figure 5.9, in particular, gives a good
example of the system behavior as the utilization factor increases. It can
be seen that, as the distribution becomes more uniform, the sum capacity
tends to the already computed limit log e/2. As the factor increases there
is a more pronounced tendency for the occupation of higher states in the
Markov chain.
Several models were described in this chapter. The first modeled the
case when the users acknowledge a certain protocol for transmission and
cooperate to achieve the maximum transmission rates. In the second model,
the users do not cooperate, in the sense that they transmit their signals
Figure 5.8. Capacity for the channel with Poisson accessibility, as a function of the signal-to-noise ratio, for ρ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Figure 5.9. Capacity for the channel with Poisson accessibility, as a function of the utilization factor, for some values of the signal-to-noise ratio.
CHAPTER 6
CODE DIVISION MULTIPLE ACCESS
6.1 INTRODUCTION
This chapter presents some of the main characteristics of spread spectrum systems, including the indication of possible types of interference, the anti-jamming advantages of the technique in terms of signal-to-noise ratio (SNR), and possible applications of the techniques. The usefulness of spread
spectrum to overcome interference is stressed. Among others, the main
sources of interference are multipath, multiple media, multiple user, and
multiple cell (Viterbi 1991).
The first developments using concepts related to the current techniques of spread spectrum occurred in the mid-forties, and the applications involved military problems (Scholtz 1982). The techniques have acquired great popularity in the communications community over the last 20 years, since the papers on the subject began to appear after two decades of secrecy due to military interests.
The main feature of spread spectrum signals is related to their band-
widths, which are much larger than those assigned to baseband or mod-
ulated signals. The way in which the spectrum is spread is essential, and
is often realized by a code that is independent of the transmitted data.
The receiver needs a replica of the code, synchronized in time, in order to
recover the data.
The spreading process introduces a great deal of redundancy in the
transmitted signal. This allows the receiver to suppress part of the interfer-
ence embedded in the signal, which is the chief advantage of the process.
It is possible to devise other advantages of the spread spectrum systems
in relation to personal communications (Pickholtz, Schilling, and Milstein
1982; Scholtz 1982).
One of the strategies in use to spread the spectrum is the Direct Sequence
modulation (DS), in which a fast pseudo-random sequence causes ampli-
tude switching in the data carrying signal, as suggested in Figure 6.1. In this
Figure, the data signal is represented by b(t), the pseudo-noise sequence is
c(t), Tb is the bit interval, and Tc is the chip interval.
Figure 6.2 shows a block diagram for the direct sequence spread spec-
trum system. The signal is randomized before reaching the modulator. At
the receiver, a synchronization circuit retrieves the PN code to produce the
original signal.
Another technique used is called Frequency Hopping (FH), in which
the signal is subject to frequency switching, according to a predetermined
pseudo-random sequence.
Figure 6.3 illustrates the FH technique, in which a pseudo-noise gener-
ator is used as input to a synthesizer to produce a pseudo-random sequence
of frequencies that shifts the originally modulated signal frequency.
(Figure 6.1. The data signal b(t), with bit interval Tb, and the pseudo-noise sequence c(t), with chip interval Tc, both taking values ±1.)
In the long run, though, it’s the author’s opinion that we must stand
back and question the wisdom of squandering a precious resource
such as bandwidth for reasons of expediency (Viterbi 1985).
interval [0, Tb ] will be y(t) = xj (t) + nI (t) + n(t), in which n(t) is the addi-
tive white Gaussian noise (AWGN) with power spectral density of N0 /2
W /Hz and nI (t) is the multiple access interference, also assumed to be
Gaussian distributed.
It can be demonstrated that it is possible to represent the whole signal set
by the use of no more than L (L ≤ M ) orthogonal basis functions. It must
also be pointed out that only those noise waveforms inside the signal space
are relevant. In theory, a noise signal would require an infinite number of
components. It is said that the above defined set has dimension N , if this
is the number of basis functions needed to characterize the set of signals
(Wozencraft and Reiffen 1961). It can be shown that L ≈ 2WTb , in which
W is the approximate bandwidth shared by the set of signals. The optimum
receiver for an AWGN channel consists of a bank of correlators, or a bank of
matched filters. The decision strategy is to choose the maximum correlator
output. The performance of this system depends only on the ratio of the bit
energy to the noise power spectrum density.
The key operation in the whole process is the spectral compression of the
interfering signal accomplished by the receiver. This is done by correlating
the received signal with a known spreading code. The information bear-
ing signal is compacted and, simultaneously, the interfering signal power
is spread over the frequency range. This implies an effective advantage
for the data signal in relation to the interfering signal, but requires perfect
synchronization of the pseudo-random code in the receiver. For a signal
of duration Tb and bandwidth W , the approximate number of dimensions
is 2WTb and the processing gain, or sequence code length, can be writ-
ten as
N = \frac{T_b}{T_c}  (6.1)
U = \frac{M}{V(1 + \lambda)}.  (6.2)
A more formal approach can be followed with the use of algebraic coding
theory and the definition of balanced codes. An (n, w, λ) binary balanced
code is a set of binary n-tuples, each of weight w, such that any two words
agree in at most λ positions. If the maximum number of words in a balanced
code is U , it follows that (Blake and Mullin 1976; Chung and Kumar 1990)
U \le \frac{\binom{n}{\lambda + 1}}{\binom{w}{\lambda + 1}}.  (6.3)
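The bound of Equation 6.3 is straightforward to evaluate; the one-function Python sketch below (an illustration using math.comb, available from Python 3.8, not code from the text) computes it for an arbitrary choice of parameters.

```python
from math import comb

def balanced_code_bound(n, w, lam):
    """Integer upper bound on the number of codewords of an (n, w, lambda) binary balanced code."""
    return comb(n, lam + 1) // comb(w, lam + 1)

print(balanced_code_bound(32, 8, 1))   # C(32,2) // C(8,2) = 496 // 28 = 17
```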
if the reflected signal is delayed, compared to the direct signal, by more than
one code chip, the reflected signal is treated as any other uncorrelated input
signal. The higher the code chip rate, for a particular system, the smaller
its multipath problem. In the mobile environment, the result of multipath
reception is that a moving receiver is subject to rapid fading and peaking of
its input signal as the direct and reflected signals cancel and reinforce
each other. Stationary receivers are also subject to multipath due to the
reflecting and refracting surfaces (Dixon 1984).
A few topics have interested researchers, industries, and users of CDMA
systems. The first one is the design of pseudo-noise sequences with good
autocorrelation and cross-correlation properties. The second concerns the
evaluation of the system capacity, or how many users a CDMA system can
support under given circumstances. The number of allocated users dictates
the type of distribution to characterize the Multiple Access Interference
(MAI). A large number of subscribers favors the Gaussian approximation.
For a small number of subscribers, considering that the MAI is additively
composed of cross-correlations limited to a few chips of coincidence, the
maximum interference is too small to apply the central limit theorem (Brady
1991).
The evaluation of system performance can be done in two ways: using
the average probability of error or computing the outage of the system. The
probability of error is averaged over the ensemble of channels obtained
from the assumed model. The outage measures the probability of trans-
mitting through a channel when the probability of error is greater than a
predefined threshold. In the case of time-varying channel and stationary
terminal, outage indicates the fraction of time the terminal operates with
an error probability above the limit. For the time-invariant channel and
mobile terminal, outage indicates the fraction of building area in which the
terminal can operate (Kavehrad and McLane 1987).
One interesting peculiarity of Direct Sequence Spread Spectrum (DSSS)
is that error detection and correction may not be advisable in some applica-
tions. This is due to the effect of the coding overhead, which increases the
signal apparent data transmission rate and reduces the available processing
gain (Dixon 1984). On the other hand, a substantial improvement in the
capture performance of coded DSSS can be achieved, in comparison to an
uncoded system. Capture phenomena characterize the ability of a receiver
to successfully receive a packet, even though part or all of the packet arriv-
ing at the receiver overlaps with the other packets. The basic mechanism
for capture is the ability of the receiver to synchronize with and lock on
to one packet and subsequently reject other overlapping packets as noise
(Soroushnejad and Geraniotis 1991).
r_X(i) = \sum_{k=0}^{N-1} x(k)\, x(k + i)  (6.4)
r_{XY}(i) = \sum_{k=0}^{N-1} x(k)\, y(k + i).  (6.5)
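As an illustration of these definitions (the polynomial choice, seed, and function names are assumptions, not taken from the text), the Python sketch below generates a length-31 maximal-length sequence from the primitive polynomial x⁵ + x² + 1 and verifies the two-valued periodic autocorrelation that characterizes m-sequences.

```python
def m_sequence(degree=5, taps=(3, 5)):
    """One period of an m-sequence from the recursion a[n] = a[n-3] XOR a[n-5],
    which corresponds to the primitive polynomial x^5 + x^2 + 1 over GF(2)."""
    n = 2 ** degree - 1
    a = [1] * degree                              # any nonzero seed works
    for _ in range(n - degree):
        a.append(a[-taps[0]] ^ a[-taps[1]])
    return a

def periodic_autocorrelation(x, i):
    n = len(x)
    return sum(x[k] * x[(k + i) % n] for k in range(n))

chips = [1 - 2 * b for b in m_sequence()]         # map bits {0, 1} to {+1, -1}
print([periodic_autocorrelation(chips, i) for i in range(4)])   # [31, -1, -1, -1]
```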
(Figure: an m-stage shift register used to generate pseudo-noise sequences.)
Example: for large values of N and M, the bound is well approximated as \sqrt{N}.
(Figure: Gold sequence generator, formed by two m-stage shift registers with modulo-2 adders whose outputs are combined.)
Gold codes are useful for CDMA applications because they form a large set of codes. But, with the exception of the two base sequences a and b, the Gold sequences are not maximal-length shift-register sequences of length N. Hence their
autocorrelation functions are not two-valued. Such codes can be viewed
as the duals of the cyclic codes generated by the product of two primitive
polynomials.
The advantage of Gold sequence generators is the possibility of pro-
ducing a new sequence with every change in phase position between the
two primary generators. In addition to this, the Gold codes may be chosen
so that, over a set of codes available from a given generator, the cross-
correlation between the codes is bounded. Thus, Gold codes are attractive
for applications in CDMA.
Furthermore, maximal sequences of the same length are not guaranteed to have bounded cross-correlation. But, for a given value of m, the Gold codes exhibit a cross-correlation that is √2 times that of the maximal-length sequences of the same length (Dixon 1984).
A method for choosing the linear maximal codes, to be used as compo-
nents of Gold sequences, was introduced in 1967. The outputs form a set
of sequences whose members display cross-correlation and autocorrelation
side-lobes bounded by 2(m+1)/2 + 1 for m odd, and by 2(m+1)/2 − 1 for m
even (Gold 1967).
in which the operator T denotes a cyclic shift by one bit position. The
Kasami sequences are the elements of the Kasami code, each sequence
having block-length N .
The cross-correlation functions and off-peak autocorrelation functions of elements of a Kasami code only take values in the set {−1, 2^{m/2} − 1, −2^{m/2} − 1}. Hence the largest magnitude of any cross-correlation function between pairs of sequences from the code is 2^{m/2} + 1 (Blahut 1990). The Kasami
sequences also satisfy the Welch bound, and are in this sense optimal
(Proakis 1989).
It might be noted that if the sequences are allowed to be defined over the
complex mth roots of unity, for some positive integer m, then the auto and
cross-correlation properties of the sequences could be improved greatly,
and still maintain the balanced and run properties of the m-sequences, that
is, there is approximately an even distribution of each of the 2^m elements
of the Galois Field, and short runs of identical symbols are more likely to
appear than longer runs (Komo and Liu 1990). An extensive work on the
subject of sequence design is found in (Sarwate and Pursley 1980).
Some attributes of spread spectrum, employed by CDMA systems, are
important for personal communications systems, including the utilization
of the voice activity factor, no need for equalization, use of one radio
per site, soft handoff, no need for guard time, sectorization for capacity,
less fading for the wide-band signal, easy transition from current systems,
capacity advantage, no frequency management or assignment needed, soft
capacity, coexistence with other systems and suitability for microcell and
in-building systems (Lee 1991).
Modeling a spread spectrum system implies the generation of random
sequences. Those sequences must exhibit the same properties, in order to
be eligible for CDMA use. In real life, practical constraints dictate the use
of pseudo-random, or pseudo-noise sequences (PN), whose properties are
well-known.
CHAPTER 7
THE CAPACITY OF A CDMA SYSTEM
7.1 INTRODUCTION
The area of Personal Communications Network (PCN) is a huge success in
the communications field. The main technology used in mobile communi-
cations is code division multiple access (CDMA) that has some advantages
over the previous techniques. This chapter describes and analyzes a model
for the discrete-input continuous-output, Direct-Sequence Spread Spec-
trum (DSSS), Gaussian multiple access channel.
In this chapter, the modulation scheme is specialized, with the choice
of DPSK modulation. This modulation scheme deserves some comments.
DPSK is regarded by many authors as a natural choice for CDMA, espe-
cially if one considers a fading multipath environment, in which syn-
chronous carrier recovery is a difficult task (Misser, Wijffels, and Prasad
1991; Turin 1984b).
As a differentially coherent modulation technique, DPSK also compares
well to similar systems using coherent PSK modulation and is less trouble-
some to demodulate in a complex environment such as an indoor wireless
channel (Kavehrad and McLane 1985; Kavehrad and Ramamurthi 1987).
The capacity of the PSK scheme, for the coding channel resulting from
hard decision detection, was computed in Sousa (1992). In this chapter,
upper and lower bounds on the sum capacity are obtained and compared.
techniques, CDMA has its capacity solely limited by the amount of inter-
ference into the system, which allows any reduction in interference to be
converted directly and linearly into an increase in capacity (Gilhousen et al.
1991).
The use of spread spectrum allows new users to share the spectrum
with existing ones. Because CDMA is interference limited, the number
of users that share the same spectrum, and still maintain an acceptable
performance, is determined by the interference generated by the set of
users.
In CDMA systems, each user is given a distinct pseudo-random code,
which either identifies the user to the central station in a star network, or
addresses another user in a more fully connected network (Turin 1984b).
Hence, the users utilize the same bandwidth and therefore interfere, mak-
ing a received signal appear noisier as the traffic increases in the system.
The effect of the interfering energy depends on the processing gain of the
spread spectrum technique.
As implied, spread spectrum systems are less subject to multipath signal
variations than conventional systems. In a direct sequence receiver, if the
reflected signal is delayed compared to the direct signal by more than one
code chip, the reflected signal is treated as uncorrelated to the direct signal.
The higher the code chip rate, for a particular system, the smaller its mul-
tipath problem. In the mobile environment, the result of multipath reception
is that a moving receiver is subject to rapid fading and peaking of its input
signal as the direct and reflected signals cancel and reinforce each other.
Stationary receivers are also subject to multipath, due to the reflecting and
refracting surfaces (Dixon 1984).
Researchers, industries, and prospective users of CDMA systems have
been concerned with three major issues regarding the design of those sys-
tems. The first one is the design of pseudo-noise sequences with good
autocorrelation and cross-correlation properties.
The second issue, is the evaluation of system performance, that can be
done in two ways: using the average probability of error or computing the
outage of the system. The probability of error is averaged over the ensemble
of channels obtained from the assumed model.
In the case of time-varying channel and stationary terminal, outage indi-
cates the fraction of time the terminal operates with an error probability
above the limit. For the time-invariant channel and mobile terminal, out-
age indicates the fraction of building area where the terminal can operate
(Kavehrad and Ramamurthi 1987).
The evaluation of the system capacity, or which transmission rate the sys-
tem can support under given circumstances, is the third issue. The number
The output is the combination of the M inputs plus noise and can be assumed
Gaussian, using the central limit theorem for a large number of users
(Sousa 1989). The channel is assumed memoryless and, for the assumptions
made, the optimum receiver for a particular signal will be the maximum
likelihood receiver.
In the following, one considers the performance analysis of an asyn-
chronous CDMA system using direct-sequence codes. Let bij = ±1, i =
0, 1, . . . , M , j = −∞, . . . , ∞ be a binary bit string that modulates the
ith transmitter and let the codes ci (t), i = 1, . . . , M , 0 ≤ t ≤ Tb form a set
of quasi-orthogonal sequences, with cross-correlation given by
r_{ij}(\tau) = \int_{0}^{T_b} c_i(t)\, c_j(t - \tau)\, dt, \quad |\tau| \le T_b, \quad i \ne j  (7.1)
in which {φj (t)} is a set of orthonormal functions and [j/N ] represents the
greatest integer less than or equal to j/N . If modulation is used, along with
rectangular chip pulses, the functions are given by
y(t) = \sum_{i=1}^{M} x_i(t) + n(t)  (7.6)
n_I(t) = \sum_{k=1, k \ne i}^{M} x_k(t)  (7.7)
n_I(t) = \sum_{k=1, k \ne i}^{M} \sum_{j=-\infty}^{\infty} b_{k[j/N]}\, c_k(t - jT_b - \tau_k) \cos[\omega_c(t - jT_b - \tau_k)],  (7.8)
in which {τk } is a set of independent random variables distributed over
the interval [0, Tb ). The total received signal can be written as an explicit
function of the MAI and Gaussian noise
y_i = b_i E_b + \frac{1}{2}\sum_{k=1, k \ne i}^{M} \sum_{j=-\infty}^{\infty} b_{i[j/N]} \int_{0}^{T_b} c_i(t)\, c_k(t - jT_b - \tau_k)\, dt\, \cos\phi_{kj} + \int_{0}^{T_b} c_i(t)\, n(t) \cos\omega_c t\, dt  (7.11)
in which
φkj = θk − ωc τk − jωc Tb (7.12)
mean and variance of the third term are 0 and N0 Eb /2, respectively, in which
E_b = \int_{0}^{T_b} c_i^2(t)\cos^2\omega_c t\, dt, \quad i = 1, \ldots, M.  (7.13)
The second term deserves some attention. It can be written as (Turin
1984a)
N_i = \frac{1}{2}\sum_{k=1, k \ne i}^{M} n_k \cos\phi_{kj}  (7.14)
in which
$$n_k = \sum_{j=-\infty}^{\infty} b_{i[j/N]} \int_0^{T_b} c_i(t)\, c_k(t - jT_b - \tau_k)\, dt. \qquad (7.15)$$
But the variance of the term $n_k$ inside the summation, defined as $\sigma_k^2$, can be expressed as
$$\sigma_k^2 = 2\int_0^{T_b/N}\left(1 - \frac{N\tau}{T_b}\right)^2 (T_b - \tau)\, d\tau = \frac{2T_b^2}{3N} - \frac{T_b^2}{12N^2}. \qquad (7.16)$$
in which
$$p(y) = \sum_{j=0}^{q-1} p(y|x_j)P(x_j), \qquad (7.20)$$
the logarithms are to the base 2, unless otherwise stated, implying that
information rates are given in bits, and q represents the number of signal
levels.
The individual capacity, for the ith user, is
$$C_i = \max_{P(x_j)} \sum_{j=0}^{q-1}\int_{-\infty}^{\infty} p(y|x_j)P(x_j)\log\frac{p(y|x_j)}{p(y)}\, dy. \qquad (7.21)$$
The conditional probability density function of the receiver output, for the transmission of $\pm E_b$, is
$$p(y|\pm E_b) = \frac{1}{\sigma_I\sqrt{2\pi}}\, e^{-\frac{(y\mp E_b)^2}{2\sigma_I^2}}. \qquad (7.23)$$
The noise variance σI2 was obtained assuming that the interuser inter-
ference amounts to a Gaussian distribution, for a CDMA system using
direct-sequence codes (Turin 1984b). It can be simplified by noting that for
cik (t) = ±1, k = 1, . . . , N , the code energy is Ec = Tb = 2Eb , and for usual
values of the processing gain (Pursley 1977; Geraniotis and Pursley 1985)
$$\sigma_I^2 = \frac{(M-1)E_b^2}{3N} + \frac{N_0 E_b}{2} \qquad (7.24)$$
in which M is the number of users in the channel, N is the processing gain,
N0 /2 is the Gaussian noise power spectrum density, and Tb is the bit time.
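As a numerical illustration, the sketch below evaluates the individual capacity of Equation 7.21 for equiprobable binary inputs $\pm E_b$, using the Gaussian conditional pdf of Equation 7.23 and the MAI variance of Equation 7.24. The parameter values (M = 20, N = 100, Eb = 1) are assumptions made for the example, and the integral is approximated by a simple Riemann sum.

```python
# Sketch: individual capacity C_i of Eq. 7.21 for binary inputs +/- Eb,
# Gaussian conditional pdf of Eq. 7.23, MAI variance of Eq. 7.24.
# Parameter values are illustrative assumptions, not from the text.
import numpy as np

def mai_variance(M, N, Eb, N0):
    # Equation 7.24: sigma_I^2 = (M - 1) Eb^2 / (3N) + N0 Eb / 2
    return (M - 1) * Eb**2 / (3 * N) + N0 * Eb / 2

def individual_capacity(M, N, Eb, ebn0_db):
    N0 = Eb / 10 ** (ebn0_db / 10)            # noise density from Eb/N0
    sigma2 = mai_variance(M, N, Eb, N0)
    sigma = np.sqrt(sigma2)
    y = np.linspace(-Eb - 10 * sigma, Eb + 10 * sigma, 20001)
    dy = y[1] - y[0]
    # Conditional pdfs of Equation 7.23 for the two equiprobable inputs
    p_plus = np.exp(-(y - Eb) ** 2 / (2 * sigma2)) / (sigma * np.sqrt(2 * np.pi))
    p_minus = np.exp(-(y + Eb) ** 2 / (2 * sigma2)) / (sigma * np.sqrt(2 * np.pi))
    p_y = 0.5 * (p_plus + p_minus)            # Equation 7.20 with q = 2
    capacity = 0.0
    for p_cond in (p_plus, p_minus):
        ratio = np.divide(p_cond, p_y, out=np.ones_like(y), where=p_cond > 0)
        capacity += np.sum(0.5 * p_cond * np.log2(ratio)) * dy
    return capacity                           # bits per channel use

if __name__ == "__main__":
    M, N, Eb = 20, 100, 1.0
    for snr_db in (0, 5, 10, 20):
        print(snr_db, "dB ->", round(individual_capacity(M, N, Eb, snr_db), 4))
```

At high SNR, and when N is much larger than M, the result approaches 1 bit per channel use, the maximum for binary signaling.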
It can reasonably be argued, for the present model, that the sum capacity
is a one-feature function containing most of the needed information about
the behavior of the system. This is fair to say for spread spectrum tech-
niques, which define a noncooperative type of channel whose capacity is
given by the sum of the individual capacities,
$$C = \sum_{i=1}^{M} C_i. \qquad (7.25)$$
The expansion is valid for the case in which the number of users in the
system is much larger than the code length, or when the thermal noise is
preponderant in the system, according to Equation 7.13.
Substituting into Equation 7.26 and integrating gives
$$C_i = \log e\left[\frac{E_b^2}{2\sigma_I^2} - \frac{E_b^4}{4\sigma_I^4} + \frac{E_b^6}{2\sigma_I^6} + \frac{E_b^8}{12\sigma_I^8} + O\!\left(\frac{E_b^{10}}{\sigma_I^{10}}\right)\right]. \qquad (7.28)$$
Using the first term in Equation 7.27 as an approximation for the indi-
vidual capacity and substituting the formula for the MAI variance one finds
a formula for the capacity, under the assumption of very low SNR
$$C_i \approx \log e\left[\frac{E_b^2}{2\left(\dfrac{(M-1)E_b^2}{3N} + \dfrac{N_0 E_b}{2}\right)}\right]. \qquad (7.31)$$
Using now Equation 5.34, maximized for the case in which all the trans-
mitters carry the same power, one obtains
$$C = MC_i \approx \frac{3\log e\, MN}{2(M-1) + 3N\,\dfrac{N_0}{E_b}}. \qquad (7.32)$$
Example: interesting results are obtained when one varies the SNR $E_b/N_0$
or the number of users in the system. As the SNR goes to zero,
$$\lim_{E_b/N_0 \to 0} C = 0, \qquad (7.33)$$
and, for low SNR, the sum capacity grows linearly with the SNR,
$$C \approx \log e\, M\,\frac{E_b}{N_0}. \qquad (7.34)$$
As the number of users increases without bound,
$$\lim_{M\to\infty} C \approx \frac{3\log e\, N}{2}. \qquad (7.35)$$
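The low-SNR approximation of Equation 7.32, and the limits of Equations 7.33 to 7.35, can be checked numerically with a short sketch such as the one below; the parameter values are assumed for illustration.

```python
# Sketch of the low-SNR sum-capacity approximation, Equation 7.32,
# and of its limiting behavior, Equations 7.33 to 7.35.
import math

def sum_capacity_low_snr(M, N, ebn0):
    # C = M C_i ~ 3 log2(e) M N / (2 (M - 1) + 3 N N0/Eb)
    return 3 * math.log2(math.e) * M * N / (2 * (M - 1) + 3 * N / ebn0)

M, N = 20, 100
for ebn0_db in (-30, -10, 0):
    ebn0 = 10 ** (ebn0_db / 10)
    print(ebn0_db, "dB:", round(sum_capacity_low_snr(M, N, ebn0), 4))

# C vanishes as Eb/N0 -> 0 (Eq. 7.33), growing as log2(e) M Eb/N0 (Eq. 7.34);
# for M -> infinity it approaches 3 log2(e) N / 2 (Eq. 7.35).
print(sum_capacity_low_snr(10 ** 6, N, 1.0), "~", 3 * math.log2(math.e) * N / 2)
```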
[Figure: upper and lower bounds on the capacity C.]
As seen from the analysis carried out in the last section, the integrand in
the above equation is suitable for a series expansion, as long as
$E_b/N_0 \ll 1$. In order to adjust the integrand, so that the expansion can hold
for $E_b/N_0 \gg 1$ as well, one can resort to the Fourier transform properties.
The functions inside the integral are
$$g(y) = \frac{1}{\sigma_I\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma_I^2}} \qquad (7.39)$$
and
$$f(y) = e^{-\frac{yE_b}{\sigma_I^2}}\ln\left(1 + e^{-\frac{2yE_b}{\sigma_I^2}}\right), \qquad (7.40)$$
with Fourier transform
$$F(\omega) = \mathcal{F}[f(y)] = \frac{\pi\csc\left(\dfrac{\pi}{2} + \dfrac{j\pi\sigma_I^2\omega}{2E_b}\right)}{\dfrac{E_b}{\sigma_I^2} + j\omega}. \qquad (7.42)$$
Therefore, applying the Fourier transform property of the integral of the
product of two functions,
$$\int_{-\infty}^{\infty} f(y)G(y)\, dy = \int_{-\infty}^{\infty} F(\omega)g(\omega)\, d\omega, \qquad (7.43)$$
one obtains
$$C_i = 1 - \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} e^{-\frac{\sigma_I^2\omega^2}{2}}\; \frac{\pi\csc\left(\dfrac{\pi}{2} + \dfrac{j\pi\sigma_I^2\omega}{2E_b}\right)}{\dfrac{E_b}{\sigma_I^2} + j\omega}\; d\omega. \qquad (7.44)$$
Both expressions are good approximations for the actual value of the
capacity if $E_b/\sigma_I^2 \gg 1$, which implies $N \gg M$. For very long code sequences,
the hyperbolic cosine can also be simplified to
$$\cosh\frac{\pi\sigma_I^2\omega}{2E_b} = 1 + \frac{1}{2}\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)^2 + \cdots \approx 1. \qquad (7.53)$$
Using this simplification in Equation 7.51, one obtains the lower bound
and approximation
$$C_i \ge 1 - 2\pi\log e\,\frac{\sigma_I^2}{E_b}\, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_0^{\infty} e^{-\frac{\sigma_I^2\omega^2}{2}}\, d\omega, \qquad (7.54)$$
and finally
$$C_i \ge 1 - \sqrt{2}\,\pi^{3/2}\log e\,\frac{\sigma_I}{E_b}\, e^{-\frac{E_b^2}{2\sigma_I^2}}. \qquad (7.55)$$
This exponential approximation, referred to as Bound 1, is plotted along
with the exact capacity in Figure 7.2, for M = 20 and N = 100, after the
substitution into the formula for the sum capacity.
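A minimal sketch of Bound 1, combining Equation 7.55 with the MAI variance of Equation 7.24, follows; the values M = 20 and N = 100 reproduce the example used in Figure 7.2, while the remaining details are assumptions of this illustration.

```python
# Sketch of Bound 1 on the sum capacity, Eq. 7.55 combined with Eq. 7.24.
import math

def bound1_sum_capacity(M, N, ebn0):
    s = (M - 1) / (3 * N) + 1 / (2 * ebn0)      # sigma_I^2 / Eb^2
    ci = 1 - math.sqrt(2) * math.pi ** 1.5 * math.log2(math.e) \
             * math.sqrt(s) * math.exp(-1 / (2 * s))
    return M * ci

M, N = 20, 100
for ebn0_db in (0, 5, 10, 15, 20):
    print(ebn0_db, "dB:", round(bound1_sum_capacity(M, N, 10 ** (ebn0_db / 10)), 3))
```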
On the other hand, another approximation can be attempted using Equa-
tion 7.52, which can be put in the following form
$$C_i \le 1 - 2\pi\log e\,\frac{\sigma_I^2}{E_b}\, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_0^{\infty} \frac{e^{-\left(\frac{\sigma_I^2}{2} + \frac{\sigma_I^4}{E_b^2}\right)\omega^2}}{\cosh\dfrac{\pi\sigma_I^2\omega}{2E_b}}\, d\omega. \qquad (7.56)$$
Figure 7.2. Bound 1 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).
$$C_i \approx 1 - \frac{\sqrt{2}\,\pi^{3/2}}{2}\,\frac{\sigma_I}{E_b}\,\log e\; \frac{e^{-\frac{E_b^2}{2\sigma_I^2}}}{\sqrt{\dfrac{1}{2} + \dfrac{\sigma_I^2}{E_b^2}}}. \qquad (7.57)$$
It is interesting to observe that both 7.55 and 7.57 converge to the same
limit as the SNR increases. This approximation, referred to as Bound 2, is
plotted to compare with the exact capacity in Figure 7.3, for M = 20 and
N = 100, after the substitution into the formula for the sum capacity.
Equation 7.51 can lead to another lower bound on the individual capacity.
Consider for a while only the integral term of the mentioned equation
$$I = \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\dfrac{\pi\sigma_I^2\omega}{2E_b}}\, d\omega. \qquad (7.58)$$
Figure 7.3. Bound 2 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).
Using the usual assumption $E_b/\sigma_I^2 \gg 1$, and the inequality $\arctan(x) \le x$,
it is possible to upper bound the integrand, obtaining
$$I \le \frac{2E_b}{\pi}\int_0^{\infty} \omega\, e^{-\frac{\sigma_I^2\omega^2}{2}} \sinh\frac{\pi\sigma_I^2\omega}{2E_b}\, d\omega. \qquad (7.61)$$
Substituting the MAI variance
$$\sigma_I^2 = \frac{(M-1)E_b^2}{3N} + \frac{N_0 E_b}{2} \qquad (7.64)$$
into Equation 7.63 yields
$$C_i \ge 1 - \sqrt{2}\,\pi^{3/2}\log e\,\sqrt{\frac{M-1}{3N} + \frac{N_0}{2E_b}}\; \exp\left[-\frac{3N}{2(M-1) + 3N\dfrac{N_0}{E_b}} + \frac{\pi^2}{8}\left(\frac{M-1}{3N} + \frac{N_0}{2E_b}\right)\right] \qquad (7.65)$$
and, as $N \to \infty$,
$$C \ge M - \sqrt{2}\,\pi^{3/2}\log e\, M\,\sqrt{\frac{N_0}{2E_b}}\; \exp\left[-\frac{E_b}{N_0} + \frac{\pi^2 N_0}{16\,E_b}\right]. \qquad (7.68)$$
Figure 7.4. Bound 3 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).
The bound for the sum capacity can be expressed in terms of the prob-
ability of error, using Equation 7.82
$$C \ge M - 2\pi^{3/2}\log e\, P_e\, M\, \frac{1}{\sqrt{\ln\dfrac{1}{2P_e}}}\; \exp\left[\frac{\pi^2}{16\,\ln\dfrac{1}{2P_e}}\right]. \qquad (7.69)$$
The following derivation presents a lower bound for the channel capac-
ity, considering a large SNR, or a large ratio of processing gain to number
of users. This is likely to be the case in real systems, because both assump-
tions decrease the probability of error. In order to set the stage for the
evaluation of the lower bound, it is interesting to present a slightly different
formulation from the one presented in the previous section. One begins with
Equation 7.47, repeated here for reference:
$$C_i = 1 - \pi\log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\dfrac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\dfrac{E_b}{\sigma_I^2} + j\omega\right)}\, d\omega. \qquad (7.70)$$
The integral in Equation 7.50, in the last section, is not amenable to
an analytic solution. One way of attacking the problem is to find a bound
for the capacity, by a convenient substitution implied by the inequality
$\cosh(x) \ge 1 + \frac{x^2}{2}$. Then,
$$C_i \ge 1 - 2\pi\log e\,\frac{E_b}{\sigma_I^2}\, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\left[1 + \dfrac{1}{2}\left(\dfrac{\pi\sigma_I^2\omega}{2E_b}\right)^2\right]\left(\dfrac{E_b^2}{\sigma_I^4} + \omega^2\right)}\, d\omega. \qquad (7.71)$$
Therefore,
$$C_i \ge 1 - \frac{2\pi}{\pi^2 - 8}\log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \left\{\sqrt{2}\,\pi^2\, e^{\frac{4E_b^2}{\pi^2\sigma_I^2}}\left[1 - \operatorname{erf}\left(\frac{2E_b}{\pi\sigma_I}\right)\right] - 4\pi\, e^{\frac{E_b^2}{2\sigma_I^2}}\left[1 - \operatorname{erf}\left(\frac{E_b}{\sqrt{2}\,\sigma_I}\right)\right]\right\}. \qquad (7.74)$$
Equation 7.77 is plotted in Figure 7.5, along with the exact solution
(upper curve), as a function of the SNR ($E_b/N_0$, in dB), for M = 20
and sequence length N = 100. It is remarkable that the difference
does not exceed 5 percent from −10 dB to 20 dB, and actually converges
to zero as the SNR increases. The lower bound will thus be used as an
approximation for the actual capacity in the following, for values of SNR
of −10 dB or more.
One of the reasons for having a bound on the capacity, of the form
expressed by Equation 7.77, is the possibility of correctly interpreting the
role of the many parameters on the performance of the system.
Figure 7.5. Capacity for the channel, compared with the lower bound,
as a function of the SNR (Eb /N0 ), for M = 20 and sequence length
N = 100.
[Figure 7.6: capacity C as a function of the SNR (Eb/N0, dB), for sequence lengths N = 20, 40, 60, 80, 100, and 200.]
It can be noticed that the capacity of the system is achieved for a sequence
length of N = 100 and N = 200, for M = 10 and M = 20, respectively.
In Figure 7.8, the capacity is shown in terms of the number of users,
for a given SNR (10 dB), and two values of the processing gain N = 100
and N = 200 (upper curve). This result shows that the capacity for the
binary case, in bits per channel use, is numerically equal to the number
of users, as long as this number is small compared to the processing gain.
Any increase in the number of users above M = 0.2N produces a transmis-
sion rate that falls short of the system capacity. This is in agreement with
previous results (Turin 1984b; Pickholtz, Schilling, and Milstein 1982),
contradicting Hui (1984b), which suggested that using long PN sequences
would decrease the sum capacity. Figure 7.9 illustrates the previous point,
by showing a decrease in the capacity with an increase in the number of
users, for a fixed value of N . It is important to mention that if both M and
N increase to infinity, the limiting capacity will be given by log e/2, as
predicted in Chapter 2 (Sousa 1989).
The importance of finding a lower bound on the capacity is due to the
nature of the noncooperativeness of CDMA. In this case, the lower bound
Figure 7.7. Capacity for the channel, using the approximate formula, as
a function of the sequence length (N ), for different values of M .
represents the worst possible scenario for the system, which is incidentally
close to reality.
In the limit of long sequences, or of high SNR, the sum capacity approaches
the number of users,
$$\lim_{N\to\infty} C = M \qquad (7.78)$$
and
$$\lim_{E_b/N_0\to\infty} C = M. \qquad (7.79)$$
Finally, if the number of users increases without bound the sum capacity
displays a peculiar behavior. If M ≥ 36N
$$\lim_{M\to\infty} C = 0, \qquad (7.80)$$
otherwise
$$\lim_{M\to\infty} C = \infty. \qquad (7.81)$$
Figure 7.8. Capacity for the channel, as a function of the number of users
(M ), using the approximation for the capacity.
The relation between the number of users and the sequence length,
χ = M /N , has been explored in a recent paper (Sousa 1989). It is remark-
able that χ = 36 sets a threshold for the capacity, as the number of users
increases to infinity. This channel behavior demands further study. It is
rather interesting to observe that the capacity can be achieved, for M → ∞,
as long as the sequence length increases accordingly.
It is useful to express the sum capacity in terms of the error probabil-
ity. A complete analysis of a DSSS with DPSK modulation, in terms of
probability of error, can be found in Kavehrad and Ramamurthi (1987) or
in Geraniotis and Pursley (1986). In the following, a simplified analysis is
pursued. This can be done by rewriting the formula for the probability of
error for the DPSK system, considering now the effect of multiple access
interference (Turin 1984b),
$$P_e = \frac{1}{2}\exp\left[-\frac{1}{2}\left(\frac{M-1}{3N} + \frac{N_0}{2E_b}\right)^{-1}\right]. \qquad (7.82)$$
Solving the expression for $E_b/N_0$,
$$\ln 2P_e = -\frac{1}{2}\left(\frac{M-1}{3N} + \frac{N_0}{2E_b}\right)^{-1} \qquad (7.83)$$
Figure 7.9. Capacity for the channel, as a function of the number of users
(M), including the case M ≫ N.
and finally
$$\frac{E_b}{N_0} = \left[\left(\ln\frac{1}{2P_e}\right)^{-1} - \frac{2(M-1)}{3N}\right]^{-1}. \qquad (7.84)$$
Substituting this last result into the formula for the sum capacity, one
finds the relation between the capacity, in bits per channel use, and the
probability of error, for high values of the signal-to-noise ratio. This relation
is expressed in the following formula,
$$C \ge M - \frac{2\pi M}{\pi^2 - 8}\log e \left\{\sqrt{2}\,\pi^2\, e^{\frac{8-\pi^2}{\pi^2}\ln\frac{1}{2P_e}}\left[1 - \operatorname{erf}\left(\frac{2}{\pi}\sqrt{2\ln\frac{1}{2P_e}}\right)\right] - 4\pi\left[1 - \operatorname{erf}\left(\sqrt{\ln\frac{1}{2P_e}}\right)\right]\right\}, \qquad (7.85)$$
for a given probability of error, which also implies the worst case for the
system operation.
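A short sketch relating the error probability of Equation 7.82 to the SNR, and inverting the relation with Equation 7.84, is given below; the parameter values are assumed for illustration.

```python
# Sketch of Equations 7.82 and 7.84: DPSK error probability with MAI,
# and the SNR recovered from a target error probability.
import math

def prob_error(M, N, ebn0):
    # Equation 7.82
    x = (M - 1) / (3 * N) + 1 / (2 * ebn0)
    return 0.5 * math.exp(-0.5 / x)

def ebn0_from_pe(M, N, pe):
    # Equation 7.84 (valid while the bracketed term remains positive)
    return 1.0 / (1.0 / math.log(1 / (2 * pe)) - 2 * (M - 1) / (3 * N))

M, N = 20, 100
ebn0 = 10.0                                   # 10 dB
pe = prob_error(M, N, ebn0)
print("Pe =", pe)
print("recovered Eb/N0 =", ebn0_from_pe(M, N, pe))   # ~ 10 again
```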
In order to conclude this section, it is useful to express Equation 7.77 in a
more simplified form, as an exponential bound, to sum up the results from
the previous section. It is not difficult to recognize in Equation 7.77 the
complementary error function, erfc(x) = 1 − erf (x), which can be approx-
imated by (Schwartz 1970)
$$\operatorname{erfc}(x) \approx \frac{e^{-x^2}}{x\sqrt{\pi}}, \qquad x \gg 1. \qquad (7.86)$$
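The quality of this approximation is easy to check numerically, as in the sketch below.

```python
# Sketch comparing erfc(x) with the asymptotic form of Equation 7.86.
import math

for x in (1.5, 2.0, 3.0, 4.0):
    exact = math.erfc(x)
    approx = math.exp(-x * x) / (x * math.sqrt(math.pi))
    print(x, exact, approx, "ratio:", round(approx / exact, 3))
```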
[Figure: exact capacity and bound, as a function of the SNR (Eb/N0, dB).]
Figure 7.12. Bounds on the capacity, as a function of the SNR (Eb/N0): (a) exact capacity, (b) Bound 4, (c) Bound 2, (d) Bound 1, (e) Bound 3.
The former bounds and approximations for the sum capacity, available in the literature, can be written as
$$C \approx -M\log\left(\frac{1}{2} + \frac{1}{2}\,e^{-E_b/\sigma_I}\right), \qquad (7.88)$$
$$C \ge \frac{M}{2}\log\left(1 + \frac{E_b}{\sigma_I}\right) - \frac{M}{2}\log\left(1 + \frac{E_b}{5\sigma_I}\right) - \frac{M}{2}\log\frac{5\pi e}{24}, \qquad (7.89)$$
and
$$C \ge M - M e^{-3E_b/4\sigma_I}\left(\frac{3\pi E_b}{4\sigma_I}\right)^{-1/2} - M\, h\!\left(e^{-3E_b/4\sigma_I}\left(\frac{3\pi E_b}{4\sigma_I}\right)^{-1/2}\right), \qquad (7.90)$$
[Figure: numerical solution for the capacity compared with approximate expressions, as a function of the SNR (Eb/N0, dB).]
In terms of the system parameters, after substitution of the MAI variance, the bounds become
$$C \ge \frac{M}{2}\log\left(1 + \frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}\right) - \frac{M}{2}\log\left(1 + \frac{1}{5}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}\right) - \frac{M}{2}\log\frac{5\pi e}{24} \qquad (7.92)$$
and
$$C \ge M - M\,e^{-\frac{3}{4}\frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}}\left(\frac{3\pi}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}\right)^{-1/2} - M\,h\!\left(e^{-\frac{3}{4}\frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}}\left(\frac{3\pi}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0 + 3N}\right)^{-1/2}\right). \qquad (7.93)$$
Equation 7.77 compares favorably with the former bounds and
approximations represented by Equations 7.91, 7.92, and 7.93. In fact,
Equation 7.77 fits the exact solution very well, even for SNRs as low
as 0 dB.
The objective of this chapter was the evaluation of the sum capacity of
a CDMA system with a fixed number of users. A more realistic analysis
of a CDMA system should include a variable allocation of users (Alencar
and Blake 1993a).
New approaches for the analysis of CDMA systems were introduced
in this chapter. One of the important new results is related to the evaluation
of the sum capacity for the CDMA channel. Also, new limits were found,
with explicit formulas for the upper and lower bounds on the capacity of
the channel in terms of the probability of error. A tight bound was found
that allows a computationally efficient evaluation of the capacity.
CHAPTER 8
THEORETICAL CRYPTOGRAPHY
8.1 INTRODUCTION
Cryptography is the basis for network security. The term comes from the Greek
word kryptos, meaning hidden or secret, combined with graphein, writing.
A free translation is hidden writing, which describes the purpose of
cyphering a message. Cryptology combines the same prefix with
logia, meaning study, giving the translation study of secret coding.
Cryptography is the practice and study of techniques for secure communication
in the presence of third parties. It is also a collection of techniques
for the construction and analysis of protocols that overcome the influence of
adversaries, and it is related to various aspects of information security, such
as data confidentiality, data integrity, authentication, and nonrepudiation.
Over the centuries, governments have used cryptography to cypher messages
that were supposed to remain secret, while spies tried to decipher them.
During wars, cryptography becomes even more important, considering the
nature of the operations.
Bletchley Park was the secret information and counter-information center
in Great Britain during the Second World War. It was so secret that it
remained a legend after the end of the war, while England continued
to break the codes of other countries, whether enemies or friends (Haykin
1999).
The first computer, the Colossus, was built at Bletchley Park, developed
by Alan Mathison Turing (1912–1954), one of the greatest computer
geniuses of the world. Destroyed after the end of the war, the Colossus was
rebuilt from the original drawings.
The German Enigma code was broken there with the help of Turing,
who benefited from information passed on by a Hungarian mathematician.
Some attacks are passive, and the information is only monitored. Other attacks
are active, and the information is altered to corrupt or destroy the data or
the network itself.
The entropy of the key and the entropy of the ciphertext are given by
$$H(K) = -\sum_{m=1}^{M} p(k_m)\log p(k_m), \qquad (8.4)$$
$$H(Y) = -\sum_{n=1}^{N} p(y_n)\log p(y_n), \qquad (8.5)$$
and the conditional entropies of interest are
$$H(K|Y) = -\sum_{m=1}^{M}\sum_{n=1}^{N} p(k_m, y_n)\log p(k_m|y_n), \qquad (8.6)$$
$$H(K|X,Y) = -\sum_{m=1}^{M}\sum_{l=1}^{L}\sum_{n=1}^{N} p(k_m, x_l, y_n)\log p(k_m|x_l, y_n), \qquad (8.8)$$
$$H(X|Y,K) = -\sum_{l=1}^{L}\sum_{n=1}^{N}\sum_{m=1}^{M} p(x_l, y_n, k_m)\log p(x_l|y_n, k_m). \qquad (8.9)$$
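For a toy cryptosystem with an assumed joint distribution of keys and ciphertexts, these entropies can be computed directly, as in the sketch below.

```python
# Sketch of Equations 8.4 to 8.6 for an assumed joint distribution
# p(k_m, y_n); rows index the keys and columns the ciphertexts.
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_ky = [[0.20, 0.05, 0.00],
        [0.05, 0.30, 0.10],
        [0.00, 0.10, 0.20]]               # assumed values, sums to 1

p_k = [sum(row) for row in p_ky]          # marginal of the key
p_y = [sum(col) for col in zip(*p_ky)]    # marginal of the ciphertext

h_k = entropy(p_k)                        # Equation 8.4
h_y = entropy(p_y)                        # Equation 8.5
h_k_given_y = 0.0                         # Equation 8.6
for row in p_ky:
    for n, p in enumerate(row):
        if p > 0:
            h_k_given_y -= p * math.log2(p / p_y[n])

print("H(K) =", round(h_k, 4), " H(Y) =", round(h_y, 4),
      " H(K|Y) =", round(h_k_given_y, 4))
```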
As expected, when the ciphertext and the key are available, it is possible
to retrieve the plaintext correctly, because the uncertainty with respect to
X is null, that is, H(X|Y,K) = 0. There is no loss of information whatsoever
at the receiver, and the full original message is recovered.
The joint entropy can be expanded as
$$H(X,Y,K) = H(X|Y,K) + H(Y,K) = H(K|X,Y) + H(X,Y). \qquad (8.11)$$
Also,
$$H(Y,K) = H(K|Y) + H(Y), \qquad (8.12)$$
and
$$H(X,Y) = H(X|Y) + H(Y). \qquad (8.13)$$
Substituting Equations 8.12 and 8.13 into Equation 8.11, and using the fact
that H(X|Y,K) = 0, leads to
$$H(K|X,Y) = H(K|Y) - H(X|Y). \qquad (8.16)$$
The mutual information between the plaintext and the ciphertext is
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X). \qquad (8.17)$$
If the ciphertext is statistically independent of the plaintext, then
$$H(X|Y) = H(X),$$
and the mutual information between the plaintext and the encoded message
is zero, or I(X;Y) = 0. This is referred to as the absolutely secure
cryptosystem.
It is possible to obtain a lower limit for the mutual information between
the plaintext and the ciphertext. First, consider Equation 8.16, and since
H(K|X,Y) ≥ 0, it follows that
$$H(K|Y) \ge H(X|Y). \qquad (8.18)$$
Because conditioning cannot increase entropy, H(K) ≥ H(K|Y), and therefore
$$H(K) \ge H(X|Y). \qquad (8.20)$$
Substituting this last inequality into Equation 8.17 gives
$$I(X;Y) \ge H(X) - H(K). \qquad (8.21)$$
For an absolutely secure cryptosystem, I(X;Y) = 0, which requires
$$H(K) \ge H(X). \qquad (8.22)$$
If the M possible keys are equiprobable, the key entropy is maximum,
$$H(K) = -\sum_{m=1}^{M} p(k_m)\log p(k_m) = \log(M), \qquad (8.23)$$
and, similarly, for L equiprobable plaintext messages,
$$H(X) = -\sum_{l=1}^{L} p(x_l)\log p(x_l) = \log(L). \qquad (8.24)$$
Therefore, log(M) ≥ log(L), that is, the number of keys must be at least as
large as the number of possible messages.
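The classical example of an absolutely secure system, not detailed in the text, is the one-time pad; the sketch below builds a modular one-time pad over an alphabet of L symbols with M = L equiprobable keys (an assumed toy example) and verifies numerically that I(X;Y) = 0 and that H(K) ≥ H(X).

```python
# Sketch of an absolutely secure cipher: y = (x + k) mod L with an
# equiprobable key (one-time pad).  Alphabet size and plaintext
# distribution are assumed for illustration.
import math

L = 4
p_x = [0.5, 0.2, 0.2, 0.1]                # assumed plaintext distribution
p_k = [1.0 / L] * L                       # equiprobable keys, Eq. 8.23

def H(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_xy = [[0.0] * L for _ in range(L)]      # joint distribution p(x, y)
for x in range(L):
    for k in range(L):
        p_xy[x][(x + k) % L] += p_x[x] * p_k[k]

p_y = [sum(col) for col in zip(*p_xy)]
mutual = 0.0
for x in range(L):
    for y in range(L):
        p = p_xy[x][y]
        if p > 0:
            mutual += p * math.log2(p / (p_x[x] * p_y[y]))

print("I(X;Y) =", round(mutual, 12))                   # ~ 0
print("H(K) =", H(p_k), ">= H(X) =", round(H(p_x), 4))
```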
APPENDIX A
PROBABILITY THEORY
There are two really important relations in set theory: the belonging
relation, denoted as a ∈ A, in which a is an element of the set A, and the
inclusion relation, A ⊂ B, which is read "A is a subset of the set B", or B
is a superset of the set A. Sets may also be specified by propositions. For
example, the empty set can be written as ∅ = {a | a ≠ a}, that is, the set
whose elements are different from themselves.
A universal set contains all other sets of interest. An example of a universal
set is provided by the sample space in probability theory, usually
denoted as S or Ω. The empty set is the set that contains no element,
usually denoted as ∅ or { }. It is implicit that the empty set is contained in
any set, that is, ∅ ⊂ A, for any given set A. However, the empty set is not in
general an element of any other set.
A usual way to represent sets is by means of the Venn diagram, as
illustrated in Figure A.1.
Sets are said to be disjoint if they have no element in common, as
illustrated in Figure A.2. Thus, for example, the set of even natural numbers
and the set of odd natural numbers are disjoint.
[Figure A.1: Venn diagram of two sets A and B, showing the intersection A ∩ B. Figure A.2: two disjoint sets A and B.]
The main properties of the operations of union and intersection are the following:
• Idempotent
A ∪ A = A, A ∩ A = A
• Associative
(A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Commutative
A ∪ B = B ∪ A, A ∩ B = B ∩ A
• Distributive
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Identity
A ∪ ∅ = A, A ∩ U = A
A ∪ U = U, A ∩ ∅ = ∅
• Complementary
$A \cup \overline{A} = U$, $A \cap \overline{A} = \emptyset$, $\overline{(\overline{A})} = A$
$\overline{U} = \emptyset$, $\overline{\emptyset} = U$
• de Morgan laws
$\overline{A \cup B} = \overline{A} \cap \overline{B}$, $\overline{A \cap B} = \overline{A} \cup \overline{B}$
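These identities are easy to verify for finite sets, for instance with the short sketch below, in which the universal set is an assumed finite example.

```python
# Sketch verifying some of the set identities above with Python sets.
U = frozenset(range(10))                  # assumed universal set
A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})
C = frozenset({4, 6, 8})

def comp(S):
    return U - S                          # complement with respect to U

assert A | A == A and A & A == A                          # idempotent
assert A | (B & C) == (A | B) & (A | C)                   # distributive
assert A | comp(A) == U and A & comp(A) == frozenset()    # complementary
assert comp(A | B) == comp(A) & comp(B)                   # de Morgan
assert comp(A & B) == comp(A) | comp(B)                   # de Morgan
print("all identities verified")
```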
For the construction of an algebra of sets or, equivalently, for the construction
of a field over which operations involving sets make sense, a few
properties have to be obeyed.
The properties guarantee the closure of the algebra with respect to finite
operations over sets. It is noticed that the universal set always belongs to
the algebra, that is, Ω ∈ F, because $\Omega = A \cup \overline{A}$. The empty set also belongs
to the algebra, that is, ∅ ∈ F, since $\emptyset = \overline{\Omega}$, which follows by property 1.
Example: the family {∅, Ω} complies with the above properties and therefore
represents an algebra. In this case, ∅ = { } and $\overline{\emptyset} = \Omega$. The union is also
represented, as can be easily checked.
Example: given the sets {C_H} and {C_T}, representing the faces of a coin,
if {C_H} ∈ F then $\overline{\{C_H\}}$ = {C_T} ∈ F. It follows that {C_H, C_T} ∈
F ⇒ Ω ∈ F ⇒ ∅ ∈ F.
The cardinality of a set is a measure of its number of elements. For example:
• I = {1, . . . , k} ⇒ C_I = k
• N = {0, 1, . . .} ⇒ C_N, or ℵ0
• Z = {. . . , −2, −1, 0, 1, 2, . . .} ⇒ C_Z
• Q = {. . . , −1/3, 0, 1/3, 1/2, . . .} ⇒ C_Q
• R = (−∞, ∞) ⇒ C_R, or ℵ
For the above examples the following relations are verified: C_R > C_Q =
C_Z = C_N > C_I. The notation ℵ0, for the cardinality of the set of natural
numbers, was first employed by Cantor.
The cardinality of the power set, that is, of the family of sets consisting
of all subsets of a given set I, F = 2^I, is 2^{C_I}.
The Borel algebra, established by Félix Edouard Juston Émile Borel (1871–
1956), and written as B, or σ-algebra, is an extension of the algebra discussed
so far to operate with limits at infinity. The following properties are
required from a σ-algebra:
1. $A \in B \Rightarrow \overline{A} \in B$
2. $A_i \in B \Rightarrow \bigcup_{i=1}^{\infty} A_i \in B$
These properties imply closure under the usual set operations. For example,
$$A \in B \;\text{and}\; B \in B \Rightarrow A \cup B \in B,$$
and
$$\overline{A} \in B \;\text{and}\; \overline{B} \in B \Rightarrow \overline{A} \cup \overline{B} \in B,$$
and finally
$$\overline{A} \cup \overline{B} \in B \Rightarrow A \cap B \in B.$$
In summary, any combination of unions and intersections of sets belongs
to the Borel algebra. In other words, operations of union or intersection of
sets, or a combination of these operations, produce a set that belongs to the
σ -algebra.
In the case of sets A and B which are not disjoint, it follows that
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Some important properties of the conditional probability follow:
• If A ⊂ B, then P(A|B) = P(A)/P(B) ≥ P(A).
• If A and B are independent events, then P(A ∩ B) = P(A) · P(B).
• If P(A) = 0 or P(A) = 1, then event A is independent of itself.
• If P(B) = 0, then P(A|B) can assume any arbitrary value. Usually,
in this case, one assumes P(A|B) = P(A).
• If events A and B are disjoint, and nonempty, then they are dependent.
If the events $A_i$, $i = 1, \ldots, N$, form a partition of the sample space, then
any event B can be written as
$$B = B \cap \Omega = B \cap \left(\bigcup_{i=1}^{N} A_i\right) = \bigcup_{i=1}^{N} (B \cap A_i),$$
so that, by the additivity of the probability over disjoint sets,
$$P(B) = P\left(\bigcup_{i=1}^{N} (B \cap A_i)\right) = \sum_{i=1}^{N} P(B \cap A_i),$$
which leads to Bayes' rule,
$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i)\,P(A_i)}{\sum_{i=1}^{N} P(B \cap A_i)} = \frac{P(B|A_i)\,P(A_i)}{\sum_{i=1}^{N} P(B|A_i)\,P(A_i)}.$$
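A minimal numerical sketch of the total probability and Bayes formulas, for an assumed partition, is given below.

```python
# Sketch of the total probability theorem and Bayes' rule for an
# assumed partition A_1, ..., A_N of the sample space.
p_A = [0.5, 0.3, 0.2]                     # P(A_i), sums to 1
p_B_given_A = [0.10, 0.40, 0.80]          # P(B | A_i), assumed values

p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))           # P(B)
posterior = [pb * pa / p_B for pa, pb in zip(p_A, p_B_given_A)]  # P(A_i|B)

print("P(B) =", p_B)
print("P(A_i|B) =", [round(p, 4) for p in posterior],
      "sum =", sum(posterior))
```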
Consider that X and Y represent a pair of real random variables, with joint
pdf pXY (x, y), as illustrated in Figure A.6. The probability of x and y being
simultaneously in the region defined by the polygon [abcd] is given by the
expression (Alencar 2014).
$$\operatorname{Prob}(a < x < b,\; c < y < d) = \int_a^b \int_c^d p_{XY}(x, y)\, dx\, dy. \qquad (A.15)$$
and the marginal probability density function of Y is
$$p_Y(y) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dx. \qquad (A.17)$$
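Both expressions can be checked numerically for an assumed joint pdf; the sketch below uses a bivariate Gaussian density with correlation coefficient 0.5 and a simple Riemann sum.

```python
# Sketch of Equations A.15 and A.17 for an assumed bivariate Gaussian pdf.
import numpy as np

rho = 0.5
def p_xy(x, y):
    norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho ** 2))
    return norm * np.exp(-(x ** 2 - 2 * rho * x * y + y ** 2)
                         / (2 * (1 - rho ** 2)))

x = np.linspace(-6, 6, 601)
y = np.linspace(-6, 6, 601)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
P = p_xy(X, Y)

a, b, c, d = -1.0, 1.0, 0.0, 2.0          # rectangle of Equation A.15
mask = (X > a) & (X < b) & (Y > c) & (Y < d)
print("Prob(a<x<b, c<y<d) ~", np.sum(P[mask]) * dx * dy)

p_y = np.sum(P, axis=0) * dx              # marginal of Y, Equation A.17
print("marginal area ~", np.sum(p_y) * dy)   # ~ 1
```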
[Figure A.6: joint probability density function pXY(x, y).]
The pdf of the sum Z = X + Y of two independent random variables is given
by the convolution of the individual pdfs,
$$p_Z(z) = \int_{-\infty}^{\infty} p_X(z - \rho)\, p_Y(\rho)\, d\rho. \qquad (A.23)$$
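The convolution can be evaluated numerically; the sketch below uses two uniform densities on [0, 1], assumed for illustration, whose sum has the familiar triangular pdf on [0, 2].

```python
# Sketch of Equation A.23: pdf of Z = X + Y by numerical convolution.
import numpy as np

step = 0.001
t = np.arange(0.0, 1.0 + step, step)
p_x = np.ones_like(t)                     # uniform pdf on [0, 1]
p_y = np.ones_like(t)

p_z = np.convolve(p_x, p_y) * step        # discrete form of Equation A.23
z = np.arange(len(p_z)) * step

print("area under p_Z ~", np.sum(p_z) * step)              # ~ 1
print("peak ~", round(p_z.max(), 3), "at z ~", z[np.argmax(p_z)])
```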
REFERENCES
Alencar, M.S., and I.F. Blake. 1993c. "The Capacity of a Code Division
Multiple Access Channel." In 1993 International Symposium on
Communications – ISCOM'93, Vol. 1, pp. 1.1–1.9. Hsinchu, Taiwan.
Ash, R.B. 1965. Information Theory. New York: Dover Publications Inc.
Bierbaum, M., and H.-M. Wallmeier. 1979. “A Note on the Capacity Region
of the Multiple Access Channel.” IEEE Transactions on Information
Theory 25, no. 4, p. 484. doi: http://dx.doi.org/10.1109/tit.1979.
1056064
Blahut, R.E. 1987. Principles and Practice of Information Theory. Reading,
MA: Addison-Wesley Publishing Co.
Blahut, R.E. 1990. Digital Transmission of Information. Reading, MA:
Addison-Wesley Publishing Co.
Blake, I.F. 1987. An Introduction to Applied Probability. Malabar, FL:
Robert E. Krieger Publishing Co.
Blake, I.F., and R.C. Mullin. 1976. An Introduction to Algebraic and Com-
binatorial Coding Theory. New York: Academic Press Inc.
Borth, D.E., and M.B. Pursley. 1979. "Analysis of Direct-Sequence Spread-
Spectrum Multiple-Access Communications Over Rician Fading Channels."
IEEE Transactions on Communications 27, no. 10, pp. 1566–77.
doi: http://dx.doi.org/10.1109/tcom.1979.1094291
Boyer, C. 1974. História da Matemática. São Paulo, Brasil: Editora Edgard
Blucher Ltda.
Brady, D.P. 1991. “A Semiclassical Analysis of Optical Code Division
Multiple Access.” IEEE Transactions on Communications 39, no. 1,
pp. 85–93. doi: http://dx.doi.org/10.1109/26.68279
Cheng, R.S., and S. Verdú. 1991. “The Effect of Asynchronism on the
Total Capacity of Gaussian Multiple-access Channels.” In Proceedings
of the IEEE International Symposium on Information Theory, p. 211.
Budapest, Hungary.
Chung, H., and P.V. Kumar. 1990. “Optical Orthogonal Codes – New
Bounds and an Optimal Construction.” IEEE Transactions on Infor-
mation Theory 36, no. 4, pp. 866–73. doi: http://dx.doi.org/10.1109/
18.53748
Cook, C.E., F.W. Ellersick, L.B. Milstein, and D.L. Schilling. 1983. Spread-
Spectrum Communications. New York: IEEE Press.
Cook, C.E., and H.S. Marsh. 1983. “An Introduction to Spread Spec-
trum.” IEEE Communications Magazine 21, no. 2, pp. 8–16. doi: http://
dx.doi.org/10.1109/mcom.1983.1091346
Cover, T.M. 1972. “Broadcast Channels.” IEEE Transactions on Infor-
mation Theory 18, no. 1, pp. 2–14. doi: http://dx.doi.org/10.1109/tit.
1972.1054727
Shamai, S., L.H. Ozarow, and A.D. Wyner. 1991. “Information Rates for
a Discrete-Time Gaussian Channel with Intersymbol Interference and
Stationary Inputs.” IEEE Transactions on Information Theory 37, no. 6,
pp. 1527–39.
Simon, M.K., J.K. Omura, R.A. Scholtz, and B.K. Levitt. 1985. Spread
Spectrum Communications, Vol. 1. Rockville, MD: Computer Science
Press.
Sommer, R.C. 1966. "Asynchronously Multiplexed Channel Capacity."
Proceedings of the IEEE 54, no. 1, pp. 79–80.
Soroushnejad, M., and E. Geraniotis. 1991. “Probability of Capture and
Rejection of Primary Multiple-Access Interference in Spread-
Spectrum Networks.” IEEE Transactions on Communications 39, no. 6,
pp. 986–94.
Sousa, E.S. 1989. “Throughput of Spread-Spectrum Systems with a Large
Number of Users.” IEE Proceedings 136, no. 3, pp. 220–26.
Sousa, E.S. 1990. “Interference Modeling in a Direct-Sequence Spread-
Spectrum Packet Radio Network.” IEEE Transactions on Communica-
tions 38, no. 9, pp. 1475–82.
Sousa, E.S. 1992. “Performance of a Spread Spectrum Packet Radio
Network Link in a Poisson Field of Interferers.” IEEE Transactions
on Information Theory 38, no. 6, pp. 1743–54.
Stallings, W. 1999. Cryptography and Network Security—Principles and
Practice. Upper Saddle River, NJ: Prentice Hall.
Tanenbaum, A.S. 2003. Computer Networks. Englewood Cliffs, NJ:
Prentice-Hall, PTR.
Turin, G.L. 1980. “Introduction to Spread-Spectrum Antimultipath Tech-
niques and Their Application to Urban Digital Radio.” Proceedings of
the IEEE 68, no. 3, pp. 328–53.
Turin, G.L. 1984a. “Commutation Signaling—An Antimultipath Tech-
nique.” IEEE Journal on Selected Areas in Communications 2, no. 4,
pp. 548–62.
Turin, G.L. 1984b. “The Effects of Multipath and Fading on the Perfor-
mance of Direct-Sequence CDMA Systems.” IEEE Journal on Selected
Areas in Communications 2, no. 4, pp. 597–603.
Van der Lubbe, J.C.A. 1997. Information Theory. Cambridge, UK: Cam-
bridge University Press.
Verdú, S. 1986. “Minimum Probability of Error for Asynchronous
Gaussian Multiple-Access Channels.” IEEE Transactions on Informa-
tion Theory 32, no. 1, pp. 85–96.
Verdú, S. 1989a. “Multiple-Access Channels with Memory with and with-
out Frame Synchronism.” IEEE Transactions on Information Theory 35,
no. 3, pp. 605–19.
INDEX
A
Absolutely secure event, 124
Algebra, 127, 129
  Borel, 130
  closure, 129
Algebraic coding theory, 77
Alzheimer, Aloysius, 32
Appearance
  equivocation, 121
Asymmetric key, 120
Axiom
  choice, 126
  Peano, 126
  specification, 126
Axioms, 126
Ayer, Alfred Jules, 1

B
Belonging, 126
Binary
  Huffman, 27
Bit, 17
Bletchley Park, 117
Borel
  algebra, 130
Borel, Félix Edouard Juston Émile, 130
BSC, 44

C
Cantor, Georg, 125
Capacity
  bound, 107
  channel, 41
  sum, 94
Cardinal
  number, 130
  numbers, 125
Cardinality, 130
Carrier suppression, 80
CDMA, 72, 76, 87
  non-cooperative, 89
  performance analysis, 76, 90
Channel
  binary symmetric, 44
  communication, 34
  discrete, 43
  independent output, 36
  noiseless, 35
  noisy, 43
  non-cooperative, 89
Ciphertext
  entropy, 121
Clausius, Rudolph, 31
Closure, 129
CMOS, 17
Code
  chip rate, 88
  classification, 19
  comma, 21
  decision tree, 14
Information
  channel, 34
  entropy, 2, 3
  joint, 32
  measure of, 2
  mutual, 38
  semantics, 1
  theory, 1
Internet
  backbone, 118
  protocol, 118

J
Jammer, 74
  broad band noise, 74
  partial band, 74
  partial time, 74
  repeater, 74
  tone, 74
Jamming, 72
Joint entropy
  cryptosystem, 122
Joint random variables, 136

K
Kasami
  sequence, 84
Key
  entropy, 120
  equivocation, 121
Khinchin, Aleksandr Yakovlevich, 2
Kolmogorov, Andrei Nikolaevich, 31, 131
Kraft
  inequality, 24
Kraft-McMillan
  inequality, 16
Kronecker, Leopold, 125

L
Laser, 18
Lebesgue, Henri Léon, 40

M
MAI, 78
Matched filter, 91
Maximal length
  linear sequence, 80
  sequence, 79
Maximum likelihood receiver, 91
Measure, 126
  Lebesgue, 40
  probability, 126
Message
  equivocation, 121
  uncertainty, 121
Moments, 134
Multipath, 72, 88
Multiple access, 71
  DSSS, 78
  interference, 78

N
Non-cooperative channel, 89
Non-repudiation, 118
Nyquist, Harry, 1, 31

P
Password, 118
PCN, 87
Peano
  axiom, 126
Peirce, Charles Sanders, 1
Personal communications, 71
Photon, 17
Pierce, John Robinson, 1
Plaintext
  entropy, 120
Prefix
  code, 14, 16
Probability, 126
  joint random variables, 136
  moments, 134
  random variables, 133
Transformation
  cryptography, 119
Transposition, 120
TTL, 17
Turing, Alan Mathison, 117

U
Universal, 126
University of Halle, 125

V
Vectorial
  space, 74
Venn
  diagram, 126
Venn diagram, 126

W
Welsh bound, 82

Z
Zadeh, Lotfi Asker, 2
Zenon, 125
Zorn
  lemma, 126
Zorn's lemma, 126