
Information Theory
Marcelo S. Alencar

Communications and Signal Processing Collection
Orlando R. Baiocchi, Editor

Information Theory covers the historical evolution of information theory, along with the basic concepts linked to information. It discusses the information associated to a certain source and the usual types of source codes, information transmission, joint information, conditional entropy, mutual information, and channel capacity. The hot topic of multiple access systems for cooperative and noncooperative channels is also discussed, along with code division multiple access (CDMA), the basic block of most cellular and personal communication systems, and the capacity of a CDMA system. Also presented are the information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet. To help the reader understand the theory, the text also includes a review of probability theory, solved problems, illustrations, and graphics.

Marcelo Sampaio de Alencar received his bachelor's degree in Electrical Engineering from the Federal University of Pernambuco (UFPE), Brazil, in 1980; his master's degree from the Federal University of Paraiba (UFPB), Brazil, in 1988; and his PhD from the University of Waterloo, Canada, in 1993. He is currently an emeritus member of the Brazilian Telecommunications Society (SBrT), an IEEE senior member, and chair professor at the Department of Electrical Engineering, Federal University of Campina Grande, Brazil. He is founder and president of the Institute for Advanced Studies in Communications (Iecom), and has been awarded several scholarships and grants, including three scholarships and several research grants from the Brazilian National Council for Scientific and Technological Research (CNPq), two grants from the IEEE Foundation, a scholarship from the University of Waterloo, a scholarship from the Federal University of Paraiba, and many others. He has published over 350 engineering and scientific papers and 15 books.

ISBN: 978-1-60650-528-1

INFORMATION THEORY

MARCELO S. ALENCAR

MOMENTUM PRESS, LLC, NEW YORK

Information Theory
Copyright © Momentum Press, LLC, 2015.
All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means—
electronic, mechanical, photocopy, recording, or any other—except for
brief quotations, not to exceed 400 words, without the prior permission
of the publisher.

First published by Momentum Press, LLC


222 East 46th Street, New York, NY 10017
www.momentumpress.net

ISBN-13: 978-1-60650-528-1 (print)


ISBN-13: 978-1-60650-529-8 (e-book)

Momentum Press Communications and Signal Processing Collection

DOI: 10.5643/9781606505298

Cover and interior design by Exeter Premedia Services Private Ltd.,


Chennai, India

10 9 8 7 6 5 4 3 2 1

Printed in the United States of America


This book is dedicated to my family.
ABSTRACT

The book presents the historical evolution of Information Theory, along with the basic concepts linked to information. It discusses the information associated to a certain source and the usual types of source codes, the information transmission, joint information, conditional entropy, mutual information, and channel capacity. The hot topic of multiple access systems, for cooperative and noncooperative channels, is discussed, along with code division multiple access (CDMA), the basic block of most cellular and personal communication systems, and the capacity of a CDMA system. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are also presented. The book includes a review of probability theory, solved problems, illustrations, and graphics to help the reader understand the theory.

KEY WORDS
Code division multiple access, coding theory, cryptography, information
theory, multiple access systems, network security

CONTENTS

List of Figures xi
List of Tables xv
Preface xvii
Acknowledgments xix

1. Information Theory 1
1.1 Information Measurement 2
1.2 Requirements for an Information Metric 4

2. Sources of Information 11
2.1 Source Coding 11
2.2 Extension of a Memoryless Discrete Source 12
2.3 Prefix Codes 14
2.4 The Information Unit 17

3. Source Coding 19
3.1 Types of Source Codes 19
3.2 Construction of Instantaneous Codes 23
3.3 Kraft Inequality 24
3.4 Huffman Code 27

4. Information Transmission 31
4.1 The Concept of Information Theory 32
4.2 Joint Information Measurement 32
4.3 Conditional Entropy 34
4.4 Model for a Communication Channel 34
4.5 Noiseless Channel 35
4.6 Channel with Independent Output and Input 36
4.7 Relations Between the Entropies 37
4.8 Mutual Information 38
4.9 Channel Capacity 41


5. Multiple Access Systems 49


5.1 Introduction 49
5.2 The Gaussian Multiple Access Channel 51
5.3 The Gaussian Channel with Rayleigh Fading 54
5.4 The Noncooperative Multiple Access Channel 59
5.5 Multiple Access in a Dynamic Environment 62
5.6 Analysis of the Capacity for a Markovian Multiple Access Channel 63

6. Code Division Multiple Access 71


6.1 Introduction 71
6.2 Fundamentals of Spread Spectrum Signals 74
6.3 Performance Analysis of CDMA Systems 76
6.4 Sequence Design 79

7. The Capacity of a CDMA System 87


7.1 Introduction 87
7.2 Analysis of a CDMA System with a Fixed Number of Users
and Small SNR 87
7.3 CDMA System with a Fixed Number of Users and High
SNR 97
7.4 A Tight Bound on the Capacity of a CDMA System 103

8. Theoretical Cryptography 117


8.1 Introduction 117
8.2 Cryptographic Aspects of Computer Networks 118
8.3 Principles of Cryptography 119
8.4 Information Theoretical Aspects of Cryptography 120
8.5 Mutual Information for Cryptosystems 123

Appendix A Probability Theory 125


A.1 Set Theory and Measure 125
A.2 Basic Probability Theory 131
A.3 Random Variables 133

References 139

About the Author 147

Index 149
LIST OF FIGURES

Figure 1.1. Graph of an information function 9


Figure 2.1. Source encoder 12
Figure 2.2. Decision tree for the code in Table 2.5 16
Figure 3.1. Classes of source codes 23
Figure 3.2. Probabilities in descending order for the Huffman
code 28
Figure 3.3. Huffman code. At each phase, the two least probable
symbols are combined 28
Figure 3.4. (a) Example of the Huffman coding algorithm to obtain
the codewords. (b) Resulting code. 29
Figure 4.1. Model for a communication channel 32
Figure 4.2. A probabilistic communication channel 33
Figure 4.3. Venn diagram corresponding to the relation between
the entropies 40
Figure 4.4. Memoryless binary symmetric channel 44
Figure 4.5. Graph for the capacity of the memoryless binary sym-
metric channel 46
Figure 4.6. Binary erasure channel 46
Figure 4.7. Graph of the capacity for the binary erasure channel 47
Figure 5.1. The multiple access channel 52
Figure 5.2. Capacity region for the Gaussian multiple access
channel, M = 2 54
Figure 5.3. Average and actual capacity, for γ = 0.5, 1.0, and 2.0 58
Figure 5.4. Capacity region for the noncooperative channel,
M =2 61
Figure 5.5. Markov model for the multiple access channel 64


Figure 5.6. Capacity for the channel with Geometric accessibility,


as a function of the signal-to-noise ratio, for different
values of ρ 66
Figure 5.7. Capacity for the channel with Geometric accessibil-
ity, as a function of the utilization factor, for different
values of the signal-to-noise ratio 67
Figure 5.8. Capacity for the channel with Poisson accessibility, as
a function of the signal-to-noise ratio, for ρ = 0, 1, 2,
3, 4, 5, 6, 7, 8, 9 68
Figure 5.9. Capacity for the channel with Poisson accessibility, as
a function of the utilization factor, for some values of
the signal-to-noise ratio 69
Figure 6.1. Data signal and pseudo-noise sequence 72
Figure 6.2. Direct sequence spread spectrum system 73
Figure 6.3. Frequency hopped spread spectrum system 73
Figure 6.4. Spread spectrum using random time windows 73
Figure 6.5. Spectra of transmitted and received signals 75
Figure 6.6. Pseudo-noise sequence generator 82
Figure 6.7. Gold sequence generator 83
Figure 7.1. Capacity approximations for the channel, as a function
of the signal-to-noise ratio, for M = 500 and N = 100 96
Figure 7.2. Bound 1 on the capacity for the channel, as a function
of the signal-to-noise ratio (Eb /N0 ) 100
Figure 7.3. Bound 2 on the capacity for the channel, as a function
of the signal-to-noise ratio (Eb /N0 ) 101
Figure 7.4. Bound 3 on the capacity for the channel, as a function
of the signal-to-noise ratio (Eb /N0 ) 103
Figure 7.5. Capacity for the channel, compared with the lower
bound, as a function of the signal-to-noise ratio
(Eb /N0 ), for M = 20 and sequence length N = 100 106
Figure 7.6. Approximate capacity for the channel, as a function of
the signal-to-noise ratio (Eb /N0 ), for M = 20, having
N as a parameter 107
Figure 7.7. Capacity for the channel, using the approximate for-
mula, as a function of the sequence length (N ), for
different values of M 108

Figure 7.8. Capacity for the channel, as a function of the number of


users (M ), using the approximation for the
capacity 109
Figure 7.9. Capacity for the channel, as a function of the number
of users (M ), including the case M N 110
Figure 7.10. Capacity for the channel, as a function of the proba-
bility of error (Pe ) 111
Figure 7.11. Bound 4 on the capacity for the channel, as a function
of the signal-to-noise ratio (Eb /N0 ) 112
Figure 7.12. Bounds on the capacity, as a function of the signal-to-
noise ratio (Eb /N0 ) 113
Figure 7.13. Comparison between the new and existing bounds,
as a function of the signal-to-noise ratio (Eb /N0 ), for
M = 20 and N = 100 114
Figure 8.1. General model for a cryptosystem 119
Figure A.1. A Venn diagram that represents two intersecting sets 127
Figure A.2. A Venn diagram representing disjoint sets 127
Figure A.3. Increasing sequence of sets 128
Figure A.4. Decreasing sequence of sets 128
Figure A.5. Partition of set B by a family of sets {Ai } 133
Figure A.6. Joint probability density function 137
LIST OF TABLES

Table 1.1. Symbol probabilities of a two-symbol source 4


Table 1.2. Identically distributed symbol probabilities 5
Table 1.3. Unequal symbol probabilities 5
Table 1.4. Symbol probabilities of a certain source 9
Table 2.1. A compact code 13
Table 2.2. A compact code for an extension of a source 14
Table 2.3. A prefix code for a given source 14
Table 2.4. A source code that is not prefix 15
Table 2.5. Example of a prefix code 15
Table 3.1. A binary block code 20
Table 3.2. A ternary block code 20
Table 3.3. A nonsingular block code 20
Table 3.4. A nonsingular block code 21
Table 3.5. The second extension of a block code 21
Table 3.6. Uniquely decodable codes. 22
Table 3.7. Another uniquely decodable code 22
Table 3.8. Selected binary codes 25
Table 3.9. Discrete source with five symbols and their probabilities 28
Table 3.10. Four distinct Huffman codes obtained for the source of
Table 3.9 30
Table 6.1. Number of maximal sequences 81
Table 6.2. Relative peak cross-correlations of m-sequences,
Gold sequences and Kasami sequences 83

PREFACE

Information Theory is a classic topic in the educational market that evolved from the amalgamation of different areas of Mathematics and Probability, which include set theory, developed by Georg Cantor, and measure theory, fostered by Henri Lebesgue, as well as the axiomatic treatment of probability by Andrei Kolmogorov in 1931, and finally the beautiful development of Communication Theory by Claude Shannon in 1948.
Information Theory is fundamental to several areas of knowledge, including Engineering, Computer Science, Mathematics, Physics, Sciences, Economics, Social Sciences, and Social Communication. It is part of the syllabus for most courses in Computer Science, Mathematics, and Engineering.
For Electrical Engineering courses, it is a prerequisite for some disciplines, including communication systems, transmission techniques, error control coding, estimation, and digital signal processing. This book is self-contained; it is a reference and an introduction for graduate students who have not taken information theory before. It could also be used as an undergraduate textbook. It is addressed to a large audience in Electrical and Computer Engineering, Mathematics, and Applied Physics. The book's target audience is graduate students in these areas, who may not have taken basic courses in specific topics, and who can find a quick and concise way to obtain the knowledge they need to succeed in advanced courses.

REASONS FOR WRITING THE BOOK


According to a study by the Institute of Electrical and Electronics Engineers (IEEE), companies, enterprises, and industry are in need of professionals with a solid background in mathematics and sciences, instead of the specialized professional of the previous century. The employment market in this area demands information technology professionals and engineers who can adapt and learn, as the market changes. The market needs professionals who can model and design.
Few books have been published covering the subjects needed to understand the very fundamental concepts of Information Theory. Most books that deal with the subject are aimed at very specific audiences.
The more mathematically oriented books are seldom used by people with
engineering, economics, or statistical background, because the authors are
more interested in theorems and related conditions than in fundamental
concepts and applications. The books written for engineers usually lack
the required rigour, or skip some important points in favor of simplicity
and conciseness.
The idea is to present a seamless connection between the more abstract advanced set theory, the fundamental concepts from measure theory, integration, and probability, filling in the gaps left by previous books and leading to an interesting, robust, and, hopefully, self-contained exposition of Information Theory.

DESCRIPTION OF THE BOOK


The book begins with the historical evolution of Information Theory.
Chapter 1 deals with the basic concepts of information theory, and how
to measure information. The information associated to a certain source is
discussed in Chapter 2. The usual types of source codes are presented in
Chapter 3. Information transmission, joint information, conditional entropy,
mutual information, and channel capacity are the subject of Chapter 4. The
hot topic of multiple access systems, for cooperative and noncooperative
channels, is discussed in Chapter 5.
Chapter 6 presents code division multiple access (CDMA), the basic
block of most cellular and personal communication systems in operation.
The capacity of a CDMA system is the subject of Chapter 7. The information theoretical aspects of cryptography, which are important for network security, a topic intrinsically connected to computer networks and the Internet, are presented in Chapter 8. The appendix includes a review of probability theory. Solved problems, illustrations, and graphics help the reader understand the theory.
ACKNOWLEDGMENTS

The author is grateful to all the members of the Communications Research
Group, certified by the National Council for Scientific and Technological
Development (CNPq), at the Federal University of Campina Grande, for
their collaboration in many ways, helpful discussions and friendship, as
well as our colleagues at the Institute for Advanced Studies in Communi-
cations (Iecom).
The author also wishes to acknowledge the contribution of profes-
sors Francisco Madeiro, from the State University of Pernambuco, and
Waslon T. A. Lopes, from the Federal University of Campina Grande,
Brazil, to the chapter on source coding.
The author is also grateful to professor Valdemar Cardoso da Rocha Jr.,
from the Federal University of Pernambuco, Brazil, for technical commu-
nications, long-term cooperation, and useful discussions related to infor-
mation theory.
The author is indebted to his wife Silvana, sons Thiago and Raphael,
and daughter Marcella, for their patience and support during the course of
the preparation of this book.
The author is thankful to professor Orlando Baiocchi, from the Uni-
versity of Washington, Tacoma, USA, who strongly supported this project
from the beginning and helped with the reviewing process.
Finally, the author registers the support of Shoshanna Goldberg,
Destiny Hadley, Charlene Kronstedt, Jyothi, and Millicent Treloar from
Momentum Press, in the book preparation process.

CHAPTER 1

INFORMATION THEORY

Information Theory is a branch of Probability Theory, which has application and correlation with many areas, including communication systems,
communication theory, Physics, language and meaning, cybernetics, psy-
chology, art, and complexity theory (Pierce 1980). The basis for the theory
was established by Harry Theodor Nyqvist (1889–1976) (Nyquist 1924),
also known as Harry Nyquist, and Ralph Vinton Lyon Hartley (1888–1970),
who invented the Hartley oscillator (1928). They published the first arti-
cles on the subject, in which the factors that influenced the transmission of
information were discussed.
The seminal article by Claude E. Shannon (1916–2001) extended the
theory to include new factors, such as the noise effect in the channel and
the savings that could be obtained as a function of the statistical struc-
ture of the original message and the information receiver characteristics
(Shannon 1948). Shannon defined the fundamental communication prob-
lem as the possibility of, exactly or approximately, reproducing, at a certain
point, a message that has been chosen at another one.
The main semantic aspects of the communication, initially established
by Charles Sanders Peirce (1839–1914), a philosopher and creator of Semi-
otic Theory, are not relevant for the development of the Shannon informa-
tion theory. What is important is to consider that a particular message is
selected from a set of possible messages.
Of course, as mentioned by John Robinson Pierce (1910–2002), quoting
the philosopher Alfred Jules Ayer (1910–1989), it is possible to communi-
cate not only information, but knowledge, errors, opinions, ideas, experi-
ences, desires, commands, emotions, feelings. Heat and movement can be
communicated, as well as force, weakness, and disease (Pierce 1980).

Hartley found several reasons why the natural logarithm should measure the information:

• It is a practical metric in Engineering, considering that various parameters, such as time and bandwidth, are proportional to the logarithm of the number of possibilities.
• From a mathematical point of view, it is an adequate measure, because several limit operations are simply stated in terms of logarithms.
• It has an intuitive appeal, as an adequate metric, because, for instance, two binary symbols have four possibilities of occurrence.

The choice of the logarithm base defines the information unit. If base 2 is
used, the unit is the bit, an acronym suggested by John W. Tukey for binary
digit, which is a play on words that can also mean a piece of information. The information transmission is informally given in bit(s), but a unit has been proposed to pay tribute to the scientist who developed the concept; it is called the shannon, or [Sh] for short. This has a direct correspondence
with the unit for frequency, hertz or [Hz], for cycles per second, which was
adopted by the International System of Units (SI).1
Aleksandr Yakovlevich Khinchin (1894–1959) put Information Theory on a solid basis, with a more precise and unified mathematical discussion about the entropy concept, which supported Shannon's intuitive
and practical view (Khinchin 1957).
The books by Robert B. Ash (1965) and Amiel Feinstein (1958) give
the mathematical reasons for the choice of the logarithm to measure infor-
mation, and the book by J. Aczél and Z. Daróczy (1975) presents several
of Shannon’s information measures and their characterization, as well as
Alfréd Rényi’s (1921–1970) entropy metric.
A discussion on generalized entropies can be found in the book edited
by Luigi M. Ricciardi (1990). Lotfi Asker Zadeh introduced the concept of fuzzy set, an efficient tool to represent the behavior of systems that depend on the perception and judgement of human beings, and applied it
to information measurement (Zadeh 1965).

1.1 INFORMATION MEASUREMENT


The objective of this section is to establish a measure for the information
content of a discrete system, using Probability Theory. Consider a discrete
random experiment, such as the occurrence of a symbol, and its associated
sample space Ω, in which X is a real random variable (Reza 1961).

The random variable X can assume the following values

X = \{x_1, x_2, \ldots, x_N\},

in which \bigcup_{k=1}^{N} x_k = \Omega,    (1.1)

with probabilities in the set P

P = \{p_1, p_2, \ldots, p_N\},

in which \sum_{k=1}^{N} p_k = 1.    (1.2)

The information associated to a particular event is given by

I(x_i) = \log \frac{1}{p_i},    (1.3)

because the sure event has probability one and zero information, by a property of the logarithm, and the impossible event has zero probability and infinite information.

Example: suppose the sample space is partitioned into two equally prob-
able spaces. Then

I(x_1) = I(x_2) = -\log \frac{1}{2} = 1 \text{ bit},    (1.4)

that is, the choice between two equally probable events requires one unit
of information, when a base 2 logarithm is used.
Considering the occurrence of 2^N equiprobable symbols, the self-information of each event is given by

I(x_k) = -\log p_k = -\log 2^{-N} = N \text{ bits}.    (1.5)

It is possible to define the source entropy, H(X), as the average information, obtained by weighting all the occurrences,

H(X) = E[I(x_i)] = -\sum_{i=1}^{N} p_i \log p_i.    (1.6)

Observe that Equation 1.6 is the weighted average of the logarithms of the probabilities, in which the weights are the real values of the probabilities of the random variable X, and this indicates that H(X) can be interpreted as the expected value of the random variable that assumes the value -\log p_i with probability p_i (Ash 1965).

Table 1.1. Symbol probabilities of a two-symbol source

Symbol    Probability
x1        1/4
x2        3/4

Example: consider a source that emits two symbols, with unequal probabilities, given in Table 1.1.
The source entropy is calculated as

H(X) = -\frac{1}{4}\log\frac{1}{4} - \frac{3}{4}\log\frac{3}{4} = 0.81 \text{ bits per symbol}.
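The entropy in Equation 1.6 is simple to evaluate numerically. The following Python fragment is a minimal illustrative sketch (the language and the function name are choices made for this example, not part of the original text); it reproduces the 0.81 bits per symbol obtained for the source of Table 1.1.

import math

def entropy(probabilities, base=2):
    # H(X) = -sum of p * log(p) over the distribution, Equation 1.6
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Source of Table 1.1: p(x1) = 1/4, p(x2) = 3/4
print(entropy([1/4, 3/4]))  # approximately 0.811 bit per symbol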

1.2 REQUIREMENTS FOR AN INFORMATION METRIC

A few fundamental properties are necessary for the entropy, in order to obtain an axiomatic approach to base the information measurement (Reza 1961).

• If the event probabilities suffer a small change, the associated measure must change in accordance, in a continuous manner, which provides a physical meaning to the metric

H(p_1, p_2, \ldots, p_N) is continuous in p_k, k = 1, 2, \ldots, N, 0 \le p_k \le 1.    (1.7)

• The information measure must be symmetric in relation to the probability set P. That is, the entropy is invariant to the order of events,

H(p_1, p_2, p_3, \ldots, p_N) = H(p_1, p_3, p_2, \ldots, p_N).    (1.8)

• The maximum of the entropy is obtained when the events are equally probable. That is, when nothing is known about the set of events, or about what message has been produced, the assumption of a uniform distribution gives the highest information quantity, that corresponds to the highest level of uncertainty,

\text{Maximum of } H(p_1, p_2, \ldots, p_N) = H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right).    (1.9)

Table 1.2. Identically distributed symbol probabilities

Symbol    Probability
x1        1/4
x2        1/4
x3        1/4
x4        1/4

Table 1.3. Unequal symbol probabilities

Symbol    Probability
x1        1/2
x2        1/4
x3        1/8
x4        1/8

Example: consider two sources that emit four symbols. The first
source symbols, shown in Table 1.2, have equal probabilities, and
the second source symbols, shown in Table 1.3, are produced with
unequal probabilities.
The mentioned property indicates that the first source attains the
highest level of uncertainty, regardless of the probability values of
the second source, as long as they are different.
• Consider that an adequate measure for average uncertainty, H(p_1, p_2, \ldots, p_N), associated to a set of events, was found. Assume that event \{x_N\} is divided into M disjoint sets, with probabilities q_k, such that

p_N = \sum_{k=1}^{M} q_k,    (1.10)

and the probabilities associated to the new events can be normalized in such a way that

\frac{q_1}{p_N} + \frac{q_2}{p_N} + \cdots + \frac{q_M}{p_N} = 1.    (1.11)

Then, the creation of new events from the original set modifies the entropy to

H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = H(p_1, \ldots, p_{N-1}, p_N) + p_N H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right),    (1.12)

with

p_N = \sum_{k=1}^{M} q_k.

It is possible to show that the function defined by Equation 1.6 satisfies all the requirements. To demonstrate the continuity, it suffices to write (Reza 1961)

H(p_1, p_2, \ldots, p_N) = -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_N \log p_N]
= -[p_1 \log p_1 + p_2 \log p_2 + \cdots + p_{N-1} \log p_{N-1} + (1 - p_1 - p_2 - \cdots - p_{N-1}) \cdot \log(1 - p_1 - p_2 - \cdots - p_{N-1})].    (1.13)

Notice that the probabilities p_1, p_2, \ldots, p_{N-1}, and also (1 - p_1 - p_2 - \cdots - p_{N-1}), are contained in [0, 1], and that the logarithm of a continuous function is also continuous.
The entropy is clearly symmetric.
The maximum value property can be demonstrated, if one considers
that all probabilities are equal and that the entropy is maximized by that
condition
p1 = p2 = · · · = pN . (1.14)
Taking into account that, according to intuition, the uncertainty is maximum for a system of equiprobable states, one can write p_N as a function of the remaining probabilities p_k, k = 1, 2, \ldots, N-1. Taking the derivative of the entropy with respect to each probability,

\frac{dH}{dp_k} = \sum_{i=1}^{N} \frac{\partial H}{\partial p_i}\frac{\partial p_i}{\partial p_k} = -\frac{d}{dp_k}(p_k \log p_k) - \frac{d}{dp_N}(p_N \log p_N)\frac{\partial p_N}{\partial p_k}.    (1.15)

But, probability p_N can be written as

p_N = 1 - (p_1 + p_2 + \cdots + p_k + \cdots + p_{N-1}).    (1.16)

Therefore, the derivative of the entropy is

\frac{dH}{dp_k} = -(\log_2 e + \log p_k) + (\log_2 e + \log p_N),    (1.17)

which, using a property of logarithms, simplifies to

\frac{dH}{dp_k} = -\log \frac{p_k}{p_N}.    (1.18)

But,

\frac{dH}{dp_k} = 0, \quad \text{which gives} \quad p_k = p_N.    (1.19)

As p_k was chosen in an arbitrary manner, one concludes that, to obtain a maximum for the entropy function, one must have

p_1 = p_2 = \cdots = p_N = \frac{1}{N}.    (1.20)

The maximum is guaranteed because

H(1, 0, 0, \ldots, 0) = 0.    (1.21)

On the other hand, for equiprobable events, it is possible to verify that the entropy is always positive, for it attains its maximum at (Csiszár and Körner 1981)

H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right) = \log N > 0.    (1.22)
To prove additivity, it suffices to use the definition of entropy, computed for a two-set partition, with probabilities \{p_1, p_2, \ldots, p_{N-1}\} and \{q_1, q_2, \ldots, q_M\},

H(p_1, p_2, \ldots, p_{N-1}, q_1, q_2, \ldots, q_M) = -\sum_{k=1}^{N-1} p_k \log p_k - \sum_{k=1}^{M} q_k \log q_k
= -\sum_{k=1}^{N} p_k \log p_k + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k
= H(p_1, p_2, \ldots, p_N) + p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k.    (1.23)

But, the second part of the last term can be written in a way that displays the importance of the entropy in the derivation,

p_N \log p_N - \sum_{k=1}^{M} q_k \log q_k = \sum_{k=1}^{M} q_k \log p_N - \sum_{k=1}^{M} q_k \log q_k
= -\sum_{k=1}^{M} p_N \frac{q_k}{p_N} \log \frac{q_k}{p_N}
= p_N H\left(\frac{q_1}{p_N}, \frac{q_2}{p_N}, \ldots, \frac{q_M}{p_N}\right),    (1.24)

and this demonstrates the mentioned property.
The entropy is non-negative, which guarantees that the partitioning of
one event into several other events does not reduce the system entropy, as
shown in the following

H (p1 , . . . , pN −1 , q1 , q2 , . . . , qM ) ≥ H (p1 , . . . , pN −1 , pN ), (1.25)

that is, if one splits a symbol into two or more, the entropy always increases,
and that is the physical origin of the word.

Example: consider a binary source, X, that emits symbols 0 and 1 with probabilities p and q = 1 - p. The average information per symbol is given by H(X) = -p \log p - q \log q, which is known as the entropy function,

H(p) = -p \log p - (1 - p) \log(1 - p).    (1.26)

Example: for the binary source, consider that the symbol probabilities
are p = 1/8 and q = 7/8, and compute the entropy of the source.
The average information per symbol is given by

H (X ) = −1/8 log 1/8 − 7/8 log 7/8,

which gives H (X ) = 0.544.


Note that even though 1 bit is produced for each symbol, the actual
average information is 0.544 bits due to the unequal probabilities.
The entropy function attains its maximum when all symbols are equiprobable, at p = q = 1/2, for which the entropy is 1 bit per symbol. The function attains a minimum at p = 0 or p = 1.
This function plays an essential role in determining the capacity of a binary symmetric channel. Observe that the entropy function is concave, that is,

H(p_1) + H(p_2) \le 2 H\left(\frac{p_1 + p_2}{2}\right).    (1.27)
Figure 1.1. Graph of an information function, H(p), as a function of p, 0 ≤ p ≤ 1.

The entropy function is illustrated in Figure 1.1, in which it is possible to notice the symmetry, the concavity, and the maximum for equiprobable symbols. As a consequence of the symmetry, sample spaces with probability distributions obtained from permutations of a common distribution provide the same information quantity (van der Lubbe 1997).
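The symmetry, concavity, and maximum just described can be checked numerically for the entropy function of Equation 1.26. The Python sketch below is only an illustration of those properties (not part of the original text), under the assumption of base 2 logarithms.

import math

def H(p):
    # Binary entropy function, Equation 1.26, in bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(H(0.5))              # maximum: 1 bit per symbol
print(H(1/8))              # 0.544 bit, as in the previous example
print(H(0.25) == H(0.75))  # symmetry: H(p) = H(1 - p)

# Concavity, Equation 1.27: H(p1) + H(p2) <= 2 H((p1 + p2)/2)
p1, p2 = 0.1, 0.6
print(H(p1) + H(p2) <= 2 * H((p1 + p2) / 2))  # True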

Example: consider a source that emits symbols from an alphabet X = \{x_1, x_2, x_3, x_4\} with probabilities given in Table 1.4. What is the entropy of this source?
The entropy is computed using Formula 1.6, for N = 4 symbols, as

H(X) = -\sum_{i=1}^{4} p_i \log p_i,

or

H(X) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{2}{8}\log\frac{1}{8} = 1.75 \text{ bits per symbol}.

Table 1.4. Symbol probabilities of a certain source

Symbol    Probability
x1        1/2
x2        1/4
x3        1/8
x4        1/8

NOTES
1 The author of this book proposed the adoption of the shannon [Sh]
unit during the IEEE International Conference on Communications
(ICC 2001), in Helsinki, Finland, shortly after Shannon’s death.
CHAPTER 2

SOURCES OF INFORMATION

2.1 SOURCE CODING


The efficient representation of data produced by a discrete source is called
source coding. For a source coder to obtain a good performance, it is nec-
essary to take the symbol statistics into account. If the symbol probabili-
ties are different, it is useful to assign short codewords to probable symbols
and long ones to infrequent symbols. This produces a variable length code,
such as the Morse code.
Two usual requirements to build an efficient code are:

1. The codewords generated by the coder are binary.


2. The codewords are unequivocally decodable, and the original mes-
sage sequence can be reconstructed from the binary coded sequence.

Consider Figure 2.1, which shows a memoryless discrete source, whose output xk is converted by the source coder into a sequence of 0s and 1s, denoted by bk. Assume that the source alphabet has K different symbols, and that the k-th symbol, xk, occurs with probability pk, k = 0, 1, \ldots, K - 1.
Let lk be the length, measured in bits, of the binary word assigned to symbol xk. The average length of the words produced by the source coder is defined as (Haykin 1988)

L = \sum_{k=0}^{K-1} p_k l_k.    (2.1)

The parameter L represents the average number of bits per source symbol that is used in the source coding process. Let Lmin be the smallest possible

Figure 2.1. Source encoder: the discrete memoryless source output xk is converted by the source encoder into the binary sequence bk.

value of L. The source coding efficiency is defined as (Haykin 1988)

\eta = \frac{L_{min}}{L}.    (2.2)

Because L \ge L_{min}, \eta \le 1. The source coding efficiency increases as \eta approaches 1.
Shannon's first theorem, or source coding theorem, provides a means to determine L_{min} (Haykin 1988).
Given a memoryless discrete source with entropy H(X), the average length of the codewords is limited by

L \ge H(X).

The entropy H(X), therefore, represents a fundamental limit for the average number of bits per source symbol L, which is needed to represent a memoryless discrete source, and this number can be as small as, but never smaller than, the source entropy H(X). Therefore, for L_{min} = H(X), the source coding efficiency can be written as (Haykin 1988)

\eta = \frac{H(X)}{L}.    (2.3)

The code redundancy is given by (Abramson 1963)

1 - \eta = \frac{L - H(X)}{L}.    (2.4)

2.2 EXTENSION OF A MEMORYLESS DISCRETE SOURCE

It is useful to consider the encoding of blocks of N successive symbols from the source, instead of individual symbols. Each block can be seen as a product of an extended source with an alphabet X^N that has K^N distinct blocks. The symbols are statistically independent, therefore, the probability of an extended symbol is the product of the probabilities of the original symbols, and it can be shown that

H(X^N) = N H(X).    (2.5)

Example: consider the discrete memoryless source with alphabet

X = \{x_1, x_2, x_3\}.

The second order extended source has the alphabet

X^2 = \{x_1x_1, x_1x_2, x_1x_3, x_2x_1, x_2x_2, x_2x_3, x_3x_1, x_3x_2, x_3x_3\}.

For the second order extended source of the example, p(x_i x_j) = p(x_i)p(x_j). In particular, if all original source symbols are equiprobable, then H(X) = \log_2 3 bits. The second order extended source has nine equiprobable symbols, therefore, H(X^2) = \log_2 9 bits. It can be noticed that H(X^2) = 2H(X).
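The relation H(X^N) = N H(X), in Equation 2.5, can be verified by enumerating the symbols of the extended source. The Python sketch below is an illustration only, for the equiprobable three-symbol source of the example; the probability of each pair is taken as the product of the original probabilities.

import math
from itertools import product

def entropy(probabilities):
    # H(X) in bits for a discrete distribution
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Original source: three equiprobable symbols
p = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}

# Second extension: the probability of a pair is the product of the probabilities
p2 = {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}

print(entropy(p.values()))   # log2(3) = 1.585 bits
print(entropy(p2.values()))  # log2(9) = 3.170 bits = 2 H(X)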

2.2.1 IMPROVING THE CODING EFFICIENCY

The following example illustrates how to improve the coding efficiency using extensions of a source (Abramson 1963).

Example: consider a memoryless source, S = \{x_1, x_2\}, with p(x_1) = 3/4 and p(x_2) = 1/4. The source entropy is given by \frac{1}{4}\log_2 4 + \frac{3}{4}\log_2\frac{4}{3} = 0.811 bit.
A compact code for the source is presented in Table 2.1.
The average codeword length is one bit, and the efficiency is

\eta_1 = 0.811.

Example: to improve the efficiency, the second extension of the source is encoded, as shown in Table 2.2.
The average codeword length is 27/16 bits. The extended source entropy is 2 × 0.811 bits, and the efficiency is

\eta_2 = \frac{2 \times 0.811 \times 16}{27} = 0.961.
The efficiency improves for each new extension of the original source
but, of course, the codes get longer, which implies that they take more time
to transmit or process.

Table 2.1. A compact code

xi    p(xi)    Compact code
x1    3/4      0
x2    1/4      1

Table 2.2. A compact code for an extension of a source

Symbol    Probability    Compact code
x1 x1     9/16           0
x1 x2     3/16           10
x2 x1     3/16           110
x2 x2     1/16           111

Example: the efficiencies associated to the third and fourth extensions of the source are

\eta_3 = 0.985

and

\eta_4 = 0.991.

As higher order extensions of the source are encoded, the efficiency approaches 1, a result that is proved in the next section.
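The efficiencies quoted above can be reproduced directly from Tables 2.1 and 2.2. The Python fragment below is a minimal sketch, not part of the original text, under the same assumptions (binary logarithms and statistically independent source symbols); it evaluates the average codeword length and the efficiency of Equation 2.3 for the original source and for its second extension.

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def avg_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

# Original source (Table 2.1): p(x1) = 3/4, p(x2) = 1/4, codewords 0 and 1
H = entropy([3/4, 1/4])                  # 0.811 bit
L1 = avg_length([3/4, 1/4], [1, 1])      # 1 bit
print(H / L1)                            # eta_1 = 0.811

# Second extension (Table 2.2): codeword lengths 1, 2, 3, 3
probs2 = [9/16, 3/16, 3/16, 1/16]
L2 = avg_length(probs2, [1, 2, 3, 3])    # 27/16 bits
print(2 * H / L2)                        # eta_2 = 0.961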

2.3 PREFIX CODES

For a prefix code, no codeword is a prefix, or first part, of another codeword. Therefore, the code shown in Table 2.3 is a prefix code. On the other hand, the code shown in Table 2.4 is not a prefix code, because the binary word 10, for instance, is a prefix of the codeword 100.
To decode a sequence of binary words produced by a prefix encoder, the decoder begins at the first binary digit of the sequence, and decodes one codeword at a time. It is similar to a decision tree, which is a representation of the codewords of a given source code.

Table 2.3. A prefix code for a given source

Symbol    Code
x1        1
x2        01
x3        001
x4        000

Table 2.4. A source code that is not prefix

Symbol    Code
x1        1
x2        10
x3        100
x4        1000

Figure 2.2 illustrates the decision tree for the prefix code presented in Table 2.5.
The tree has one initial state and four final states, which correspond to the symbols x0, x1, x2, and x3. From the initial state, for each received bit, the decoder searches the tree until a final state is found.
The decoder, then, emits the corresponding decoded symbol and returns to the initial state. Therefore, from the initial state, after receiving a 1, the source decoder decodes symbol x0 and returns to the initial state. If it receives a 0, the decoder moves to the lower part of the tree; in the following, after receiving another 0, the decoder moves further down the tree and, after receiving a 1, the decoder retrieves x2 and returns to the initial state.
Considering the code from Table 2.5, with the decoding tree from Figure 2.2, the binary sequence 011100010010100101 is decoded into the output sequence x1 x0 x0 x3 x0 x2 x1 x2 x1.
By construction, a prefix code is always unequivocally decodable, which is important to avoid any confusion at the receiver.
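The decoding procedure just described can also be expressed as a short program. The Python sketch below is an illustration only; it accumulates received bits until they match a codeword of the prefix code of Table 2.5, which is always possible because no codeword is a prefix of another, and it reproduces the decoded sequence given in the text.

def prefix_decode(bits, codebook):
    # Decode a bit string with a prefix code given as {codeword: symbol}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in codebook:
            decoded.append(codebook[current])
            current = ""
    return decoded

# Prefix code of Table 2.5
codebook = {"1": "x0", "01": "x1", "001": "x2", "000": "x3"}

print(prefix_decode("011100010010100101", codebook))
# ['x1', 'x0', 'x0', 'x3', 'x0', 'x2', 'x1', 'x2', 'x1']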
Consider a code that has been constructed for a discrete source with
alphabet {x1, x2, . . . , xK}. Let {p1, p2, . . . , pK} be the source statistics,
and lk be the codeword length for symbol xk , k = 1, . . . , K. If the binary
code constructed for the source is a prefix one, then one can use the

Table 2.5. Example of a prefix code

Source symbol Probability of occurrence Code


x0 0.5 1
x1 0.25 01
x2 0.125 001
x3 0.125 000

Figure 2.2. Decision tree for the code in Table 2.5.

Kraft-McMillan inequality

\sum_{k=1}^{K} 2^{-l_k} \le 1,    (2.6)

in which the factor 2 is the radix, or number of symbols, of the binary alphabet.
For a memoryless discrete source with entropy H(X), the average codeword length of a prefix code is limited to

H(X) \le L < H(X) + 1.    (2.7)

The left hand side equality is obtained on the condition that symbol xk be emitted from the source with probability pk = 2^{-lk}, in which lk is the length of the codeword assigned to symbol xk.
Consider the N-th order extension of a memoryless discrete source. For the N-th order extension of a code, the encoder operates on blocks of N symbols from the original source, instead of individual ones, and the source alphabet X^N has an entropy that is N times the entropy of the original source.
Let L_N be the average length of the extended prefix code. For an unequivocally decodable code, L_N is as small as possible, and from Equation 2.7 it follows that

H(X^N) \le L_N < H(X^N) + 1,    (2.8)

therefore,

N H(X) \le L_N < N H(X) + 1,    (2.9)

or, in an equivalent way,

H(X) \le \frac{L_N}{N} < H(X) + \frac{1}{N}.    (2.10)

In the limit, as N goes to infinity, the lower and upper bounds converge and, therefore,

\lim_{N \to \infty} \frac{1}{N} L_N = H(X).    (2.11)

Therefore, for a prefix extended encoder, as the order N increases, the code represents the source as efficiently as desired, and the average codeword length can be as close to the source entropy as possible, according to Shannon's source coding theorem. On the other hand, there is a compromise between the reduction of the average codeword length and the increase in complexity of the decoding process (Haykin 1988).
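As a numerical check of Equations 2.6 and 2.7, note that the code of Table 2.5 has probabilities pk = 2^{-lk}, so its average length attains the source entropy exactly. The Python sketch below is an illustration, not part of the original text; it verifies the Kraft-McMillan sum and the equality L = H(X) = 1.75 bits for that code.

import math

# Code of Table 2.5: (probability, codeword length) pairs
source = [(0.5, 1), (0.25, 2), (0.125, 3), (0.125, 3)]

kraft_sum = sum(2 ** -l for _, l in source)
avg_len = sum(p * l for p, l in source)
entropy = -sum(p * math.log2(p) for p, _ in source)

print(kraft_sum)  # 1.0, the Kraft-McMillan inequality holds with equality
print(avg_len)    # 1.75 bits
print(entropy)    # 1.75 bits, so L = H(X), since p_k = 2**(-l_k)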

2.4 THE INFORMATION UNIT


There is some confusion between the binary digit, abbreviated as bit, and the information particle, also baptized as bit by John Tukey and Claude Shannon.
In a meeting of the Institute of Electrical and Electronics Engineers (IEEE), the largest scientific institution in the world, the author of this book proposed the shannon [Sh] as a unit of information transmission, equivalent to bit per second. It is instructive to say that the bit, as used today, is not a unit of information, because it is not approved by the International System of Units (SI).
What is curious about that meeting was the misunderstanding that
surrounded the units, in particular regarding the difference between the
concepts of information unit and digital logic unit.
To make things clear, the binary digit is associated to a certain state of a
digital system, and not to information. A binary digit 1 can refer to 5 volts,
in TTL logic, or 12 volts, for CMOS logic.
The information bit exists independent of any association to a particular
voltage level. It can be associated, for example, to a discrete information
or to the quantization of an analog information.
For instance, the information bits recorded on the surface of a com-
pact disk are stored as a series of depressions on the plastic material,
which are read by an optical beam, generated by a semiconductor laser.
But, obviously, the depressions are not the information. They represent a

means for the transmission of information, a material substrate that carries the data.
In the same way, the information can exist, even if it is not associated
to light or other electromagnetic radiation. It can be transported by several
means, including paper, and materializes itself when it is processed by a
computer or by a human being.
CHAPTER 3

SOURCE CODING

3.1 TYPES OF SOURCE CODES


This chapter presents the classification of source codes, block codes, non-
singular codes, uniquely decodable codes, and instantaneous codes.

3.1.1 BLOCK CODES

Let S = {x0 , x1 , . . . , xK−1 } be a set of symbols of a given source alphabet.


A code is defined as a mapping of all possible symbol sequences from S into
sequences of another set X = {x0 , x1 , . . . , xM −1 }, called the code alphabet.
A block code maps each of the symbols from the source set into a
sequence of the code alphabet. The fixed sequences of symbols xj are
called codewords Xi . Note that Xi denotes a sequence of xj ’s (Abramson
1963).

Example: a binary block code is presented in Table 3.1, and a ternary block
code is shown in Table 3.2.

3.1.2 NONSINGULAR CODES

A block code is said to be nonsingular if all codewords are distinct (Abramson 1963). Table 3.2 shows an example of a nonsingular code.
The code shown in Table 3.3 is also nonsingular, but, although the codewords are distinct, there is a certain ambiguity between some symbol sequences of the code regarding the source symbol sequences.

Table 3.1. A binary block code

Source symbols Code


x0 0
x1 11
x2 00
x3 1

Table 3.2. A ternary block code

Source symbols Code


x0 0
x1 1
x2 2
x3 01

Table 3.3. A nonsingular block code

Source symbols Code


x0 0
x1 01
x2 1
x3 11

Example: the sequence 1111 can correspond to x2 x2 x2 x2, or x2 x3 x2, or even x3 x3. This indicates that it is necessary to define a condition stricter than nonsingularity, to guarantee that the code can be used in a practical situation.

3.1.3 UNIQUELY DECODABLE CODES

Let a block code map symbols from a source alphabet S into fixed symbol
sequences of a code alphabet X . The source can be an extension of another
source, which is composed of symbols from the original alphabet. The
n-ary extension of a block code that maps symbols xi into codewords Xi is
the block code that maps symbol sequences from the source (xi1 xi2 . . . xin )
into the codeword sequences (Xi1 Xi2 . . . Xin ) (Abramson 1963).

Table 3.4. A nonsingular block code

Source symbols Code


x0 1
x1 00
x2 11
x3 10

Table 3.5. The second extension of a block code

Source symbols    Code    Source symbols    Code
x0 x0             11      x2 x0             111
x0 x1             100     x2 x1             1100
x0 x2             111     x2 x2             1111
x0 x3             110     x2 x3             1110
x1 x0             001     x3 x0             101
x1 x1             0000    x3 x1             1000
x1 x2             0011    x3 x2             1011
x1 x3             0010    x3 x3             1010

From the previous definition, the n-ary extension of a block code is also
a block code. The second order extension of the block code presented in
Table 3.4 is the block code of Table 3.5.
A block code is said to be uniquely decodable if and only if the n-ary
extension of the code is nonsingular for all finite n.

3.1.4 INSTANTANEOUS CODES

Table 3.6 presents two examples of uniquely decodable codes. Code A represents a simple way to construct a uniquely decodable set of sequences, because all codewords have the same length and the code is nonsingular.
Code B is also uniquely decodable. It is also called a comma code, because the digit zero is used to separate the codewords (Abramson 1963).
Consider the code shown in Table 3.7. Code C differs from codes A and B of Table 3.6 in an important aspect: if a binary sequence composed of codewords from code C is received, it is not always possible to decode a codeword as soon as its bits arrive.

Table 3.6. Uniquely decodable codes

Source symbols Code A Code B


x0 000 0
x1 001 10
x2 010 110
x3 011 1110
x4 100 11110
x5 101 111110
x6 110 1111110
x7 111 11111110

Table 3.7. Another uniquely decodable code

Source symbols Code C


x0 1
x1 10
x2 100
x3 1000
x4 10000
x5 100000
x6 1000000
x7 10000000

Example: if the bit stream 100000 is received, for example, it is not pos-
sible to decide if it corresponds to symbol x5 , unless the next symbol is
available. If the next symbol is 1, then the sequence is 100000, but if it is 0,
then it is necessary to inspect one more symbol to know if the sequence
corresponds to x6 (1000000) or x7 (10000000).
A uniquely decodable code is instantaneous if it is possible to decode
each codeword in a sequence with no reference to subsequent symbols
(Abramson 1963). Code A and B are instantaneous, and C is not.
It is possible to devise a test to indicate when a code is instantaneous.
Let Xi = xi1 xi2 . . . xim be a word from a certain code. The sequence of sym-
bols (xi1 xi2 . . . xij ), with j ≤ m, is called the prefix of the codeword Xi .

Example: the codeword 10000 has five prefixes: 1, 10, 100, 1000, and
10000. A necessary condition for a code to be instantaneous is that no
codeword is a prefix of another codeword.

Figure 3.1. Classes of source codes. Codes are divided into block and nonblock codes; block codes into singular and nonsingular codes; nonsingular codes into uniquely and nonuniquely decodable codes; and uniquely decodable codes into instantaneous and noninstantaneous codes.

The various classes of codes presented in this section are summarized in Figure 3.1.

3.2 CONSTRUCTION OF INSTANTANEOUS CODES

In order to construct a binary instantaneous code for a source with five symbols, one can begin by attributing the digit 0 to symbol x0 (Abramson 1963)

x0 ← 0.

In this case, the remaining source symbols must correspond to codewords that begin with the digit 1; otherwise, it is not a prefix code. It is not possible to associate x1 with the codeword 1, because no unused prefix would remain to begin the other codewords. Therefore,

x1 ← 10.

This codeword assignment requires that the remaining codewords begin with 11. If

x2 ← 110,

then the only unused prefix with three bits is 111, which implies that

x3 ← 1110

and

x4 ← 1111.

In the previously constructed code, note that if one begins the code construction by making x0 correspond to 0, this restricts the available number of codewords, because the remaining codewords have to, necessarily, begin with 1.
On the other hand, if a two-digit word had been chosen to represent x0, there would be more freedom to choose the others, and there would be no need to assign very long codewords to the last ones.
An alternative binary instantaneous code can be constructed to represent the five symbols (Abramson 1963). The first assignment is

x0 ← 00.

Then, one can assign

x1 ← 01,

and two unused prefixes of length two are saved for the following codeword assignments:

x2 ← 10
x3 ← 110
x4 ← 111.

The question of which code is the best is postponed to the next section, because it requires the notion of the average length of a code, which depends on the symbol probability distribution.

3.3 KRAFT INEQUALITY

Consider an instantaneous code with source alphabet given by

S = \{x_0, x_1, \ldots, x_{K-1}\}

and code alphabet

X = \{x_0, x_1, \ldots, x_{M-1}\}.

Let X_0, X_1, \ldots, X_{K-1} be the codewords, and let l_i be the length of the word X_i. The Kraft inequality establishes that a necessary and sufficient condition for the existence of an instantaneous code with codeword lengths l_0, l_1, \ldots, l_{K-1} is

\sum_{i=0}^{K-1} r^{-l_i} \le 1,    (3.1)

in which r is the number of different symbols of the code.

For the binary case,

\sum_{i=0}^{K-1} 2^{-l_i} \le 1.    (3.2)

The Kraft inequality can be used to determine if a given sequence of lengths l_i is acceptable for the codewords of an instantaneous code.
Consider an information source, with four possible symbols, x0, x1, x2, and x3. Table 3.8 presents five possible codes to represent the original symbols, using a binary alphabet.

Example: for code A, one obtains

\sum_{i=0}^{3} 2^{-l_i} = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = 1.

Therefore, the codeword lengths of this code are acceptable for an instantaneous code. But the Kraft inequality does not tell whether A is an instantaneous code; it is only a necessary condition that has to be fulfilled by the lengths.
For the example, the inequality states that there is an instantaneous code with four codewords of length 2. In this case, it is clear that the binary codewords of code A satisfy the Kraft inequality and also form an instantaneous code.
For code B,

\sum_{i=0}^{3} 2^{-l_i} = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = 7/8 \le 1.

In this case, the lengths of the codewords are suitable to compose an instantaneous code. Code B is also a prefix code.

Table 3.8. Selected binary codes

Source symbols Code A Code B Code C Code D Code E


x0 11 1 1 1 1
x1 10 011 01 011 01
x2 01 001 001 001 001
x3 00 000 000 00 00

Code C is similar to code B, except for a discarded bit in the second codeword. For this code, one obtains

\sum_{i=0}^{3} 2^{-l_i} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1.

The codeword lengths satisfy the Kraft inequality and, by inspection, one observes that this code is instantaneous.
Code D is obtained from B, discarding a bit in the fourth codeword. Although the lengths satisfy the Kraft inequality, code D is not instantaneous, because it is not a prefix code. The fourth codeword is a prefix of the third one.
Finally, for code E,

\sum_{i=0}^{3} 2^{-l_i} = \frac{9}{8} > 1,

and the codeword lengths do not satisfy the Kraft inequality. Therefore, code E is not instantaneous.
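The computations for codes A to E can be gathered in a single routine. The Python sketch below is an illustration, assuming the codes of Table 3.8; it evaluates the binary Kraft sum of Equation 3.2 for each code and also tests the prefix condition, showing that code D satisfies the inequality and still fails to be instantaneous, while code E violates the inequality.

codes = {
    "A": ["11", "10", "01", "00"],
    "B": ["1", "011", "001", "000"],
    "C": ["1", "01", "001", "000"],
    "D": ["1", "011", "001", "00"],
    "E": ["1", "01", "001", "00"],
}

def kraft_sum(codewords, r=2):
    # Sum of r**(-l_i) over the codeword lengths, Equation 3.1
    return sum(r ** -len(w) for w in codewords)

def is_prefix_code(codewords):
    # True if no codeword is a prefix of another codeword
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

for name, words in codes.items():
    print(name, kraft_sum(words), is_prefix_code(words))
# A: 1.0 True, B: 0.875 True, C: 1.0 True, D: 1.0 False, E: 1.125 False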
Consider a source with eight symbols to be encoded into an instantaneous ternary code, whose codeword lengths are 1, 2, 2, 2, 2, 3, 3, 3. Using the Kraft inequality,

\sum_{i=0}^{7} 3^{-l_i} = \frac{1}{3} + 4\left(\frac{1}{9}\right) + 3\left(\frac{1}{27}\right) = \frac{24}{27} < 1,

which indicates that this code is possible, as follows:

x0 ← 0
x1 ← 10
x2 ← 11
x3 ← 20
x4 ← 21
x5 ← 220
x6 ← 221
x7 ← 222

For a source with 11 symbols, if the codeword lengths are 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, it is not possible to obtain a ternary instantaneous code, because

\sum_{i=0}^{10} 3^{-l_i} = \frac{1}{3} + 6\left(\frac{1}{9}\right) + 4\left(\frac{1}{27}\right) = \frac{31}{27} > 1.

3.4 HUFFMAN CODE


This section describes the Huffman coding algorithm, and the procedure
to construct the Huffman code when the source statistics are known.
The technique was developed by David Albert Huffman (1925–1999),
in a paper for a course on Information Theory taught by Robert Mario Fano
(1917–), at the Massachusetts Institute of Technology (MIT). The obtained
sequences are called Huffman codes, and they are prefix codes.
The Huffman procedure is based on two assumptions regarding optimum prefix codes:

1. The most frequent symbols, those with higher probability, are represented by shorter codewords.
2. The least frequent symbols are assigned codewords of the same length.

According to the first assumption, as the most probable symbols are also the most frequent, they must be assigned codewords as short as possible to decrease the average length of the code. The second assumption is also true, because, for a prefix code, a shorter codeword could not be a prefix of another one. The least probable symbols must be distinct and have the same length (Sayood 2006).
Furthermore, the Huffman process is completed by the addition of a simple requirement: the longer codewords that correspond to the least frequent symbols differ only in the last digit.

3.4.1 CONSTRUCTING A BINARY HUFFMAN CODE

Given a discrete source, a Huffman code can be constructed along the following steps:

1. The source symbols are arranged in decreasing order of probability. The two least probable symbols receive the assignments 0 and 1.
2. Both symbols are combined to create a new source symbol, whose probability is the sum of the original ones. The list is reduced by one symbol. The new symbol is positioned in the list according to its probability.
3. This procedure continues until the list has only two symbols, which receive the assignments 0 and 1.
4. Finally, the binary codeword for each symbol is obtained by a reverse process.

Table 3.9. Discrete source with five symbols and their probabilities

Symbols Probabilities
x0 0.4
x1 0.2
x2 0.2
x3 0.1
x4 0.1

In order to explain the algorithm, consider the source of Table 3.9.


The first phase is to arrange the symbols in a decreasing order of prob-
ability. Assign the values 0 and 1 to the symbols with the smallest proba-
bilities. They are then combined to create a new symbol. The probability
associated to the new symbol is the sum of the previous probabilities. The
new symbol is repositioned in the list, to maintain the same decreasing
order for the probabilities. The procedure is shown in Figure 3.2.
The procedure is repeated until only two symbols remain, which are
assigned to 0 and 1, as shown in Figure 3.3.
The procedure is repeated to obtain all codewords, by reading the digits
in inverse order, from Phase IV to Phase I, as illustrated in Figure 3.4.
Following the arrows, for symbol x4 one finds the codeword 011.

Figure 3.2. Probabilities in descending order for the Huffman code.

Figure 3.3. Huffman code. At each phase, the two least probable symbols are combined.

Symbol    Probability    Codeword
x0        0.4            00
x1        0.2            10
x2        0.2            11
x3        0.1            010
x4        0.1            011

Figure 3.4. (a) Example of the Huffman coding algorithm to obtain the codewords. (b) Resulting code.

For the example, the average codeword length for the Huffman code is given by

L = \sum_{k=0}^{4} p_k l_k = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 \text{ bits}.

The source entropy is calculated as

H(X) = \sum_{k=0}^{4} p_k \log_2 \frac{1}{p_k}

H(X) = 0.4 \log_2 \frac{1}{0.4} + 0.2 \log_2 \frac{1}{0.2} + 0.2 \log_2 \frac{1}{0.2} + 0.1 \log_2 \frac{1}{0.1} + 0.1 \log_2 \frac{1}{0.1},

or

H(X) = 2.12193 bits.

The code efficiency is

\eta = \frac{H(X)}{L} = \frac{2.12193}{2.2},

which is equal to 96.45 percent.
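The procedure illustrated in Figures 3.2 to 3.4 can be automated. The Python sketch below is a minimal Huffman encoder, written for illustration; it does not follow the exact tie-breaking and bit-assignment rules of the figures, so the codewords may differ from those of Table 3.10, but the codeword lengths, and therefore the average length of 2.2 bits and the efficiency of about 96.45 percent, are the same for the source of Table 3.9.

import heapq
import math

def huffman_code(probabilities):
    # Build a binary Huffman code; returns {symbol: codeword}
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable group, receives bit 1
        p2, _, c2 = heapq.heappop(heap)   # second least probable, receives bit 0
        merged = {s: "1" + w for s, w in c1.items()}
        merged.update({s: "0" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

source = {"x0": 0.4, "x1": 0.2, "x2": 0.2, "x3": 0.1, "x4": 0.1}
code = huffman_code(source)

L = sum(p * len(code[s]) for s, p in source.items())
H = -sum(p * math.log2(p) for p in source.values())
print(code)         # a prefix code with codeword lengths 2, 2, 2, 3, 3
print(L, H, H / L)  # 2.2 bits, 2.12193 bits, efficiency about 0.9645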


It is important to say that the Huffman procedure is not unique, and
several variations can be obtained for the final set of codewords, depending

Table 3.10. Four distinct Huffman codes obtained for the source of Table 3.9

Symbols Code I Code II Code III Code IV


x0 00 11 1 0
x1 10 01 01 10
x2 11 00 000 111
x3 010 101 0010 1101
x4 011 100 0011 1100

on the way the bits are assigned. But, in spite of how the probabilities are
positioned, the average length is always the same, if the rules are followed.
The difference is the variance of the codeword lengths, defined as

V[L] = \sum_{k=0}^{K-1} p_k (l_k - L)^2,    (3.3)

in which pk and lk denote the probability of occurrence of the k-th source symbol and the length of the respective codeword.
Usually, the procedure of displacing the probability of the new symbol
to the highest position in the list produces smaller values for V[L], as
compared to the displacement of the probability to the lowest position of
the list.
Table 3.10 presents four Huffman codes obtained for the source of
Table 3.9. Codes I and II were obtained shifting the new symbol to the
highest position in the list of decreasing probabilities.
Codes III and IV were produced by shifting the new symbol to the
lowest position in the list. Codes I and III used the systematic assignment
of 0 followed by 1 to the least frequent symbols. Codes II and IV used the
systematic assignment of 1 followed by 0 to the least frequent symbols.
For all codes the average codeword length is 2.2 bits. For codes I and II,
the variance of the codeword lengths is 0.16. For codes III and IV, the
variance is 1.36.
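
A short sketch (illustrative only) confirms the variances quoted above, using the codeword lengths of Table 3.10 in Equation 3.3:

```python
p = [0.4, 0.2, 0.2, 0.1, 0.1]
code_I   = [2, 2, 2, 3, 3]    # lengths of Code I (and Code II)
code_III = [1, 2, 3, 4, 4]    # lengths of Code III (and Code IV)

def variance(lengths, probs):
    L = sum(pk * lk for pk, lk in zip(probs, lengths))       # average length
    return sum(pk * (lk - L) ** 2 for pk, lk in zip(probs, lengths))

print(variance(code_I, p))     # 0.16
print(variance(code_III, p))   # 1.36
```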
CHAPTER 4

INFORMATION TRANSMISSION

Claude Elwood Shannon (1916–2001) is considered the father of Information Theory. In 1948, he published a seminal article on the mathematical concept of information, which has remained one of the most cited works for decades. Information left the field of journalism to occupy a more formal area, as part of Probability Theory.
The entropy, in the context of information theory, was initially defined
by Ralph Vinton Lyon Hartley (1888–1970), in the article “Transmission
of Information”, published by Bell System Technical Journal in July 1928,
10 years before the formalization of the concept by Claude Shannon.
Shannon’s development was also based on Harry Nyquist’s work (Harry
Theodor Nyqvist, 1889–1976), which determined the sampling rate, as a
function of frequency, necessary to reconstruct an analog signal using a set
of discrete samples.
In an independent way, Andrei N. Kolmogorov developed his Complexity Theory during the 1960s. It was a new information theory, based on the length of an algorithm developed to describe a certain data sequence. He used Alan Turing's machine in this new definition. Under certain conditions, Kolmogorov's and Shannon's definitions are equivalent.
The idea of relating the number of states of a system to a physical measure, however, dates back to the 19th century. Rudolf Clausius proposed the term entropy for such a measure in 1865.
Entropy comes from the Greek word for transformation and, in Physics, it is related to the logarithm of the ratio between the final and initial temperatures of a system, or to the ratio of the heat variation and the temperature of the same system.
Shannon defined the entropy of an alphabet as the negative of the mean value of the logarithm of the symbols' probability. This way, when the symbols are equiprobable, the definition is equivalent to that of Nyquist.

But, as a more generic definition, Shannon's entropy can be used to compute the capacity of communication channels. Most of the researchers' work is devoted either to computing the capacity or to developing error-correcting codes that attain that capacity.
Shannon died on February 24, 2001, as a victim of a disease named after
the physician Aloysius Alzheimer. According to his wife, he lived a quiet
life, but had lost his capacity to retain information.

4.1 THE CONCEPT OF INFORMATION THEORY


The concept of information transmission is associated with the existence
of a communication channel that links the source and destination of the
message. This can imply the occurrence of transmission errors, caused by
the probabilistic nature of the channel. Figure 4.1 illustrates the canonical
model for a communication channel, proposed by Shannon in his seminal
paper of 1948. This is a very simplified model of reality, but contains the
basic blocks upon which the mathematical structure is built.

4.2 JOINT INFORMATION MEASUREMENT


Consider two discrete and finite sample spaces, with the associated random variables X and Y,

X = \{x_1, x_2, \dots, x_N\},
Y = \{y_1, y_2, \dots, y_M\}. (4.1)

Figure 4.1. Model for a communication channel: transmitter, channel, and receiver, with noise added in the channel.

The events from the first sample space may jointly occur with events from the second. Therefore, the following matrix contains the whole set of events in the product space,

[XY] = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_M \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_M \\ \cdots & \cdots & \cdots & \cdots \\ x_N y_1 & x_N y_2 & \cdots & x_N y_M \end{bmatrix} (4.2)

The joint probability matrix is given in the following, in which no restriction is assumed regarding the dependence between the random variables,

[P(X, Y)] = \begin{bmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,M} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,M} \\ \cdots & \cdots & \cdots & \cdots \\ p_{N,1} & p_{N,2} & \cdots & p_{N,M} \end{bmatrix} (4.3)

Figure 4.2 shows the relation between the input and output alphabets,
which are connected by the joint probability distribution matrix [P(X , Y )].
The joint entropy between the random variables from sources X and Y
is given by

H(X, Y) = -\sum_{k=1}^{N} \sum_{j=1}^{M} p_{k,j} \log p_{k,j}, (4.4)

which may be simplified to

H(X, Y) = -\sum_{X} \sum_{Y} p(x, y) \log p(x, y). (4.5)

The marginal entropies may be written in terms of the marginal probabilities, p(x) and p(y),

H(X) = -\sum_{X} p(x) \log p(x) (4.6)

and

H(Y) = -\sum_{Y} p(y) \log p(y). (4.7)

Figure 4.2. A probabilistic communication channel: the input alphabet X is connected to the output alphabet Y through the joint probability matrix [P(X, Y)].
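
A minimal Python sketch (the joint matrix is a made-up example, not from the text) computes the joint and marginal entropies of Equations 4.4, 4.6, and 4.7 from a matrix [P(X, Y)]:

```python
import numpy as np

# Assumed joint probability matrix [P(X, Y)]; rows index x, columns index y
P = np.array([[0.30, 0.10],
              [0.05, 0.55]])

def H(p):
    """Entropy, in bits, of a probability vector or matrix (zero terms ignored)."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = H(P)             # joint entropy H(X, Y), Equation 4.4
H_X = H(P.sum(axis=1))  # marginal entropy H(X), Equation 4.6
H_Y = H(P.sum(axis=0))  # marginal entropy H(Y), Equation 4.7
print(H_XY, H_X, H_Y)
```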



4.3 CONDITIONAL ENTROPY


The concept of conditional entropy is essential to model, and understand, the
operation of the communication channel, because it provides information
about a particular symbol, given that another symbol has occurred. The
entropy of alphabet X conditioned to the occurrence of a particular symbol
y is given by
H(X|y) = -\sum_{X} \frac{p(x, y)}{p(y)} \log \frac{p(x, y)}{p(y)} = -\sum_{X} p(x|y) \log p(x|y). (4.8)

The expected value of the conditional entropy, for all possible values of y, provides the average conditional entropy of the system,

H(X|Y) = E[H(X|y)] = \sum_{Y} p(y) H(X|y) = -\sum_{Y} p(y) \sum_{X} p(x|y) \log p(x|y), (4.9)

which can be written as

H(X|Y) = -\sum_{Y} \sum_{X} p(y) p(x|y) \log p(x|y), (4.10)

or

H(X|Y) = -\sum_{Y} \sum_{X} p(x, y) \log p(x|y). (4.11)

In the same way, the mean conditional entropy of source Y, given the information about source X, is

H(Y|X) = -\sum_{X} \sum_{Y} p(x) p(y|x) \log p(y|x), (4.12)

or

H(Y|X) = -\sum_{X} \sum_{Y} p(x, y) \log p(y|x). (4.13)
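
Continuing the same kind of sketch (with an assumed joint matrix, not an example from the text), the conditional entropies of Equations 4.11 and 4.13 follow from the joint matrix and its marginals:

```python
import numpy as np

P = np.array([[0.30, 0.10],     # assumed joint probability matrix [P(X, Y)]
              [0.05, 0.55]])

px = P.sum(axis=1)              # marginal p(x)
py = P.sum(axis=0)              # marginal p(y)

# H(X|Y) = -sum p(x, y) log p(x|y), with p(x|y) = p(x, y)/p(y)   (Equation 4.11)
H_X_given_Y = -sum(P[i, j] * np.log2(P[i, j] / py[j])
                   for i in range(P.shape[0]) for j in range(P.shape[1]) if P[i, j] > 0)

# H(Y|X) = -sum p(x, y) log p(y|x), with p(y|x) = p(x, y)/p(x)   (Equation 4.13)
H_Y_given_X = -sum(P[i, j] * np.log2(P[i, j] / px[i])
                   for i in range(P.shape[0]) for j in range(P.shape[1]) if P[i, j] > 0)

print(H_X_given_Y, H_Y_given_X)
```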

4.4 MODEL FOR A COMMUNICATION CHANNEL


A communication channel can be modeled based on the previous developments. Consider a source that has a given alphabet X. The source transmits the information to the destination using a certain channel. The system may be described by a joint probability matrix, which gives the joint probability of occurrence of a transmitted symbol and a received one,

[P(X, Y)] = \begin{bmatrix} p(x_1, y_1) & p(x_1, y_2) & \cdots & p(x_1, y_N) \\ p(x_2, y_1) & p(x_2, y_2) & \cdots & p(x_2, y_N) \\ \cdots & \cdots & \cdots & \cdots \\ p(x_M, y_1) & p(x_M, y_2) & \cdots & p(x_M, y_N) \end{bmatrix} (4.14)

There are five probability schemes to analyze:

1. [P(X , Y )], joint probability matrix,


2. [P(X )], marginal probability matrix of X ,
3. [P(Y )], marginal probability matrix of Y ,
4. [P(X |Y )], probability matrix conditioned on Y ,
5. [P(Y |X )], probability matrix conditioned on X .

Those probability schemes produce five entropy functions, associated


to the communication channel, whose interpretations are given as follows:

1. H (X )—Average information per source symbol, or source entropy,


2. H (Y )—Average information per received symbol, or receiver
entropy,
3. H (X , Y )—Average information associated to pairs of transmitted
and received symbols, or average uncertainty of the communication
system,
4. H(X|Y)—Average information measurement of the transmitted symbol, given that Y was received, also called the equivocation,
5. H(Y|X)—Average information measurement of the received symbol, given that X was transmitted, or conditional entropy of the output.

4.5 NOISELESS CHANNEL


For the noiseless discrete channel, each symbol from the input alphabet has a one-to-one correspondence with the output. The joint probability matrix, as well as the transition probability matrix, has a diagonal format,

[P(X, Y)] = \begin{bmatrix} p(x_1, y_1) & 0 & \cdots & 0 \\ 0 & p(x_2, y_2) & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & p(x_N, y_N) \end{bmatrix} (4.15)

[P(X|Y)] = [P(Y|X)] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} (4.16)

The joint entropy equals the marginal entropies,

H(X, Y) = H(X) = H(Y) = -\sum_{i=1}^{N} p(x_i, y_i) \log p(x_i, y_i), (4.17)

and the conditional entropies are null,

H(Y|X) = H(X|Y) = 0. (4.18)

As a consequence, the receiver uncertainty is equal to the source


entropy, and there is no ambiguity at the reception, which indicates that
the conditional entropies are all zero.

4.6 CHANNEL WITH INDEPENDENT OUTPUT AND INPUT
For the channel with independent input and output, there is no relation between the transmitted and received symbols, that is, given that a certain symbol has been transmitted, any symbol can be received, with no connection whatsoever with it. The joint probability matrix has N identical columns,

[P(X, Y)] = \begin{bmatrix} p_1 & p_1 & \cdots & p_1 \\ p_2 & p_2 & \cdots & p_2 \\ \cdots & \cdots & \cdots & \cdots \\ p_M & p_M & \cdots & p_M \end{bmatrix}, \quad \sum_{i=1}^{M} p_i = \frac{1}{N}. (4.19)

The input and output symbol probabilities are statistically independent, that is,

p(x, y) = p(x)p(y). (4.20)

Computing the entropies gives

H(X, Y) = -N \sum_{i=1}^{M} p_i \log p_i, (4.21)

H(X) = -\sum_{i=1}^{M} N p_i \log N p_i = -N \sum_{i=1}^{M} p_i \log p_i - \log N, (4.22)

H(Y) = -N \frac{1}{N} \log \frac{1}{N} = \log N, (4.23)

H(X|Y) = -\sum_{i=1}^{M} N p_i \log N p_i = H(X), (4.24)

H(Y|X) = -\sum_{i=1}^{M} N p_i \log \frac{1}{N} = \log N = H(Y). (4.25)

As a consequence, the channel with independent input and output does


not provide information, that is, has the highest possible loss, contrasting
with the noiseless channel.

4.7 RELATIONS BETWEEN THE ENTROPIES

It is possible to show, using Bayes rule for the conditional probability, that
the joint entropy can be written in terms of the conditional entropy, in the
following way

H (X , Y ) = H (X |Y ) + H (Y ), (4.26)
H (X , Y ) = H (Y |X ) + H (X ). (4.27)

Shannon has shown the fundamental inequality

H (X ) ≥ H (X |Y ), (4.28)

whose demonstration is given in the following.


The logarithm concavity property, \ln x \leq x - 1, can be used to demonstrate the inequality, as follows,

H(X|Y) - H(X) = \sum_{Y} \sum_{X} p(x, y) \log \frac{p(x)}{p(x|y)} \leq \sum_{Y} \sum_{X} p(x, y) \left[ \frac{p(x)}{p(x|y)} - 1 \right] \log e. (4.29)

But the right-hand side of the inequality is zero, as shown in the following,

\sum_{Y} \sum_{X} (p(x) \cdot p(y) - p(x, y)) \log e = \sum_{Y} (p(y) - p(y)) \log e = 0. (4.30)

Therefore,
H (X ) ≥ H (X |Y ). (4.31)

In a similar manner, it can be shown that

H (Y ) ≥ H (Y |X ). (4.32)

The equality is attained if and only if X and Y are statistically independent.

4.8 MUTUAL INFORMATION


A measure of mutual information provided by two symbols (xi , yi ) can be
written as

I(x_i; y_j) = \log_2 p(x_i|y_j) - \log_2 p(x_i) = \log_2 \frac{p(x_i|y_j)}{p(x_i)} = \log_2 \frac{p(x_i, y_j)}{p(x_i) p(y_j)}. (4.33)

It can be noticed that the a priori information of symbol x_i is contained in the marginal probability p(x_i). The a posteriori probability that symbol x_i has been transmitted, given that y_j was received, is p(x_i|y_j). Therefore, in an informal way, the information gain for the observed symbol equals the difference between the initial information, or uncertainty, and the final one.
The mutual information is continuous in p(x_i|y_j), and also symmetric, or
I (xi ; yj ) = I (yj ; xi ), (4.34)

which indicates that the information provided by x_i about y_j is the same as that provided by y_j about x_i.
The function I (xi ; xi ) can be called the auto-information of xi , or

I(x_i) = I(x_i; x_i) = \log \frac{1}{p(x_i)}, (4.35)

because, for an observer of the source alphabet, the a priori knowledge of the situation is that x_i will be transmitted with probability p(x_i), and the a posteriori knowledge is the certainty that x_i was transmitted.
In conclusion,

I (xi ; yj ) ≤ I (xi ; xi ) = I (xi ), (4.36)


I (xi ; yj ) ≤ I (yj ; yj ) = I (yj ). (4.37)

The statistical mean of the mutual information per pairs of symbols


provides an interesting interpretation of the mutual information concept,

I(X; Y) = E[I(x_i; y_j)] = \sum_{i} \sum_{j} p(x_i, y_j) I(x_i; y_j), (4.38)

which can be written as

I(X; Y) = \sum_{i} \sum_{j} p(x_i, y_j) \log \frac{p(x_i|y_j)}{p(x_i)}. (4.39)

The average mutual information can be interpreted as a reduction on


the uncertainty about the input X , when the output Y is observed (MacKay
2003). This definition provides an adequate metric for the average mutual
information of all pairs of symbols, and can be put in terms of the entropy,
such as

I (X ; Y ) = H (X ) + H (Y ) − H (X , Y ), (4.40)
I (X ; Y ) = H (X ) − H (X |Y ), (4.41)
I (X ; Y ) = H (Y ) − H (Y |X ). (4.42)

Put that way, the average mutual information gives a measure of the
information that is transmitted by the channel. Because of this, it is called
transinformation, or information transferred by the channel. It is always
non-negative, even if the individual information quantities are negative for
certain pairs of symbols.
For a noiseless channel, the average mutual information equals the joint
entropy.

I (X ; Y ) = H (X ) = H (Y ), (4.43)
I (X ; Y ) = H (X , Y ). (4.44)

On the other hand, for a channel in which the output is independent of the
input, the average mutual information is null, implying that no information
is transmitted by the channel.

I (X ; Y ) = H (X ) − H (X |Y ),
= H (X ) − H (X ) = 0. (4.45)
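
The entropy relations 4.40 to 4.42 can be verified numerically with the same kind of hypothetical joint matrix (a sketch added here, not taken from the text):

```python
import numpy as np

P = np.array([[0.30, 0.10],     # assumed joint probability matrix [P(X, Y)]
              [0.05, 0.55]])

def H(p):
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = H(P)
H_X, H_Y = H(P.sum(axis=1)), H(P.sum(axis=0))
H_X_given_Y = H_XY - H_Y        # Equation 4.26 rearranged
H_Y_given_X = H_XY - H_X        # Equation 4.27 rearranged

I1 = H_X + H_Y - H_XY           # Equation 4.40
I2 = H_X - H_X_given_Y          # Equation 4.41
I3 = H_Y - H_Y_given_X          # Equation 4.42
print(I1, I2, I3)               # the three values coincide and are non-negative
```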

It is possible to use the set theory, presented in the Appendix, to obtain


a pictorial interpretation of the fundamental inequalities discovered by
Shannon. Consider the sets of events A and B, associated with the sets of symbols X and Y, and a Lebesgue measure m. It is possible to associate, unambiguously, the measures of A and B with the entropies of X and Y (Reza 1961),

m(A) ←→ H (X ), (4.46)
m(B) ←→ H (Y ). (4.47)

In the same way, one can associate the joint and conditional entropies with the union and the difference of sets, respectively,

m(A \cup B) \longleftrightarrow H(X, Y), (4.48)
m(A\bar{B}) \longleftrightarrow H(X|Y), (4.49)
m(B\bar{A}) \longleftrightarrow H(Y|X). (4.50)

The average mutual information is, therefore, associated to the measure


of the intersection of the sets,

m(A ∩ B) ←→ I (X ; Y ). (4.51)

Those relations can be seen in Figure 4.3, which also serves as a means to memorize the entropy and mutual information properties.
Therefore, the fundamental inequalities can be written as a result of set
operations (Reza 1961),

m(A \cup B) \leq m(A) + m(B) \longleftrightarrow H(X, Y) \leq H(X) + H(Y), (4.52)
m(A\bar{B}) \leq m(A) \longleftrightarrow H(X|Y) \leq H(X), (4.53)
m(B\bar{A}) \leq m(B) \longleftrightarrow H(Y|X) \leq H(Y), (4.54)

and, finally,

m(A \cup B) = m(A\bar{B}) + m(B\bar{A}) + m(A \cap B), (4.55)

which is equivalent to

H (X , Y ) = H (X |Y ) + H (Y |X ) + I (X ; Y ). (4.56)

Figure 4.3. Venn diagram corresponding to the relation between the entropies.

For a noiseless channel, the two sets coincide, and the relations can be
written as:

m(A) = m(B) ←→ H (X ) = H (Y ), (4.57)


m(A ∪ B) = m(A) = m(B) ←→ H (X , Y ) = H (X ) = H (Y ), (4.58)
m(A\bar{B}) = 0 \longleftrightarrow H(X|Y) = 0, (4.59)
m(B\bar{A}) = 0 \longleftrightarrow H(Y|X) = 0, (4.60)

and, finally,

m(A ∩ B) = m(A) = m(B) = m(A ∪ B)


←→ I (X ; Y ) = H (X ) = H (Y ) = H (X , Y ). (4.61)

For a channel with output symbols independent from the input symbols,
the sets A and B are considered mutually exclusive, therefore:

m(A ∪ B) = m(A) + m(B) ←→ H (X , Y ) = H (X ) + H (Y ) (4.62)


m(A\bar{B}) = m(A) \longleftrightarrow H(X|Y) = H(X) (4.63)
m(A ∩ B) = 0 ←→ I (X ; Y ) = 0. (4.64)

The same procedure can be applied to a multiple port channel. For a


three port channel, one can obtain the following relations for the entropies:

H (X , Y , Z) ≤ H (X ) + H (Y ) + H (Z), (4.65)
H (Z|X , Y ) ≤ H (Z|Y ). (4.66)

In the same reasoning, it is possible to obtain the following relations for


the average mutual information for a three-port channel:

I (X ; Y , Z) = I (X ; Y ) + I (X ; Z|Y ), (4.67)
I (Y , Z; X ) = I (Y ; X ) + I (Z; X |Y ). (4.68)

4.9 CHANNEL CAPACITY


Shannon defined the discrete channel capacity as the maximum of the
average mutual information, computed for all possible probability sets that

can be associated to the input symbol alphabet, that is, for all memoryless
sources,
C = max I (X ; Y ) = max [H (X ) − H (X |Y )]. (4.69)

4.9.1 CAPACITY OF THE MEMORYLESS DISCRETE CHANNEL

Consider X as the alphabet of a source with N symbols. Because the transition probability matrix is diagonal, one obtains

C = \max I(X; Y) = \max [H(X)] = \max \left[ -\sum_{i=1}^{N} p(x_i) \log p(x_i) \right]. (4.70)

Example: the entropy attains a maximum when all symbols are equiprobable. Therefore, for the memoryless discrete channel, the capacity is

C = \max \left[ -\sum_{i=1}^{N} \frac{1}{N} \log \frac{1}{N} \right],

which gives

C = \log N bits per symbol. (4.71)

The channel capacity can also be expressed in bits per second, or shannon
(Sh), and corresponds to the information transmission rate of the channel,
for symbols with duration T seconds,

C_T = \frac{C}{T} bits per second, or Sh. (4.72)

Therefore, for the noiseless channel,

C_T = \frac{C}{T} = \frac{1}{T} \log N bits per second, or Sh. (4.73)

4.9.2 RELATIVE REDUNDANCY AND EFFICIENCY

The absolute redundancy is the difference between the actual information transmission rate I(X; Y) and the maximum possible value,

Absolute redundancy for a noisy channel = C - I(X; Y) = \log N - H(X). (4.74)

The ratio between the absolute redundancy and the channel capacity is
defined as the system relative redundancy,

Relative redundancy for a noiseless channel, D = \frac{\log N - H(X)}{\log N} = 1 - \frac{H(X)}{\log N}. (4.75)

The system efficiency is defined as the complement of the relative


redundancy,

Efficiency of the noiseless channel, E = \frac{I(X; Y)}{\log N} = \frac{H(X)}{\log N} = 1 - D. (4.76)

When the transmitted symbols do not occupy the same time interval, it
is still possible to define the average information transmission rate for the
noiseless channel, as
R_T = \frac{-\sum_{i=1}^{N} p(x_i) \log p(x_i)}{\sum_{i=1}^{N} p(x_i) T_i}, (4.77)

in which the T_i represent the symbol intervals.
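
A small sketch (with made-up symbol probabilities and durations, not values from the text) evaluates Equation 4.77:

```python
from math import log2

p = [0.5, 0.25, 0.25]      # assumed symbol probabilities
T = [1e-3, 2e-3, 2e-3]     # assumed symbol durations, in seconds

H = -sum(pi * log2(pi) for pi in p)           # bits per symbol
T_avg = sum(pi * Ti for pi, Ti in zip(p, T))  # average symbol duration
R_T = H / T_avg                               # Equation 4.77, bits per second
print(R_T)                                    # 1000 bits per second for these values
```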


For a discrete noisy channel, the capacity is the maximum of the average mutual information, when the noise characteristic p(y_j|x_i) is specified,

C = \max \left( \sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i) p(y_j|x_i) \log \frac{p(y_j|x_i)}{p(y_j)} \right), (4.78)

in which the maximum is taken over p(x_i). It must be noticed that the maximization in relation to the input probabilities does not always lead to an admissible set of source probabilities.
Bayes' rule defines the relation between the marginal probabilities p(y_j) and the a priori probabilities p(x_i),

p(y_j) = \sum_{i=1}^{N} p_1(x_i) p(y_j|x_i), (4.79)
in which the variables are restricted to the following conditions:

p(x_i) \geq 0, \quad i = 1, 2, \dots, N, (4.80)

\sum_{i=1}^{N} p_1(x_i) = 1.

Example: determine the capacity of the memoryless Binary Symmetric


Channel (BSC) (Blake 1987).
The channel is illustrated in Figure 4.4, in which the error probability,
or the probability of transmitting a symbol and receiving another one, is
indicated as p.
The channel capacity is given by Formula 4.69,

C = max I (X ; Y ),

in which the average mutual information can be written as


1
1
pij
I (X ; Y ) = pij log . (4.81)
pi qj
i=0 j=0

Assume that the symbol a priori probabilities are p0 = r and p1 = v,


with r + v = 1. Probabilities r and v are chosen in such a manner that, for
each channel use, the maximum quantity of information is transferred.
The joint probabilities are

p00 = r(1 − p), p01 = rp, p10 = (1 − r)p, p11 = (1 − r)(1 − p).

Figure 4.4. Memoryless binary symmetric channel: each input bit is received correctly with probability 1 − p and inverted with probability p.


and the average mutual information can be written as

I(X; Y) = [(1 - p)r] \log \frac{(1 - p)r}{r[(1 - p)r + (1 - r)p]}
 + [rp] \log \frac{rp}{r[rp + (1 - r)(1 - p)]}
 + [(1 - r)p] \log \frac{(1 - r)p}{(1 - r)[r(1 - p) + (1 - r)p]}
 + [(1 - p)(1 - r)] \log \frac{(1 - p)(1 - r)}{(1 - r)[rp + (1 - r)(1 - p)]},

which can be put in the following way, after simplification with logarithm properties,

I(X; Y) = p \log p + (1 - p) \log (1 - p)
 - [r(1 - p) + (1 - r)p] \log [r(1 - p) + (1 - r)p]
 - [rp + (1 - r)(1 - p)] \log [rp + (1 - r)(1 - p)].

The objective is to determine the value of r that maximizes the expression, taking into account that the logarithms are base two. The maximum that the average mutual information attains is given by

C = \max I(X; Y) = 1 + p \log p + (1 - p) \log (1 - p), (4.82)

that represents the memoryless binary symmetric channel capacity. The


graph for the capacity C(p), as a function of the channel error probability,
is shown in Figure 4.5.
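
The BSC capacity of Equation 4.82 is easily evaluated; the sketch below (illustrative, not from the text) computes C(p) = 1 + p log2 p + (1 − p) log2(1 − p) for a few error probabilities:

```python
from math import log2

def bsc_capacity(p):
    """Capacity of the memoryless binary symmetric channel, in bits per channel use."""
    if p in (0.0, 1.0):
        return 1.0           # no uncertainty about the output given the input
    return 1.0 + p * log2(p) + (1.0 - p) * log2(1.0 - p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(p, bsc_capacity(p))  # C(0.5) = 0: the channel conveys no information
```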

Example: determine the capacity of the Binary Erasure Channel (BEC) shown in Figure 4.6, in which E represents the erasure symbol and 1 − p its probability of occurrence (Blake 1987).
The average mutual information for this channel is given by

I(X; Y) = \sum_{i=0}^{1} \sum_{j=0}^{2} p_{ij} \log \frac{p_{ij}}{p_i q_j}, (4.83)

and the probabilities p_{ij} are the following, for p_0 = r and p_1 = v, with r + v = 1,

p_{00} = rp, \quad p_{01} = r(1 - p), \quad p_{10} = 0, \quad p_{11} = (1 - r)(1 - p), \quad p_{02} = 0, \quad p_{12} = (1 - r)p.
Figure 4.5. Graph for the capacity of the memoryless binary symmetric channel, as a function of the error probability p.

Figure 4.6. Binary erasure channel: an input bit is received correctly with probability p and erased (output E) with probability 1 − p.

Substituting the probability values in Equation 4.83, one obtains

I(X; Y) = [rp] \log \frac{rp}{r^2 p} + [(1 - p)r] \log \frac{(1 - p)r}{(1 - p)r}
 + [(1 - r)(1 - p)] \log \frac{(1 - r)(1 - p)}{(1 - r)(1 - p)} + [(1 - r)p] \log \frac{(1 - r)p}{(1 - r)^2 p}.
Figure 4.7. Graph of the capacity for the binary erasure channel, as a function of the probability p.

Simplifying the terms in the expression gives

I(X; Y) = -p[r \log r + (1 - r) \log (1 - r)] = p H(r), (4.84)

in which H(r) is the binary entropy function.


Because p is determined, I (X ; Y ) is maximized by the choice of a
value for r that produces the highest H (r), that is, r = 1/2, for which
H (r) = 1. Therefore, the capacity of the binary erasure channel is sim-
ply p. Figure 4.7 shows the graph for the capacity as a function of the
probability p.
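
With the convention of Figure 4.6 (correct reception with probability p, erasure with probability 1 − p), the capacity is simply p, attained at r = 1/2. The sketch below (illustrative, not from the text) evaluates the mutual information of Equation 4.84 over a grid of input probabilities:

```python
from math import log2

def H2(r):
    """Binary entropy function, in bits."""
    return 0.0 if r in (0.0, 1.0) else -r * log2(r) - (1 - r) * log2(1 - r)

def bec_mutual_information(p, r):
    """I(X; Y) = p H(r) for the erasure channel of Figure 4.6."""
    return p * H2(r)

p = 0.8
best = max((bec_mutual_information(p, r / 1000), r / 1000) for r in range(1, 1000))
print(best)   # about (0.8, 0.5): the capacity equals p, reached at r = 1/2
```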
CHAPTER 5

MULTIPLE ACCESS SYSTEMS

5.1 INTRODUCTION
The multiple access channel can be analyzed through the use of an infor-
mation theoretic approach. In this type of communications system, several
users transmit simultaneously to a common receiver. The main multiplex-
ing strategies that fall in that category include frequency division multiple
access (FDMA), time division multiple access (TDMA), and code division
multiple access (CDMA).
The main objective of the information theoretic approach of the multi-
ple access channel is to find its capacity region, that is, the set of information
rates at which the simultaneous reliable communication of the messages
of each user is possible (Verdú 1989b).
The problem of finding the capacity region of a multiple access chan-
nel can be traced to a pioneering work on two-way communication chan-
nels by Shannon, in 1961 (Shannon 1961). A similar approach led to
the multi-way solution for the discrete memoryless channel (Ahlswede
1971). The related problem of simultaneous communication from one
source to several receivers was proposed in 1971, and solutions were
given to the cases of orthogonal, Gaussian, and compound channels (Cover
1972).
The usual assumption for the Shannon channel was a strategic coopera-
tion between the senders and the receiver. This approach changed in 1981,
when the lack of synchronization was proven not to reduce the capacity
region for the multiple access channel (Cover, McEliece, and Posner 1981).
Later it was shown, for the two-user case, that the only effect of frame asyn-
chronism on the discrete memoryless channel is the removal of the convex
hull operation from the expression of the capacity region.

The same result could be extended to the multiuser case and implies that
the maximum achievable rate sum, or total capacity, of the memoryless mul-
tiple access channel is never decreased by the lack of frame synchronism.
This is a direct implication of the convexity property, because the rate sum
of any convex combination of rate pairs is equal to the convex combination
of the respective rate sums (Hui and Humblet 1985).
Further study on the asynchronous Gaussian multiple access channel
established the nonoptimality of the conventional receiver (independent
detection of the signals), which is a result of the additive component of
multiple access interference (MAI). Fortunately for CDMA systems, in
which the designer is allowed to choose a signal constellation with large
bandwidth, the cross-correlations between the signals can be kept low for
all relative delays and an acceptable performance can be achieved (Verdú
1986).
A paper published in 1975 showed, by means of an example, that the
use of feedback could increase the capacity region of the multiple access
memoryless channel. But the problem of determining the actual capac-
ity region, when feedback is available, remained unsolved (Gaarder and
Wolf 1975). A point to be stressed here is that, despite the efforts of many
researchers in the field, the capacity region of the multiterminal commu-
nication channel remains, in general, unknown. A general formula for the
capacity of such channels is expressed by (Mandell and McEliece 1991).
The role of the memory in the multiple access environment was stressed
in the article (Verdú 1989a). In fact, the study of asynchronous multiple
access channels can be enriched with the theory of multiple access channels
with memory, in which the effect on the current symbol depends on the
previous state of the channel. The usual approach to the design of codes
for channels with memory has been the assumption of interleaving, which
provides an interesting trade-off between decoding complexity and storage
capacity.
In one of the first works to suggest this technique, interleaving appeared
as a type of time division multiplexing (Elliott 1965). The issue there was
that interleaving could enhance the error-control effectiveness of error-
correcting codes, when the separation between bits of a code word is on
the order of several hundred bits. The purpose of interleaving is to provide
randomization, in order to mitigate potential burst errors.
There are many different types of channels that can be characterized by
the nature of errors that are generated between transmitter and receiver.
On a bursty channel, given a burst of symbols, an interleaving scheme
can be used to spread the errors to many codewords. A suitable error-
correcting code could then be utilized to correct the remaining errors in

each codeword. An optimum interleaver is then one which utilizes the least
amount of memory and has the smallest average delay between the input
and output of a symbol from the interleaver, besides providing the neces-
sary randomization of the data.
The symbol-asynchronous Gaussian multiple access channel encom-
passes many interesting models. The direct sequence spread spectrum mul-
tiple access channel represents a possible application for some of those
models. The capacity region of the symbol-asynchronous Gaussian mul-
tiple access channel was obtained recently (Verdú 1989b). However, fur-
ther study is necessary to obtain the capacity region for the asynchronous
Gaussian multiple access channel with memory. This could be helpful in
modeling the wireless digital communications channel as introduced in the
last chapter.
Moreover, for channels with memory, frame asynchronism may drasti-
cally reduce the capacity region and, in particular, the maximum achiev-
able rate sum. A useful model for a channel with memory will be
introduced in the last section of this chapter and analyzed in some detail.

5.2 THE GAUSSIAN MULTIPLE ACCESS CHANNEL
The multiple access channel can be defined as a system in which M
transmitters, Xi ∈ Xi , simultaneously communicate to a common recei-
ver, Y ∈ Y. It is characterized by the transition probability matrix
P(y|x1 , · · · , xM ), which represents the probabilities associated with out-
put y when the M inputs are x1 , · · · , xM . The channel is assumed memo-
ryless, and the alphabets are considered continuous sets. When the noise
affecting the transmitted signals is considered Gaussian, the system is called the Gaussian multiple access channel, which was studied in detail by Wyner (1974).
The choice of Gaussian signals to convey information in the additive
Gaussian channel is not a matter of chance. The use of a set of Gaussian
waveforms maximizes the mutual information for this kind of channel
(Sommer 1966). The multiple access channel is shown in Figure 5.1.
In the following, a general idea of the computation of the capacity region is presented. Consider that the output is Y = \sum_i X_i + Z, in which the X_i are the inputs and Z is a Gaussian noise random variable independent of the inputs, with variance \sigma_Z^2. The encoding constraint is that the messages, which are n-vectors, satisfy E[(1/n)\,||X_i||^2] \leq \sigma_i^2. The Euclidean norm || \cdot || is used for the vectors in the preceding formula. Besides, one imposes the additional condition that E[X_i^2] \leq \sigma_i^2, i = 1, \dots, M (Wyner 1974). In the following, all logarithms are to the base two, which implies that information rates, and capacities, are measured in bits.

Figure 5.1. The multiple access channel: M inputs X_1, X_2, \dots, X_M share a common channel that produces a single output Y.
The mutual information can be defined for the set of inputs and out-
put as the difference between the output entropy H (Y ) and the noise
entropy H (Z)

I (X1 , · · · , XM ; Y )
= H (Y ) − H (Y |X1 , · · · , XM ) (5.1)
= H (Y ) − H (Z). (5.2)

Considering the independence between the inputs and the Gaussian noise, the variance of the output can be written as

V[Y] = \sum_{i=1}^{M} V[X_i] + V[Z] (5.3)
 \leq \sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2. (5.4)

Therefore, the mutual information is bounded by

I(X_1, \cdots, X_M; Y) \leq \frac{1}{2} \log 2\pi e \left( \sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2 \right) - \frac{1}{2} \log 2\pi e \sigma_Z^2
 = \frac{1}{2} \log \frac{\sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2}{\sigma_Z^2} (5.5)
 = \frac{1}{2} \log (1 + S^2) = C, (5.6)
in which the total signal-to-noise ratio (SNR) is defined as

S^2 = \frac{\sum_{i=1}^{M} \sigma_i^2}{\sigma_Z^2}. (5.7)

Furthermore,

I(X_i; Y|X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_M) = H(Y|X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_M) - H(Y|X_1, \cdots, X_M), (5.8)
for i = 1, \cdots, M, (5.9)

which is bounded as

I(X_i; Y|X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_M) \leq \frac{1}{2} \log \frac{\sigma_i^2 + \sigma_Z^2}{\sigma_Z^2} = \frac{1}{2} \log (1 + s_i^2) = C_i, \quad i = 1, \cdots, M, (5.10)

in which s_i^2 is

s_i^2 = \frac{\sigma_i^2}{\sigma_Z^2}. (5.11)
A multiuser discrete multiple access channel satisfies

Ri ≤ I (Xi ; Y |X1 , · · · , Xi−1 , Xi+1 , · · · , XM ),


for i = 1, · · · , M
R1 + · · · + RM ≤ I (X1 , · · · , XM ; Y ) (5.12)

in which the set of Ri represents the transmission rates for M independent


input signals Xi . The output signal is Y . The boundary of this region is
called the convex hull.

Example: the set of all achievable rates must lie inside the region defined by
the inequalities R1 + · · · + RM ≤ C and Ri ≤ Ci , as depicted in Figure 5.2,
for the case of two users.
For the case of three users, the set of equations above define an upper
bound on the capacity, because other constraints can be imposed on the
partial sums of transmission rates.
The channel capacity, or the maximum possible transmission rate, can only be attained on the convex hull, or contour surface. Thus, time-sharing could be necessary to achieve some of the points in the capacity region, for instance if C_i = C (Bierbaum and Wallmeier 1979). This model assumes cooperative users, which can be translated into a time-sharing environment. The transmission of information without any prearrangement between the users results in a degradation of the capacity region (Sommer 1966), which can be minimized with the use of multiplexing techniques such as CDMA.

Figure 5.2. Capacity region for the Gaussian multiple access channel, M = 2: the achievable rates satisfy R_1 \leq C_1, R_2 \leq C_2, and R_1 + R_2 \leq C.
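
A sketch of the two-user capacity region of Figure 5.2, under assumed signal and noise variances (the numbers are illustrative, not taken from the text):

```python
from math import log2

sigma1_sq, sigma2_sq, sigmaZ_sq = 2.0, 1.0, 1.0   # assumed signal and noise variances

C1 = 0.5 * log2(1 + sigma1_sq / sigmaZ_sq)        # individual bound, Equation 5.10
C2 = 0.5 * log2(1 + sigma2_sq / sigmaZ_sq)
C  = 0.5 * log2(1 + (sigma1_sq + sigma2_sq) / sigmaZ_sq)  # sum-rate bound, Equation 5.6

def achievable(R1, R2):
    """A rate pair lies inside the region of Figure 5.2 if all three bounds hold."""
    return R1 <= C1 and R2 <= C2 and R1 + R2 <= C

print(C1, C2, C)
print(achievable(0.6, 0.3), achievable(1.5, 0.9))  # True, False for these variances
```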

5.3 THE GAUSSIAN CHANNEL WITH RAYLEIGH FADING

Consider that the output is Y = \sum_i r_i X_i + Z, in which the X_i are the transmitted signals, Z is a Gaussian noise random variable independent of the inputs, with variance \sigma_Z^2, and the r_i are the attenuation coefficients associated with the fading channel. The channel is assumed memoryless, and the alphabets are considered continuous sets.
The choice of Gaussian signals to convey information in the additive
Gaussian channel does not optimize the transmission rate any more. The
accuracy of a simple approximation to the capacity of a fading, additive
Gaussian noise channel is discussed for the single user case. The special
case of Rayleigh fading is considered in greater detail and the signal dis-
tribution that yields capacity for this case is found.
The random variables ri are the attenuation parameters defined in
Chapter 2, which are assumed to follow a Rayleigh distribution given by

p_R(r) = \frac{r}{\gamma^2} e^{-\frac{r^2}{2\gamma^2}} u(r). (5.13)

The output is not guaranteed to be Gaussian, for a Gaussian input, in


the fading channel. This is the reason for not assuming an input Gaussian
distribution in the first hand. The product of the signal versions and the
attenuation factors yield a distribution that is not Gaussian, in the general
case, for the received signal.
This poses a problem for computing the capacity region: The capacity
is obtained by maximizing the mutual information over all the input distri-
butions and this is achieved, for the Gaussian channel, only by producing a
Gaussian output (Shannon 1948). Thus, in order to attain the capacity one
has to “Gaussianize” the output by choosing a suitable distribution for the
set of input signals.
A fading additive Gaussian noise channel is assumed to have an output of the form Y = RX + N in which: N is a Gaussian random variable with mean zero and variance \sigma_Z^2, that is, N \sim N(0, \sigma_Z^2); X is a signal random variable with density function p_X(x) and variance \sigma_X^2; R is a positive valued random variable with density function p_R(r) and E[R^2] = 2\gamma^2; Y is the received random variable. It is assumed that X, and therefore Y, has mean zero. The capacity of this channel has been determined (Ericson 1970) as:

C(s^2) = E_R \left[ \frac{1}{2} \log (1 + R^2 s^2) \right], \quad s^2 = \sigma_X^2/\sigma_Z^2
 = \int_0^{\infty} p_R(r) \frac{1}{2} \log (1 + r^2 s^2) \, dr, (5.14)

in which the units will be bits per channel use and the parameter s2 will
be referred to as SNR. The equation results from the consideration of the
parametric equations defining the reliability function for random coding
for this channel and is valid for any positive valued random variable.
It is of interest here to know how this expression is related to the equation

\overline{C}(s^2) = \frac{1}{2} \log (1 + \gamma^2 s^2), (5.15)

which might be easier to use in many situations.
From the concavity of the logarithm function and Jensen's inequality (Blahut 1987), it is clear that for all SNRs s^2, C(s^2) \leq \overline{C}(s^2). The magnitude of the difference \overline{C}(s^2) - C(s^2) will be shown, graphically, to be small for the case of a Rayleigh fading distribution. Implicit in the form of the capacity expression is the fact that Y should be a Gaussian random variable to achieve capacity. Consequently, U = RX should be a Gaussian random variable.
The question of determining the distribution of X , for a given distribu-
tion of R to yield U a Gaussian distribution is considered. It is noted that

since U and R are correlated with unknown joint distribution, this prob-
lem is quite different from the usual transformation problem for random
variables. It is solved for the case of R Rayleigh distributed. The general
problem seems difficult.
From the form of the equation for the capacity of the channel, it is
argued that to achieve capacity, Y must be a zero mean Gaussian random
variable. For a given fading random variable R, the signal random variable X
should be chosen so that U = RX is Gaussian. Since U and R are correlated
it does not seem a simple matter to compute the distribution of X = U /R,
using the standard techniques.
Likewise, it does not seem a simple matter to compute I (X , Y ) and
the maximizing distribution to achieve capacity, making the technique of
(Ericson 1970) the more interesting.

Example: for the case of R a Rayleigh distributed random variable with the density of Equation 5.13, the distribution of X can be obtained in the following manner.
Since R and X are, by assumption, independent and Y − N = U = RX ∼
N (0, σU2 ) , as discussed, then X has a zero mean and a symmetric distri-
bution. Odd moments of U and X are zero and E[X 2k ] = E[U 2k ]/E[R2k ].
The required moments are readily computed as

E[R^{2k}] = k!\,\gamma^{2k}, \quad E[U^{2k}] = \frac{(2k)!}{2^k k!} \sigma_U^{2k}, (5.16)

and so

E[X^{2k}] = \frac{E[U^{2k}]}{E[R^{2k}]} = \binom{2k}{k} \left( \frac{\sigma_U}{\sqrt{2}\,\gamma} \right)^{2k}, \quad k = 1, 2, 3, \cdots. (5.17)

Let a = \sigma_U/(\sqrt{2}\,\gamma). Let p_X(x) be the density of X and \phi_X(v) the characteristic function. The series expansion of \phi_X(v) is

\phi_X(v) = \sum_{k=0}^{\infty} (-1)^k \frac{E[X^{2k}] \, v^{2k}}{(2k)!}
 = \sum_{k=0}^{\infty} (-1)^k \binom{2k}{k} \frac{(av)^{2k}}{(2k)!} (5.18)
 = \sum_{k=0}^{\infty} (-1)^k \frac{(av)^{2k}}{(k!)^2}.

This series is identified as (Abramowitz and Stegun 1965, p. 360) J_0(2av), in which J_0(\cdot) is a Bessel function of the first kind and zeroth order. The corresponding density function is then given as (Oberhettinger 1990)

p_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} J_0(2av) e^{-jvx} \, dv = \frac{1}{\pi ((2a)^2 - x^2)^{1/2}}, \quad |x| < 2a. (5.19)
Thus for the random variable X with this density, R with the Rayleigh
density, U = RX has the required Gaussian density N (0, σU2 ). While the
above technique is of interest, since success depends on the identification
of an infinite series, it does not lend itself to generalization.
The effect of fading has been removed from the final expression for the
capacity at a certain cost. In order to counterbalance the stochastic atten-
uation, it is necessary to design a circuit that provides the transformation
of random variables necessary at the transmission end. It is not certain, at
this point, that this can be achieved in general for any type of fading.
In physical terms, the introduction of fading in the channel decreases
the noise margin of the system. Thus, the detection threshold remains con-
strained to a smaller region near the origin, for the case of binary commu-
nications. If the input distribution cannot be altered, it is possible to define
another transmission rate, which will fall short of being the optimum in this
case. The trade-off assumed for attaining the full capacity of the system
resides in the design of a circuit to transform the input signal and shape its
probability density function (pdf) accordingly.
The capacity evaluated by Ericson (1970) can be interpreted as an
average channel capacity, as defined by Lee (1990), or as analyzed by
Dobrushin (1961). The concept of average or mean capacity can be proven
useful, anyway, and has been successfully applied to the case of dynamic
allocation of users (Alencar and Blake 1993a).
In order to complete the investigation, in light of the results of Eric-
son, one presents the derivation of the formula for the average capac-
ity for the single user case, beginning with the general expression below
(Ericson 1970)
C = E_R \left[ \frac{1}{2} \log (1 + R^2 s^2) \right], (5.20)

in which s^2 = \sigma_X^2/\sigma_Z^2 is the SNR and E[\,\cdot\,] represents the expected value functional.

Example: introducing the Rayleigh pdf, from Equation 5.13, into the previous formula gives

C = \frac{1}{2} \int_{-\infty}^{\infty} \log (1 + r^2 s^2) \frac{r}{\gamma^2} e^{-\frac{r^2}{2\gamma^2}} u(r) \, dr. (5.21)
The last equation can be simplified by the application of a known theorem of probability (Blake 1987), leading to

C = \frac{\log e}{4\gamma^2} \int_0^{\infty} \ln (1 + \xi s^2) e^{-\frac{\xi}{2\gamma^2}} \, d\xi, (5.22)

which can be solved by standard methods, yielding

C = -\frac{\log e}{2} \, \mathrm{Ei}\left( -\frac{1}{2\gamma^2 s^2} \right) e^{\frac{1}{2\gamma^2 s^2}}, (5.23)

in which the Exponential Integral function is defined by

\mathrm{Ei}(x) = \int_{-\infty}^{x} \frac{e^t}{t} \, dt, \quad x < 0. (5.24)

Concluding this section, Figure 5.3 illustrates a comparison between


Formulas 5.15 and 5.23, in the single user case, for three values of the
fading parameter γ . The dotted curves represent the average capacity, for
each value of the parameter.

Figure 5.3. Average and actual capacity, in nats per channel use, as a function of the SNR, for γ = 0.5, 1.0, and 2.0.
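
The comparison of Figure 5.3 can be reproduced numerically; the sketch below (an illustration, here using bits per channel use) evaluates the exact average capacity of Equation 5.14 by numerical integration and the approximation of Equation 5.15:

```python
import numpy as np

def exact_capacity(s2, gamma, n=200000):
    """Equation 5.14: E_R[0.5 log2(1 + R^2 s^2)] for Rayleigh R with parameter gamma."""
    r = np.linspace(1e-6, 12 * gamma, n)
    dr = r[1] - r[0]
    pdf = (r / gamma**2) * np.exp(-r**2 / (2 * gamma**2))
    return float(np.sum(pdf * 0.5 * np.log2(1 + r**2 * s2)) * dr)

def approx_capacity(s2, gamma):
    """Equation 5.15: 0.5 log2(1 + gamma^2 s^2)."""
    return 0.5 * np.log2(1 + gamma**2 * s2)

gamma = 1.0
for snr_db in (-10, 0, 10, 20, 30):
    s2 = 10 ** (snr_db / 10)
    print(snr_db, exact_capacity(s2, gamma), approx_capacity(s2, gamma))
```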

5.4 THE NONCOOPERATIVE MULTIPLE ACCESS CHANNEL
The noncooperative channel has asynchronous multiplexing, which im-
plies that the M information sources transmit through a common medium
without any prearrangement among themselves (Sommer 1966). There-
fore, all components of the received signal are statistically independent,
and so perform as interference with respect to any other given signal.
For the conditions mentioned, a slight generalization of a former work
can be attempted (Sommer 1966). Based on the results stated in Shannon
(1948), the mutual information between the set of inputs and the output,
for the noncooperative additive Gaussian noise channel, can be written as
I(X_1, \cdots, X_M; Y) = H(Y) - H(Y - G), (5.25)

in which

G = \sum_{i=1}^{M} X_i. (5.26)
This leads to the already evaluated expression

I(X_1, \cdots, X_M; Y) = H(Y) - H(Z), (5.27)

bounded as

I(X_1, \cdots, X_M; Y) \leq \frac{1}{2} \log 2\pi e \left( \sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2 \right) - \frac{1}{2} \log 2\pi e \sigma_Z^2
 = \frac{1}{2} \log \frac{\sum_{i=1}^{M} \sigma_i^2 + \sigma_Z^2}{\sigma_Z^2} (5.28)
 = \frac{1}{2} \log (1 + S^2) = C. (5.29)
By the same token, one can compute the individual mutual information,

I(X_i; Y) = H(Y) - H(Y - X_i) (5.30)
 \leq \frac{1}{2} \log 2\pi e \left( \sigma_Z^2 S^2 + \sigma_Z^2 \right) - \frac{1}{2} \log 2\pi e \left( \sigma_Z^2 S_i^2 + \sigma_Z^2 \right), (5.31)

in which the interference to noise ratio (INR) is defined as

S_i^2 = \frac{\sum_{j=1, j \neq i}^{M} \sigma_j^2}{\sigma_Z^2}. (5.32)
Rearranging the terms in 5.31 gives

I(X_i; Y) \leq \frac{1}{2} \log \frac{S^2 + 1}{S_i^2 + 1} = C_i. (5.33)

The capacity region forms a square in two dimensions, a parallelepiped


for three sources, and a hyper-parallelepiped for more than three sources.
It can be seen that the hyperplane defined by expression 5.29 never touches
the contour surface, which means that the capacity of the noncooperative
channel is given by the sum of the individual rates,


C = \sum_{i=1}^{M} C_i (5.34)

or

C = \sum_{i=1}^{M} \frac{1}{2} \log \frac{S^2 + 1}{S_i^2 + 1}, (5.35)

which can be put in the form

C = \frac{1}{2} \log \prod_{i=1}^{M} \frac{S^2 + 1}{S_i^2 + 1}. (5.36)

Example: the last expression is maximized when all the transmitters carry the same power, that is, \sigma_i^2 = \sigma^2. This assumption simplifies the expression of the INR to

S_i^2 = \frac{M - 1}{M} S^2 (5.37)

and also simplifies the formula for the channel capacity,

C = \frac{1}{2} \log \left[ \frac{S^2 + 1}{\frac{M-1}{M} S^2 + 1} \right]^M (5.38)

or

C = \frac{1}{2} \log \left[ \frac{M (S^2 + 1)}{M (S^2 + 1) - S^2} \right]^M. (5.39)
Example: an interesting result can be obtained by taking the limit of the
capacity when the number of sources goes to infinity. First, the formula is
rearranged to
C = \frac{1}{2} \log \left[ 1 - \frac{S^2}{M (S^2 + 1)} \right]^{-M} (5.40)

or, by noticing that S^2 = M s^2,

C = \frac{1}{2} \log \left[ 1 - \frac{s^2}{M (s^2 + 1/M)} \right]^{-M}. (5.41)

Now, using the known property of limits

\lim_{M \to \infty} \left( 1 + \frac{1}{M} \right)^M = e, (5.42)

one obtains finally

C_{\infty} = \frac{\log e}{2}. (5.43)

Thus, for a large number of transmitters the capacity assumes the value \log e/2, which is bounded and low if compared to the capacity that can be achieved with the cooperative approach between the users. This must also be compared to the single user capacity, obtained by allowing M = 1 in expression 5.41,

C = \frac{1}{2} \log (1 + s^2), (5.44)
which is unbounded with respect to the SNR. The capacity region for
the noncooperative channel can be seen in Figure 5.4, for two users, as a
rectangle inside the diamond shaped capacity region for the cooperative
channel, evaluated by Wyner (1974). The points C_1, C_1', C_2, and C_2' are defined below,

C_1 = \frac{1}{2} \log \left( 1 + \frac{\sigma_1^2}{\sigma_2^2 + \sigma_Z^2} \right), \quad C_1' = \frac{1}{2} \log \left( 1 + \frac{\sigma_1^2}{\sigma_Z^2} \right), (5.45)

and

C_2 = \frac{1}{2} \log \left( 1 + \frac{\sigma_2^2}{\sigma_1^2 + \sigma_Z^2} \right), \quad C_2' = \frac{1}{2} \log \left( 1 + \frac{\sigma_2^2}{\sigma_Z^2} \right). (5.46)

Figure 5.4. Capacity region for the noncooperative channel, M = 2: a rectangle with sides C_1 and C_2, inside the diamond-shaped cooperative region bounded by R_1 + R_2 = C.

As the number of users increases towards infinity the capacity region


shrinks, and the channel capacity converges to a given value. There is no
way to increase the capacity of the noncooperative channel by increasing
the power of the transmitted signals. The limiting capacity C∞ becomes
independent of the SNR, as the number of senders goes to infinity. A possi-
bility left to boost the capacity is by an increase in the available bandwidth.
Spread spectrum techniques could prove useful in this case.
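
A sketch (with illustrative values, not from the text) of Equations 5.39 and 5.43: the sum capacity of the noncooperative channel with M equal-power users, and its limit log2(e)/2 as M grows:

```python
from math import log2, e

def noncooperative_sum_capacity(M, s2):
    """Equation 5.39 with S^2 = M s^2: sum capacity for M equal-power users, in bits."""
    S2 = M * s2
    return 0.5 * M * log2(M * (S2 + 1) / (M * (S2 + 1) - S2))

s2 = 10.0                      # assumed per-user signal-to-noise ratio
for M in (1, 2, 10, 100, 1000):
    print(M, noncooperative_sum_capacity(M, s2))
print("limit:", log2(e) / 2)   # Equation 5.43, about 0.7213 bits per channel use
```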

5.5 MULTIPLE ACCESS IN A DYNAMIC ENVIRONMENT
The selection of the more appropriate multiple access technique, for a
given communications problem, depends on the channel characteristics.
This section describes and analyzes a model for the Gaussian multiple
access channel, in which the addition of new users is governed by a renewal
process. An information theoretic approach is attempted, in order to find
the set of transmission rates at which the reliable communication of the
messages of each user is possible.
The problem of finding the capacity region of a multiple access chan-
nel can be linked to a seminal work by Shannon on two-way commu-
nication channels in 1961 (Shannon 1961). A usual assumption for the
Shannon model is a strategic cooperation between the senders and the
receiver.
The next section presents a different approach for the analysis of
the multiple access channel. Instead of relying on static models for
the channel, a dynamic structure is proposed and analyzed. The non-
cooperative model in which the users compete for the channel resources
is assumed. The access, or addition of new users to the channel, is gov-
erned by a Markovian process and the multiple access interference
(MAI) is assumed proportional to the number of active users (Kleinrock
1975). This model assumes a variable assignment for the allocation of
users in the channel, which is a reasonable assumption for CDMA
systems.
5.6 ANALYSIS OF THE CAPACITY FOR A MARKOVIAN MULTIPLE ACCESS CHANNEL
Most models proposed for the multiple access channel assume that the
number of users in the channel is fixed—but, in most cases, it is actually a
random variable, governed by a distribution of probabilities. The channel
capacity, for many channels, varies with the real time allocation of users.
For instance, the capacity for the noncooperative model of accessibility is
a function of the number of users which contribute to the multiple access
interference (MAI). In the memoryless noncooperative channel the capac-
ity decreases as more users are introduced, reaching a fixed value as this
number increases to infinity (Alencar 1992b; Sousa 1989).
In order to account for the effect of a changing environment, with the
users dropping in and out of the channel, a new model is proposed. Consider
a Gaussian channel with k users. The variance of the MAI in the channel is
assumed to be kσX2 plus some background Gaussian noise of σZ2 . Thus, the
total noise variance depends on how many users are on the channel. It is
meant to reflect a multiuser channel with other user interference modeled
by Gaussian noise. Now suppose the channel is in state k if there are k
users are on the channel, and a Markovian transition regime is imposed.
One model that might be interesting is a birth–death model, in which
transitions occur only between adjacent states—so from state k you can go
to either k + 1 or k − 1, with certain probabilities. Thus, for example, in a
synchronous system this might reflect the model in which the probability
of more than one user entering or leaving the system in a given data bit is
negligible. The steady state probabilities are not difficult to compute and a
formula for the average channel capacity for the model can be found.
The channel with extrinsic memory can be thought of as a dynamic
system, governed by a birth–death process, with a variable number of active
users and implying a dynamic capacity—contrary to the classical static
capacity that assumes a fixed number of users. It can be seen as an infinite
state channel (ISC), as opposed to the finite state models for channels found
in the literature (Gilbert 1960; Fritchman 1967). In this new approach, the
channel capacity changes as the number of users changes in a dynamic
fashion. This last quantity is considered a stochastic process, and has an
underlying Markov chain of states (Kleinrock 1975).
A transition probability matrix P = {pij } governs the behavior of the
channel accessibility. The transition probabilities are obtained from the
Markov model depicted in Figure 5.5, in which αk and βk are the birth and
death parameters, respectively.

Figure 5.5. Markov model for the multiple access channel: from state k, the chain moves to state k + 1 with birth parameter \alpha_k and to state k − 1 with death parameter \beta_k.

The channel will be called the Alpha-Beta

channel, or AB channel for short. The state of the channel k at a given


time depends only on the previous state. In order to simplify the results,
the input (X ) and output (Y ) alphabets are assumed to follow a Gaussian
distribution. The model AB assumes all users transmitting with the same
power σX2 and the Gaussian plus MAI noise, at any state k, is obtained from
the following equation
σk2 = kσX2 + σN2 . (5.47)

After a certain number of iterations, the Markov chain reaches a steady state. The set of steady state probabilities, \{\pi_k \,|\, k = 1, 2, 3, \dots\}, can be

state. The set of steady state probabilities = {πk |k = 1, 2, 3 . . .} can be
computed by means of one of the well-known techniques (Kleinrock 1975;
Adke and Manjunath 1984). Each state k defines the number of users,
the multiple access interference and the conditional capacity given that
the system is in that state, C(k). This conditional capacity is given by the
solution of the noncooperative model for the additive Gaussian channel
(Alencar and Blake 1993b).
Therefore, the formula for the capacity of the channel with extrinsic memory is defined, in connection with the concept of mean capacity (Dobrushin 1961), as the supremum of the weighing of each state capacity, C(k), by the associated steady state probability, \pi_k (Wang 1992),

C = \sup_{\{\pi_k\}} \sum_{k=1}^{M} \pi_k C(k). (5.48)

The conditional capacity of the noncooperative channel, given that the


channel is in state k, is given by the sum of the individual capacities, Cik ,
for each i-th user (Alencar and Blake 1993b),

C(k) = k\,C_{ik} (5.49)

or

C_k = \sum_{i=1}^{k} \frac{1}{2} \log \frac{S^2 + 1}{S_i^2 + 1}, (5.50)
in which S^2 represents the total SNR. The expression can be put in the form

C_k = \frac{1}{2} \log \prod_{i=1}^{k} \frac{S^2 + 1}{S_i^2 + 1}. (5.51)

The last expression reaches a maximum when all the transmitters carry the same power, that is, \sigma_i^2 = \sigma_X^2. This assumption simplifies the formula of the interference to noise ratio (INR) to

S_i^2 = \frac{k - 1}{k} S^2 (5.52)

and also simplifies the formula for the channel conditional capacity, giving

C_k = \frac{1}{2} \log \left[ \frac{S^2 + 1}{\frac{k-1}{k} S^2 + 1} \right]^k (5.53)

or

C_k = \frac{1}{2} \log \left[ \frac{k (S^2 + 1)}{k (S^2 + 1) - S^2} \right]^k, (5.54)

which can be further simplified by noticing that the total SNR is S 2 = ks2 ,
in which s2 = σX2 /σZ2 .
Two special cases for the AB model are treated here. First, the case
in which the birth and death parameters are considered constant for any
state, and the access is unrestricted. Second, the case in which the users
are discouraged by the amount of noise in the system. This occurs because
every new user can be thought to degrade the system to a certain extent,
by decreasing the SNR. For the first case, αk = λ and βk = μ. For the
second case, αk = λ/(k + 1) and βk = μ, in which λ and μ are fixed probabilities.

Example: the first case produces a Geometric distribution for the proba-
bilities.
This distribution shows the likelihood of being in a certain state in the
long run, after the transient dies out and is given by the equation

pk = (1 − ρ)ρ k , k = 0, 1, 2, . . . , for ρ < 1 (5.55)

in which ρ = λ/μ is usually called the utilization factor of the system.


For the Geometric distribution considered, the statistical mean is given
by ρ/(1 − ρ) and the variance by ρ/(1 − ρ)2 . The probability of finding
more than L users at a time in the system is given by \rho^{L+1}. Using Equation 5.48, and substituting the appropriate terms, the capacity for this case is defined as the maximum, over the set of utilization factors,

C = \max_{\rho} \sum_{k=1}^{\infty} (1 - \rho) \rho^k \log \left[ \frac{(k + 1)\sigma_X^2 + \sigma_N^2}{k \sigma_X^2 + \sigma_N^2} \right]^k, (5.56)

whose plot is presented in Figures 5.6 and 5.7.

Figure 5.6. Capacity for the channel with Geometric accessibility, as a function of the signal-to-noise ratio, for different values of \rho.


As can be seen, from the plots, the limiting value of the sum capacity,
as the utilization factor increases, is the same obtained in the case of non-
cooperative users of the previous section. The sum capacity decreases, for
high SNR, to log e/2, with an increase in the utilization factor. For low
SNR, on the contrary, the curves show an increase to the limiting value, as
the system tends to occupy states of higher order.
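
A sketch of the mean capacity of Equation 5.48 with the Geometric state probabilities of Equation 5.55 and the conditional capacity of Equation 5.54 (the per-user SNR, truncation point, and grid of utilization factors are assumptions, not values from the text):

```python
from math import log2

def conditional_capacity(k, s2):
    """Equation 5.54 with S^2 = k s^2: capacity, in bits, given k active users."""
    S2 = k * s2
    return 0.5 * k * log2(k * (S2 + 1) / (k * (S2 + 1) - S2))

def geometric_capacity(s2, kmax=2000, steps=99):
    """Equation 5.48 weighted by the Geometric probabilities of Equation 5.55."""
    best_c, best_rho = 0.0, 0.0
    for i in range(1, steps + 1):
        rho = i / (steps + 1)                  # 0 < rho < 1
        c = sum((1 - rho) * rho**k * conditional_capacity(k, s2) for k in range(1, kmax))
        if c > best_c:
            best_c, best_rho = c, rho
    return best_c, best_rho

print(geometric_capacity(s2=10.0))             # assumed per-user SNR of 10 dB
```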

Figure 5.7. Capacity for the channel with Geometric accessibility, as a function of the utilization factor, for different values of the signal-to-noise ratio.

Example: the solution for the second case seems to be a better approach to reality. A Poisson distribution is obtained for the stationary probabilities of the ISC model, given by

p_k = \frac{\rho^k}{k!} e^{-\rho}, \quad k = 0, 1, 2, \dots, (5.57)
with mean and variance equal to ρ.
The capacity, therefore, is given by the maximum, over the range of utilization factors, of the following formula,

C = \max_{\rho} \sum_{k=1}^{\infty} \frac{\rho^k}{k!} e^{-\rho} \log \left[ \frac{(k + 1)\sigma_X^2 + \sigma_N^2}{k \sigma_X^2 + \sigma_N^2} \right]^k. (5.58)

The results are shown in Figures 5.8 and 5.9 as a function of the SNR
and utilization factor, respectively. Figure 5.9, in particular, gives a good
example of the system behavior as the utilization factor increases. It can
be seen that, as the distribution becomes more uniform, the sum capacity
tends to the already computed limit log e/2. As the factor increases there
is a more pronounced tendency for the occupation of higher states in the
Markov chain.
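
An analogous sketch for the Poisson accessibility case, weighting the conditional capacity of Equation 5.54 by the probabilities of Equation 5.57 as in Equation 5.48 (per-user SNR, truncation, and grid are assumptions):

```python
from math import log2, exp

def conditional_capacity(k, s2):
    """Equation 5.54 with S^2 = k s^2: capacity, in bits, given k active users."""
    S2 = k * s2
    return 0.5 * k * log2(k * (S2 + 1) / (k * (S2 + 1) - S2))

def poisson_capacity(s2, kmax=400, rho_max=10.0, steps=200):
    """Equation 5.48 weighted by the Poisson probabilities of Equation 5.57."""
    best_c, best_rho = 0.0, 0.0
    for i in range(1, steps + 1):
        rho = rho_max * i / steps
        pk, c = exp(-rho), 0.0                 # p_0 = e^{-rho}
        for k in range(1, kmax):
            pk *= rho / k                      # p_k = p_{k-1} * rho / k
            c += pk * conditional_capacity(k, s2)
        if c > best_c:
            best_c, best_rho = c, rho
    return best_c, best_rho

print(poisson_capacity(s2=10.0))               # maximum capacity and maximizing rho
```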
Several models were described in this chapter. The first modeled the
case when the users acknowledge a certain protocol for transmission and
cooperate to achieve the maximum transmission rates. In the second model,
the users do not cooperate, in the sense that they transmit their signals without concern to any approach to maximize the transmission of information. The third model included the role of accessibility on the evaluation of the capacity. The information theoretic approach is used in order to find the information rates at which the simultaneous and reliable communication of the messages for each user is possible.

Figure 5.8. Capacity for the channel with Poisson accessibility, as a function of the signal-to-noise ratio, for ρ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
For the cooperative model, the capacity region was obtained and the
effect of fading was estimated for the case of Rayleigh attenuation factors.
The capacity was shown to be attained by a process of “Gaussianization”
of the output in terms of the input distribution. It is interesting to notice
that the multipath effect, discussed in the last chapter, can be analyzed by
the same techniques presented in this chapter and can provide a net gain
for systems that use spread spectrum techniques (Alencar 1992a).
Spread spectrum thus provides the necessary diversity for the system,
and the multipath phenomenon can be used to the advantage of the trans-
mission. In a real channel, pseudo-noise spread spectrum signals are used
to resolve the multipath introduced by the environment.
Figure 5.9. Capacity for the channel with Poisson accessibility, as a function of the utilization factor, for some values of the signal-to-noise ratio.

Two problems of interest associated with the closeness of an approximation to the capacity of a fading Gaussian noise channel have been considered. The relative closeness of the capacity, as given by Ericson,


to the expression proposed, independent of SNRs, is of interest, allowing
the simpler expression to be used with some confidence in some analyses.
The more precise result available for the case of Rayleigh fading is also
of interest. The consideration of capacity for this case led to the interesting
distribution problem discussed in this chapter.
Some results were obtained from the analysis of the noncooperative
model. First, as the number of users increases towards infinity, the capacity
region shrinks and the channel capacity converges to a given value. Second,
there is no way to increase the capacity of the noncooperative channel by
increasing the power of the transmitted signals in this limiting case. The
channel capacity C∞ becomes independent of the SNR as the number of
senders goes to infinity.
As discussed before, the classical Shannon capacity formula offers a
tradeoff between signal power and bandwidth. In the noncooperative case,
because of the inherent restriction on the transmitted power, the possibility
left to enlarge the capacity is by an increase in the available bandwidth and
spread spectrum techniques could prove useful in this case.
As discussed previously and inferred from the mathematical results and
plots, the capacity for the Markovian multiple access channel is dependent
on the utilization factor of the system. For both Geometric and Poisson
distributions, when more users gain access to the AB channel the capacity

increases up to a point in which it reaches a maximum. From this point on,


if the number of users entering the channel is proportionately larger than
the number of drop outs, the capacity begins to decrease. The capacity will
eventually reach zero as the utilization factor increases to the limit. In the
case of a Geometrically distributed accessibility, the maximum capacity is achieved for ρ = 0.97. The maximum possible value of ρ is one.
For the Poisson model the maximum is reached for ρ = 1.3, approx-
imately. There is no limit for ρ if this distribution is used. The Poisson
distribution models the case in which the incoming users are discouraged
by the amount of noise in the system. As the SNR decreases, more users
abandon the channel. Both the Geometric and Poisson models result in
capacities that are upper bounded by the capacity of the noncooperative channel, with fixed assignment, as the number of users goes to infinity (Alencar and Blake 1993b).
This model is suitable for describing the properties of systems that
present a variable assignment for user allocation, as is the case with CDMA
techniques.
CHAPTER 6

CODE DIVISION MULTIPLE ACCESS

6.1 INTRODUCTION
This chapter presents some of the main characteristics of spread spectrum
systems, including the indication of possible types of interference, anti-
jamming advantages of the technique in terms of signal-to-noise (SNR)
ratio and possible applications of the techniques. The usefulness of spread
spectrum to overcome interference is stressed. Among others, the main
sources of interference are multipath, multiple media, multiple user, and
multiple cell (Viterbi 1991).
The first developments using concepts related with the current tech-
niques of spread spectrum occurred in the mid-forties, and the applications
involved military problems (Scholtz 1982). The techniques acquired great
popularity in the communications community, for the last 20 years, since
the papers on the subject began to appear after two decades of secrecy due
to military interests.
The main feature of spread spectrum signals is related to their band-
widths, which are much larger than those assigned to baseband or mod-
ulated signals. The way in which the spectrum is spread is essential, and
is often realized by a code that is independent of the transmitted data.
The receiver needs a replica of the code, synchronized in time, in order to
recover the data.
The spreading process introduces a great deal of redundancy in the
transmitted signal. This allows the receiver to suppress part of the interfer-
ence embedded in the signal, which is the chief advantage of the process.
It is possible to devise other advantages of the spread spectrum systems
in relation to personal communications (Pickholtz, Schilling, and Milstein
1982; Scholtz 1982).

• Allows the addressing of a selective group of customers in the same


channel, that is, code division multiple access (CDMA).
• Utilization of the signal to measure the propagation delay, which is
important in personal communications systems to access the atten-
uation in the channel.
• It is possible to hide the signal in the background noise in order to
make its reception difficult by those who do not know the spreading
code (cryptographic capability).
• Useful as a means of diversity in multipath channels, which turns
out to be the main problem of personal communications systems.
• Features a low probability of interception (jamming) of the signal,
due to the randomness of the spreading code.
• Could improve the data rate, limited in wireless channels by multi-
path and fading problems.

One of the strategies in use to spread the spectrum is the Direct Sequence
modulation (DS), in which a fast pseudo-random sequence causes ampli-
tude switching in the data carrying signal, as suggested in Figure 6.1. In this
Figure, the data signal is represented by b(t), the pseudo-noise sequence is
c(t), Tb is the bit interval, and Tc is the chip interval.
Figure 6.2 shows a block diagram for the direct sequence spread spec-
trum system. The signal is randomized before reaching the modulator. At
the receiver, a synchronization circuit retrieves the PN code to produce the
original signal.
Another technique used is called Frequency Hopping (FH), in which
the signal is subject to frequency switching, according to a predetermined
pseudo-random sequence.
Figure 6.3 illustrates the FH technique, in which a pseudo-noise gener-
ator is used as input to a synthesizer to produce a pseudo-random sequence
of frequencies that shifts the originally modulated signal frequency.


Figure 6.1. Data signal and pseudo-noise sequence.



Figure 6.2. Direct sequence spread spectrum system.


Figure 6.3. Frequency hopped spread spectrum system.


Figure 6.4. Spread spectrum using random time windows.

It is also possible to produce spread spectrum by transmitting bursts of


the signal in random instants of time. This technique is known as Time
Hopping (TH) and is illustrated in Figure 6.4.
Practical systems can combine the presented techniques, as is usually the case, to obtain certain properties. In a sense, the spread spectrum system is an antithesis of the usual communication system because, in contrast to the usual system premises, it requires a very large bandwidth and a very low power spectral density for the transmitted signal.
Despite such advantages, the spread spectrum approach has received some criticism, as quoted below:

The mystique of spread spectrum communications is such that


commercial enterprises, as well as academics, are often attracted
by the novelty and cleverness of the technique. Also, in small artifi-
cially aided markets, there may be temporary economic advantages.

In the long run, though, it’s the author’s opinion that we must stand
back and question the wisdom of squandering a precious resource
such as bandwidth for reasons of expediency (Viterbi 1985).

6.2 FUNDAMENTALS OF SPREAD SPECTRUM SIGNALS
It was mentioned in the last section that the spread spectrum techniques
can be used to protect the signal against jammers. The following discusses the most common types of jammers, and how it is possible to attain this objective.
An attempt to classify the jammers is sketched below (Pickholtz, Schilling,
and Milstein 1982; Cook and Marsh 1983):

• Partial band jammer. The interference is concentrated in a portion


of the signal bandwidth.
• Partial time jammer, in which the intentional interference occurs in
only part of the transmission time. This can be accomplished by the
transmission of narrow pulses or impulses.
• Tone jammer, which concentrates its power in a single frequency.
• Broad-band noise jamming. In this case the jammer tries to attack
the whole signal bandwidth. One possible approach is the genera-
tion of a pseudo-noise sequence (PN) by the jammer.
• Repeater jamming. The jammer uses the very transmitted signal to
cause the interference. This type of interference can produce a dele-
terious influence on the signal spectrum that mimics the multipath
effect.

The vectorial, or signal space, approach is introduced here to explain


how spread spectrum can be used in parallel with other existing sys-
tems and still protect the signals against interference to or from those
systems (Wozencraft and Reiffen 1961). A data signal, with restricted
dimensionality, is spatially expanded in such a way that prevents the degra-
dation of the communication process by a limited power interfering
signal.
The classical digital transmission problem is the one in which both
transmitter and receiver know the complete set of M waveforms xi (t), each
one with duration Tb seconds. The transmitter selects one of the M sig-
nals at each Tb seconds, which yields a data rate of log2 (M )/Tb bits/s.
If, for example, signal xj (t) is transmitted, the received signal over the

interval [0, Tb ] will be y(t) = xj (t) + nI (t) + n(t), in which n(t) is the addi-
tive white Gaussian noise (AWGN) with power spectral density of N0 /2
W /Hz and nI (t) is the multiple access interference, also assumed to be
Gaussian distributed.
It can be demonstrated that it is possible to represent the whole signal set
by the use of no more than L (L ≤ M ) orthogonal basis functions. It must
also be pointed out that only those noise waveforms inside the signal space
are relevant. In theory, a noise signal would require an infinite number of
components. It is said that the above defined set has dimension L, if this
is the number of basis functions needed to characterize the set of signals
(Wozencraft and Reiffen 1961). It can be shown that L ≈ 2WTb , in which
W is the approximate bandwidth shared by the set of signals. The optimum
receiver for an AWGN channel consists of a bank of correlators, or a bank of
matched filters. The decision strategy is to choose the maximum correlator
output. The performance of this system depends only on the ratio of the bit
energy to the noise power spectrum density.
The key operation in the whole process is the spectral compression of the
interfering signal accomplished by the receiver. This is done by correlating
the received signal with a known spreading code. The information bear-
ing signal is compacted and, simultaneously, the interfering signal power
is spread over the frequency range. This implies an effective advantage
for the data signal in relation to the interfering signal, but requires perfect
synchronization of the pseudo-random code in the receiver. For a signal
of duration Tb and bandwidth W , the approximate number of dimensions
is 2WTb and the processing gain, or sequence code length, can be writ-
ten as

N = \frac{T_b}{T_c}   (6.1)

in which Tc is the chip interval.

[Three panels: signal spectrum, received spectrum, and demodulated spectrum, versus frequency f (Hz).]

Figure 6.5. Spectra of transmitted and received signals.



6.3 PERFORMANCE ANALYSIS OF CDMA SYSTEMS
The main areas of concern on multiple access systems are multiaccess
information theory, collision resolution, and spread spectrum (Gallager
1985). The approach followed by researchers on information theoretic
multiaccess takes into consideration the effects of noise and interference,
neglecting the random arrivals of messages. On the other hand, research on
collision resolution focused primarily on the bursty arrivals of messages
and the interference between transmitters ignoring the noise component
(Hui 1984a; Hui 1984b).
For multiaccess communication using spread spectrum several sources
can transmit at the same time, using different modulating sequences, and
each will look like broadband noise to the others. Spread spectrum could
also provide a solution to the problem of channel allocation.
One of the main problems in communications is the simultaneous
allocation of multiple users to a channel. This must be accomplished
without severe degradation in the performance of the system. Three tech-
niques have been proposed to attack the problem, as mentioned in the
introduction: FDMA, TDMA, and CDMA. The aim of the first two tech-
niques is to separate the signals in frequency and time, respectively. The
third one uses the properties of spread spectrum to separate the signals in
code, that is, to modulate each message signal with a sequence selected
from a set of orthogonal sequences. Each user has its own address embed-
ded in one of the code sequences (Cook et al. 1983). CDMA relies on
the ability of a spread spectrum system to reject independent interfer-
ence by a factor approximately equal to its processing gain (Simon et al.
1985).
An ideal system should combine the features of all the mentioned multi-
ple access techniques, for example, the orthogonality of the TDMA system
leads to simultaneous achievability of single-user capacities by all users.
In the case of FDMA or CDMA, this goal could be attained by a sufficient
separation of the frequencies in the former, and by the use of very large
spreading factors in the latter (Cheng and Verdú 1991).
Multiple access techniques such as FDMA and TDMA are suscepti-
ble to various types of interference, including jamming and multipath.
FDMA also presents the intermodulation problem. On the other hand,
TDMA must have all the users in perfect synchronism. Both FDMA and
TDMA have their capacities limited by the available bandwidth. CDMA
also has its drawbacks. Interference caused by transmitters that are close to
the receiver cause the near-far problem. In a mobile environment, the near-
far effect can be combated by the selection of sequences with very good

cross-correlation properties, as discussed in the next section, or by the use


of adaptive power control (Schilling et al. 1991).
The second problem is the possibility of interference with existing sys-
tems (Kavehrad and Ramamurthi 1987). Unlike the former techniques,
CDMA has its capacity solely limited by the amount of interference into the system, which allows any reduction in interference to be converted directly and linearly into an increase in capacity (Gilhousen et al. 1991).
As pointed out before, the use of spread spectrum allows new users to
share the spectrum with existing ones. CDMA is also interference limited,
that is, the number of users that share the same spectrum and still maintain
an acceptable performance is determined by the interference generated
by the set of remaining users. In CDMA systems, each user is given a
distinct pseudo-random code. Hence, users in adjacent cells use the same
bandwidth and therefore interfere, making a received signal appear noisier
as the traffic increases in the system. The effect of the interfering energy
depends on the processing gain of the spread spectrum technique.
Taking into account the adjacent cell interference, the total number of users M that can simultaneously access a given cell can be estimated from the equation M = 3G_P/8, in which G_P is the processing gain. The computation of M above assumes all users simultaneously transmitting. For a voice activity factor V and an interference increase factor λ, the heuristic formula above can be adjusted to give the net number of users U that can access a given cell in a given time period (Schilling et al. 1991)

U = \frac{M}{V(1 + \lambda)}.   (6.2)
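
A small numerical illustration of the two formulas above is given below (Python; the processing gain, voice activity factor, and interference increase factor are assumed values chosen only for the example).

    G_P = 128     # processing gain (assumed)
    V = 0.4       # voice activity factor (assumed)
    lam = 0.5     # interference increase factor (assumed)

    M = 3 * G_P / 8          # users per cell, all transmitting simultaneously
    U = M / (V * (1 + lam))  # net number of users, Equation (6.2)
    print(f"M = {M:.0f} simultaneous users, U = {U:.0f} net users")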

A more formal approach can be followed with the use of algebraic coding
theory and the definition of balanced codes. An (n, w, λ) binary balanced
code is a set of binary n-tuples, each of weight w, such that any two words
agree in at most λ positions. If the maximum number of words in a balanced
code is U , it follows that (Blake and Mullin 1976; Chung and Kumar 1990)

U \le \frac{\binom{n}{\lambda+1}}{\binom{w}{\lambda+1}}.   (6.3)

In the last expression it is evident that the maximum of the autocor-


relation function is given by w, and a good set of codes requires that the
minimum distance, d = 2(w − λ), be kept at a maximum.
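
The bound of Equation (6.3) is easy to evaluate directly, as in the sketch below (Python; the parameter values are assumptions used only to illustrate the computation).

    from math import comb

    def balanced_code_bound(n, w, lam):
        # Upper bound of Equation (6.3) on the number of codewords of an
        # (n, w, lambda) binary balanced code.
        return comb(n, lam + 1) // comb(w, lam + 1)

    # Example with assumed parameters: length 31, weight 7, agreement at most 1.
    print(balanced_code_bound(31, 7, 1))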
As implied above, spread spectrum systems are less subject to multipath
signal variations than conventional systems. In a direct sequence receiver,

if the reflected signal is delayed, compared to the direct signal, by more than
one code chip, the reflected signal is treated as any other uncorrelated input
signal. The higher the code chip rate, for a particular system, the smaller
its multipath problem. In the mobile environment, the result of multipath
reception is that a moving receiver is subject to rapid fading and peaking of its input signal as the direct and reflected signals cancel and reinforce each other. Stationary receivers are also subject to multipath due to the
reflecting and refracting surfaces (Dixon 1984).
A few topics have interested researchers, industries, and users of CDMA
systems. The first one is the design of pseudo-noise sequences with good
autocorrelation and cross-correlation properties. The second concerns the
evaluation of the system capacity, or how many users a CDMA system can
support under given circumstances. The number of allocated users dictates
the type of distribution to characterize the Multiple Access Interference
(MAI). A large number of subscribers favor the Gaussian approximation.
For a small number of subscribers, considering that the MAI is additively
composed of cross-correlations limited to a few chips of coincidence, the
maximum interference is too small to apply the central limit theorem (Brady
1991).
The evaluation of system performance can be done in two ways: using
the average probability of error or computing the outage of the system. The
probability of error is averaged over the ensemble of channels obtained
from the assumed model. The outage measures the probability of trans-
mitting through a channel when the probability of error is greater then a
predefined threshold. In the case of time-varying channel and stationary
terminal, outage indicates the fraction of time the terminal operates with
an error probability above the limit. For the time-invariant channel and
mobile terminal, outage indicates the fraction of building area in which the
terminal can operate (Kavehrad and McLane 1987).
One interesting peculiarity of Direct Sequence Spread Spectrum (DSSS)
is that error detection and correction may not be advisable in some applica-
tions. This is due to the effect of the coding overhead, which increases the
signal apparent data transmission rate and reduces the available processing
gain (Dixon 1984). On the other hand, a substantial improvement in the
capture performance of coded DSSS can be achieved, in comparison to an
uncoded system. Capture phenomena characterize the ability of a receiver
to successfully receive a packet, even though part or all of the packet arriv-
ing at the receiver overlaps with the other packets. The basic mechanism
for capture is the ability of the receiver to synchronize with and lock on
to one packet and subsequently reject other overlapping packets as noise
(Soroushnejad and Geraniotis 1991).

In particular, time-frequency hopping modulation has a potential for


application in those systems in which a large number of users, with widely
variable distances or transmitted power, are to operate simultaneously in a
single link. Such systems tend to employ a simple code sequence, primarily
as an addressing medium, rather than to spread the spectrum specifically.
This type of modulation is suitable for IWDC systems, in which random
access and discrete addressing are the main concerns. It also offers one of
the few viable solutions to the near-far problem. By the use of synchro-
nization the users can be programmed to transmit on different frequencies
as well as different times (Dixon 1984).

6.4 SEQUENCE DESIGN


Modeling a spread spectrum system implies the generation of random
sequences. Those sequences must exhibit the following properties: i) each
signal in the set is easy to distinguish from a time-shifted version of itself;
ii) each signal in the set is easy to distinguish from a time-shifted version
of every other signal in the set (Sarwate and Pursley 1980).
In real life, practical constraints dictate the use of pseudo-random, or
pseudo-noise sequences (PN), whose properties are well known. The auto-
correlation and the cross-correlation of the sequences play an important
role in designing the system, and are responsible for the spectral charac-
teristics of the spread spectrum signal (Sarwate and Pursley 1980). The
definition of a pseudo-random sequence is done in terms of the periodic
auto and cross-correlation functions. For a sequence xk , k = 0, . . . , N − 1,
the periodic autocorrelation is


r_X(i) = \sum_{k=0}^{N-1} x(k)\, x(k+i)   (6.4)

in which the addition is taken modulo N . Given two sequences x and y of


length N , the periodic cross-correlation function is


r_{XY}(i) = \sum_{k=0}^{N-1} x(k)\, y(k+i).   (6.5)

A pseudo-random sequence is usually defined as a binary sequence


such that for i ≠ 0, |r_X(i)| is small compared to r_X(0). On the other hand, a
pseudo-random ensemble of sequences of length N has the cross-correlation
between any two distinct sequences small compared to N , or to the peak
autocorrelation function.
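
Equations (6.4) and (6.5) translate directly into code. The minimal sketch below (Python; the function names are ours) computes the periodic correlations of ±1 sequences, with indices taken modulo N as in the definitions.

    def periodic_autocorrelation(x):
        # r_X(i) = sum_k x(k) x(k+i), indices added modulo N, Equation (6.4).
        N = len(x)
        return [sum(x[k] * x[(k + i) % N] for k in range(N)) for i in range(N)]

    def periodic_crosscorrelation(x, y):
        # r_XY(i) = sum_k x(k) y(k+i), indices added modulo N, Equation (6.5).
        N = len(x)
        return [sum(x[k] * y[(k + i) % N] for k in range(N)) for i in range(N)]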

The maximal length sequences (m-sequences) are individual codes with


good autocorrelation properties, while the Gold and Kasami codes form
ensembles that have good cross-correlation properties. The m-sequences
are related to simplex signal sets, finite fields, and error-correcting codes.
The Gold and Kasami sequences are constructed from certain subsets of
the m-sequences (Blahut 1990).
When modulating a carrier with a code sequence one-zero balance can
limit the degree of carrier suppression, which is dependent on the sym-
metry of the modulating sequence. The amount of offset produced by the
sequence is inversely proportional to the sequence length. Therefore, a long
code sequence can minimize the offset effect (Dixon 1984).
Pseudo-random sequences feature:

• Facility of circuit implementation.


• A controlled degree of randomness.
• Large periods for the chip sequence.

6.4.1 MAXIMAL-LENGTH LINEAR SEQUENCES

The binary pseudo-random sequences are produced by shift-registers and


have been used for many years in different areas of electronics and com-
munications. In much of the literature concerning periodic sequences, the
terms pseudo-random sequence, pseudo-noise sequence, and maximal-length linear sequence are used synonymously. The sequence has a length of N = 2^m − 1 bits, and is produced by a shift-register with m stages. The periodic m-sequence is constructed using the Galois Field GF(2^m) (Blahut 1990). Field GF(2^m) is built using an irreducible polynomial p(x) of degree m over GF(2) (Blake and Mullin 1976).
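
A maximal-length sequence can be generated with a simple linear feedback shift-register. The sketch below (Python; the tap positions realize one primitive degree-4 recurrence, an assumption of the example) produces a sequence of length 2^4 − 1 = 15 and verifies its two-valued periodic autocorrelation.

    def lfsr_m_sequence(taps, m):
        # Fibonacci LFSR over GF(2); 'taps' are the feedback stage positions of a
        # primitive polynomial, so the output period is N = 2^m - 1.
        state = [1] * m
        seq = []
        for _ in range(2 ** m - 1):
            out = state[-1]
            seq.append(out)
            feedback = 0
            for t in taps:
                feedback ^= state[t - 1]
            state = [feedback] + state[:-1]
        return seq

    s = lfsr_m_sequence((4, 1), 4)          # m = 4, period 15
    b = [1 - 2 * bit for bit in s]          # map {0, 1} to {+1, -1}
    r = [sum(b[k] * b[(k + i) % 15] for k in range(15)) for i in range(15)]
    print(r)   # 15 at zero shift and -1 at every other shift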
In CDMA applications, a large number of code sequences is needed.
Under these conditions, multiple feedback points are necessary since the
maximum number of codes of any length available from a set of delay
elements using single-tap feedback would be only m − 1. Unfortunately,
many of the m − 1 codes may be very short cyclic sequences, or the gen-
erator may halt operation altogether by entering zeros in all elements in
some configuration.
The maximum length sequences can be shown to possess a peak cross-correlation as high as 2^{(m+1)/2} − 1, implying that their use in a CDMA
application will lead to considerable multiaccess interference. The number
of such sequences with the smallest possible value for cross-correlation
is very limited and depends on the choice of N . The maximum length
sequences can be regarded as codewords of minimum weight in the dual of

a Hamming code, generated by a primitive polynomial. The dual contains


only one codeword and all its cyclic shifts.
Using all the possible linear combinations of feedback taps for an m-stage register, there are φ(2^m − 1)/m maximal sequences that can be generated. The last expression involves the Euler totient function, that is, the number of positive integers that are relatively prime to and less than 2^m − 1.
Table 6.1 lists the number of maximal sequences available for register
lengths 3 through 20. Among the listed sequences are also some Mersenne prime codes, defined as those codes for which the code length N = 2^m − 1 is a prime number (Dixon 1984). What is interesting about the data presented in the table is that the number of codes available is not a monotonic function of the register length. For instance, it is usually preferable to choose a prime value of m, as one progresses in the table, because it gives a larger number of sequences.
When several users share the same channel without phase coherence,
as in the case of a totally asynchronous system, it is desirable that the
cross-correlation functions be small in absolute value. In addition, the

Table 6.1. Number of maximal sequences

m    Number of codes    Prime factors of 2^m − 1


3 2 7
4 4 3 and 5
5 6 31
6 4 3, 3, and 7
7 18 127
8 16 3, 5, and 17
9 48 7 and 73
10 60 3, 11, and 31
11 176 23 and 89
12 96 3, 3, 5, 7, and 13
13 630 8,191
14 756 3, 43, and 127
15 1,800 7, 31, and 151
16 2,048 3, 5, 17, and 257
17 7,710 131,071
18 1,728 3, 3, 3, 7, 19, and 73
19 27,594 524,287
20 19,200 7, 7, 127, and 137
Figure 6.6. Pseudo-noise sequence generator

autocorrelation function must attain the maximum possible value. In matched


filter detection, the output of the detector is the correlation of the input and
a replica of the desired signal.
The investigation of lower bounds for the cross-correlation between
pseudo-sequences is still a matter of research. One of the important results
for the multichannel case is due to Welch, and states that the lower bound
on the peak magnitude of the cross-correlation rMAX is (Welch 1974)

r_{\max} \ge N \sqrt{\frac{M-1}{MN-1}}   (6.6)

in which the set is composed of M sequences of length N .

Example: for large values of N and M the bound is well approximated as √N.

6.4.2 GOLD SEQUENCES

A method for choosing the linear maximal codes, to be used as compo-


nents of Gold sequences, was introduced in 1967 (Gold 1967). The Gold
sequences are generated by modulo-2 addition of a pair of maximal-length
sequences, as shown in Figure 6.7. The code sequences are added on a
chip basis, by synchronous clocking. The main feature of this type of
sequence generator is the large number of codes it supplies, although it
requires only one pair of feedback tap sets (Dixon 1984).
The Gold sequences are constructed from m-sequences. For a given m
and a pair {a, b} of distinct m-sequences, the Gold code is the set

C = \{a,\, b,\, a + T^i b : i = 0, \ldots, 2^m - 2\}.   (6.7)

The Gold code contains 2^m + 1 sequences of block-length 2^m − 1. The elements of the code are called Gold sequences. Sequences a and b are selected so their cross-correlation function has a maximum value
2^{(m+2)/2} + 1. Such a pair of sequences will always exist. The cross-correlation functions and, except for the main peak, the autocorrelation functions of a Gold code can only take values in the set {−1, −2^{(m+2)/2} − 1, 2^{(m+2)/2} − 1}. Hence the largest magnitude of any cross-correlation of pairs of sequences from C is 2^{(m+2)/2} + 1 (Blahut 1990).

Figure 6.7. Gold sequence generator.

Table 6.2 compares the relative peak
cross-correlations for the m-sequences and Gold sequences (Proakis 1989).
It can be seen that the periodic cross-correlation function between the pairs
of m-sequences has large peaks when compared to the Gold sequences of
the same period.

Table 6.2. Relative peak cross-correlations of m-sequences, Gold


sequences and Kasami sequences

m m-sequences Gold sequences Kasami sequences


3 0.71 0.71 –
4 0.60 0.60 0.33
5 0.35 0.29 –
6 0.36 0.27 0.14
7 0.32 0.13 –
8 0.37 0.13 0.07
9 0.22 0.06 –
10 0.37 0.06 0.03
11 0.14 0.03 –
12 0.34 0.03 0.03

Gold codes are useful for CDMA applications for they form a large set
of codes. But, with the exception of a and b, the set of Gold sequences
are not maximum-length shift-register sequences of length N . Hence their
autocorrelation functions are not two-valued. Such codes can be viewed
as the duals of the cyclic codes generated by the product of two primitive
polynomials.
The advantage of Gold sequence generators is the possibility of pro-
ducing a new sequence with every change in phase position between the
two primary generators. In addition to this, the Gold codes may be chosen
so that, over a set of codes available from a given generator, the cross-
correlation between the codes is bounded. Thus, Gold codes are attractive
for applications in CDMA.
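
The construction of Equation (6.7) is direct once a preferred pair of m-sequences is available, as in the sketch below (Python; the function name is ours and the sequences are represented as lists over {0, 1}).

    def gold_code(a, b):
        # Gold code C = {a, b, a + T^i b} built from a preferred pair (a, b) of
        # m-sequences of length N = 2^m - 1; addition is modulo 2, chip by chip.
        N = len(a)
        shifted_sums = [[a[k] ^ b[(k + i) % N] for k in range(N)]
                        for i in range(N)]
        return [a, b] + shifted_sums   # 2^m + 1 sequences in total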
Furthermore, the maximal sequences of the same length are not guar-
anteed to have bounded cross-correlation. But, for a given value of m, the Gold codes exhibit cross-correlation that is √2 greater than the maximal length sequences of the same length (Dixon 1984).
A method for choosing the linear maximal codes, to be used as compo-
nents of Gold sequences, was introduced in 1967. The outputs form a set
of sequences whose members display cross-correlation and autocorrelation side-lobes bounded by 2^{(m+1)/2} + 1 for m odd, and by 2^{(m+1)/2} − 1 for m even (Gold 1967).

6.4.3 KASAMI SEQUENCES

The Kasami ensemble of codes is also constructed from m-sequences. Each


code is a collection of sequences that presents good cross-correlation prop-
erties. Consider an m-sequence a of length N = 2^m − 1, for m even. This can be written as N = (2^{m/2} − 1)(2^{m/2} + 1), which means that starting with any position of a and taking every (2^{m/2} + 1)-th bit of sequence a, cyclically repeated, gives a sequence b of length N and period 2^{m/2} − 1. There are 2^{m/2} − 1 distinct shifts of b.
The Kasami code is the set

C = \{a,\, a + T^i b : i = 0, \ldots, 2^{m/2} - 2\}   (6.8)

in which the operator T denotes a cyclic shift by one bit position. The
Kasami sequences are the elements of the Kasami code, each sequence
having block-length N .
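
Following Equation (6.8), the small Kasami set can be obtained by decimating an m-sequence and adding its cyclic shifts, as in the sketch below (Python, for m even; the function name is ours).

    def kasami_small_set(a):
        # Small Kasami set from an m-sequence a of length N = 2^m - 1, m even:
        # b is the decimation of a by 2^{m/2} + 1, and C = {a, a + T^i b}.
        N = len(a)
        m = (N + 1).bit_length() - 1
        d = 2 ** (m // 2) + 1
        b = [a[(k * d) % N] for k in range(N)]
        return [a] + [[a[k] ^ b[(k + i) % N] for k in range(N)]
                      for i in range(2 ** (m // 2) - 1)]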
The cross-correlation functions and off-peak autocorrelation functions of elements of a Kasami code only take values in the set {−1, 2^{m/2} − 1, −2^{m/2} − 1}. Hence the largest magnitude of any cross-correlation function of pairs of sequences from the code is 2^{m/2} + 1 (Blahut 1990). The Kasami
sequences also satisfy the Welch bound, and are in this sense optimal
(Proakis 1989).
It might be noted that if the sequences are allowed to be defined over the
complex mth roots of unity, for some positive integer m, then the auto and
cross-correlation properties of the sequences could be improved greatly,
and still maintain the balanced and run properties of the m-sequences, that
is, there is approximately an even distribution of each of the 2^m elements
of the Galois Field, and short runs of identical symbols are more likely to
appear than longer runs (Komo and Liu 1990). An extensive work on the
subject of sequence design is found in (Sarwate and Pursley 1980).
Some attributes of spread spectrum, employed by CDMA systems, are
important for personal communications systems, including the utilization
of the voice activity factor, no need for equalization, use of one radio
per site, soft handoff, no need for guard time, sectorization for capacity,
less fading for the wide-band signal, easy transition from current systems,
capacity advantage, no frequency management or assignment needed, soft
capacity, coexistence with other systems and suitability for microcell and
in-building systems (Lee 1991).
Modeling a spread spectrum system implies the generation of random
sequences. Those sequences must exhibit the same properties, in order to
be eligible for CDMA use. In real life, practical constraints dictate the use
of pseudo-random, or pseudo-noise sequences (PN), whose properties are
well-known.
CHAPTER 7

THE CAPACITY OF A CDMA SYSTEM

7.1 INTRODUCTION
The area of Personal Communications Network (PCN) is a huge success in
the communications field. The main technology used in mobile communi-
cations is code division multiple access (CDMA) that has some advantages
over the previous techniques. This chapter describes and analyzes a model
for the discrete-input continuous-output, Direct-Sequence Spread Spec-
trum (DSSS), Gaussian multiple access channel.
In this chapter, the modulation scheme is specialized, with the choice
of DPSK modulation. This modulation scheme deserves some comments.
DPSK is regarded by many authors as a natural choice for CDMA, espe-
cially if one considers a fading multipath environment, in which syn-
chronous carrier recovery is a difficult task (Misser, Wijffels, and Prasad
1991; Turin 1984b).
As a differentially coherent modulation technique, DPSK also compares
well to similar systems using coherent PSK modulation and is less trouble-
some to demodulate in a complex environment such as an indoor wireless
channel (Kavehrad and McLane 1985; Kavehrad and Ramamurthi 1987).
The capacity of the PSK scheme, for the coding channel resulting from
hard decision detection, was computed in Sousa (1992). In this chapter,
upper and lower bounds on the sum capacity are obtained and compared.

7.2 ANALYSIS OF A CDMA SYSTEM WITH A FIXED NUMBER OF USERS AND SMALL SNR
One of the problems related to CDMA is the possibility of interference
with existing systems (Kavehrad and McLane 1985). Unlike the former

techniques, CDMA has its capacity solely limited by the amount of inter-
ference into the system, which allows any reduction in interference to be
converted directly and linearly into an increase in capacity (Gilhousen et al.
1991).
The use of spread spectrum allows new users to share the spectrum
with existing ones. Because CDMA is interference limited, the number
of users that share the same spectrum, and still maintain an acceptable
performance, is determined by the interference generated by the set of
users.
In CDMA systems, each user is given a distinct pseudo-random code,
which either identifies the user to the central station in a star network, or
addresses another user in a more fully connected network (Turin 1984b).
Hence, the users utilize the same bandwidth and therefore interfere, mak-
ing a received signal appear noisier as the traffic increases in the system.
The effect of the interfering energy depends on the processing gain of the
spread spectrum technique.
As implied, spread spectrum systems are less subject to multipath signal
variations than conventional systems. In a direct sequence receiver, if the
reflected signal is delayed compared to the direct signal by more than one
code chip, the reflected signal is treated as uncorrelated to the direct signal.
The higher the code chip rate, for a particular system, the smaller its mul-
tipath problem. In the mobile environment, the result of multipath reception
is that a moving receiver is subject to rapid fading and peaking of its input
signal as the direct and reflected signals cancel and reinforce each other.
Stationary receivers are also subject to multipath, due to the reflecting and
refracting surfaces (Dixon 1984).
Researchers, industries, and prospective users of CDMA systems have
been concerned with three major issues regarding the design of those sys-
tems. The first one is the design of pseudo-noise sequences with good
autocorrelation and cross-correlation properties.
The second issue, is the evaluation of system performance, that can be
done in two ways: using the average probability of error or computing the
outage of the system. The probability of error is averaged over the ensemble
of channels obtained from the assumed model.
In the case of time-varying channel and stationary terminal, outage indi-
cates the fraction of time the terminal operates with an error probability
above the limit. For the time-invariant channel and mobile terminal, out-
age indicates the fraction of building area where the terminal can operate
(Kavehrad and Ramamurthi 1987).
The evaluation of the system capacity, or which transmission rate the sys-
tem can support under given circumstances, is the third issue. The number

of allocated users dictates the type of distribution to characterize the mul-


tiple access interference.
A large number of subscribers favor the Gaussian approximation (Turin
1984b) and, for many applications, this approximation is satisfactory even
for a processing gain as low as N = 10 and moderate values of the signal to
interference ratio (Sousa 1990). On the other hand, the Gaussian approxi-
mation is typically more accurate for smaller values of the SNR, or large
multiuser interference (Geraniotis and Ghaffari 1991).
For a small number of subscribers, considering that the MAI is additively
composed of cross-correlations limited to a few chips of coincidence, the
maximum interference is too small to apply the central limit theorem (Brady
1991), or the conditions for utilization of the theorem do not apply if certain
parameters are held constant (Sadowsky and Bahr 1991).
But, even for a small number of users, accurate methods have been
devised to calculate multiple access error probabilities using the Gaussian
approximation (Holtzman 1992). This section is dedicated to the evaluation
of the capacity of a CDMA system, when the number of users is held
constant.
CDMA is a type of noncooperative channel, which has asynchronous
multiplexing, implying that the M information sources transmit through
a common medium without any prearrangement among themselves. This
kind of channel has been investigated for some time, in a general sense,
and some interesting results were found (Sommer 1966; Verdú 1989b).
In this type of communications system, several users transmit simulta-
neously and concurrently to a common receiver. Therefore, all components
of the received signal are statistically independent and so, perform as inter-
ference in respect to any other given signal. The CDMA channel is analyzed
through the use of an information theoretic approach in this section.
The concept of capacity is used regarding the limitation in the transmis-
sion rate due to bandwidth, or related aspects, while the idea of throughput
is used when scheduling is the main concern in the performance evalua-
tion of the system. In the case of CDMA, both factors are involved which
gives rise to the concept of sum capacity, or limiting throughput per unit
bandwidth (Sousa 1989).
Because of the reasoning and considerations developed in the follow-
ing, the term capacity, used heretofore, is equivalent to the concept of sum
capacity, derived in the cited article. The capacity is assumed for the case
of noncooperation among the users and each user contributes a small frac-
tion of the traffic to the channel (Sousa 1989).
The CDMA noncooperative multiple access channel discussed in this
section has M binary inputs, Xi ∈ Xi , and a single Gaussian output, Y ∈ Y.

The output is the combination of the M inputs plus noise and can be assumed
Gaussian, using the central limit theorem for a large number of users
(Sousa 1989). The channel is assumed memoryless and, for the assumptions
made, the optimum receiver for a particular signal will be the maximum
likelihood receiver.
In the following, one considers the performance analysis of an asyn-
chronous CDMA system using direct-sequence codes. Let bij = ±1, i =
0, 1, . . . , M , j = −∞, . . . , ∞ be a binary bit string that modulates the
ith transmitter and let the codes ci (t), i = 1, . . . , M , 0 ≤ t ≤ Tb form a set
of quasi-orthogonal sequences, with cross-correlation given by
r_{ij}(\tau) = \int_0^{T_b} c_i(t)\, c_j(t - \tau)\, dt, \quad |\tau| \le T_b,\; i \neq j   (7.1)

which is uniformly small compared to their individual energy (Turin 1984b)


E_c = \int_0^{T_b} c_i^2(t)\, dt, \quad i = 1, \ldots, M.   (7.2)

At the reception side, the desired signal is represented by




x_i(t) = \sum_{j=-\infty}^{\infty} b_{i[j/N]}\, c_i(t - jT_b)\, \phi_j(t)   (7.3)

in which {φj (t)} is a set of orthonormal functions and [j/N ] represents the
greatest integer less than or equal to j/N . If modulation is used, along with
rectangular chip pulses, the functions are given by

\phi_j(t) = \cos[\omega_c(t - jT_b)].   (7.4)

Therefore, the signal whose reception is sought can be expressed as




x_i(t) = \sum_{j=-\infty}^{\infty} b_{i[j/N]}\, c_i(t - jT_b) \cos[\omega_c(t - jT_b)].   (7.5)

However, the M transmitted signals combine additively, giving for any


receiver a total signal y(t), such that


y(t) = \sum_{i=1}^{M} x_i(t) + n(t)   (7.6)

in which n(t) is a zero mean Gaussian noise of double-sided power density


N0 /2. Consequently, part of the total signal is viewed as noise for each

user. The other M − 1 current users present a multiple access interference


(MAI) at the ith receiver of the form


n_I(t) = \sum_{k=1, k \neq i}^{M} x_k(t)   (7.7)

or, upon substitution


n_I(t) = \sum_{k=1, k \neq i}^{M} \sum_{j=-\infty}^{\infty} b_{k[j/N]}\, c_k(t - jT_b - \tau_k) \cos[\omega_c(t - jT_b - \tau_k)],   (7.8)
in which {τk } is a set of independent random variables distributed over
the interval [0, Tb ). The total received signal can be written as an explicit
function of the MAI and Gaussian noise

y(t) = xi (t) + nI (t) + n(t). (7.9)

Using the maximum likelihood receiver, or a matched filter followed by a


detector, the filter output yi at time t = Tb is given by (Turin 1980; Kavehrad
and McLane 1985)
y_i = \int_0^{T_b} y(t)\, c_i(t) \cos(\omega_c t)\, dt   (7.10)

which can be put in the form


y_i = b_i E_b + \frac{1}{2} \sum_{k=1, k \neq i}^{M} \sum_{j=-\infty}^{\infty} b_{k[j/N]} \int_0^{T_b} c_i(t)\, c_k(t - jT_b - \tau_k)\, dt\, \cos\phi_{kj} + \int_0^{T_b} c_i(t)\, n(t) \cos(\omega_c t)\, dt   (7.11)

in which
\phi_{kj} = \theta_k - \omega_c\tau_k - j\omega_c T_b   (7.12)

and {θk } form a set of independent random variables, uniformly distributed


in the interval [0, 2π ). It is also considered that τk and θk are independent.
The polarity of a given bit is determined by a comparison between the
phases at the output of the filter at t = (j + 1)Tb and jTb .
It is assumed that ωc Tb = 2πl, in which l is an integer. This is equivalent
to the narrow-band assumption, as discussed in (Turin 1984b). Also, the

mean and variance of the third term are 0 and N0 Eb /2, respectively, in which
E_b = \int_0^{T_b} c_i^2(t) \cos^2(\omega_c t)\, dt, \quad i = 1, \ldots, M.   (7.13)
The second term deserves some attention. It can be written as (Turin
1984a)
N_i = \frac{1}{2} \sum_{k=1, k \neq i}^{M} n_k \cos\phi_{kj}   (7.14)
in which

n_k = \sum_{j=-\infty}^{\infty} b_{k[j/N]} \int_0^{T_b} c_i(t)\, c_k(t - jT_b - \tau_k)\, dt.   (7.15)

But, the variance of the term nk , defined as σk2 , inside the summation
can be expressed as
\sigma_k^2 = 2\int_0^{T_b/N} \left(1 - \frac{N\tau}{T_b}\right)^2 (T_b - \tau)\, d\tau = 2\left(\frac{T_b^2}{3N} - \frac{T_b^2}{12N^2}\right).   (7.16)

Therefore, the variance of Ni , defined as σi2 , is


\sigma_i^2 = \frac{M-1}{8}\,\sigma_k^2 = \frac{(M-1)T_b^2}{12N} - \frac{(M-1)T_b^2}{48N^2}.   (7.17)
Because the MAI term is assumed Gaussian, and Eb = Tb /2, the fil-
ter output is a Gaussian variable, with mean μI = bi Eb . Therefore, the
multiple access variance σI2 is given by (Borth and Pursley 1979; Turin
1984b; Pickholtz, Schilling, and Milstein 1982)
\sigma_I^2 = \frac{(M-1)E_b^2}{3N} - \frac{(M-1)E_b^2}{12N^2} + \frac{N_0 E_b}{2}.   (7.18)
Based on the results stated by Shannon (1948), the mutual information
between the ith individual input and the output, for the noncooperative
additive channel, can be written as
I(X_i; Y) = H(Y) - H(Y - X_i)
= \sum_{j=0}^{q-1} \int_{-\infty}^{\infty} p(y|x_j)\, P(x_j) \log \frac{p(y|x_j)}{p(y)}\, dy   (7.19)

in which

p(y) = \sum_{j=0}^{q-1} p(y|x_j)\, P(x_j),   (7.20)
the logarithms are to the base 2, unless otherwise stated, implying that
information rates are given in bits and q represents the number of signal
levels.

The maximization of Equation 7.19, over the set of input probabilities,


gives the expressions for the individual capacities

C_i = \max_{P(x_j)} \sum_{j=0}^{q-1} \int_{-\infty}^{\infty} p(y|x_j)\, P(x_j) \log \frac{p(y|x_j)}{p(y)}\, dy.   (7.21)

The expression is set as the maximum of the average mutual information


between the input X = {x1 , . . . , xq−1 } and the output Y = ( − ∞, ∞).
For possible inputs xk = +1 and xk = −1, the average mutual informa-
tion is maximized when the input probabilities are P(xk = +1) = P(xk =
−1) = 1/2. Hence, the expression can be put in the form (Alencar and
Blake 1993c)

C_i = \frac{1}{2} \int_{-\infty}^{\infty} p(y|{+}E_b) \log \frac{p(y|{+}E_b)}{p(y)}\, dy + \frac{1}{2} \int_{-\infty}^{\infty} p(y|{-}E_b) \log \frac{p(y|{-}E_b)}{p(y)}\, dy   (7.22)

in which, assuming that nI (t) is Gaussian,

p(y|{\pm}E_b) = \frac{1}{\sigma_I\sqrt{2\pi}}\, e^{-\frac{(y \mp E_b)^2}{2\sigma_I^2}}.   (7.23)

The noise variance σI2 was obtained assuming that the interuser inter-
ference amounts to a Gaussian distribution, for a CDMA system using
direct-sequence codes (Turin 1984b). It can be simplified by noting that for
cik (t) = ±1, k = 1, . . . , N , the code energy is Ec = Tb = 2Eb , and for usual
values of the processing gain (Pursley 1977; Geraniotis and Pursley 1985)

\sigma_I^2 = \frac{(M-1)E_b^2}{3N} + \frac{N_0 E_b}{2}   (7.24)
in which M is the number of users in the channel, N is the processing gain,
N0 /2 is the Gaussian noise power spectrum density, and Tb is the bit time.
It can reasonably be argued, for the present model, that the sum capacity
is a one-feature function containing most of the needed information about
the behavior of the system. This is fair to say for spread spectrum tech-
niques, which define a noncooperative type of channel whose capacity is
given by the sum of the individual capacities,


C = \sum_{i=1}^{M} C_i.   (7.25)

The individual rates, upon substitution of the appropriate terms, are


equal to
C_i = \frac{1}{2\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-E_b)^2}{2\sigma_I^2}} \log\left[\frac{2\exp\left(-\frac{(y-E_b)^2}{2\sigma_I^2}\right)}{\exp\left(-\frac{(y-E_b)^2}{2\sigma_I^2}\right) + \exp\left(-\frac{(y+E_b)^2}{2\sigma_I^2}\right)}\right] dy
+ \frac{1}{2\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y+E_b)^2}{2\sigma_I^2}} \log\left[\frac{2\exp\left(-\frac{(y+E_b)^2}{2\sigma_I^2}\right)}{\exp\left(-\frac{(y-E_b)^2}{2\sigma_I^2}\right) + \exp\left(-\frac{(y+E_b)^2}{2\sigma_I^2}\right)}\right] dy

which can be simplified to


C_i = \frac{1}{\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-E_b)^2}{2\sigma_I^2}} \log\left[\frac{2}{1 + \exp\left(-\frac{2yE_b}{\sigma_I^2}\right)}\right] dy.   (7.26)
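
Equation (7.26) has no closed form, but it can be evaluated numerically with a simple quadrature. The sketch below (Python, with E_b normalized to one and a plain trapezoidal rule; the function and parameter names are ours) is the kind of computation used to obtain the exact curves shown later.

    import math

    def individual_capacity(Eb_N0_dB, M, N, steps=4000):
        # Numerical evaluation of Equation (7.26), with sigma_I^2 taken
        # from Equation (7.24) and E_b normalized to 1.
        Eb = 1.0
        N0 = Eb / (10 ** (Eb_N0_dB / 10))
        var = (M - 1) * Eb ** 2 / (3 * N) + N0 * Eb / 2
        sigma = math.sqrt(var)
        lo, hi = Eb - 10 * sigma, Eb + 10 * sigma
        h = (hi - lo) / steps
        total = 0.0
        for n in range(steps + 1):
            y = lo + n * h
            pdf = math.exp(-(y - Eb) ** 2 / (2 * var)) / (sigma * math.sqrt(2 * math.pi))
            val = pdf * math.log2(2.0 / (1.0 + math.exp(-2 * y * Eb / var)))
            total += val if 0 < n < steps else val / 2
        return h * total

    # Sum capacity C = M * C_i for the setting of Figure 7.1, at -5 dB.
    print(500 * individual_capacity(-5.0, M=500, N=100))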

The remaining part of this section is concerned with establishing a


suitable approximation for the sum capacity in Equation 7.26, in terms
of upper and lower bounds, for small values of the signal to total noise
ratio. Expanding the logarithm term in a power series in Eb , for low values
of the signal-to-noise ratio (SNR), one obtains (Gallager 1968; Gradshteyn
and Ryzhik 1990)
\log\left[\frac{2}{1 + \exp\left(-\frac{2yE_b}{\sigma_I^2}\right)}\right] = \log e \left[\frac{yE_b}{\sigma_I^2} - \frac{y^2E_b^2}{2\sigma_I^4} + \frac{y^4E_b^4}{12\sigma_I^8} + O(E_b^6)\right].   (7.27)

The expansion is valid for the case in which the number of users in the
system is much larger than the code length, or when the thermal noise is
preponderant in the system, according to Equation 7.13.
Substituting into Equation 7.26 and integrating, gives

C_i = \log e \left[\frac{E_b^2}{2\sigma_I^2} - \frac{E_b^4}{4\sigma_I^4} + \frac{E_b^6}{2\sigma_I^6} + \frac{E_b^8}{12\sigma_I^8} + O\!\left(\frac{E_b^{10}}{\sigma_I^{10}}\right)\right]   (7.28)

in which the remaining terms become insignificant as the SNR goes to


zero. It is appropriate to say that spread spectrum systems operate, in
some cases, under conditions of very low SNR. Therefore, neglecting
all but the first term, one obtains an upper bound on the individual
capacity

C_i \le \log e\, \frac{E_b^2}{2\sigma_I^2}   (7.29)

Because the series in 7.27 is an alternating one, a lower bound can be


found easily considering also the second term

C_i \ge \log e \left[\frac{E_b^2}{2\sigma_I^2} - \frac{E_b^4}{2\sigma_I^4}\right].   (7.30)

Using the first term in Equation 7.27 as an approximation for the indi-
vidual capacity and substituting the formula for the MAI variance one finds
a formula for the capacity, under the assumption of very low SNR
C_i \approx \log e \left[\frac{E_b^2}{2\left(\frac{(M-1)E_b^2}{3N} + \frac{N_0 E_b}{2}\right)}\right].   (7.31)

Using now Equation 5.34, maximized for the case in which all the trans-
mitters carry the same power, one obtains

C = MC_i \approx \frac{3MN \log e}{2(M-1) + 3N\frac{N_0}{E_b}}.   (7.32)

Example: interesting results are obtained when one varies the SNR Eb /N0
or number of users in the system. One gets, as the SNR goes to zero

\lim_{E_b/N_0 \to 0} C = 0   (7.33)

If the SNR is small, as compared to the ratio of processing gain to num-


ber of users, E_b/N_0 ≪ 3N/(2(M − 1)), then the capacity can be approxi-
mated by

C \approx \log e\, M \frac{E_b}{N_0}   (7.34)

and the capacity is proportional to the SNR, as well as to the number of


channel users.

Example: another interesting result can be obtained by taking the limit of


the capacity when the number of sources goes to infinity, for a rectangular
chip pulse,

\lim_{M \to \infty} C \approx \frac{3N \log e}{2}   (7.35)
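
The behavior of the approximation (7.32) and of the limit (7.35) can be checked numerically, as in the sketch below (Python; the function name is ours).

    import math

    def sum_capacity_low_snr(M, N, Eb_N0_dB):
        # Low-SNR approximation of Equation (7.32).
        snr = 10 ** (Eb_N0_dB / 10)
        return 3 * M * N * math.log2(math.e) / (2 * (M - 1) + 3 * N / snr)

    limit = 1.5 * 100 * math.log2(math.e)   # (3/2) N log e, Equation (7.35), N = 100
    for M in (10, 100, 1000, 10000):
        print(M, round(sum_capacity_low_snr(M, 100, 0.0), 2), "limit:", round(limit, 2))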

It is easy to conclude that, as the number of users increases towards


infinity, the capacity region shrinks, and the channel capacity converges to
a value that is proportional to the processing gain. In this case, by using a
set of orthogonal spreading codes, the capacity becomes independent of the
SNR and it is not possible to increase the capacity of the noncooperative
channel by increasing the power of the transmitted signals.
Because the limiting capacity C becomes dependent on the processing
gain, as the number of senders goes to infinity, the only possibility left to
boost the capacity is by an increase in the available bandwidth. This is one of
the reasons why spread spectrum techniques are useful for noncooperative
type of systems. Figure 7.1 shows the relation between the exact (middle
curve) and the approximate capacity (upper and lower bounds) in terms
of the SNR, for a given code length (N = 100) and a number of users
(M = 500).
The exact formula is obtained from 7.26, after substitution into 5.34.
The approximate value 7.32 is used as the upper bound. It is possible to
note that the upper and lower approximations are fairly close to the exact
result, if the SNR is below −5 dB.

Figure 7.1. Capacity approximations for the channel, as a function of the SNR, for M = 500 and N = 100.

7.3 CDMA SYSTEM WITH A FIXED NUMBER OF USERS AND HIGH SNR
In the previous section, an approximation for the capacity of a CDMA
system was obtained considering a small SNR. In that case, either the
number of users was very large, as compared to the processing gain, or the
signal power was completely embedded in the Gaussian noise. This section
presents a set of approximations for the channel capacity, considering now
a large SNR, or a large ratio of processing gain to number of users.
In order to analyze the sum capacity in detail, it is important to find
both approximations and bounds for the function. The equation for the
individual capacity 7.26 can be simplified by the application of logarithm
properties to
C_i = \frac{1}{\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-E_b)^2}{2\sigma_I^2}} \left[1 - \log\left(1 + \exp\left(-\frac{2yE_b}{\sigma_I^2}\right)\right)\right] dy   (7.36)

which, upon integration of the first term, gives


C_i = 1 - \frac{1}{\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-E_b)^2}{2\sigma_I^2}} \log\left(1 + e^{-\frac{2yE_b}{\sigma_I^2}}\right) dy.   (7.37)

Expanding the exponent leads finally to


C_i = 1 - \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{1}{\sigma_I\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma_I^2}}\, e^{\frac{yE_b}{\sigma_I^2}} \ln\left(1 + e^{-\frac{2yE_b}{\sigma_I^2}}\right) dy.   (7.38)

As seen from the analysis undergone in the last section, the integrand in
the above equation is suitable for a series expansion, as long as E_b/N_0 ≪ 1. In order to adjust the integrand, so that the expansion can hold for E_b/N_0 ≫ 1 as well, one can resort to the Fourier transform properties.
The functions inside the integral
g(y) = \frac{1}{\sigma_I\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma_I^2}}   (7.39)
and
f(y) = e^{-\frac{yE_b}{\sigma_I^2}} \ln\left(1 + e^{-\frac{2yE_b}{\sigma_I^2}}\right)   (7.40)
f (y) = e ln 1 + e (7.40)

can be Fourier transformed to (Gradshteyn and Ryzhik 1990),


G(\omega) = \mathcal{F}[g(y)] = e^{-\frac{\sigma_I^2\omega^2}{2}}   (7.41)

and
F(\omega) = \mathcal{F}[f(y)] = \frac{\pi \csc\left(\frac{\pi}{2} + \frac{j\pi\sigma_I^2\omega}{2E_b}\right)}{\frac{E_b}{\sigma_I^2} + j\omega}.   (7.42)
Therefore, applying the Fourier transform property of the integral of the
product of two functions,
\int_{-\infty}^{\infty} f(y)\, G(y)\, dy = \int_{-\infty}^{\infty} F(\omega)\, g(\omega)\, d\omega   (7.43)
one obtains

C_i = 1 - \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} e^{-\frac{\sigma_I^2\omega^2}{2}}\, \frac{\pi \csc\left(\frac{\pi}{2} + \frac{j\pi\sigma_I^2\omega}{2E_b}\right)}{\frac{E_b}{\sigma_I^2} + j\omega}\, d\omega.   (7.44)

This last integral can be simplified, using the following property of


hyperbolic functions

\csc\left(\frac{\pi}{2} + \frac{j\pi\sigma_I^2\omega}{2E_b}\right) = \sec\left(\frac{j\pi\sigma_I^2\omega}{2E_b}\right)   (7.45)
then

C_i = 1 - \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} e^{-\frac{\sigma_I^2\omega^2}{2}}\, \frac{\pi \sec\left(\frac{j\pi\sigma_I^2\omega}{2E_b}\right)}{\frac{E_b}{\sigma_I^2} + j\omega}\, d\omega.   (7.46)

The application of yet another simple property gives


C_i = 1 - \pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b}{\sigma_I^2} + j\omega\right)}\, d\omega.   (7.47)

Multiplying and dividing the integrand by the conjugate of the complex


term in the denominator leads to
C_i = 1 - \pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}\left(\frac{E_b}{\sigma_I^2} - j\omega\right)}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b^2}{\sigma_I^4} + \omega^2\right)}\, d\omega.   (7.48)

The integral can now be split into two terms


C_i = 1 - \pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}} \left[\int_{-\infty}^{\infty} \frac{\frac{E_b}{\sigma_I^2}\, e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b^2}{\sigma_I^4} + \omega^2\right)}\, d\omega - \int_{-\infty}^{\infty} \frac{j\omega\, e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b^2}{\sigma_I^4} + \omega^2\right)}\, d\omega\right].   (7.49)

But the second integrand is an odd function of ω and therefore integrates


to zero. The first integrand is even, which yields finally
C_i = 1 - 2\pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{E_b}{\sigma_I^2} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b^2}{\sigma_I^4} + \omega^2\right)}\, d\omega.   (7.50)

The capacity can be lower and upper bounded by recognizing that the Cauchy-shaped factor (E_b²/σ_I⁴)/(E_b²/σ_I⁴ + ω²) is upper bounded by 1 and lower bounded by exp(−σ_I⁴ω²/E_b²), respectively. The lower bound is
C_i \ge 1 - 2\pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{\sigma_I^2}{E_b} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)}\, d\omega   (7.51)

and the upper bound


C_i \le 1 - 2\pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{\sigma_I^2}{E_b} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}\, e^{-\frac{\sigma_I^4\omega^2}{E_b^2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)}\, d\omega.   (7.52)

Both expressions are good approximations for the actual value of the
capacity if E_b/σ_I² ≫ 1, which implies N ≫ M. For very long code sequences,
the hyperbolic cosine can also be simplified to
\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right) = 1 + \frac{1}{2}\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)^2 + \cdots \approx 1   (7.53)

Using this simplification in Equation 7.51, one obtains the lower bound
and approximation
C_i \ge 1 - 2\pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{\sigma_I^2}{E_b} \int_0^{\infty} e^{-\frac{\sigma_I^2\omega^2}{2}}\, d\omega.   (7.54)
and finally
C_i \ge 1 - \pi^{3/2} \log e\, \frac{\sigma_I}{E_b}\, e^{-\frac{E_b^2}{2\sigma_I^2}}.   (7.55)
This exponential approximation, referred to as Bound 1, is plotted along
with the exact capacity in Figure 7.2, for M = 20 and N = 100, after the
substitution into the formula for the sum capacity.
On the other hand, another approximation can be attempted using Equa-
tion 7.52, which can be put in the following form
C_i \le 1 - 2\pi \log e\; e^{-\frac{E_b^2}{2\sigma_I^2}}\, \frac{\sigma_I^2}{E_b} \int_0^{\infty} \frac{e^{-\left(\frac{\sigma_I^2}{2} + \frac{\sigma_I^4}{E_b^2}\right)\omega^2}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)}\, d\omega.   (7.56)
Figure 7.2. Bound 1 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).

and integrated, after the simplification indicated by 7.53, to provide the


approximation

C_i \approx 1 - \frac{\sqrt{2}\,\pi^{3/2}}{2} \log e\, \frac{\sigma_I}{E_b}\, \frac{1}{\sqrt{\frac{1}{2} + \frac{\sigma_I^2}{E_b^2}}}\, e^{-\frac{E_b^2}{2\sigma_I^2}}   (7.57)

It is interesting to observe that both 7.55 and 7.57 converge to the same
limit as the SNR increases. This approximation, referred to as Bound 2, is
plotted to compare with the exact capacity in Figure 7.3, for M = 20 and
N = 100, after the substitution into the formula for the sum capacity.
Equation 7.51 can lead to another lower bound on the individual capacity.
Consider for a while only the integral term of the mentioned equation

I = \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)}\, d\omega.   (7.58)
Figure 7.3. Bound 2 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).

It is not difficult to obtain a simpler expression, making use of the property


of integration by parts,
I = \frac{2E_b}{\pi\sigma_I^2}\left[e^{-\frac{\sigma_I^2\omega^2}{2}} \arctan\left(\sinh\frac{\pi\sigma_I^2\omega}{2E_b}\right)\right]_0^{\infty} + \frac{2E_b}{\pi} \int_0^{\infty} \omega\, e^{-\frac{\sigma_I^2\omega^2}{2}} \arctan\left(\sinh\frac{\pi\sigma_I^2\omega}{2E_b}\right) d\omega.   (7.59)
The first term on the right side of the expression vanishes, when the limits
are applied, yielding

I = \frac{2E_b}{\pi} \int_0^{\infty} \omega\, e^{-\frac{\sigma_I^2\omega^2}{2}} \arctan\left(\sinh\frac{\pi\sigma_I^2\omega}{2E_b}\right) d\omega.   (7.60)

Using the usual assumption E_b/σ_I² ≫ 1, and the inequality arctan(x) ≤ x,
it is possible to upper bound the integrand, obtaining

I \le \frac{2E_b}{\pi} \int_0^{\infty} \omega\, e^{-\frac{\sigma_I^2\omega^2}{2}} \sinh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right) d\omega.   (7.61)

which integrates to (Gradshteyn and Ryzhik 1990)


I \le \frac{\sqrt{2\pi}}{2\sigma_I}\, e^{\frac{\pi^2\sigma_I^2}{8E_b^2}}.   (7.62)

Substituting the above equation into Equation 7.51 gives


C_i \ge 1 - \sqrt{2}\,\pi^{3/2} \log e\, \frac{\sigma_I}{E_b}\, \exp\left(-\frac{E_b^2}{2\sigma_I^2} + \frac{\pi^2\sigma_I^2}{8E_b^2}\right).   (7.63)
The final version of the bound is found introducing the expression for
the multiple access interference plus noise

\sigma_I^2 = \frac{(M-1)E_b^2}{3N} + \frac{N_0 E_b}{2}   (7.64)
into Equation 7.63. This yields

C_i \ge 1 - \sqrt{2}\,\pi^{3/2} \log e \sqrt{\frac{M-1}{3N} + \frac{N_0}{2E_b}}\, \exp\left(-\frac{3N}{2(M-1) + 3N\frac{N_0}{E_b}} + \frac{\pi^2}{8}\left(\frac{M-1}{3N} + \frac{N_0}{2E_b}\right)\right)   (7.65)

and the total capacity, or sum capacity, is lower bounded by



C \ge M - \sqrt{2}\,\pi^{3/2} \log e\, M \sqrt{\frac{M-1}{3N} + \frac{N_0}{2E_b}}\, \exp\left(-\frac{3N}{2(M-1) + 3N\frac{N_0}{E_b}} + \frac{\pi^2}{8}\left(\frac{M-1}{3N} + \frac{N_0}{2E_b}\right)\right)   (7.66)

which is depicted, as a function of the SNR, in Figure 7.4 and referred to


as Bound 3, for M = 20 and N = 100.
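
For reference, Bound 3 of Equation (7.66), as reconstructed above, can be evaluated directly with the sketch below (Python; the function name and the chosen SNR values are ours).

    import math

    def bound3(M, N, Eb_N0_dB):
        # Lower bound of Equation (7.66) on the sum capacity ("Bound 3").
        snr = 10 ** (Eb_N0_dB / 10)
        s = (M - 1) / (3 * N) + 1 / (2 * snr)   # s = sigma_I^2 / E_b^2
        return M - math.sqrt(2) * math.pi ** 1.5 * math.log2(math.e) * M \
                 * math.sqrt(s) * math.exp(-1 / (2 * s) + (math.pi ** 2 / 8) * s)

    print([round(bound3(20, 100, d), 2) for d in (0, 5, 10, 15, 20)])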

Example: two cases of interest are obtained by making Eb /N0 → ∞



C \ge M - \sqrt{2}\,\pi^{3/2} \log e\, M \sqrt{\frac{M-1}{3N}}\, \exp\left(-\frac{3N}{2(M-1)} + \frac{\pi^2}{24}\,\frac{M-1}{N}\right)   (7.67)

and N → ∞

C \ge M - \sqrt{2}\,\pi^{3/2} \log e\, M \sqrt{\frac{N_0}{2E_b}}\, \exp\left(-\frac{E_b}{N_0} + \frac{\pi^2}{16}\,\frac{N_0}{E_b}\right)   (7.68)
Figure 7.4. Bound 3 on the capacity for the channel, as a function of the
SNR (Eb /N0 ).

The bound for the sum capacity can be expressed in terms of the probability
of error, using Equation 7.82,

$$C \geq M - 2\pi^{3/2} \log e \, P_e \, M \, \frac{1}{\sqrt{\ln\left(\frac{1}{2P_e}\right)}} \, \exp\left[\frac{\pi^2}{16\,\ln\left(\frac{1}{P_e}\right)}\right]. \qquad (7.69)$$

7.4 A TIGHT BOUND ON THE CAPACITY OF A CDMA SYSTEM
In this section a lower bound on the sum capacity, for a CDMA system,
is obtained. The bound is shown to be very tight, although not an expo-
nential one, and therefore amenable for computation in place of the actual
formula for the system capacity, which involves a time consuming numer-
ical integration. Some results are reported in this section, in terms of the
processing gain, number of users and SNR. An interesting threshold is dis-
covered, relating the sequence length, or processing gain, and the number
of users in the channel. Conditions for attaining the capacity are established
and the capacity is plotted as a function of the probability of error.

The following derivation presents a lower bound for the channel capac-
ity, considering a large SNR, or a large ratio of processing gain to number
of users. This is likely to be the case in real systems, because both assump-
tions decrease the probability of error. In order to set the stage for the
evaluation of the lower bound, it is interesting to present a slightly different
formulation from that one presented in the past section. One begins with
Equation 7.47, repeated here for reference
$$C_i = 1 - \pi \log e \, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_{-\infty}^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\cosh\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)\left(\frac{E_b}{\sigma_I^2}+j\omega\right)} \, d\omega. \qquad (7.70)$$

The integral inside Equation 7.50, in the last section, is not amenable to
an analytic solution. One form of attacking the problem is to find a bound
for the capacity, by a convenient substitution implied by the inequality
$\cosh(x) \geq 1 + \frac{x^2}{2}$. Then,

$$C_i \geq 1 - 2\pi \log e \, \frac{E_b}{\sigma_I^2} \, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\left[1+\frac{1}{2}\left(\frac{\pi\sigma_I^2\omega}{2E_b}\right)^2\right]\left(\frac{E_b^2}{\sigma_I^4}+\omega^2\right)} \, d\omega. \qquad (7.71)$$

Rearranging the equation, in order to come out with a standard definite
integral, one obtains

$$C_i \geq 1 - 2\pi \log e \, \frac{8E_b^3}{\pi^2\sigma_I^6} \, e^{-\frac{E_b^2}{2\sigma_I^2}} \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\left(\frac{8E_b^2}{\pi^2\sigma_I^4}+\omega^2\right)\left(\frac{E_b^2}{\sigma_I^4}+\omega^2\right)} \, d\omega, \qquad (7.72)$$
which can be separated by a partial fraction expansion to

$$C_i \geq 1 - 2\pi \log e \, \frac{8E_b}{(\pi^2-8)\sigma_I^2} \, e^{-\frac{E_b^2}{2\sigma_I^2}} \left[\int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\frac{8E_b^2}{\pi^2\sigma_I^4}+\omega^2} \, d\omega - \int_0^{\infty} \frac{e^{-\frac{\sigma_I^2\omega^2}{2}}}{\frac{E_b^2}{\sigma_I^4}+\omega^2} \, d\omega\right].$$

The above integrals can then be evaluated by noting that (Gradshteyn
and Ryzhik 1990)

$$\int_0^{\infty} \frac{e^{-\mu^2 x^2}}{x^2+\beta^2} \, dx = \frac{\pi}{2\beta} \left[1 - \operatorname{erf}(\beta\mu)\right] e^{\beta^2\mu^2}, \qquad (7.73)$$

for $\operatorname{Re}\,\beta > 0$ and $|\arg \mu| < \pi/4$.
Therefore,

$$C_i \geq 1 - \frac{2\pi}{\pi^2-8} \log e \, e^{-\frac{E_b^2}{2\sigma_I^2}} \left\{\frac{\pi}{\sqrt{2}}\left[1-\operatorname{erf}\left(\frac{2E_b}{\pi\sigma_I}\right)\right] e^{\frac{4E_b^2}{\pi^2\sigma_I^2}} - 2\, e^{\frac{E_b^2}{2\sigma_I^2}}\left[1-\operatorname{erf}\left(\frac{E_b}{\sqrt{2}\sigma_I}\right)\right]\right\}, \qquad (7.74)$$

in which erf(x) is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt. \qquad (7.75)$$

This equation can be further simplified to

$$C_i \geq 1 - \frac{2\pi}{\pi^2-8} \log e \left\{\frac{\pi}{\sqrt{2}}\left[1-\operatorname{erf}\left(\frac{2E_b}{\pi\sigma_I}\right)\right] e^{\frac{(8-\pi^2)E_b^2}{2\pi^2\sigma_I^2}} - 2\left[1-\operatorname{erf}\left(\frac{E_b}{\sqrt{2}\sigma_I}\right)\right]\right\}. \qquad (7.76)$$
Substituting now the expression for the total noise variance into Equation
7.76 leads to the final expression for the lower bound on the sum capacity

$$C \geq M - \frac{2\pi M}{\pi^2-8} \log e \left\{\frac{\pi}{\sqrt{2}}\left[1-\operatorname{erf}\left(\frac{2}{\pi}\sqrt{\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}}\right)\right] e^{\frac{8-\pi^2}{2\pi^2}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}} - 2\left[1-\operatorname{erf}\left(\frac{1}{\sqrt{2}}\sqrt{\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}}\right)\right]\right\}. \qquad (7.77)$$

Equation 7.77 is plotted in Figure 7.5, along with the exact solution
(upper curve), as a function of the SNR (SNR = Eb /N0 , dB) for M = 20
and sequence length N = 100. It is remarkable to notice that the difference
does not exceed 5 percent from −10 dB to 20 dB and actually converges
to zero as the SNR increases. The lower bound will thus be used as an
approximation for the actual capacity in the following, for values of SNR
of −10 dB or more.
Figure 7.5. Capacity for the channel, compared with the lower bound,
as a function of the SNR (Eb/N0), for M = 20 and sequence length
N = 100.

One of the reasons for having a bound on the capacity, of the form
expressed by Equation 7.77, is the possibility of interpreting correctly the
role of the many parameters on the performance of the system, because
those parameters, and their relationship, appear explicitly in the formula
for the bound. As will be seen in what follows, some parameters have the
property of influencing and even disrupting the system.
The second reason involves the possibility of combining different for-
mulas, to attain a certain objective, as in the case of expressing the capacity
as a direct function of the probability of error and vice-versa.
The third reason is related to computational resources. Evaluation of
the exact solution is time and memory consuming and is justified only
when no feasible approximation is available. It is instructive to recall that
even in the case when the exact solution is being computed, the computer
system will always resort to some form of approximation or truncation of
the formulas.
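As a compromise between the exact integral and hand evaluation, the following sketch computes the lower bound of Equation 7.77 numerically. It is offered only as a convenience for the reader; the function name, the choice of Python, and the use of the base-2 logarithm (capacity in bits per channel use) are assumptions made here, and the exact capacity itself would still require the numerical integration mentioned above.

    import math

    def tight_bound_sum_capacity(M, N, EbN0_dB):
        # Lower bound of Equation 7.77 on the sum capacity of the CDMA channel.
        g = 10 ** (EbN0_dB / 10)                      # Eb/N0 as a ratio
        r = 6 * N * g / (2 * (M - 1) * g + 3 * N)     # equals Eb^2/sigma_I^2
        term1 = (math.pi / math.sqrt(2)) * math.erfc((2 / math.pi) * math.sqrt(r)) \
                * math.exp((8 - math.pi ** 2) / (2 * math.pi ** 2) * r)
        term2 = 2 * math.erfc(math.sqrt(r / 2))
        return M - (2 * math.pi * M / (math.pi ** 2 - 8)) * math.log2(math.e) \
               * (term1 - term2)

    # Setting of Figure 7.5: M = 20 users and sequence length N = 100.
    for snr in (-10, -5, 0, 5, 10, 15, 20):
        print(snr, round(tight_bound_sum_capacity(20, 100, snr), 3))

The complementary error function erfc(x) = 1 − erf(x) is used in place of 1 − erf(x), since that is the form available in the standard library.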
Equation 7.77 is also plotted in Figure 7.6, as a function of the SNR,
for M = 20, having the sequence length as a parameter. It is interesting to
notice that for a high processing gain there is no further significant gain in
capacity if N ≥ 10 M .
Figure 7.6. Approximate capacity for the channel, as a function of the
signal-to-noise ratio (Eb/N0), for M = 20, having N as a parameter.

In Figure 7.7, the capacity is displayed as a function of the sequence
length, for a SNR of 10 dB, parametrized by the number of users. It must
be noticed that the capacity of the system is achieved for a sequence length
of N = 100 and N = 200, for M = 10 and M = 20, respectively.
In Figure 7.8, the capacity is shown in terms of the number of users,
for a given SNR (10 dB), and two values of the processing gain N = 100
and N = 200 (upper curve). This result shows that the capacity for the
binary case, in bits per channel use, is numerically equal to the number
of users, as long as this number is small compared to the processing gain.
Any increase in the number of users above M = 0.2N produces a transmis-
sion rate that falls short of the system capacity. This is in agreement with
previous results (Turin 1984b; Pickholtz, Schilling, and Milstein 1982),
contradicting (Hui 1984b), which suggested that using long PN sequences
would decrease the sum capacity. Figure 7.9 illustrates the previous point,
by showing a decrease in the capacity with an increase in the number of
users, for a fixed value of N . It is important to mention that if both M and
N increase to infinity, the limiting capacity will be given by log e/2, as
predicted in Chapter 2 (Sousa 1989).
Figure 7.7. Capacity for the channel, using the approximate formula, as
a function of the sequence length (N), for different values of M.

The importance of finding a lower bound on the capacity is due to the
nature of the noncooperativeness of CDMA. In this case, the lower bound
represents the worst possible scenario for the system, which is incidentally
close to reality.

Example: from Equation 7.77 it is also possible to draw some limiting
results, which are supposed to hold for the exact formula for the capacity.
If the processing gain increases to infinity,

$$\lim_{N \to \infty} C = M. \qquad (7.78)$$

In case the SNR goes to infinity,

$$\lim_{E_b/N_0 \to \infty} C = M. \qquad (7.79)$$

Finally, if the number of users increases without bound the sum capacity
displays a peculiar behavior. If M ≥ 36N,

$$\lim_{M \to \infty} C = 0, \qquad (7.80)$$

otherwise

$$\lim_{M \to \infty} C = \infty. \qquad (7.81)$$
Figure 7.8. Capacity for the channel, as a function of the number of users
(M), using the approximation for the capacity.

The relation between the number of users and the sequence length,
χ = M /N , has been explored in a recent paper (Sousa 1989). It is remark-
able that χ = 36 sets a threshold for the capacity, as the number of users
increases to infinity. This channel behavior demands further study. It is
rather interesting to observe that the capacity can be achieved, for M → ∞,
as long as the sequence length increases accordingly.
It is useful to express the sum capacity in terms of the error probabil-
ity. A complete analysis of a DSSS with DPSK modulation, in terms of
probability of error, can be found in Kavehrad and Ramamurthi (1987) or
in Geraniotis and Pursley (1986). In the following, a simplified analysis is
pursued. This can be done by rewriting the formula for the probability of
error for the DPSK system, considering now the effect of multiple access
interference (Turin 1984b),

$$P_e = \frac{1}{2} \exp\left[-\frac{1}{2}\left(\frac{M-1}{3N}+\frac{N_0}{2E_b}\right)^{-1}\right], \qquad (7.82)$$
solving the expression for $E_b/N_0$,

$$\ln(2P_e) = -\frac{1}{2}\left(\frac{M-1}{3N}+\frac{N_0}{2E_b}\right)^{-1}, \qquad (7.83)$$
and finally

$$\frac{E_b}{N_0} = \left[\left(\ln\frac{1}{2P_e}\right)^{-1} - \frac{2(M-1)}{3N}\right]^{-1}. \qquad (7.84)$$

Figure 7.9. Capacity for the channel, as a function of the number of users
(M), including the case M ≫ N.
Substituting this last result into the formula for the sum capacity one
finds the relation between the capacity, in bits per channel use, and the
probability of error, for high values of the signal-to-noise ratio. This
relation is expressed in the following formula,

$$C \geq M - \frac{2\pi M}{\pi^2-8} \log e \left\{\frac{\pi}{\sqrt{2}}\left[1-\operatorname{erf}\left(\frac{2}{\pi}\sqrt{2\ln\frac{1}{2P_e}}\right)\right] e^{\frac{8-\pi^2}{\pi^2}\ln\frac{1}{2P_e}} - 2\left[1-\operatorname{erf}\left(\sqrt{\ln\frac{1}{2P_e}}\right)\right]\right\}, \qquad (7.85)$$

which is depicted in Figure 7.10, for M = 100 users.


Figure 7.10. Capacity for the channel, as a function of the probability of
error (Pe).

From this graph, it is clear that an error probability $P_e \geq 10^{-2}$ will
produce a decrease in available capacity of 3 percent or more. On the other
hand, a probability of error $P_e \leq 10^{-3}$ will leave the capacity almost intact.
Of course, the probability of error can be decreased only at the expense
of an increase in the SNR of the channel, or through the use of long and
better code sequences. This represents a lower bound on the sum capacity,
for a given probability of error, which also implies the worst case for the
system operation.
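The chain formed by Equations 7.82, 7.84, and 7.85 can be checked numerically with the short sketch below. It is only an illustration of how the formulas fit together; the function names are invented here, Python and the base-2 logarithm are assumptions, and the curve of Figure 7.10 itself comes from the exact expression rather than from this sketch.

    import math

    def pe_dpsk(M, N, EbN0_dB):
        # Probability of error for DPSK with multiple access
        # interference, Equation 7.82.
        g = 10 ** (EbN0_dB / 10)
        return 0.5 * math.exp(-0.5 / ((M - 1) / (3 * N) + 1 / (2 * g)))

    def capacity_vs_pe(M, Pe):
        # Lower bound on the sum capacity as a function of Pe,
        # Equation 7.85, valid for high signal-to-noise ratios.
        r = 2 * math.log(1 / (2 * Pe))       # plays the role of Eb^2/sigma_I^2
        term1 = (math.pi / math.sqrt(2)) * math.erfc((2 / math.pi) * math.sqrt(r)) \
                * math.exp((8 - math.pi ** 2) / (2 * math.pi ** 2) * r)
        term2 = 2 * math.erfc(math.sqrt(r / 2))
        return M - (2 * math.pi * M / (math.pi ** 2 - 8)) * math.log2(math.e) \
               * (term1 - term2)

    print(pe_dpsk(20, 100, 10))           # error probability at 10 dB
    print(capacity_vs_pe(100, 1e-2))      # noticeable capacity penalty
    print(capacity_vs_pe(100, 1e-3))      # capacity close to M = 100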
In order to conclude this section, it is useful to express Equation 7.77 in a
more simplified form, as an exponential bound, to sum up the results from
the previous section. It is not difficult to recognize in Equation 7.77 the
complementary error function, erfc(x) = 1 − erf(x), which can be
approximated by (Schwartz 1970)

$$\operatorname{erfc}(x) \approx \frac{e^{-x^2}}{x\sqrt{\pi}}, \qquad x \gg 1. \qquad (7.86)$$

Therefore, after the appropriate substitution, one finds the exponential
approximation below,

$$C_i \approx 1 - \frac{\sqrt{2\pi}}{2} \log e \, \frac{\sigma_I}{E_b} \, e^{-\frac{E_b^2}{2\sigma_I^2}}. \qquad (7.87)$$

This formula will be referred to as Bound 4 in the following. It is plotted
in Figure 7.11 for M = 20 users and gain N = 100.
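Because Bound 4 is a closed-form exponential, it is the cheapest of the approximations to evaluate. The fragment below is a minimal sketch of Equation 7.87, under the same assumptions as the previous sketches (Python, base-2 logarithm, invented function name), with the per-user bound multiplied by M to approximate the sum capacity.

    import math

    def bound4_capacity(M, N, EbN0_dB):
        # Exponential approximation of Equation 7.87 (Bound 4).
        g = 10 ** (EbN0_dB / 10)
        r = 6 * N * g / (2 * (M - 1) * g + 3 * N)   # Eb^2 / sigma_I^2
        ci = 1 - (math.sqrt(2 * math.pi) / 2) * math.log2(math.e) \
             * math.exp(-r / 2) / math.sqrt(r)
        return M * ci

    print(round(bound4_capacity(20, 100, 10), 3))   # close to the exact value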
Figure 7.11. Bound 4 on the capacity for the channel, as a function of
the SNR (Eb/N0).

A summary of the exponential bounds found in this chapter is shown


in Figure 7.12. It is clear that Bound 4, although a simpler form of Equa-
tion 7.77, is vastly superior to all the remaining bounds.
The chapter discussed the performance analysis of a direct-sequence
CDMA system, in terms of the sum capacity. Upper and lower bounds
on the sum capacity, for a CDMA system, were obtained in this chapter.
Some of the bounds were shown to be very tight and therefore amenable for
computation in place of the actual formula for the system capacity, which
involves a time-consuming numerical integration.
Some results were reported in terms of the processing gain, number
of users, and SNR. An interesting threshold was discovered, relating the
sequence length and the number of users in the channel. Conditions for
attaining the capacity were established and the capacity was plotted as
a function of the probability of error, for a DPSK modulation scheme.
One of the reasons for the preference for DPSK instead of a coherent
system is that the wireless mobile channel introduces random phase shift
(Yue 1983).
Figure 7.12. Bounds on the capacity, as a function of the SNR (Eb/N0):
exact capacity, Bound 4, Bound 2, Bound 1, and Bound 3.

It is instructive to compare the new bound in Equation 7.77 with existing
bounds and approximations on the sum capacity and with the exact solution.
This is done in Figure 7.13, in which the upper curve (solid line) is the exact
solution and the previous bounds are given below, for M = 20 users and
N = 100,

$$C \approx -M \log\left(\frac{1}{2} + \frac{1}{2}\,e^{-E_b/\sigma_I}\right), \qquad (7.88)$$

$$C \geq \frac{M}{2}\log\left(1+\frac{E_b}{\sigma_I}\right) - \frac{M}{2}\log\left(1+\frac{E_b}{5\sigma_I}\right) - \frac{M}{2}\log\left(\frac{5\pi e}{24}\right), \qquad (7.89)$$

and

$$C \geq M - M e^{-3E_b/4\sigma_I}\left(\frac{3\pi E_b}{4\sigma_I}\right)^{-1/2} - M\, h\!\left(e^{-3E_b/4\sigma_I}\left(\frac{3\pi E_b}{4\sigma_I}\right)^{-1/2}\right), \qquad (7.90)$$
Figure 7.13. Comparison between the new and existing bounds, as a
function of the SNR (Eb/N0), for M = 20 and N = 100.

in which σI is given by Equation 7.24. Equation 7.90 is the corrected
version of the bound given by Shamai, Ozarow, and Wyner (1991) and
h(x) = −x log x − (1 − x) log (1 − x) is the binary entropy function (Ozarow
and Wyner 1990).
Substitution of the appropriate parameters and simplification of the
resulting expressions yields

$$C \approx -M \log\left(\frac{1}{2} + \frac{1}{2}\,e^{-\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}}\right), \qquad (7.91)$$

$$C \geq \frac{M}{2}\log\left(1+\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}\right) - \frac{M}{2}\log\left(1+\frac{1}{5}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}\right) - \frac{M}{2}\log\left(\frac{5\pi e}{24}\right), \qquad (7.92)$$

and

$$C \geq M - M e^{-\frac{3}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}}\left(\frac{3\pi}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}\right)^{-1/2} - M\, h\!\left(e^{-\frac{3}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}}\left(\frac{3\pi}{4}\,\frac{6NE_b/N_0}{2(M-1)E_b/N_0+3N}\right)^{-1/2}\right). \qquad (7.93)$$

Equation 7.77 compares favorably with respect to the former bounds and
approximations represented by Equations 7.91, 7.92, and 7.93. In fact,
Equation 7.77 fits the exact solution very well, even for SNRs as low
as 0 dB.
The objective of this chapter was the evaluation of the sum capacity of
a CDMA system with a fixed number of users. A more realistic analysis
of a CDMA system should include a variable allocation of users (Alencar
and Blake 1993a).
New approaches, for the analysis of CDMA systems, were introduced
in this chapter. One of the important new results is related to the evaluation
of the sum capacity for the CDMA channel. Also, new limits were found
with explicit formulas for the upper and lower bounds on the capacity of
the channel in terms of the probability of error. A tight bound was found
that allows a computationally efficient evaluation of the capacity.
CHAPTER 8

THEORETICAL CRYPTOGRAPHY

8.1 INTRODUCTION
Cryptography is the basis for network security. It comes from the Greek
word kryptos, for tomb, hidden, or secret, combined with graphein, or
writing. A free translation is hidden writing, which describes the purpose of
cyphering a message. Cryptology combines the same prefix with
logia, meaning study, to give the translation study of secret coding.
Cryptography is the practice and study of techniques for secure commu-
nication in the presence of third parties. It is also a collection of techniques
for the construction and analysis of protocols that overcome the influence of
jammers and that are related to various aspects of information security, such
as data confidentiality, data integrity, authentication, and nonrepudiation.
Over the centuries, governments used cryptography to cypher mes-
sages that were supposed to remain secret, while spies tried to decipher them.
During wars, cryptography becomes even more important, considering the
nature of the operations.
Bletchley Park was the secret information and counter-information cen-
ter in Great Britain during the Second World War. It was so secret that it
remained as a legend after the end of the war, while England continued
to break the codes of other countries, either enemies or friends (Haykin
1999).
The first computer, the Colossus, was built in Bletchley Park, devel-
oped by Alan Mathison Turing (1912–1954), one of the greatest computer
geniuses of the world. Destroyed after the end of the war, the Colossus was
rebuilt from the original drawings.
The German Enigma code was broken there with the help of Turing,
who benefited from the information passed by a Hungarian mathematician

and a German officer, who provided information to French spies in ex-


change for money. The French did not know how to decipher the code, and
passed on the information to the British.

8.2 CRYPTOGRAPHIC ASPECTS OF COMPUTER NETWORKS
The Internet has revolutionized the ways in which companies do business,
since the Internet Protocol (IP) is undeniably efficient, inexpensive, and
flexible. This section presents the main cryptographic aspects of modern
TCP and IP computer networks, which include digital signature technol-
ogy based on asymmetrical cryptographic algorithms, data confidentiality
by applying symmetrical cryptographic systems, and system Public Key
Infrastructure (PKI) (Tanenbaum 2003).
The possible vulnerabilities of TCP and IP computer networks, and
possible techniques to eliminate them, are considered. It seems that only a
general and multilayered security infrastructure could cope with possible
attacks on computer network systems.

8.2.1 POTENTIAL VULNERABILITIES OF COMPUTER NETWORKS

The IP is efficient, inexpensive, and flexible. However, the existing meth-


ods used to route IP packets leave them vulnerable to security risks, such
as spoofing, sniffing, and session hijacking, and provide no form of nonre-
pudiation for contractual or monetary transactions.
Organizations need to secure communications between remote offices,
business partners, customers, and travelling and telecommuting employees,
besides securing the internal environment. The transmission of messages
over the Internet or intranet poses a risk, given the lack of protection at the
existing Internet backbone.
Control and management of security and access between the entities
in a company’s business environment is important. Without security, both
public and private networks are susceptible to unauthorized monitoring and
access. Internal attacks might be a result of minimal or nonexistent intranet
security.
Risks from outside the private network originate from connections to
the Internet and extranets. Password-based user access controls alone do
not protect data transmitted across a network. Without security measures
and controls in place, the data might be subjected to attack.

Figure 8.1. General model for a cryptosystem (message source, encryption
of X into Y, decryption back into X at the destination, key generation, and
cryptanalysis).

Some attacks are passive, and the information is only monitored. Other
attacks are active, and the information is altered to corrupt or destroy the
data or the network itself.

8.3 PRINCIPLES OF CRYPTOGRAPHY


The general model for an encryption system is shown in Figure 8.1. The
original message, also called plaintext and denoted by the letter X , is con-
verted to an apparently random sequence called ciphertext and denoted by
the letter Y . The encryption process consists of an encipherment algorithm
that uses a key K, which is independent of the plaintext. The algorithm
produces a different output depending on the key used (Stallings 1999).
It is important to consider that a cryptanalyst, or hacker, could observe
the encrypted message and try to decode it. The algorithm must, therefore,
be powerful enough to resist the attack.
Encryption can be seen as a transformation, parametrized by K,
Y = E(X , K), (8.1)
which converts the original text X to the ciphertext Y , to be transmitted.
At the reception side, the inverse transformation is performed on the
ciphertext to produce the plaintext
X = D(Y , K). (8.2)
Of course, the destination must possess the key to be able to invert the
transformation. Therefore, the transmission of the key is an important part
of the cryptobusiness.
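As a concrete, if deliberately simple, instance of the pair of transformations E and D, the sketch below implements a symmetric scheme in which the key drives a repeating XOR keystream. This is not a construction taken from the text, only an illustration of the notation Y = E(X, K) and X = D(Y, K); the function names are invented and the scheme itself is far too weak for real use.

    from itertools import cycle

    def encrypt(plaintext: bytes, key: bytes) -> bytes:
        # Y = E(X, K): XOR the message with a repeating keystream
        # derived from K (illustration only, not a secure cipher).
        return bytes(x ^ k for x, k in zip(plaintext, cycle(key)))

    def decrypt(ciphertext: bytes, key: bytes) -> bytes:
        # X = D(Y, K): XOR is an involution, so applying the same
        # operation with the same key inverts the transformation.
        return bytes(y ^ k for y, k in zip(ciphertext, cycle(key)))

    X = b"attack at dawn"
    K = b"secret key"
    Y = encrypt(X, K)
    assert decrypt(Y, K) == X    # the destination recovers the plaintext

Both sides must share K, which is exactly the key distribution problem mentioned above.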

The security of a cryptosystem depends on certain assumptions:

• The encryption algorithm must be robust enough that it is difficult to
  decipher a message based only on the ciphertext.
• The key must be kept secret, and a secure channel should be pro-
vided for transmission.
• There is no need to keep the algorithm secret.

The application of cryptographic techniques depends on the knowledge
of the security of the employed system. A cryptographic system can be
classified according to:

1. How the plaintext is transformed into the ciphertext, as the encryption
   algorithms are based on two principles (Stallings 1999):

   a. Substitution, in which the individual symbols of the original
      text (bit, byte, letter, word) are mapped into other elements.
   b. Transposition, in which the elements of the original text are
      rearranged.

2. How the original text is processed, which involves a block cipher,
   which processes a block of plaintext at a time, or a convolutional
   cipher, also called a stream cipher, which processes the input symbols
   as they arrive at the encryption subsystem.
3. How the key is generated, which can lead to the production of
   a single, or symmetric, key to be used by both the transmitter and
   receiver, or to different keys for the sender and receiver, also called
   asymmetric or public key encryption.

8.4 INFORMATION THEORETICAL ASPECTS OF CRYPTOGRAPHY
It is possible to use the terminology of information theory to describe a
secure encryption system. The measure of information, or entropy, of a
plaintext, considered as a subset of symbols selected from the set of all
possible messages, is given by (van der Lubbe 1997)

$$H(X) = -\sum_{l=1}^{L} p(x_l) \log p(x_l), \qquad (8.3)$$

in which $p(x_l)$, $l = 1, 2, \ldots, L$, represent the probabilities of occurrence of the
possible plaintexts, considering that the texts are mutually independent.

The entropy of the set of keys is given by

$$H(K) = -\sum_{m=1}^{M} p(k_m) \log p(k_m), \qquad (8.4)$$

in which $k_m$, $m = 1, 2, \ldots, M$, are the possible keys.


By the same token, it is possible to introduce a measure of information
for the ciphertext,

$$H(Y) = -\sum_{n=1}^{N} p(y_n) \log p(y_n), \qquad (8.5)$$

in which $p(y_n)$, $n = 1, 2, \ldots, N$, represent the probabilities of occurrence of the
possible ciphertexts, considering that the texts are mutually independent.
Usually, regarding the bijection from the plaintext into the ciphertext set,
one considers N = L.
The conditional entropy, or key equivocation, a measure of the uncertainty
or information with respect to the key when the ciphertext is available,
is defined as

$$H(K|Y) = -\sum_{m=1}^{M}\sum_{n=1}^{N} p(k_m, y_n) \log p(k_m|y_n). \qquad (8.6)$$

The uncertainty with respect to the plaintext, or conditional entropy
when the ciphertext is available, also known as message equivocation, is
defined as

$$H(X|Y) = -\sum_{l=1}^{L}\sum_{n=1}^{N} p(x_l, y_n) \log p(x_l|y_n). \qquad (8.7)$$

The conditional entropy, or uncertainty with respect to the key, for a
given plaintext and corresponding ciphertext, is defined as

$$H(K|X,Y) = -\sum_{m=1}^{M}\sum_{l=1}^{L}\sum_{n=1}^{N} p(k_m, x_l, y_n) \log p(k_m|x_l, y_n), \qquad (8.8)$$

which is also known as key appearance equivocation.


Finally, the conditional entropy, or uncertainty with respect to the plain-
text, when the key and the ciphertext are known, is defined as


L
N
M
H (X |Y , K) = p(xm , yl , kn ) log p(xm |yl , km ). (8.9)
l=1 n=1 m=1

Since there is a bijection between the plaintext and ciphertext sets, it is


always true that
H (X |Y , K) = 0. (8.10)

As expected, when the ciphertext and the key are available it is possible
to retrieve the plaintext correctly, because the uncertainty with respect to
X is null. There is no loss of information whatsoever at the receiver, and
the full original message is recovered.
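These quantities are easy to compute for small, fully specified examples. The sketch below builds the joint distribution of a toy modular-shift cipher over a four-symbol alphabet and evaluates the equivocations just defined; the particular distributions are assumptions chosen here only so that the numbers can be checked by hand, and the cipher is not one discussed in the text.

    import math
    from collections import defaultdict

    # Toy cipher: x, k and y take values 0..3 and y = (x + k) mod 4.
    px = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}   # plaintext probabilities
    pk = {k: 0.25 for k in range(4)}              # uniformly distributed keys

    pxyk = defaultdict(float)
    for x, p1 in px.items():
        for k, p2 in pk.items():
            pxyk[(x, (x + k) % 4, k)] += p1 * p2  # joint distribution of (X, Y, K)

    def H(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def marginal(indices):
        out = defaultdict(float)
        for triple, p in pxyk.items():
            out[tuple(triple[i] for i in indices)] += p
        return out

    HY, HXY, HKY, HXYK = H(marginal([1])), H(marginal([0, 1])), H(marginal([1, 2])), H(pxyk)

    print("H(X|Y)   =", round(HXY - HY, 4))    # message equivocation, Equation 8.7
    print("H(K|Y)   =", round(HKY - HY, 4))    # key equivocation, Equation 8.6
    print("H(K|X,Y) =", round(HXYK - HXY, 4))  # key appearance equivocation, Equation 8.8
    print("H(X|Y,K) =", round(HXYK - HKY, 4))  # zero, in agreement with Equation 8.10

Because the key is uniform over all shifts, the run also illustrates the relation derived in the next section: H(K|X, Y) = H(K|Y) − H(X|Y) holds exactly for these numbers.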

8.4.1 RELATIONS BETWEEN THE ENTROPIES

A cipher system user requires that the key appearance equivocation, or


H (K|X , Y ), be as high as possible, because if a cryptanalyst manages
to obtain both the plaintext and the ciphertext, then the uncertainty with
respect to the key must be as large as possible.
This can be seen by the following argument. From the previous results
in information theory, the joint measure of information in the plaintext,
ciphertext, and key is

H (X , Y , K) = H (X |Y , K) + H (Y , K) = H (K|X , Y ) + H (X , Y ). (8.11)

Also, consider the following relations

H (Y , K) = H (K|Y ) + H (Y ), (8.12)

and
H (X , Y ) = H (X |Y ) + H (Y ). (8.13)

Combining the equations, one obtains

H (X |Y , K) + H (K|Y ) = H (K|X , Y ) + H (X |Y ). (8.14)

But, as discussed, H (X |Y , K) = 0, which results in

H (K|Y ) = H (K|X , Y ) + H (X |Y ), (8.15)

which leads to
H (K|X , Y ) = H (K|Y ) − H (X |Y ). (8.16)

Therefore, to obtain a large key appearance equivocation the message


equivocation H (X |Y ) must be small. However, a small message equivo-
cation implies that the uncertainty with respect to the plaintext, when the
ciphertext is known, is small. But this is what must be avoided by the
cryptosystem.
The uncertainty with respect to the key must be large to decrease the
uncertainty with respect to the plaintext, and an increase in the uncertainty
with respect to the plaintext decreases the uncertainty with respect to the
key (van der Lubbe 1997).

8.5 MUTUAL INFORMATION FOR CRYPTOSYSTEMS
The information theoretical approach provides additional results. The
mutual information of the plaintext and the ciphertext is defined as

I (X ; Y ) = H (X ) − H (X |Y ) = H (Y ) − H (Y |X ). (8.17)

As expected, the objective of the cryptodesigner is to minimize the


mutual information I (X ; Y ). If the ciphertext provides no information about
the original message, then

H (X |Y ) = H (X ),

and the mutual information between the plaintext and the encoded mes-
sage is zero, or I (X ; Y ) = 0. This is referred to as the absolutely secure
cryptosystem.
It is possible to obtain a lower limit for the mutual information between
the plaintext and the ciphertext. First, consider Equation 8.16, and since
H (K|X , Y ) ≥ 0, it follows that

H (K|Y ) ≥ H (X |Y ). (8.18)

The use of one of the properties of entropy gives

H (K) ≥ H (K|Y ), (8.19)

and therefore,
H (K) ≥ H (X |Y ). (8.20)
Substituting this last inequality into Equation 8.17 gives

I (X ; Y ) ≥ H (X ) − H (K). (8.21)

Inequality 8.21 implies that an increase in the uncertainty of the set of
keys improves, on average, the independence between the plaintext and the
ciphertext.
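The role of the key entropy in Inequality 8.21 can be made tangible with a small exact computation. In the sketch below, which is only an illustration with invented parameters, X is a uniform two-bit message and the key is either two independent bits (a one-time pad, H(K) = H(X)) or a single bit repeated over the message (H(K) < H(X)); the mutual information I(X; Y) is computed exactly in both cases.

    import math
    from collections import defaultdict
    from itertools import product

    def mutual_information(key_bits):
        # X is a uniform 2-bit message; the keystream repeats the key
        # bits to cover the message; Y is the bitwise XOR of the two.
        pxy = defaultdict(float)
        for x in product((0, 1), repeat=2):
            for k in product((0, 1), repeat=key_bits):
                y = tuple(xi ^ k[i % key_bits] for i, xi in enumerate(x))
                pxy[(x, y)] += (1 / 4) * (1 / 2 ** key_bits)
        px, py = defaultdict(float), defaultdict(float)
        for (x, y), p in pxy.items():
            px[x] += p
            py[y] += p
        return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

    print(mutual_information(2))   # H(K) = H(X): I(X;Y) = 0, absolute security
    print(mutual_information(1))   # H(K) = 1 < H(X) = 2: I(X;Y) = 1 bit

The second case meets the lower bound of Inequality 8.21 with equality: the ciphertext reveals exactly H(X) − H(K) = 1 bit about the message.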
It is implicit in the derivation that absolute security of a cryptosystem
can only be achieved for

H (K) ≥ H (X ). (8.22)

For uniformly distributed keys,

$$H(K) = -\sum_{m=1}^{M} p(k_m) \log p(k_m) = \log(M), \qquad (8.23)$$

and for uniformly distributed plaintexts,

$$H(X) = -\sum_{l=1}^{L} p(x_l) \log p(x_l) = \log(L). \qquad (8.24)$$

Therefore,

$$H(K) = \log(M) \geq H(X) = \log(L), \quad \text{or} \quad M \geq L, \qquad (8.25)$$

because of the monotonicity of the logarithm function.


The last condition implies that the key must be at least the same length
as the message to be transmitted. However, the inequality can be avoided
with the introduction of security events. When a security event happens the
system can be considered absolutely secure. If the event does not occur the
cryptosystem may not be fully secure (Maurer 1989; Massey 1990).
Even when there is a small chance of a security breach, an absolutely
secure cipher is still feasible, for which H (K) < H (X ). A security event
could be the transmission of a certain key K, followed by the plaintext,
provided that the hacker does not type the secret key at the beginning.
Without typing K, the sequence is obscure to the intruder, and the system
is secure. If the hacker happens to type K, the ciphertext can be deciphered,
and the system has become unreliable.
Therefore, the introduction of the concept of security event gives a new
way of regarding the system security based on information theorems. Infor-
mation security can be assured, even if the length of the key is smaller than
the original sequence, given that a security event is defined for the system,
and provided that the event actually occurs (van der Lubbe 1997).
APPENDIX A

PROBABILITY THEORY

A.1 SET THEORY AND MEASURE


Georg Cantor (1845–1918) developed the modern theory of sets at the end
of the 19th century, and established the mathematical and logical basis for
the theory and demonstrated several important results. The concept of set
cardinality, defined by Cantor, was fundamental to the theory. Cantor was
born in Saint Petersburg, but lived most of his academic life in the city of
Halle, Germany (Boyer 1974). The basic ideas of universal set, empty set,
set partition, discrete systems, continuous systems, and infinity are as old
as humanity itself.
Over the years, philosophers and mathematicians had tried to charac-
terize the infinite, with no success. In 1872, J. W. R. Dedekind (1831–1916)
indicated the universal property of infinite sets. He stated that a set is called
infinite when it is similar to a part of itself; otherwise the set is finite
(Boyer 1974).
Cantor also investigated the properties of infinite sets but, different from
Dedekind, he noticed that infinite sets are not all alike. This led to the
concept of cardinal numbers, to establish a hierarchy of infinite sets in
accordance with their respective powers. Cantor's ideas established set
theory as a complete subject. As a consequence of his published results on
transfinite arithmetic, considered advanced for his time, Cantor suffered
attacks from mathematicians like Leopold Kronecker (1823–1891), who
barred him from a position at the University of Berlin.
Cantor found a position at the ancient and small University of Halle,
in the medieval city of Halle in Germany famous for its mines of rock
salt, and died there in an institution for mental health treatment, following
his attempts to apply his theory to justify religious paradigms in scientific
events.

A.1.1 BASIC SET THEORY

The notion of a set, as simple as it can be, is axiomatic, in the sense that
it does not admit a definition that does not resort to the original notion
of a set. The mathematical concept of a set is fundamental for all known
mathematics, and is used to build important concepts, such as relation,
cartesian product and function. It is also the basis for the modern measure
theory, developed by Henri Lebesgue (1875–1941).
The set theory is based on a set of fundamental axioms: Axiom of
Extension, Axiom of Specification, Peano’s Axioms, Axiom of Choice,
besides Zorn’s Lemma, and Schröder-Bernstein’s Theorem (Halmos,
1960).
The objective of this section is to present the theory of sets in an informal
manner, just quoting those fundamental axioms, since this theory is used
as a basis to establish a probability measure. Some examples of common
sets are given in the following.

The binary set: B = {0, 1};


The set of natural numbers, including zero: N = {0, 1, 2, 3, . . . };
The set of odd numbers: O = {1, 3, 5, 7, 9, . . . };
The set of integer numbers: Z = {. . . , −3, −2, −1, 0, 1,
2, 3, . . . };
The set of real numbers: R = ( − ∞, ∞).

There are two really important relations in set theory: the belonging
relation, denoted as a ∈ A, in which a is an element of the set A, and the
inclusion relation, A ⊂ B, which is read “A is a subset of the set B”, or B
is a superset of the set A. Sets may also be specified by propositions. For
example, the empty set can be written as ∅ = {a | a ≠ a}, that is, the set in
which the elements are different from themselves.
A universal set contains all other sets of interest. An example of a uni-
versal set is provided by the sample space in probability theory, usually
denoted as S or Ω. The empty set is that set which contains no element,
usually denoted as ∅ or { }. It is implicit that the empty set is contained in
any set, that is, ∅ ⊂ A, for any given set A. However, the empty set is not in
general an element of any other set.
A usual way to represent sets is by means of the Venn diagram, as
illustrated in Figure A.1.
Sets are said to be disjoint if they have no element in common, as
illustrated in Figure A.2. Thus, for example, the set of even natural numbers
and the set of odd natural numbers are disjoint.
Figure A.1. A Venn diagram that represents two intersecting sets.

Figure A.2. A Venn diagram representing disjoint sets.

A.1.2 SOME OPERATIONS ON SETS

It is possible to operate on sets to produce new sets, or families of sets.


The basic set operations are the complement, the union, the intersection,
the subtraction, and the symmetric difference.

• The operation $\overline{A}$ represents the complement of A with respect to the
  sample space Ω;
• The union of two sets is composed of elements that belong to A or
  to B, and is written as A ∪ B;
• The intersection of two sets is composed of elements that belong to
  A and to B, and is written as A ∩ B;
• The subtraction of sets, denoted by C = A − B, gives as a result the
  set in which the elements belong to A and do not belong to B.
  Note: If B is completely contained in A then A − B = A ∩ $\overline{B}$;
• The symmetric difference is defined as the set of elements that
  belong to A or to B, but do not belong to (A ∩ B). It is written
  commonly as A Δ B = A ∪ B − A ∩ B.
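These operations map directly onto the set type of most programming languages. The short Python fragment below, included here only as an informal illustration with arbitrarily chosen sets, mirrors the list above.

    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    omega = set(range(10))    # a universal set for this example

    print(omega - A)          # complement of A with respect to omega
    print(A | B)              # union A ∪ B
    print(A & B)              # intersection A ∩ B
    print(A - B)              # subtraction A − B
    print(A ^ B)              # symmetric difference (A ∪ B) − (A ∩ B)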

The generalization of these concepts to families of sets, as for example
$\cup_{i=1}^{N} A_i$ and $\cap_{i=1}^{N} A_i$, is immediate. The following properties are usually
employed as axioms in developing the theory of sets (Lipschutz 1968).

• Idempotent
A ∪ A = A, A ∩ A = A
• Associative
(A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Commutative
A ∪ B = B ∪ A, A ∩ B = B ∩ A
• Distributive
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Identity
A ∪ ∅ = A, A ∩ U = A
A ∪ U = U, A ∩ ∅ = ∅
• Complementary
  $A \cup \overline{A} = U$, $A \cap \overline{A} = \emptyset$, $\overline{(\overline{A})} = A$
  $\overline{U} = \emptyset$, $\overline{\emptyset} = U$
• de Morgan laws
  $\overline{A \cup B} = \overline{A} \cap \overline{B}$, $\overline{A \cap B} = \overline{A} \cup \overline{B}$

A.1.3 FAMILIES OF SETS

The concept of family is important to characterize finite or infinite combi-
nations of sets. The increasing sequence of sets, such that $\lim_{i\to\infty} \cup A_i = A$,
is one of the most useful families of sets. This sequence is used in proofs
of limits over sets.
The decreasing sequence of sets is defined in a similar manner, as
$\lim_{i\to\infty} \cap A_i = A$, and is also used in proofs of limits over sets.

Figure A.3. Increasing sequence of sets.

Figure A.4. Decreasing sequence of sets.

A.1.4 INDEXING OF SETS

The Cartesian product is useful to express the idea of indexing of sets.


Indexing expands the possibilities for the use of sets, and permits to gen-
erate new entities, such as vectors and signals.
Example: consider $A_i = \{0, 1\}$. Starting from this set it is possible to con-
struct an indexed sequence of sets by defining its indexing: $\{A_i\}_{i \in I}$, $I =
\{0, \ldots, 7\}$. This family of indexed sets $A_i$ constitutes a finite discrete
sequence, that is, a vector.
Example: again, let $A_i = \{0, 1\}$, but now use $I = Z$, the set of positive
and negative integers plus zero. It follows that $\{A_i\}_{i \in Z}$, which represents an
infinite series of 0s and 1s, that is, it represents a binary digital signal. For
example, $\cdots 0011111000 \cdots$.
Example: using the same set $A_i = \{0, 1\}$, but now indexing over the set of
real numbers, $\{A_i\}_{i \in I}$, in which $I = R$, it is possible to form a signal which
is discrete in amplitude but continuous in time.
Example: considering $A = R$ and $I = R$, the resulting set represents an
analog signal—a signal that is continuous in time and in amplitude.

A.1.5 AN ALGEBRA OF SETS

For the construction of an algebra of sets or, equivalently, for the con-
struction of a field over which operations involving sets make sense, a few
properties have to be obeyed.

1. If $A \in F$ then $\overline{A} \in F$. A is the set containing desired results, or over
   which one wants to operate;
2. If $A \in F$ and $B \in F$ then $A \cup B \in F$.
The properties guarantee the closure of the algebra with respect to finite
operations over sets. It is noticed that the universal set always belongs to
the algebra, that is, $\Omega \in F$, because $\Omega = A \cup \overline{A}$. The empty set also belongs
to the algebra, that is, $\emptyset \in F$, since $\emptyset = \overline{\Omega}$, which follows by property 1.
Example: the family $\{\emptyset, \Omega\}$ complies with the above properties and there-
fore represents an algebra. In this case, $\emptyset = \{\}$ and $\overline{\emptyset} = \Omega$. The union is also
represented, as can be easily checked.
Example: given the sets $\{C_H\}$ and $\{C_T\}$, representing the faces of a coin,
respectively, if $\{C_H\} \in F$ then $\overline{\{C_H\}} = \{C_T\} \in F$. It follows that $\{C_H, C_T\} \in F \Rightarrow \Omega \in F \Rightarrow \emptyset \in F$.

The previous example can be explained by the following argument. If
there is a measure for heads then there must also be a measure for tails, if the
algebra is to be properly defined. Whenever a probability is assigned to an
event then a probability must also be assigned to the complementary event.
The cardinality of a finite set is defined as the number of elements
belonging to this set. Sets with an infinite number of elements are said to
have the same cardinality if they are equivalent, that is, $A \sim B$ if there is a
one-to-one correspondence between A and B.
Some examples of sets and their respective cardinals are presented next.

• I = {1, . . . , k} ⇒ CI = k
• N = {0, 1, . . .} ⇒ CN or ℵ0
• Z = {. . . , −2, −1, 0, 1, 2, . . .} ⇒ CZ
• Q = {. . . , −1/3, 0, 1/3, 1/2, . . .} ⇒ CQ
• R = ( − ∞, ∞) ⇒ CR or ℵ

For the above examples the following relations are verified: CR > CQ =
CZ = CN > CI . The notation ℵ0 , for the cardinality of the set of natural
numbers was first employed by Cantor.
The cardinality of the power set, that is, of the family of sets consisting
of all subsets of a given set I, $F = 2^I$, is $2^{C_I}$.

A.1.6 THE BOREL ALGEBRA

The Borel algebra, established by Félix Édouard Justin Émile Borel (1871–
1956), and written as B, or σ-algebra, is an extension of the algebra so far
discussed to operate with limits at infinity. The following properties are
required from a σ-algebra.

1. $A \in B \Rightarrow \overline{A} \in B$
2. $A_i \in B \Rightarrow \cup_{i=1}^{\infty} A_i \in B$

The properties guarantee the closure of the σ -algebra with respect to


enumerable operations over sets. They allow the definition of limits in the
Borel field.
Example: considering the above properties it can be verified that $A_1 \cap A_2 \cap A_3 \cdots \in B$. In effect, it is sufficient to notice that

$$A \in B \ \text{ and } \ B \in B \Rightarrow A \cup B \in B,$$

and

$$\overline{A} \in B \ \text{ and } \ \overline{B} \in B \Rightarrow \overline{A} \cup \overline{B} \in B,$$

and finally

$$\overline{\overline{A} \cup \overline{B}} = A \cap B \in B.$$
In summary, any combination of unions and intersections of sets belongs
to the Borel algebra. In other words, operations of union or intersection of
sets, or a combination of these operations, produce a set that belongs to the
σ -algebra.

A.2 BASIC PROBABILITY THEORY


The first known published book on probability is De Ludo Aleae (About
Games of Chance) by the Italian medical doctor and mathematician Giro-
lamo Cardano (1501–1576), which came out in 1663, almost 90 years after
his death. This book was a handbook for players, containing some discus-
sion on probability.
The first mathematical treatise about the Theory of Probability, pub-
lished in 1657, was written by the Dutch scientist Christian Huygens (1629–
1695), a booklet titled De Ratiociniis in Ludo Aleae (About Reasoning in
Abraham de Moivre (1667–1754) was an important mathematician who
worked on the development of Probability Theory. He wrote a book of great
influence in his time called Doctrine of Chances. The law of large numbers
was discussed by Jacques Bernoulli (1654–1705), Swiss mathematician,
in his work Ars Conjectandi (The Art of Conjecturing).
The study of probability was improved in the 18th and 19th centuries;
worthy of mention are the works of the French mathematicians Pierre-
Simon de Laplace (1749–1827) and Siméon Poisson (1781–1840), as well
as the German mathematician Karl Friedrich Gauss (1777–1855).

A.2.1 THE AXIOMS OF PROBABILITY

The basic axioms of probability were established by Andrei Nikolaevich


Kolmogorov (1903–1987), and allowed the development of the complete
theory. The three statements are as follows (Papoulis 1983):

1. Axiom 1 – $P(\Omega) = 1$, in which $\Omega$ denotes the sample space or uni-
   versal set and $P(\cdot)$ denotes the associated probability;
2. Axiom 2 – P(A) ≥ 0, in which A denotes an event belonging to the
sample space;
3. Axiom 3 – P(A ∪ B) = P(A) + P(B), in which A and B are mutually
exclusive events and A ∪ B denotes the union of events A and B.

Kolmogorov established a firm mathematical basis on which other the-


ories rely, including the Theory of Stochastic Processes, the Communica-
tions Theory, and the Information Theory, that use his axiomatic approach
to Probability Theory.
Kolmogorov’s fundamental work was published in 1933 in Russian,
and soon afterwards was translated to German with the title Grundbegriffe
der Wahrscheinlichkeits Rechnung (Fundamentals of Probability Theory)
(James 1981). In this work, Kolmogorov managed to combine Advanced
Set Theory, developed by Cantor, with Measure Theory, established by
Lebesgue, to produce what to this date is the modern approach to Proba-
bility Theory.
The application of the axioms makes it possible to deduce all results
relative to Probability Theory. For example, the probability of the empty
set, ∅ = {}, is easily calculated as follows. First it is noticed that

$$\emptyset \cup \Omega = \Omega,$$

since the sets $\emptyset$ and $\Omega$ are disjoint. Thus, it follows that

$$P(\emptyset \cup \Omega) = P(\Omega) = P(\emptyset) + P(\Omega) = 1 \Rightarrow P(\emptyset) = 0.$$

In the case of sets A and B which are not disjoint, it follows that

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

A.2.2 BAYES’ RULE

Bayes’ rule, which is essential for the development of Information Theory,


concerns the computation of conditional probabilities and can be expressed
by the following definition,
P(A ∩ B)
P(A|B) = ,
P(B)
assuming P(B) = 0.
An equivalent manner of expressing the same result is the following,

P(A ∩ B) = P(A|B) · P(B), P(B) = 0.

Some important properties of sets are presented next, in which A and B


denote events from a given sample space.

• If A is independent of B, then P(A|B) = P(A). It then follows that


P(B|A) = P(B) and that B is independent of A.
• If B ⊂ A, then P(A|B) = 1.
• If $A \subset B$, then $P(A|B) = \frac{P(A)}{P(B)} \geq P(A)$.
• If A and B are independent events then P(A ∩ B) = P(A) · P(B).
• If P(A) = 0 or P(A) = 1, then event A is independent of itself.
• If P(B) = 0, then P(A|B) can assume any arbitrary value. Usually
in this case one assumes P(A|B) = P(A).
• If events A and B are disjoint, and nonempty, then they are depen-
dent.

A partition is a possible splitting of the sample space into a family of


subsets, in a manner that the subsets in this family are disjoint and their
union coincides with the sample space. It follows that any set in the sample
space can be expressed by using a partition of that sample space, and thus
be written as a union of disjoint events.
The following property can be illustrated by means of the Venn diagram
of Figure A.5.

Figure A.5. Partition of set B by a family of sets {Ai}.

$$B = B \cap \Omega = B \cap \left(\cup_{i=1}^{N} A_i\right) = \cup_{i=1}^{N} (B \cap A_i).$$

It now follows that

$$P(B) = P\left(\cup_{i=1}^{N} (B \cap A_i)\right) = \sum_{i=1}^{N} P(B \cap A_i),$$

$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{i=1}^{N} P(B \cap A_i)} = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{i=1}^{N} P(B|A_i) \cdot P(A_i)}.$$
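A small numerical illustration of the law of total probability and of Bayes' rule, with arbitrarily chosen (hypothetical) numbers, is given below.

    # Hypothetical partition of the sample space into three events A_i,
    # with priors P(A_i) and conditional probabilities P(B|A_i).
    P_A = [0.5, 0.3, 0.2]
    P_B_given_A = [0.1, 0.4, 0.8]

    # Law of total probability: P(B) = sum of P(B|A_i) P(A_i).
    P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

    # Bayes' rule: P(A_i|B) = P(B|A_i) P(A_i) / P(B).
    P_A_given_B = [pb * pa / P_B for pb, pa in zip(P_B_given_A, P_A)]

    print(P_B)              # 0.33
    print(P_A_given_B)      # posterior probabilities, which sum to one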

A.3 RANDOM VARIABLES


A random variable (r.v.) X represents a mapping of the sample space on
the line, that is, the set of real numbers. A random variable is usually
characterized by a cumulative probability function (CPF) PX (x), or by a
probability density function (pdf) pX (x).

Example: a random variable with a uniform pdf, in the interval
[0, 1], is described by the formula $p_X(x) = u(x) - u(x-1)$. It follows, by
Axiom 1, that

$$\int_{-\infty}^{+\infty} p_X(x)\,dx = 1. \qquad (A.1)$$

In general, for a given probability distribution, the probability that X
belongs to the interval (a, b] is given by

$$P(a < x \leq b) = \int_a^b p_X(x)\,dx. \qquad (A.2)$$

The cumulative probability function $P_X(x)$, of a random variable X, is
defined as the integral of $p_X(x)$,

$$P_X(x) = \int_{-\infty}^{x} p_X(t)\,dt. \qquad (A.3)$$

A.3.1 EXPECTED VALUE OF A RANDOM VARIABLE

Let f(X) denote a function of a random variable X. The average value, or
expected value, of the function f(X) with respect to X is defined as

$$E[f(X)] = \int_{-\infty}^{+\infty} f(x)\,p_X(x)\,dx. \qquad (A.4)$$

The following properties of the expected value follow from (A.4).

E[αX ] = αE[X ], (A.5)


E[X + Y ] = E[X ] + E[Y ] (A.6)

and if X and Y are independent random variables then

E[XY ] = E[X ] · E[Y ]. (A.7)

A.3.2 MOMENTS OF A RANDOM VARIABLE

The kth moment of a random variable X is defined as

$$m_k = E[X^k] = \int_{-\infty}^{+\infty} x^k\,p_X(x)\,dx. \qquad (A.8)$$

Various moments of X have special importance and physical interpre-


tation, as defined in the following:

• E[X ], arithmetic mean, average value, average voltage, statistical


mean;
• E[X 2 ], quadratic mean, total power;
• E[X 3 ], measure of asymmetry of the pdf;
• E[X 4 ], measure of flatness of the pdf.

A.3.3 VARIANCE OF A RANDOM VARIABLE

The variance of a random variable X is an important quantity in commu-


nication theory, usually meaning AC power, and defined as follows,

V [X ] = σX2 = E[(X − E[X ])2 ]. (A.9)

The standard deviation σX is defined as the square root of the variance of X .
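For a quick numerical check of these definitions, the fragment below estimates the first moments and the variance of a uniform random variable on [0, 1] by simple Monte Carlo sampling; the sample size and seed are arbitrary choices made for this illustration.

    import random

    # For a uniform random variable on [0, 1]: E[X] = 1/2, E[X^2] = 1/3
    # and V[X] = 1/12.
    random.seed(1)
    samples = [random.random() for _ in range(100000)]

    m1 = sum(samples) / len(samples)                 # first moment (mean)
    m2 = sum(x * x for x in samples) / len(samples)  # second moment (total power)
    var = m2 - m1 ** 2                               # variance, equivalent to Equation A.9

    print(round(m1, 3), round(m2, 3), round(var, 4)) # approximately 0.5, 0.333, 0.0833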

A.3.4 CHARACTERISTIC FUNCTION

The characteristic function PX (w), also called moment generating func-


tion, of a random variable X is usually defined based on the Fourier trans-
form of the pdf of X , which is equivalent to substitute f (x) = e−jωx in (A.4),
that is,
$$P_X(\omega) = E[e^{-j\omega X}] = \int_{-\infty}^{+\infty} e^{-j\omega x}\,p_X(x)\,dx, \ \text{ in which } j = \sqrt{-1}. \qquad (A.10)$$

The statistical moments of a random variable X can also be obtained
directly from the characteristic function, as follows,

$$m_i = \frac{1}{(-j)^i}\left.\frac{\partial^i P_X(\omega)}{\partial \omega^i}\right|_{\omega=0}. \qquad (A.11)$$

Given that X is a random variable, it follows that Y = f (X ) is also a


random variable, obtained by the application of the transformation f ( · ).
The pdf of Y is related to that of X by the formula (Blake 1987)
pX (x)
pY (y) = , (A.12)
|dy/dx| x=f −1 (y)
in which f −1 ( · ) denotes the inverse function of f ( · ).This formula assumes
the existence of the inverse function of f ( · ) as well as its derivative in all
points.

A.3.4.1 Two Important Distributions

1. Gaussian random variable


The random variable X with pdf
$$p_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\frac{(x-m_X)^2}{2\sigma_X^2}} \qquad (A.13)$$
is called a Gaussian (or Normal) random variable. The Gaussian
random variable plays an extremely important role in engineering,
considering that many well-known processes can be described or
approximated by this pdf. The noise present in either analog or dig-
ital communications systems usually can be considered Gaussian
as a consequence of the influence of many factors (Leon-Garcia
1989). In Formula (A.13), mX represents the average value and σX2
represents the variance of X .
2. Sinusoidal random variable

A sinusoidal tone X (t) = V cos (ω0 t + φ), in which V represents a


constant amplitude, ω0 is a constant frequency, and φ is a uniformly
distributed random variable, has the following pdf
$$p_X(x) = \frac{1}{\pi\sqrt{V^2 - x^2}}, \qquad |x| < V. \qquad (A.14)$$
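A direct way to become comfortable with a density such as A.13 is to evaluate it numerically and verify that it integrates to one, as required by Axiom 1. The following sketch does this with a crude Riemann sum; the integration range and step are arbitrary choices for the illustration.

    import math

    def gaussian_pdf(x, m=0.0, sigma=1.0):
        # Equation A.13 with mean m and standard deviation sigma.
        return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    dx = 0.001
    area = sum(gaussian_pdf(-8 + i * dx) * dx for i in range(16000))
    print(round(area, 6))    # approximately 1.0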

A.3.5 JOINT RANDOM VARIABLES

Consider that X and Y represent a pair of real random variables, with joint
pdf pXY (x, y), as illustrated in Figure A.6. The probability of x and y being
simultaneously in the region defined by the polygon [abcd] is given by the
expression (Alencar 2014).
$$\mathrm{Prob}(a < x < b, \ c < y < d) = \int_a^b \int_c^d p_{XY}(x, y)\,dx\,dy. \qquad (A.15)$$

The individual pdf ’s of X and Y , also called marginal distributions,


result from the integration of the joint pdf as follows,
$$p_X(x) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\,dy, \qquad (A.16)$$

and

$$p_Y(y) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\,dx. \qquad (A.17)$$

Figure A.6. Joint probability density function.

The joint average E[f (X , Y )] is calculated as


$$E[f(X, Y)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\,p_{XY}(x, y)\,dx\,dy, \qquad (A.18)$$

for an arbitrary function f (X , Y ) of X and Y .


The joint moments $m_{ik}$, of order ik, are calculated as

$$m_{ik} = E[X^i Y^k] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^i y^k\,p_{XY}(x, y)\,dx\,dy. \qquad (A.19)$$

The two-dimensional characteristic function is defined as the two-


dimensional Fourier transform of the joint probability density pXY (x, y)

PXY (ω, ν) = E[e−jωX −jνY ]. (A.20)

When the sum Z = X + Y of two statistically independent r.v.’s is con-


sidered, it is noticed that the characteristic function of Z turns out to be

PZ (ω) = E[e−jωZ ] = E[e−jω(X +Y ) ] = PX (ω) · PY (ω). (A.21)

As far as the pdf of Z is concerned, it can be said that



$$p_Z(z) = \int_{-\infty}^{\infty} p_X(\rho)\,p_Y(z - \rho)\,d\rho, \qquad (A.22)$$

or

$$p_Z(z) = \int_{-\infty}^{\infty} p_X(z - \rho)\,p_Y(\rho)\,d\rho. \qquad (A.23)$$

Equivalently, the sum of two statistically independent r.v.’s has a pdf


given by the convolution of the respective pdf ’s of the r.v.’s involved in
the sum.
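The convolution property is easy to observe experimentally. The sketch below, a simple Monte Carlo illustration with arbitrary sample sizes, compares a histogram of the sum of two independent uniform random variables on [0, 1] with the triangular density obtained by convolving the two rectangular pdfs.

    import random

    random.seed(2)
    z = [random.random() + random.random() for _ in range(200000)]

    # Analytic convolution of two unit rectangles: p_Z(z) = z for
    # 0 <= z <= 1 and p_Z(z) = 2 - z for 1 < z <= 2.
    for center in (0.25, 0.75, 1.0, 1.5, 1.75):
        width = 0.05
        estimate = sum(1 for v in z if abs(v - center) < width / 2) / (len(z) * width)
        analytic = center if center <= 1 else 2 - center
        print(center, round(estimate, 2), analytic)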
The random variables X and Y are called uncorrelated if $E[XY] = E[X] \cdot E[Y]$.
The criterion of statistical independence of random variables, which
is a stronger condition than uncorrelatedness, is satisfied if $p_{XY}(x, y) = p_X(x) \cdot p_Y(y)$.
REFERENCES

Abramowitz, M., and I.A. Stegun, eds. 1965. Handbook of Mathematical


Functions. New York: Dover Publications Inc.
Abramson, N. 1963. Information Theory and Coding. New York:
McGraw-Hill.
Aczél, J., and Z. Daróczy. 1975. On Measures of Information and Their
Caracterizations. New York: Academic Press.
Adke, S.R., and S.M. Manjunath. 1984. An Introduction to Finite Markov
Processes. New Delhi: John Wiley and Sons.
Ahlswede, R. 1971. “Multi-way Communication Channels.” In Proceed-
ings of the 2nd International Symposium on Information Theory,
pp. 103–35. Tsahkad-sor, USSR: Akadémiai Kiadó..
Alencar, M.S. 1992a. “The Capacity Region for the Gaussian Channel with
Random Multipath.” In Proceedings of the Third IEEE International
Symposium on Personal, Indoor and Mobile Radio Communications –
PIMRC’92, pp. 483–87. Boston, MA.
Alencar, M.S. 1992b. “The Capacity Region of the Non-Cooperative Multi-
pleAccess Channel with Rayleigh Fading.” In Proceedings of theAnnual
Conference of the Canadian Institute for Telecommunications Research,
pp. 45–46. Sainte Adèle, Canada.
Alencar, M.S. 2014. Teoria de Conjuntos, Medida e Probabilidade. São
Paulo, Brasil: Editora Érica Ltda. ISBN 978-85-365-0715-6.
Alencar, M.S., and I.F. Blake. 1993a. “Analysis of the Capacity for a
Markovian Multiple Access Channel.” In IEEE Pacific Rim Confer-
ence on Communications, Computers and Signal Processing, pp. 81–84.
Victoria, Canada.
Alencar, M.S., and I.F. Blake. 1993b. “Analysis of the Capacity Region
for the Non-Cooperative Multiaccess Channel with Rician Fading.” In
IEEE International Conference on Communications – ICC’93, pp. 282–
86. Geneva, Switzerland.


Alencar, M.S., and I.F. Blake. 1993c. “The Capacity of a Code Division
Multiple Access Channel.” In 1993 International Symposium on
Communications – ISCOM’93, Vol. 1, pp. 1.1–1.9. Hsinchu, Taiwan.
Ash, R.B. 1965. Information Theory. New York: Dover Publications Inc.
Bierbaum, M., and H.-M. Wallmeier. 1979. “A Note on the Capacity Region
of the Multiple Access Channel.” IEEE Transactions on Information
Theory 25, no. 4, p. 484. doi: http://dx.doi.org/10.1109/tit.1979.
1056064
Blahut, R.E. 1987. Principles and Practice of InformationTheory. Reading,
MA: Addison-Wesley Publishing Co.
Blahut, R.E. 1990. Digital Transmission of Information. Reading, MA:
Addison-Wesley Publishing Co.
Blake, I.F. 1987. An Introduction to Applied Probability. Malabar, FL:
Robert E. Krieger Publishing Co.
Blake, I.F., and R.C. Mullin. 1976. An Introduction to Algebraic and Com-
binatorial Coding Theory. New York: Academic Press Inc.
Borth, D.E., and M.B. Pursley. 1979. “Analysis of Direct-Sequence Spread-
Spectrum Multiple-Access Communications OverRician Fading Chan-
nels.” IEEE Transactions on Communications 27, no. 10, pp. 1566–77.
doi: http://dx.doi.org/10.1109/tcom.1979.1094291
Boyer, C. 1974. História da Matemática. São Paulo, Brasil: Editora Edgard
Blucher Ltda.
Brady, D.P. 1991. “A Semiclassical Analysis of Optical Code Division
Multiple Access.” IEEE Transactions on Communications 39, no. 1,
pp. 85–93. doi: http://dx.doi.org/10.1109/26.68279
Cheng, R.S., and S. Verdú. 1991. “The Effect of Asynchronism on the
Total Capacity of Gaussian Multiple-access Channels.” In Proceedings
of the IEEE International Symposium on Information Theory, p. 211.
Budapest, Hungary.
Chung, H., and P.V. Kumar. 1990. “Optical Orthogonal Codes – New
Bounds and an Optimal Construction.” IEEE Transactions on Infor-
mation Theory 36, no. 4, pp. 866–73. doi: http://dx.doi.org/10.1109/
18.53748
Cook, C.E., F.W. Ellersick, L.B. Milstein, and D.L. Schilling. 1983. Spread-
Spectrum Communications. New York: IEEE Press.
Cook, C.E., and H.S. Marsh. 1983. “An Introduction to Spread Spec-
trum.” IEEE Communications Magazine 21, no. 2, pp. 8–16. doi: http://
dx.doi.org/10.1109/mcom.1983.1091346
Cover, T.M. 1972. “Broadcast Channels.” IEEE Transactions on Infor-
mation Theory 18, no. 1, pp. 2–14. doi: http://dx.doi.org/10.1109/tit.
1972.1054727

Cover, T.M., R.J. McEliece, and E.C. Posner. 1981. “Asynchronous


Multiple-Access Channel Capacity.” IEEE Transactions on Information
Theory 27, no. 4, pp. 409–13. doi: http://dx.doi.org/10.1109/tit.1981.
1056382
Csiszár, I., and J. Kórner. 1981. Information Theory: Coding Theorems for
Discrete Memoryless Systems. New York: Academic Press.
Dixon, R.C. 1984. Spread Spectrum Systems. New York: John Wiley &
Sons.
Dobrushin, R.L. 1961. “Mathematical Problems in the Shannon Theory of
Optimal Coding of Information.” In Proceedings of the 4th Berkeley
Symposium on Mathematics, Statistics and Probability, Vol. 1, pp. 211–
52. Berkeley, CA.
Elliott, E.O. 1965. “A Model of the SwitchedTelephone Network for Data
Communications.” The Bell System Technical Journal 44, no. 1, pp. 88–
109. doi: http://dx.doi.org/10.1002/j.1538-7305.1965.tb04139.x
Ericson, T. 1970. “A Gaussian Channel with Slow Fading.” IEEE Trans-
actions on Information Theory 16, no. 3, pp. 353–55. doi: http://dx.doi.
org/10.1109/tit.1970.1054448
Feinstein, A. 1958. Foundations of Information Theory. New York:
McGraw-Hill Book Company Inc.
Fritchman, B.D. 1967. “A Binary Channel Characterization Using Parti-
tioned Markov Chains.” IEEE Transactions on Information Theory 13,
no. 2, pp. 221–27. doi: http://dx.doi.org/10.1109/tit.1967.1053975
Gaarder, N.T., and J.K. Wolf. 1975. “The Capacity of a Multiple-Access
Discrete Memoryless Channel Can Increase with Feedback.” IEEETrans-
actions on Information Theory 21, no. 1, pp. 100–02. doi: http://
dx.doi.org/10.1109/tit.1975.1055312
Gallager, R.G. 1968. Information Theory and Reliable Communication.
New York: John Wiley and Sons Inc.
Gallager, R.G. 1985. “A Perspective on Multiaccess Channels.”IEEETrans-
actions on Information Theory 31, no. 2, pp. 124–42. doi: http://
dx.doi.org/10.1109/tit.1985.1057022
Geraniotis, E.A., and B. Ghaffari. 1991. “Performance of Binary and Qua-
ternary Direct-Sequence Spread Spectrum Multiple-Access Systems
with Random Signature Sequences.” IEEE Transactions on Communi-
cations 39, no. 5, pp. 713–24. doi: http://dx.doi.org/10.1109/26.87162
Geraniotis, E.A., and M.B. Pursley. 1985. “Performance of Coherent
Direct-Sequence Spread Spectrum Communications Over Specular
Multipath Fading Channels.” IEEE Transactions on Communications
33, no. 6, pp. 502–08. doi: http://dx.doi.org/10.1109/tcom.1985.
1096335

Geraniotis, E.A., and M.B. Pursley. 1986. “Performance of Noncoher-


ent Direct-Sequence Spread Spectrum Communications over Specular
Multipath Fading Channels.” IEEE Transactions on Communications
34, no. 6, pp. 219–26. doi: http://dx.doi.org/10.1109/tcom.1986.
1096528
Gilbert, E.N. 1960. “Capacity of a Burst-Noise Channel.” The Bell System
Technical Journal 39, no. 5, pp. 1253–65. doi: http://dx.doi.org/10.1002/
j.1538-7305.1960.tb03959.x
Gilhousen, K.S., I.M. Jacobs, R. Padovani, A.J. Viterbi, L.A. Weaver, and
C.E. Wheatley. 1991. “On the Capacity of a Cellular CDMA System.”
IEEE Transactions on Vehicular Technology 40, no. 2, pp. 303–12. doi:
http://dx.doi.org/10.1109/25.289411
Gold, R. 1967. “Optimal Binary Sequences for Spread Spectrum Multi-
plexing.” IEEE Transactions on Information Theory 13, no. 4, pp. 619–
21. doi: http://dx.doi.org/10.1109/tit.1967.1054048
Gradshteyn, I.S., and I.M. Ryzhik. 1990. Table of Integrals, Series, and
Products. San Diego, CA: Academic Press Inc.
Halmos, P.R. 1960. Naive Set Theory. Princeton, NJ: D. Van Nostrand
Company Inc.
Hartley, R.V.L. 1928. “Transmission of Information.” Bell System Tech-
nical Journal, p. 535. doi: http://dx.doi.org/10.1002/j.1538-7305.1928.
tb01236.x
Haykin, S. 1988. Digital Communications. New York: John Wiley and Sons.
Singh, S. 1999. The Code Book – The Science of Secrecy from Ancient
Egypt to Quantum Cryptography. New York: Doubleday.
Holtzman, J.M. 1992. “A Simple, Accurate Method to Calculate Spread-
Spectrum Multiple-Access Error Probabilities.” IEEE Transactions on
Communications 40, no. 3, pp. 461–64. doi: http://dx.doi.org/10.1109/
26.135712
Hui, J.Y.N. 1984a. “Multiple Accessing for the Collision Channel
Without Feedback.” IEEE Journal on Selected Areas in Communica-
tions 2, no. 4, pp. 575–82. doi: http://dx.doi.org/10.1109/jsac.1984.
1146089
Hui, J.Y.N. 1984b. “Throughput Analysis for Code Division Multiple
Accessing of the Spread Spectrum Channel.” IEEE Journal on Selected
Areas in Communications 2, no. 4, pp. 482–86. doi: http://dx.doi.org/
10.1109/jsac.1984.1146083
Hui, J.Y.N., and P.A. Humblet. 1985. “The Capacity Region of the Totally
Asynchronous Multiple-Access Channel.” IEEE Transactions on Infor-
mation Theory 31, no. 2, pp. 207–16.
James, B.R. 1981. Probabilidade: Um Curso em Nível Intermediário. CNPq,
Rio de Janeiro, Brasil: Instituto de Matemática Pura e Aplicada.
Kavehrad, M., and P.J. McLane. 1985. “Performance of Low-Complexity
Channel Coding and Diversity for Spread Spectrum in Indoor, Wireless
Communication.” AT&T Technical Journal 64, no. 8, pp. 1927–64.
Kavehrad, M., and P.J. McLane. 1987. “Spread Spectrum for Indoor Digital
Radio.” IEEE Communications Magazine 25, no. 6, pp. 32–40.
Kavehrad, M., and B. Ramamurthi. 1987. “Direct Sequence Spread
Spectrum with DPSK Modulation and Diversity for Indoor Wireless
Communications.” IEEE Transactions on Communications 35, no. 2,
pp. 224–36.
Khinchin, A.I. 1957. Mathematical Foundations of Information Theory.
New York: Dover Publications, Inc.
Kleinrock, L. 1975. Queueing Systems. New York: John Wiley & Sons.
Komo, J.J., and S.C. Liu. 1990. “Maximal Length Sequences for Frequency
Hopping.” IEEE Journal on Selected Areas in Communications 8, no. 5,
pp. 819–22.
Lee, W.C.Y. 1990. “Estimate of Channel Capacity in Rayleigh Fading
Environment.” IEEE Transactions on Vehicular Technology 39, no. 3,
pp. 187–89.
Lee, W.C.Y. 1991. “Overview of Cellular CDMA.” IEEE Transactions on
Vehicular Technology 40, no. 2, pp. 291–302.
Leon-Garcia, A. 1989. Probability and Random Processes for Electrical
Engineering. Reading, Massachusetts: Addison-Wesley Publishing Co.
Lipschutz, S. 1968. Teoria de Conjuntos. Rio de Janeiro, Brasil: Ao Livro
Técnico S.A.
MacKay, D.J.C. 2003. Information Theory, Inference, and Learning Algo-
rithms. Cambridge, UK: Cambridge University Press.
Mandell, M., and R. McEliece. 1991. Some Properties of Memory-less
Multiterminal Interference Channels. Proceedings of the IEEE Inter-
national Symposium on Information Theory, p. 212. Budapest, Hungary:
IEEE.
Massey, J.L. 1990. The Relevance of Information Theory to Modern
Cryptography. Proceedings of the Bilkent International Conference on
New Trends in Communications, Control and Signal Processing
(NILCON’90), pp. 176–82. Ankara, Turkey: Elsevier Science Publisher.
Maurer, U.M. 1989. A Provably-Secure Strongly-Randomized Cipher. Pro-
ceedings of the Monte Verita Seminar on Future Directions in Cryptog-
raphy. Ascona, Switzerland.
Misser, H.S., C.A.F.J. Wijffels, and R. Prasad. 1991. “Throughput Analysis
of CDMA with DPSK Modulation and Diversity in Indoor Rician Fading
Radio Channels.” Electronics Letters 27, no. 7, pp. 601–603.
Nyquist, H. 1924. “Certain Factors Affecting Telegraph Speed.” Bell System
Technical Journal 3, p. 324.
Oberhettinger, F. 1990. Tables of Fourier Transforms and Fourier Trans-
forms of Distributions. Berlin, Germany: Springer-Verlag.
Ozarow, L.H., and A.D. Wyner. 1990. “On the Capacity of the Gaussian
Channel with a Finite Number of Input Levels.” IEEE Transactions on
Information Theory 36, no. 6, pp. 1426–28.
Papoulis, A. 1983. “Random Modulation: A Review.” IEEE Transactions
on Acoustics, Speech and Signal Processing 31, no. 1, pp. 96–105.
Pickholtz, R.L., D.L. Schilling, and L.B. Milstein. 1982. “Theory of
Spread–Spectrum Communications—A Tutorial.” IEEE Transactions
on Communications, COM 30, no. 5, pp. 855–84.
Pierce, J.R. 1980. An Introduction to Information Theory—Symbols, Signals
& Noise. 2nd ed. New York: Dover Publications Inc.
Proakis, J.G. 1989. Digital Communications. McGraw-Hill.
Pursley, M.B. 1977. “Performance Evaluation for Phase-Coded Spread
Spectrum Multiple-Access Communications—Part I: System Analy-
sis.” IEEE Transactions on Communications 25, no. 8, pp. 795–99.
Reza, F.M. 1961. An Introduction to Information Theory. New York:
McGraw-Hill Book Co.
Ricciardi, L.M. 1990. Lectures in Applied Mathematics and Informatics.
Manchester, UK: Manchester University Press.
Sadowsky, J.S., and R.K. Bahr. 1991. “Direct-Sequence Spread-Spectrum
Multiple-Access Communications with Random Signature Sequences:
A Large Deviations Analysis.” IEEE Transactions on Information The-
ory 37, no. 3, pp. 514–27.
Sarwate, D.V., and M.B. Pursley. 1980. “Crosscorrelation Properties of
Pseudorandom and Related Sequences.” Proceedings of the IEEE 68,
no. 5, pp. 593–619.
Sayood, K. 2006. Introduction to Data Compression. San Francisco, CA:
Morgan Kaufmann.
Schilling, D.L., L.B. Milstein, R.L. Pickholtz, M. Kullback, and F. Miller.
1991. “Spread Spectrum for Commercial Communications.” IEEE Com-
munications Magazine 29, no. 4, pp. 66–79.
Scholtz, R.A. 1982. “The Origins of Spread-Spectrum Communications.”
IEEE Transactions on Communications, COM 30, no. 5, pp. 822–54.
Schwartz, M. 1970. Information Transmission, Modulation, and Noise.
New York: McGraw-Hill.
Shannon, C.E. 1948. “A Mathematical Theory of Communication.” The
Bell System Technical Journal 27, pp. 379–423.
Shannon, C.E. 1961. “Two-way Communications Channels.” Proceed-
ings of the 4th Berkeley Symposium on Mathematics, Statistics and
Probability, Vol. 1, pp. 611–44. Berkeley, CA: University of California
Press.
Shamai, S., L.H. Ozarow, and A.D. Wyner. 1991. “Information Rates for
a Discrete-Time Gaussian Channel with Intersymbol Interference and
Stationary Inputs.” IEEE Transactions on Information Theory 37, no. 6,
pp. 1527–39.
Simon, M.K., J.K. Omura, R.A. Scholtz, and B.K. Levitt. 1985. Spread
Spectrum Communications, Vol. 1. Rockville, MD: Computer Science
Press.
Sommer, R.C. 1966. “Asynchronously Multiplexed Channel Capacity.”
Proceedings of the IEEE 54, no. 1, pp. 79–80.
Soroushnejad, M., and E. Geraniotis. 1991. “Probability of Capture and
Rejection of Primary Multiple-Access Interference in Spread-
Spectrum Networks.” IEEE Transactions on Communications 39, no. 6,
pp. 986–94.
Sousa, E.S. 1989. “Throughput of Spread-Spectrum Systems with a Large
Number of Users.” IEE Proceedings 136, no. 3, pp. 220–26.
Sousa, E.S. 1990. “Interference Modeling in a Direct-Sequence Spread-
Spectrum Packet Radio Network.” IEEE Transactions on Communica-
tions 38, no. 9, pp. 1475–82.
Sousa, E.S. 1992. “Performance of a Spread Spectrum Packet Radio
Network Link in a Poisson Field of Interferers.” IEEE Transactions
on Information Theory 38, no. 6, pp. 1743–54.
Stallings, W. 1999. Cryptography and Network Security—Principles and
Practice. Upper Saddle River, NJ: Prentice Hall.
Tanenbaum, A.S. 2003. Computer Networks. Englewood Cliffs, NJ:
Prentice-Hall, PTR.
Turin, G.L. 1980. “Introduction to Spread-Spectrum Antimultipath Tech-
niques and Their Application to Urban Digital Radio.” Proceedings of
the IEEE 68, no. 3, pp. 328–53.
Turin, G.L. 1984a. “Commutation Signaling—An Antimultipath Tech-
nique.” IEEE Journal on Selected Areas in Communications 2, no. 4,
pp. 548–62.
Turin, G.L. 1984b. “The Effects of Multipath and Fading on the Perfor-
mance of Direct-Sequence CDMA Systems.” IEEE Journal on Selected
Areas in Communications 2, no. 4, pp. 597–603.
Van der Lubbe, J.C.A. 1997. Information Theory. Cambridge, UK: Cam-
bridge University Press.
Verdú, S. 1986. “Minimum Probability of Error for Asynchronous
Gaussian Multiple-Access Channels.” IEEE Transactions on Informa-
tion Theory 32, no. 1, pp. 85–96.
Verdú, S. 1989a. “Multiple-Access Channels with Memory with and with-
out Frame Synchronism.” IEEE Transactions on Information Theory 35,
no. 3, pp. 605–19.
Verdú, S. 1989b. “The Capacity Region of the Symbol Asynchronous Gaus-
sian Multiple-Access Channel.” IEEE Transactions on Information The-
ory 35, no. 4, pp. 733–51.
Viterbi, A.J. 1985. “When Not to Spread Spectrum—A Sequel.” IEEE
Communications Magazine 23, no. 4, pp. 12–17.
Viterbi, A.J. 1991. “Wireless Digital Communication: A View Based on
Three Lessons Learned.” IEEE Communications Magazine 29, no. 9,
pp. 33–36.
Wang, H.S. 1992. Finite-State Modeling, Capacity, and Joint Source/Channel
Coding for Time-Varying Channels. PhD Thesis. New Brunswick, NJ: Graduate
School–New Brunswick, Rutgers, The State University of New Jersey.
Welch, L.R. 1974. “Lower Bounds on the Maximum Cross Correlation of
Signals.” IEEE Transactions on Information Theory 20, no. 3,
pp. 397–99.
Wozencraft, J.M., and B. Reiffen. 1961. Sequential Decoding. Cambridge,
MA: MIT Press.
Wyner, A.D. 1974. “Recent Results in the Shannon Theory.” IEEE
Transactions on Information Theory 20, no. 1, pp. 2–9.
Yue, O.-C. 1983. “Spread Spectrum Mobile Radio, 1977–1982.” IEEE
Transactions on Vehicular Technology 32, no. 1, pp. 98–105.
Zadeh, L.A. 1965. “Fuzzy Sets.” Information and Control 8, no. 3,
pp. 338–53.
ABOUT THE AUTHOR

Marcelo Sampaio de Alencar was born in Serrita, Brazil in 1957. He
received his bachelor’s degree in Electrical Engineering from Federal Uni-
versity of Pernambuco (UFPE), Brazil in 1980; his master’s degree from
Federal University of Paraiba (UFPB), Brazil in 1988; and his PhD from the
University of Waterloo, Canada in 1993. He is currently an emeritus member
of the Brazilian Telecommunications Society (SBrT), IEEE senior mem-
ber, chair professor at the Department of Electrical Engineering, Federal
University of Campina Grande, Brazil. He also worked for the State Uni-
versity of Santa Catarina (UDESC), Embratel, and the University of Toronto
as a visiting professor.
He is founder and president of the Institute for Advanced Studies in
Communications (Iecom). He has been awarded several scholarships and
grants, including three scholarships and several research grants from the
Brazilian National Council for Scientific and Technological Research
(CNPq), two grants from the IEEE Foundation, a scholarship from the Uni-
versity of Waterloo, a scholarship from the Federal University of Paraiba,
an achievement award for contributions to the Brazilian Telecommunica-
tions Society (SBrT), an award from the Medicine College of the Federal
University of Campina Grande (UFCG), an achievement award from the
College of Engineering of the Federal University of Pernambuco, and the
Medal Professor Attilio Jose Giarola from the Brazilian Microwave and
Optoelectronics Society (SBMO). He has published over 350 engineering
and scientific papers and 15 books. He is a columnist for the traditional
Brazilian newspaper Jornal do Commercio, and vice president for external
relations of SBrT.

INDEX

A
Absolutely secure event, 124
Algebra, 127, 129
    Borel, 130
    closure, 129
Algebraic coding theory, 77
Alzheimer, Aloysius, 32
Appearance
    equivocation, 121
Asymmetric key, 120
Axiom
    choice, 126
    Peano, 126
    specification, 126
Axioms, 126
Ayer, Alfred Jules, 1

B
Belonging, 126
Binary
    Huffman, 27
Bit, 17
Bletchley Park, 117
Borel
    algebra, 130
Borel, Félix Edouard Juston Émile, 130
BSC, 44

C
Cantor, Georg, 125
Capacity
    bound, 107
    channel, 41
    sum, 94
Cardinal
    number, 130
    numbers, 125
Cardinality, 130
Carrier suppression, 80
CDMA, 72, 76, 87
    non-cooperative, 89
    performance analysis, 76, 90
Channel
    binary symmetric, 44
    communication, 34
    discrete, 43
    independent output, 36
    noiseless, 35
    noisy, 43
    non-cooperative, 89
Ciphertext
    entropy, 121
Clausius, Rudolph, 31
Closure, 129
CMOS, 17
Code
    chip rate, 88
    classification, 19
    comma, 21
    decision tree, 14
    decoding tree, 15
    Huffman, 27
    instantaneous, 21, 23
    prefix, 14, 16
    extended, 17
    uniquely decodable, 20
Coder
    codewords, 11
    source, 13
Codes
    block, 19
    non-singular, 19
Codewords, 11
Coding
    efficiency, 12
    source, 11
Communications
    personal, 71
Computer network, 118
Cryptography, 72, 118
    history, 117
    information theory, 121
    model, 119
    principles, 120
    uncertainty, 121
Cryptosystem
    absolute security, 124
    information theory, 123

D
Dedekind, J. W. R., 125
Direct sequence, 72
Disjoint, 126
DPSK, 87
DS, 72
DSSS, 78, 87

E
Efficiency, 43
Empty, 126
Enigma, 117
Entropy, 3
    conditional, 34
    cryptography, 120
    cryptosystem, 122
    extended source, 12
    inequality, 37
    joint, 33, 39
    properties, 6
    relations, 37
Error probability, 109
Extension
    source, 12

F
Fading, 72
Families, 128
Fano, Robert, 27
FDMA, 76
FH, 72
Fractals, 125
Frequency hopping, 72
Fuzzy set, 2

G
Gold sequence, 82

H
Hartley, Ralph Vinton Lyon, 1, 31
Hilbert, David, 125
Huffman
    non-unicity, 29
    procedure, 28
Huffman, David, 27

I
Inclusion, 126
Indexing, 129
Inequality
    Kraft, 24
    Kraft-McMillan, 16
Information
    average mutual, 39
    channel, 34
    entropy, 2, 3
    joint, 32
    measure of, 2
    mutual, 38
    semantics, 1
    theory, 1
Internet
    backbone, 118
    protocol, 118

J
Jammer, 74
    broad band noise, 74
    partial band, 74
    partial time, 74
    repeater, 74
    tone, 74
Jamming, 72
Joint entropy
    cryptosystem, 122

K
Kasami
    sequence, 84
Key
    entropy, 120
    equivocation, 121
Khinchin, Aleksandr Yakovlevich, 2
Kolmogorov, Andrei Nikolaevich, 31, 131
Kraft
    inequality, 24
Kraft-McMillan
    inequality, 16
Kronecker, Leopold, 125

L
Laser, 18
Lebesgue, Henri Léon, 40

M
MAI, 78
Matched filter, 91
Maximal length
    linear sequence, 80
    sequence, 79
Maximum likelihood receiver, 91
Measure, 126
    Lebesgue, 40
    probability, 126
Message
    equivocation, 121
    uncertainty, 121
Moments, 134
Multipath, 72, 88
Multiple access, 71
    DSSS, 78
    interference, 78

N
Non-cooperative channel, 89
Non-repudiation, 118
Nyquist, Harry, 1, 31

P
Password, 118
PCN, 87
Peano
    axiom, 126
Peirce, Charles Sanders, 1
Personal communications, 71
Photon, 17
Pierce, John Robinson, 1
Plaintext
    entropy, 120
Prefix
    code, 14, 16
Probability, 126
    joint random variables, 136
    moments, 134
    random variables, 133
Probability theory, 133
Pseudo-random, 80
Public key infrastructure, 118

R
Rényi, Alfréd, 2
Radiation
    electromagnetic, 18
Random variables, 133
Rate
    transmission, 43
Redundancy, 12
    absolute, 42
    relative, 43

S
Schröder-Bernstein
    theorem, 126
Sequence
    Gold, 82
    Kasami, 84
    maximal length, 81
    maximal-length, 79, 80
    Mersenne, 81
    Welsh bound, 82
Sequence design, 79
Session
    hijacking, 118
Set, 125
    algebra, 127
    disjoint, 126
    families, 128
    fuzzy, 2
    infinite, 125
    operations, 127
    universal, 126
    universal set, 125
Set theory, 125, 126
Sets
    algebra, 129
    indexing, 129
Shannon
    Claude Elwood, 39, 41
    first theorem, 12
Shannon, Claude Elwood, 1, 31
Signal
    space, 74
Sniffing, 118
Source
    efficiency, 12
    extended, 12
Source coding
    theorem, 12
Spectral
    compression, 75
Spoofing, 118
Spread spectrum, 71
    m-sequence, 79
    carrier suppression, 80
    direct sequence, 72
    DSSS, 87
    frequency hopping, 72
    interference, 77
    performance analysis, 76
    pseudo-random, 80
    sequence design, 79
    time hopping, 73
    time-frequency hopping, 78
Substitution, 120
Sum capacity, 94
Symmetric key, 120

T
TCP/IP, 118
TDMA, 76
TH, 73
Theorem
    Schröder-Bernstein, 126
Time hopping, 73
Transfinite arithmetic, 125
Transformation
    cryptography, 119
Transposition, 120
TTL, 17
Turing, Alan Mathison, 117

U
Universal, 126
University of Halle, 125

V
Vectorial
    space, 74
Venn
    diagram, 126
Venn diagram, 126

W
Welsh bound, 82

Z
Zadeh, Lotfi Asker, 2
Zenon, 125
Zorn
    lemma, 126
Zorn's lemma, 126
FORTHCOMING TITLES IN OUR COMMUNICATIONS AND
SIGNAL PROCESSING COLLECTION
Orlando Baiocchi, University of Washington Tacoma, Editor

Signal Integrity: The Art of Interconnect Design For High Speed and
High Reliability Circuits by Joel Jorgenson

Cryptography Explained by Raj Katti
