WBMT2049-T2/WI2032TH - Numerical Analysis For ODE's
Lecture 1
course structure
numerical computations in finite precision
round-off errors
Lecture 1 1 / 30
Course Structure
WBMT2049-T2/WI2032TH
Course materials
Studying and exam
What is Numerical Mathematics?
By way of definition
Numerical mathematics gives numerical answers to complex mathematical
questions. Numerical analysis studies numerical approximations (as
opposed to symbolic manipulations) for the problems of mathematical
analysis.
Computing in finite precision
We know from elementary algebra that √4 = 2, because 2^2 = 4.
In [2]: np.sqrt(4)
Out[2]: 2.0
Computer says:
In [3]: np.sqrt(3)
Out[3]: 1.7320508075688772
Before we believe it, let’s ask the computer a simple question:
In [4]: 1.7320508075688772**2
Out[4]: 2.9999999999999996
How big is the error?
In [7]: 3-2.9999999999999996
Out[7]: 4.440892098500626e-16
In [8]: np.finfo(float).eps
Out[8]: 2.220446049250313e-16
In [9]: 2*np.finfo(float).eps
Out[9]: 4.440892098500626e-16
floating-point numbers and binary representation
floating-point numbers
The floating-point representation of a real number x ∈ R in a general
number system is defined as

fl(x) = ± (d0 . d1 d2 . . . dp−1)β × β^E

where:
β is the base
E is the exponent
(d0 . d1 d2 . . . dp−1)β is the mantissa of x in base β
0 ≤ di ≤ β − 1 are integers (digits)
p is the precision
Questions: what is the value of β and what are the possible values of di
in the binary system? Why do computers use the binary system?
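A quick way to inspect the binary (β = 2) representation directly, sketched here with Python's struct module (the helper name float_bits is my own, not part of the course):

```python
import struct

def float_bits(x: float) -> str:
    """IEEE-754 double-precision bit pattern: sign | exponent | mantissa."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret the 8 bytes as an integer
    bits = f"{raw:064b}"
    return f"{bits[0]} {bits[1:12]} {bits[12:]}"

print(float_bits(1.0))  # exponent field 01111111111 (the bias 1023), mantissa all zeros
print(float_bits(0.1))  # the repeating binary pattern of 1/10, cut off at 52 stored bits
print((0.1).hex())      # the same stored value written in hexadecimal
```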
storing integers
Let J be an integer. To store it in the p = 53 precision (as a 53-bit binary),
we need to find d0 , . . . , d52 ∈ {0, 1} and E such that:

J = ± (d0 . d1 . . . d52)2 × 2^E

All integers |J| < 2^53 can be stored exactly. The largest in this
sequence is:

2^53 − 1 = 9007199254740991
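This boundary is easy to probe in Python (a minimal check, not part of the slides):

```python
# Every integer below 2**53 has an exact double-precision representation;
# at 2**53 the spacing between consecutive doubles becomes 2.
big = 2**53
assert float(big - 1) == big - 1       # still exact
assert float(big) + 1.0 == float(big)  # 2**53 + 1 rounds back down to 2**53
print(big - 1)  # 9007199254740991, the largest integer in the exact range
```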
storing rational numbers (fractions)
Consider the number x = 1/10. In the decimal system with β = 10 and
E = 0 this number can be stored with a very short mantissa (p = 2):

(1/10)β=10 = (0.1)10 × 10^0

In the binary system, however, the mantissa repeats forever:

(1/10)β=2 = 0.0001100110011001100110011 · · · × 2^0

so 0.1 cannot be stored exactly in finite binary precision. What is
actually stored:

In [11]: from decimal import Decimal; Decimal(0.1)
Out[11]: Decimal('0.1000000000000000055511151231257827021181583404541015625')
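The exact rational value that ends up stored can also be recovered with the fractions module (a small check, assuming CPython's IEEE-754 doubles):

```python
from fractions import Fraction

stored = Fraction(0.1)   # the exact rational value of the float literal 0.1
print(stored)            # 3602879701896397/36028797018963968 (denominator 2**55)
assert stored != Fraction(1, 10)
print(float(stored - Fraction(1, 10)))  # representation error, about 5.55e-18
```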
round-off in IEEE-754
round-off in IEEE-754
To round x = 1/10, write it as an integer mantissa times a power of two,
1/10 ≈ m × 2^−N , where the integer m ≈ 2^N/10 must fit in 53 bits:

2^52 ≤ 2^N/10 ≤ 2^53 − 1

Looking for N: taking log2 gives 52 ≤ N − log2(10) ≤ 53 (up to rounding),
and since log2(10) ≈ 3.32, the only integer that works is N = 56.
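Carrying out this computation in exact integer arithmetic (a sketch; the rounding of the remainder is done by hand):

```python
# The 53-bit mantissa of 0.1 is the nearest integer to 2**56 / 10.
N = 56
q, r = divmod(2**N, 10)         # exact integer division: quotient and remainder
m = q + (1 if r >= 5 else 0)    # round to the nearest integer
assert 2**52 <= m <= 2**53 - 1  # m indeed fits in 53 bits
print(m)                        # 7205759403792794
assert m / 2**N == 0.1          # m * 2**-56 is exactly the stored value of 0.1
```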
round-off in IEEE-754
finite-precision approximation errors
approximation errors
x = d0 β^(E−0) + d1 β^(E−1) + · · · + dp−1 β^(E−(p−1)) + dp β^(E−p) + . . .
approximation errors
If dp ≥ 1, then:

dp ≥ dp+1 β^−1 + dp+2 β^−2 + . . .

This is because each digit is at most β − 1, so the tail sums to at most 1.
Truncating the mantissa after p digits therefore gives the relative error

(fl(x) − x)/x = β^−p (dp + dp+1 β^−1 + . . .) / (d0 + d1 β^−1 + · · · + dp−1 β^−(p−1) + dp β^−p + . . .)
             ≤ β^−p · 2dp/d0 ≤ 2^−(p−1) ,

where in the last step we have assumed that β = 2 and d0 = 1, dp = 1.
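For IEEE-754 doubles (β = 2, p = 53) the bound 2^−(p−1) is exactly NumPy's machine epsilon; a quick check:

```python
import numpy as np

p = 53
bound = 2.0 ** -(p - 1)              # the 2^-(p-1) relative-error bound
assert bound == np.finfo(float).eps  # identical to NumPy's machine epsilon

# Relative error of the rounded sqrt(3), squared, stays within a few round-offs:
rel_err = abs(np.sqrt(3.0) ** 2 - 3.0) / 3.0
assert rel_err < 2 * bound
print(rel_err)
```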
approximation errors and machine epsilon
For example, in IEEE-754 double precision (β = 2, p = 53) the machine
epsilon is ϵm = 2^−(p−1) = 2^−52 ≈ 2.22 × 10^−16.
finite-precision floating-point arithmetic
floating point arithmetic
Example: computing x^2
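The slide's worked example is not reproduced in this transcript; a minimal sketch of the round-off picked up when squaring: if fl(x) = x(1 + ϵ1) and the multiplication contributes (1 + ϵ2), then fl(fl(x)^2) = x^2 (1 + 2ϵ1 + ϵ2 + O(ϵ^2)), i.e. up to roughly three round-offs.

```python
import numpy as np

x = np.sqrt(3.0)          # fl(sqrt(3)): already carries one round-off
print(x ** 2)             # 2.9999999999999996, not exactly 3
print(abs(x ** 2 - 3.0))  # 4.440892098500626e-16, i.e. 2 * machine epsilon
assert x ** 2 != 3.0
```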
propagation and amplification of errors: addition
Suppose we want to compute the sum:

Σ_{i=1}^{n} xi ,   xi ∈ R.
propagation and amplification of errors: addition
fl(Σ_{i=1}^{n} xi) − Σ_{i=1}^{n} xi = Σ_{i=1}^{n} xi ϵi + x1 Σ_{j=2}^{n} ϵj^c + x2 Σ_{j=2}^{n} ϵj^c
                                    + x3 Σ_{j=3}^{n} ϵj^c + x4 Σ_{j=4}^{n} ϵj^c + · · · + xn ϵn^c + O(ϵ^2)

Here ϵi is the round-off from storing xi and ϵj^c the round-off of the j-th
addition. Bounding every |ϵ| by ϵm:

|fl(Σ_{i=1}^{n} xi) − Σ_{i=1}^{n} xi| ≤ n max_i |xi| ϵm + (1/2) n(n + 1) max_i |xi| ϵm + O(ϵ^2)
                                      = ((1/2) n^2 + (3/2) n) max_i |xi| ϵm + O(ϵ^2).

This is an upper bound (worst-case scenario); the actual error may be
smaller. With the usual rounding strategy we can use ϵm/2 instead of ϵm,
which gives a tighter bound.
amplification of errors in addition: example
Let xi = 0.1 and n = 10^6 :

Σ_{i=1}^{10^6} 0.1 = (0.1) × 10^6 = 10^5
In [1]: res = 0
In [2]: for ii in range(int(1e6)):
...: res = res + 0.1
In [3]: res
Out[3]: 100000.00000133288 # this is not equal to 100000
In [4]: (0.5*1e6**2+1.5*1e6)*0.1*0.5*2.22e-16
Out[4]: 5.55001665e-06 # (tight) upper bound
In [5]: res-1e5
Out[5]: 1.3328826753422618e-06 # actual error
numerically stable algorithms
In [1]: x = np.ones(int(1e6))
In [2]: x = x*0.1
In [3]: x
Out[3]: array([0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1])
In [4]: np.sum(x)
Out[4]: 99999.9999999998
In [5]: np.sum(x)-1e5
Out[5]: -2.0372681319713593e-10
The error is much smaller than what we get with the for loop summation!
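np.sum is accurate because it adds the terms pairwise rather than strictly one by one. Another classic stable algorithm, not covered on these slides, is Kahan's compensated summation; a minimal sketch:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: recover and re-add the round-off."""
    total = 0.0
    c = 0.0                   # running compensation (lost low-order bits)
    for v in values:
        y = v - c             # apply the correction from the previous step
        t = total + y         # low-order bits of y may be lost here ...
        c = (t - total) - y   # ... but can be recovered algebraically
        total = t
    return total

err = abs(kahan_sum([0.1] * 10**6) - 1e5)
print(err)  # orders of magnitude below the 1.3e-06 of the naive loop
```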
Question

Which of the following two ways of computing x1 + x2 + x3 + x4 has the
smaller worst-case round-off error?

fl(fl(x1 + x2) + fl(x3 + x4))
fl(fl(fl(x1 + x2) + x3) + x4)
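The first grouping is pairwise summation, the strategy behind np.sum's accuracy on the previous slide; its error grows like O(log n) rather than O(n). A minimal recursive sketch (pairwise_sum is my own helper name):

```python
def pairwise_sum(x):
    """Sum by recursively splitting the list in half (pairwise summation)."""
    if len(x) <= 2:
        return sum(x)
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

data = [0.1] * 2**16
exact = 0.1 * 2**16  # multiplying a float by a power of two is error-free
print(abs(pairwise_sum(data) - exact))
print(abs(sum(data) - exact))  # the sequential loop is never more accurate here
```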