
WBMT2049-T2/WI2032TH – Numerical Analysis for ODE’s

Lecture 1

course structure
numerical computations in finite precision
round-off errors

Lecture 1 1 / 30
Course Structure

Lecture 1 2 / 30
WBMT2049-T2/WI2032TH

Lectures: twice a week, 14 lectures, 2 reserve lectures.

Information, announcements and materials for WBMT2049-T2 and WI2032TH will be posted on Brightspace.

Lecture 1 3 / 30
Course materials

1 Study Guide: "Werkschema-2024.pdf" – on Brightspace
2 Book: C. Vuik, F.J. Vermolen, M.B. van Gijzen, M.J. Vuik, Numerical methods for ordinary differential equations, Delft Academic Press/VSSD, 2016, reprint 2018 – link to a free downloadable copy on Brightspace
3 Calculus book: J. Stewart, Calculus: Early Transcendentals
4 Slides: Brightspace
5 Recorded lectures: Brightspace (old)

Lecture 1 4 / 30
Studying and exam

1 Lectures: either attend, watch the recordings, or study the slides; ask questions; try to solve exam problems until you can do it without peeking; memorize important concepts (see Lectures 13 and 14)
2 MUMIE online platform: theory questions, example exercises with solutions, exercises with deadlines, bonus for the primary exam. Information on Brightspace; all questions – MUMIE support team ([email protected])
3 Practicum: all questions – coordinator Dennis den Ouden-van der Horst ([email protected]), information on Brightspace
4 Written exam: bring your own laptop (Grasple); exam-type exercises in Lectures 13 and 14; previous exams with solutions on Brightspace.

Lecture 1 5 / 30
What is Numerical Mathematics?

Lecture 1 6 / 30
By way of definition
Numerical mathematics gives numerical answers to complex mathematical
questions. Numerical analysis studies numerical approximations (as
opposed to symbolic manipulations) for the problems of mathematical
analysis.

Lecture 1 7 / 30
Computing in finite precision

Lecture 1 8 / 30
We know from elementary algebra that

√4 = 2, because 2^2 = 4

The computer (e.g. Python and NumPy) tells us:

In [1]: import numpy as np

In [2]: np.sqrt(4)
Out[2]: 2.0

We do not know what the answer is to:

√3 = ?

The computer says:

In [3]: np.sqrt(3)
Out[3]: 1.7320508075688772

Lecture 1 9 / 30
Before we believe it, let's ask the computer a simple question:

In [4]: 1.7320508075688772**2
Out[4]: 2.9999999999999996

How about this:

In [5]: 3.0 == 2.9999999999999996
Out[5]: False

In [6]: 3.0 == 2.9999999999999999
Out[6]: True

The computer gives us wrong answers, and it knows it!

Lecture 1 10 / 30
How big is the error?

In [7]: 3-2.9999999999999996
Out[7]: 4.440892098500626e-16

The magic number (machine precision, machine epsilon):

In [8]: np.finfo(float).eps
Out[8]: 2.220446049250313e-16

In [9]: 2*np.finfo(float).eps
Out[9]: 4.440892098500626e-16

Lecture 1 11 / 30
floating-point numbers and binary representation

Lecture 1 12 / 30
floating-point numbers
The floating-point representation of a real number x ∈ R in a general number system is defined as

fl_β(x) = ±(d_0 β^E + d_1 β^{E−1} + d_2 β^{E−2} + · · · + d_{p−1} β^{E−(p−1)})
        = ±(d_0.d_1 d_2 . . . d_{p−1})_β × β^E,

where:
β is the base
E is the exponent
(d_0.d_1 d_2 . . . d_{p−1})_β is the mantissa of x in base β
0 ≤ d_i ≤ β − 1 are integers (digits)
p is the precision
Questions: what is the value of β and what are the possible values of d_i in the binary system? Why do computers use the binary system?
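As an aside, the mantissa and exponent of a Python float can be inspected directly; a minimal sketch using the standard library (note that math.frexp normalises the mantissa to [0.5, 1), so its exponent is shifted by one with respect to the convention d_0 ≥ 1 used above):

import math

x = 0.1
m, E = math.frexp(x)   # x == m * 2**E with 0.5 <= m < 1
print(m, E)            # mantissa and exponent of the stored value
print(x.hex())         # the same number in base 16 (4 bits per hex digit): '0x1.999999999999ap-4'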

Lecture 1 13 / 30
storing integers
Let J be an integer. To store it with precision p = 53 (as a 53-bit binary number), we need to find d_0, . . . , d_52 ∈ {0, 1} and E such that:

J = ±(d_0 2^E + d_1 2^{E−1} + d_2 2^{E−2} + · · · + d_52 2^{E−52})

All integers |J| < 2^53 can be stored exactly. The largest in this sequence is

J = 1 · 2^52 + 1 · 2^51 + · · · + 1 · 2^0 = 2^53 − 1.

Some integers |J| ≥ 2^53 can also be stored exactly, e.g.

J = 2^53 = 1 · 2^53 + 0 · 2^52 + · · · + 0 · 2^1.

Many integers |J| ≥ 2^53 cannot be stored exactly, e.g.

J = 2^53 + 1 ∼ 1 · 2^53 + 0 · 2^52 + · · · + 1 · 2^1 = 2^53 + 2.
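A quick check of the 53-bit boundary, as a small illustrative sketch in Python:

# every integer with |J| < 2**53 is exactly representable as a double
float(2**53 - 1) == 2**53 - 1    # True
float(2**53)     == 2**53        # True: a trailing zero bit costs nothing
float(2**53 + 1) == 2**53 + 1    # False: 2**53 + 1 needs 54 significant bits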

Lecture 1 14 / 30
storing rational numbers (fractions)
Consider the number x = 1/10. In the decimal system with β = 10 and E = 0 this number can be stored with a very short mantissa (p = 2):

(1/10)_{β=10} = 0 × 10^0 + 1 × 10^{−1} + 0 × 10^{−2} + · · · = 0.1 × 10^0

However, in the binary system with β = 2 and E = 0:

(1/10)_{β=2} = 0.0001100110011001100110011 · · · × 2^0

it has an infinite mantissa, i.e. it needs infinite precision (p = ∞).

Therefore, in computers, where β = 2 and p = 53, numbers like 0.1 have to be rounded off and only an approximation of 0.1 is stored:

In [10]: from decimal import Decimal

In [11]: Decimal.from_float(0.1)
Out[11]: Decimal('0.1000000000000000055511151231257827021181583404541015625')

Lecture 1 15 / 30
round-off in IEEE-754

In the IEEE-754 standard of floating-point arithmetic, β = 2 and there is a choice of precisions:
single precision, p = 24
double precision, p = 53
The double-precision floating-point representation of x = 1/10 is an approximation of x by a binary number with p = 53. It is obtained by looking for an integer N and a 53-bit integer J such that

1/10 ∼ J/2^N,   2^52 ≤ J ≤ 2^53 − 1,

i.e.

2^52 ≤ 2^N/10 ≤ 2^53 − 1.

Lecture 1 16 / 30
round-off in IEEE-754

2^52 ≤ 2^N/10 ≤ 2^53 − 1

Looking for N:

In [12]: 2**52 <= int(2**55/10) <= 2**53-1
Out[12]: False
In [13]: 2**52 <= int(2**56/10) <= 2**53-1
Out[13]: True
In [14]: 2**52 <= int(2**57/10) <= 2**53-1
Out[14]: False
In [15]: int(2**56/10)
Out[15]: 7205759403792794

Lecture 1 17 / 30
round-off in IEEE-754

Hence, N = 56 and J = 7205759403792794:

1/10 ∼ fl(1/10) = J/2^N = 7205759403792794/2^56
     ≈ 0.1000000000000000055511151231257827021181583404541015625

The rounded result 0.10000000000000001 is correct in the first 16 digits of the mantissa.
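These values of J and N can be verified directly in Python, for instance with the fractions module (a small illustrative check):

from fractions import Fraction

# Fraction(0.1) recovers the exact rational value of the stored double
Fraction(0.1) == Fraction(7205759403792794, 2**56)   # True
(0.1).hex()   # '0x1.999999999999ap-4', i.e. the mantissa J in base 16 and the binary exponent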

Lecture 1 18 / 30
finite-precision approximation errors

Lecture 1 19 / 30
approximation errors

Let the actual number be

x = d_0 β^E + d_1 β^{E−1} + · · · + d_{p−1} β^{E−(p−1)} + d_p β^{E−p} + . . .

and its finite-precision approximation

fl(x) = d_0 β^E + d_1 β^{E−1} + · · · + d_{p−1} β^{E−(p−1)}

The relative approximation error is

(fl(x) − x)/x = (d_p β^{−p} + d_{p+1} β^{−p−1} + . . .) / (d_0 + d_1 β^{−1} + · · · + d_{p−1} β^{−p+1} + d_p β^{−p} + d_{p+1} β^{−p−1} + . . .)
             = β^{−p} (d_p + d_{p+1} β^{−1} + . . .) / (d_0 + d_1 β^{−1} + · · · + d_{p−1} β^{−p+1} + d_p β^{−p} + d_{p+1} β^{−p−1} + . . .)

Lecture 1 20 / 30
approximation errors
If d_p ≥ 1, then:

d_p ≥ d_{p+1} β^{−1} + d_{p+2} β^{−2} + . . .

This is because:

d_{p+1} β^{−1} + d_{p+2} β^{−2} + · · · ≤ (β − 1)β^{−1} + (β − 1)β^{−2} + . . .
    = (β − 1) Σ_{n=1}^{∞} 1/β^n = (β − 1)/(β − 1) = 1.

Without loss of generality, we assume that d_0, d_p ≥ 1. Then

(fl(x) − x)/x = β^{−p} (d_p + d_{p+1} β^{−1} + . . .) / (d_0 + d_1 β^{−1} + · · · + d_{p−1} β^{−p+1} + d_p β^{−p} + d_{p+1} β^{−p−1} + . . .)
             ≤ β^{−p} · 2d_p/d_0 ≤ 2^{−(p−1)},

since the numerator is at most d_p + 1 ≤ 2d_p and the denominator is at least d_0; in the last step we have assumed that β = 2 and d_0 = d_p = 1.
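The bound can be checked numerically for a particular number; a small sketch using exact rational arithmetic (x = 1/3 is chosen arbitrarily as a number with an infinite binary mantissa):

from fractions import Fraction

x  = Fraction(1, 3)            # the actual number
fx = Fraction(float(x))        # exact value of its double-precision representation
rel_err = abs(fx - x) / x
float(rel_err) <= 2.0**-52     # True: |fl(x) - x| / |x| <= 2**-(p-1) with p = 53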
Lecture 1 21 / 30
approximation errors and machine epsilon

Absolute Approximation Error = fl(x) − x = xϵ

Relative Approximation Error = (fl(x) − x)/x = ϵ,   |ϵ| ≤ ϵ_m

fl(x) = x(1 + ϵ)

For example,

|ϵ| = |0.1 − fl(0.1)|/|0.1| = 10 · |1/10 − 7205759403792794/2^56|
    ≤ |1 − 1.00000000000000006| = 0.00000000000000006
    = 0.6 × 10^{−16} ≤ ϵ_m = 1/2^52 = 2.220446049250313e-16

Sometimes, a sharper upper bound |ϵ| ≤ ϵ_m/2 is used.
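For x = 0.1 this estimate can be reproduced with the decimal module (an illustrative check, exact apart from the final division):

from decimal import Decimal, getcontext

getcontext().prec = 60                    # enough digits to see the difference
eps = abs(Decimal.from_float(0.1) - Decimal("0.1")) / Decimal("0.1")
float(eps)                                # about 0.56e-16, indeed below eps_m = 2**-52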

Lecture 1 22 / 30
finite-precision floating-point arithmetic

Lecture 1 23 / 30
floating point arithmetic

Arithmetic operations, x + y, x − y, xy, x/y, will have:

data error, due to the floating-point representation of the input data: x ≈ fl(x), y ≈ fl(y)
conversion error, due to the conversion of the result to a floating-point number, e.g. x + y ≈ fl(x + y). The operations themselves are usually carried out exactly and do not introduce any additional errors. Hence, fl(x + y) = fl(fl(x) + fl(y)).
The connection between fl(x) and x is: fl(x) = x(1 + ϵ)
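A small illustration of the two error sources in plain Python: 0.25 and 0.5 are stored exactly, so their sum is exact, whereas 0.1 and 0.2 already carry data error and rounding the sum adds a conversion error:

0.25 + 0.5 == 0.75      # True: all three numbers are exact binary fractions
0.1 + 0.2 == 0.3        # False: fl(fl(0.1) + fl(0.2)) differs from fl(0.3)
(0.1 + 0.2) - 0.3       # about 5.6e-17, a fraction of the machine epsilon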

Lecture 1 24 / 30
Example: computing x^2

What is the error when we compute x^2 in floating-point arithmetic?

fl(x^2) = fl(fl(x) · fl(x)) = fl(x(1 + ϵ_d) · x(1 + ϵ_d)) = x^2 (1 + ϵ_d)^2 (1 + ϵ_c)
        = x^2 (1 + 2ϵ_d + ϵ_c + ϵ_d^2 + 2ϵ_d ϵ_c + ϵ_d^2 ϵ_c) ≈ x^2 (1 + 2ϵ_d + ϵ_c)

For x = √3, the absolute error is
|fl((√3)^2) − fl(3)| ≈ |3(1 + 2ϵ_d + ϵ_c) − 3| = 3|2ϵ_d + ϵ_c| ≤ 3 · 3ϵ_m = 9ϵ_m
In [16]: 3-np.sqrt(3)**2
Out[16]: 4.440892098500626e-16

It turns out to be exactly 2ϵ_m.

Lecture 1 25 / 30
propagation and amplification of errors: addition
Suppose we want to compute the sum:
Σ_{i=1}^{n} x_i,   x_i ∈ R.

In floating-point arithmetic we get:

fl(x_1 + x_2) = fl(fl(x_1) + fl(x_2)) = (x_1(1 + ϵ_1) + x_2(1 + ϵ_2))(1 + ϵ_{c2})
             = x_1 + x_2 + x_1(ϵ_1 + ϵ_{c2}) + x_2(ϵ_2 + ϵ_{c2}) + O(ϵ^2)

fl(x_1 + x_2 + x_3) = fl(fl(x_1 + x_2) + x_3) = fl(fl(x_1 + x_2) + fl(x_3))
                    = x_1 + x_2 + x_3 + x_1(ϵ_1 + ϵ_{c2} + ϵ_{c3})
                      + x_2(ϵ_2 + ϵ_{c2} + ϵ_{c3}) + x_3(ϵ_3 + ϵ_{c3}) + O(ϵ^2)
...
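The conversion errors ϵ_{c2}, ϵ_{c3} can be isolated numerically by comparing the floating-point sum with the exact sum of the stored values; a minimal sketch:

from fractions import Fraction

xs = [0.1, 0.2, 0.3]                      # the stored values fl(x_i) already carry data error
exact = sum(Fraction(x) for x in xs)      # exact sum of the stored values
seq = (xs[0] + xs[1]) + xs[2]             # sequential floating-point summation
float(Fraction(seq) - exact)              # conversion error only, on the order of eps_m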

Lecture 1 26 / 30
propagation and amplification of errors: addition

fl(Σ_{i=1}^{n} x_i) − Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} x_i ϵ_i + x_1 Σ_{j=2}^{n} ϵ_{cj} + x_2 Σ_{j=2}^{n} ϵ_{cj}
    + x_3 Σ_{j=3}^{n} ϵ_{cj} + x_4 Σ_{j=4}^{n} ϵ_{cj} + · · · + x_n ϵ_{cn} + O(ϵ^2)

|fl(Σ_{i=1}^{n} x_i) − Σ_{i=1}^{n} x_i| ≤ n max_i |x_i| ϵ_m + (1/2) n(n + 1) max_i |x_i| ϵ_m + O(ϵ^2)
    = ((1/2) n^2 + (3/2) n) max_i |x_i| ϵ_m + O(ϵ^2).

This is an upper bound, i.e. a worst-case scenario; the actual error may be smaller. With the usual rounding strategy we can use ϵ_m/2 instead of ϵ_m, which gives a tighter bound.

Lecture 1 27 / 30
amplification of errors in addition: example
Let x_i = 0.1 and n = 10^6:

Σ_{i=1}^{10^6} 0.1 = 0.1 × 10^6 = 10^5

In [1]: res = 0
In [2]: for ii in range(int(1e6)):
...: res = res + 0.1

In [3]: res
Out[3]: 100000.00000133288 # this is not equal to 100000

The upper bound is

((1/2)(10^6)^2 + (3/2) · 10^6) × 0.1 × ϵ_m/2,   ϵ_m ≈ 2.22 × 10^{−16}

In [4]: (0.5*1e6**2+1.5*1e6)*0.1*0.5*2.22e-16
Out[4]: 5.55001665e-06 # (tight) upper bound
In [5]: res-1e5
Out[5]: 1.3328826753422618e-06 # actual error

Lecture 1 28 / 30
numerically stable algorithms

There are ways to minimize error amplification and accumulation.

In [1]: x = np.ones(int(1e6))
In [2]: x = x*0.1
In [3]: x
Out[3]: array([0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1])
In [4]: np.sum(x)
Out[4]: 99999.9999999998
In [5]: np.sum(x)-1e5
Out[5]: -2.0372681319713593e-10

The error is much smaller than what we get with the for loop summation!
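Another option is Python's math.fsum, which tracks exact partial sums so that essentially only a single final rounding remains; a small sketch (the remaining discrepancy is then dominated by the data error in fl(0.1) itself):

import math

x = [0.1] * 10**6
math.fsum(x) - 1e5   # far smaller in magnitude than the error of the sequential loop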

Lecture 1 29 / 30
Question

The numpy.sum() function uses an algorithm based on pairwise sums of elements. Show that the pairwise summation

fl(fl(x_1 + x_2) + fl(x_3 + x_4))

results in a smaller floating-point error than the sequential addition

fl(fl(fl(x_1 + x_2) + x_3) + x_4)
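For illustration, a minimal recursive sketch of pairwise summation (NumPy's actual implementation is more elaborate):

def pairwise_sum(x):
    # split the data in two halves and add the two partial sums;
    # each element then passes through only O(log2 n) roundings
    # instead of O(n) in the sequential loop
    if len(x) <= 2:
        return sum(x)
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

pairwise_sum([0.1] * 10**6) - 1e5   # much closer to zero than the sequential loop result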

Lecture 1 30 / 30
