Class lvv::array is a faster, vector-operation-capable drop-in replacement for boost::array, with specialisations for the x86/x86_64 architecture. boost::array was promoted to tr1:: in GCC 4.4 and became std::array in C++0x.

lvv::array is a plain C array wrapped in a class. The wrapping adds zero memory and speed overhead. Like a C array, lvv::array is statically sized, i.e. its size is known at compile time.
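To make the zero-overhead claim concrete, here is a minimal sketch of the general idea: a fixed-size C array as the only data member, with element-wise operators on top. The name static_array and the operators shown are illustrative assumptions, not lvv::array's actual definition.

// Minimal sketch of a zero-overhead static array wrapper (illustration only;
// not the actual lvv::array definition).
#include <cstddef>
#include <iostream>

template <typename T, std::size_t N>
struct static_array {                    // hypothetical name
    T elems[N];                          // the only data member: a plain C array

    T&       operator[](std::size_t i)       { return elems[i]; }
    const T& operator[](std::size_t i) const { return elems[i]; }

    // element-wise vector operation
    static_array& operator+=(const static_array& rhs) {
        for (std::size_t i = 0; i < N; ++i) elems[i] += rhs.elems[i];
        return *this;
    }
};

int main() {
    static_array<float, 3> a = {{1.f, 2.f, 3.f}};
    static_array<float, 3> b = {{10.f, 20.f, 30.f}};
    a += b;                                                   // vector op
    std::cout << a[0] << " " << a[1] << " " << a[2] << "\n";  // prints: 11 21 31
}

Because the struct holds nothing but the C array itself, sizeof(static_array<float,3>) equals sizeof(float[3]), and the fixed trip count lets the compiler fully unroll or vectorise the loops.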

Quick Start

using lvv::array;

                // This is plain C++ aggregate initialization, not a C++0x constructor.
                // The second set of curly braces is optional, but some
                // compilers issue a warning if a single set of braces is used.
array<float,3>  A = {{1., 2., 3.}};
array<float,3>  B;
array<float,3>  C = {{10., 20., 30.}};
array<float,3>  RES;

B   =  1.0;     // all elements are assigned `1.0f`
RES =  A+C;     // vector op
RES +=  B;      // vector op

                // you can send an array to iostream
cout  <<  "vector   A :   "  <<  A        << endl;
cout  <<  "vector   B :   "  <<  B        << endl;
cout  <<  "vector   C :   "  <<  C        << endl;
cout  <<  "vector RES :   "  <<  RES      << endl;
cout  <<  "dot product:   "  <<  dot(A,B) << endl;

Output:

vector   A :   1  2  3
vector   B :   1  1  1
vector   C :   10  20  30
vector RES :   12  23  34
dot product:   6

Speed

Some operations were substantially accelerated (explicitly specialised) using meta-programming, explicit vectorisation (with SSE), parallel programming (with OpenMP), out-of-order optimisation, and some inline assembly. Below are some benchmarks taken on a dual-core, 2200 MHz Core 2 Duo CPU:
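To give an idea of what the explicit vectorisation looks like, here is a hedged sketch of an SSE float sum written with compiler intrinsics. The function name sse_sum and the loop structure are illustrative assumptions, a sketch of the general technique rather than lvv::array's actual code.

// Illustrative SSE sum over floats (sketch of the technique only;
// not the actual lvv::array implementation).
#include <xmmintrin.h>   // SSE intrinsics
#include <cstddef>

float sse_sum(const float* a, std::size_t n) {
    __m128 acc = _mm_setzero_ps();                      // four partial sums
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));     // add 4 floats at a time
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)                                  // handle the tail
        sum += a[i];
    return sum;
}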

array::sum()

[Bar chart: SUM benchmark of lvv::array.sum(); x-axis: CPU ticks per array element]

A tick refers to one CPU clock cycle, about 0.45 nanoseconds on this machine (1 / 2.2 GHz). The sum was computed over 100,000,000 floats with values {1.f, 2.f, 1.f, 2.f, 1.f, 2.f, ...}. The same benchmark is shown in the table below:

Method                    Ticks per element  Computed value  Source
plain for-loop, double    3.14               1.5e+08         double sum=0; for (int i=0; i<N; i++) sum += A[i];
plain for-loop, float     3.06               3.35e+07        float sum=0; for (int i=0; i<N; i++) sum += A[i];
std::accumulate<float>()  3.06               3.35e+07        float sum = accumulate(A.begin(), A.end(), 0.f);
lvv::array.sum()          1.74               1.5e+08         float sum = A.sum();

The SSE method is selected (through meta-programming) when no summation method is explicitly specified and the CPU supports SSE. Note that the float plain-for-loop and std::accumulate methods produce incorrect values due to rounding error.
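The cause of the wrong values is easy to reproduce: once a single float accumulator reaches 2^25 = 33554432, adding 1.f or 2.f no longer changes it, which matches the 3.35e+07 shown above. Here is a small self-contained demonstration (the variable names and output formatting are mine, not from the library):

// Demonstrates the rounding error: a lone float accumulator stalls once the
// running sum is large relative to float precision; a double accumulator does not.
#include <cstdio>

int main() {
    const long N = 100000000;                 // 100,000,000 values 1,2,1,2,...
    float  fsum = 0.f;
    double dsum = 0.0;
    for (long i = 0; i < N; ++i) {
        float v = (i % 2 == 0) ? 1.f : 2.f;
        fsum += v;                            // stalls at 2^25 = 33554432
        dsum += v;
    }
    std::printf("float  accumulator: %g\n", fsum);   // prints ~3.35544e+07
    std::printf("double accumulator: %g\n", dsum);   // prints 1.5e+08
}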

array::max()

[Bar chart: MAX benchmark of lvv::array.max(); x-axis: CPU ticks per array element]

The maximum search was done over 100,000,000 floats.

Method              Ticks per element  Source
plain for-loop      5.81               float max=0; for (size_t i=0; i<N; i++) if (A[i] > max) max = A[i];
std::max_element()  5.81               float max = *std::max_element(A.begin(), A.end());
lvv::array.max()    1.63               float max = A.max();
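For comparison, here is a hedged sketch of how a maximum search can be vectorised with the SSE _mm_max_ps instruction. As before, the function name sse_max and the structure are illustrative assumptions, not lvv::array's actual code.

// Illustrative SSE maximum search (sketch of the technique only).
#include <xmmintrin.h>
#include <cstddef>

float sse_max(const float* a, std::size_t n) {        // assumes n >= 4
    __m128 vmax = _mm_loadu_ps(a);                     // running maxima, 4 lanes
    std::size_t i = 4;
    for (; i + 4 <= n; i += 4)
        vmax = _mm_max_ps(vmax, _mm_loadu_ps(a + i));  // 4 comparisons per step
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    float m = lanes[0];
    for (int k = 1; k < 4; ++k)                        // reduce the 4 lanes
        if (lanes[k] > m) m = lanes[k];
    for (; i < n; ++i)                                 // tail elements
        if (a[i] > m) m = a[i];
    return m;
}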

So far I have implemented only the combinations needed for my own work, so coverage is quite incomplete. If there is no specialisation for a type, the generic implementation is used.

Table 1. Implemented optimized specialisations

Type         sum      max      V1 OP= V2  V1 OP V2
generic      std::    std::    for-loop   for-loop
float        sse      sse      generic    generic
double       generic  generic  generic    generic
long double  generic  generic  generic    generic
int8_t       generic  generic  generic    generic
int16_t      generic  sse2     generic    generic
int32_t      generic  generic  generic    generic
int64_t      generic  generic  generic    generic
uint8_t      generic  generic  generic    generic
uint16_t     generic  generic  generic    generic
uint32_t     generic  generic  generic    generic
uint64_t     generic  generic  generic    generic

Though I have targeted only x86-64, some of the optimized specialisations (out-of-order, meta-programming, OpenMP) are platform independent. The appropriate specialisation is selected automatically (but can be specified explicitly) based on CPU capabilities, array size, and array element type.
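As an illustration of how such automatic selection can work, here is a hedged sketch of dispatch on the element type via template specialisation. The helper name sum_impl is hypothetical; the real selection logic in lvv::array may also take array size and CPU capabilities into account, as stated above.

// Sketch of compile-time dispatch on element type (hypothetical helper names;
// not lvv::array's actual selection machinery).
#include <cstddef>
#include <numeric>

template <typename T, std::size_t N>
struct sum_impl {                                  // generic fallback
    static T run(const T* a) { return std::accumulate(a, a + N, T()); }
};

template <std::size_t N>
struct sum_impl<float, N> {                        // float gets a vectorised path
    static float run(const float* a) {
        // an SSE version, e.g. the sse_sum() sketch shown earlier, would go here
        float s = 0.f;
        for (std::size_t i = 0; i < N; ++i) s += a[i];
        return s;
    }
};

template <typename T, std::size_t N>
T sum(const T (&a)[N]) { return sum_impl<T, N>::run(a); }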

Other lvv::array capabilities

  • The index of the first element defaults to 0, but can be any number (third template parameter).

  • The index passed to operator[] is range-checked when the NDEBUG macro is not defined (non-optimised builds).

  • Basic linear algebra functions: norm2(A), distance_norm2(A1,A2), dot(A1,A2), etc. A usage sketch follows this list.
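The following sketch shows how these features might be used, based only on the descriptions above; the exact semantics (for example, whether norm2 returns the norm or its square) should be checked against the headers and the t-*.cc test files.

using lvv::array;

                // assumed: the third template parameter sets the first index
array<float,3,1>  F = {{1.f, 2.f, 3.f}};   // indices run 1..3
float first = F[1];                        // F[0] would trip the range check in a non-NDEBUG build

array<float,3>  A = {{1.f, 2.f, 3.f}};
array<float,3>  B = {{4.f, 5.f, 6.f}};
float n  = norm2(A);                       // assumed: Euclidean (2-)norm of A
float d  = distance_norm2(A, B);           // assumed: 2-norm distance between A and B
float dp = dot(A, B);                      // dot product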

See also the sample usage in the test files t-*.cc and the unit test u-array.cc.

How to submit a patch

You can simply email a patch to [email protected]. All feedback is much appreciated.

If you are on GitHub it is even easier; see any of the GitHub patch-submission HOWTOs.

There are no hard-set style rules.
