Class lvv::array is a faster, vector-operation-capable drop-in replacement for boost::array, with specialisations for the x86/x86_64 architecture. boost::array was promoted to tr1:: in GCC 4.4 and became std::array in C++0x.

lvv::array is a plain C array wrapped in a class. The wrapping adds zero memory and speed overhead. Like a C array, lvv::array is statically sized, i.e. its size is known at compile time.
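To make the zero-overhead claim concrete, here is a minimal sketch of the general idea: a fixed-size C array as the only data member, with element-wise operators on top. The name static_array and the operators shown are illustrative assumptions, not lvv::array's actual definition.

// Minimal sketch of a zero-overhead static array wrapper (illustration only;
// not the actual lvv::array definition).
#include <cstddef>
#include <iostream>

template <typename T, std::size_t N>
struct static_array {                    // hypothetical name
    T elems[N];                          // the only data member: a plain C array

    T&       operator[](std::size_t i)       { return elems[i]; }
    const T& operator[](std::size_t i) const { return elems[i]; }

    // element-wise vector operation
    static_array& operator+=(const static_array& rhs) {
        for (std::size_t i = 0; i < N; ++i) elems[i] += rhs.elems[i];
        return *this;
    }
};

int main() {
    static_array<float, 3> a = {{1.f, 2.f, 3.f}};
    static_array<float, 3> b = {{10.f, 20.f, 30.f}};
    a += b;                                                   // vector op
    std::cout << a[0] << " " << a[1] << " " << a[2] << "\n";  // prints: 11 21 31
}

Because the struct holds nothing but the C array itself, sizeof(static_array<float,3>) equals sizeof(float[3]), and the fixed trip count lets the compiler fully unroll or vectorise the loops.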

Quick Start

using lvv::array;

                // This is plain C++ aggregate initialization, not a C++0x constructor.
                // The second set of curly braces is optional, but some
                // compilers issue a warning if a single set of braces is used.
array<float,3>  A = {{1., 2., 3.}};
array<float,3>  B;
array<float,3>  C = {{10., 20., 30.}};
array<float,3>  RES;

B   =  1.0;     // all elements are assigned `1.0f`
RES =  A+C;     // vector op
RES +=  B;      // vector op

                // you can send an array to iostream
cout  <<  "vector   A :   "  <<  A        << endl;
cout  <<  "vector   B :   "  <<  B        << endl;
cout  <<  "vector   C :   "  <<  C        << endl;
cout  <<  "vector RES :   "  <<  RES      << endl;
cout  <<  "dot product:   "  <<  dot(A,B) << endl;

Output:

vector   A :   1  2  3
vector   B :   1  1  1
vector   C :   10  20  30
vector RES :   12  23  34
dot product:   6

Speed

Some operations were substantially accelerated (explicitly specialised) using meta-programming, explicit vectorisation (with SSE), parallel programming (with OpenMP), out-of-order optimisation, and some inline assembly. Below are some benchmarks taken on a dual-core, 2200 MHz Core 2 Duo CPU:
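To give an idea of what the explicit vectorisation looks like, here is a hedged sketch of an SSE float sum written with compiler intrinsics. The function name sse_sum and the loop structure are illustrative assumptions, a sketch of the general technique rather than lvv::array's actual code.

// Illustrative SSE sum over floats (sketch of the technique only;
// not the actual lvv::array implementation).
#include <xmmintrin.h>   // SSE intrinsics
#include <cstddef>

float sse_sum(const float* a, std::size_t n) {
    __m128 acc = _mm_setzero_ps();                      // four partial sums
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));     // add 4 floats at a time
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)                                  // handle the tail
        sum += a[i];
    return sum;
}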

array::sum()

[Bar chart: SUM benchmark of lvv::array.sum(); x-axis: CPU ticks per array element]

A tick refers to one CPU clock cycle, about 0.45 nanoseconds on this machine (1 / 2.2 GHz). The sum was computed over 100,000,000 floats with values {1.f, 2.f, 1.f, 2.f, 1.f, 2.f, ...}. The same benchmark is shown in the table below:

Method                    Ticks per element  Computed value  Source
plain for-loop, double    3.14               1.5e+08         double sum=0; for (int i=0; i<N; i++) sum += A[i];
plain for-loop, float     3.06               3.35e+07        float sum=0; for (int i=0; i<N; i++) sum += A[i];
std::accumulate<float>()  3.06               3.35e+07        float sum = accumulate(A.begin(), A.end(), 0.f);
lvv::array.sum()          1.74               1.5e+08         float sum = A.sum();

The SSE method is selected (through meta-programming) when no summation method is explicitly specified and the CPU supports SSE. Note that the float plain-for-loop and std::accumulate methods produce incorrect values due to rounding error.
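The cause of the wrong values is easy to reproduce: once a single float accumulator reaches 2^25 = 33554432, adding 1.f or 2.f no longer changes it, which matches the 3.35e+07 shown above. Here is a small self-contained demonstration (the variable names and output formatting are mine, not from the library):

// Demonstrates the rounding error: a lone float accumulator stalls once the
// running sum is large relative to float precision; a double accumulator does not.
#include <cstdio>

int main() {
    const long N = 100000000;                 // 100,000,000 values 1,2,1,2,...
    float  fsum = 0.f;
    double dsum = 0.0;
    for (long i = 0; i < N; ++i) {
        float v = (i % 2 == 0) ? 1.f : 2.f;
        fsum += v;                            // stalls at 2^25 = 33554432
        dsum += v;
    }
    std::printf("float  accumulator: %g\n", fsum);   // prints ~3.35544e+07
    std::printf("double accumulator: %g\n", dsum);   // prints 1.5e+08
}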

array::max()

[Bar chart: MAX benchmark of lvv::array.max(); x-axis: CPU ticks per array element]

The maximum search was done over 100,000,000 floats.

Method              Ticks per element  Source
plain for-loop      5.81               float max=0; for (size_t i=0; i<N; i++) if (A[i] > max) max = A[i];
std::max_element()  5.81               float max = *std::max_element(A.begin(), A.end());
lvv::array.max()    1.63               float max = A.max();
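For comparison, here is a hedged sketch of how a maximum search can be vectorised with the SSE _mm_max_ps instruction. As before, the function name sse_max and the structure are illustrative assumptions, not lvv::array's actual code.

// Illustrative SSE maximum search (sketch of the technique only).
#include <xmmintrin.h>
#include <cstddef>

float sse_max(const float* a, std::size_t n) {        // assumes n >= 4
    __m128 vmax = _mm_loadu_ps(a);                     // running maxima, 4 lanes
    std::size_t i = 4;
    for (; i + 4 <= n; i += 4)
        vmax = _mm_max_ps(vmax, _mm_loadu_ps(a + i));  // 4 comparisons per step
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    float m = lanes[0];
    for (int k = 1; k < 4; ++k)                        // reduce the 4 lanes
        if (lanes[k] > m) m = lanes[k];
    for (; i < n; ++i)                                 // tail elements
        if (a[i] > m) m = a[i];
    return m;
}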

So far I have implemented only the combinations needed for my own work, so coverage is quite incomplete. If there is no specialisation for a type, the generic implementation is used.

Table 1. Implemented optimized specialisations

Type         sum      max      V1 OP= V2  V1 OP V2
generic      std::    std::    for-loop   for-loop
float        sse      sse      generic    generic
double       generic  generic  generic    generic
long double  generic  generic  generic    generic
int8_t       generic  generic  generic    generic
int16_t      generic  sse2     generic    generic
int32_t      generic  generic  generic    generic
int64_t      generic  generic  generic    generic
uint8_t      generic  generic  generic    generic
uint16_t     generic  generic  generic    generic
uint32_t     generic  generic  generic    generic
uint64_t     generic  generic  generic    generic

Though I have targeted only x86-64, some of the optimized specialisations (out-of-order, meta-programming, OpenMP) are platform independent. The appropriate specialisation is selected automatically (but can be specified explicitly) based on CPU capabilities, array size, and array element type.
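As an illustration of how such automatic selection can work, here is a hedged sketch of dispatch on the element type via template specialisation. The helper name sum_impl is hypothetical; the real selection logic in lvv::array may also take array size and CPU capabilities into account, as stated above.

// Sketch of compile-time dispatch on element type (hypothetical helper names;
// not lvv::array's actual selection machinery).
#include <cstddef>
#include <numeric>

template <typename T, std::size_t N>
struct sum_impl {                                  // generic fallback
    static T run(const T* a) { return std::accumulate(a, a + N, T()); }
};

template <std::size_t N>
struct sum_impl<float, N> {                        // float gets a vectorised path
    static float run(const float* a) {
        // an SSE version, e.g. the sse_sum() sketch shown earlier, would go here
        float s = 0.f;
        for (std::size_t i = 0; i < N; ++i) s += a[i];
        return s;
    }
};

template <typename T, std::size_t N>
T sum(const T (&a)[N]) { return sum_impl<T, N>::run(a); }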

Other lvv::array capabilities

  • The index of the first element defaults to 0, but can be any number (third template parameter).

  • The index passed to operator[] is range-checked when the NDEBUG macro is not defined (non-optimised builds).

  • Basic linear algebra functions: norm2(A), distance_norm2(A1,A2), dot(A1,A2), etc. A usage sketch follows this list.
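The following sketch shows how these features might be used, based only on the descriptions above; the exact semantics (for example, whether norm2 returns the norm or its square) should be checked against the headers and the t-*.cc test files.

using lvv::array;

                // assumed: the third template parameter sets the first index
array<float,3,1>  F = {{1.f, 2.f, 3.f}};   // indices run 1..3
float first = F[1];                        // F[0] would trip the range check in a non-NDEBUG build

array<float,3>  A = {{1.f, 2.f, 3.f}};
array<float,3>  B = {{4.f, 5.f, 6.f}};
float n  = norm2(A);                       // assumed: Euclidean (2-)norm of A
float d  = distance_norm2(A, B);           // assumed: 2-norm distance between A and B
float dp = dot(A, B);                      // dot product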

See also the sample usage in the test files t-*.cc and the unit test u-array.cc.

How to submit a patch

You can simply email a patch to [email protected]. All feedback is much appreciated.

If you are on GitHub it is even easier; see any of the GitHub patch-submission HOWTOs.

There are no hard-set style rules.
