The Wayback Machine - https://web.archive.org/web/20120520222937/http://www.arm.com/products/processors/technologies/neon.php


ARM The Architecture For The Digital World  

NEON

The ARM® NEON™ general-purpose SIMD engine efficiently processes current and future multimedia formats, enhancing the user experience.

NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis, delivering at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.

NEON technology is cleanly architected and works seamlessly with its own independent pipeline and register file.

NEON technology is a 128-bit SIMD (Single Instruction, Multiple Data) architecture extension for the ARM Cortex™-A series processors, designed to provide flexible and powerful acceleration for consumer multimedia applications, delivering a significantly enhanced user experience. It has 32 registers, each 64 bits wide (with a dual view as 16 registers, each 128 bits wide).
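The dual view of the register file can be pictured with a small C model. This is only an illustrative sketch, not ARM code: the type `neon_regfile_model` and the helpers `write_d` and `read_q_half` are hypothetical names, and the model relies on the architectural rule that 128-bit register Qn overlays the 64-bit pair D(2n) (low half) and D(2n+1) (high half).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the NEON register file: 256 bytes of storage,
   visible either as 32 x 64-bit registers or as 16 x 128-bit registers. */
typedef union {
    uint64_t d[32];      /* 64-bit view: D0..D31 */
    uint64_t q[16][2];   /* 128-bit view: Q0..Q15, [0] = low half, [1] = high half */
} neon_regfile_model;

/* Write a 64-bit value through the D-register view. */
static void write_d(neon_regfile_model *rf, unsigned n, uint64_t value)
{
    assert(n < 32);
    rf->d[n] = value;
}

/* Read one 64-bit half of a register through the Q-register view. */
static uint64_t read_q_half(const neon_regfile_model *rf, unsigned n, unsigned half)
{
    assert(n < 16 && half < 2);
    return rf->q[n][half];
}
```

In this model, writing D2 and D3 and then reading Q1 returns the same two values, because both views name the same storage.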

NEON instructions perform "Packed SIMD" processing:

  • Registers are considered as vectors of elements of the same data type
  • Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single precision floating point
  • Instructions perform the same operation in all lanes

Diagram illustrating NEON packed SIMD processing
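The packed SIMD behaviour described above can be sketched in C. The function name `add_8x16` is an assumption made for this example. When the compiler targets NEON, the code uses the `arm_neon.h` intrinsics `vld1q_s16`, `vaddq_s16` and `vst1q_s16` to treat a 128-bit register as eight signed 16-bit lanes; otherwise it falls back to a scalar loop that computes the same result one lane at a time.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Adds eight signed 16-bit lanes: out[i] = a[i] + b[i].
   The same operation is applied to every lane. */
static void add_8x16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t va = vld1q_s16(a);        /* load 8 lanes into one 128-bit register */
    int16x8_t vb = vld1q_s16(b);
    vst1q_s16(out, vaddq_s16(va, vb));  /* one instruction adds all 8 lanes */
#else
    /* Scalar fallback: one lane per iteration. The unsigned arithmetic
       avoids signed overflow; the final conversion wraps on common compilers. */
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}
```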

The ARM Cortex™-A series processors with NEON technology, as well as ARM's Mali multimedia hardware solutions, are used in multimedia applications ranging from smartphones and mobile computing devices to HDTV.

 
 


NEON Enhancing User Experience

NEON enhances many multimedia user experiences:

  • Watch any video in any format
  • Edit and enhance captured videos - video stabilization
  • Anti-aliased rendering and compositing
  • Game processing
  • Process multi-megapixel photos quickly
  • Voice recognition
  • Powerful multichannel hi-fi audio processing

NEON Features and Benefits

NEON supports the widest range of multimedia codecs used for internet applications:

  • Many soft codec standards: MPEG-4, H.264, On2 VP6/7/8, Real, AVS, and others
  • Ideal solution for normal size "internet streaming" decode of various formats
  • Not just for codecs - also applicable to 2D and 3D graphics and other vector processing
  • Off the shelf tools, OS support, and ecosystem support

Fewer cycles needed:

  • NEON gives a 60-150% performance boost on complex video codecs
  • Individual simple DSP algorithms can show a larger performance boost (4x-8x)
  • Processor can sleep sooner, resulting in overall dynamic power saving 

NEON technology features a number of elements to increase performance and simplify software development, such as: 

  • Aligned and unaligned data access allows for efficient vectorization of SIMD operations.
  • Clean instruction set architecture designed for autovectorizing compilers and hand coding.
  • Efficient access to packed arrays such as ARGB or xyz coordinates
  • Support for both integer and floating point operations ensures adaptability to a broad range of applications, from codecs to High Performance Computing to 3D graphics. 
  • Tight coupling to the ARM processor provides a single instruction stream and a unified view of memory, presenting a single development platform target with a simpler tool flow. 
  • The large NEON register file with its dual 128-bit/64-bit views enables efficient handling of data and minimizes access to memory, enhancing data throughput.
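As a rough illustration of the packed-array access mentioned in the list above, the following C sketch splits interleaved ARGB pixels into four separate planes. The function name `split_argb` and the assumed byte order in memory (A, R, G, B) are choices made for this example only. On a NEON target it uses the structure-load intrinsic `vld4_u8`, which de-interleaves eight 4-byte pixels in a single load; elsewhere a scalar loop does the same work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Splits `pixels` interleaved A,R,G,B bytes into four planes.
   Assumption for this sketch: `pixels` is a multiple of 8. */
static void split_argb(const uint8_t *argb, size_t pixels,
                       uint8_t *a, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(pixels % 8 == 0);
    for (size_t i = 0; i < pixels; i += 8) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        /* One load fetches 32 bytes and de-interleaves them into 4 registers. */
        uint8x8x4_t px = vld4_u8(argb + 4 * i);
        vst1_u8(a + i, px.val[0]);
        vst1_u8(r + i, px.val[1]);
        vst1_u8(g + i, px.val[2]);
        vst1_u8(b + i, px.val[3]);
#else
        /* Scalar fallback: pick each channel out byte by byte. */
        for (size_t j = i; j < i + 8; ++j) {
            a[j] = argb[4 * j + 0];
            r[j] = argb[4 * j + 1];
            g[j] = argb[4 * j + 2];
            b[j] = argb[4 * j + 3];
        }
#endif
    }
}
```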

How to use NEON

OpenMAX DL library:

  • Recommended approach to accelerate AV codecs
  • Libraries released in source form, free-of-charge from the ARM website
  • Supports the following formats: MPEG-4 simple profile, H.264 baseline, JPEG, MP3, AAC
  • Supports the following functions: FIR, IIR, FFT, Dot Product, Color space conversion, de-blocking, de-ringing, rotation, scaling, composition

Vectorizing compilers:

  • Exploit NEON SIMD automatically with existing source code
  • Supported by ARM RealView Development Suite (v3.1 Pro and later)
  • Supported by gcc in versions 2007q3 and later
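A loop of the following shape is the kind of existing source code a vectorizing compiler can typically map onto NEON without changes. The function name `mix_gain` is hypothetical. The `restrict` qualifiers tell the compiler that the two arrays do not overlap, which is usually what makes automatic vectorization possible. With GCC targeting 32-bit ARM, options along the lines of `-O3 -mfpu=neon` are commonly used to enable this; the exact flags depend on the toolchain and target, so treat them as an assumption rather than a recipe.

```c
#include <stddef.h>
#include <stdint.h>

/* y[i] += gain * x[i] for every element.
   Independent iterations and non-overlapping arrays make this loop
   a good candidate for automatic vectorization. */
static void mix_gain(uint32_t *restrict y, const uint32_t *restrict x,
                     uint32_t gain, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += gain * x[i];
}
```

The source contains no NEON-specific code, so the same function still builds and runs on targets without NEON.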

C intrinsics:

  • C function call interface to NEON operations
  • Supports all data types and operations supported by NEON
  • Supported in ARM RealView Development Suite (version 3.1 and later) and gcc version 2007q3 and later
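A minimal sketch of the C intrinsics approach is a single-precision dot product. The function name `dot_f32` is an assumption for this example; the intrinsics themselves (`vdupq_n_f32`, `vld1q_f32`, `vmlaq_f32`, `vget_low_f32`, `vget_high_f32`, `vadd_f32`, `vpadd_f32`, `vget_lane_f32`) come from `arm_neon.h`. The NEON path processes four floats per iteration and then sums the four lanes; a scalar loop handles any leftover elements, and handles everything when NEON is not available.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Returns the sum of a[i] * b[i] over n elements. */
static float dot_f32(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vdupq_n_f32(0.0f);            /* four running sums */
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    /* Reduce the four lanes to a single value. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif
    for (; i < n; ++i)                              /* tail, or scalar fallback */
        sum += a[i] * b[i];
    return sum;
}
```

Because the NEON path adds the products in a different order from the scalar loop, results for general floating-point inputs can differ slightly in the last bits.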

Assembler:

  • For those who really want to optimize at the lowest level
  • Supported in ARM's RealView Development Suite (version 3.1 and later) and gcc version 2007q3 and later

 

NEON Support in the Open Source Community

NEON is currently supported in the following Open Source projects:

  • Android - NEON optimizations
    • Skia library, S32A_D565_Opaque is 5x faster using NEON
  • Ubuntu 09.04 supports NEON:
    • NEON versions of critical shared libraries
  • BlueZ - official Linux Bluetooth protocol stack
    • NEON SBC audio encoder
  • Pixman (part of Cairo 2D graphic library)
    • Compositing/alpha blending
    • X.Org, Mozilla Firefox, Fennec and Webkit browsers
    • e.g. fbCompositeSolidMask_nx8x0565neon is 8x faster using NEON
  • ffmpeg - libavcodec
    • LGPL media player used in many Linux distributions
    • Video: MPEG-2, MPEG-4 ASP, H.264 (AVC), VC1
    • Audio: Ogg Vorbis
  • x264 - Google Summer of Code 2009
    • GPL H.264 encoder - e.g. for video conferencing

 


NEON technology is supported by the industry’s largest network of Partners – the ARM Connected Community. Leading silicon, systems, design support and software providers come together to provide a complete and optimized solution for products based on NEON technology.

 

Company               Application
Ingenient             H.264, VC1, MPEG-4
On2 Technologies      VP6/7, MPEG-4, VC1, H.264, video stabilization
Ittiam Systems        MPEG-4, MPEG-2, H.263, H.264, WMV9, VC1
Aricent Technologies  MPEG-4, H.263, H.264, WMV9, audio
                      H.264, VC1
Spirit DSP            TEAMSpirit voice and video
VisualOn              H.264, MPEG-4, H.263, WMV
Actimagine            MobiClip
Fraunhofer IIS        Video and audio codecs
Dolby Labs            Multichannel audio processing
Techno Mathematical   MPEG-4
Espico                Audio and consulting
