NVMe

This document discusses the streamlined architecture of NVM Express (NVMe), which enables high-performance PCIe solid-state drives (SSDs). It describes how NVMe supports a large number of parallel commands through a scalable queuing interface and an efficient command submission and completion process. The queuing interface also enables NUMA-optimized drivers by allowing separate queues and interrupts per CPU core. NVMe supports up to 2^32 outstanding commands: up to 64K queues, each holding up to 64K commands. Commands are prioritized using strict priority and weighted round robin arbitration.


How the Streamlined Architecture of NVM Express Enables High Performance PCIe SSDs

Peter Onufryk
Director of Engineering
IDT

Flash Memory Summit 2012


Santa Clara, CA
The Need for a Large Number of Parallel Commands

[Figure: an NVMe controller connects a PCIe Gen3 x8 host link (BW ~6 GBps) to many parallel channels of NAND flash devices.]

NAND flash, 8KB page:
TREAD = 75 µs → 109 MBps read BW
TPROG = 1 ms → 8 MBps write BW

Need:
55 parallel 8KB reads
732 parallel 8KB writes
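
The figures above follow from simple arithmetic, sketched here in Python. The constants are taken from the slide. The function names are illustrative, and rounding to the nearest integer is an assumption chosen because it reproduces the slide's numbers.

```python
# Constants from the slide.
PAGE_BYTES = 8 * 1024   # 8KB NAND page
T_READ_S = 75e-6        # page read time: 75 microseconds
T_PROG_S = 1e-3         # page program time: 1 millisecond
LINK_BW = 6e9           # ~6 GBps on PCIe Gen3 x8

def page_bandwidth(page_bytes, op_time_s):
    """Bytes per second achieved by one page operation at a time."""
    return page_bytes / op_time_s

def parallel_ops_needed(link_bw, page_bytes, op_time_s):
    """Concurrent page operations needed to fill the link (nearest integer)."""
    return round(link_bw / page_bandwidth(page_bytes, op_time_s))

read_bw = page_bandwidth(PAGE_BYTES, T_READ_S)     # about 109 MBps
write_bw = page_bandwidth(PAGE_BYTES, T_PROG_S)    # about 8 MBps
reads = parallel_ops_needed(LINK_BW, PAGE_BYTES, T_READ_S)
writes = parallel_ops_needed(LINK_BW, PAGE_BYTES, T_PROG_S)
print(reads, writes)
```

The write case needs roughly thirteen times more parallelism than the read case because a page program takes roughly thirteen times longer than a page read.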

Scalable Queuing Interface
[Figure: the host runs controller management plus cores 0 to N. Controller management owns the admin submission queue and admin completion queue. Each core owns one or more I/O submission queues and an I/O completion queue (core 1 is drawn with two submission queues sharing one completion queue). Each completion queue has its own MSI-X interrupt from the NVMe controller.]

• Enables NUMA optimized drivers
  - One or more I/O submission queues, a completion queue, and an MSI-X interrupt per core
  - High performance and low latency command issue
  - No locking between cores

• Up to 2^32 outstanding commands
  - Support for up to 64K I/O submission and completion queues
  - Each queue supports up to 64K outstanding commands
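
A minimal sketch of the two points above, assuming one queue pair per core. The function names and the choice of MSI-X vector numbers are hypothetical; queue ID 0 is left for the admin queues.

```python
MAX_QUEUES = 64 * 1024        # up to 64K I/O queues
MAX_QUEUE_DEPTH = 64 * 1024   # up to 64K outstanding commands per queue

def max_outstanding_commands():
    """64K queues x 64K commands each = 2^32."""
    return MAX_QUEUES * MAX_QUEUE_DEPTH

def assign_queues(num_cores):
    """Give every core its own submission queue, completion queue, and
    interrupt vector, so no core ever shares (or locks) a queue.
    ID 0 is skipped because it belongs to the admin queues."""
    return {
        core: {"sq": core + 1, "cq": core + 1, "msix": core + 1}
        for core in range(num_cores)
    }

layout = assign_queues(4)
print(max_outstanding_commands(), layout[0])
```

Because each core only ever touches its own entry in the layout, command issue needs no cross-core synchronization.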

Efficient Queuing Interface
[Figure: queue process. The submission queue and completion queue are rings in host memory, each with a head and a tail pointer. The host rings the submission queue tail doorbell with the new tail and the completion queue head doorbell with the new head. Inside the NVMe controller the stages are fetch command, process command, queue completion, and generate interrupt. Arrows numbered 1 to 8 match the steps listed below.]

Command Submission
1. Host writes command to submission queue
2. Host writes updated submission queue tail pointer to doorbell

Command Processing
3. Controller fetches command
4. Controller processes command

Command Completion
5. Controller writes completion to completion queue
6. Controller generates MSI-X interrupt
7. Host processes completion
8. Host writes updated completion queue head pointer to doorbell
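
The eight steps can be walked through with a toy model. This is a sketch only: the class and function names are hypothetical, commands are plain strings, there is no queue-full check, and the host finds new completions by peeking at the controller's tail rather than by the phase (P) bit a real driver would use.

```python
class QueuePair:
    """One submission ring and one completion ring of equal depth."""
    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth
        self.cq = [None] * depth
        self.sq_tail = 0           # host-owned producer index
        self.sq_head = 0           # controller-owned consumer index
        self.cq_tail = 0           # controller-owned producer index
        self.cq_head = 0           # host-owned consumer index
        self.sq_tail_doorbell = 0  # controller register written by host
        self.cq_head_doorbell = 0  # controller register written by host
        self.interrupts = 0

def host_submit(qp, command):
    qp.sq[qp.sq_tail] = command                     # step 1
    qp.sq_tail = (qp.sq_tail + 1) % qp.depth
    qp.sq_tail_doorbell = qp.sq_tail                # step 2

def controller_service(qp):
    while qp.sq_head != qp.sq_tail_doorbell:
        command = qp.sq[qp.sq_head]                 # step 3: fetch
        qp.sq_head = (qp.sq_head + 1) % qp.depth
        result = command.upper()                    # step 4: stand-in for real work
        qp.cq[qp.cq_tail] = result                  # step 5: write completion
        qp.cq_tail = (qp.cq_tail + 1) % qp.depth
        qp.interrupts += 1                          # step 6: MSI-X interrupt

def host_reap(qp):
    done = []
    while qp.cq_head != qp.cq_tail:                 # simplification, see lead-in
        done.append(qp.cq[qp.cq_head])              # step 7
        qp.cq_head = (qp.cq_head + 1) % qp.depth
    qp.cq_head_doorbell = qp.cq_head                # step 8
    return done

qp = QueuePair(depth=8)
host_submit(qp, "read")
host_submit(qp, "write")
controller_service(qp)
completed = host_reap(qp)
print(completed)
```

Note that the host touches controller registers only in steps 2 and 8; everything else is ordinary host memory traffic.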
NVMe Command Arbitration
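[Figure: command arbitration across submission queues (SQ).
Admin: the admin submission queue (ASQ) enters the final stage with high strict priority.
Urgent: urgent SQs are served round robin (RR); the result enters the final stage with medium strict priority.
High, Medium, Low: the SQs in each class are served round robin; the three results enter a weighted round robin (WRR) stage with high, medium, and low WRR priority, and the WRR output enters the final stage with low strict priority.
The final stage is a strict priority selection among these three inputs.]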
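
A minimal sketch of this arbitration structure follows. The slide does not say how the weights are applied, so the credit-based weighted round robin here is an assumption, as are the class name, the helper name, and the example weights.

```python
from collections import deque

def rr_pop(queues):
    """Round robin over a deque of queues: visit each once, rotating so
    that the next call starts at the following queue."""
    for _ in range(len(queues)):
        q = queues[0]
        queues.rotate(-1)
        if q:
            return q.popleft()
    return None

class Arbiter:
    def __init__(self, weights):
        self.admin = deque()                 # admin SQ: highest strict priority
        self.urgent = deque()                # deque of urgent SQs
        self.classes = {"high": deque(), "medium": deque(), "low": deque()}
        self.weights = dict(weights)         # assumed credits per WRR round
        self.credits = dict(weights)

    def next_command(self):
        if self.admin:                       # strict priority 1
            return self.admin.popleft()
        cmd = rr_pop(self.urgent)            # strict priority 2
        if cmd is not None:
            return cmd
        for _ in range(2):                   # strict priority 3: WRR
            for name in ("high", "medium", "low"):
                if self.credits[name] > 0:
                    cmd = rr_pop(self.classes[name])
                    if cmd is not None:
                        self.credits[name] -= 1
                        return cmd
            self.credits = dict(self.weights)  # round over: refill credits
        return None

arb = Arbiter({"high": 2, "medium": 1, "low": 1})
arb.admin.append("A0")
arb.urgent.extend([deque(["U0"]), deque(["U1"])])
arb.classes["high"].append(deque(["H0", "H1", "H2"]))
arb.classes["medium"].append(deque(["M0"]))
arb.classes["low"].append(deque(["L0"]))
order = [arb.next_command() for _ in range(8)]
print(order)
```

With these weights the high class gets two commands per round for every one from the medium and low classes, while admin and urgent traffic always goes first.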

Fixed Sized Commands & Completions

Submission Queue Entry (64B)
DWord    Fields (bit 31 ... bit 0)
0        Command Identifier | FUSE | Opcode
1        Namespace Identifier
2-3      (unlabeled)
4-5      Metadata Pointer
6-7      PRP Entry 1
8-9      PRP Entry 2
10-15    (unlabeled)

Completion Queue Entry (16B)
DWord    Fields (bit 31 ... bit 0)
0        (unlabeled)
1        (unlabeled)
2        SQ Identifier | SQ Head Pointer
3        Status Field | P | Command Identifier

Legend: Standard Fields Used By All Commands
        Standard Fields Optionally Used By Commands
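
The two layouts can be expressed with Python's struct module. This is a sketch under stated assumptions: the exact bit positions (Opcode in bits 7:0, FUSE in bits 9:8, Command Identifier in bits 31:16 of DWord 0; P in bit 16 and Status Field in bits 31:17 of completion DWord 3) are not spelled out on the slide and should be checked against the NVMe specification. Function names are illustrative.

```python
import struct

# Little-endian, no padding.
# SQE: DWord 0, DWord 1 (NSID), DWords 2-3, Metadata Pointer (DWords 4-5),
#      PRP Entry 1 (6-7), PRP Entry 2 (8-9), DWords 10-15.
SQE_FORMAT = "<IIQQQQ6I"
# CQE: DWord 0, DWord 1, SQ Head Pointer, SQ Identifier,
#      Command Identifier, Status Field with P bit.
CQE_FORMAT = "<IIHHHH"

def pack_sqe(opcode, fuse, command_id, nsid, mptr, prp1, prp2,
             cdw10_15=(0, 0, 0, 0, 0, 0)):
    """Build one 64-byte submission queue entry."""
    dword0 = (opcode & 0xFF) | ((fuse & 0x3) << 8) | ((command_id & 0xFFFF) << 16)
    return struct.pack(SQE_FORMAT, dword0, nsid, 0, mptr, prp1, prp2, *cdw10_15)

def unpack_cqe(raw):
    """Decode one 16-byte completion queue entry."""
    _dw0, _dw1, sq_head, sq_id, command_id, status_phase = struct.unpack(CQE_FORMAT, raw)
    return {
        "sq_head": sq_head,
        "sq_id": sq_id,
        "command_id": command_id,
        "phase": status_phase & 1,      # P bit
        "status": status_phase >> 1,    # Status Field
    }

sqe = pack_sqe(opcode=0x02, fuse=0, command_id=0x1234, nsid=1,
               mptr=0, prp1=0x10000, prp2=0)
cqe = unpack_cqe(struct.pack(CQE_FORMAT, 0, 0, 5, 1, 0x1234, 1))
print(len(sqe), cqe)
```

The Command Identifier placed in the submission entry comes back in the completion entry, which is how the host matches a completion to its command.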

Benefit of Fixed Sized Commands
[Figure: submission queues 0 to N in PCIe memory feed the NVMe controller front end (candidate queue selector, arbiter and element fetch, command issue logic), which feeds command processing / firmware. Two queue layouts are compared. With fixed sized commands, every queue element holds exactly one command (Cmd 0 to Cmd 7 in elements 0 to 7). With variable sized commands, a command may span several queue elements, so command boundaries do not line up with element boundaries and element buffers are needed.]

Fixed Sized Commands Simplify Command Parsing, Arbitration, and Error Handling
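
One way to see the parsing benefit: with a fixed 64-byte entry, the controller can compute where any command lives without reading the commands before it. The sketch below assumes a contiguous queue; the function names and the example base address are hypothetical.

```python
SQE_BYTES = 64  # fixed submission queue entry size

def entry_address(queue_base, index, queue_depth):
    """Address of entry `index`: one multiply and one add, no parsing."""
    if not 0 <= index < queue_depth:
        raise IndexError("entry index outside the queue")
    return queue_base + index * SQE_BYTES

def entries_pending(head, tail, queue_depth):
    """Number of commands waiting between head and tail on the ring."""
    return (tail - head) % queue_depth

addr = entry_address(queue_base=0x10000, index=3, queue_depth=16)
pending = entries_pending(head=14, tail=2, queue_depth=16)
print(hex(addr), pending)
```

With variable sized commands, finding the start of command 3 would require decoding the lengths of commands 0 through 2 first.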

Simple Optimized Command Set
Admin Commands
  Create I/O Submission Queue
  Delete I/O Submission Queue
  Create I/O Completion Queue
  Delete I/O Completion Queue
  Get Log Page
  Identify
  Abort
  Set Features
  Get Features
  Asynchronous Event Request
  Firmware Activate (optional)
  Firmware Image Download (optional)

NVM Admin Commands
  Format NVM (optional)
  Security Send (optional)
  Security Receive (optional)

NVM I/O Commands
  Read
  Write
  Flush
  Write Uncorrectable (optional)
  Compare (optional)
  Dataset Management (optional)

10 Required Admin Commands
3 Required NVM I/O Commands

NVM Creates New Challenges and Opportunities

[Figure: NVM controller with tiered storage. A storage logical block address range maps to physical NAND flash pages through the flash translation layer (logical to physical mapping, wear leveling). The NVMe controller sits between PCIe and several storage tiers: SLC NAND flash, MLC (2-bit) NAND flash, TLC NAND flash, other NVM (MRAM, PCM ...), and DRAM.]

NVMe Data Set Management Hints

[Figure: two command streams from host to controller.
Traditional Storage Command Set: Write and Read commands carry only LBA and Num LB.
NVMe Command Set with Data Set Management (DSM): Write and Read commands carry LBA, Num LB, and a DSM field, and standalone DSM commands can be sent alongside them.]

NVMe Data Set Management Range Attributes

[Figure: Write and Read commands carry LBA, Num LB, and DSM fields. A DSM command carries 1 to 256 ranges, each consisting of an LBA range and its DSM attributes.]

• Overall DSM Command
  - Deallocate
  - Integral write dataset
  - Integral read dataset
• Per DSM Range
  - Access size (in logical blocks)
  - Written in near future
  - Sequential read
  - Sequential write
  - Access latency (longer, typical, small)
  - Access frequency
    o Typical read and write
    o Infrequent read and write
    o Infrequent write, frequent read
    o Frequent write, infrequent read
    o Frequent read and write
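
The attribute lists above can be modeled as plain data, which makes the split between command-wide and per-range hints explicit. This is an illustrative sketch only: the class and field names are hypothetical, and it does not attempt the on-the-wire encoding, which the slide does not give.

```python
from dataclasses import dataclass
from enum import Enum

class AccessLatency(Enum):
    LONGER = "longer"
    TYPICAL = "typical"
    SMALL = "small"

class AccessFrequency(Enum):
    TYPICAL = "typical read and write"
    INFREQUENT = "infrequent read and write"
    INFREQUENT_WRITE_FREQUENT_READ = "infrequent write, frequent read"
    FREQUENT_WRITE_INFREQUENT_READ = "frequent write, infrequent read"
    FREQUENT = "frequent read and write"

@dataclass
class DsmRange:
    """Hints that apply to one LBA range."""
    start_lba: int
    num_blocks: int
    access_size: int = 0                  # in logical blocks
    written_in_near_future: bool = False
    sequential_read: bool = False
    sequential_write: bool = False
    access_latency: AccessLatency = AccessLatency.TYPICAL
    access_frequency: AccessFrequency = AccessFrequency.TYPICAL

@dataclass
class DsmCommand:
    """Hints that apply to the whole command, plus 1 to 256 ranges."""
    ranges: list
    deallocate: bool = False
    integral_write: bool = False
    integral_read: bool = False

    def __post_init__(self):
        if not 1 <= len(self.ranges) <= 256:
            raise ValueError("a DSM command carries 1 to 256 ranges")

log_area = DsmRange(start_lba=0, num_blocks=2048, sequential_write=True,
                    access_frequency=AccessFrequency.FREQUENT_WRITE_INFREQUENT_READ)
command = DsmCommand(ranges=[log_area])
print(command.ranges[0].access_frequency.value)
```

A controller with tiered storage could use a hint like the one on `log_area` to steer that range toward media that tolerates frequent writes.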
Out-Of-Order Data
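[Figure: a Read(7-0) command arrives over PCIe at the NVMe controller. Data chunks 0 to 7 are spread across several NAND flash devices (drawn in groups 0,1,2; 5,3,7; and 6,4), and one device is busy with an erase. The chunks reach the controller buffer out of order, drawn as D7 D0 D6 D5 D1 D2 D4 D3.]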

Possible Sources of Out-Of-Order Data

  - NAND or page TRead variation
  - Target/LUN conflict
    o Operations associated with same command (e.g., multiple reads to NAND)
    o Different operation (e.g., previously issued program or erase)
  - NAND error handling
    o ECC correction time variation, read-retry, …
  - Flash channel conflict

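Traditional Scatter Gather List (SGL)

[Figure: host read using a scatter gather list. The host holds SGL segments of (Address, Length) entries: C0/a, C1/b, C2/c, C3/d in the first segment and C4/e, C5/f in the second. The controller holds NAND data D0 to D7, which must be delivered into host physical memory chunks C0 to C5. The chunks have different lengths and are scattered in physical memory.]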
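
To place a piece of data that arrives out of order, a controller using an SGL has to find which variable-length chunk covers that byte offset. A minimal sketch of that lookup, with a hypothetical function name and made-up addresses and lengths:

```python
def sgl_locate(sgl, data_offset):
    """Find the host address for byte `data_offset` of the transfer.

    `sgl` is a list of (address, length) entries. Because lengths vary,
    the list must be walked from the start, summing lengths on the way.
    Returns (host_address, entries_examined).
    """
    remaining = data_offset
    for examined, (address, length) in enumerate(sgl, start=1):
        if remaining < length:
            return address + remaining, examined
        remaining -= length
    raise ValueError("offset beyond the end of the scatter gather list")

# Hypothetical three-entry list.
sgl = [(0x1000, 100), (0x8000, 300), (0x4000, 50)]
first = sgl_locate(sgl, 0)
middle = sgl_locate(sgl, 120)
last = sgl_locate(sgl, 400)
print(first, middle, last)
```

The cost of the lookup grows with the position of the entry, which is what makes out-of-order delivery awkward with a traditional SGL.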

I/O Operation and Host Memory
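[Figure: a process calls read(bufPtr, numBytes). In process virtual memory the buffer starts at bufPtr, at some offset into page C0, and runs contiguously for numBytes across pages C0 to C8. In host physical memory the same pages are scattered, drawn in the order C5, C6, C0, C1, C7, C2, C3, C8, C4.]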

NVMe Physical Region Page (PRPs) Read

[Figure: the same read(bufPtr, numBytes) described with PRPs. Process virtual pages C0 to C8 map to scattered host physical pages. Each PRP entry is a (Page Address, Offset) pair: the first entry is C0 with the buffer's offset into the page, and every later entry (C1 through C8) has no offset because it starts on a page boundary. The entries are drawn as three lists of four: C0 to C3, C4 to C7, and C8 followed by three unused entries. NAND data D0 to D8 maps one to one onto pages C0 to C8.]
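
Because every PRP entry after the first covers exactly one page, the destination for any byte of the transfer can be computed directly rather than searched for. A minimal sketch, assuming a 4096-byte page and a flat list of page addresses; the function name and the addresses are hypothetical.

```python
PAGE_SIZE = 4096  # assumed memory page size

def prp_locate(first_offset, page_addresses, data_offset, page_size=PAGE_SIZE):
    """Host address for byte `data_offset` of the transfer.

    `page_addresses` holds the page-aligned address from each PRP entry;
    only the first entry carries a nonzero offset into its page.
    """
    position = first_offset + data_offset
    index, within_page = divmod(position, page_size)
    return page_addresses[index] + within_page

# Hypothetical scattered physical pages; the buffer starts 0x100 into the first.
pages = [0x5000, 0x2000, 0x9000]
start = prp_locate(0x100, pages, 0)
second_page = prp_locate(0x100, pages, 0xF00)
third_page = prp_locate(0x100, pages, 0x2000)
print(hex(start), hex(second_page), hex(third_page))
```

A chunk of data that arrives late or early is therefore as cheap to place as one that arrives in order: one division selects the entry.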

Summary
• Scalable and Efficient Queuing Interface
  - Low overhead command issue and completion
  - Parallel command execution
• Fixed Sized Commands
  - Straightforward command fetch, parsing, and arbitration
• Simple Command Set (3 required I/O commands)
  - Fast command processing
• Data Set Management Hints
  - Controller optimization of data placement
• Physical Region Pages (PRPs)
  - Simplified out-of-order data delivery
