Hyper-V on Windows
Hyper-V specifically provides hardware virtualization, meaning each virtual machine
runs on virtual hardware. Hyper-V lets you create virtual hard drives, virtual switches,
and a number of other virtual devices, all of which can be added to virtual machines.
Experiment with other operating systems. Hyper-V makes it very easy to create and
remove different operating systems.
Test software on multiple operating systems using multiple virtual machines. With
Hyper-V, you can run them all on a single desktop or laptop computer. These
virtual machines can be exported and then imported into any other Hyper-V
system, including Azure.
System requirements
Hyper-V is available on 64-bit versions of Windows 10 Pro, Enterprise, and Education. It
is not available on the Home edition.
Upgrade from Windows 10 Home edition to Windows 10 Pro by opening Settings >
Update and Security > Activation. Here you can visit the store and purchase an
upgrade.
Most computers can run Hyper-V; however, each virtual machine runs a completely separate
operating system. You can generally run one or more virtual machines on a computer
with 4 GB of RAM, though you'll need more resources for additional virtual machines or
to install and run resource-intensive software such as games, video editing, or engineering
design software.
For more information about Hyper-V's system requirements and how to verify that
Hyper-V runs on your machine, see the Hyper-V Requirements Reference.
As a reminder, you'll need to have a valid license for any operating systems you use in
the VMs.
For information about which operating systems are supported as guests in Hyper-V on
Windows, see Supported Windows Guest Operating Systems and Supported Linux Guest
Operating Systems.
Limitations
Programs that depend on specific hardware will not work well in a virtual machine. For
example, games or applications that require processing with GPUs might not work well.
Also, applications relying on sub-10ms timers, such as live music mixing applications or
other high-precision timing workloads, could have issues running in a virtual machine.
Next step
Install Hyper-V on Windows 10
Install Hyper-V on Windows 10
Article • 04/26/2022
Hyper-V can be enabled in several ways, including using the Windows 10 control panel,
PowerShell, or the Deployment Image Servicing and Management tool (DISM).
This document walks through each option.
Check Requirements
Windows 10 Enterprise, Pro, or Education
64-bit Processor with Second Level Address Translation (SLAT).
CPU support for VM Monitor Mode Extension (VT-x on Intel CPUs).
Minimum of 4 GB memory.
Upgrade from Windows 10 Home edition to Windows 10 Pro by opening up Settings >
Update and Security > Activation.
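Both the PowerShell and DISM options are one-line commands, run from an elevated prompt:

```powershell
# Enable Hyper-V and all of its sub-features (a reboot is required to finish).
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All

# DISM equivalent, run from an elevated command prompt:
# DISM /Online /Enable-Feature /All /FeatureName:Microsoft-Hyper-V
```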
For more information about DISM, see the DISM Technical Reference.
We've been building new tools for creating virtual machines so the instructions have
changed significantly over the past three releases.
2. In Hyper-V Manager, find Quick Create in the right-hand Actions menu.
3. Pick the .iso or .vhdx that you want to turn into a new virtual machine. If the
image is a Linux image, deselect the Secure Boot option.
4. Click Connect to start your virtual machine. Don't worry about editing the settings;
you can go back and change them at any time.
You may be prompted to ‘Press any key to boot from CD or DVD’. Go ahead and
do so. As far as it knows, you're installing from a CD.
Congratulations, you have a new virtual machine. Now you're ready to install the
operating system.
Now that you have walked through the basics of deploying Hyper-V, creating virtual
machines and managing these virtual machines, let’s explore how you can automate
many of these activities with PowerShell.
3. To learn more about a particular PowerShell command, use Get-Help . For instance,
running the following command returns information about the Get-VM Hyper-V
command.
PowerShell
Get-Help Get-VM
The output shows you how to structure the command, what the required and optional
parameters are, and the aliases that you can use.
PowerShell
Get-VM
2. To return a list of only powered on virtual machines add a filter to the Get-VM
command. A filter can be added by using the Where-Object command. For more
information on filtering see the Using the Where-Object documentation.
PowerShell
Get-VM | where {$_.State -eq 'Running'}
3. To list all virtual machines in a powered off state, run the following command. This
command is a copy of the command from step 2 with the filter changed from
'Running' to 'Off'.
PowerShell
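The same Where-Object pattern with the state flipped:

```powershell
Get-VM | where {$_.State -eq 'Off'}
```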
2. To start all currently powered off virtual machines, get a list of those machines and
pipe the list to the Start-VM command:
PowerShell
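A sketch of that pipeline:

```powershell
# Find every powered-off VM and start it.
Get-VM | where {$_.State -eq 'Off'} | Start-VM
```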
Create a VM checkpoint
To create a checkpoint using PowerShell, select the virtual machine using the Get-VM
command and pipe this to the Checkpoint-VM command. Finally give the checkpoint a
name using -SnapshotName . The complete command looks like the following:
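With `<VMName>` and the checkpoint name as placeholders:

```powershell
Get-VM -Name <VMName> | Checkpoint-VM -SnapshotName <SnapshotName>
```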
PowerShell
$VMName = "VMNAME"
$VM = @{
Name = $VMName
MemoryStartupBytes = 2147483648
Generation = 2
NewVHDSizeBytes = 53687091200
BootDevice = "VHD"
SwitchName = (Get-VMSwitch).Name
}
New-VM @VM
Enhanced Session Mode lets Hyper-V connect to virtual machines using RDP (remote
desktop protocol). Not only does this improve your general virtual machine viewing
experience, connecting with RDP also allows the virtual machine to share devices with
your computer. Since it's on by default in Windows 10, you're probably already using
RDP to connect to your Windows virtual machines. This article highlights some of the
benefits and hidden options in the connection settings dialog.
This article shows you how to see your session type, enter enhanced session mode, and
configure your session settings.
You are currently running in enhanced session mode. Clicking this icon will reconnect to
your virtual machine in basic mode.
You are currently running in basic session mode but enhanced session mode is available.
Clicking this icon will reconnect to your virtual machine in enhanced session mode.
You are currently running in basic mode. Enhanced session mode isn't available for this
virtual machine.
First, log back in to the VM using Basic Mode. Search for "Sign-In Options" in the
Settings app or Start menu.
On this page, turn "Require Windows Hello sign-in for
Microsoft accounts" off.
Now, sign out of the VM or reboot before closing the Virtual Machine Connect window.
To share devices with your virtual machine or to change those default settings:
To share other devices, such as USB devices or your C: drive, select the "More..." menu:
From there you can select the devices you'd like to share with the virtual machine. The
system drive (Windows C:) is especially helpful for file sharing.
To change those settings or to add microphone passthrough (so you can record audio in
a virtual machine):
Since your virtual machine is probably running locally, the "play on this computer" and
"play on remote computer" options will yield the same results.
PowerShell
vmconnect.exe
One of the great benefits to virtualization is the ability to easily save the state of a virtual
machine. In Hyper-V this is done through the use of virtual machine checkpoints. You
may want to create a virtual machine checkpoint before making software configuration
changes, applying a software update, or installing new software. If a system change were
to cause an issue, the virtual machine can be reverted to the state it was in when the
checkpoint was taken.
Standard Checkpoints: takes a snapshot of the virtual machine and virtual machine
memory state at the time the checkpoint is initiated. A snapshot is not a full
backup and can cause data consistency issues with systems that replicate data
between different nodes such as Active Directory. Hyper-V only offered standard
checkpoints (formerly called snapshots) prior to Windows 10.
Production Checkpoints: uses Volume Shadow Copy Service or File System Freeze
on a Linux virtual machine to create a data-consistent backup of the virtual
machine. No snapshot of the virtual machine memory state is taken.
Production checkpoints are selected by default; however, this can be changed using
either Hyper-V Manager or PowerShell.
Note: The Hyper-V PowerShell module has several aliases so that checkpoint and
snapshot can be used interchangeably.
This document uses checkpoint, however be aware that you may see similar
commands using the term snapshot.
The following commands can be run to change the checkpoint with PowerShell.
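A sketch using Set-VM ( `<VMName>` is a placeholder; valid -CheckpointType values are Standard, Production, ProductionOnly, and Disabled):

```powershell
Set-VM -Name <VMName> -CheckpointType Production
```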
Creating checkpoints
Creates a checkpoint of the type configured for the virtual machine. See the Configuring
Checkpoint Type section earlier in this document for instructions on how to change this
type.
To create a checkpoint:
Using PowerShell
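With `<VMName>` and the checkpoint name as placeholders:

```powershell
Checkpoint-VM -Name <VMName> -SnapshotName <SnapshotName>
```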
When the checkpoint process has completed, you can view the list of checkpoints for a
virtual machine with the Get-VMCheckpoint command.
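For example:

```powershell
Get-VMCheckpoint -VMName <VMName>
```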
Applying checkpoints
If you want to revert your virtual machine to a previous point-in-time, you can apply an
existing checkpoint.
Using Hyper-V Manager
Create Checkpoint and Apply: Creates a new checkpoint of the virtual machine
before it applies the earlier checkpoint.
Apply: Applies only the checkpoint that you have chosen. You cannot undo this
action.
Cancel: Closes the dialog box without doing anything.
Using PowerShell
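Using the checkpoint alias of Restore-VMSnapshot (the names are placeholders; -Confirm:$false skips the confirmation prompt):

```powershell
Restore-VMCheckpoint -Name <CheckpointName> -VMName <VMName> -Confirm:$false
```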
Renaming checkpoints
Many checkpoints are created at a specific point. Giving them an identifiable name
makes it easier to remember details about the system state when the checkpoint was
created.
By default, the name of a checkpoint is the name of the virtual machine combined with
the date and time the checkpoint was taken. This is the standard format:
Names are limited to 100 characters, and the name cannot be blank.
Using PowerShell
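Using the checkpoint alias of Rename-VMSnapshot:

```powershell
Rename-VMCheckpoint -VMName <VMName> -Name <OldName> -NewName <NewName>
```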
Deleting checkpoints
Deleting checkpoints can help create space on your Hyper-V host.
Behind the scenes, checkpoints are stored as .avhdx files in the same location as the
.vhdx files for the virtual machine. When you delete a checkpoint, Hyper-V merges the
.avhdx and .vhdx files for you. Once completed, the checkpoint's .avhdx file will be
deleted from the file system.
Using PowerShell
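Using the checkpoint alias of Remove-VMSnapshot:

```powershell
Remove-VMCheckpoint -VMName <VMName> -Name <CheckpointName>
```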
Exporting checkpoints
Export bundles the checkpoint as a virtual machine so the checkpoint can be moved to a
new location. Once imported, the checkpoint is restored as a virtual machine. Exported
checkpoints can be used for backup.
Using PowerShell
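Using Export-VMSnapshot (its checkpoint alias works as well; the names and path are placeholders):

```powershell
Export-VMSnapshot -VMName <VMName> -Name <CheckpointName> -Path <PathToExportLocation>
```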
1. In Hyper-V Manager, right-click the name of the virtual machine, and click
Settings.
2. In the Management section, select Checkpoints or Checkpoint File Location.
3. In Checkpoint File Location, enter the path to the folder where you would like to
store the files.
4. Click Apply to apply your changes. If you are done, click OK to close the dialog
box.
Standard checkpoint
1. Log into your virtual machine and create a text file on the desktop.
2. Open the file with Notepad and enter the text ‘This is a Standard Checkpoint.’ Do
not save the file or close Notepad.
3. Change the checkpoint to standard -- instructions here.
4. Create a new checkpoint.
Now that a checkpoint exists, make a modification to the virtual machine and then apply
the checkpoint to revert the virtual machine back to the saved state.
1. Close the text file if it is still open and delete it from the virtual machine's desktop.
2. Open Hyper-V Manager, right click on the standard checkpoint, and select Apply.
3. Select Apply on the Apply Checkpoint notification window.
Once the checkpoint has been applied, notice that not only is the text file present, but
the system is in the exact state that it was when the checkpoint was created. In this case
Notepad is open and the text file loaded.
Production checkpoint
Let’s now examine production checkpoints. This process is almost identical to working
with a standard checkpoint, but it will have slightly different results. Before beginning,
make sure you have a virtual machine and that you have changed the checkpoint type to
Production checkpoints.
1. Log into the virtual machine and create a new text file. If you followed the previous
exercise, you can use the existing text file.
2. Enter ‘This is a Production Checkpoint.’ into the text file, save the file but do not
close Notepad.
3. Open Hyper-V Manager, right click on the virtual machine, and select Checkpoint.
4. Click OK on the Production Checkpoint Created Window.
Now that a checkpoint exists, make a modification to the system and then apply the
checkpoint to revert the virtual machine back to the saved state.
1. Close the text file if it is still open and delete it from the virtual machine's desktop.
2. Open Hyper-V Manager, right click on the production checkpoint, and select
Apply.
3. Select Apply on the Apply Checkpoint notification window.
Once the production checkpoint has been applied, notice that the virtual machine is in
an off state.
You can use PowerShell Direct to run arbitrary PowerShell in a Windows 10 or Windows
Server 2016 virtual machine from your Hyper-V host regardless of network configuration
or remote management settings.
Requirements
Operating system requirements:
If you're managing older virtual machines, use Virtual Machine Connection (VMConnect)
or configure a virtual network for the virtual machine.
Configuration requirements:
2. Run one of the following commands to create an interactive session using the
virtual machine name or GUID:
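Either form works:

```powershell
Enter-PSSession -VMName <VMName>
Enter-PSSession -VMId <VMId>
```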
You should see the VMName as the prefix for your PowerShell prompt as shown:
[VMName]: PS C:\>
Any command you run will run on your virtual machine. To test, you can run
ipconfig or hostname to make sure that these commands are running in the virtual
machine.
4. When you're done, run the following command to close the session:
PowerShell
Exit-PSSession
Note: If your session won't connect, see the troubleshooting for potential causes.
2. Run one of the following commands to create a session using the virtual machine
name or GUID:
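For a single command or script block:

```powershell
Invoke-Command -VMName <VMName> -ScriptBlock { <commands> }
```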
The command will execute on the virtual machine; if there is output to the console,
it'll be printed to your console. The connection will be closed automatically as soon
as the command runs.
To run a script:
2. Run one of the following commands to create a session using the virtual machine
name or GUID:
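Pointing -FilePath at a script on the host (the path is an example):

```powershell
Invoke-Command -VMName <VMName> -FilePath C:\host_path\script.ps1
```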
The script will execute on the virtual machine. The connection will be closed
automatically as soon as the command runs.
Persistent PowerShell sessions are incredibly useful when writing scripts that coordinate
actions across one or more remote machines. Once created, persistent sessions exist in
the background until you decide to delete them. This means you can reference the same
session over and over again with Invoke-Command or Enter-PSSession without passing
credentials.
By the same token, sessions hold state. Since persistent sessions persist, any variables
created in a session or passed to a session will be preserved across multiple calls. There
are a number of tools available for working with persistent sessions. For this example,
we will use New-PSSession and Copy-Item to move data from the host to a virtual
machine and from a virtual machine to the host.
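Creating the persistent session in a variable named $s, matching the Remove-PSSession $s call later in this section:

```powershell
$s = New-PSSession -VMName <VMName> -Credential (Get-Credential)
```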
Warning:
There is a bug in builds before 14500. If credentials aren't explicitly specified with the
-Credential flag, the service in the guest will crash and will need to be restarted.
To copy C:\host_path\data.txt to the virtual machine from the host machine, run:
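Both directions use the session stored in $s; the paths are examples:

```powershell
# Host -> guest
Copy-Item -ToSession $s -Path C:\host_path\data.txt -Destination C:\guest_path\

# Guest -> host
Copy-Item -FromSession $s -Path C:\guest_path\data.txt -Destination C:\host_path\
```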
PowerShell
Remove-PSSession $s
Troubleshooting
There are a small set of common error messages surfaced through PowerShell Direct.
Here are the most common, some causes, and tools for diagnosing issues.
Potential causes:
The most likely issue is that PowerShell Direct isn't supported by your host operating
system.
You can check your Windows build by running the following command:
PowerShell
[System.Environment]::OSVersion.Version
If you are running a supported build, it is also possible your version of PowerShell does
not run PowerShell Direct. For PowerShell Direct and JEA, the major version must be 5 or
later.
You can check your PowerShell version build by running the following command:
PowerShell
$PSVersionTable.PSVersion
For Enter-PSSession between host builds 10240 and 12400, all errors below are reported
as "A remote session might have ended".
Error message:
Potential causes:
You can use the Get-VM cmdlet to check to see which VMs are running on the host.
Error message:
Potential causes:
One of the reasons listed above -- they are all equally applicable to New-PSSession
A bug in current builds where credentials must be explicitly passed with
-Credential . When this happens, the entire service hangs in the guest operating
system and needs to be restarted. You can check whether the session is still available
with Enter-PSSession.
To work around the credential issue, log into the virtual machine using VMConnect,
open PowerShell, and restart the vmicvmsession service using the following PowerShell:
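From PowerShell inside the guest:

```powershell
Restart-Service -Name vmicvmsession
```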
Potential causes:
Administrator credentials can be passed to the virtual machine with the -Credential
parameter or by entering them manually when prompted.
Potential causes:
Enter-PSSession : The input VMName parameter does not resolve to any virtual
machine.
Potential causes:
You can use the Get-VM cmdlet to check that the credentials you're using have the
Hyper-V administrator role and to see which VMs are running locally on the host and
booted.
Get early access to pre-release features for Hyper-V on Windows Server 2016 Technical
Preview to try out in your development or test environments. You can be the first to see
the latest Hyper-V features and help shape the product by providing early feedback.
Here are some more reasons why these are for non-production environments only:
1. On the Windows desktop, click the Start button and type any part of the name
Windows PowerShell.
2. Right-click Windows PowerShell and select Run as Administrator.
3. Use the New-VM cmdlet with the -Prerelease flag to create the pre-release virtual
machine. For example, run the following command where VM Name is the name of
the virtual machine that you want to create.
PowerShell
New-VM -Name <VM Name> -Prerelease
To create a virtual machine that uses an existing virtual hard disk or a new hard
disk, see the PowerShell examples in Create a virtual machine in Hyper-V on
Windows Server 2016 Technical Preview.
To create a new virtual hard disk that boots to an operating system image, see the
PowerShell example in Deploy a Windows Virtual Machine in Hyper-V on Windows
10.
The examples covered in those articles work for Hyper-V hosts that run Windows 10 or
Windows Server 2016 Technical Preview. But right now, you can only use the -Prerelease
flag to create a pre-release virtual machine on Hyper-V hosts that run Windows Server
2016 Technical Preview.
See also
Virtualization Blog - Learn about the pre-release features that are available and
how to try them out.
Supported virtual machine configuration versions - Learn how to check the virtual
machine configuration version and which versions are supported by Microsoft.
Run Hyper-V in a Virtual Machine with
Nested Virtualization
Article • 05/09/2023
Nested virtualization is a feature that allows you to run Hyper-V inside of a Hyper-V
virtual machine (VM). This is helpful for running a Visual Studio phone emulator in a
virtual machine, or testing configurations that ordinarily require several hosts.
Note
Prerequisites
Note
The guest can be any Windows supported guest operating system. Newer Windows
operating systems may support enlightenments that improve performance.
Note
When using Windows Server 2019 as the first level VM, the number of vCPUs
should be 225 or less.
Note that simply enabling nested virtualization will have no effect on dynamic memory
or runtime memory resize. The incompatibility only occurs while Hyper-V is running in
the VM.
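Nested virtualization is enabled per virtual machine, while that VM is powered off. A sketch ( `<VMName>` is a placeholder):

```powershell
# Expose the hardware virtualization extensions to the guest.
Set-VMProcessor -VMName <VMName> -ExposeVirtualizationExtensions $true
```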
Networking Options
There are two options for networking with nested virtual machines:
First, a virtual NAT switch must be created in the host virtual machine (the "middle" VM).
Note that the IP addresses are just an example, and will vary across environments:
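A sketch, with example names and an example address range:

```powershell
# Inside the middle VM: create an internal switch and a NAT network for the nested VMs.
New-VMSwitch -Name "VmNAT" -SwitchType Internal
New-NetNat -Name "VMNATNetwork" -InternalIPInterfaceAddressPrefix 192.168.100.0/24

# Give the middle VM's virtual NIC the gateway address for that range.
Get-NetAdapter "vEthernet (VmNAT)" | New-NetIPAddress -IPAddress 192.168.100.1 -AddressFamily IPv4 -PrefixLength 24
```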
Each nested virtual machine must have an IP address and gateway assigned to it. Note
that the gateway IP must point to the NAT adapter from the previous step. You may also
want to assign a DNS server:
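Run inside each nested VM; the addresses, interface alias, and DNS server are examples:

```powershell
# Static IP with the NAT adapter's address as the gateway.
New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 192.168.100.2 -PrefixLength 24 -DefaultGateway 192.168.100.1

# Optional DNS server.
Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses 8.8.8.8
```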
Nested virtualization makes this hardware support available to guest virtual machines.
The diagram below shows Hyper-V without nesting. The Hyper-V hypervisor takes full
control of the hardware virtualization capabilities (orange arrow), and does not expose
them to the guest operating system.
In contrast, the diagram below shows Hyper-V with nested virtualization enabled. In this
case, Hyper-V exposes the hardware virtualization extensions to its virtual machines.
With nesting enabled, a guest virtual machine can install its own hypervisor and run its
own guest VMs.
In Fall Creators Update, Quick Create expanded to include a virtual machine gallery.
While there is a set of images provided by Microsoft and Microsoft partners, the gallery
can also list your own images.
Gallery architecture
The virtual machine gallery is a graphical view for a set of virtual machine sources
defined in the Windows registry. Each virtual machine source is a path (local path or URI)
to a JSON file with virtual machines as list items.
The list of virtual machines you see in the gallery is the full contents of the first source,
followed by the contents of the second source, so on and so forth until all of the
available virtual machines have been listed. The list is dynamically created every time
you launch the gallery.
Type: REG_MULTI_SZ
Virtual machines made from a virtual hard drive have a few configuration requirements:
1. Built to support UEFI firmware. If they're created using Hyper-V, that's a Generation
2 VM.
2. The virtual hard drive should be at least 20 GB. Keep in mind that this is the maximum
size; Hyper-V will not take up space the VM isn't actively using.
5. Create Virtual Machine. If the virtual machine boots correctly, it's ready for the
gallery.
Text information:
name - required - this is the name that appears in the left column and also at the
top of the virtual machine view.
publisher - required
version - required
The following PowerShell command will provide today's date in the proper format
and put it on the clipboard:
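A sketch, assuming the gallery's lastUpdated field expects an ISO 8601 timestamp:

```powershell
Get-Date -Format "yyyy-MM-ddTHH:mm:ssZ" | clip.exe
```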
Pictures:
logo - required
symbol
thumbnail
The below JSON template has starter items and the gallery's schema. If you edit it in
VSCode, it will automatically provide IntelliSense.
JSON
{
  "$schema": "https://raw.githubusercontent.com/MicrosoftDocs/Virtualization-Documentation/live/hyperv-tools/vmgallery/vm-gallery-schema.json",
  "images": [
    {
      "name": "",
      "version": "",
      "locale": "",
      "publisher": "",
      "lastUpdated": "",
      "description": [
        ""
      ],
      "disk": {
        "uri": "",
        "hash": ""
      },
      "logo": {
        "uri": "",
        "hash": ""
      },
      "symbol": {
        "uri": "",
        "hash": ""
      },
      "thumbnail": {
        "uri": "",
        "hash": ""
      }
    }
  ]
}
1. Open regedit.exe
2. Navigate to Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\Virtualization\
If it doesn't already exist, go to the Edit menu and navigate through New to
Multi-String Value
Troubleshooting
Resources
There are a handful of gallery scripts and helpers on GitHub.
See the sample gallery entry there; it is the JSON file that defines the in-box gallery.
Set up a NAT network
Article • 01/17/2023
Windows 10 Hyper-V allows native network address translation (NAT) for a virtual
network.
Requirements:
Note: Currently, you are limited to one NAT network per host. For additional details
on the Windows NAT (WinNAT) implementation, capabilities, and limitations, please
reference the WinNAT capabilities and limitations blog
NAT Overview
NAT gives a virtual machine access to network resources using the host computer's IP
address and a port through an internal Hyper-V Virtual Switch.
Additionally, NAT allows multiple virtual machines to host applications that require
identical (internal) communication ports by mapping these to unique external ports.
For all of these reasons, NAT networking is very common for container technology (see
Container Networking).
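The internal switch referenced in the next step can be created as follows (the switch name is an example):

```powershell
New-VMSwitch -SwitchName "NATSwitch" -SwitchType Internal
```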
3. Find the interface index of the virtual switch you just created.
Console
PS C:\> Get-NetAdapter
The internal switch will have a name like vEthernet (SwitchName) and an Interface
Description of Hyper-V Virtual Ethernet Adapter . Take note of its ifIndex to use
in the next step.
PowerShell
In order to configure the gateway, you'll need a bit of information about your
network:
IPAddress -- NAT Gateway IP specifies the IPv4 or IPv6 address to use as the
NAT gateway IP.
The generic form will be a.b.c.1 (e.g. 172.16.0.1). While the
final position doesn’t have to be .1, it usually is (based on prefix length). This
IP address is in the range of addresses used by the guest virtual machines.
For example if the guest VMs use IP range 172.16.0.0, then you can use an IP
address 172.16.0.100 as the NAT Gateway.
PrefixLength -- NAT Subnet Prefix Length defines the NAT local subnet size
(subnet mask).
The subnet prefix length will be an integer value between 0
and 32.
0 would map the entire internet, 32 would only allow one mapped IP.
Common values range from 24 to 12 depending on how many IPs need to be
attached to the NAT.
InterfaceIndex -- ifIndex is the interface index of the virtual switch, which you
determined in the previous step.
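Putting those three values together (the addresses are examples; substitute the ifIndex from the previous step):

```powershell
New-NetIPAddress -IPAddress 192.168.0.1 -PrefixLength 24 -InterfaceIndex <ifIndex>
```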
In order to configure the gateway, you'll need to provide information about the
network and NAT Gateway:
Name -- NATOutsideName describes the name of the NAT network. You'll use
this to remove the NAT network.
InternalIPInterfaceAddressPrefix -- NAT subnet prefix describes both the
NAT Gateway IP prefix from above as well as the NAT Subnet Prefix Length
from above.
For our example, run the following to setup the NAT network:
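Using the example gateway and prefix from above ( `<NATOutsideName>` is a placeholder):

```powershell
New-NetNat -Name <NATOutsideName> -InternalIPInterfaceAddressPrefix 192.168.0.0/24
```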
Congratulations! You now have a virtual NAT network! To add a virtual machine to the
NAT network, follow these instructions.
Since WinNAT by itself does not allocate and assign IP addresses to an endpoint (e.g.
VM), you will need to do this manually from within the VM itself - i.e. set IP address
within range of NAT internal prefix, set default gateway IP address, set DNS server
information. The only caveat to this is when the endpoint is attached to a container. In
this case, the Host Network Service (HNS) allocates and uses the Host Compute Service
(HCS) to assign the IP address, gateway IP, and DNS info to the container directly.
User has connected VMs to a NAT network through an internal vSwitch named “VMNAT”
and now wants to install Windows Container feature with docker engine
PS C:\> Get-NetNat “VMNAT”| Remove-NetNat (this will remove the NAT but keep
the internal vSwitch).
Edit the arguments passed to the docker daemon (dockerd) by adding the
--fixed-cidr=<container prefix> parameter. This tells docker to create a default nat
network with the IP subnet <container prefix> (e.g. 192.168.1.0/24) so that
HNS can allocate IPs from this prefix.
PS C:\> Get-NetNat | Remove-NetNAT (again, this will remove the NAT but keep
the internal vSwitch)
Docker/HNS will assign IPs to Windows containers and Admin will assign IPs to VMs
from the difference set of the two.
User has installed Windows Container feature with docker engine running and now
wants to connect VMs to the NAT network
PS C:\> Get-NetNat | Remove-NetNat (this will remove the NAT but keep the
internal vSwitch)
Edit the arguments passed to the docker daemon (dockerd) by adding -b “none”
option to the end of docker daemon (dockerd) command to tell docker not to
create a default NAT network.
PS C:\> Get-Netnat | Remove-NetNAT (again, this will remove the NAT but keep
the internal vSwitch)
Docker/HNS will assign IPs to Windows containers and Admin will assign IPs to VMs
from the difference set of the two.
In the end, you should have two internal VM switches and one NetNat shared between
them.
We will detail the Docker 4 Windows - Docker Beta - Linux VM co-existing with the
Windows Container feature on the same host as an example. This workflow is subject
to change
Removes any previously existing container networks (i.e. deletes vSwitch, deletes
NetNat, cleans up)
6. Remove-NetNAT
Removes both DockerNAT and nat NAT networks (keeps internal vSwitches)
Docker will use the user-defined NAT network as the default to connect Windows
containers
In the end, you should have two internal vSwitches – one named DockerNAT and the
other named nat. You will only have one NAT network (10.0.0.0/17) confirmed by
running Get-NetNat. IP addresses for Windows containers will be assigned by the
Windows Host Network Service (HNS) from the 10.0.76.0/24 subnet. Based on the
existing MobyLinux.ps1 script, IP addresses for Docker 4 Windows will be assigned from
the 10.0.75.0/24 subnet.
Troubleshooting
To see if this may be the problem, make sure you only have one NAT:
PowerShell
Get-NetNat
PowerShell
Get-NetNat | Remove-NetNat
Make sure you only have one “internal” vmSwitch for the application or feature (e.g.
Windows containers). Record the name of the vSwitch
PowerShell
Get-VMSwitch
Check to see if there are private IP addresses (e.g. NAT default Gateway IP Address –
usually x.y.z.1) from the old NAT still assigned to an adapter
We have seen reports of multiple NAT networks created inadvertently. This is due to a
bug in recent builds (including Windows Server 2016 Technical Preview 5 and Windows
10 Insider Preview builds). If you see multiple NAT networks, after running docker
network ls or Get-ContainerNetwork, please perform the following from an elevated
PowerShell:
PowerShell
$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\vmsmp\parameters\SwitchList"
$keys = Get-ChildItem $KeyPath
foreach($key in $keys)
{
    $newKeyPath = $KeyPath+"\"+$key.PSChildName
    Remove-Item -Path $newKeyPath -Recurse
}
Remove-NetNat -Confirm:$false
Get-ContainerNetwork | Remove-ContainerNetwork
Stop-Service docker
Reboot the operating system prior to executing the subsequent commands ( Restart-
Computer )
PowerShell
Get-NetNat | Remove-NetNat
Start-Service docker
See this setup guide for multiple applications using the same NAT to rebuild your NAT
environment, if necessary.
References
Read more about NAT networks
Create a virtual network
Article • 04/26/2022
Your virtual machines will need a virtual network to share a network with your computer.
Creating a virtual network is optional -- if your virtual machine doesn't need to be
connected to the internet or a network, skip ahead to creating a Windows Virtual
Machine.
This exercise walks through creating an external virtual switch. Once completed, your
Hyper-V host will have a virtual switch that can connect virtual machines to the internet
through your computer's network connection.
2. Select the server in the left pane, or click "Connect to Server..." in the right pane.
3. In Hyper-V Manager, select Virtual Switch Manager... from the 'Actions' menu on
the right.
4. Under the 'Virtual Switches' section, select New virtual network switch.
5. Under 'What type of virtual switch do you want to create?', select External.
7. Under ‘Virtual Switch Properties’, give the new switch a name such as External VM
Switch.
8. Under ‘Connection Type’, ensure that External Network has been selected.
9. Select the physical network card to be paired with the new virtual switch. This is
the network card that is physically connected to the network.
10. Select Apply to create the virtual switch. At this point you will most likely see the
following message. Click Yes to continue.
PowerShell
PS C:\> Get-NetAdapter
Name                 InterfaceDescription                 ifIndex Status   MacAddress        LinkSpeed
2. Select the network adapter to use with the Hyper-V switch and place an instance in
a variable named $net.
PowerShell
3. Execute the following command to create the new Hyper-V virtual switch.
PowerShell
NAT networking
Network Address Translation (NAT) gives a virtual machine access to your computer's
network by combining the host computer's IP address with a port through an internal
Hyper-V Virtual Switch.
To set up a NAT network and connect it to a virtual machine, follow the NAT networking
user guide.
Important
The two-switch approach does not support an external vSwitch over a wireless card and
should be used for testing purposes only.
This document walks through creating a simple program built on Hyper-V sockets.
Supported Host OS
Supported Guest OS
Bash
CONFIG_VSOCKET=y
CONFIG_HYPERV_VSOCKETS=y
Getting started
Requirements:
A C/C++ compiler. If you don't have one, check out Visual Studio Community.
Windows 10 SDK -- pre-installed in Visual Studio 2015 with Update 3 and later.
A computer running one of the host operating systems above with at least one
virtual machine -- this is for testing your application.
Note: The API for Hyper-V sockets became publicly available in Windows 10
Anniversary Update. Applications that use HVSocket will run on any Windows 10
host and guest but can only be developed with a Windows SDK later than build
14290.
The following PowerShell will register a new application named "HV Socket Demo". This
must be run as administrator. Manual instructions below.
PowerShell
$service.SetValue("ElementName", $friendlyName)
$service.PSChildName | clip.exe
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\Virtualization\GuestCommunicationServices\
In this registry location, you'll see several GUIDs. Those are our in-box services.
Service GUID
To register your own service, create a new registry key using your own GUID and friendly
name.
The friendly name will be associated with your new application. It will appear in
performance counters and other places where a GUID isn't appropriate.
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows
NT\CurrentVersion\Virtualization\GuestCommunicationServices\
999E53D4-3D5C-4C3E-8779-BED06EC056E1\
YourGUID\
Note: The Service GUID for a Linux guest uses the VSOCK protocol, which addresses
via an svm_cid and svm_port rather than GUIDs. To bridge this inconsistency with
Windows, the well-known GUID is used as the service template on the host, which
translates to a port in the guest. To customize your Service GUID, simply change the
first "00000000" to the desired port number. Ex: "00000ac9" is port 2761.
C++
struct __declspec(uuid("00000000-facb-11e6-bd58-64006a7986d3"))
VSockTemplate{};
/*
*/
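As a sketch of the port-to-GUID mapping described in the note above (assuming the well-known Linux template GUID shown), a small helper can derive the service GUID string from a VSOCK port number:

```c
#include <stdio.h>

/* Derive a Hyper-V socket service GUID string for a Linux guest from a
 * VSOCK port number: the first 32 bits carry the port; the remainder is
 * the well-known template GUID (00000000-facb-11e6-bd58-64006a7986d3). */
static void vsock_service_guid(unsigned int port, char out[37])
{
    snprintf(out, 37, "%08x-facb-11e6-bd58-64006a7986d3", port);
}
/* Example: vsock_service_guid(2761, buf) yields
 * "00000ac9-facb-11e6-bd58-64006a7986d3" (2761 = 0xac9). */
```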
PowerShell
(New-Guid).Guid | clip.exe
// Windows
);
// Linux guest
// Windows
// Linux guest
The function definition is copied below for convenience; read more about bind here.
C
// Windows
int bind(
_In_ SOCKET s,
);
// Linux guest
socklen_t addrlen);
In contrast to the socket address (sockaddr) for a standard Internet Protocol address
family ( AF_INET ) which consists of the host machine's IP address and a port number on
that host, the socket address for AF_HYPERV uses the virtual machine's ID and the
application ID defined above to establish a connection. If binding from a Linux guest
AF_VSOCK uses the svm_cid and the svm_port .
Since Hyper-V sockets do not depend on a networking stack (TCP/IP, DNS, etc.), the
socket endpoint needs a non-IP, non-hostname format that still unambiguously
describes the connection.
// Windows
struct SOCKADDR_HV
{
    ADDRESS_FAMILY Family;
    USHORT Reserved;
    GUID VmId;
    GUID ServiceId;
};
// Linux guest
struct sockaddr_vm {
    __kernel_sa_family_t svm_family;
    unsigned short svm_reserved1;
    unsigned int svm_port;
    unsigned int svm_cid;
    unsigned char svm_zero[sizeof(struct sockaddr) -
        sizeof(sa_family_t) -
        sizeof(unsigned short) -
        sizeof(unsigned int) -
        sizeof(unsigned int)];
};
Service ID – GUID, described above, with which the application is registered in the
Hyper-V host registry.
There is also a set of VMID wildcards available when a connection isn't to a specific
virtual machine.
VMID Wildcards
HV_GUID_BROADCAST FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF
HV_GUID_PARENT
The parent of a virtual machine is its host. The parent of a container is
the container's host. Connecting from a container running in a virtual
machine will connect to the VM hosting the container.
Listening on this VmId accepts connections from:
(Inside containers): Container host.
(Inside VM: Container host/ no container): VM host.
(Not inside VM: Container host/ no container): Not supported.
Supported socket commands
Socket()
Bind()
Connect()
Send()
Listen()
Accept()
Useful links
Complete WinSock API
There have been two Hyper-V WMI namespaces (or versions of the Hyper-V WMI API).
This document contains references to resources for converting code that talks to our old
WMI namespace to the new one. Initially, this article will serve as a repository for API
information and sample code / scripts that can be used to help port any programs or
scripts that use Hyper-V WMI APIs from the v1 namespace to the v2 namespace.
MSDN Samples
Hyper-V virtual machine migration sample
Changing The MAC Address Of NIC Using The Hyper-V WMI V2 Namespace
You can run 3 or 4 basic virtual machines on a host that has 4GB of RAM, though you'll
need more resources for more virtual machines. On the other end of the spectrum, you
can also create large virtual machines with 32 processors and 512GB RAM, depending
on your physical hardware.
Windows 10 Enterprise
Windows 10 Pro
Windows 10 Education
Windows 10 Home
Windows 10 Mobile
Windows 10 Mobile Enterprise
Hardware Requirements
Although this document does not provide a complete list of Hyper-V compatible
hardware, the following items are necessary:
Final Check
If all OS, hardware, and compatibility requirements are met, you will see Hyper-V in
Control Panel: Turn Windows features on or off, and it will have 2 options:
1. Hyper-V platform
2. Hyper-V Management Tools
Note
If you see Windows Hypervisor Platform instead of Hyper-V in Control Panel: Turn
Windows features on or off, your system may not be compatible with Hyper-V;
cross-check the above requirements.
If you run systeminfo on an existing Hyper-V
host, the Hyper-V Requirements section reads:
This article lists the operating system combinations supported in Hyper-V on Windows.
It also serves as an introduction to integration services and other factors in support.
Microsoft has tested these host/guest combinations. Issues with these combinations
may receive attention from Product Support Services.
For issues found in other operating systems that have been certified by the
operating system vendor to run on Hyper-V, support is provided by the vendor.
For issues found in other operating systems, Microsoft submits the issue to the
multi-vendor support community, TSANet .
In order to be supported, all operating systems (guest and host) must be up to date.
Check Windows Update for critical updates.
Windows 8.1 32
Windows 8 32
Windows 10 can run as a guest operating system on Windows 8.1 and Windows
Server 2012 R2 Hyper-V hosts.
SUSE
Oracle Linux
Ubuntu
FreeBSD
For more information, including support information on past versions of Hyper-V, see
Linux and FreeBSD Virtual Machines on Hyper-V.
Hyper-V
Reference
This reference provides cmdlet descriptions and syntax for all Hyper-V-specific cmdlets.
It lists the cmdlets in alphabetical order based on the verb at the beginning of the
cmdlet.
Hyper-V
Add-VMDvdDrive Adds a DVD drive to a virtual machine.
Get-VMNetworkAdapterTeamMapping
Remove-VMNetworkAdapterTeamMapping
Set-VMNetworkAdapterTeamMapping
Integration services (often called integration components) are services that allow the
virtual machine to communicate with the Hyper-V host. Many of these services are
conveniences while others can be quite important to the virtual machine's ability to
function correctly.
This article is a reference for each integration service available in Windows. It will also
act as a starting point for any information related to specific integration services or their
history.
User Guides:
Quick Reference
Name | Windows Service Name | Linux Daemon Name | Description | Impact on VM when disabled
Description: Tells the Hyper-V host that the virtual machine has an operating system
installed and that it booted correctly.
The heartbeat service makes it possible to answer basic questions like "did the virtual
machine boot?".
When Hyper-V reports that a virtual machine state is "running" (see the example below),
it means Hyper-V set aside resources for a virtual machine; it does not mean that there
is an operating system installed or functioning. This is where heartbeat becomes useful.
The heartbeat service tells Hyper-V that the operating system inside the virtual machine
has booted.
PowerShell
Description: Allows the Hyper-V host to request that the virtual machine shut down. The
host can always force the virtual machine to turn off, but that is like flipping the power
switch as opposed to selecting shutdown.
Description: Synchronizes the virtual machine's system clock with the system clock of
the physical computer.
The data exchange service (sometimes called KVP) shares small amounts of machine
information between virtual machine and the Hyper-V host using key-value pairs (KVP)
through the Windows registry. The same mechanism can also be used to share
customized data between the virtual machine and the host.
Key-value pairs consist of a “key” and a “value”. Both the key and the value are strings;
no other data types are supported. When a key-value pair is created or changed, it is
visible to the guest and the host. The key-value pair information is transferred across the
Hyper-V VMbus and does not require any kind of network connection between the
guest and the Hyper-V host.
The data exchange service is a great tool for preserving information about the virtual
machine -- for interactive data sharing or data transfer, use PowerShell Direct.
User Guides:
Using key-value pairs to share information between the host and guest on Hyper-
V.
Description: Allows Volume Shadow Copy Service to back up applications and data on
the virtual machine.
The Volume Shadow Copy Requestor integration service is required for Volume Shadow
Copy Service (VSS). The Volume Shadow Copy Service (VSS) captures and copies images
for backup on running systems, particularly servers, without unduly degrading the
performance and stability of the services they provide. This integration service makes
that possible by coordinating the virtual machine's workloads with the host's backup
process.
Description: Provides an interface for the Hyper-V host to bidirectionally copy files to or
from the virtual machine.
Impact: When disabled, the host cannot copy files to and from the guest using
Copy-VMFile . Read more about the Copy-VMFile cmdlet.
Notes:
Impact: Disabling this service prevents the host from being able to connect to the virtual
machine with PowerShell Direct.
Notes:
PowerShell Direct allows PowerShell management inside a virtual machine from the
Hyper-V host regardless of any network configuration or remote management settings
on either the Hyper-V host or the virtual machine. This makes it easier for Hyper-V
Administrators to automate and script management and configuration tasks.
User Guides:
Partitions do not have access to the physical processor, nor do they handle the
processor interrupts. Instead, they have a virtual view of the processor and run in a
virtual memory address region that is private to each guest partition. The hypervisor
handles the interrupts to the processor, and redirects them to the respective partition.
Hyper-V can also hardware accelerate the address translation between various guest
virtual address spaces by using an Input Output Memory Management Unit (IOMMU)
which operates independent of the memory management hardware used by the CPU.
An IOMMU is used to remap physical memory addresses to the addresses that are used
by the child partitions.
Child partitions also do not have direct access to other hardware resources and are
presented a virtual view of the resources, as virtual devices (VDevs). Requests to the
virtual devices are redirected either via the VMBus or the hypervisor to the devices in the
parent partition, which handles the requests. The VMBus is a logical inter-partition
communication channel. The parent partition hosts Virtualization Service Providers
(VSPs) which communicate over the VMBus to handle device access requests from child
partitions. Child partitions host Virtualization Service Consumers (VSCs) which redirect
device requests to VSPs in the parent partition via the VMBus. This entire process is
transparent to the guest operating system.
Virtual Devices can also take advantage of a Windows Server Virtualization feature,
named Enlightened I/O, for storage, networking, graphics, and input subsystems.
Enlightened I/O is a specialized virtualization-aware implementation of high level
communication protocols (such as SCSI) that utilize the VMBus directly, bypassing any
device emulation layer. This makes the communication more efficient but requires an
enlightened guest that is hypervisor and VMBus aware. Hyper-V enlightened I/O and a
hypervisor-aware kernel are provided via installation of Hyper-V integration services.
Integration components, which include virtual server client (VSC) drivers, are also
available for other client operating systems. Hyper-V requires a processor that includes
hardware assisted virtualization, such as is provided with Intel VT or AMD Virtualization
(AMD-V) technology.
Glossary
APIC – Advanced Programmable Interrupt Controller – A device which allows
priority levels to be assigned to its interrupt outputs.
Child Partition – Partition that hosts a guest operating system - All access to
physical memory and devices by a child partition is provided via the Virtual
Machine Bus (VMBus) or the hypervisor.
Hypercall – Interface for communication with the hypervisor - The hypercall
interface accommodates access to the optimizations provided by the hypervisor.
Hypervisor – A layer of software that sits between the hardware and one or more
operating systems. Its primary job is to provide isolated execution environments
called partitions. The hypervisor controls and arbitrates access to the underlying
hardware.
IC – Integration component – Component that allows child partitions to
communicate with other partitions and the hypervisor.
I/O stack – Input/output stack
MSR – Model-Specific Register
Root Partition – Sometimes called parent partition. Manages machine-level
functions such as device drivers, power management, and device hot
addition/removal. The root (or parent) partition is the only partition that has direct
access to physical memory and devices.
VID – Virtualization Infrastructure Driver – Provides partition management services,
virtual processor management services, and memory management services for
partitions.
VMBus – Channel-based communication mechanism used for inter-partition
communication and device enumeration on systems with multiple active virtualized
partitions. The VMBus is installed with Hyper-V Integration Services.
VMMS – Virtual Machine Management Service – Responsible for managing the
state of all virtual machines in child partitions.
VMWP – Virtual Machine Worker Process – A user mode component of the
virtualization stack. The worker process provides virtual machine management
services from the Windows Server 2008 instance in the parent partition to the
guest operating systems in the child partitions. The Virtual Machine Management
Service spawns a separate worker process for each running virtual machine.
VSC – Virtualization Service Client – A synthetic device instance that resides in a
child partition. VSCs utilize hardware resources that are provided by Virtualization
Service Providers (VSPs) in the parent partition. They communicate with the
corresponding VSPs in the parent partition over the VMBus to satisfy a child
partition's device I/O requests.
VSP – Virtualization Service Provider – Resides in the root partition and provides
synthetic device support to child partitions over the Virtual Machine Bus (VMBus).
WinHv – Windows Hypervisor Interface Library - WinHv is essentially a bridge
between a partitioned operating system’s drivers and the hypervisor which allows
drivers to call the hypervisor using standard Windows calling conventions.
WMI – The Virtual Machine Management Service exposes a set of Windows
Management Instrumentation (WMI)-based APIs for managing and controlling
virtual machines.
Hypervisor Top Level Functional
Specification
Article • 05/20/2022
This specification is provided under the Microsoft Open Specification Promise. Read
the following for further details about the Microsoft Open Specification Promise.
Glossary
Partition - Hyper-V supports isolation in terms of a partition. A partition is a logical
unit of isolation, supported by the hypervisor, in which operating systems execute.
Root Partition - The root partition (a.k.a. the "parent" or "host") is a privileged
management partition. The root partition manages machine-level functions such
as device drivers, power management, and device addition/removal. The
virtualization stack runs in the parent partition and has direct access to the
hardware devices. The root partition then creates the child partitions which host
the guest operating systems.
Child Partition - The child partition (a.k.a. the "guest") hosts a guest operating
system. All access to physical memory and devices by a child partition is provided
via the Virtual Machine Bus (VMBus) or the hypervisor.
Hypercall - Hypercalls are an interface for communication with the hypervisor.
Specification Style
The document assumes familiarity with the high-level hypervisor architecture.
This specification is informal; that is, the interfaces are not specified in a formal
language. Nevertheless, it is a goal to be precise. It is also a goal to specify which
behaviors are architectural and which are implementation-specific. Callers should not
rely on behaviors that fall into the latter category because they may change in future
implementations.
Previous Versions
Release Document
Windows Server 2016 (Revision C) Hypervisor Top Level Functional Specification v5.0c.pdf
Windows Server 2012 R2 (Revision B) Hypervisor Top Level Functional Specification v4.0b.pdf
Reserved Values
This specification documents some fields as “reserved.” These fields may be given
specific meaning in future versions of the hypervisor architecture. For maximum forward
compatibility, clients of the hypervisor interface should follow the guidance provided
within this document. In general, two forms of guidance are provided. Preserve value
(documented as RsvdP in diagrams and ReservedP in code segments) – For maximum
forward compatibility, clients should preserve the value within this field. This is typically
done by reading the current value, modifying the values of the non-reserved fields, and
writing the value back. Zero value (documented as RsvdZ in diagrams and ReservedZ in
code segments) – For maximum forward compatibility, clients should zero the value
within this field.
Reserved fields within read-only structures are simply documented as Rsvd in diagrams
and simply as Reserved in code segments. For maximum forward compatibility, the
values within these fields should be ignored. Clients should not assume these values will
always be zero.
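The RsvdP guidance above amounts to a read-modify-write that touches only the defined field. As an illustrative sketch (the field layout here is hypothetical, not from the specification):

```c
#include <stdint.h>

/* Hypothetical register: bits 11-0 form a defined field; all higher bits
 * are RsvdP.  To update the field, read the current value, replace only
 * the defined bits, and write back with the reserved bits untouched. */
#define FIELD_MASK 0xFFFULL

static uint64_t set_field_preserving_reserved(uint64_t current,
                                              uint64_t field)
{
    return (current & ~FIELD_MASK) | (field & FIELD_MASK);
}
```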
System physical addresses (SPAs) define the physical address space of the
underlying hardware as seen by the CPUs. There is only one system physical
address space for the entire machine.
Guest physical addresses (GPAs) define the guest’s view of physical memory. GPAs
can be mapped to underlying SPAs. There is one guest physical address space per
partition.
Guest virtual addresses (GVAs) are used within the guest when it enables address
translation and provides a valid guest page table.
All three of these address spaces are up to 2^64 bytes in size. The following types are
thus defined:
Many hypervisor interfaces act on pages of memory rather than single bytes. The
minimum page size is architecture-dependent. For x64, it is defined as 4K.
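A minimal sketch of these address types and the 4K page granularity (the type and macro names here are illustrative, not the hypervisor's own definitions):

```c
#include <stdint.h>

typedef uint64_t SPA;   /* system physical address: one space per machine  */
typedef uint64_t GPA;   /* guest physical address: one space per partition */
typedef uint64_t GVA;   /* guest virtual address                           */

#define PAGE_SHIFT 12   /* minimum page size on x64 is 4K = 2^12 bytes */

/* Page number containing a given guest physical address. */
static uint64_t gpa_page_number(GPA gpa)
{
    return gpa >> PAGE_SHIFT;
}
```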
Endianness
The hypervisor interface is designed to be endian-neutral (that is, it should be possible
to port the hypervisor to a big-endian or little-endian system), but some of the data
structures defined later in this specification assume little-endian layout. Such data
structures will need to be amended if and when a big-endian port is attempted.
Guest software interacts with the hypervisor through a variety of mechanisms. Many of
these mirror the traditional mechanisms used by software to interact with the underlying
processor. As such, these mechanisms are architecture-specific. On the x64 architecture,
the following mechanisms are used:
Hypervisor Discovery
Before using any hypervisor interfaces, software should first determine whether it’s
running within a virtualized environment. On x64 platforms that conform to this
specification, this is done by executing the CPUID instruction with an input (EAX) value
of 1. Upon execution, code should check bit 31 of register ECX (the “hypervisor present
bit”). If this bit is set, a hypervisor is present. In a non-virtualized environment, the bit
will be clear.
If the “hypervisor present bit” is set, additional CPUID leaves can be queried for more
information about the conformant hypervisor and its capabilities. Two such leaves are
guaranteed to be available: 0x40000000 and 0x40000001 . Subsequently-numbered leaves
may also be available.
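The discovery check above can be sketched as follows; obtaining the ECX value by actually executing CPUID with EAX = 1 is architecture-specific and omitted here:

```c
#include <stdint.h>

/* Given ECX as returned by CPUID leaf 1, test the "hypervisor present
 * bit" (bit 31).  Returns nonzero when running under a hypervisor. */
static int hypervisor_present(uint32_t ecx)
{
    return (int)((ecx >> 31) & 1u);
}
```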
EBX Reserved
ECX Reserved
EDX Reserved
These two leaves allow the guest to query the hypervisor vendor ID and interface
independently. The vendor ID is provided only for informational and diagnostic
purposes. It is recommended that software only base compatibility decisions on the
interface signature reported through leaf 0x40000001 .
EAX The maximum input value for hypervisor CPUID information. On Microsoft hypervisors,
this will be at least 0x40000005 .
Register Information Provided
EBX 0x7263694D—“Micr”
ECX 0x666F736F—“osof”
EAX 0x31237648—“Hv#1”
EBX Reserved
ECX Reserved
EDX Reserved
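The vendor and interface signatures above are packed four ASCII bytes per 32-bit register, least-significant byte first. A small helper can unpack them:

```c
#include <stdint.h>

/* Unpack one 32-bit CPUID register into its four ASCII characters,
 * least-significant byte first (e.g. 0x7263694D -> "Micr"). */
static void cpuid_reg_to_str(uint32_t reg, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```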
Hypervisors conforming to the “Hv#1” interface also provide at least the following
leaves.
31-9 Reserved
4 Support for passing hypercall input parameter block via XMM registers is
available
13 DisableHypervisorAvailable
14 ExtendedGvaRangesForFlushVirtualAddressListAvailable
16 Reserved
17 SintPollingModeAvailable
18 HypercallMsrLockAvailable
Register Bits Information Provided
22 Reserved
25-24 Reserved
31-27 Reserved
EAX 0 Recommend using hypercall for address space switches rather than MOV to CR3
instruction.
1 Recommend using hypercall for local TLB flushes rather than INVLPG or MOV to
CR3 instructions.
2 Recommend using hypercall for remote TLB flushes rather than inter-processor
interrupts.
3 Recommend using MSRs for accessing APIC registers EOI, ICR and TPR rather
than their memory-mapped counterparts.
5 Recommend using relaxed timing for this partition. If used, the VM should
disable any watchdog timeouts that rely on the timely delivery of external
interrupts.
8 Reserved.
Register Bits Information Provided
16 Reserved
31-19 Reserved
31-7 Reserved
EDX Reserved
ECX The maximum number of physical interrupt vectors available for interrupt remapping.
EDX Reserved
8 HPET is requested.
15 Reserved
31-25 Reserved
EBX Reserved
ECX Reserved
EDX Reserved
2 AccessSynicRegs
3 Reserved
4 AccessIntrCtrlRegs
5 AccessHypercallMsrs
6 AccessVpIndex
11-7 Reserved
12 AccessReenlightenmentControls
31-13 Reserved
EBX Reserved
ECX Reserved
4 XmmRegistersForFastHypercallAvailable
14-5 Reserved
15 FastHypercallOutputAvailable
16 Reserved
17 SintPollingModeAvailable
31-18 Reserved
16 Reserved
22 Indicates support for the enlightened TLB on AMD platforms. ASID flushes do
not affect TLB entries derived from the NPT. Hypercalls must be used to
invalidate NPT TLB entries. Also indicates support for the
HvFlushGuestPhysicalAddressSpace and HvFlushGuestPhysicalAddressList
hypercalls.
31-21 Reserved
Register Bits Information Provided
31-1 Reserved
ECX Reserved
EDX Reserved
Versioning
The hypervisor version information is encoded in leaf 0x40000002 . Two version numbers
are provided: the main version and the service version.
The main version includes a major and minor version number and a build number. These
correspond to Microsoft Windows release numbers. The service version describes
changes made to the main version.
Clients are strongly encouraged to check for hypervisor features by using CPUID leaves
0x40000003 through 0x40000005 rather than by comparing against version ranges.
Hypercall Interface
Article • 05/02/2022
The hypervisor provides a calling mechanism for guests. Such calls are referred to as
hypercalls. Each hypercall defines a set of input and/or output parameters. These
parameters are specified in terms of a memory-based data structure. All elements of the
input and output data structures are padded to natural boundaries up to 8 bytes (that is,
two-byte elements must be on two-byte boundaries and so on).
A second hypercall calling convention can optionally be used for a subset of hypercalls –
in particular, those that have two or fewer input parameters and no output parameters.
When using this calling convention, the input parameters are passed in general-purpose
registers.
A third hypercall calling convention can optionally be used for a subset of hypercalls
where the input parameter block is up to 112 bytes. When using this calling convention,
the input parameters are passed in registers, including the volatile XMM registers.
Input and output data structures must both be placed in memory on an 8-byte
boundary and padded to a multiple of 8 bytes in size. The values within the padding
regions are ignored by the hypervisor.
For output, the hypervisor is allowed to (but not guaranteed to) overwrite padding
regions. If it overwrites padding regions, it will write zeros.
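The 8-byte alignment and padding rule can be captured in a small helper; a caller would allocate its parameter block at this rounded-up size:

```c
#include <stdint.h>

/* Round a hypercall parameter-block size up to the required
 * multiple of 8 bytes. */
static uint64_t hv_round_up8(uint64_t size)
{
    return (size + 7u) & ~7ULL;
}
```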
Hypercall Classes
There are two classes of hypercalls: simple and rep (short for “repeat”). A simple
hypercall performs a single operation and has a fixed-size set of input and output
parameters. A rep hypercall acts like a series of simple hypercalls. In addition to a fixed-
size set of input and output parameters, rep hypercalls involve a list of fixed-size input
and/or output elements.
When a caller initially invokes a rep hypercall, it specifies a rep count that indicates the
number of elements in the input and/or output parameter list. Callers also specify a rep
start index that indicates the next input and/or output element that should be
consumed. The hypervisor processes rep parameters in list order – that is, by increasing
element index.
For subsequent invocations of the rep hypercall, the rep start index indicates how many
elements have been completed – and, in conjunction with the rep count value – how
many elements are left. For example, if a caller specifies a rep count of 25, and only 20
iterations are completed within the time constraints, the hypercall returns control back
to the calling virtual processor after updating the rep start index to 20. When the
hypercall is re-executed, the hypervisor will resume at element 20 and complete the
remaining 5 elements.
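The continuation behavior in the example above (rep count 25, 20 elements completed per time slice) can be modeled as a loop; the per-invocation budget here is an arbitrary stand-in for the hypervisor's time constraint:

```c
#include <stdint.h>

/* Toy model of rep-hypercall continuation: each invocation processes at
 * most `budget` elements, then control returns with the rep start index
 * advanced; the guest re-executes until all elements are consumed.
 * Returns the number of invocations needed. */
static int rep_hypercall_invocations(uint32_t rep_count, uint32_t budget)
{
    uint32_t start = 0;          /* rep start index */
    int invocations = 0;
    while (start < rep_count) {
        uint32_t left = rep_count - start;
        start += (left < budget) ? left : budget;
        invocations++;
    }
    return invocations;
}
/* Example: rep count 25, 20 elements per slice -> the call resumes once
 * at element 20, so 2 invocations in total. */
```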
Hypercall Continuation
A hypercall can be thought of as a complex instruction that takes many cycles. The
hypervisor attempts to limit hypercall execution to 50μs or less before returning control
to the virtual processor that invoked the hypercall. Some hypercall operations are
sufficiently complex that a 50μs guarantee is difficult to make. The hypervisor therefore
relies on a hypercall continuation mechanism for some hypercalls – including all rep
hypercall forms.
Most simple hypercalls are guaranteed to complete within the prescribed time limit.
However, a small number of simple hypercalls might require more time. These hypercalls
use hypercall continuation in a similar manner to rep hypercalls. In such cases, the
operation involves two or more internal states. The first invocation places the object (for
example, the partition or virtual processor) into one state, and after repeated
invocations, the state finally transitions to a terminal state. For each hypercall that
follows this pattern, the visible side effects of intermediate internal states are described.
Simple hypercalls that use hypercall continuation may involve multiple internal states
that are externally visible. Such calls comprise multiple atomic operations.
Each hypercall action may read input parameters and/or write results. The inputs to each
action can be read at any granularity and at any time after the hypercall is made and
before the action is executed. The results (that is, the output parameters) associated
with each action may be written at any granularity and at any time after the action is
executed and before the hypercall returns.
The guest must avoid the examination and/or manipulation of any input or output
parameters related to an executing hypercall. While a virtual processor executing a
hypercall will be incapable of doing so (as its guest execution is suspended until the
hypercall returns), there is nothing to prevent other virtual processors from doing so.
Guests behaving in this manner may crash or cause corruption within their partition.
Alignment Requirements
Callers must specify the 64-bit guest physical address (GPA) of the input and/or output
parameters. GPA pointers must be 8-byte aligned. If the hypercall involves no input or
output parameters, the hypervisor ignores the corresponding GPA pointer.
The input and output parameter lists cannot overlap or cross page boundaries.
Hypercall input and output pages are expected to be GPA pages and not “overlay”
pages. If the virtual processor writes the input parameters to an overlay page and
specifies a GPA within this page, hypervisor access to the input parameter list is
undefined.
The hypervisor will validate that the calling partition can read from the input page
before executing the requested hypercall. This validation consists of two checks: the
specified GPA is mapped and the GPA is marked readable. If either of these tests fails,
the hypervisor generates a memory intercept message. For hypercalls that have output
parameters, the hypervisor will validate that the partition can write to the output page.
This validation consists of two checks: the specified GPA is mapped and the GPA is
marked writable.
Hypercall Inputs
Callers specify a hypercall by a 64-bit value called a hypercall input value. It is formatted
as follows:
Fast 16 Specifies whether the hypercall uses the register-based calling convention:
0 = memory-based, 1 = register-based
Rep Count 43-32 Total number of reps (for rep call; must be zero otherwise)
Rep Start Index 59-48 Starting index (for rep call; must be zero otherwise)
For rep hypercalls, the rep count field indicates the total number of reps. The rep start
index indicates the particular repetition relative to the start of the list (zero indicates that
the first element in the list is to be processed). Therefore, the rep count value must
always be greater than the rep start index.
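Given the bit positions listed above (Fast in bit 16, rep count in bits 43-32, rep start index in bits 59-48, and the call code in the low 16 bits, which the excerpted table does not show), assembling an input value is straightforward masking and shifting. Fields not listed, such as the variable header size, are left zero in this sketch:

```c
#include <stdint.h>

/* Assemble a hypercall input value from the fields shown above.
 * Other fields (e.g. variable header size) stay zero in this sketch. */
static uint64_t hv_hypercall_input(uint16_t call_code, int fast,
                                   uint32_t rep_count, uint32_t rep_start)
{
    return (uint64_t)call_code
         | ((uint64_t)(fast ? 1 : 0) << 16)    /* Fast flag, bit 16      */
         | (((uint64_t)rep_count & 0xFFF) << 32)  /* rep count, 43-32    */
         | (((uint64_t)rep_start & 0xFFF) << 48); /* rep start, 59-48    */
}
```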
Register mapping for hypercall inputs when the Fast flag is zero:
The hypercall input value is passed in registers along with a GPA that points to the input
and output parameters.
On x64, the register mappings depend on whether the caller is running in 32-bit (x86) or
64-bit (x64) mode. The hypervisor determines the caller’s mode based on the value of
EFER.LMA and CS.L. If both of these flags are set, the caller is assumed to be a 64-bit
caller.
Register mapping for hypercall inputs when the Fast flag is one:
The hypercall input value is passed in registers along with the input parameters.
A variable sized header is similar to a fixed hypercall input (aligned to 8 bytes and sized
to a multiple of 8 bytes). The caller must specify how much data it is providing as input
headers. This size is provided as part of the hypercall input value (see “Variable header
size” in table above).
Since the fixed header size is implicit, instead of supplying the total header size, only the
variable portion is supplied in the input controls:
It is illegal to specify a non-zero variable header size for a hypercall that is not explicitly
documented as accepting variable sized input headers. In such a case the hypercall will
result in a return code of HV_STATUS_INVALID_HYPERCALL_INPUT .
For a given invocation of a hypercall that accepts variable sized input headers, it is
possible that all of the header input fits entirely within the fixed size header. In such
cases the variable sized input header is zero-sized, and the corresponding bits in the
hypercall input should be set to zero.
In all other regards, hypercalls accepting variable sized input headers follow the same
calling conventions as hypercalls with fixed size input headers. It is also
possible for a variable sized header hypercall to additionally support rep semantics. In
such a case the rep elements lie after the header in the usual fashion, except that the
header's total size includes both the fixed and variable portions. All other rules remain
the same, e.g. the first rep element must be 8 byte aligned.
Availability of the XMM fast hypercall interface is indicated via the “Hypervisor Feature
Identification” CPUID Leaf (0x40000003):
Bit 4: support for passing hypercall input via XMM registers is available.
Note that there is a separate flag to indicate support for XMM fast output. Any attempt
to use this interface when the hypervisor does not indicate availability will result in a
#UD fault.
Register Mapping (Input Only)
The hypercall input value is passed in registers along with the input parameters. The
register mappings depend on whether the caller is running in 32-bit (x86) or 64-bit (x64)
mode. The hypervisor determines the caller’s mode based on the value of EFER.LMA and
CS.L. If both of these flags are set, the caller is assumed to be a 64-bit caller. If the input
parameter block is smaller than 112 bytes, any extra bytes in the registers are ignored.
Hypercall Outputs
All hypercalls return a 64-bit value called a hypercall result value. It is formatted as
follows:
For rep hypercalls, the reps complete field is the total number of reps complete and not
relative to the rep start index. For example, if the caller specified a rep start index of 5,
and a rep count of 10, the reps complete field would indicate 10 upon successful
completion.
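Because the reps complete value is absolute, a caller that is interrupted partway through can feed it back in directly as the next rep start index. A hedged sketch, with a stub standing in for the actual hypercall (the status values here are illustrative, not the real codes):

```c
#include <stdint.h>

#define HV_STATUS_SUCCESS 0
#define HV_STATUS_TIMEOUT 1  /* illustrative value, not the real code */

typedef struct {
    uint16_t status;
    uint16_t reps_complete;  /* absolute count, per the text */
} hv_result;

/* Stub hypercall: completes at most 4 reps per invocation. */
static hv_result hv_issue(uint16_t rep_count, uint16_t rep_start)
{
    uint16_t done = (uint16_t)(rep_start + 4);
    if (done >= rep_count) {
        hv_result ok = { HV_STATUS_SUCCESS, rep_count };
        return ok;
    }
    hv_result partial = { HV_STATUS_TIMEOUT, done };
    return partial;
}

/* Retry until all reps are done, using the absolute reps complete
 * value directly as the next rep start index. */
static int hv_issue_all(uint16_t rep_count, int *calls)
{
    uint16_t start = 0;
    for (;;) {
        hv_result r = hv_issue(rep_count, start);
        ++*calls;
        if (r.status == HV_STATUS_SUCCESS)
            return 0;
        start = r.reps_complete;
    }
}
```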
The hypercall result value is passed back in registers. The register mapping depends on
whether the caller is running in 32-bit (x86) or 64-bit (x64) mode (see above). The
register mapping for hypercall outputs is as follows:
The ability to return output via XMM registers is indicated via the “Hypervisor Feature
Identification” CPUID Leaf (0x40000003):
Bit 15: support for returning hypercall output via XMM registers is available.
Note that there is a separate flag to indicate support for XMM fast input. Any attempt to
use this interface when the hypervisor does not indicate availability will result in a #UD
fault.
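A guest can cache the feature bits once and consult them before choosing the XMM calling convention, avoiding the #UD fault. A small sketch; the bit numbers (4 for fast input, 15 for fast output) are from the text, while the assumption that both are reported in EDX of leaf 0x40000003 is mine:

```c
#include <stdint.h>

/* Feature checks for the XMM fast hypercall interface. That the bits
 * live in EDX of CPUID leaf 0x40000003 is an assumption. */
static int xmm_fast_input_ok(uint32_t features_edx)
{
    return (features_edx >> 4) & 1;   /* bit 4: XMM fast input  */
}

static int xmm_fast_output_ok(uint32_t features_edx)
{
    return (features_edx >> 15) & 1;  /* bit 15: XMM fast output */
}
```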
Registers that are not being used to pass input parameters can be used to return
output. In other words, if the input parameter block is smaller than 112 bytes (rounded
up to the nearest 16 byte aligned chunk), the remaining registers will return hypercall
output.
Volatile Registers
Hypercalls will only modify the specified register values under the following conditions:
1. RAX (x64) and EDX:EAX (x86) are always overwritten with the hypercall result value
and output parameters, if any.
2. Rep hypercalls will modify RCX (x64) and EDX:EAX (x86) with the new rep start
index.
3. HvCallSetVpRegisters can modify any registers that are supported with that
hypercall.
4. RDX, R8, and XMM0 through XMM5, when used for fast hypercall input, remain
unmodified. However, registers used for fast hypercall output can be modified,
including RDX, R8, and XMM0 through XMM5. Hyper-V will only modify these
registers for fast hypercall output, which is limited to x64.
Hypercall Restrictions
Hypercalls may have restrictions associated with them that must be satisfied for them to
perform their intended function. If any restriction is not met, the hypercall will
terminate with an appropriate error. The following restrictions will be listed, if any apply:
The rep start index is not less than the rep count.
The return code HV_STATUS_SUCCESS indicates that no error condition was detected.
This register’s value is initially zero. A non-zero value must be written to the Guest OS ID
MSR before the hypercall code page can be enabled (see Establishing the Hypercall
Interface). If this register is subsequently zeroed, the hypercall code page will be
disabled.
Bits    Field             Description
23:16   Service Version   Indicates the service version (for example, "service
                          pack" number)
47:40   OS ID             Indicates the OS variant. Encoding is unique to the
                          vendor. Microsoft operating systems are encoded as
                          follows: 0 = Undefined, 1 = MS-DOS®, 2 = Windows® 3.x,
                          3 = Windows® 9x, 4 = Windows® NT (and derivatives),
                          5 = Windows® CE
62:48   Vendor ID         Indicates the guest OS vendor. A value of 0 is reserved.
                          See list of vendors below.
63      OS Type           Indicates the OS type. A value of 0 indicates a
                          proprietary, closed source OS. A value of 1 indicates an
                          open source OS.
Vendor values are allocated by Microsoft. To request a new vendor, please file an issue
on the GitHub virtualization documentation repository
(https://aka.ms/VirtualizationDocumentationIssuesTLFS ).
Vendor Value
Microsoft 0x0001
HPE 0x0002
LANCOM 0x0200
Bits    Field     Description
62:56   OS Type   OS type (e.g., Linux, FreeBSD, etc.). See list of known OS
                  types below.
OS Type values are allocated by Microsoft. To request a new OS Type, please file an issue
on the GitHub virtualization documentation repository
(https://aka.ms/VirtualizationDocumentationIssuesTLFS ).
OS Type Value
Linux 0x1
FreeBSD 0x2
Xen 0x3
Illumos 0x4
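Putting the open-source layout together: bit 63 set, and the OS type from the table above in bits 62:56. Treating the remaining low bits as opaque version information is a simplification on my part, since their exact layout is not reproduced here:

```c
#include <stdint.h>

#define GUEST_OS_ID_OPEN_SOURCE (1ULL << 63)  /* bit 63 from the text */

/* Compose HV_X64_MSR_GUEST_OS_ID for an open-source guest: bit 63
 * set, OS Type in bits 62:56, low 56 bits treated as opaque version
 * information (a simplification). */
static uint64_t guest_os_id_open_source(uint8_t os_type,
                                        uint64_t version_bits)
{
    return GUEST_OS_ID_OPEN_SOURCE |
           ((uint64_t)(os_type & 0x7F) << 56) |
           (version_bits & ((1ULL << 56) - 1));
}
```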
Bits    Field            Attributes   Description
63:12   Hypercall GPFN   Read/write   Indicates the Guest Physical Page Number of
                                      the hypercall page
11:2    RsvdP            Reserved     Bits should be ignored on reads and preserved
                                      on writes
1       Locked           Read/write   Indicates if the MSR is immutable. If set,
                                      this MSR is locked, thereby preventing the
                                      relocation of the hypercall page. Once set,
                                      only a system reset can clear the bit.
0       Enable           Read/write   Enables the hypercall page
The hypercall page can be placed anywhere within the guest’s GPA space, but must be
page-aligned. If the guest attempts to move the hypercall page beyond the bounds of
the GPA space, a #GP fault will result when the MSR is written.
This MSR is a partition-wide MSR. In other words, it is shared by all virtual processors in
the partition. If one virtual processor successfully writes to the MSR, another virtual
processor will read the same value.
Before the hypercall page is enabled, the guest OS must report its identity by writing its
version signature to a separate MSR (HV_X64_MSR_GUEST_OS_ID). If no guest OS
identity has been specified, attempts to enable the hypercall will fail. The enable bit will
remain zero even if a one is written to it. Furthermore, if the guest OS identity is cleared
to zero after the hypercall page has been enabled, it will become disabled.
The hypercall page appears as an “overlay” to the GPA space; that is, it covers whatever
else is mapped to the GPA range. Its contents are readable and executable by the guest.
Attempts to write to the hypercall page will result in a protection (#GP) exception. After
the hypercall page has been enabled, invoking a hypercall simply involves a call to the
start of the page.
The following is a detailed list of the steps involved in establishing the hypercall page:
1. The guest reads CPUID leaf 1 and determines whether a hypervisor is present by
checking bit 31 of register ECX.
2. The guest reads CPUID leaf 0x40000000 to determine the maximum hypervisor
CPUID leaf (returned in register EAX) and CPUID leaf 0x40000001 to determine the
interface signature (returned in register EAX). It verifies that the maximum leaf
value is at least 0x40000005 and that the interface signature is equal to “Hv#1”.
This signature implies that HV_X64_MSR_GUEST_OS_ID , HV_X64_MSR_HYPERCALL and
HV_X64_MSR_VP_INDEX are implemented.
3. The guest writes its OS identity into the MSR HV_X64_MSR_GUEST_OS_ID if that
register is zero.
4. The guest reads the Hypercall MSR ( HV_X64_MSR_HYPERCALL ).
5. The guest checks the Enable Hypercall Page bit. If it is set, the interface is already
active, and steps 6 and 7 should be omitted.
6. The guest finds a page within its GPA space, preferably one that is not occupied by
RAM, MMIO, and so on. If the page is occupied, the guest should avoid using the
underlying page for other purposes.
7. The guest writes a new value to the Hypercall MSR ( HV_X64_MSR_HYPERCALL ) that
includes the GPA from step 6 and sets the Enable Hypercall Page bit to enable the
interface.
8. The guest creates an executable VA mapping to the hypercall page GPA.
9. The guest consults CPUID leaf 0x40000003 to determine which hypervisor facilities
are available to it.
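The verification in step 2 can be sketched as follows. CPUID signatures are byte strings packed little-endian into 32-bit registers, so the "Hv#1" signature corresponds to the EAX value 0x31237648; executing CPUID itself is platform-specific and not shown, the raw register values are passed in instead:

```c
#include <stdint.h>
#include <string.h>

/* Unpack a CPUID signature register: the four bytes are the string,
 * packed little-endian (assumes a little-endian host for memcpy). */
static void unpack_signature(uint32_t reg, char out[5])
{
    memcpy(out, &reg, 4);
    out[4] = '\0';
}

/* Step 2 above: max leaf comes from leaf 0x40000000 EAX, the
 * interface signature from leaf 0x40000001 EAX. */
static int hv_interface_ok(uint32_t max_leaf, uint32_t sig_eax)
{
    char sig[5];
    unpack_signature(sig_eax, sig);
    return max_leaf >= 0x40000005 && strcmp(sig, "Hv#1") == 0;
}
```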
After the interface has been established, the guest can initiate a
hypercall. To do so, it populates the registers per the hypercall protocol and issues
a CALL to the beginning of the hypercall page. The guest should assume the
hypercall page performs the equivalent of a near return (0xC3) to return to the
caller. As such, the hypercall must be invoked with a valid stack.
A partition can query its privileges through the “Hypervisor Feature Identification”
CPUID Leaf (0x40000003). See HV_PARTITION_PRIVILEGE_MASK for a description of all
privileges.
To determine the guest crash capabilities, guest partitions may read the
HV_X64_MSR_CRASH_CTL register. The set of actions and capabilities supported by the
hypervisor is reported.
To invoke a supported hypervisor guest crash action, a guest partition writes to the
HV_X64_MSR_CRASH_CTL register, specifying the desired action. Two variations are
supported: CrashNotify by itself, and CrashMessage in combination with CrashNotify.
For each occurrence of a guest crash, at most a single write to MSR
HV_X64_MSR_CRASH_CTL should be performed, specifying one of the two variations.
CrashMessage This action is used in combination with CrashNotify to specify a crash message to
the hypervisor. When selected, the values of P3 and P4 are treated as the
location and size of the message. HV_X64_MSR_CRASH_P3 is the guest physical
address of the message, and HV_X64_MSR_CRASH_P4 is the length in bytes of
the message (maximum of 4096 bytes).
CrashNotify This action indicates to the hypervisor that the guest partition has completed
writing the desired data into the guest crash parameter MSRs (i.e., P0 thru P4),
and the hypervisor should proceed with logging the contents of these MSRs.
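The CrashMessage-plus-CrashNotify variation can be sketched as below. The MSR numbers and the control bit positions (CrashNotify in bit 63, CrashMessage in bit 62) are assumptions not stated in the text above, and wrmsr() is a stub standing in for the real WRMSR instruction:

```c
#include <stdint.h>

#define HV_X64_MSR_CRASH_P3  0x40000103  /* assumed MSR numbers */
#define HV_X64_MSR_CRASH_P4  0x40000104
#define HV_X64_MSR_CRASH_CTL 0x40000105

#define HV_CRASH_CTL_NOTIFY  (1ULL << 63)  /* assumed bit positions */
#define HV_CRASH_CTL_MESSAGE (1ULL << 62)

static uint64_t last_crash_ctl;  /* stub state, for illustration */

/* Stub: a real guest executes WRMSR here. */
static void wrmsr(uint32_t msr, uint64_t value)
{
    if (msr == HV_X64_MSR_CRASH_CTL)
        last_crash_ctl = value;
}

/* CrashMessage variation: P3 = GPA of the message, P4 = length in
 * bytes (4096 max per the text), then a single CRASH_CTL write. */
static int report_guest_crash(uint64_t msg_gpa, uint64_t msg_len)
{
    if (msg_len > 4096)
        return -1;
    wrmsr(HV_X64_MSR_CRASH_P3, msg_gpa);
    wrmsr(HV_X64_MSR_CRASH_P4, msg_len);
    wrmsr(HV_X64_MSR_CRASH_CTL,
          HV_CRASH_CTL_NOTIFY | HV_CRASH_CTL_MESSAGE);
    return 0;
}
```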
Virtual Processors
Article • 03/17/2023
A special value HV_ANY_VP can be used in certain situations to specify “any virtual
processor”. A value of HV_VP_INDEX_SELF can be used to specify one’s own VP index.
A partition which possesses the AccessGuestIdleMsr privilege may trigger entry into the
virtual processor idle sleep state through a read of the hypervisor-defined MSR
HV_X64_MSR_GUEST_IDLE . The virtual processor will be woken when an interrupt arrives.
A guest specifies the location of the overlay page (in GPA space) by writing to the VP
Assist Page MSR (0x40000073). The format of the VP Assist Page MSR is as follows:
The hypervisor indicates to the guest OS the number of times a spinlock acquisition
should be attempted before indicating an excessive spin situation to the hypervisor. This
count is returned in CPUID leaf 0x40000004. A value of 0 indicates that the guest OS
should not notify the hypervisor about long spinlock acquisition.
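A guest spinlock might use the retry count like this. notify_long_spin_wait() is a hypothetical stand-in for the actual notification mechanism; only the counting logic follows the text (a threshold of 0 means never notify):

```c
#include <stdint.h>

static int notifications;  /* stub state, for illustration only */

/* Hypothetical stand-in for notifying the hypervisor of a long spin. */
static void notify_long_spin_wait(void)
{
    ++notifications;
}

/* Spin on the lock; after retry_threshold failed attempts (the count
 * returned in CPUID leaf 0x40000004), tell the hypervisor. A
 * threshold of 0 means the hypervisor should not be notified. */
static void spin_acquire(volatile int *lock, uint32_t retry_threshold)
{
    uint32_t spins = 0;
    while (__sync_lock_test_and_set(lock, 1)) {
        if (retry_threshold != 0 && ++spins >= retry_threshold) {
            notify_long_spin_wait();
            spins = 0;
        }
    }
}
```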
Compatibility
The virtual MMU exposed by the hypervisor is generally compatible with the physical
MMU found within an x64 processor. The following guest-observable differences exist:
The CR3.PWT and CR3.PCD bits may not be supported in some hypervisor
implementations. On such implementations, any attempt by the guest to set these
flags through a MOV to CR3 instruction or a task gate switch will be ignored.
Attempts to set these bits programmatically through HvSetVpRegisters or
HvSwitchVirtualAddressSpace may result in an error.
The PWT and PCD bits within a leaf page table entry (for example, a PTE for 4-K
pages and a PDE for large pages) specify the cacheability of the page being
mapped. The PAT, PWT, and PCD bits within non-leaf page table entries indicate
the cacheability of the next page table in the hierarchy. Some hypervisor
implementations may not support these flags. On such implementations, all page
table accesses performed by the hypervisor are done by using write-back cache
attributes. This affects, in particular, accessed and dirty bits written to the page
table entries. If the guest sets the PAT, PWT, or PCD bits within non-leaf page table
entries, an “unsupported feature” message may be generated when a virtual
processor accesses a page that is mapped by that page table.
The CR0.CD (cache disable) bit may not be supported in some hypervisor
implementations. On such implementations, the CR0.CD bit must be set to 0. Any
attempt by the guest to set this flag through a MOV to CR0 instruction will be
ignored. Attempts to set this bit programmatically through HvSetVpRegisters will
result in an error.
The PAT (page attribute table) MSR is a per-VP register. However, when all the virtual
processors in a partition set the PAT MSR to the same value, the effect becomes
partition-wide.
For reasons of security and isolation, the INVD instruction will be virtualized to act
like a WBINVD instruction, with some differences. For security purposes, CLFLUSH
should be used instead.
The INVLPG instruction invalidates the translation for a single page from the
processor’s TLB. If the specified virtual address was originally mapped as a 4-K
page, the translation for this page is removed from the TLB. If the specified virtual
address was originally mapped as a “large page” (either 2 MB or 4 MB, depending
on the MMU mode), the translation for the entire large page is removed from the
TLB. The INVLPG instruction flushes both global and non-global translations.
Global translations are defined as those which have the “global” bit set within the
page table entry.
The MOV to CR3 instruction and task switches that modify CR3 invalidate
translations for all non-global pages within the processor’s TLB.
A MOV to CR4 instruction that modifies the CR4.PGE (global page enable) bit, the
CR4.PSE (page size extensions) bit, or CR4.PAE (page address extensions) bit
invalidates all translations (global and non-global) within the processor’s TLB.
Note that all of these invalidation operations affect only one processor. To invalidate
translations on other processors, software must use a software-based “TLB shoot-down”
mechanism (typically implemented by using inter-processor interrupts).
On some systems (those with sufficient virtualization support in hardware), the legacy
TLB management instructions may be faster for local or remote (cross-processor) TLB
invalidation. Guests who are interested in optimal performance should use the CPUID
leaf 0x40000004 to determine which behaviors to implement using hypercalls:
When a virtual processor accesses a page through its GVA space, the hypervisor honors
the cache attribute bits (PAT, PWT, and PCD) within the guest page table entry used to
map the page. These three bits are used as an index into the partition's PAT (page
attribute table) register to look up the final cacheability setting for the page.
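The lookup can be sketched as follows. The index order (PAT as the high bit, then PCD, then PWT) and the byte-per-entry PAT register layout follow the x64 architecture rather than anything stated above:

```c
#include <stdint.h>

/* Index into the 8-entry PAT register from a leaf PTE's cache bits.
 * Order (PAT high, PCD, PWT low) follows the x64 architecture. */
static unsigned pat_index(int pat, int pcd, int pwt)
{
    return ((unsigned)(pat & 1) << 2) |
           ((unsigned)(pcd & 1) << 1) |
           (unsigned)(pwt & 1);
}

/* Each PAT entry is one byte; the low 3 bits select the memory type. */
static uint8_t memory_type(uint64_t pat_msr, int pat, int pcd, int pwt)
{
    return (uint8_t)((pat_msr >> (8 * pat_index(pat, pcd, pwt))) & 0x7);
}
```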
Pages accessed directly through the GPA space (for example, when paging is disabled
because CR0.PG is cleared) use a cacheability defined by the MTRRs. If the hypervisor
implementation doesn’t support virtual MTRRs, WB cacheability is assumed.
Mixing Cache Types between a Partition and the
Hypervisor
Guests should be aware that some pages within their GPA space may be accessed by the
hypervisor. The following list, while not exhaustive, provides several examples:
The hypervisor always performs accesses to hypercall parameters and overlay pages by
using the WB cacheability setting.
Virtual Interrupt Controller
Article • 05/26/2021
The hypervisor virtualizes interrupt delivery to virtual processors. This is done through
the use of a synthetic interrupt controller (SynIC) which is an extension of a virtualized
local APIC; that is, each virtual processor has a local APIC instance with the SynIC
extensions. These extensions provide a simple inter-partition communication
mechanism which is described in the following chapter.
Interrupts delivered to a
partition fall into two categories: external and internal. External interrupts originate from
other partitions or devices, and internal interrupts originate from within the partition
itself.
Local APIC
The SynIC is a superset of a local APIC. The interface to this APIC is given by a set of 32-
bit memory mapped registers. This local APIC (including the behavior of the memory
mapped registers) is generally compatible with the local APIC on P4/Xeon systems as
described in Intel’s and AMD's documentation.
The hypervisor’s local APIC virtualization may deviate from physical APIC operation in
the following minor ways:
On physical systems, the IA32_APIC_BASE MSR can be different for each processor
in the system. The hypervisor may require that this MSR contains the same value
for all virtual processors within a partition. As such, this MSR may be treated as a
partition-wide value. If a virtual processor modifies this register, the value may
effectively propagate to all virtual processors within the partition.
The IA32_APIC_BASE MSR defines a “global enable” bit for enabling or disabling
the APIC. The virtualized APIC may always be enabled. If so, this bit will always be
set to 1.
The hypervisor’s local APIC may not be able to generate virtual SMIs (system
management interrupts).
If multiple virtual processors within a partition are assigned identical APIC IDs,
behavior of targeted interrupt delivery is boundedly undefined. That is, the
hypervisor is free to deliver the interrupt to just one virtual processor, all virtual
processors with the specified APIC ID, or no virtual processors. This situation is
considered a guest programming error.
Some of the memory mapped APIC registers may be accessed by way of virtual
MSRs.
The hypervisor may not allow a guest to modify its APIC IDs.
The remaining parts of this section describe only those aspects of SynIC functionality
that are extensions of the local APIC.
HV_X64_MSR_EOI
HV_X64_MSR_ICR
HV_X64_MSR_TPR
This MSR is intended to accelerate access to the TPR in 32-bit mode guest partitions.
64-bit mode guest partitions should set the TPR by way of CR8.
EOI Assist
One field in the Virtual Processor Assist Page is the EOI Assist field. The EOI Assist field
resides at offset 0 of the overlay page and is 32 bits in size. The format of the EOI assist
field is as follows:
The guest OS performs an EOI by atomically writing zero to the EOI Assist field of the
virtual VP assist page and checking whether the “No EOI required” field was previously
zero. If it was, the OS must write to the HV_X64_APIC_EOI MSR thereby triggering an
intercept into the hypervisor. The following code is recommended to perform an EOI:
; rcx = address of the EOI Assist field in the VP assist page
btr dword ptr [rcx], 0  ; atomically clear "No EOI required"
jc NoEoiRequired        ; bit was set: no intercept needed
mov ecx, 0x40000070     ; HV_X64_MSR_EOI
xor edx, edx
xor eax, eax
wrmsr                   ; trigger the EOI intercept
NoEoiRequired:
The hypervisor sets the “No EOI required” bit when it injects a virtual interrupt if the
following conditions are satisfied:
If, at a later time, a lower priority interrupt is requested, the hypervisor clears the “No
EOI required” bit such that a subsequent EOI causes an intercept.
In case of nested interrupts, the EOI intercept is avoided only for the highest priority
interrupt. This is necessary since no count is maintained for the number of EOIs
performed by the OS. Therefore only the first EOI can be avoided and since the first EOI
clears the “No EOI Required” bit, the next EOI generates an intercept. However nested
interrupts are rare, so this is not a problem in the common case.
Note that devices and/or the I/O APIC (physical or synthetic) need not be notified of an
EOI for an edge-triggered interrupt – the hypervisor intercepts such EOIs only to update
the virtual APIC state. In some cases, the virtual APIC state can be lazily updated – in
such cases, the “NoEoiRequired” bit is set by the hypervisor indicating to the guest that
an EOI intercept is not necessary. At a later instant, the hypervisor can derive the state of
the local APIC depending on the current value of the “NoEoiRequired” bit.
Enabling and disabling this enlightenment can be done at any time independently of the
interrupt activity and the APIC state at that moment. While the enlightenment is
enabled, conventional EOIs can still be performed irrespective of the “No EOI required”
value but they will not realize the performance benefit of the enlightenment.
Inter-Partition Communication
Article • 05/26/2021
The hypervisor provides two simple mechanisms for one partition to communicate with
another: messages and events. In both cases, notification is signaled by using the SynIC
(synthetic interrupt controller).
SynIC Messages
The hypervisor provides a simple inter-partition communication facility that allows one
partition to send a parameterized message to another partition. (Because the message is
sent asynchronously, it is said to be posted.) The destination partition may be notified of
the arrival of this message through an interrupt. Messages may be sent explicitly using
the HvCallPostMessage hypercall or implicitly by the hypervisor.
Messages
When a message is sent, the hypervisor selects a free message buffer. The set of
available message buffers depends on the event that triggered the sending of the
message.
The hypervisor marks the message buffer “in use” and fills in the message header with
the message type, payload size, and information about the sender. Finally, it fills in the
message payload. The contents of the payload depend on the event that triggered the
message.
The hypervisor then appends the message buffer to a receiving message queue. The
receiving message queue depends on the event that triggered the sending of the
message. For all message types, SINTx is either implicit (in the case of intercept
messages), explicit (in the case of timer messages) or specified by a port ID (in the case
of guest messages). The target virtual processor is either explicitly specified or chosen
by the hypervisor when the message is enqueued. Virtual processors whose SynIC or
SIM page is disabled will not be considered as potential targets. If no targets are
available, the hypervisor terminates the operation and returns an error to the caller.
The hypervisor then determines whether the specified SINTx message slot within the
SIM page for the target virtual processor is empty. If the message type in the message
slot is equal to HvMessageTypeNone (that is, zero), the message slot is assumed to be
empty. In this case, the hypervisor dequeues the message buffer and copies its contents
to the message slot within the SIM page. The hypervisor may copy only the number of
payload bytes associated with the message. The hypervisor also attempts to generate an
edge-triggered interrupt for the specified SINTx. If the APIC is software disabled or the
SINTx is masked, the interrupt is lost. The arrival of this interrupt notifies the guest that a
new message has arrived. If the SIM page is disabled or the message slot within the SIM
page is not empty, the message remains queued, and no interrupt is generated.
As with any fixed-priority interrupt, the interrupt is not acknowledged by the virtual
processor until the PPR (processor priority register) is less than the vector specified in the
SINTx register and interrupts are not masked by the virtual processor (rFLAGS[IF] is set
to 1).
Multiple message buffers with the same SINTx can be queued to a virtual processor. In
this case, the hypervisor will deliver the first message (that is, write it to the SIM page)
and leave the others queued until one of three events occurs:
In all three cases, the hypervisor will scan one or more message buffer queues and
attempt to deliver additional messages. The hypervisor also attempts to generate an
edge-triggered interrupt, indicating that a new message has arrived.
SIM Page
The SIM page consists of a 16-element array of 256-byte messages (see HV_MESSAGE
data structure). Each array element (also known as a message slot) corresponds to a
single synthetic interrupt source (SINTx). A message slot is said to be “empty” if the
message type of the message in the slot is equal to HvMessageTypeNone.
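The layout described above can be sketched as a pair of C structs. The header here is reduced to just the message type; the real HV_MESSAGE header carries additional fields (payload size, flags, sender information) that are omitted for brevity:

```c
#include <stdint.h>

#define HV_MESSAGE_TYPE_NONE 0u  /* HvMessageTypeNone: slot is empty */

/* Simplified 256-byte message; real HV_MESSAGE has a richer header. */
typedef struct {
    uint32_t message_type;
    uint8_t  payload[252];
} hv_message;

/* SIM page: 16 message slots, one per synthetic interrupt source. */
typedef struct {
    hv_message slot[16];
} hv_sim_page;

_Static_assert(sizeof(hv_message) == 256, "message slots are 256 bytes");
_Static_assert(sizeof(hv_sim_page) == 4096, "SIM page is one 4K page");

static int slot_is_empty(const hv_sim_page *p, unsigned sintx)
{
    return p->slot[sintx].message_type == HV_MESSAGE_TYPE_NONE;
}
```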
The address for the SIM page is specified in the SIMP register. The address of the SIM
page should be unique for each virtual processor. Programming these pages to overlap
other instances of the SIEF or SIM pages or any other overlay page (for example, the
hypercall page) will result in undefined behavior.
Read and write accesses by a virtual processor to the SIM page behave like read and
write accesses to RAM. However, the hypervisor’s SynIC implementation also writes to
the pages in response to certain events.
Upon virtual processor creation and reset, the SIM page is cleared to zero.
When notified of a message, the guest is expected to:
1. Examine the message that was deposited into the SIM message slot.
2. Copy the contents of the message to another location and set the message type
within the message slot to HvMessageTypeNone.
3. Indicate the end of interrupt for the vector by writing to the APIC’s EOI register.
4. Perform any actions implied by the message.
Message Sources
The classes of events that can trigger the sending of a message are as follows:
Message Buffers
A message buffer is used internally to the hypervisor to store a message until it is
delivered to the recipient. The hypervisor maintains several sets of message buffers.
Messages successfully posted by a guest have been queued for delivery by the
hypervisor. Actual delivery and reception by the target partition is dependent upon its
correct operation. Partitions may disable delivery of messages to particular virtual
processors by disabling either the SynIC or the SIMP on those processors.
Breaking a connection will not affect undelivered (queued) messages. Deletion of the
target port will always free all of the port’s message buffers, whether they are available
or contain undelivered (queued) messages.
Messages arrive in the order in which they have been successfully posted. If the
receiving port is associated with a specific virtual processor, then messages will arrive in
the same order in which they were posted. If the receiving port is associated with
HV_ANY_VP, then messages are not guaranteed to arrive in any particular order.
As with any fixed-priority external interrupt, the interrupt is not acknowledged by the
virtual processor until the processor priority register (PPR) is less than the vector specified
in the SINTx register and interrupts are not masked by the virtual processor (rFLAGS[IF]
is set to 1).
SIEF Page
The SIEF page consists of a 16-element array of 256-byte event flags (see
HV_SYNIC_EVENT_FLAGS). Each array element corresponds to a single synthetic
interrupt source (SINTx).
The address for the SIEF page is specified in the SIEF register. The address of the SIEF
page should be unique for each virtual processor. Programming these pages to overlap
other instances of the SIEF or SIM pages or any other overlay page (for example, the
hypercall page) will result in undefined behavior.
Read and write accesses by a virtual processor to the SIEF page behave like read and
write accesses to RAM. However, the hypervisor’s SynIC implementation also writes to
the pages in response to certain events.
Upon virtual processor creation and reset, the SIEF page is cleared to zero.
When notified of an event, the guest is expected to:
1. Examine the event flags and determine which ones, if any, are set.
2. Clear one or more event flags by using a locked (atomic) operation such as LOCK
AND or LOCK CMPXCHG.
3. Indicate the end of interrupt for the vector by writing to the APIC’s EOI register.
4. Perform any actions implied by the event flags that were set.
Connections are allocated from the sender’s memory pool. When a connection is
created, it must be associated with a valid port. This binding creates a simple, one-way
communication channel. If a port is subsequently deleted, its associated connection
continues to exist but becomes unusable.
SynIC MSRs
In addition to the memory-mapped registers defined for a local APIC, the following
model-specific registers (MSRs) are defined in the SynIC. Each virtual processor has its
own copy of these registers, so they can be programmed independently.
SCONTROL Register
This register is used to control SynIC behavior of the virtual processor.
At virtual processor creation time and upon processor reset, the value of this SCONTROL
(SynIC control register) is 0x0000000000000000. Thus, message queuing and event flag
notifications will be disabled.
Bits   Field    Attributes   Description
0      Enable   Read/write   When set, this virtual processor will allow message
                             queuing and event flag notifications to be posted to
                             its SynIC. When clear, message queuing and event flag
                             notifications cannot be directed to this virtual
                             processor.
SVERSION Register
This is a read-only register, and it returns the version number of the SynIC. Attempts to
write to this register result in a #GP fault.
SIEFP Register
At virtual processor creation time and upon processor reset, the value of this SIEFP
(synthetic interrupt event flags page) register is 0x0000000000000000. Thus, the SIEFP is
disabled by default. The guest must enable it by setting bit 0. If the specified base
address is beyond the end of the partition’s GPA space, the SIEFP page will not be
accessible to the guest. When modifying the register, guests should preserve the value
of the reserved bits (1 through 11) for future compatibility.
Bits    Field          Attributes   Description
63:12   Base Address   Read/write   Base address (in GPA space) of the SIEF page
                                    (low 12 bits assumed to be zero)
SIMP Register
At virtual processor creation time and upon processor reset, the value of this SIMP
(synthetic interrupt message page) register is 0x0000000000000000. Thus, the SIMP is
disabled by default. The guest must enable it by setting bit 0. If the specified base
address is beyond the end of the partition’s GPA space, the SIMP page will not be
accessible to the guest. When modifying the register, guests should preserve the value
of the reserved bits (1 through 11) for future compatibility.
Bits    Field          Attributes   Description
63:12   Base Address   Read/write   Base address (in GPA space) of the SIM page
                                    (low 12 bits assumed to be zero)
SINTx Registers
At virtual processor creation time, the default value of all SINTx (synthetic interrupt
source) registers is 0x0000000000010000. Thus, all synthetic interrupt sources are
masked by default. The guest must unmask them by programming an appropriate
vector and clearing bit 16.
Setting the polling bit will have the effect of unmasking an interrupt source, except that
an actual interrupt is not generated.
The AutoEOI flag indicates that an implicit EOI should be performed by the hypervisor
when an interrupt is delivered to the virtual processor. In addition, the hypervisor will
automatically clear the corresponding flag in the “in-service register” (ISR) of the virtual
APIC. If the guest enables this behavior, then it must not perform an EOI in its interrupt
service routine.
The AutoEOI flag can be turned on at any time, though the guest must still perform an
explicit EOI for any interrupt that was already in flight. This timing consideration makes it
difficult to know whether a particular interrupt needs an EOI or not, so it is recommended
that once a SINT is unmasked, its settings are not changed. Likewise, the AutoEOI flag can
be turned off at any time, though the same concerns about in-flight interrupts apply.
Valid values for vector are 16-255 inclusive. Specifying an invalid vector number results
in #GP.
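Composing a SINTx value can be sketched as below. The mask bit (16), the reset value 0x10000, and the valid vector range come from the text; placing the vector in bits 7:0 and AutoEOI in bit 17 are assumptions about the layout:

```c
#include <stdint.h>

#define SINT_MASKED  (1ULL << 16)  /* from the text: default is masked */
#define SINT_AUTOEOI (1ULL << 17)  /* assumed bit position */

/* Returns an unmasked SINTx value for the given vector, or the masked
 * default for vectors below 16 (which #GP on the real register). */
static uint64_t sint_value(uint8_t vector, int auto_eoi)
{
    if (vector < 16)
        return SINT_MASKED;
    return (uint64_t)vector | (auto_eoi ? SINT_AUTOEOI : 0);
}
```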
EOM Register
A write to the end of message (EOM) register by the guest causes the hypervisor to scan
the internal message buffer queue(s) associated with the virtual processor. If a message
buffer queue contains a queued message buffer, the hypervisor attempts to deliver the
message. Message delivery succeeds if the SIM page is enabled and the message slot
corresponding to the SINTx is empty (that is, the message type in the header is set to
HvMessageTypeNone). If a message is successfully delivered, its corresponding internal
message buffer is dequeued and marked free. If the corresponding SINTx is not masked,
an edge-triggered interrupt is delivered (that is, the corresponding bit in the IRR is set).
This register can be used by guests to “poll” for messages. It can also be used as a way
to drain the message queue for a SINTx that has been disabled (that is, masked).
If the message queues are all empty, a write to the EOM register is a no-op.
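A sketch of the polling/draining usage described above, assuming a hypothetical message-slot layout in which the header's message type is the first 32-bit field (with HvMessageTypeNone == 0); the EOM MSR write itself is omitted.

```c
#include <stdint.h>

typedef uint32_t UINT32;

enum { HvMessageTypeNone = 0 };

/* Hypothetical SIM-page message slot: only the header's message type
 * matters for the EOM protocol sketched above. */
typedef struct
{
    volatile UINT32 MessageType;
    /* ... payload omitted ... */
} HV_MESSAGE_SLOT;

/* Consume the message in a slot and mark it free so that a subsequent
 * EOM write can deliver the next queued message.  Returns the type of
 * the message that was consumed. */
UINT32 ConsumeMessage(HV_MESSAGE_SLOT *slot)
{
    UINT32 type = slot->MessageType;
    slot->MessageType = HvMessageTypeNone;  /* mark the slot empty */
    /* wrmsr(HV_X64_MSR_EOM, 0);  -- trigger a rescan (omitted) */
    return type;
}
```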
The hypervisor provides simple timing services. These are based on a constant-rate
reference time source (typically the ACPI timer on x64 systems).
Reference Counter
The hypervisor maintains a per-partition reference time counter. It has the characteristic
that successive accesses to it return strictly monotonically increasing (time) values as
seen by any and all virtual processors of a partition. Furthermore, the reference counter
is rate constant and unaffected by processor or bus speed transitions or deep processor
power savings states. A partition’s reference time counter is initialized to zero when the
partition is created. The reference counters of all partitions count at the same rate, but at
any time their absolute values will typically differ, because partitions have different
creation times.
The reference counter continues to count up as long as at least one virtual processor is
not explicitly suspended.
The partition reference time enlightenment uses a virtual TSC value, an offset and a
multiplier to enable a guest partition to compute the normalized reference time since
partition creation, in 100nS units. The mechanism also allows a guest partition to
atomically compute the reference time when the guest partition is migrated to a
platform with a different TSC rate, and provides a fallback mechanism to support
migration to platforms without the constant rate TSC feature.
This facility is not intended to be used as a source of wall-clock time, since the reference
time computed using this facility will appear to stop during the time that a guest
partition is saved until the subsequent restore.
The hypervisor provides a partition-wide virtual reference TSC page which is overlaid on
the partition’s GPA space. A partition’s reference time stamp counter page is accessed
through the Reference TSC MSR.
typedef struct
{
    volatile UINT32 TscSequence;
    UINT32 Reserved1;
    volatile UINT64 TscScale;
    volatile INT64 TscOffset;
    UINT64 Reserved2[509];
} HV_REFERENCE_TSC_PAGE;
At the guest partition creation time, the value of the reference TSC MSR is
0x0000000000000000. Thus, the reference TSC page is disabled by default. The guest
must enable the reference TSC page by setting bit 0. If the specified base address is
beyond the end of the partition’s GPA space, the reference TSC page will not be
accessible to the guest. When modifying the register, guests should preserve the value
of the reserved bits (1 through 11) for future compatibility.
The multiplication is a 64-bit by 64-bit multiplication, which produces a 128-bit
product that is then shifted right by 64 bits to obtain the high 64 bits.
The TscScale value is used to adjust the Virtual TSC value across migration events to
mitigate TSC frequency changes from one platform to another.
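Given the definitions above, the scaled reference-time computation can be sketched in C using the compiler's 128-bit integer extension (GCC/Clang); the expression follows directly from the description of the 64-bit multiplication and 64-bit right shift.

```c
#include <stdint.h>

/* Compute the partition reference time (100 ns units) from a TSC
 * reading, per the scale/offset scheme described above:
 *   ReferenceTime = ((Tsc * TscScale) >> 64) + TscOffset
 * The unsigned __int128 type is a GCC/Clang extension used here to
 * hold the 128-bit product. */
uint64_t ReferenceTimeFromTsc(uint64_t tsc, uint64_t tscScale, int64_t tscOffset)
{
    uint64_t high = (uint64_t)(((unsigned __int128)tsc * tscScale) >> 64);
    return high + (uint64_t)tscOffset;
}
```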
The TscSequence value is used to synchronize access to the enlightened reference time
if the scale and/or the offset fields are changed during save/restore or live migration.
This field serves as a sequence number which is incremented whenever the scale and/or
the offset fields are modified. A special value of 0x0 is used to indicate that this facility is
no longer a reliable source of reference time and the VM must fall back to a different
source.
The recommended code for computing the partition reference time using this
enlightenment is shown below:
C
do
{
    StartSequence = ReferenceTscPage->TscSequence;
    if (StartSequence == 0)
    {
        // The reference TSC page is not valid at the moment, and the
        // Reference Time can only be obtained from the synthetic MSR.
        ReferenceTime = rdmsr(HV_X64_MSR_TIME_REF_COUNT);
        return ReferenceTime;
    }

    Tsc = rdtsc();
    Scale = ReferenceTscPage->TscScale;
    Offset = ReferenceTscPage->TscOffset;
    EndSequence = ReferenceTscPage->TscSequence;
} while (EndSequence != StartSequence);

// The multiplication produces a 128-bit product; keep the high 64 bits.
ReferenceTime = ((Tsc * Scale) >> 64) + Offset;
return ReferenceTime;
Synthetic Timers
Synthetic timers provide a mechanism for generating an interrupt after some specified
time in the future. Both one-shot and periodic timers are supported. A synthetic timer
sends a message to a specified SynIC SINTx (synthetic interrupt source) upon expiration,
or asserts an interrupt, depending on how it is configured.
The hypervisor guarantees that a timer expiration signal will never be delivered before
the expiration time. The signal may arrive any time after the expiration time.
Periodic Timers
The hypervisor attempts to signal periodic timers on a regular basis. However, if the
virtual processor used to signal the expiration is not available, some of the timer
expirations may be delayed. A virtual processor may be unavailable because it is
suspended (for example, during intercept handling) or because the hypervisor’s
scheduler decided that the virtual processor should not be scheduled on a logical
processor (for example, because another virtual processor is using the logical processor
or the virtual processor has exceeded its quota).
If a virtual processor is unavailable for a sufficiently long period of time, a full timer
period may be missed. In this case, the hypervisor uses one of two techniques.
The first technique involves timer period modulation, in effect shortening the period
until the timer “catches up”. If a significant number of timer signals have been missed,
the hypervisor may be unable to compensate by using period modulation. In this case,
some timer expiration signals may be skipped completely.
For timers that are marked as lazy, the hypervisor uses a second technique for dealing
with the situation in which a virtual processor is unavailable for a long period of time. In
this case, the timer signal is deferred until this virtual processor is available. If it doesn’t
become available until shortly before the next timer is due to expire, it is skipped
entirely.
11:4 ApicVector - Controls the asserted interrupt vector in direct mode. Read/Write
3 AutoEnable - Set if writing the corresponding counter implicitly causes the timer to be enabled. Read/Write
If AutoEnable is set, then writing a non-zero value to the corresponding count register
will cause Enable to be set and activate the counter. Otherwise, Enable should be set
after writing the corresponding count register in order to activate the counter. For
information about the Count register, see the following section.
If a one-shot timer is enabled and the specified count is in the past, it will expire immediately.
It is not permitted to set the SINTx field to zero for an enabled timer (that is not in direct
mode). If attempted, the timer will be marked disabled (that is, bit 0 cleared)
immediately.
Writing the configuration register of a timer that is already enabled may result in
undefined behavior. For example, merely changing a timer from one-shot to periodic
may not produce what is intended. Timers should always be disabled prior to changing
any other properties.
63:0 Count - Expiration time for one-shot timers, duration for periodic timers. Read/Write
The value programmed into the Count register is a time value measured in 100
nanosecond units. Writing the value zero to the Count register will stop the counter,
thereby disabling the timer, independent of the setting of AutoEnable in the
configuration register.
Note that the Count register is permitted to wrap. Wrapping will have no effect on the
behavior of the timer, regardless of any timer property.
For one-shot timers, it represents the absolute timer expiration time. The timer expires
when the reference counter for the partition is equal to or greater than the specified
count value.
For periodic timers, the count represents the period of the timer. The first period begins
when the synthetic timer is enabled.
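Putting the configuration bits described above together, a direct-mode timer configuration value might be composed as follows. Only the bit positions quoted in this section (bit 0 Enable, bit 3 AutoEnable, bits 11:4 ApicVector) are used; other fields of the configuration register are omitted from this sketch.

```c
#include <stdint.h>

typedef uint64_t UINT64;

/* Compose a synthetic-timer configuration value from the bit positions
 * given above: bit 0 = Enable, bit 3 = AutoEnable, bits 11:4 =
 * ApicVector (direct mode).  Fields not shown in the excerpt (such as
 * mode and SINTx routing bits) are left out of this sketch. */
UINT64 StimerConfig(uint8_t apicVector, int autoEnable, int enable)
{
    return ((UINT64)apicVector << 4)
         | ((UINT64)(autoEnable != 0) << 3)
         | (UINT64)(enable != 0);
}
```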
Unlike regular synthetic timers, which accumulate time even when the guest has halted (i.e.,
gone idle), the Synthetic Time-Unhalted Timer accumulates time only while the guest is not
halted.
Virtual Secure Mode (VSM) is a set of hypervisor capabilities and enlightenments offered
to host and guest partitions which enables the creation and management of new
security boundaries within operating system software. VSM is the hypervisor facility on
which Windows security features including Device Guard, Credential Guard, virtual TPMs
and shielded VMs are based. These security features were introduced in Windows 10
and Windows Server 2016.
VSM enables operating system software in the root and guest partitions to create
isolated regions of memory for storage and processing of system security assets. Access
to these isolated regions is controlled and granted solely through the hypervisor, which
is a highly privileged, highly trusted part of the system’s Trusted Compute Base (TCB).
Because the hypervisor runs at a higher privilege level than operating system software
and has exclusive control of key system hardware resources such as memory access
permission controls in the CPU MMU and IOMMU early in system initialization, the
hypervisor can protect these isolated regions from unauthorized access, even from
operating system software (e.g., OS kernel and device drivers) with supervisor mode
access (i.e. CPL0, or “Ring 0”).
With this architecture, even if normal system level software running in supervisor mode
(e.g. kernel, drivers, etc.) is compromised by malicious software, the assets in isolated
regions protected by the hypervisor can remain secured.
Virtual Trust Levels are hierarchical, with higher levels being more privileged than lower
levels. VTL0 is the least privileged level, with VTL1 being more privileged than VTL0,
VTL2 being more privileged than VTL1, etc.
#define HV_NUM_VTLS 2
Each VTL has its own set of memory access protections. These access protections are
managed by the hypervisor in a partition’s physical address space, and thus cannot be
modified by system level software running in the partition.
Since more privileged VTLs can enforce their own memory protections, higher VTLs can
effectively protect areas of memory from lower VTLs. In practice, this allows a lower VTL
to protect isolated memory regions by securing them with a higher VTL. For example,
VTL0 could store a secret in VTL1, at which point only VTL1 could access it. Even if VTL0
is compromised, the secret would be safe.
VTL Protections
There are multiple facets to achieving isolation between VTLs:
Memory Access Protections: Each VTL maintains a set of guest physical memory
access protections. Software running at a particular VTL can only access memory in
accordance with these protections.
Virtual Processor State: Virtual processors maintain separate per-VTL state. For
example, each VTL defines a set of private VP registers. Software running at a
lower VTL cannot access a higher VTL’s private virtual processor register state.
Interrupts: Along with a separate processor state, each VTL also has its own
interrupt subsystem (local APIC). This allows higher VTLs to process interrupts
without risking interference from a lower VTL.
Overlay Pages: Certain overlay pages are maintained per-VTL such that higher VTLs
have reliable access. E.g. there is a separate hypercall overlay page per VTL.
63 Dr6Shared Read
46 DenyLowerVtlStartup Read
Dr6Shared indicates to the guest whether Dr6 is a shared register between the VTLs.
MbecVtlMask indicates to the guest the VTLs for which MBEC can be enabled.
DenyLowerVtlStartup indicates to the guest whether a Vtl can deny a VP reset by a lower
VTL.
HvRegisterVsmPartitionStatus
HvRegisterVsmPartitionStatus is a per-partition read-only register that is shared across
all VTLs. This register provides information about which VTLs have been enabled for the
partition, which VTLs have Mode Based Execution Controls enabled, as well as the
maximum VTL allowed.
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 EnabledVtlSet : 16;
        UINT64 MaximumVtl : 4;
        UINT64 MbecEnabledVtlSet : 16;
        UINT64 ReservedZ : 28;
    };
} HV_REGISTER_VSM_PARTITION_STATUS;
HvRegisterVsmVpStatus
HvRegisterVsmVpStatus is a read-only register and is shared across all VTLs. It is a per-
VP register, meaning each virtual processor maintains its own instance. This register
provides information about which VTLs have been enabled, which is active, as well as
the MBEC mode active on a VP.
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 ActiveVtl : 4;
        UINT64 ActiveMbecEnabled : 1;
        UINT64 ReservedZ0 : 11;
        UINT64 EnabledVtlSet : 16;
        UINT64 ReservedZ1 : 32;
    };
} HV_REGISTER_VSM_VP_STATUS;
ActiveVtl is the ID of the VTL context that is currently active on the virtual processor.
EnabledVtlSet is a bitmap of the VTLs that are enabled on the virtual processor.
VTL Enablement
To begin using a VTL, a lower VTL must initiate the following:
1. Enable the target VTL for the partition. This makes the VTL generally available for
the partition.
2. Enable the target VTL on one or more virtual processors. This makes the VTL
available for a VP, and sets its initial context. It is recommended that all VPs have
the same enabled VTLs. Having a VTL enabled on some VPs (but not all) can lead
to unexpected behavior.
3. Once the VTL is enabled for a partition and VP, and the EnableVtlProtection flag
has been set, it can begin applying access protections.
Virtual processors have one “context” per VTL. If a VTL is switched, the VTL's private
state is also switched.
VTL Configuration
Once a VTL has been enabled, its configuration can be changed by a VP running at an
equal or higher VTL.
Partition Configuration
Partition-wide attributes can be configured using the HvRegisterVsmPartitionConfig
register. There is one instance of this register for each VTL (greater than 0) on every
partition.
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 EnableVtlProtection : 1;
        UINT64 DefaultVtlProtectionMask : 4;
        UINT64 ZeroMemoryOnReset : 1;
        UINT64 DenyLowerVtlStartup : 1;
        UINT64 ReservedZ : 2;
        UINT64 InterceptVpStartup : 1;
    };
} HV_REGISTER_VSM_PARTITION_CONFIG;
Once a VTL has been enabled, the EnableVtlProtection flag must be set before it can
begin applying memory protections.
This flag is write-once, meaning that once it has
been set, it cannot be modified.
A higher VTL can set a different default memory protection policy by specifying
DefaultVtlProtectionMask in HV_REGISTER_VSM_PARTITION_CONFIG. This mask must be
set at the time the VTL is enabled. It cannot be changed once it is set, and is only
cleared by a partition reset.
Bit Description
0 Read
1 Write
DenyLowerVtlStartup
InterceptVpStartup
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 MbecEnabled : 1;
        UINT64 TlbLocked : 1;
    };
} HV_REGISTER_VSM_VP_SECURE_VTL_CONFIG;
Each VTL (higher than 0) has an instance of this register for every VTL lower than itself.
For example, VTL2 would have two instances of this register – one for VTL1, and a
second for VTL0.
MbecEnabled
This field configures whether MBEC is enabled for the lower VTL.
TlbLocked
This field locks the lower VTL’s TLB. This capability can be used to prevent lower VTLs
from causing TLB invalidations which might interfere with a higher VTL. When this bit is
set, all address space flush requests from the lower VTL are blocked until the lock is
lifted.
To unlock the TLB, the higher VTL can clear this bit. Also, once a VP returns to a lower
VTL, it releases all TLB locks which it holds at the time.
VTL Entry
A VTL is “entered” when a VP switches from a lower VTL to a higher one. This can
happen for the following reasons:
1. VTL call: this is when software explicitly wishes to invoke code in a higher VTL.
2. Secure interrupt: if an interrupt is received for a higher VTL, the VP will enter the
higher VTL.
3. Secure intercept: certain actions will trigger a secure interrupt (accessing certain
MSRs for example).
Once a VTL is entered, it must voluntarily exit. A higher VTL cannot be preempted by a
lower VTL.
VTL Call
A “VTL call” is when a lower VTL initiates an entry into a higher VTL (for example, to
protect a region of memory with the higher VTL) through the HvCallVtlCall hypercall.
VTL calls preserve the state of shared registers across VTL switches. Private registers are
preserved on a per-VTL level. The exception to these restrictions are the registers
required by the VTL call sequence. The following registers are required for a VTL call:
A VTL call can only switch into the next highest VTL. In other words, if there are multiple
VTLs enabled, a call cannot “skip” a VTL.
The following actions result in a #UD exception:
A VTL call initiated from a processor mode which is anything but the most
privileged on the system (architecture specific).
A VTL call from real mode (x86/x64)
A VTL call on a virtual processor where the target VTL is disabled (that is, has
not been enabled).
A VTL call with an invalid control input value
VTL Exit
A switch to a lower VTL is known as a “return”. Once a VTL has finished processing, it
can initiate a VTL return in order to switch to a lower VTL. The only way a VTL return can
occur is if a higher VTL voluntarily initiates one. A lower VTL can never preempt a higher
one.
VTL Return
A “VTL return” is when a higher VTL initiates a switch into a lower VTL through the
HvCallVtlReturn hypercall. Similar to a VTL call, private processor state is switched out,
and shared state remains in place. If the lower VTL has explicitly called into the higher
VTL, the hypervisor increments the higher VTL’s instruction pointer before the return is
complete so that it may continue after a VTL call.
A VTL Return code sequence requires the use of the following registers:
63:1 RsvdZ
0 Fast return flag (see below)
Fast Return
As a part of processing a return, the hypervisor can restore the lower VTL’s register state
from the HV_VP_VTL_CONTROL structure. For example, after processing a secure
interrupt, a higher VTL may wish to return without disrupting the lower VTL’s state.
Therefore, the hypervisor provides a mechanism to simply restore the lower VTL’s
registers to their pre-call value stored in the VTL control structure.
If this behavior is not necessary, a higher VTL can use a “fast return”. A fast return is
when the hypervisor does not restore register state from the control structure. This
should be utilized whenever possible to avoid unnecessary processing.
This field can be set with bit 0 of the VTL return input. If it is set to 0, the registers are
restored from the HV_VP_VTL_CONTROL structure. If this bit is set to 1, the registers are
not restored (a fast return).
The code sequences to execute VTL calls and returns may be accessed by executing
specific instructions in the hypercall page. The call/return chunks are located at an offset
in the hypercall page determined by the HvRegisterVsmCodePageOffset virtual register.
This is a read-only and partition-wide register, with a separate instance per-VTL.
A VTL can execute a VTL call/return using the CALL instruction. A CALL to the correct
location in the hypercall page will initiate a VTL call/return.
C
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 VtlCallOffset : 12;
        UINT64 VtlReturnOffset : 12;
        UINT64 ReservedZ : 40;
    };
} HV_REGISTER_VSM_CODE_PAGE_OFFSETS;
To summarize, the steps for calling a code sequence using the hypercall page are as
follows:
1. Read the HvRegisterVsmCodePageOffset virtual register to determine the offset of
the call (or return) chunk within the hypercall page.
2. Add the offset to the base address of the hypercall page.
3. Execute a CALL instruction to the resulting address.
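Assuming the call offset occupies bits 11:0 and the return offset bits 23:12 of the HvRegisterVsmCodePageOffset register value (an illustrative assumption about the layout), the offsets could be extracted as follows:

```c
#include <stdint.h>

typedef uint64_t UINT64;

/* Extract the call/return chunk offsets from the
 * HvRegisterVsmCodePageOffset register value.  The layout used here
 * (12-bit call offset in bits 11:0, 12-bit return offset in bits
 * 23:12) is an assumption for illustration. */
UINT64 VtlCallOffset(UINT64 reg)   { return reg & 0xFFFULL; }
UINT64 VtlReturnOffset(UINT64 reg) { return (reg >> 12) & 0xFFFULL; }
```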
Higher VTLs have a high degree of control over the type of memory access permissible
by lower VTLs. There are three basic types of protections that can be specified by a
higher VTL for a particular GPA page: Read, Write, and eXecute. These are defined in the
following table:
Name Description
Read Controls whether read access is allowed to a memory page.
Write Controls whether write access is allowed to a memory page.
Execute Controls whether instruction fetches are allowed for a memory page.
These protections can be applied in the following combinations:
1. No access
2. Read-only, no execute
3. Read-only, execute
4. Read/write, no execute
5. Read/write, execute
If “mode based execution control (MBEC)” is enabled, user and kernel mode execute
protections can be set separately.
Higher VTLs can set the memory protection for a GPA through the
HvCallModifyVtlProtectionMask hypercall.
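A sketch of composing the protection mask that would accompany such a hypercall: the Read (bit 0) and Write (bit 1) positions follow the mask-bit table earlier in this section, while the execute bit position (bit 2) is an assumption for illustration. The hypercall invocation itself is omitted.

```c
#include <stdint.h>

typedef uint32_t UINT32;

/* Compose a VTL protection mask.  Bit 0 = Read and bit 1 = Write
 * follow the mask-bit table above; placing execute at bit 2 is an
 * assumption for illustration. */
UINT32 VtlProtectionMask(int read, int write, int execute)
{
    return (UINT32)((read != 0) | ((write != 0) << 1) | ((execute != 0) << 2));
}
```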
A conformant interface is expected to not overlay any non-RAM type over RAM.
Apart from the traditional three memory protections (read, write, execute), MBEC
introduces a distinction between user-mode and kernel-mode for execute protections.
Thus, if MBEC is enabled, a VTL has the opportunity to set four types of memory
protections:
Name Description
User Mode Execute (UMX) - Controls whether instruction fetches generated in user-mode are allowed for a memory page. NOTE: If MBEC is disabled, this setting is ignored.
Kernel Mode Execute (KMX) - Controls whether instruction fetches generated in kernel-mode are allowed for a memory page. NOTE: If MBEC is disabled, this setting controls both user-mode and kernel-mode execute accesses.
Memory marked with the “User-Mode Execute” protections would only be executable
when the virtual processor is running in user-mode. Likewise, “Kernel-Mode Execute”
memory would only be executable when the virtual processor is running in kernel-mode.
KMX and UMX can be independently set such that execute permissions are enforced
differently between user and kernel mode. All combinations of UMX and KMX are
supported, except for KMX=1, UMX=0. The behavior of this combination is undefined.
MBEC is disabled by default for all VTLs and virtual processors. When MBEC is disabled,
the kernel-mode execute bit determines memory access restriction. Thus, if MBEC is
disabled, KMX=1 code is executable in both kernel and user-mode.
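The rules above (KMX governs both modes when MBEC is disabled; otherwise UMX applies in user mode and KMX in kernel mode) can be captured in a small predicate. This is a restatement of the text, not hypervisor code.

```c
#include <stdbool.h>

/* Effective execute permission for a page under MBEC, per the rules
 * described above: with MBEC disabled, KMX controls both modes; with
 * MBEC enabled, user-mode fetches check UMX and kernel-mode fetches
 * check KMX. */
bool CanExecute(bool mbecEnabled, bool userMode, bool kmx, bool umx)
{
    if (!mbecEnabled)
        return kmx;            /* KMX controls both modes */
    return userMode ? umx : kmx;
}
```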
Descriptor Tables
Any user-mode code that accesses descriptor tables must be in GPA pages marked as
KMX=UMX=1. User-mode software accessing descriptor tables from a GPA page
marked KMX=0 is unsupported and results in a general protection fault.
MBEC configuration
To make use of Mode-based execution control, it must be enabled at two levels:
1. When the VTL is enabled for a partition, MBEC must be enabled using
HvCallEnablePartitionVtl
2. MBEC must be configured on a per-VP and per-VTL basis, using
HvRegisterVsmVpSecureVtlConfig.
MBEC Interaction with Supervisor Mode Execution Prevention
(SMEP)
State which is preserved per VTL (a.k.a. private state) is saved by the hypervisor across
VTL transitions. If a VTL switch is initiated, the hypervisor saves the current private state
for the active VTL, and then switches to the private state of the target VTL. Shared state
remains active regardless of VTL switches.
Private State
In general, each VTL has its own control registers, RIP register, RSP register, and MSRs.
Below is a list of specific registers and MSRs which are private to each VTL.
Private MSRs:
Private registers:
RIP, RSP
RFLAGS
CR0, CR3, CR4
DR7
IDTR, GDTR
CS, DS, ES, FS, GS, SS, TR, LDTR
TSC
DR6 (*dependent on processor type. Read HvRegisterVsmCapabilities virtual
register to determine shared/private status)
Shared State
VTLs share state in order to cut down on the overhead of switching contexts. Sharing
state also allows some necessary communication between VTLs. Most general purpose
and floating point registers are shared, as are most architectural MSRs. Below is the list
of specific MSRs and registers that are shared among all VTLs:
Shared MSRs:
HV_X64_MSR_TSC_FREQUENCY
HV_X64_MSR_VP_INDEX
HV_X64_MSR_VP_RUNTIME
HV_X64_MSR_RESET
HV_X64_MSR_TIME_REF_COUNT
HV_X64_MSR_GUEST_IDLE
HV_X64_MSR_DEBUG_DEVICE_OPTIONS
MTRRs
MCG_CAP
MCG_STATUS
Shared registers:
Real Mode
Real mode is not supported for any VTL greater than 0. VTLs greater than 0 can run in
32-bit or 64-bit mode.
Each VTL has its own interrupt controller, which is only active if the virtual processor is
running in that particular VTL. If a virtual processor switches VTL states, the interrupt
controller active on the processor is also switched.
An interrupt targeted at a VTL which is higher than the active VTL will cause an
immediate VTL switch. The higher VTL can then receive the interrupt. If the higher VTL is
unable to receive the interrupt because of its TPR/CR8 value, the interrupt is held as
“pending” and the VTL does not switch. If there are multiple VTLs with pending
interrupts, the highest VTL takes precedence (without notice to the lower VTL).
When an interrupt is targeted at a lower VTL, the interrupt is not delivered until the next
time the virtual processor transitions into the targeted VTL. INIT and startup IPIs
targeted at a lower VTL are dropped on a virtual processor with a higher VTL enabled.
Since INIT/SIPI is blocked, the HvCallStartVirtualProcessor hypercall should be used to
start processors.
RFLAGS.IF
For the purposes of switching VTLs, RFLAGS.IF does not affect whether a secure
interrupt triggers a VTL switch. If RFLAGS.IF is cleared to mask interrupts, interrupts into
higher VTLs will still cause a VTL switch to a higher VTL. Only the higher VTL’s TPR/CR8
value is taken into account when deciding whether to immediately interrupt.
This behavior also affects pending interrupts upon a VTL return. If the RFLAGS.IF bit is
cleared to mask interrupts in a given VTL, and the VTL returns (to a lower VTL), the
hypervisor will reevaluate any pending interrupts. This will cause an immediate call back
to the higher VTL.
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 Vector : 8;
        UINT64 Enabled : 1;
        UINT64 AutoReset : 1;
        UINT64 AutoEoi : 1;
        UINT64 ReservedP : 53;
    };
} HV_REGISTER_VSM_VINA;
Each VTL on each VP has its own VINA instance, as well as its own version of
HvRegisterVsmVina. The VINA facility will generate an edge triggered interrupt to the
currently active higher VTL when an interrupt for the lower VTL is ready for immediate
delivery.
In order to prevent a flood of interrupts occurring when this facility is enabled, the VINA
facility includes some limited state. When a VINA interrupt is generated, the VINA
facility’s state is changed to “Asserted.” Sending an end-of-interrupt to the SINT
associated with the VINA facility will not clear the “Asserted” state. The asserted state
can only be cleared in one of two ways:
1. The state can manually be cleared by writing to the VinaAsserted field of the
HV_VP_VTL_CONTROL structure.
2. The state is automatically cleared on the next entry to the VTL if the “auto-reset on
VTL entry” option is enabled in the HvRegisterVsmVina register.
This allows code running at a secure VTL to just be notified of the first interrupt that is
received for a lower VTL. If a secure VTL wishes to be notified of additional interrupts, it
can clear the VinaAsserted field of the VP assist page, and it will be notified of the next
new interrupt.
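The re-arm step described above can be sketched against a hypothetical fragment of the HV_VP_VTL_CONTROL structure containing only the VinaAsserted field; the real structure layout is not shown in this excerpt.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical fragment of the per-VTL control structure; only the
 * VinaAsserted field discussed above is modeled here. */
typedef struct
{
    volatile uint8_t VinaAsserted;
} HV_VP_VTL_CONTROL_FRAGMENT;

/* Re-arm VINA notification: clear the asserted state so the next new
 * interrupt for the lower VTL generates another VINA interrupt.
 * Returns whether the facility was asserted before the clear. */
bool VinaRearm(HV_VP_VTL_CONTROL_FRAGMENT *ctl)
{
    bool wasAsserted = ctl->VinaAsserted != 0;
    ctl->VinaAsserted = 0;
    return wasAsserted;
}
```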
Secure Intercepts
The hypervisor allows a higher VTL to install intercepts for events that take place in the
context of a lower VTL. This gives higher VTLs an elevated level of control over lower-
VTL resources. Secure intercepts can be used to protect system-critical resources, and
prevent attacks from lower-VTLs.
A secure intercept is queued to the higher VTL, and that VTL is made runnable on the
VP.
Control register access Attempting to access a set of control registers specified by a higher VTL.
Nested Intercepts
Multiple VTLs can install secure intercepts for the same event in a lower VTL. Thus, a
hierarchy is established to decide where nested intercepts are notified. The following list
gives the order in which intercepts are notified:
1. Lower VTL
2. Higher VTL
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 Cr0Write : 1;
        UINT64 Cr4Write : 1;
        UINT64 XCr0Write : 1;
        UINT64 IA32MiscEnableRead : 1;
        UINT64 IA32MiscEnableWrite : 1;
        UINT64 MsrLstarRead : 1;
        UINT64 MsrLstarWrite : 1;
        UINT64 MsrStarRead : 1;
        UINT64 MsrStarWrite : 1;
        UINT64 MsrCstarRead : 1;
        UINT64 MsrCstarWrite : 1;
        UINT64 ApicBaseMsrRead : 1;
        UINT64 ApicBaseMsrWrite : 1;
        UINT64 MsrEferRead : 1;
        UINT64 MsrEferWrite : 1;
        UINT64 GdtrWrite : 1;
        UINT64 IdtrWrite : 1;
        UINT64 LdtrWrite : 1;
        UINT64 TrWrite : 1;
        UINT64 MsrSysenterCsWrite : 1;
        UINT64 MsrSysenterEipWrite : 1;
        UINT64 MsrSysenterEspWrite : 1;
        UINT64 MsrSfmaskWrite : 1;
        UINT64 MsrTscAuxWrite : 1;
        UINT64 MsrSgxLaunchControlWrite : 1;
    };
} HV_REGISTER_CR_INTERCEPT_CONTROL;
Mask Registers
To allow for finer control, a subset of control registers also have corresponding mask
registers. Mask registers can be used to install intercepts on a subset of the
corresponding control registers. Where a mask register is not defined, any access (as
defined by HvX64RegisterCrInterceptControl) will trigger an intercept.
This capability is only available to guest partitions. It must be enabled per virtual
machine. Nested virtualization is not supported in a Windows root partition.
Term Definition
L2 Root A root Windows operating system, running within the context of a Hyper-V virtual
machine.
L2 Guest A nested virtual machine, running within the context of a Hyper-V virtual machine.
The hypervisor exposes an “enlightened VMCS” feature which can be used to control
virtualization-related processor behavior using a data structure in guest physical
memory. This data structure can be modified using normal memory access instructions,
thus there is no need for the L1 hypervisor to execute VMREAD or VMWRITE or
VMPTRLD instructions.
After the L1 hypervisor performs a VM entry with an enlightened VMCS, the VMCS is
considered active on the processor. An enlightened VMCS can only be active on a single
processor at the same time. The L1 hypervisor can execute a VMCLEAR instruction to
transition an enlightened VMCS from the active to the non-active state. Executing VMREAD
or VMWRITE instructions while an enlightened VMCS is active is unsupported and can
result in unexpected behavior.
Clean Fields
The L0 hypervisor may choose to cache parts of the enlightened VMCS. The enlightened
VMCS clean fields control which parts of the enlightened VMCS are reloaded from guest
memory on a nested VM entry. The L1 hypervisor must clear the corresponding VMCS
clean fields every time it modifies the enlightened VMCS, otherwise the L0 hypervisor
might use a stale version.
The clean fields enlightenment is controlled via the synthetic “CleanFields” field of the
enlightened VMCS. By default, all bits are set such that the L0 hypervisor must reload
the corresponding VMCS fields for each nested VM entry.
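As a sketch of the protocol above, an L1 hypervisor's write helper would pair each enlightened VMCS modification with clearing the matching clean-field bit. The structure fragment, field, and bit name used here are illustrative assumptions, not the actual enlightened VMCS layout.

```c
#include <stdint.h>

/* Hypothetical slice of an enlightened VMCS: one guest field plus the
 * synthetic CleanFields bitmap described above.  Real layouts differ. */
typedef struct
{
    uint64_t GuestRip;
    uint32_t CleanFields;
} ENLIGHTENED_VMCS_FRAGMENT;

#define CLEAN_FIELD_GUEST_BASIC (1u << 2)  /* illustrative bit assignment */

/* Modify a guest field and invalidate its clean-field bit so the L0
 * hypervisor reloads it from memory on the next nested VM entry. */
void SetGuestRip(ENLIGHTENED_VMCS_FRAGMENT *evmcs, uint64_t rip)
{
    evmcs->GuestRip = rip;
    evmcs->CleanFields &= ~CLEAN_FIELD_GUEST_BASIC;
}
```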
Feature Discovery
Support for an enlightened VMCS interface is reported with CPUID leaf 0x40000004.
The enlightened VMCS structure is versioned to account for future changes. Each
enlightened VMCS structure contains a version field, which is reported by the L0
hypervisor.
In cases where architectural feature discovery mechanisms indicate support for a VMCS
field for which no enlightened VMCS field is defined, the L1 hypervisor should not
enable the feature if it chooses to use enlightened VMCS.
The Hyper-V L0 hypervisor will not indicate support for a VMCS field for which no
enlightened VMCS field or exception is defined. If another L0 hypervisor needs a new
enlightened VMCS field or exception to be defined, please contact Microsoft.
The L1 hypervisor may collaborate with the L0 hypervisor to make MSR accesses more
efficient. It can enable enlightened MSR bitmaps by setting the corresponding field in
the enlightened VMCS / VMCB fields to 1. When enabled, the L0 hypervisor does not
monitor the MSR bitmaps for changes. Instead, the L1 hypervisor must invalidate the
corresponding clean field after making changes to one of the MSR bitmaps.
Support for the enlightened MSR bitmap is reported in CPUID leaf 0x4000000A.
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 Vector : 8;
        UINT64 RsvdZ1 : 8;
        UINT64 Enabled : 1;
        UINT64 RsvdZ2 : 15;
        UINT64 TargetVp : 32;
    };
} HV_REENLIGHTENMENT_CONTROL;
The specified vector must correspond to a fixed APIC interrupt. TargetVp specifies the
virtual processor index.
TSC Emulation
A guest partition may be live migrated between two machines with different TSC
frequencies. In those cases, the TscScale value from the reference TSC page may need to
be recomputed.
The L0 hypervisor optionally emulates all TSC accesses after a migration until the L1
hypervisor has had the opportunity to recompute the TscScale value. The L1 hypervisor
can opt into TSC Emulation by writing to the HV_X64_MSR_TSC_EMULATION_CONTROL
MSR. If opted in, the L0 hypervisor emulates TSC accesses after a migration takes place.
The L1 hypervisor can query if TSC accesses are currently being emulated using the
HV_X64_MSR_TSC_EMULATION_STATUS MSR. For example, the L1 hypervisor could
subscribe to Live Migration notifications and query the TSC status after it receives the
migration interrupt. It can also turn off TSC emulation (after it updates the TscScale
value) using this MSR.
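The migration handshake described above can be sketched as follows: after receiving the migration notification, L1 reads the status MSR, and if emulation is in progress it recomputes TscScale and then clears the in-progress state. MSR access is modeled with plain variables here, since rdmsr/wrmsr require ring-0, and the exact bit semantics are assumptions for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Models of the two MSRs; a real L1 hypervisor would use rdmsr/wrmsr on
   HV_X64_MSR_TSC_EMULATION_STATUS and HV_X64_MSR_TSC_EMULATION_CONTROL. */
typedef struct
{
    uint64_t tsc_emulation_status;  /* bit 0: InProgress */
    uint64_t tsc_emulation_control; /* bit 0: opt-in (assumed layout) */
    uint64_t tsc_scale;             /* from the reference TSC page */
} L1_TSC_STATE;

/* Handler the L1 hypervisor might run after a migration notification:
   if TSC accesses are being emulated, install the recomputed TscScale,
   then clear the in-progress state to stop the emulation. */
static void on_migration_notice(L1_TSC_STATE *s, uint64_t new_tsc_scale)
{
    if (s->tsc_emulation_status & 1)        /* InProgress? */
    {
        s->tsc_scale = new_tsc_scale;       /* recomputed for the new host TSC frequency */
        s->tsc_emulation_status &= ~1ull;   /* turn TSC emulation back off */
    }
}
```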
C
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 Enabled : 1;
        UINT64 RsvdZ : 63;
    };
} HV_TSC_EMULATION_CONTROL;
typedef union
{
    UINT64 AsUINT64;
    struct
    {
        UINT64 InProgress : 1;
        UINT64 RsvdZ : 63;
    };
} HV_TSC_EMULATION_STATUS;
Virtual TLB
The virtual TLB exposed by the hypervisor may be extended to cache translations from
L2 GPAs to GPAs. As with the TLB on a logical processor, the virtual TLB is a non-
coherent cache, and this non-coherence is visible to guests. The hypervisor exposes
operations to manage the TLB.
When in use, the virtual TLB tags all cached mappings with an identifier of the nested
context (VMCS or VMCB) that created them. In response to a direct virtual flush
hypercall from an L2 guest, the L0 hypervisor invalidates all cached mappings created by
nested contexts where both of the following hold:
The VmId is the same as the caller’s VmId.
Either the VpId is contained in the specified ProcessorMask, or
HV_FLUSH_ALL_PROCESSORS is specified.
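This invalidation rule can be expressed directly as a predicate over the tag of a cached mapping; the type and flag names below are illustrative, not the TLFS definitions.

```c
#include <assert.h>
#include <stdint.h>

#define HV_FLUSH_ALL_PROCESSORS 1u  /* illustrative flag value */

/* Tag the virtual TLB attaches to each cached mapping. */
typedef struct
{
    uint32_t VmId;
    uint32_t VpId;
} NESTED_CONTEXT_TAG;

/* Does a direct virtual flush issued by caller_vm_id, with the given flags
   and processor mask, invalidate mappings created under this tag? */
static int flush_hits_mapping(NESTED_CONTEXT_TAG tag, uint32_t caller_vm_id,
                              uint32_t flags, uint64_t processor_mask)
{
    if (tag.VmId != caller_vm_id)
        return 0;                       /* different VM: never flushed */
    if (flags & HV_FLUSH_ALL_PROCESSORS)
        return 1;                       /* all VPs of this VM are flushed */
    return (int)((processor_mask >> tag.VpId) & 1);
}
```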
Configuration
Before enabling direct virtual flush, the L1 hypervisor must configure the following
additional fields of the enlightened VMCS / VMCB:
VpId: ID of the virtual processor that the enlightened VMCS / VMCB controls.
VmId: ID of the virtual machine that the enlightened VMCS / VMCB belongs to.
PartitionAssistPage: Guest physical address of the partition assist page.
The L1 hypervisor must also expose the following capabilities to its guests via CPUID.
UseHypercallForLocalFlush
UseHypercallForRemoteFlush
struct
{
    UINT32 TlbLockCount;
} VM_PARTITION_ASSIST_PAGE;
Synthetic VM-Exit
If the TlbLockCount of the caller’s partition assist page is non-zero, the L0 hypervisor
delivers a VM-Exit with a synthetic exit reason to the L1 hypervisor after handling a
direct virtual flush hypercall.
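Putting the partition assist page and this rule together, the L0-side decision after handling a direct virtual flush hypercall might look like the following (types are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Partition assist page, as configured by the L1 hypervisor. */
typedef struct
{
    uint32_t TlbLockCount;
} VM_PARTITION_ASSIST_PAGE_SKETCH;

/* After handling a direct virtual flush hypercall, the L0 hypervisor
   delivers a VM-exit with a synthetic exit reason only when the caller's
   TlbLockCount is non-zero. */
static int deliver_synthetic_exit(const VM_PARTITION_ASSIST_PAGE_SKETCH *pap)
{
    return pap->TlbLockCount != 0;
}
```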
The L1 hypervisor can decide how and where to use second level address spaces. Each
second level address space is identified by a guest defined 64-bit ID value. On Intel
platforms, this value is the same as the EPT pointer. On AMD platforms, the value equals
the nCR3 VMCB field.
Compatibility
The second level address translation capability exposed by the hypervisor is generally
compatible with VMX or SVM support for address translation. However, the following
guest-observable differences exist:
Internally, the hypervisor may use shadow page tables that translate L2 GPAs to
SPAs. In such implementations, these shadow page tables appear to software as
large TLBs. However, several differences may be observable. First, shadow page
tables can be shared between two virtual processors, whereas traditional TLBs are
per-processor structures and are independent. This sharing may be visible because
a page access by one virtual processor can fill a shadow page table entry that is
subsequently used by another virtual processor.
Some hypervisor implementations may use internal write protection of guest page
tables to lazily flush MMU mappings from internal data structures (for example,
shadow page tables). This is architecturally invisible to the guest because writes to
these tables will be handled transparently by the hypervisor. However, writes
performed to the underlying GPA pages by other partitions or by devices may not
trigger the appropriate TLB flush.
On some hypervisor implementations, a second level page fault might not
invalidate cached mappings.
Hypercall Description
On AMD platforms, all TLB entries are architecturally tagged with an ASID (address
space identifier). Invalidating an ASID causes all TLB entries associated with that ASID
to be invalidated. The nested hypervisor can optionally opt into an "enlightened TLB" by
setting EnlightenedNptTlb to "1" in HV_SVM_ENLIGHTENED_VMCB_FIELDS. If the nested
hypervisor opts into the enlightenment, ASID invalidations only flush TLB entries derived
from first level address translation (i.e., the virtual address space). To flush TLB entries
derived from the nested page table (NPT) and force the L0 hypervisor to rebuild shadow
page tables, the HvCallFlushGuestPhysicalAddressSpace or
HvCallFlushGuestPhysicalAddressList hypercalls must be used.
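For a nested hypervisor that has opted in, the resulting flush-selection logic can be sketched as below; actually issuing the hypercall or the ASID invalidation requires hypervisor context, so only the decision is modeled.

```c
#include <assert.h>

typedef enum
{
    FLUSH_BY_ASID,       /* covers only first-level (virtual address) entries */
    FLUSH_BY_HYPERCALL   /* HvCallFlushGuestPhysicalAddressSpace / ...List */
} NPT_FLUSH_METHOD;

/* With EnlightenedNptTlb set, ASID invalidation no longer covers entries
   derived from the nested page tables, so NPT-derived entries must be
   flushed through the dedicated hypercalls instead. */
static NPT_FLUSH_METHOD choose_flush(int enlightened_npt_tlb,
                                     int flushing_npt_entries)
{
    if (enlightened_npt_tlb && flushing_npt_entries)
        return FLUSH_BY_HYPERCALL;
    return FLUSH_BY_ASID;
}
```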
To find the index of the underlying processor, callers should read the
HV_X64_MSR_NESTED_VP_INDEX MSR.
Hyper-V allows you to back up virtual machines from the host operating system,
without the need to run custom backup software inside the virtual machine. Several
approaches are available to developers depending on their needs.
WMI Export
Developers can export the backup data through the Hyper-V WMI interfaces (as used in
the above example). Hyper-V will compile the changes into a virtual hard drive and copy
the file to the requested location. This method is easy to use, works for all scenarios, and
is remotable. However, the generated virtual hard drive often contains a large amount of
data to transfer over the network.
Win32 APIs
Developers can use the SetVirtualDiskInformation, GetVirtualDiskInformation, and
QueryChangesVirtualDisk APIs from the Virtual Hard Disk Win32 API set, as documented
in the Virtual Hard Disk API reference.
Note that to use these APIs, Hyper-V WMI still needs to be used to create
reference points on associated virtual machines. These Win32 APIs then allow for
efficient access to the data of the backed up virtual machine. The Win32 APIs do have
several limitations: