RVA, part 2
In the previous post I wrote about the RVA and how it helps to get the IL code of a method. In the example there, I loaded the assembly into memory to get a method's code. This time I'll explain how to get the code without loading the assembly...
Theory
All assemblies are valid PE (Portable Executable) files, and the IL code of their methods is stored in the .text section. Unfortunately, the layout of an assembly on disk differs from its layout in memory. But if we can translate the RVA to a physical offset in the file, we can read the code directly from there.
To do this, we should get familiar with the PE format, which is described in sections 23 (Metadata Physical Layout) and 24 (File Format Extensions to PE) of Partition II Metadata.doc.
As the document says, an assembly has 4 important parts:
- PE Headers,
- CLI Header,
- CLI Data: metadata, IL method bodies, fix-ups,
- Native Image Sections.
An assembly always starts with a 128-byte MS-DOS header. (The first two bytes are always 4D 5A, the ASCII codes of the letters "MZ". These are the initials of Mark Zbikowski, one of the architects of MS-DOS.)
This header contains the so-called lfanew value, a 4-byte little-endian unsigned integer at offset 0x3c. It tells where the PE signature starts. The signature is "PE\0\0", and it is immediately followed by the PE File Header, which is 20 bytes long.
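Both checks can be sketched on a file that has already been read into a byte array (for example with File.ReadAllBytes). This is a minimal sketch, not the sample's own code: PeSignature and Find are hypothetical names, and BinaryPrimitives assumes .NET Core 2.1 or later. Unlike BitConverter, it decodes little-endian values regardless of the machine's byte order.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of any library.
public static class PeSignature
{
    // Returns the lfanew value, i.e. the file offset of the "PE\0\0"
    // signature, or throws if the image does not look like a PE file.
    public static uint Find(byte[] image)
    {
        // The first two bytes must be 4D 5A ("MZ").
        if (image.Length < 0x40 || image[0] != 0x4D || image[1] != 0x5A)
            throw new FormatException("Missing MZ signature.");

        // lfanew: 4 byte little-endian unsigned integer at offset 0x3c.
        uint lfanew = BinaryPrimitives.ReadUInt32LittleEndian(image.AsSpan(0x3c));

        // The signature must be 'P', 'E', 0, 0.
        if (lfanew > image.Length - 4
            || image[lfanew] != (byte)'P' || image[lfanew + 1] != (byte)'E'
            || image[lfanew + 2] != 0 || image[lfanew + 3] != 0)
            throw new FormatException("Missing PE signature.");

        return lfanew;
    }
}
```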
The next part is the PE Optional Header, which is 224 bytes long in a 32-bit (PE32) image. We can skip both headers, since the values we are after are in the section headers.
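Instead of relying on the fixed 224 bytes, the skip can use two fields of the PE File Header: NumberOfSections (offset 2) and SizeOfOptionalHeader (offset 16), both 16-bit little-endian values. The number of sections also gives the search for .text a natural end. This is a minimal sketch with hypothetical names (SectionTable, Locate), assuming .NET Core 2.1 or later:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of any library.
public static class SectionTable
{
    // Returns where the section headers start and how many there are.
    // lfanew is the file offset of the "PE\0\0" signature.
    public static (long Offset, ushort Count) Locate(byte[] image, uint lfanew)
    {
        // The 20 byte PE File Header follows the 4 byte signature.
        int fileHeader = checked((int)lfanew + 4);

        // NumberOfSections (offset 2) and SizeOfOptionalHeader (offset 16)
        // are 16 bit little-endian values in the PE File Header.
        ushort numberOfSections = BinaryPrimitives.ReadUInt16LittleEndian(image.AsSpan(fileHeader + 2));
        ushort sizeOfOptionalHeader = BinaryPrimitives.ReadUInt16LittleEndian(image.AsSpan(fileHeader + 16));

        // Signature + PE File Header + PE Optional Header.
        long offset = fileHeader + 20L + sizeOfOptionalHeader;
        return (offset, numberOfSections);
    }
}
```

For a 32-bit image with lfanew = 0x80, this gives the same start as the sample's lfanew + 24 + 224.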
The section headers come after these headers. Each section header is 40 bytes long, and the header of the .text section is among them. It starts with the section's name, the ".text\0\0\0" string (0x2E 0x74 0x65 0x78 0x74 0x00 0x00 0x00). This header is the key to finding the IL code, because it contains two important values. At offset 12 is the Virtual Address of the section: an unsigned little-endian integer giving the address, relative to the image base, at which the section is loaded into memory. At offset 20 is the Pointer To Raw Data, also an unsigned little-endian integer, which tells where the section starts in the file on disk.
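Reading the two values from a section header can be sketched as follows. SectionHeader, Parse and FindText are hypothetical names. The scan stops after a given number of headers rather than running until a match is found. The sketch assumes .NET Core 2.1 or later for BinaryPrimitives.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical type holding the section header fields needed for the RVA translation.
public sealed class SectionHeader
{
    public const int Size = 40;

    public string Name { get; }
    public uint VirtualAddress { get; }
    public uint PointerToRawData { get; }

    private SectionHeader(string name, uint virtualAddress, uint pointerToRawData)
    {
        Name = name;
        VirtualAddress = virtualAddress;
        PointerToRawData = pointerToRawData;
    }

    // Parses the 40 byte section header that starts at 'offset'.
    public static SectionHeader Parse(byte[] image, int offset)
    {
        ReadOnlySpan<byte> header = image.AsSpan(offset, Size);
        // Bytes 0-7: the name, padded with zero bytes (".text\0\0\0").
        string name = Encoding.UTF8.GetString(image, offset, 8).TrimEnd('\0');
        // Offset 12: Virtual Address, offset 20: Pointer To Raw Data.
        uint virtualAddress = BinaryPrimitives.ReadUInt32LittleEndian(header.Slice(12));
        uint pointerToRawData = BinaryPrimitives.ReadUInt32LittleEndian(header.Slice(20));
        return new SectionHeader(name, virtualAddress, pointerToRawData);
    }

    // Scans 'count' consecutive section headers for the ".text" section.
    public static SectionHeader FindText(byte[] image, int tableOffset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            SectionHeader section = Parse(image, tableOffset + i * Size);
            if (section.Name == ".text")
                return section;
        }
        throw new InvalidOperationException("No .text section found.");
    }
}
```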
Thus, to find a method's IL code in the file, we use the following equation:
address = RVA - Virtual Address + Pointer To Raw Data
Once we have this address, we just have to move to it in the file and read the IL code just like in the previous example.
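The equation translates directly into code. This is a minimal sketch with the hypothetical name RvaTranslator. It only rejects an RVA below the section start and does not check the RVA against the section's size.

```csharp
using System;

// Hypothetical helper implementing the equation above.
public static class RvaTranslator
{
    // address = RVA - Virtual Address + Pointer To Raw Data
    public static long ToFileOffset(uint rva, uint virtualAddress, uint pointerToRawData)
    {
        // The translation only makes sense for an RVA inside the section;
        // this sketch checks the lower bound only.
        if (rva < virtualAddress)
            throw new ArgumentOutOfRangeException(nameof(rva), "The RVA lies before the start of the section.");

        // Work with 64 bit values so nothing wraps around.
        return (long)rva - virtualAddress + pointerToRawData;
    }
}
```

Consider illustrative values of 0x2000 for the Virtual Address and 0x200 for the Pointer To Raw Data. A method whose RVA is 0x2050 would then start at file offset 0x2050 - 0x2000 + 0x200 = 0x250.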
Demonstration
I created a small sample that demonstrates all this.
It contains a method that searches for the .text section in the file and stores its Pointer To Raw Data and Virtual Address values in properties (to avoid using ref arguments, and I prefer properties anyway :-)). This is the ReadHeader() method below, which should be called only once to initialize the properties.
The ReadILCode() method reads the IL code of a method that has a Tiny format header.
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;
namespace Blog3
{
    //Note: the COM interop declarations of the Unmanaged Metadata API used below
    //(IMetaDataDispenserEx, MetaDataDispenserEx, IMetaDataImport) are not part of this listing.
    public class Program
    {
        public readonly static Guid IID_IMetaDataImport = new Guid("7DAC8207-D3AE-4c75-9B67-92801A497D44");
        private static uint virtualAddress = 0;
        private static uint VirtualAddress
        {
            get
            {
                return virtualAddress;
            }
            set
            {
                virtualAddress = value;
            }
        }
        private static uint pointerToRawData = 0;
        private static uint PointerToRawData
        {
            get
            {
                return pointerToRawData;
            }
            set
            {
                pointerToRawData = value;
            }
        }
        static void Main(string[] args)
        {
            Console.Write("Please enter the full path of the assembly: ");
            //Read the path of the assembly from the console.
            string assemblyPath = Console.ReadLine();
            Console.Write("Fully qualified name of the class : ");
            //Read the name of the class from the console.
            string className = Console.ReadLine();
            Console.Write("Name of the method: ");
            //Read the name of the method from the console.
            string methodName = Console.ReadLine();
            //Open the assembly with the Unmanaged Metadata API.
            IMetaDataDispenserEx dispenser = new MetaDataDispenserEx();
            IMetaDataImport import = null;
            object rawScope = null;
            Guid metaDataImportGuid = IID_IMetaDataImport;
            dispenser.OpenScope(assemblyPath, 0, ref metaDataImportGuid, out rawScope);
            import = (IMetaDataImport)rawScope;
            //Search for the desired class.
            uint typeDefToken = 0;
            import.FindTypeDefByName(className, 0, out typeDefToken);
            //Search for the desired method.
            uint methodDefToken = 0;
            import.FindMethod(typeDefToken, methodName, null, 0, out methodDefToken);
            char[] methodDefName = new char[1024];
            uint methodDefCount = 0;
            uint attributes = 0;
            IntPtr signature;
            uint signatureCount = 0;
            uint rva = 0;
            uint implementationFlags = 0;
            //Get the properties of the method (including its RVA).
            import.GetMethodProps(methodDefToken, out typeDefToken,
                methodDefName, Convert.ToUInt32(methodDefName.Length),
                out methodDefCount, out attributes, out signature,
                out signatureCount, out rva, out implementationFlags);
            //Open the file read-only; the using blocks close it even if an exception occurs.
            using (FileStream fileStream = new FileStream(assemblyPath, FileMode.Open, FileAccess.Read))
            using (BinaryReader binaryReader = new BinaryReader(fileStream))
            {
                //Read the header of the file.
                ReadHeader(binaryReader);
                //Read the method's IL code.
                ReadILCode(binaryReader, rva);
            }
            Console.ReadLine();
        }
        private static void ReadHeader(BinaryReader binaryReader)
        {
            //Move to the beginning of the assembly.
            binaryReader.BaseStream.Position = 0;
            //Read the MS-DOS header.
            byte[] dosHeader = binaryReader.ReadBytes(128);
            //Read the lfanew value.
            uint lfanew = BitConverter.ToUInt32(dosHeader, 0x3c);
            //The PE File Header follows the 4 byte "PE\0\0" signature. Read its
            //NumberOfSections (offset 2) and SizeOfOptionalHeader (offset 16) fields.
            binaryReader.BaseStream.Seek(lfanew + 4 + 2, SeekOrigin.Begin);
            ushort numberOfSections = binaryReader.ReadUInt16();
            binaryReader.BaseStream.Seek(lfanew + 4 + 16, SeekOrigin.Begin);
            ushort sizeOfOptionalHeader = binaryReader.ReadUInt16();
            //Move to the section headers which start at the following address:
            //lfanew + PE Signature (4 bytes) + PE File Header (20 bytes) + PE Optional Header (224 bytes for PE32).
            binaryReader.BaseStream.Seek(lfanew + 24 + sizeOfOptionalHeader, SeekOrigin.Begin);
            bool textSectionFound = false;
            //Check the section headers until we find the .text.
            for (int i = 0; i < numberOfSections && !textSectionFound; i++)
            {
                //Read the first 8 bytes from the section header which is the name.
                byte[] sectionNameBytes = binaryReader.ReadBytes(8);
                string sectionName = Encoding.UTF8.GetString(sectionNameBytes);
                textSectionFound = (sectionName == ".text\0\0\0");
                //When we have found the .text section then store the
                //Virtual Address (offset 12) and Pointer To Raw Data (offset 20) values.
                if (textSectionFound)
                {
                    binaryReader.ReadBytes(4); //Skip VirtualSize (offset 8).
                    VirtualAddress = binaryReader.ReadUInt32();
                    binaryReader.ReadBytes(4); //Skip SizeOfRawData (offset 16).
                    PointerToRawData = binaryReader.ReadUInt32();
                }
                else
                {
                    //Otherwise skip the rest of the section header (32 bytes) and move to the next one.
                    binaryReader.ReadBytes(32);
                }
            }
            if (!textSectionFound)
            {
                throw new InvalidOperationException("The .text section was not found.");
            }
        }
        private static void ReadILCode(BinaryReader binaryReader, uint rva)
        {
            //Move to the beginning of the method body:
            //address = RVA - Virtual Address + Pointer To Raw Data.
            binaryReader.BaseStream.Position = rva - VirtualAddress + PointerToRawData;
            //Read the method header.
            byte methodHeader = binaryReader.ReadByte();
            //If the header is a Tiny format (lowest two bits are 0x2) then read the IL code.
            if ((methodHeader & 0x3) == 0x2)
            {
                //The upper six bits contain the size of the IL code in bytes.
                int methodLength = methodHeader >> 2;
                byte[] methodCode = binaryReader.ReadBytes(methodLength);
                //Write the method's IL code to the console.
                foreach (byte codeByte in methodCode)
                {
                    Console.Write("{0:X2} ", codeByte);
                }
            }
            else
            {
                Console.WriteLine("Only methods with a Tiny format header are supported.");
            }
        }
    }
}
When I run this program, the output looks like this:
C:\Projects\Blog3\bin\Debug>Blog3.exe
Please enter the full path of the assembly: C:\Projects\Blog2\Test
Assembly\bin\Debug\TestAssembly.dll
Fully qualified name of the class : TestAssembly.Class1
Name of the method: Test
00 72 15 00 00 70 28 15 00 00 0A 00 2A
Verification
The output is the same as in the previous example, so it seems to be correct. :-)
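The Tiny header check in ReadILCode() can also be tried without a file. The printed IL code is 13 bytes long and was read through the Tiny path, so its header byte must have been (13 << 2) | 0x2 = 0x36. A minimal sketch with the hypothetical name TinyMethodBody, which decodes such a method body from a byte array:

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class TinyMethodBody
{
    // Returns the IL code of a method body with a Tiny format header,
    // or null when the header uses another (e.g. Fat) format.
    public static byte[] ReadIL(byte[] body)
    {
        byte header = body[0];
        // Tiny format: the two lowest bits are 0x2 (CorILMethod_TinyFormat).
        if ((header & 0x3) != 0x2)
            return null;
        // The remaining six bits hold the code size, so at most 63 bytes.
        int length = header >> 2;
        byte[] il = new byte[length];
        Array.Copy(body, 1, il, 0, length);
        return il;
    }
}
```

If you pass it 0x36 followed by the 13 bytes printed above, it returns those same 13 bytes. A first byte whose lowest two bits are 0x3 (a Fat header) yields null.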