Monday, February 28, 2005

RVA, part 2

In the previous post I wrote about the RVA and how it helps to get the IL code of a method. In that example I loaded the assembly into memory to get a method's code. This time I'll explain how to get the code without loading the assembly...

Theory


All assemblies are valid PE (Portable Executable) files, and the methods' IL code is stored in the file's .text section. Unfortunately, the layout of the assembly on disk differs from its layout in memory. But if we could translate the RVA to a physical offset in the file, we could read the code directly from there.

To do this we should get familiar with the PE format, which is described in sections 23 (Metadata Physical Layout) and 24 (File Format Extensions to PE) of the Partition II Metadata.doc specification.

As the document says, an assembly has 4 important parts:
- PE Headers,
- CLI Header,
- CLI Data: metadata, IL method bodies, fix-ups,
- Native Image Sections.

An assembly always starts with a 128-byte MS-DOS header. (The first two bytes are always 4D 5A, the letters "MZ" in ASCII. These are the initials of Mark Zbikowski, an architect of MS-DOS.)
This header contains the so-called lfanew value, a 4-byte little-endian unsigned integer at offset 0x3c. It gives the file offset at which the PE signature starts. The signature must be "PE\0\0" and is immediately followed by the PE File Header, which is 20 bytes long.
Next comes the PE Optional Header, which is 224 bytes long in a 32-bit (PE32) image. We can skip both headers; they don't contain any value that is important for us.
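
The header walk described above can be sketched as follows. This is only an illustration, not the code used later in this post: the class and method names (PeHeaderSketch, FindSectionTable) are made up, and instead of hard-coding 224 bytes it reads the optional header's size from the SizeOfOptionalHeader field of the PE File Header (offset 16), with the section count taken from NumberOfSections (offset 2).

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PeHeaderSketch
{
    // Reads a little-endian 16-bit value regardless of the machine's byte order.
    private static int ReadUInt16LE(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8);
    }

    // Reads a little-endian 32-bit value regardless of the machine's byte order.
    private static uint ReadUInt32LE(byte[] data, int offset)
    {
        return (uint)data[offset]
            | ((uint)data[offset + 1] << 8)
            | ((uint)data[offset + 2] << 16)
            | ((uint)data[offset + 3] << 24);
    }

    // Returns the file offset of the first section header and
    // reports the number of section headers through sectionCount.
    public static int FindSectionTable(byte[] image, out int sectionCount)
    {
        // The MS-DOS header must start with "MZ" (4D 5A).
        if (image.Length < 0x40 || image[0] != 0x4D || image[1] != 0x5A)
        {
            throw new BadImageFormatException("Missing MZ signature.");
        }

        // lfanew is stored at offset 0x3c and points to the PE signature.
        int lfanew = (int)ReadUInt32LE(image, 0x3C);
        if (lfanew < 0 || lfanew + 24 > image.Length)
        {
            throw new BadImageFormatException("Invalid lfanew value.");
        }

        // The signature must be "PE\0\0".
        if (image[lfanew] != 0x50 || image[lfanew + 1] != 0x45 ||
            image[lfanew + 2] != 0 || image[lfanew + 3] != 0)
        {
            throw new BadImageFormatException("Missing PE signature.");
        }

        // The 20-byte PE File Header follows the 4-byte signature.
        int fileHeader = lfanew + 4;
        sectionCount = ReadUInt16LE(image, fileHeader + 2);
        int optionalHeaderSize = ReadUInt16LE(image, fileHeader + 16);

        // The section headers start right after the optional header.
        return fileHeader + 20 + optionalHeaderSize;
    }
}
```

With lfanew = 0x80 and a 224-byte optional header, the section headers would start at 0x80 + 4 + 20 + 224 = 0x178.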

The section headers come after these headers. Each section header is 40 bytes long, and the .text section's header is among them. It always starts with the section's name, the string ".text\0\0\0" (0x2E 0x74 0x65 0x78 0x74 0x00 0x00 0x00). This header is the most important part for getting the IL code, because it contains two very important values. At offset 12 is the Virtual Address of the section, an unsigned little-endian integer which tells at which address (relative to the image base) the section will be loaded in memory. The other important value is the Pointer To Raw Data, also an unsigned little-endian integer, at offset 20. It tells where the section starts in the file on disk.
Thus, to find a method's IL code, we should use the following equation:
address = RVA - Virtual Address + Pointer To Raw Data

Once we have this address we just have to move to it in the file and read the IL code just like in the previous example.
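
Here is a minimal sketch of that translation. The names SectionHeader, RvaSketch, ParseSectionHeader and RvaToFileOffset are hypothetical and only serve the illustration; the field offsets (12 and 20) are the ones described above.

```csharp
using System;
using System.Text;

// Hypothetical container for the two values we need from a section header.
public struct SectionHeader
{
    public string Name;
    public uint VirtualAddress;
    public uint PointerToRawData;
}

// Hypothetical helper, for illustration only.
public static class RvaSketch
{
    // Reads a little-endian 32-bit value regardless of the machine's byte order.
    private static uint ReadUInt32LE(byte[] data, int offset)
    {
        return (uint)data[offset]
            | ((uint)data[offset + 1] << 8)
            | ((uint)data[offset + 2] << 16)
            | ((uint)data[offset + 3] << 24);
    }

    // Parses one 40-byte section header that starts at the given offset.
    public static SectionHeader ParseSectionHeader(byte[] data, int offset)
    {
        SectionHeader header = new SectionHeader();
        // The name occupies the first 8 bytes and is padded with zero bytes.
        header.Name = Encoding.ASCII.GetString(data, offset, 8).TrimEnd('\0');
        header.VirtualAddress = ReadUInt32LE(data, offset + 12);
        header.PointerToRawData = ReadUInt32LE(data, offset + 20);
        return header;
    }

    // address = RVA - Virtual Address + Pointer To Raw Data
    public static uint RvaToFileOffset(uint rva, SectionHeader section)
    {
        return rva - section.VirtualAddress + section.PointerToRawData;
    }
}
```

As a worked example, take the RVA 0x2157 that ildasm reports for the Test method in part 1, and assume (these two values are only illustrative, I did not read them from the file) a Virtual Address of 0x2000 and a Pointer To Raw Data of 0x200. The file offset is then 0x2157 - 0x2000 + 0x200 = 0x357.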

Demonstration


I created a small sample which demonstrates all this.
It contains a method that searches for the .text section in the file and stores its Pointer To Raw Data and Virtual Address values in properties (to avoid using ref arguments, and I prefer properties anyway :-)). This is the ReadHeader() method below, which should be called only once to initialize the properties.
The ReadILCode() method reads the IL code of a method which has a Tiny format header.


using System;
using System.Collections.Generic;
using System.Text;

using System.IO;
using System.Runtime.InteropServices;

namespace Blog3
{
    public class Program
    {
        public readonly static Guid IID_IMetaDataImport = new Guid("7DAC8207-D3AE-4c75-9B67-92801A497D44");

        private static uint virtualAddress = 0;
        private static uint VirtualAddress
        {
            get
            {
                return virtualAddress;
            }

            set
            {
                virtualAddress = value;
            }
        }

        private static uint pointerToRawData = 0;
        private static uint PointerToRawData
        {
            get
            {
                return pointerToRawData;
            }

            set
            {
                pointerToRawData = value;
            }
        }

        
        static void Main(string[] args)
        {
            Console.Write("Please enter the full path of the assembly: ");
            //Read the path of the assembly from the console.
            string assemblyPath = Console.ReadLine();

            Console.Write("Fully qualified name of the class : ");
            //Read the name of the class from the console.
            string className = Console.ReadLine();

            Console.Write("Name of the method: ");
            //Read the name of the method from the console.
            string methodName = Console.ReadLine();

            //Open the assembly with Unmanaged Metadata API.
            //IMetaDataDispenserEx, MetaDataDispenserEx and IMetaDataImport are
            //COM interop declarations which are not included in this listing.
            IMetaDataDispenserEx dispenser = new MetaDataDispenserEx();
            IMetaDataImport import = null;
            object rawScope = null;
            Guid metaDataImportGuid = IID_IMetaDataImport;

            dispenser.OpenScope(assemblyPath, 0, ref metaDataImportGuid, out rawScope);
            import = (IMetaDataImport)rawScope;

            //Search for the desired class.
            uint typeDefToken = 0;
            import.FindTypeDefByName(className, 0, out typeDefToken);

            //Search for the desired method.
            uint methodDefToken = 0;
            import.FindMethod(typeDefToken, methodName, null, 0, out methodDefToken);

            char[] methodDefName = new char[1024];
            uint methodDefCount = 0;
            uint attributes = 0;
            IntPtr signature;
            uint signatureCount = 0;
            uint rva = 0;
            uint implementationFlags = 0;

            //Get the properties of the method (including its RVA).
            import.GetMethodProps(methodDefToken, out typeDefToken,
                methodDefName, Convert.ToUInt32(methodDefName.Length),
                out methodDefCount, out attributes, out signature,
                out signatureCount, out rva, out implementationFlags);

            //Open the file for reading only; we never write to the assembly.
            FileStream fileStream = new FileStream(assemblyPath, FileMode.Open, FileAccess.Read);
            BinaryReader binaryReader = new BinaryReader(fileStream);

            //Read the header of the file.
            ReadHeader(binaryReader);
            //Read the method's IL code.
            ReadILCode(binaryReader, rva);

            binaryReader.Close();

            Console.ReadLine();
        }

        
        private static void ReadHeader(BinaryReader binaryReader)
        {
            //Move to the beginning of the assembly.
            binaryReader.BaseStream.Position = 0;
            //Read the MS-DOS header.
            byte[] dosHeader = binaryReader.ReadBytes(128);

            //Read the lfanew value.
            uint lfanew = BitConverter.ToUInt32(dosHeader, 0x3c);

            //Move to the section headers which start at the following address:
            //lfanew + PE signature and PE File Header (4 + 20 = 24 bytes) + PE Optional Header (224 bytes).
            binaryReader.BaseStream.Seek(lfanew + 24 + 224, SeekOrigin.Begin);
            bool textSectionFound = false;

            //Check the section headers until we find the .text one (or reach the end of the file).
            do
            {
                //Read the first 8 bytes from the section header which is the name.
                byte[] sectionNameBytes = binaryReader.ReadBytes(8);
                string sectionName = Encoding.UTF8.GetString(sectionNameBytes);
                textSectionFound = (sectionName == ".text\0\0\0");

                //When we have found the .text section then store the
                //Virtual Address and the Pointer to Raw Data values.
                if (textSectionFound)
                {
                    //Skip the Virtual Size (offset 8); the Virtual Address is at offset 12.
                    binaryReader.ReadBytes(4);
                    VirtualAddress = binaryReader.ReadUInt32();
                    //Skip the Size of Raw Data (offset 16); the Pointer to Raw Data is at offset 20.
                    binaryReader.ReadBytes(4);
                    PointerToRawData = binaryReader.ReadUInt32();
                }
                else
                {
                    //Otherwise skip the rest of the section header and move to the next one.
                    binaryReader.ReadBytes(32);
                }
            }
            while (!textSectionFound && binaryReader.BaseStream.Position < binaryReader.BaseStream.Length);
        }

        private static void ReadILCode(BinaryReader binaryReader, uint rva)
        {
            //Move to the beginning of the method body:
            //address = RVA - Virtual Address + Pointer To Raw Data.
            binaryReader.BaseStream.Position = rva - VirtualAddress + PointerToRawData;
            //Read the method header.
            byte methodHeader = binaryReader.ReadByte();

            //If the header is a Tiny format then read the IL code.
            if ((methodHeader & 0x3) == 0x2)
            {
                //The method's length is stored in the 6 left-most bits.
                int methodLength = methodHeader >> 2;

                //Read the method's IL code until the end and write it to the console.
                byte[] methodCode = binaryReader.ReadBytes(methodLength);
                foreach (byte codeByte in methodCode)
                {
                    Console.Write("{0} ", codeByte.ToString("X").PadLeft(2, '0'));
                }
            }
        }
    }
}



Running this program, the output for me looks like this:
C:\Projects\Blog3\bin\Debug>Blog3.exe
Please enter the full path of the assembly: C:\Projects\Blog2\TestAssembly\bin\Debug\TestAssembly.dll
Fully qualified name of the class : TestAssembly.Class1
Name of the method: Test
00 72 15 00 00 70 28 15 00 00 0A 00 2A

Verification


The output is the same as in the previous example, so it seems to be correct. :-)

Saturday, February 05, 2005

RVA, part 1

When I started to use the Unmanaged Metadata API, I had to search a lot for a method that could give me the IL code, just like MethodBody.GetILAsByteArray(), which is new in the .NET Framework 2.0. Well, it was a little more difficult than I expected, and I needed to do a lot of research to find what I needed...

Theory


When the IMetaDataImport.GetMethodProps() method is called, it returns an unsigned integer which is the RVA. RVA stands for Relative Virtual Address. This value shows where the method will be placed in memory when the assembly is loaded. It is relative, which means that the RVA has to be added to the assembly's base address to get the real address of the method's body.
Once we have this value, we can start to read the method's body, which always starts with a header (Fat or Tiny) and continues with the IL code.
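
To make the Tiny header concrete, here is a small sketch of how its single byte can be decoded. The helper name (MethodHeaderSketch.TryGetTinyCodeSize) is made up for this illustration. It relies on two facts: the two right-most bits of a Tiny header are 10, and the remaining six bits hold the length of the IL code.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class MethodHeaderSketch
{
    // Returns true and the IL code size if the byte is a Tiny format header.
    public static bool TryGetTinyCodeSize(byte header, out int codeSize)
    {
        // The two right-most bits of a Tiny header are 10 (0x2).
        if ((header & 0x3) == 0x2)
        {
            // The six left-most bits contain the length of the IL code.
            codeSize = header >> 2;
            return true;
        }

        // Not a Tiny header, so more bytes would have to be inspected.
        codeSize = 0;
        return false;
    }
}
```

For example, a method with 13 bytes of IL code would have the Tiny header byte (13 << 2) | 0x2 = 0x36.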

Demonstration


The following steps are necessary to get the IL code of a method:
1. Load the assembly into memory (the Unmanaged Metadata API _will not_ load it!).
2. Get the base address of the loaded assembly.
3. Open the assembly using Unmanaged Metadata API (using the IMetaDataImport interface).
4. Get the token of the TypeDef.
5. Get the token of the MethodDef.
6. Call the IMetaDataImport.GetMethodProps() method to get the RVA of the method.
7. Read the first byte which can be found at the RVA + base address.
8. If the method has a tiny header, the byte just read contains the method's length. If it has a fat header, a few more bytes have to be read (I'll discuss this in another post later).
9. Read the method's IL code.

I'll give a little sample to demonstrate how this works. :-)
Let's create a dll which contains one class and a few methods. The method we want to read should have a tiny header. Here are the conditions to achieve this:
- No local variables are allowed
- No exceptions (no exception handling to be exact)
- No extra data sections
- The operand stack must be no bigger than 8 entries
- The IL code must be shorter than 64 bytes (its length is stored in 6 bits)

My sample looks like this:

using System;
using System.Collections.Generic;
using System.Text;

namespace TestAssembly
{
    public class Class1
    {
        public Class1()
        {
        }

        public void Test()
        {
            Console.WriteLine("This is the test assembly.");
        }
    }
}



Now create a program which reads an assembly's path, a class's name and a method's name from the console. It then loads the given assembly into memory, reads the given method's IL code and writes it to the console as hexadecimal numbers.


using System;
using System.Collections.Generic;
using System.Text;

using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

namespace Blog2
{
    public class Program
    {
        public readonly static Guid IID_IMetaDataImport = new Guid("7DAC8207-D3AE-4c75-9B67-92801A497D44");

        static void Main(string[] args)
        {
            Console.Write("Please enter the full path of the assembly: ");
            //Read the path of the assembly from the console.
            string assemblyPath = Console.ReadLine();

            Console.Write("Fully qualified name of the class : ");
            //Read the name of the class from the console.
            string className = Console.ReadLine();

            Console.Write("Name of the method: ");
            //Read the name of the method from the console.
            string methodName = Console.ReadLine();

            //Load the assembly to the memory.
            Assembly assembly = Assembly.LoadFrom(assemblyPath);

            //This will point to the beginning of the assembly in the memory.
            IntPtr baseAddress = IntPtr.Zero;
            bool found = false;

            int index = 0;
            //Search the loaded process modules for the loaded assembly.
            ProcessModuleCollection modules = Process.GetCurrentProcess().Modules;

            while (!found && index < modules.Count)
            {
                ProcessModule module = modules[index++];

                //Windows paths are not case-sensitive, so ignore the case when comparing.
                if (string.Equals(module.FileName, assemblyPath, StringComparison.OrdinalIgnoreCase))
                {
                    //If the loaded assembly has been found, store its base address.
                    baseAddress = module.BaseAddress;
                    found = true;
                }
            }

            //Without a base address there is nothing to read from.
            if (!found)
            {
                Console.WriteLine("The assembly was not found among the process modules.");
                return;
            }

            //Open the assembly with Unmanaged Metadata API.
            //IMetaDataDispenserEx, MetaDataDispenserEx and IMetaDataImport are
            //COM interop declarations which are not included in this listing.
            IMetaDataDispenserEx dispenser = new MetaDataDispenserEx();
            IMetaDataImport import = null;
            object rawScope = null;

            Guid metaDataImportGuid = IID_IMetaDataImport;

            dispenser.OpenScope(assemblyPath, 0, ref metaDataImportGuid, out rawScope);
            import = (IMetaDataImport)rawScope;

            //Search for the desired class.
            uint typeDefToken = 0;
            import.FindTypeDefByName(className, 0, out typeDefToken);

            //Search for the desired method.
            uint methodDefToken = 0;
            import.FindMethod(typeDefToken, methodName, null, 0, out methodDefToken);

            char[] methodDefName = new char[1024];
            uint methodDefCount = 0;
            uint attributes = 0;
            IntPtr signature;
            uint signatureCount = 0;
            uint rva = 0;
            uint implementationFlags = 0;

            //Get the properties of the method (including its RVA).
            import.GetMethodProps(methodDefToken, out typeDefToken,
                methodDefName, Convert.ToUInt32(methodDefName.Length),
                out methodDefCount, out attributes, out signature,
                out signatureCount, out rva, out implementationFlags);

            int methodIndex = Convert.ToInt32(rva);
            //Read the first byte of the method. This will be the header.
            byte methodHeader = Marshal.ReadByte(baseAddress, methodIndex);

            //If the 2 right-most bits are 10 then this is a tiny header.
            if ((methodHeader & 0x3) == 0x2)
            {
                //The method's length is stored in the 6 left-most bits.
                int methodEnd = (methodHeader >> 2) + methodIndex + 1;
                methodIndex++;

                //Read the method's IL code until the end and write it to the console.
                while (methodIndex < methodEnd)
                {
                    Console.Write(string.Format("{0} ", Marshal.ReadByte(baseAddress, methodIndex++).ToString("X").PadLeft(2, '0')));
                }
            }

            Console.ReadLine();
        }
    }
}



The output for me looks like this:
C:\Projects\Blog2\bin\Debug>Blog2.exe
Please enter the full path of the assembly: c:\Projects\Blog2\TestAssembly\bin\Debug\TestAssembly.dll
Fully qualified name of the class : TestAssembly.Class1
Name of the method: Test
00 72 15 00 00 70 28 15 00 00 0A 00 2A

Verification


Well, all this looks very nice but how do we know that it's really correct?
Use ildasm to verify the output.
Start ildasm, open the TestAssembly.dll, turn on the Show bytes and the Show token values options (both can be found in the View menu) and open the Test method. I get the following:
.method /*06000009*/ public hidebysig instance void
        Test() cil managed
// SIG: 20 00 01
{
  // Method begins at RVA 0x2157
  // Code size 13 (0xd)
  .maxstack 8
  IL_0000: /* 00 | */ nop
  IL_0001: /* 72 | (70)000015 */ ldstr "This is the test assembly." /* 70000015 */
  IL_0006: /* 28 | (0A)000015 */ call void [mscorlib/*23000001*/]System.Console/*01000018*/::WriteLine(string) /* 0A000015 */
  IL_000b: /* 00 | */ nop
  IL_000c: /* 2A | */ ret
} // end of method Class1::Test

So:
00
72 70 00 00 15
28 0A 00 00 15
00
2A

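This matches the output of the sample, except for the byte order of the tokens: ildasm displays each 4-byte token with its most significant byte first (e.g. (70)000015), while the IL stream stores it as little-endian (15 00 00 70).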
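
To double-check the comparison by code, here is a small sketch which decodes the 13 bytes printed by the sample. It is not a real disassembler: the class name IlSketch is made up, and it knows only the four opcodes that occur in the listing above (00 nop, 72 ldstr, 28 call, 2A ret). Each token is assembled from 4 bytes stored as little-endian.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class IlSketch
{
    // Tokens are stored as 4 little-endian bytes in the IL stream.
    private static uint ReadToken(byte[] il, int offset)
    {
        return (uint)il[offset]
            | ((uint)il[offset + 1] << 8)
            | ((uint)il[offset + 2] << 16)
            | ((uint)il[offset + 3] << 24);
    }

    // Decodes the IL bytes into one text line per instruction.
    public static List<string> Decode(byte[] il)
    {
        List<string> lines = new List<string>();
        int index = 0;

        while (index < il.Length)
        {
            int start = index;
            byte opcode = il[index++];
            string text;

            switch (opcode)
            {
                case 0x00:
                    text = "nop";
                    break;
                case 0x2A:
                    text = "ret";
                    break;
                case 0x72:
                    // ldstr is followed by a 4-byte string token.
                    text = string.Format("ldstr 0x{0:X8}", ReadToken(il, index));
                    index += 4;
                    break;
                case 0x28:
                    // call is followed by a 4-byte method token.
                    text = string.Format("call 0x{0:X8}", ReadToken(il, index));
                    index += 4;
                    break;
                default:
                    throw new NotSupportedException("Opcode not handled by this sketch.");
            }

            lines.Add(string.Format("IL_{0:x4}: {1}", start, text));
        }

        return lines;
    }
}
```

Feeding it the bytes 00 72 15 00 00 70 28 15 00 00 0A 00 2A gives five instructions with the offsets 0000, 0001, 0006, 000b and 000c and the tokens 0x70000015 and 0x0A000015, the same as in the ildasm listing.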

P.S.: An observation: if the Assembly.ReflectionOnlyLoadFrom() method is used instead of the Assembly.LoadFrom(), then the assembly really can't be found among the process modules.
P.S. 2: Thanks to Carlos Aguilar Mares for his Code Colorizer Tool. It's really useful...

Update
I have removed unnecessary line breaks from the code and fixed a typo...