Wednesday, January 19, 2005

Tokens

In my earlier post I explained how you can read the TypeDefs from an assembly. But... what is a TypeDef?

Each definition or reference stored in an assembly (e.g.: class definition, method definition, method reference, assembly reference, string, resource, etc.) has an associated token.
A token is a 4-byte "identification number". For example, a TypeDef's token might look like this: 02000007. Each token identifies a row in a table. The leftmost byte (02) gives the kind of token, that is, the table in which the row can be found. The remaining 3 bytes are the row index, which starts at 1. So, in our case, information about the TypeDef can be found in the 7th row of the TypeDef table. (String tokens, type 70, are a special case: they point into the user string heap rather than into a table.)
These tables can also refer to each other. For example, a type's base class is stored as a token in the TypeDef table, and that token may point into the TypeDef, TypeRef or TypeSpec table.
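
To make the layout concrete, here is a minimal Python sketch that splits a token into its table byte and row index. The function names and the TOKEN_TABLES dictionary are my own for illustration; only the numeric type bytes come from the CorTokenType values in corhdr.h, and the dictionary lists just a few of them.

```python
# A few token type bytes (subset of the CorTokenType values in corhdr.h).
TOKEN_TABLES = {
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "Field",
    0x06: "MethodDef",
    0x0A: "MemberRef",
    0x1B: "TypeSpec",
    0x23: "AssemblyRef",
}


def split_token(token):
    """Split a 4-byte metadata token into (table id, row index)."""
    table_id = (token >> 24) & 0xFF  # leftmost byte: which table
    row = token & 0x00FFFFFF         # remaining 3 bytes: 1-based row index
    return table_id, row


def describe_token(token):
    """Return a readable description such as 'TypeDef row 7'."""
    table_id, row = split_token(token)
    name = TOKEN_TABLES.get(table_id, "unknown")
    return f"{name} row {row}"


print(describe_token(0x02000007))  # TypeDef row 7
```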

The full list of the existing tables can be found in section 21 of Partition II Metadata.doc.

Each table has its own layout. The exact definitions can also be found in the Metadata document. For example, the TypeDef table contains the following fields:

  • Flags (a bitmask of TypeAttributes flags, for example: Public, NestedPublic, NestedFamily)

  • TypeName (name of the type)

  • TypeNamespace (namespace of the type)

  • Extends (TypeDef, TypeRef or TypeSpec token of the base class)

  • FieldList (index into the Field table)

  • MethodList (index into the MethodDef table)
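
FieldList and MethodList only store where a type's fields and methods start. As far as I understand the Metadata document, the run ends where the next TypeDef row's run begins (or at the end of the table). Here is a rough Python sketch of that idea. The TypeDefRow class, the method_rows() helper and the sample rows are all made up for illustration; they are not part of any metadata API.

```python
from dataclasses import dataclass


@dataclass
class TypeDefRow:
    flags: int
    type_name: str
    type_namespace: str
    extends: int      # token of the base class (0 if there is none)
    field_list: int   # 1-based row of the first field owned by this type
    method_list: int  # 1-based row of the first method owned by this type


def method_rows(typedefs, index, method_count):
    """Return the MethodDef row numbers owned by typedefs[index]."""
    start = typedefs[index].method_list
    if index + 1 < len(typedefs):
        end = typedefs[index + 1].method_list  # next type's run starts here
    else:
        end = method_count + 1                 # last type: run to end of table
    return list(range(start, end))


# Hypothetical sample data: three types and a MethodDef table with 5 rows.
typedefs = [
    TypeDefRow(0x00000000, "<Module>", "", 0x00000000, 1, 1),
    TypeDefRow(0x00100001, "Foo", "Demo", 0x01000001, 1, 1),
    TypeDefRow(0x00100001, "Bar", "Demo", 0x01000001, 3, 4),
]

for i, row in enumerate(typedefs):
    print(row.type_name, method_rows(typedefs, i, method_count=5))
```

The same calculation applies to FieldList and the Field table.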



So, when I called the EnumTypeDefs() method in my earlier post, I simply enumerated the rows of the TypeDef table, and with the GetTypeDefProps() method I asked for information about a specific row in it.

How can you easily view the tokens of an assembly?
Start ildasm.exe and go to the View/MetaInfo/Show menu or simply press Ctrl-M and all the tokens of the assembly will be shown.

What are tokens good for?
They are used in the IL code as parameters. IL instructions like ldfld, ldflda, stfld and stsfld need a Field token parameter; box and castclass need a Type token parameter; newobj needs the Method token of a constructor, and so on. If you check the IL code of a method using Reflection or a hex editor you'll see that those instructions are always followed by a token. Tokens in the IL code are stored as little-endian 4-byte unsigned integers.
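
As an illustration, here is a small Python sketch that reads the token following one of these instructions from raw IL bytes. It only knows the handful of one-byte opcodes listed in the table and is not a real disassembler. The read_token_instruction() name and the sample bytes are made up for the example.

```python
import struct

# One-byte opcodes that are followed by a 4-byte token (subset).
TOKEN_OPCODES = {
    0x73: "newobj",
    0x74: "castclass",
    0x7B: "ldfld",
    0x7C: "ldflda",
    0x7D: "stfld",
    0x80: "stsfld",
    0x8C: "box",
}


def read_token_instruction(il, offset=0):
    """Decode the instruction at il[offset]; return (mnemonic, token)."""
    opcode = il[offset]
    name = TOKEN_OPCODES[opcode]  # KeyError for opcodes not in the table
    # "<I" = little-endian unsigned 32-bit integer, right after the opcode
    (token,) = struct.unpack_from("<I", il, offset + 1)
    return name, token


# Hypothetical IL bytes: ldfld followed by the token for Field row 5.
il = bytes([0x7B, 0x05, 0x00, 0x00, 0x04])
name, token = read_token_instruction(il)
print(f"{name} {token:08X}")  # ldfld 04000005
```

Note how the bytes 05 00 00 04 in the file become 04000005 once they are read as a little-endian number.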

Tokens are also used in signatures but there they're compressed and thus stored differently. But I'll keep this subject for a later post... :-)

Token type definitions for C++ can be found in corhdr.h. Search for "mdToken" in it...

3 Comments:

Anonymous Anonymous said...

We want more :)

Thursday, February 3, 2005 at 6:59:00 PM GMT+1  
Blogger Zsozso said...

Thanks. It's nice to see that others are also interested in this subject. :-)
Hopefully, this weekend I'll make a post about RVA also... ;-)

Saturday, February 5, 2005 at 4:35:00 PM GMT+1  
Anonymous Anonymous said...

When i search for MethodDefToken only your blog is listed.

Nice article.

Wednesday, August 10, 2005 at 7:36:00 AM GMT+2  
