Posted 2 months ago by Jb Evain
Say you’re writing a LINQ provider, and you need to optimize the following LINQ query:
from Person p in db where p.Age > 18 select p;
Let’s add a constraint. The underlying storage engine stores data according to the field name. That means that when generating the query for the underlying storage system, you’ll have to map p.Age into something that system will understand: in this case, a field. And all you have is a MemberExpression, giving you a PropertyInfo.
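To make the starting point concrete, here is a minimal sketch of what a provider gets to work with for a predicate such as p.Age > 18. The Person class and the GetComparedProperty helper are assumptions made up for this illustration; only the System.Linq.Expressions and System.Reflection types are real.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity, standing in for whatever the provider stores.
public class Person {
    public int Age { get; set; }
}

public static class MemberExpressionSketch {

    // Returns the property on the left-hand side of a comparison such as
    // p => p.Age > 18, or null when the predicate has another shape.
    public static PropertyInfo GetComparedProperty<T> (Expression<Func<T, bool>> predicate)
    {
        var comparison = predicate.Body as BinaryExpression;
        if (comparison == null)
            return null;

        var member = comparison.Left as MemberExpression;
        if (member == null)
            return null;

        // Member is a MemberInfo; for p.Age it is the PropertyInfo of Age.
        return member.Member as PropertyInfo;
    }
}
```

Calling MemberExpressionSketch.GetComparedProperty<Person> (p => p.Age > 18) hands back the PropertyInfo for Age, and nothing in that object points at a field.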
The issue here is that you have no way to get the FieldInfo backing the property. If you think about it, that’s normal: the getter and the setter of a property are ordinary methods, so they can contain any kind of code, which means you can’t always find a field backing the property.
But in this case that’s OK, as we’re only interested in these two forms of properties:
public int Age { get; set; }

private string name;

public string Name {
    get { return name; }
    set { name = value; }
}
Of course here what’s interesting is how to actually get the field. I’ve used the Reflection based CIL reader I wrote about yesterday. I disassemble the body of either a getter or a setter of the property, and if it matches a simple IL pattern, that is, if it looks to be a property backed by a field, I simply return the field.
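As an illustration of the idea (and not the implementation from this post, which relies on the Reflection-based CIL reader and a proper pattern matcher), here is a minimal sketch that matches the getter directly on the raw IL bytes. The SamplePerson type and every name in BackingFieldSketch are made up for the example. It assumes an instance property on a non-generic type, a little-endian host, and the getter shapes a C# compiler typically emits (ldarg.0; ldfld; ret, optionally wrapped in the nop and stloc/br.s/ldloc sequence of unoptimized builds).

```csharp
using System;
using System.Reflection;

// Hypothetical sample type covering both property forms.
public class SamplePerson {
    public int Age { get; set; }

    private string name;
    public string Name {
        get { return name; }
        set { name = value; }
    }

    // A getter that does more than return a field: it must not match.
    public int NameLength {
        get { return name == null ? 0 : name.Length; }
    }
}

public static class BackingFieldSketch {

    const byte Nop = 0x00, Ldarg_0 = 0x02, Ldfld = 0x7B, Ret = 0x2A;

    // Tail typically emitted by unoptimized C# builds after the ldfld:
    // stloc.0; br.s +0; ldloc.0; ret
    static readonly byte [] debug_tail = { 0x0A, 0x2B, 0x00, 0x06, 0x2A };

    public static FieldInfo TryGetBackingField (PropertyInfo property)
    {
        MethodInfo getter = property.GetGetMethod (true);
        MethodBody body = getter == null ? null : getter.GetMethodBody ();
        if (body == null)
            return null;

        byte [] il = body.GetILAsByteArray ();
        int position = 0;
        while (position < il.Length && il [position] == Nop)
            position++;

        // Expect: ldarg.0; ldfld <4-byte metadata token>
        if (position + 6 > il.Length || il [position] != Ldarg_0 || il [position + 1] != Ldfld)
            return null;

        // Assumption: little-endian host, so the token can be read as-is.
        int token = BitConverter.ToInt32 (il, position + 2);
        position += 6;

        if (!IsReturnTail (il, position))
            return null;

        // Assumption: the declaring type is not generic.
        return getter.Module.ResolveField (token);
    }

    // True when the remaining bytes only return the loaded value.
    static bool IsReturnTail (byte [] il, int position)
    {
        int remaining = il.Length - position;
        if (remaining == 1)
            return il [position] == Ret;
        if (remaining != debug_tail.Length)
            return false;
        for (int i = 0; i < remaining; i++)
            if (il [position + i] != debug_tail [i])
                return false;
        return true;
    }
}
```

The sketch only looks at the getter; a fuller version would also check the setter, and static properties (ldsfld) are left out entirely.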
To do the actual IL matching, I re-implemented something Rodrigo and I wrote when we were working on instrumenting assemblies at db4o. The code itself is pretty neat.
Anyway, that’s another opportunity to write a simple extension method:
public static FieldInfo GetBackingField (this PropertyInfo self)
Again, you’re more than welcome to have a look at the implementation. Don’t forget that it depends on the Reflection based CIL reader.
Posted 2 months ago by Jb Evain
As I wrote earlier this month, when I worked on a static aspect weaver, the first library we used to programmatically retrieve the CIL bytecode was a library published by Lutz Roeder (the original author of the famous Reflector tool), called ILReader.
It suffered from a number of limitations, and you were tied to the whole System.Reflection infrastructure, which, in the .net 1.0 days, was somewhat limited and lacked a few features required to get access to every single detail in an assembly, including the CIL bytecode. It has evolved since: for instance, starting with .net 2.0, there’s a GetILAsByteArray method on MethodBody that returns the raw CIL code.
Anyway, most of those concerns were addressed by Cecil, but still, for some use-cases, it could be nice to be able to have access to the CIL bytecode at a higher level of abstraction than a plain raw byte array.
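To show what working at the raw byte array level involves, here is a minimal sketch that walks the result of GetILAsByteArray and lists opcode names. It is not the reader described in this post: RawIlSketch, ReadOpCodeNames, and the Twice sample method are names made up for the illustration. The lookup table is built by reflecting over the fields of System.Reflection.Emit.OpCodes, and operands are skipped rather than decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class RawIlSketch {

    // Lookup table built by reflecting over the public OpCodes fields.
    static readonly Dictionary<short, OpCode> opcodes = BuildTable ();

    static Dictionary<short, OpCode> BuildTable ()
    {
        var table = new Dictionary<short, OpCode> ();
        foreach (FieldInfo field in typeof (OpCodes).GetFields (BindingFlags.Public | BindingFlags.Static)) {
            if (field.FieldType != typeof (OpCode))
                continue;
            var opcode = (OpCode) field.GetValue (null);
            table [opcode.Value] = opcode;
        }
        return table;
    }

    // Hypothetical sample method to read.
    public static int Twice (int x)
    {
        return x * 2;
    }

    public static List<string> ReadOpCodeNames (MethodBase method)
    {
        var names = new List<string> ();
        MethodBody body = method.GetMethodBody ();
        if (body == null) // abstract or extern methods have no body
            return names;

        byte [] il = body.GetILAsByteArray ();
        int position = 0;
        while (position < il.Length) {
            // Two-byte opcodes start with the 0xFE prefix.
            byte first = il [position++];
            short key = first == 0xFE
                ? unchecked ((short) (0xFE00 | il [position++]))
                : first;

            OpCode opcode = opcodes [key];
            names.Add (opcode.Name);
            position += OperandSize (opcode.OperandType, il, position);
        }
        return names;
    }

    // Number of operand bytes following the opcode.
    static int OperandSize (OperandType type, byte [] il, int position)
    {
        switch (type) {
        case OperandType.InlineNone:
            return 0;
        case OperandType.ShortInlineBrTarget:
        case OperandType.ShortInlineI:
        case OperandType.ShortInlineVar:
            return 1;
        case OperandType.InlineVar:
            return 2;
        case OperandType.InlineI8:
        case OperandType.InlineR:
            return 8;
        case OperandType.InlineSwitch:
            // 4-byte case count, then one 4-byte offset per case
            // (read assuming a little-endian host)
            return 4 + 4 * BitConverter.ToInt32 (il, position);
        default:
            // metadata tokens, 4-byte integers, long branches, ShortInlineR
            return 4;
        }
    }
}
```

Even this much bookkeeping only yields names; turning tokens into members and offsets into labels is where a higher-level instruction model starts to pay off.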
On .net, you can use a library also named ILReader, but it has a few checks that are specific to .net, there’s no information about the license of the code, and I’m not especially fond of the way instructions are represented.
So last time, for a hack I’ll soon write about, I extracted Mono.Cecil’s Instruction type, and wrote a cute extension method, or rock, as I like to call them. Its signature:
public static IList<Instruction> GetInstructions (this MethodBase self)
I would have loved to declare the extension method on the System.Reflection.MethodBody type, to make things more consistent with the methods it already has, but there’s no cross platform way to get a System.Reflection.MethodBase from a System.Reflection.MethodBody.
Anyway, it’s terribly easy to use if you’ve already used Cecil. The only difference is that for branches, the operand is the offset as an integer, not the target instruction. As a sample usage, here’s a (very) incomplete CIL reflection based disassembler:
static void PrintByteCode (MethodInfo method)
{
    foreach (Instruction instruction in method.GetInstructions ())
        PrintInstruction (instruction);
}

static void PrintInstruction (Instruction instruction)
{
    Console.Write ("{0}: {1} ",
        Labelize (instruction.Offset),
        instruction.OpCode.Name);

    switch (instruction.OpCode.OperandType) {
    case OperandType.InlineNone:
        break;
    case OperandType.InlineSwitch:
        // a switch carries one branch offset per case
        var branches = instruction.Operand as int [];
        for (int i = 0; i < branches.Length; i++) {
            if (i > 0)
                Console.Write (", ");
            Console.Write (Labelize (branches [i]));
        }
        break;
    case OperandType.ShortInlineBrTarget:
    case OperandType.InlineBrTarget:
        // branch operands are offsets, not target instructions
        Console.Write (Labelize ((int) instruction.Operand));
        break;
    case OperandType.InlineString:
        Console.Write ("\"{0}\"", instruction.Operand);
        break;
    default:
        // Write, not WriteLine: the line is terminated after the switch
        Console.Write (instruction.Operand);
        break;
    }

    Console.WriteLine ();
}

// Labelize is not shown in the post; this is an assumed implementation
static string Labelize (int offset)
{
    return string.Format ("IL_{0:x4}", offset);
}
And of course, you’re welcome to have a look at the implementation, under the MIT/X11 license.
Posted 2 months ago by Jb Evain
I’ve always complained about the fact that debug symbols are not portable between different CLR implementations. The .net CLR consumes pdb files, which use an undocumented format. Another file format was added to ECMA-335 in a late revision. I wrote about this file format a while ago.
To sum up, it was added very late, when Mono had already started to use its own format (mdb), and the .net CLR doesn’t understand it anyway. So even if it’s not a bad format (it could use some improvements, like a GUID heap similar to the one in a .net assembly), basically no one uses it in the real world.
As mentioned in a recent post, the CCI contains an interesting piece of code, a managed pdb reader, licensed under the Ms-PL. I extracted it, and used it to be able to better share debug symbols between the .net CLR and Mono.
pdb2mdb
Robert Jordan, a long-time Mono contributor, first wrote a tool named pdb2mdb, to convert a pdb file to an mdb file. The issue is that it was based on a combination of COM and the mixed-mode assembly ISymWrapper, which comes with the .net framework. All in all, it means that this version of pdb2mdb could only run on the .net framework on Windows.
With the managed pdb reader, it was very easy to write a fully managed pdb2mdb tool. It’s now available in svn, and it will ship alongside the other developer tools, such as ilasm or the linker. It’s very easy to use. Say you’re deploying a .net application on Linux, and you have an assembly Foo.dll and its Foo.pdb file. Just run:
pdb2mdb Foo.dll
And the tool will generate a file Foo.dll.mdb, that Mono can use to display line information in stack traces.
Mono.Cecil.Pdb
Mono.Cecil.Pdb is an assembly that you use together with Cecil, to have line information at the IL level. It’s used by tools such as Gendarme, or MoMA, to help diagnose and locate issues.
I’ve integrated the managed reader, and the folks from NDepend were kind enough to beta test it. After a few fixes, the managed reader passed all NDepend tests, and was performing a lot better than its unmanaged counterpart. It’s now the default, and only the pdb writer uses the ISymWrapper approach.
It would be an interesting challenge for someone to try to write a managed writer from the information gathered in the reader. It may not be easy though.
Posted 3 months ago by Jb Evain
Back when I was working at db4o, we had fun implementing a mechanism somewhat similar to LINQ, to have strongly typed queries expressed using code itself. The implementation uses Mono.Cecil and Cecil.FlowAnalysis to decompile a delegate into an AST that the db4o query optimizer can process.
Since .net 3.5, an API, System.Linq.Expressions, can be used to get a representation of a C# lambda expression as an object graph: an expression tree. .net 4.0 will add support for statements to this API, but as far as I know, the language itself hasn’t been updated to produce these new nodes.
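As a small illustration of that object graph, the sketch below lets the C# compiler build the tree for a lambda and then inspects its shape. ExpressionTreeSketch, Magic, and Describe are names made up for the example; only the System.Linq.Expressions types are real.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeSketch {

    // Assigning a lambda to Expression<TDelegate> makes the C# compiler emit
    // code that builds the tree, instead of compiling the lambda body to IL.
    public static readonly Expression<Func<int, int>> Magic = i => i * 42;

    // Summarizes the root of a lambda body and, for binary nodes, its operands.
    public static string Describe (LambdaExpression lambda)
    {
        var body = lambda.Body as BinaryExpression;
        if (body == null)
            return lambda.Body.NodeType.ToString ();

        return string.Format ("{0} ({1}, {2})",
            body.NodeType, body.Left.NodeType, body.Right.NodeType);
    }
}
```

For Magic, Describe returns "Multiply (Parameter, Constant)": the multiplication is plain data that a query optimizer can walk, and the tree can still be turned into a delegate with Compile.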
Anyway, a few days ago, someone on Stack Overflow asked how to turn a delegate into a LINQ expression tree. As there’s no built-in feature to do that, it’s not a straightforward process. You basically have to decompile the compiled method, and turn it into an object graph. I guess it’s a good thing that I’m working on a decompiler, if I need to decompile something.
So tonight I wrote a short spike to verify the feasibility of my idea, and it turns out to be pretty simple. Here is its usage:
static void Main ()
{
    Func<int, int> magic = i => i * 42;

    Expression<Func<int, int>> exp =
        DelegateConverter.ToExpression (magic);

    Console.WriteLine (exp.ToString ());
    // prints: i => (i * 42)

    Console.WriteLine (exp.Compile ().Invoke (1));
    // prints: 42
}
DelegateConverter is implemented as a simple visitor which walks a Cecil.Decompiler AST, and generates, if possible, the corresponding LINQ expression tree. Pretty cool, isn’t it?
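To give a feel for the technique without pulling in the decompiler, here is a minimal sketch of an AST-to-expression-tree conversion. The node classes below are hypothetical stand-ins and are not the Cecil.Decompiler types, and AstToExpressionSketch is not the spike’s DelegateConverter; the sketch only handles a single int argument, integer literals, and multiplication.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical, minimal AST; a real decompiler AST has far more node types.
public abstract class Node { }

public sealed class ArgumentNode : Node { }

public sealed class LiteralNode : Node {
    public readonly int Value;
    public LiteralNode (int value) { Value = value; }
}

public sealed class MultiplyNode : Node {
    public readonly Node Left, Right;
    public MultiplyNode (Node left, Node right) { Left = left; Right = right; }
}

public static class AstToExpressionSketch {

    // Wraps the converted body in a lambda taking one int parameter.
    public static Expression<Func<int, int>> Convert (Node body)
    {
        ParameterExpression parameter = Expression.Parameter (typeof (int), "i");
        return Expression.Lambda<Func<int, int>> (Visit (body, parameter), parameter);
    }

    // Maps each AST node to the matching expression factory call.
    static Expression Visit (Node node, ParameterExpression parameter)
    {
        var literal = node as LiteralNode;
        if (literal != null)
            return Expression.Constant (literal.Value);

        var multiply = node as MultiplyNode;
        if (multiply != null)
            return Expression.Multiply (
                Visit (multiply.Left, parameter),
                Visit (multiply.Right, parameter));

        if (node is ArgumentNode)
            return parameter;

        // "If possible": anything else has no counterpart in this sketch.
        throw new NotSupportedException (node.GetType ().Name);
    }
}
```

Feeding it new MultiplyNode (new ArgumentNode (), new LiteralNode (42)) produces a tree equivalent to i => i * 42, which can then be compiled and invoked like any other expression tree.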
You can browse the code of the spike. Keep in mind that it’s nowhere near complete, and that it’s just a proof of concept. Still, I think it’s a pretty cool usage of the Cecil.Decompiler library.
Posted 3 months ago by Jb Evain
Quite a number of friends pinged me about the recent release of the CCI, under the Ms-PL, and were curious about my take on it, and its effect on Cecil and its ecosystem.
First of all, there’s a bit of a story here, and I’ll write it here for those who, like me, love software history. Back in 2003 and 2004, I was working with Thomas Gil, one of my mentors and programming heroes, on one of the first static aspect weavers on .net, AspectDNG, now abandoned. I was actively researching better ways to do CIL injection.
We went from raw IL text manipulation, to Reflection and Reflection.Emit using Lutz’s ILReader library, to RAIL, until I decided to work on Cecil.
In the meantime, I stumbled upon ILMerge, a tool from Mike Barnett, and mailed him to ask what powered the tool, and he put me in contact with Herman Venter, the man behind the CCI effort. I wrote Herman a couple of mails, in terrible English, and begged him to push for a release of the CCI under a license we could use in AspectDNG. That was in March 2004.
As you can guess, it didn’t quite work out at that time, so I started working on Cecil. A few weeks later, Miguel blogged about the need for such a library. He already had the Mono Linker in mind. I mailed him, got SVN access, checked in the beginning of Cecil, got Sébastien interested, etc.
I had the opportunity to be invited by Microsoft to attend an informal AOP workshop the following year, and to meet Herman, whom I remember as a very nice person. I am not sure he remembers the terribly shy kid who gave a terrible presentation in terrible English. But all in all, I’m happy that five years later, my request went through.
Now the CCI release on its own CodePlex page is not really a big event, as it was already released and licensed under the Ms-PL as part of Sandcastle.
Anyway, Cecil is quite mature in its current form, it’s used by a fair number of (known) applications (please help to improve the list), and I’m currently working on two things.
The first one is a refactoring of Cecil, which vastly reduces memory consumption as well as reading/writing time. Hopefully I’ll have a beta in a month or so. We have great plans for this version of Cecil, and it’s consuming a lot of my time, more on this later.
The second one is an extensible decompiler, Cecil.Decompiler, that will greatly benefit from the Cecil refactoring. The time I dedicate to it is a bit eaten up by the Cecil refactoring right now, but it’s certainly one of my favorite projects.
The CCI is a combination of Cecil, the decompiler, and something to write a decompiled AST back, which will be the natural evolution of the Cecil decompiler. Note that the CCI decompilation/compilation process is not extensible. Now that it’s open source, you can hack on it yourself. Sadly, the CCI code is, well, a bit messy, to be polite; it’s not exactly a joy to read. Also, you probably won’t be able to contribute back to the CCI.
Anyway, it does its job alright, and so does Cecil. Choice is always good, let’s welcome the CCI in the small family of such tools. I, for one, will surprisingly keep hacking on and with Cecil :)
To conclude on a very positive note, the fantastic thing about this release is that the CCI contains a fully managed PDB reader and writer. That’s great news as so far, we failed to get any details about this file format. This means that we can now implement a fully managed Mono.Cecil.Pdb support, and that’s just great.
UPDATE: it appears that only the PDB reader is fully managed, the PDB writer is just a wrapper over the COM stuff, just like the current implementation of Mono.Cecil.Pdb. Well, at least it’s a start.