See my Droidcon London 2009 presentation below as a SlideShare presentation, or download it from here. Watch the podcast here.
Showing posts with label dedexer. Show all posts
Wednesday, December 2, 2009
Monday, November 23, 2009
Droidcon+ODEX file disassembly
First, the advertisement. I will give a longer presentation at Droidcon London 2009 about Dalvik bytecode in general, using Dedexer examples. This will be an extended version of my previous, short presentation (also available as a podcast). If central London is convenient for you, please come. Otherwise I will share the presentation after the event.
To celebrate the event, I finished the symbolic ODEX disassembly feature in Dedexer (look for version 1.8). This means that instead of ugly offsets, Dedexer now correctly resolves the method and field names for the execute-inline, iget/iput-quick and invoke-virtual-quick instruction families, provided the dependency files are available. So instead of this:
.line 3041
invoke-virtual-quick {v5},vtable #0x2c
move-result-object v2
.line 3042
iget-object-quick v3,v5,[obj+0x28]
invoke-virtual-quick {v3},vtable #0xe
move-result-object v0
.line 3043
execute-inline {v2},inline #0x4
move-result v1
You will get this:
.line 3041
invoke-virtual-quick {v5},android/app/Activity/android/app/Activity/getPackageName ; getPackageName()Ljava/lang/String; , vtable #0x2c
move-result-object v2
.line 3042
iget-object-quick v3,v5,mComponent Landroid/content/ComponentName; ;[obj+0x28]
invoke-virtual-quick {v3},android/content/ComponentName/android/content/ComponentName/getClassName ; getClassName()Ljava/lang/String; , vtable #0xe
move-result-object v0
.line 3043
execute-inline {v2},Ljava/lang/String/length ; length()I , inline #0x4
move-result v1
Much better, isn't it? See you at Droidcon and I will explain how to interpret the code fragment above.
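The symbolic resolution shown above can be pictured as two lookup tables per class: one from vtable index to method and one from object offset to field. The following Java sketch only illustrates that idea and is not Dedexer's actual code. The class QuickResolver, its methods and the table layout are hypothetical; the three entries are taken from the listing above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve "quick" operands back to symbolic names.
public class QuickResolver {

    /** A resolved method or field plus the class name of the value it yields. */
    public static final class Member {
        public final String name;
        public final String type;

        Member(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // owner class -> (vtable index -> method)
    private final Map<String, Map<Integer, Member>> vtables = new HashMap<>();
    // owner class -> (object offset -> field)
    private final Map<String, Map<Integer, Member>> fields = new HashMap<>();

    public void addMethod(String owner, int vtableIndex, String name, String returnType) {
        vtables.computeIfAbsent(owner, k -> new HashMap<>())
               .put(vtableIndex, new Member(name, returnType));
    }

    public void addField(String owner, int offset, String name, String fieldType) {
        fields.computeIfAbsent(owner, k -> new HashMap<>())
              .put(offset, new Member(name, fieldType));
    }

    /** Returns the method at the given vtable index, or null if it is unknown. */
    public Member method(String owner, int vtableIndex) {
        Map<Integer, Member> table = vtables.get(owner);
        return table == null ? null : table.get(vtableIndex);
    }

    /** Returns the field at the given object offset, or null if it is unknown. */
    public Member field(String owner, int offset) {
        Map<Integer, Member> table = fields.get(owner);
        return table == null ? null : table.get(offset);
    }

    public static void main(String[] args) {
        QuickResolver resolver = new QuickResolver();
        // Entries copied from the disassembly listing above.
        resolver.addMethod("android/app/Activity", 0x2c, "getPackageName", "java/lang/String");
        resolver.addField("android/app/Activity", 0x28, "mComponent", "android/content/ComponentName");
        resolver.addMethod("android/content/ComponentName", 0xe, "getClassName", "java/lang/String");

        // v5 holds an Activity; iget-object-quick at offset 0x28 yields the type of v3,
        // which in turn selects the vtable used by the next invoke-virtual-quick.
        Member field = resolver.field("android/app/Activity", 0x28);
        Member method = resolver.method(field.type, 0xe);
        System.out.println(field.name + " -> " + method.name);
    }
}
```

The point of the chain in main is that each lookup needs the type of the object register, which is why the resolved field type has to be carried forward to the next instruction.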
Friday, October 23, 2009
Sunday, October 18, 2009
Help needed
I decided to release a new version of dedexer but I am not satisfied. The Holy Grail I am chasing is the high-quality disassembly of ODEX files, and I intended to use the hint received from Nenik. I extended the dedexer tool with data flow analysis, so it now knows the types held in the Dalvik registers at any point of the execution of Android bytecode. If you ask the new version of the tool nicely (-r switch), it will even share this information with you. With this switch, a disassembled method now looks like this:
.method public <init>(Ljava/lang/String;)V
.limit registers 4
; this: v2 (LLineReader;)
; parameter[0] : v3 (Ljava/lang/String;)
.catch java/io/IOException from lbba to lbda using lbdc
.line 18
invoke-direct {v2},java/lang/Object/<init> ; <init>()V
; v2 : LLineReader;
lbba:
.line 20
new-instance v0,java/io/FileInputStream
; v0 : Ljava/io/FileInputStream;
invoke-direct {v0,v3},java/io/FileInputStream/<init> ; <init>(Ljava/lang/String;)V
; v0 : Ljava/io/FileInputStream; , v3 : Ljava/lang/String;
iput-object v0,v2,LineReader.fis Ljava/io/FileInputStream;
; v0 : Ljava/io/FileInputStream; , v2 : LLineReader;
.line 21
new-instance v0,java/io/BufferedInputStream
; v0 : Ljava/io/BufferedInputStream;
iget-object v1,v2,LineReader.fis Ljava/io/FileInputStream;
; v1 : Ljava/io/FileInputStream; , v2 : LLineReader;
invoke-direct {v0,v1},java/io/BufferedInputStream/<init> ; <init>(Ljava/io/InputStream;)V
; v0 : Ljava/io/BufferedInputStream; , v1 : Ljava/io/FileInputStream;
iput-object v0,v2,LineReader.bis Ljava/io/BufferedInputStream;
; v0 : Ljava/io/BufferedInputStream; , v2 : LLineReader;
lbda:
.line 28
return-void
lbdc:
.line 23
move-exception v0
; v0 : Ljava/io/IOException;
goto lbda
.end method
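The register comments in the listing can be reproduced with a very small model of the analysis. The Java sketch below is only an illustration, not dedexer's implementation: the class RegisterTypes and its methods are hypothetical. It covers only the step of updating register types along straight-line code, plus a deliberately crude merge for points where two execution paths join. A real analyser would look for a common supertype rather than discard a conflicting register.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch: track the type descriptor held by each Dalvik register.
public class RegisterTypes {
    private final Map<Integer, String> types = new TreeMap<>();

    /** Records that the register now holds a value of the given type descriptor. */
    public void set(int register, String descriptor) {
        types.put(register, descriptor);
    }

    /** Models new-instance: the destination register receives the class type. */
    public void newInstance(int dest, String className) {
        set(dest, "L" + className + ";");
    }

    /** Formats a comment in the style of the listing above; unknown types print as "?". */
    public String annotate(int... registers) {
        StringJoiner joiner = new StringJoiner(" , ", "; ", "");
        for (int register : registers) {
            joiner.add("v" + register + " : " + types.getOrDefault(register, "?"));
        }
        return joiner.toString();
    }

    /** Crude join of two paths: keep a register only if both paths agree on its type. */
    public static RegisterTypes merge(RegisterTypes a, RegisterTypes b) {
        RegisterTypes merged = new RegisterTypes();
        for (Map.Entry<Integer, String> entry : a.types.entrySet()) {
            if (entry.getValue().equals(b.types.get(entry.getKey()))) {
                merged.set(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        RegisterTypes regs = new RegisterTypes();
        regs.set(2, "LLineReader;");           // this
        regs.set(3, "Ljava/lang/String;");     // parameter[0]
        regs.newInstance(0, "java/io/FileInputStream");
        System.out.println(regs.annotate(0, 3));
        regs.set(1, "Ljava/io/FileInputStream;"); // iget-object of LineReader.fis into v1
        regs.newInstance(0, "java/io/BufferedInputStream");
        System.out.println(regs.annotate(0, 1));
    }
}
```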
Great then, but where is the invoke-quick disassembly? Well, erm, I ran into problems. First of all, I could not figure out how an ODEX file stores the names of the other ODEX files it depends on. The names appear to be kept in some sort of data structure at the end of the ODEX file, but its exact layout remains a mystery to me.
Second, in order to decode invoke-quick statements, iget-object-quick statements also need to be decoded, because the data flow analyser needs the types that they put into Dalvik registers. The source of this instruction is known only as an offset, and the mapping of these offsets back to Java types is still missing.
I will try to make progress on these problems; any help is appreciated.
And now some PR after the boring technical details.
I will give a short presentation about dedexer at the coming Android meetup in London. If you are interested in the tool and central London is accessible for you, let's meet there.
Sunday, August 23, 2009
Dedexer in Softpedia
Wow, I got an e-mail saying that dedexer 1.5 has been included in Softpedia. The funny thing is that it was placed in the Mac development tools section. I have already received a number of questions from Android developers working on Mac. Is the Mac a favourite platform for Android developers, or is it just my luck? Any opinions?
Friday, August 7, 2009
Dedexer annotation support
Just a short post for those few who do Dalvik bytecode analysis: dedexer now has full annotation support! For example, it now disassembles "throws" and inner class annotations, along with custom annotations. Get the 1.5 release to use it.
Friday, January 9, 2009
Disassembling DEX files
One of the most remarkable features of the Dalvik virtual machine (the workhorse under the Android system) is that it does not use Java bytecode. Instead, a homegrown format called DEX was introduced, and not even the bytecode instructions are the same as Java bytecode instructions. There was some discussion about whether this makes Dalvik a Java virtual machine at all. My personal opinion is that this is a religious and legal dispute. Dalvik opcodes are clearly designed to support only the Java language. Compiling programs written in a language other than Java to Dalvik bytecode is certainly possible, as was demonstrated with Java bytecode, but neither Java bytecode nor Dalvik bytecode makes any effort to support any language other than Java. This is in contrast with the .NET virtual machine, where at least a claim has been made that the VM supports multiple languages, even though any virtual machine has limitations that prevent a particular language from running on it.
Android comes with a disassembler called dexdump. The location of this tool is not intuitive: it runs on the Linux platform that hosts Android. Launch the emulator and issue the following commands:
adb shell
dexdump
In order to use the tool, one has to move the DEX file to the Android platform (e.g. adb push in case of the emulator). Then one can say:
dexdump -d classes.dex
The output of this tool is not very easy to use, however. Take, for example, the dexdump output for the bytecode compiled from a switch statement:
000418: 2b02 0c00 0000 |0000: packed-switch v2, 0000000c // +0000000c
00041e: 12f0 |0003: const/4 v0, #int -1 // #ff
000420: 0f00 |0004: return v0
000422: 1220 |0005: const/4 v0, #int 2 // #2
000424: 28fe |0006: goto 0004 // -0002
000426: 1250 |0007: const/4 v0, #int 5 // #5
000428: 28fc |0008: goto 0004 // -0004
00042a: 1260 |0009: const/4 v0, #int 6 // #6
00042c: 28fa |000a: goto 0004 // -0006
00042e: 0000 |000b: nop // spacer
000430: 0001 0300 faff ffff 0500 0000 0700 ... |000c: packed-switch-data (10 units)
The jump table used by the packed-switch instruction is not disassembled at all; it is not even dumped entirely. The same problem applies to fill-array-data tables, and there are further restrictions.
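For reference, the jump table has a simple layout in the Dalvik bytecode format: a 16-bit identifier 0x0100, a 16-bit entry count, a 32-bit first key, and then one 32-bit branch target per entry, all little-endian, with the targets counted in 16-bit code units relative to the packed-switch instruction. The Java sketch below decodes such a table. The class name PackedSwitchPayload is hypothetical, and the sample bytes in main are made up for illustration; they are not the bytes from the truncated dump above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: decode a packed-switch jump table.
public class PackedSwitchPayload {
    public final int firstKey;
    public final int[] targets;

    private PackedSwitchPayload(int firstKey, int[] targets) {
        this.firstKey = firstKey;
        this.targets = targets;
    }

    /** Parses the raw bytes of a packed-switch table (little-endian). */
    public static PackedSwitchPayload parse(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int ident = buffer.getShort() & 0xffff;
        if (ident != 0x0100) {
            throw new IllegalArgumentException("not a packed-switch table");
        }
        int size = buffer.getShort() & 0xffff;
        int firstKey = buffer.getInt();
        int[] targets = new int[size];
        for (int i = 0; i < size; i++) {
            targets[i] = buffer.getInt();
        }
        return new PackedSwitchPayload(firstKey, targets);
    }

    /** Lists each case with its absolute target, given the address of the switch instruction. */
    public String describe(int switchAddress) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < targets.length; i++) {
            text.append(String.format("case %d -> %04x\n", firstKey + i, switchAddress + targets[i]));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Made-up table: 2 entries, first key 10, relative targets 5 and 8.
        byte[] sample = {0x00, 0x01, 0x02, 0x00, 0x0a, 0, 0, 0, 0x05, 0, 0, 0, 0x08, 0, 0, 0};
        System.out.print(parse(sample).describe(0));
    }
}
```

Because the keys are consecutive, only the first one is stored; every other case value follows from its position in the table.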
I decided therefore to create a more comfortable disassembler and here is the first cut.
Access the dedexer project's page on SourceForge.
This tool is easier to use than dexdump for many reasons. For starters, it is a standard Java program that runs on the usual JVMs. Its output format is much more readable and will be familiar to those who know the Jasmin syntax. For example, the previous fragment is disassembled like this by dedexer:
.method public calc1(I)I
packed-switch v2,0
ps418_422 ; case 0
ps418_426 ; case 1
ps418_42a ; case 2
default: ps418_default
ps418_default:
const/4 v0,15
l420:
return v0
ps418_422:
const/4 v0,2
goto l420
ps418_426:
const/4 v0,5
goto l420
ps418_42a:
const/4 v0,6
goto l420
nop
.end method
In addition, an individual file is created for each class, along with a directory structure that mirrors the package structure.
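The mapping from a class to its output file can be sketched in a few lines of Java. This is an illustration of the layout described above, not dedexer's own code: the class OutputLayout, the method outputFile and the ".ddx" extension used in the example are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: one output file per class, one directory per package level.
public class OutputLayout {

    /** Maps a class descriptor such as "Ljava/io/FileInputStream;" to a file under outputDir. */
    public static Path outputFile(Path outputDir, String descriptor, String extension) {
        if (!descriptor.startsWith("L") || !descriptor.endsWith(";")) {
            throw new IllegalArgumentException("not a class descriptor: " + descriptor);
        }
        String[] parts = descriptor.substring(1, descriptor.length() - 1).split("/");
        Path result = outputDir;
        // Every part except the last is a package level and becomes a directory.
        for (int i = 0; i < parts.length - 1; i++) {
            result = result.resolve(parts[i]);
        }
        return result.resolve(parts[parts.length - 1] + extension);
    }

    public static void main(String[] args) {
        // The ".ddx" extension is an assumption made for this example.
        System.out.println(outputFile(Paths.get("out"), "Ljava/io/FileInputStream;", ".ddx"));
    }
}
```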
This is not a full decompiler, however. One has to know the Dalvik opcodes in order to work with the tool. This opcode list has been extended and maintained as dedexer was developed and is now in sync with the disassembler. You will see some unknown opcodes in the list. I have not encountered those instructions "out in the wild" and the disassembler does not recognize them either. If you see any of those, send me the DEX file so that I can analyse it!
This is a simple tool and is not without limitations. The most painful one is that the tool does not process the debug and annotation information in the DEX file. The array data dump could also be better. I am sure that the feature most people would like to see is a bridge toward Java class files, but that is far away. Jasmin will be able to generate Java class files once the backward conversion from Dalvik opcodes to Java bytecode is provided, but that is a complex task, so don't hold your breath. The release condition I set for myself was that the tool must be able to disassemble the DEX file in framework.jar. It is able to, so I guess the tool may be of use to others too. Enjoy!