
Recognition of binary patterns by Morphological Analysis

Aurélien Thierry, Guillaume Bonfante, Joan Calvet, Jean-Yves Marion, Fabrice Sabatier
Recon

Introduction
- Binary analysis: identify libraries that do not need to be reversed.
- Our approach: control flow graph comparison, with the results imported into IDA.

Waledac malware and OpenSSL
- Spamming botnet.
- Uses cryptography for its communications: RSA and AES.
- OpenSSL 0.9.8e (Feb 2007) is used for the cryptography.
- Which functions are specifically used?

Morphological Analysis: learning a file (Step 1: Learn)
1. Gather: collect the file.
2. Extract: recover its control flow graph.
3. Abstract: keep only the JCC, CALL and RET nodes.
4. Learn: store the resulting signatures in the MA database.

Morphological Analysis: scanning a file (Step 2: Scan)
1. Gather: collect the file.
2. Extract: recover its control flow graph.
3. Abstract: keep only the JCC, CALL and RET nodes.
4. Consult: query the MA database.
5. Compare: compare the abstracted subgraphs with the learnt signatures.
6. Match: keep the subgraphs that are found in the database.
7. Output: report the detected signatures.

Control flow graph recovery
Control Flow Graph (CFG): a directed graph in which nodes are instruction addresses and edges represent all paths that might be traversed during execution.
Example code:

    cmp eax, 0
    jne +7
    mov ecx, eax
    dec ecx
    mul eax, ecx
    cmp ecx, 1
    jne -3
    jmp +2
    inc ecx
    ret

Figure: the corresponding CFG, with one node per instruction.
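The CFG definition can be made concrete on the example above. This minimal Python sketch (not the authors' tool, which relies on BeaEngine or Pin) classifies each instruction into the node types used in the talk and lists its ordered successors: N+1 for a sequential instruction, K for "jmp K", N+1 then K for "call K" and "jcc K", none for "ret". It assumes, since the slide does not say so, that jump offsets count instructions relative to the jump itself; with that reading "jne +7" lands on "inc ecx", "jne -3" on "dec ecx" and "jmp +2" on "ret". The set of conditional mnemonics is a small illustrative subset.

```python
from typing import Dict, List, Tuple

# The slide's example. Assumption: offsets count instructions relative to the jump.
PROGRAM: List[Tuple[str, str]] = [
    ("cmp", "eax, 0"), ("jne", "+7"), ("mov", "ecx, eax"), ("dec", "ecx"),
    ("mul", "eax, ecx"), ("cmp", "ecx, 1"), ("jne", "-3"), ("jmp", "+2"),
    ("inc", "ecx"), ("ret", ""),
]

# Illustrative, incomplete set of conditional jump mnemonics.
CONDITIONAL_JUMPS = {"ja", "jae", "jb", "jbe", "je", "jg", "jge", "jl", "jle", "jne", "jz", "jnz"}


def kind(mnemonic: str) -> str:
    """Classify an instruction into one of the node types used on the slides."""
    if mnemonic == "jmp":
        return "JMP"
    if mnemonic == "call":
        return "CALL"
    if mnemonic == "ret":
        return "RET"
    if mnemonic in CONDITIONAL_JUMPS:
        return "JCC"
    return "INST"  # sequential instruction: does not modify the control flow


def build_cfg(program: List[Tuple[str, str]]) -> Dict[int, Tuple[str, List[int]]]:
    """Map each instruction index N to (node type, ordered successor indices)."""
    cfg: Dict[int, Tuple[str, List[int]]] = {}
    for n, (mnemonic, operand) in enumerate(program):
        k = kind(mnemonic)
        if k == "INST":
            succ = [n + 1]
        elif k == "JMP":
            succ = [n + int(operand)]
        elif k in ("JCC", "CALL"):
            succ = [n + 1, n + int(operand)]  # fall-through N+1 first, then target K
        else:  # RET: the return address is not known statically
            succ = []
        cfg[n] = (k, succ)
    return cfg


for n, (k, succ) in build_cfg(PROGRAM).items():
    print(n, k, succ)
```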

Control flow graph recovery
Extraction of the control flow graph from a binary:
- Static analysis from the entry points when possible (BeaEngine).
- Dynamic analysis otherwise (Intel's Pintools).
Nodes of the control flow graph:
- Sequential instructions do not modify the control flow.
- Four types of instructions have an impact on the CFG: jmp, call, jcc and ret.

Control flow graph construction & reduction
Successors of the Nth instruction in the control flow graph:
- Sequential instruction (INST): N+1.
- jmp K (JMP): K.
- call K (CALL): N+1 and K.
- jcc K (JCC): N+1 and K.
- ret (RET): no known successor.
Reduction steps:
- Remove sequential instructions.
- Realign jumps.
- Remove false calls.

Reduction of the control flow graph
The CFG is reduced:
- to shrink the graph, which speeds up the algorithms;
- to obtain a more abstract form that tolerates slight changes (junk code insertion, code reordering).
Example: Waledac with static analysis, compared before and after reduction.
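The first two reduction steps can be sketched in Python on the 10-instruction example: sequential (INST) nodes are removed and JMP nodes are bypassed, so each remaining edge points to the next JCC, CALL or RET node. Reading "realign jumps" as bypassing JMP nodes is an assumption, "remove false calls" is not modelled, and the (node type, ordered successors) layout is chosen for this sketch only.

```python
from typing import Dict, List, Optional, Tuple

CFG = Dict[int, Tuple[str, List[int]]]  # node -> (node type, ordered successors)

# CFG of the 10-instruction example (INST = sequential instruction).
EXAMPLE: CFG = {
    0: ("INST", [1]), 1: ("JCC", [2, 8]), 2: ("INST", [3]), 3: ("INST", [4]),
    4: ("INST", [5]), 5: ("INST", [6]), 6: ("JCC", [7, 3]), 7: ("JMP", [9]),
    8: ("INST", [9]), 9: ("RET", []),
}

SKIPPED = ("INST", "JMP")  # node types removed by this sketch of the reduction


def resolve(cfg: CFG, node: int) -> Optional[int]:
    """Follow INST/JMP nodes until reaching a node that survives the reduction."""
    seen = set()
    while node in cfg and cfg[node][0] in SKIPPED:
        if node in seen or not cfg[node][1]:
            return None  # loop made only of INST/JMP nodes, or a dead end
        seen.add(node)
        node = cfg[node][1][0]  # INST and JMP nodes have exactly one successor
    return node  # a JCC/CALL/RET node, or an address outside the known code


def reduce_cfg(cfg: CFG, entry: int) -> Tuple[CFG, Optional[int]]:
    """Keep only JCC, CALL and RET nodes, redirecting each edge to the next kept node."""
    reduced: CFG = {}
    for node, (kind, succ) in cfg.items():
        if kind in SKIPPED:
            continue
        targets = (resolve(cfg, s) for s in succ)
        reduced[node] = (kind, [t for t in targets if t is not None])
    return reduced, resolve(cfg, entry)


reduced, entry = reduce_cfg(EXAMPLE, 0)
print(entry, reduced)  # 1 {1: ('JCC', [6, 9]), 6: ('JCC', [9, 6]), 9: ('RET', [])}
```

On this example the 10 nodes shrink to 3: the two JCC nodes and the RET.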

Reduction on Waledac
Figure: Part of Waledac without reduction (80 nodes) and with reduction (23 nodes).

Graph matching
Graph isomorphism detection.
Figure: step-by-step detection of an isomorphism between two small example graphs.

Graph matching: Waledac and OpenSSL
The entire graphs are not isomorphic.
Figure: reduced CFG fragments of Waledac and OpenSSL.
Some parts of the two graphs (subgraphs) are, however, isomorphic.
Figure: a matching subgraph in Waledac and OpenSSL.

Subgraphs
- Both graphs are cut into many small subgraphs.
- The subgraphs are generated by a breadth-first search (BFS) from each node.
- Their size is limited (typically 24 nodes).
- We search for graph isomorphisms between the subgraphs of both binaries.
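Subgraph generation can be sketched as a breadth-first search bounded to 24 nodes, started from every node. In this Python sketch the size limit is the slide's; discarding edges that leave the subgraph, and the successor-list input (here the reduced example graph), are assumptions.

```python
from collections import deque
from typing import Dict, List

Successors = Dict[int, List[int]]  # node -> ordered successors


def bfs_subgraph(cfg: Successors, root: int, max_nodes: int = 24) -> Successors:
    """Collect at most max_nodes nodes reachable from root, in breadth-first order,
    keeping only the edges between collected nodes."""
    kept = [root]
    seen = {root}
    queue = deque([root])
    while queue and len(kept) < max_nodes:
        node = queue.popleft()
        for child in cfg.get(node, []):
            if child not in seen and len(kept) < max_nodes:
                seen.add(child)
                kept.append(child)
                queue.append(child)
    return {n: [c for c in cfg.get(n, []) if c in seen] for n in kept}


def all_subgraphs(cfg: Successors, max_nodes: int = 24) -> Dict[int, Successors]:
    """One bounded subgraph rooted at each node; a node can appear in several subgraphs."""
    return {root: bfs_subgraph(cfg, root, max_nodes) for root in cfg}


reduced = {1: [6, 9], 6: [9, 6], 9: []}  # the reduced example graph
for root, sub in all_subgraphs(reduced).items():
    print(root, sub)
```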

More on subgraphs
- Many subgraphs are generated from one CFG.
- Every reachable node belongs to many subgraphs.
- Example on Waledac: 24 nodes give 8 subgraphs.
Figure: a part of Waledac's CFG and some of its subgraphs.

Graph isomorphism problem
- No polynomial-time algorithm is known for graph isomorphism in the general case; the problem is in NP.
- General solutions are slow.
Property (simplification). Our subgraphs:
- have a root node, from which every other node is reachable;
- have at most 2 children per node (call or jcc);
- have ordered children.
This restricted problem is in P.

Graph isomorphism problem (continued)
- This does not exactly solve the graph isomorphism problem: some isomorphic graphs are not detected.
- But there are fast (polynomial-time) solutions.
Figure: (v) original graph and (w) undetected graph, built from call, jcc and ret nodes.

Morphological analysis engine
- Signatures are subgraphs of reduced control flow graphs, obtained statically or dynamically.
- A database (tree automata) is filled with the signatures.
- Learning and scanning are fast (Intel Core i5 CPU):

    Operation   Files                        Time
    Learn       44 binaries ( 2000 nodes)    1.2 s
    Scan        44 binaries ( 2000 nodes)    1.1 s
    Learn       OpenSSL (28313 nodes)        12 s
    Scan        Waledac (14626 nodes)        2.0 s
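The simplification property is what makes matching cheap. With a root, at most two ordered children per node and every node reachable from the root, a traversal that always visits children in order numbers the nodes identically in any two isomorphic graphs, so comparing these numberings decides isomorphism in linear time. The Python sketch below uses that idea as a canonical encoding and keeps the learnt encodings in a dictionary; this is one possible polynomial-time approach chosen for illustration, whereas the MA-Engine stores its signatures in tree automata. The names learn, scan and database are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# address -> (node type, ordered children); every node must be reachable from the root
Graph = Dict[int, Tuple[str, List[int]]]
Encoding = Tuple[Tuple[str, Tuple[int, ...]], ...]


def canonical(graph: Graph, root: int) -> Encoding:
    """Encode a rooted graph with ordered children, independently of its addresses.

    Nodes are numbered in the order a depth-first traversal discovers them,
    visiting children in their fixed order; each node is then written as its
    type plus the numbers of its children. Two such graphs are isomorphic
    (preserving node types and child order) iff their encodings are equal.
    """
    number: Dict[int, int] = {}
    order: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in number:
            continue
        number[node] = len(order)
        order.append(node)
        # Push children in reverse so that the first child is visited first.
        for child in reversed(graph[node][1]):
            if child not in number:
                stack.append(child)
    return tuple((graph[n][0], tuple(number[c] for c in graph[n][1])) for n in order)


# Hypothetical signature database: encoding -> names of the learnt files.
database: Dict[Encoding, Set[str]] = defaultdict(set)


def learn(name: str, graph: Graph, root: int) -> None:
    database[canonical(graph, root)].add(name)


def scan(graph: Graph, root: int) -> Set[str]:
    return set(database.get(canonical(graph, root), set()))


# Same shape at different addresses: detected.
learn("library", {1: ("JCC", [6, 9]), 6: ("JCC", [9, 6]), 9: ("RET", [])}, 1)
print(scan({10: ("JCC", [20, 30]), 20: ("JCC", [30, 20]), 30: ("RET", [])}, 10))  # {'library'}
# Children of the root swapped: not detected, since child order is part of the encoding.
print(scan({10: ("JCC", [30, 20]), 20: ("JCC", [30, 20]), 30: ("RET", [])}, 10))  # set()
```

The last call shows one way a graph can go undetected: it is isomorphic to the learnt one if child order is ignored, but ordered children give it a different encoding.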

Compare Waledac and OpenSSL
- Waledac uses OpenSSL 0.9.8e (Feb 2007).
- OpenSSL is learnt with reduction.
- One DLL is matched (libeay.dll).
- Builds compared: OpenSSL 0.9.8x (released in May), OpenSSL 0.9.8e compiled for performance (/Ox /O2) and OpenSSL 0.9.8e compiled for file size (/O1).

Compare Waledac and OpenSSL (continued)
- Compiling OpenSSL 0.9.8e with option /O1 (size optimization) yields 1264 common subgraphs between one of the DLLs (libeay.dll) and Waledac.
- We want to know which functions are matched, so we compare the matched code of OpenSSL and Waledac.

Code and nodes
- The larger the matched subgraphs, the more accurate the matching.
- Learning and scanning are repeated with an increasing subgraph size, starting from 24 nodes.
- Nodes that are in the largest matched subgraphs are associated.
- The matched nodes are output for IDA, for each size.
Figure: matched nodes in Waledac and OpenSSL.
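The association rule (nodes of the largest matched subgraphs first, then remaining free nodes from smaller matches) can be sketched as a greedy pass. The input format, one list of (Waledac node, OpenSSL node) pairs per matched subgraph, and the addresses below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one list of (target node, library node) pairs per matched subgraph.
Match = List[Tuple[int, int]]


def associate(matches: List[Match]) -> Dict[int, int]:
    """Associate nodes greedily, trusting the largest matched subgraphs first."""
    target_to_lib: Dict[int, int] = {}
    used_lib = set()
    for pairs in sorted(matches, key=len, reverse=True):
        for target, lib in pairs:
            # Smaller matches only fill nodes that are still free on both sides.
            if target not in target_to_lib and lib not in used_lib:
                target_to_lib[target] = lib
                used_lib.add(lib)
    return target_to_lib


matches = [
    [(0x401000, 0x10001000), (0x401010, 0x10001090)],  # smaller match, disagrees on one node
    [(0x401000, 0x10001000), (0x401010, 0x10001010), (0x401020, 0x10001020)],
]
for t, l in associate(matches).items():
    print(hex(t), "->", hex(l))
```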

Code and nodes (example)
- The largest subgraph found has 18 nodes.
- Corresponding nodes in the matched subgraphs are associated.
- Free nodes are then associated using matching subgraphs of smaller size.

IDA plugin
With both binaries open in IDA, the plugin:
- imports the list of matched nodes;
- marks them in IDA;
- provides browsing between corresponding nodes in both instances.

Waledac / OpenSSL: common subroutines
Figure: Matching nodes are in corresponding subroutines.
Matched subroutines:
- AES: AES_set_encrypt_key, AES_set_decrypt_key
- X509: X509_PUBKEY_set, X509_PUBKEY_get
- RSA / DSA: RSA_free, DSA_size, DSA_new_method
- BN (Big Number library): BN_is_prime_fasttest_ex, BN_CTX_new, BN_mod_inverse
- CRYPTO: CRYPTO_lock, CRYPTO_malloc
- Miscellaneous OpenSSL routines: UI, encoding

Comparing matched code: AES_set_encrypt_key
Figure: Matched code between OpenSSL (left) and Waledac (right).

Comparing code: AES_encrypt
- Not detected: its control flow graph is too small.
Figure: AES_encrypt subroutine.
Figure: Simplified AES_encrypt CFG (initial round, rounds, final round).

Waledac / OpenSSL: findings
- OpenSSL 0.9.8e was compiled for small size (option /O1).
- Use of AES for symmetric encryption.
- X.509 (certificate) handling, use of the RSA and/or DSA algorithms.
- Calls to primality tests (consistent with asymmetric encryption such as RSA, but not exclusively).
Waledac actually uses X.509/RSA and AES encryption, and we were able to find this out without actually reversing its code.

Duqu and Stuxnet
- Static analysis of their decrypted (and unpacked) main DLLs (maindll.dll for Stuxnet and netp191.pnf for Duqu).
- First analysis:
  - 26.5% of Duqu's subgraphs are common with Stuxnet (846 subgraphs);
  - 60.3% of Duqu's nodes are in subgraphs matching Stuxnet (2215 nodes).
- Duqu and Stuxnet are strongly related.

Duqu / Stuxnet: common subroutines

Duqu / Stuxnet: subroutine identification
- Some of the common subroutines come from standard libraries (libc, ...).
- They are documented and should not be reversed manually.
- msvcr80.dll: Microsoft Visual C++ Run-Time.
- How can its code be identified within Duqu / Stuxnet in IDA?
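The two Duqu/Stuxnet figures measure different things: the share of Duqu's subgraphs whose signature is also found in Stuxnet, and the share of Duqu's nodes covered by at least one such subgraph. Since every reachable node belongs to many subgraphs, the second share can be much larger than the first. The Python sketch below computes both on toy data; its input layout is an assumption. Taken at face value, the rounded percentages imply roughly 846 / 0.265 ≈ 3,190 Duqu subgraphs and 2215 / 0.603 ≈ 3,670 Duqu nodes in total.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

# Hypothetical input: for each subgraph of binary A (e.g. Duqu), keyed by its root,
# its signature and the set of nodes it covers.
Subgraphs = Dict[int, Tuple[Hashable, FrozenSet[int]]]


def shared_ratios(a: Subgraphs, b_signatures: Set[Hashable]) -> Tuple[float, float]:
    """Return (share of A's subgraphs found in B, share of A's nodes lying in such subgraphs)."""
    common = [nodes for sig, nodes in a.values() if sig in b_signatures]
    all_nodes = set().union(*(nodes for _, nodes in a.values()))
    covered = set().union(*common)
    return len(common) / len(a), len(covered) / len(all_nodes)


duqu_like = {
    1: ("s1", frozenset({1, 2})),
    2: ("s2", frozenset({2, 3})),
    3: ("s3", frozenset({4})),
}
print(shared_ratios(duqu_like, {"s1", "s3"}))  # (0.6666666666666666, 0.75)
```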

Duqu / Stuxnet: libc identification
- Learn msvcr80.dll (libc) and scan Duqu and Stuxnet.
- The IDA plugin will:
  - mark the nodes common with msvcr80.dll;
  - rename the matched subroutines.

Duqu / Stuxnet: common subroutines
Figure: Renamed subroutines matching between Duqu and Stuxnet.

Highlighting msvcr80.dll in Stuxnet
Figure: Code of msvcr80.dll colored (yellow) in Stuxnet; subroutines are renamed.

Duqu / Stuxnet: summary
- From the decrypted and unpacked DLLs of Stuxnet, we are able to automatically find code shared with Duqu.
- Before reversing, we identify standard (msvcr80.dll) subroutines.
- With IDA, we can identify and browse matching subroutines.

Conclusion
- Identify the libraries in use.
- Show code similarities.
- IDA UI for browsing matched code.

Thank you. Any questions?