For jump instructions, an edge is added from the current instruction to the first one of the target block. Additionally, we consider two types of instructions which may have several targets. For a block, because the instructions are executed in sequence, every node has an outgoing edge to the next one (line 5-9). From line 3 to 24, the graph edges are created by traversing all instructions of each block and considering possible execution paths from the current instruction to others. In line 2, the set of edges is initially set to empty. Specifically, the first line invokes procedure initialize_Blocks to read the file contents and return all the blocks. The second step is creating the edges to represent the control flow transfers in the program. In the first step, the code is partitioned into blocks of instructions based on the labels (e.g. Building the CFG from an assembly code includes two major steps. The algorithm takes an assembly file as the input, and outputs the CFG. The pseudo-code to generate CFGs is shown in Algorithm 1. 6 illustrates an example of the control flow graph constructed from an assembly code snippet, in which each vertex corresponds to an instruction and a directed edge shows the execution path from an instruction to the other. However, their AST structures have many differences including the positions of subtrees of two functions main and sumtoN, the separation of declaration and assignment of the variable n, and the function prototype of sumtoN in File 2.c.įig. Assume that if the statement i=1 in procedure sumtoN of File 2.c is changed to i+=1 then both programs are the same. Meanwhile, ASTs just describe the syntactic structures of programs, and they may contain many redundant branches. For example, two statements int n n=4 in the main procedure of File 2.c are treated in a similar way with the statement int n = 4 and, the if statement can be removed since the value of n is identified. The compiler applies many techniques to analyze and optimize the ASTs. Secondly, assembly code is refined because they are the products after AST processing of the compiler. 3) we observed that i+=1 and i=1 are translated into different instructions. Firstly, assembly code contains atomic instructions and their CFGs indicate step-by-step execution process of programs. Applying CFGs of assembly code is more beneficial than those of ASTs. Regarding this, each source code is converted into an execution flow graph by two stages: 1) compiling the source file to the assembly code, 2) generating the CFG from the compiled code. To explore more deeply programs’ semantics, this paper proposes combining precise graphs which represent execution flows of programs called Control Flow Graphs (CFGs), and a powerful graphical neural work. In other words, both metrics-based and tree-based approaches are not able to distinguish these programs. Similarly, parsing the procedures into ASTs using Pycparser 2 2 2, their ASTs are identical. To extract the traditional metrics, their feature vectors are exactly matching, since two procedures have the same lines of code, programming tokens, etc. As can be seen a bug from File 2.c, the statement i=1 causes an infinite loop of the for statement in case N>=1. Two procedures have a tiny difference at line 7 File 1.c and line 15 in File 2.c. For example, we consider the procedures with the same name sumtoN in two C files File 1.c and File 2.c (Fig. Therefore, both software metrics and AST features may not reveal many types of defects in programs. Meanwhile, ASTs do not show the execution process of programs instead, they simply represent the abstract syntactic structure of source code. However, defect characteristics are deeply hidden in programs’ semantics and they only cause unexpected output in specific conditions. Outperforms the baselines including several other deep learning approaches. TheĮxperiments on four real-world datasets show that our method significantly Source code we thereafter apply multi-view multi-layer directed graph-basedĬonvolutional neural networks (DGCNNs) to learn semantic features. Graphs are constructed from the assembly instructions obtained by compiling Networks for automatically learning defect features. Leverage precise graphs representing program execution flows, and deep neural To explore deeply programs' semantics, this paper proposes to The existing features and tree structures often fail to capture the semantics However, the performance of the models is not high since Using tree representations of programs, and exploiting different machine Predictive models, previous studies focus on manually extracting features or Existing defects in software components is unavoidable and leads to not onlyĪ waste of time and money but also many serious consequences.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |