Application of Rewriting Term System for Source Code Analysis








Abstract (English)


A extensible source code manipulation framework on top TermWare rule system is introduced and its application to software system analysis is presented. Framework allows to build light-weighted formal models, focused on particular properties of a program to be analysed and does not need full computational semantics of the program.This allows building a number of useful practical approximations of computational semantics in inexpensive and efficient way.Two practical applications built on top of TermWare -- static analyzer for Java source code and a system for partival evaluation of Java sources are described.


Keywords: TermWare, rule-nased systems..



Rewriting rule based techniques are well known to be used in symbolic computation as well as in software system development. A particular class of rewriting rule systems consists of comprehensive ones such as Maude[1], Stratego[2] or APS[3] that have appeared as an alternative to logic programming systems, Jess[4], intended to use rewriting rules for construction of expert system, and ASF+SDF [5] for transformations of syntactical trees analysis accordingly. Tom [6] is extension of Java to manipulate tree structures. Reference feature of such systems is availability of the rules of representation of a subject domain objects as algebraic terms and universal capability of transforming these terms through the policies of applying production rules based on non-logic principles, often in a form of imperative sequences (or strategies) of rule applications. TermWare system has been recently developed and applied by the authors in different subject domains of software engineering [7], [8], [9]. It also belongs to comprehensive rewriting systems but differs from other systems of this class in both semantics of used facilities and technology of their implementation. TermWare language is not a universal programming language for writing full-fledged software system. Instead it is aimed to compose domain specific parts of the applications built into an applied system for implementing functions of interaction of the system with its environment. The semantic features of TermWare language also essentially differs from conventional rewriting systems: Usually rewriting rules programmer uses the formal model of some subject domain to implement classic “closed” computation paradigm in functional style where input/output operations and interactions are not represented explicitly and are rather “side effects” than regular events. In TermWare formalism includes not only set of rewriting rules, but interactions with environment that is represented as dynamic base of the facts. Thus, instead of hiding imperative style of programming in side effects TermWare offers some kind of immersion of imperative operations into logic of declarative program. The term system is open in the sense that it is interactive and its behavior is determined not only by input terms, but by state of environment.

The outline of the paper is as follows. In the next section the formal model of TermWare language and the system are presented. An example of TermWare application in semantic approximation while analyzing source code for defects is described in section 4. Concluding remarks are given in section 5.




The model of TermWare is constructed as algebra of terms by expressions of the form f(x1 ..., xn) with variables and basic data types similarly to other untyped functional programming languages. Some well-known functional names can be shortcuted with infix syntax (x+y instead plus(x,y)) Computation is defined by sets of the rewriting rules together with policy of their application. Unlike traditional rewriting systems, rewriting rule is not a pair x y, but four x [ in ] y [out], where x, y - input and output terms; in, out - input and output reactions. Semantics of rule can be imperative describer as "rewrite x to y under condition in having applied to environment operation out". Environment is defined as class in object oriented language (Java in our case), so in and out are terms which includes calls for methods of environment class.

The main entity of termware language is term system, which consists from of set of rules, environment, name and strategy. I.e. term system itself is defined as a term of the form System(name, ruleset, facts, strategy )

where name is a name of the system, ruleset is a set of rewriting rules with interaction, facts is a database of the facts of environment, strategy is an algorithm of ruleset application.

Thus, TermWare gives a direct mapping of logical model of interactions to the programming language having clones of modern means of support of inheriting and a hierarchical name system. Detailed description of TermWare semantics is presented in [7].

3.1. AST terms.

The first naive, but practical approach to source code analysis presents an abstract syntax tree (AST) as a term in TermWare system. This term can be manipulated by means of facilities provided by TermWare system. Such representation can be seen as some form of symbiosis of AST and algebraic terms with opportunities of performing well-defined operations expressed as declarative rewriting rules.

As an illustrative example consider the following Java code:


package x;
class X {
int x() { return 1; }
}


which corresponds to the following term:


CompilationUnit(
PackageDeclaration(Name(cons(Identifier("x"),NIL))),
TypeDeclaration(Modifiers(1,NIL),
ClassOrInterfaceDeclaration(
class ,Identifier("X") ,NIL,
ExtendsList(NIL),
ImplementsList(NIL),
ClassOrInterfaceBody(
cons(
ClassOrInterfaceBodyDeclaration(
Modifiers(0),
MethodDeclaration(NIL,
int
MethodDeclarator(
Identifier("x"),
FormalParameters(NIL))
,NIL,
Block(
cons(
ReturnStatement(
IntegerLiteral("1")),
NIL)))
),
NIL
)))))

To walk over such terms one can use rewriting rules. For example following rule from JavaChecker project [8] trigger warning messsage on empty catch blocks:

catch($x,Block(NIL)) -> true
[ violationDiscovered(EmptyCatchBlock,
"empty catch block",$x) ].

3.2. Model terms

As one can see from previous examples syntax only manipulations are quite limited: Computations are local and one can check only relations with terms which are situated 'near' each other. Symbol categorization is absent; for example, in the sentence this.x=x we can not distinct member variable in the left part from parameter (or local variable) in the right part of the sentence.

TermWare solution for this problem is providing 'right' form of source representation, which include not only syntax tree, but some global context access object(mark) which involve functionality of semantics categorization of symbols and navigation on relations between them in semantics scope of analyzed program. Note, that exists other approach implemented in stratego system[10]: so called dynamic rules, which can be created and removed in runtime in depends of scope of processing. Such approach can solve problem of context avaibility (but not navigation).

So, termware provide model representation, differs from AST by embedding some extra parameter – source context which provides metainformation about current expression (similar to Java modeling elements of API in JSR 269 [11]). Since TermWare defines mapping of Java classes into terms, we can use object API for access to such information still preserving declarative style of rules.

For this purpose for each term x we will define the model term model(x) in following way:

For example, the model term for X.java, mentioned in previous section looks like following:


ClassOrInterfaceModel(
Modifiers(1),class,Identifier("X"),
NIL,NIL,NIL,
ClassOrInterfaceBody(
cons(MethodModel(Modifiers(0), NIL,
TypeRef(int,
jobject(JavaPrimitiveTypeModel@179dce4)
),
Identifier("x"),
FormalParameters(NIL),
NIL,
cons(
ReturnStatementModel(
IntegerLiteral("1"),
jobject(JavaPlaceContext@19bb25a)),
NIL),
jobject(@1e58cb8)),
NIL)),
jobject(@179935d)
)

where JavaPlaceContext incapsulates API for resolving of symbol terms and accese to semantics model of analyzed program. In this way our environment is reach enough to build strict algorithms of abstract interpretation, up to partial evaluation.



As an example, let us look on semantic approximation for resource leaks detection. Resource leak is a phenomenon of requested resource from operation system and middleware that is not released thereafter. Let have a look at following sample code:


public static void main(String[] args) {
FileReader reader = null;
try {
reader = new FileReader(args[1]);
}catch(FileNotFoundException ex){
ex.printStackTrace();
return;
}
for(int i=0; i<10; ++i) {
int ch=0;
try {
ch=reader.read();
}catch(IOException ex){
ex.printStackTrace();
return;
}
System.out.print(ch);
}
if (reader!=null) {
try {
reader.close();
}catch(IOException ex){
ex.printStackTrace();
return;
}
System.out.print(ch);
}
if (reader!=null) {
try {
reader.close();
}catch(IOException ex){
}
}
}

Here if an exception is occurred inside for loop than FileReader reader remains unclosed.

For the purpose of leak determination we can abstract program state as sequence of variables, which are opened, but not closed in current scope. Then, for each language construction we can either generate a set of posssible states or consider it as an operation over a state. Let the state will be one of:

Final state is correct (i. e. program is executed without resource leaks) if the resulting state does not contains OPENED elements.


Now, let's map Java model term to the execution semantics. A class is considered correct if all initializers and methods of this class are correct. Block is just a sequence of statements, and the real work begins at the level of statements and expressions.

Among all expressions, the most important for resource leak detection are allocation expressions and method calls. We think, that allocation of varibale cause OPENED state, when we create object, which implements Java Closeable interfade. Let's assume, that we create a new Closeable object on method call, if method can be named as so called 'Factory Method'. With allocation expressions situation is more interesting. In general, not every unclosed instance of Closeable interface can cause resource leak: there may exist some 'fake' Closeables which must implement interface but does not use system resources, such as StringWriter. Yet one problem is so-called 'ProxyAllocation' when the code


PrintStream x =
new PrintStream(new FileStream(fname));

can cause resource leak, but the code


StringOutputStream ss =
new StringOutputStream();
PrintStream x = new PrintStream(ss);

not.


So, for resolving such problems, model of program must contains information not only from source code, but from some other information source which can not be deduced from code. For this purpose mechanism of external annotations has been created: it is possible to annotated classes and methods, does not touch existing source code.



CheckExpression($var,
AllocationExpressionModel(TypeRef($tname,$type),$arguments,$ctx),$state)
[
$ctx.subtypeOrSame(
$type,
$ctx.resolveFullClassName(
"java.io.Closeable"))
&&
!($type.getAttribute(
"NotCloseable")==true)
]
-> CheckAllocationNotProxy(
$var,$type,$arguments,
$ctx,$state)
!-> CheckExpressions($arguments,
$state),


CheckAllocationNotProxy($var,$type,
$arguments,$ctx,
$state)
->
((
!($type.getAttribute(
"CloseableProxy")==true)
||
apply(
openclose::CloseableArguments,
apply(NormalizeExpressions,
$arguments))
) )
?
cons(
OPEN($var,$ctx.fileLine)
,$state)
:
CheckExpressions(
$var,$arguments,$state),


This is a rules for allocation expression. At first, look at conditions: by call of subtypeOrSame we determine: if allocated type is inherited from java.lang.Closeable. Attribute NotCloseable is a special attribute which must be set for classes, safe to leave without call of close(), such as java.io.StringWriter(). If such condition are met then we check that allocation is not 'ProxyAllocation' and add OPENED(x) to list of states

As example of other kind of rule --- rule for processing if statement:


CheckStatement(
IfStatementModel($expr,$s1,$s2,$ctx),
$state) ->
let ($evstate <- $state )


cond(
$expr,
CheckStatement($s1,
CheckExpression(NONE,$expr,$evstate))
&&
cond(not($expr),
CheckStatement($s2,
CheckExpression(NONE,
$expr,$evstate)))).

As one can see, this rules looks like a part of interpretation framework. At some level of abstraction, tracking resources can be described as abstract interpretation over states.


Now, our abstract state can be simplified, when we calls close method. This defines in next rule fragment:


CheckExpression($var,
MethodCallExpressionModel($obj,
$methodModel,
$arguments,$ctx),
$state)
.....
[
$ctx.subtypeOrSame(
ModelHelper.getType($obj),
$ctx.resolveFullClassName(
"java.io.Closeable")
)
&&
$methodModel.getName()=="close"
]
->
merge(CHANGE($obj,CLOSE),$state)
....

Where merge is operator on abstract state, defined by set of rules, which perform actual simplification of program state.


merge(CHANGE($x,CLOSE),[]) -> [ ],
merge(CHANGE($x,CLOSE),[$y:$z])
-> [checkOpenClose($x,$y),
merge(CHANGE($x,CLOSE),$z)]
and so on.

Specialization via partial evaluation

As one more application, consider a specialization of Java programs using simplified form of partial evaluations. This is natural way of extending previously described functionality, only one new element --- pretty printer for Java terms. Specialization of partial evaluation can be described as a process of mapping some Java program A and set of constants C to Java program mix(A,C), which work exactly as A on all datasets, which contains C. This is some

Typical rules are just recursive downhill of partial evaluation marker:


JPE(ClassOrInterfaceModel(
$modifiers,
$type,
$name,
$typeParameters,
$extendsList,
$implementsList,
$body,
$context))
->
ClassOrInterfaceModel(
$modifiers,
$type,
$name,
$typeParameters,
$extendsList,
$implementsList,
JPE($body),
$context),

JPE(ClassOrInterfaceBody($x))
-> ClassOrInterfaceBody(JPE($x)),

JPE([]) -> [],
JPE(cons($x,$y))
-> cons(JPE($x),JPE($y)),


When we change on expression level, JPE ruleset contains rules for performing constant computation:



JPE(AndExpressionModel($x,$y,$ctx))
->
AndExpressionModel(JPE($x),
JPE($y),$ctx),

AndExpressionModel(
BooleanLiteral(false),$y,$ctx)
->
BooleanLiteral(false),

AndExpressionModel($x,
BooleanLiteral(false),$ctx)
-> BooleanLiteral(false),

AndExpressionModel(BooleanLiteral(true),
$y,$ctx) -> $y,

AndExpressionModel($x,
BooleanLiteral(true),$ctx) -> $x,

And we must substitute variables from C by their values:


JPE(FieldModel($obj,$id,$fm,$ctx))
[ isJPEField($fm,$y) ] -> $y
|
[isStaticFinalLiteralField($fm,$y)
] -> $y
!-> FieldModel($obj,
$id,$fm,$ctx),

Process of partial evaluation implemented in TermWare transformation rules performs simplification of model terms and their transformation to syntactic AST terms. It can also be represented as abstract interpretation, where abstract states are specialized sources. All functionality is described by near 200 rules in less than 1000 lines of code.

Practical usage of partial evaluation is not only specialization but also optimization transformations which can be deduced from the fact that program works in specialized environment. Two main optimizations are method call devirtualization and elimination of unreachable code.

Elimination of unreachable code includes reachability analysis and removing classes and methods, unreachable from application entry point. This optimization is implemented as imperative algorithm, which runs after specialization.

Method call devirtualization is substitution of virtual method calls by non-virtual when program analysis shows that concrete type of method is statically known. In some cases such transformation can cause significant speedup of program execution. One of simple devirtualization strategies is just marking as final all classes which have no childs. Availability of childs can be checked by global source analysis. Note, that such optimization can not be performed by compiler or virtual machine, because compiler does not perform global analysis.

5. Conclusion

In this applications of term-rewriting framework TermWare to source code analysis and transformations have been considered. The framework is an example of real non-trivial application of term rewriting which is used in industrial strength [8],[9]. The combination of imperative and declarative approach appears more poverfull that declarative-only or imperative-only solution.

We believe that multiparadigm programming is the way to actually bring rewriting techniques to programming mainstream. Our future research efforts will be focused on better integration of object orientation, more precize model of java semantics (including stack emulation) and working with concerted models of multiple target languages to obtain multi-language transformation framework.



[1] Timothy Winkler, “Programming in OBJ and Maude”, in Functional Programming, Concurrency, Simulation and Automated Reasoning, International Lecture Series 1991--1992, Springer-Verlag, McMaster University, Hamilton, Ontario, Canada, 1993, pp. 229-277

[2] Visser E, “"Stratego: A language for program transformation based on rewriting strategies. System description of Stratego 0.5”, Rewriting Techniques and Applications (RTA 01) LNCS vol. 2051, Springer, 2001 pp. 357-361

[3] J.V.Kapitonova A.A.Letichevsky M.S.L'vov and V.A.Volkov. "Tools for solving problems in the scope of algebraic programming”, LNCS vol. 968, Springer-Verlag, 1995

[4] Ernest Friedman-Hill. "Jess in Action”, Manning Publications Co, 2003

[5] M. van den Brand and J. Heering and P. Klint and P. Olivier. ""Compiling Language Definitions: The {ASF}+{SDF} compiler”, University of Amsterdam, 1999

[6] Balland, Emilie and Brauner, Paul and Kopetz, Radu and Moreau, Pierre-Etienne and Reilles, “Tom: Piggybacking Rewriting on Java”, Proceedings of the 18th Conference on Rewriting Techniques”, {Springer-Verlag, 2007

[7] Anatoly E. Doroshenko and Ruslan Shevchenko. “A Rewriting Framework for Rule-Based Programming Dynamic Applications”, Fundam. Infom vol. 72, 2006, pp. 95-108

[8] Ruslan Shevchenko. “JavaChecker homepage: http://www.gradsoft.ua/products/javachecker_eng.html”, GradSoft

[9] Ruslan Shevchenko and Anatily Doroshenko. "Managing Business Logic with Symbolic Computations”,Lecture Notes in Informatics V. 30 - Proceeding of Information Systems Technology and its Applications 2003, Gesellschaft fur Informatik 2003

[10] Martin Bravenboer and Arthur van Dam and Karina Olmos and Eelco Visse. "Program Transformation with Scoped Dynamic Rewrite Rules”, Fundam. Inf. vol. 69 num 1-2, IOS Press, Amsterdam, The Netherlands, 2006

[11] JSR 269: Pluggable Annotation Processing API. , Java Community Process, Sun Microsystems,