Sunday 20 December 2015

Software analysis using JQAssistant

Last week, I was playing a bit with a very interesting tool: JQAssistant. JQAssistant parses and analyzes source files and loads them into a Neo4j graph database. By default, JQAssistant analyzes Java source code files, but it is built on a plugin-based architecture, so there are various plugins available. For example, there are plugins to analyze Maven files, XML files or JUnit test results. Hopefully, other language plugins will be added in the future as well. Especially C++ would be nice!
So, why would you want to have your source code in a graph database? Because this gives you a very powerful way to analyze your source code! JQAssistant creates a graph database with nodes and relations such as:
  • TYPE declares METHOD
  • TYPE declares FIELD
  • TYPE extends TYPE
  • TYPE implements TYPE
  • METHOD invokes METHOD
  • METHOD throws TYPE
  • METHOD reads FIELD
  • METHOD writes FIELD
  • METHOD annotated_by ANNOTATION
  • ...
Each of the nodes also has properties, such as the fully qualified name, visibility, the MD5 hash (for classes), or the signature and cyclomatic complexity (for methods). As you can see, a big graph with lots of information is created. This graph can now be queried for certain properties using Neo4j's powerful query language Cypher. Several examples that come to my mind are:
  • It can be used to enforce architecture guidelines, such as "frontend classes may only call service layer classes, but not backend classes".
  • Enforce naming guidelines, e.g. "all service classes must end in *Service or *ServiceImpl, and frontend classes in a certain package must be named *Model or *Controller".
  • Ensure that all unit tests call an Assert.* method - otherwise the test does not verify anything (a query sketch for this follows the rules file below).
  • What are the most complex methods in my code that are not called from unit tests? (See the query sketch right after this list.)
  • Analyze properties of your source code, e.g. the number of classes, methods, etc.
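To give a first impression of such a query, here is a sketch for the last two points. It assumes that JQAssistant stores the cyclomatic complexity in a property named cyclomaticComplexity on Method nodes and that JUnit test classes carry the labels Junit4, Test and Class (as in the rule shown further below) - please verify both against your JQAssistant version:
//
// Sketch: the 20 most complex methods that are not invoked directly by a unit test
// (assumes the metric property is called "cyclomaticComplexity")
//
MATCH
 (type:Type)-[:DECLARES]->(method:Method)
WHERE
 method.cyclomaticComplexity IS NOT NULL AND
 NOT (:Junit4:Test:Class)-[:DECLARES]->(:Method)-[:INVOKES]->(method)
RETURN
 type.fqn AS Type, method.signature AS Method,
 method.cyclomaticComplexity AS Complexity
ORDER BY
 Complexity DESC
LIMIT 20;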

It is important to note that there is a Maven plugin to execute JQAssistant during builds. This allows you to run JQAssistant queries as part of the build and, for example, lets you fail the build if certain architecture guidelines are not met.
We are already doing this for QualityCheck, even though we have only implemented a very simple check so far. This check verifies that all unit test classes match our name pattern ".*(Test|Test_.*)". Here is the relevant JQAssistant rules file my-rules.xml:
<jqa:jqassistant-rules xmlns:jqa="http://www.buschmais.com/jqassistant/core/analysis/rules/schema/v1.0">

    <constraint id="my-rules:TestClassName">
        <requiresConcept refId="junit4:TestClass" />
        <description>All JUnit test classes must match the name pattern ".*(Test|Test_.*)".</description>
        <cypher><![CDATA[
            MATCH
                (t:Junit4:Test:Class)
            WHERE NOT
                t.name =~ ".*(Test|Test_.*)"
            RETURN
                t AS InvalidTestClass
        ]]></cypher>
    </constraint>

    <group id="default">
        <includeConstraint refId="my-rules:TestClassName" />
    </group>

</jqa:jqassistant-rules>
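The same mechanism also covers the "all unit tests must call an Assert.* method" idea from the list above. Here is a possible sketch; it assumes that the junit4 plugin labels test methods with Junit4 and Test, which you should verify for your JQAssistant version:
//
// Sketch: find test methods that do not invoke any method of org.junit.Assert
// (assumes test methods carry the labels Junit4 and Test)
//
MATCH
 (assertType:Type),
 (testClass:Type)-[:DECLARES]->(testMethod:Junit4:Test:Method)
WHERE
 assertType.fqn = 'org.junit.Assert' AND
 NOT (testMethod)-[:INVOKES]->(:Method)<-[:DECLARES]-(assertType)
RETURN
 testClass.fqn AS TestClass, testMethod.name AS TestMethod;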
JQAssistant is even more useful for applications using QualityCheck! For example, QualityCheck encourages you to use the methods from Check.*, such as Check.notNull, in all public methods of your classes and to annotate these methods with @ArgumentsChecked. So, you could use JQAssistant to find methods that are annotated with @ArgumentsChecked but do not call any Check.* method:
//
// Find all methods having @ArgumentsChecked but not calling Check.* methods
//
MATCH
 (checkType:Type),
 (type:Type)-[:DECLARES]->(method:Method)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(checkAnnotationType:Type)
WHERE 
 checkType.fqn = 'net.sf.qualitycheck.Check' AND 
 method.visibility = 'public' AND
 checkAnnotationType.fqn='net.sf.qualitycheck.ArgumentsChecked' AND 
 NOT type.fqn =~ ".*(Test|Test_.*)" AND 
 NOT (method:Method)-[:INVOKES]->(:Method)<-[:DECLARES]-(checkType:Type)
RETURN
 method.signature AS Method, type.fqn as Type;
Additionally, the other way round should be checked, i.e. find all methods that call Check.* but do not have the annotation:
//
// Find methods calling Check.* but not having @ArgumentsChecked in all non-test classes
//
MATCH
 (checkType:Type)-[:DECLARES]->(checkMethod:Method),
 (type:Type)-[:DECLARES]->(method:Method)-[:INVOKES]->(checkMethod:Method),
 (checkAnnotationType:Type)
WHERE 
 checkType.fqn = 'net.sf.qualitycheck.Check' AND 
 method.visibility = 'public' AND
 checkAnnotationType.fqn='net.sf.qualitycheck.ArgumentsChecked' AND 
 NOT type.fqn =~ ".*(Test|Test_.*)" AND 
 NOT (method:Method)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(checkAnnotationType:Type)
RETURN
 checkMethod.name AS CHECK_METHOD_CALLED, method.signature AS Method, type.fqn as Type;
As a software architect, you could now create a third rule that defines which methods must use Check.* and must carry the annotation, e.g. all public methods in classes named *ServiceImpl.
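A sketch of such a rule could look like the following; the *ServiceImpl pattern is just an example and has to be adapted to your own naming conventions:
//
// Sketch: find public methods in *ServiceImpl classes that are not annotated with @ArgumentsChecked
//
MATCH
 (checkAnnotationType:Type),
 (type:Type)-[:DECLARES]->(method:Method)
WHERE
 checkAnnotationType.fqn = 'net.sf.qualitycheck.ArgumentsChecked' AND
 type.fqn =~ ".*ServiceImpl" AND
 method.visibility = 'public' AND
 NOT (method)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(checkAnnotationType)
RETURN
 method.signature AS Method, type.fqn AS Type;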
I hope this small introduction gave you an impression of the power of this tool. I would be happy to get some feedback on it, and I would also like to share some questions about its future:
  • Is C++ support planned? Is someone working on a C++ plugin to parse C++ code?
  • How does this perform with big systems? Has anyone used it on a larger system?
  • Is there a SonarQube integration? Can I show failed checks as violations or in charts in SonarQube?
  • What are your use cases and queries?

Thursday 22 October 2015

Visualize dependencies of binaries and libraries on Linux

Update 4: Albert Astals Cid mentioned that KDE also maintains a version of this script: draw_lib_dependencies

Update 3: Marco Nelissen fixed an issue that caused dependency resolution to break as soon as MAXDEPTH was reached once. This issue is fixed now, and I am quite happy that this old script is still useful and even gets improved. The updated version can also be found at dependencies.sh. The version below is fixed as well.

Update 2: pfee made some more fixes. The script now parses the dependency tree correctly using readelf and ldd, so that only direct dependencies appear in the graph. The updated version can also be found at dependencies.sh.


Update: Thanks to the feedback from pfee, I made some fixes to the script. The script is now also available for direct download: dependencies.sh


Sometimes it is useful to know the library dependencies of an application or a library on Linux (or Unix). Open source applications in particular depend on lots of libraries, which in turn depend on other libraries. So it is not always clear which dependencies your software has.


Imagine you want to package up your software for a customer and need to know on which libraries your software depends. Usually you know which libraries were used during development, but what are the dependencies of these libraries? You have to package all dependencies so that the customer can use and/or install your software.


I created a bash script which uses ldd to find the dependencies of a binary on Linux and Graphviz to create a dependency graph out of this information. Benedikt Hauptmann had the idea to show dependencies as a graph, so I cannot take credit for that. Using this script I created the dependency graph of TFORMer, the report generator we are developing at TEC-IT. The result is a nice graph showing all the dependencies a user has to have installed before using TFORMer.


Another beautiful graph is the one of PoDoFo, shown below.


The dependencies of Firefox are way more complex than the examples shown above...


If you want to create a graph of your favorite application or library yourself, get the script from here. I published the simple source code below. Graphviz is the only requirement. Usage is very simple: just pass an application or library as the first parameter and the output image as the second argument. The script will always create a PNG image:
./dependencies.sh /usr/bin/emacs emacs.png
./dependencies.sh /usr/local/lib/libpodofo.so \
                  podofo.png



The code of the script is as follows (warning: the style sheet cuts off some lines, so better download the script from dependencies.sh):


#!/bin/bash
 
# This is the maximum depth to which dependencies are resolved
MAXDEPTH=14
 
# Analyze a given file for its
# dependencies using readelf and ldd and write
# the results to a given temporary file
#
# Usage: analyze [OUTPUTFILE] [INPUTFILE]
function analyze
{
    local OUT=$1
    local IN=$2
    local NAME=$(basename $IN)
 
    for i in $LIST
    do
        if [ "$i" == "$NAME" ];
        then
            # This file was already parsed
            return
        fi
    done
    # Put the file in the list of all files
    LIST="$LIST $NAME"
 
    DEPTH=$[$DEPTH + 1]
    if [ $DEPTH -ge $MAXDEPTH ];
        then
        echo "MAXDEPTH of $MAXDEPTH reached at file $IN."
        echo "Continuing with next file..."
        # Fix by Marco Nelissen for the case that MAXDEPTH was reached
        DEPTH=$[$DEPTH - 1]
        return
    fi
 
    echo "Parsing file:              $IN"
 
    $READELF $IN &> $READELFTMPFILE
    ELFRET=$?
 
    if [ $ELFRET != 0 ];
        then
        echo "ERROR: ELF reader returned error code $ELFRET"
        echo "ERROR:"
        cat $READELFTMPFILE
        echo "Aborting..."
        rm $TMPFILE
        rm $READELFTMPFILE
        rm $LDDTMPFILE
        exit 1
    fi
 
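    # Extract the names of all directly needed libraries (NEEDED entries);
    # the last field may be wrapped in [brackets] depending on the ELF reader,
    # hence the substr handling in awk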
    DEPENDENCIES=$(cat $READELFTMPFILE | grep NEEDED | awk '{if (substr($NF,1,1) == "[") print substr($NF, 2, length($NF) - 2); else print $NF}')
 
    for DEP in $DEPENDENCIES;
    do
        if [ -n "$DEP" ];
        then
 
            ldd $IN &> $LDDTMPFILE
            LDDRET=$?
 
            if [ $LDDRET != 0 ];
                then
                echo "ERROR: ldd returned error code $LDDRET"
                echo "ERROR:"
                cat $LDDTMPFILE
                echo "Aborting..."
                rm $TMPFILE
                rm $READELFTMPFILE
                rm $LDDTMPFILE
                exit 1
            fi
 
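            # Resolve the library name to its full path from the ldd output
            # and recurse into that library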
            DEPPATH=$(grep $DEP $LDDTMPFILE | awk '{print $3}')
            if [ -n "$DEPPATH" ];
            then
                echo -e "  \"$NAME\" -> \"$DEP\";" >> $OUT
                analyze $OUT $DEPPATH
            fi
        fi
    done
 
    DEPTH=$[$DEPTH - 1]
}
########################################
# main                                 #
########################################

if [ $# != 2 ];
    then
    echo "Usage:"
    echo "  $0 [filename] [outputimage]"
    echo ""
    echo "This tools analyses a shared library or an executable"
    echo "and generates a dependency graph as an image."
    echo ""
    echo "GraphViz must be installed for this tool to work."
    echo ""
    exit 1
fi

DEPTH=0
INPUT=$1
OUTPUT=$2
TMPFILE=$(mktemp -t)
LDDTMPFILE=$(mktemp -t)
READELFTMPFILE=$(mktemp -t)
LIST=""

if [ ! -e $INPUT ];
    then
    echo "ERROR: File not found: $INPUT"
    echo "Aborting..."
    exit 2
fi

# Use either readelf or dump
# Linux has readelf, Solaris has dump
READELF=$(type readelf 2> /dev/null)
if [ $? != 0 ]; then
  READELF=$(type dump 2> /dev/null)
  if [ $? != 0 ]; then
    echo Unable to find ELF reader
    exit 1
  fi
  READELF="dump -Lv"
else
  READELF="readelf -d"
fi
 
 
 
echo "Analyzing dependencies of: $INPUT"
echo "Creating output as:        $OUTPUT"
echo ""
 
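# Write the dependency graph in Graphviz dot syntax: header, root node,
# edges appended by analyze(), closing brace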
echo "digraph DependencyTree {" > $TMPFILE
echo "  \"$(basename $INPUT)\" [shape=box];" >> $TMPFILE
analyze $TMPFILE "$INPUT"
echo "}" >> $TMPFILE
 #cat $TMPFILE # output generated dotfile for debugging purposses
dot -Tpng $TMPFILE -o$OUTPUT
 
rm $LDDTMPFILE
rm $READELFTMPFILE
rm $TMPFILE

exit 0