This tutorial is available in English only. In this post, I explain how to use a Java program directly in R. As an example, I will use the Java program clustering.jar, available here (jar file and documentation), to cluster the vertices of my Facebook network (or, more precisely, of its largest connected component). The example dataset can be downloaded here (it was extracted as explained in this post found on R-bloggers). This tutorial was made possible thanks to the help of Damien (also known as bl0b), who explained to me how to use the rJava package.

This post will show you how to cluster a graph and how to display it according to the clustering:

I hope that all of my (Facebook) friends can find themselves in this picture and are happy with their group… 😉

Pre-requisites

  • What you need to use Java in R is, first of all, a working Java environment installed on your computer. If you are a Linux or a Mac OS X user, you can check this by running the following command in a terminal
    java -version

    which should give you something like

    java version "1.6.0_24"
    OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
    OpenJDK Server VM (build 20.0-b12, mixed mode)

    If you are a Windows user, well…, GIYF (but not me);

  • also, you need the R package rJava to be installed so that R can use the Java environment;
  • finally, if you want to be able to run my example, you also need the R package igraph to handle graphs in R.
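The prerequisites listed above can also be checked from within R. The following is only a minimal sketch: the variable names are mine, and it assumes that the java executable is expected to be on the PATH, which is not the only way a Java environment can be set up.

```r
# Packages used in this post
needed = c('rJava', 'igraph')

# requireNamespace() returns TRUE when the package can be loaded, FALSE otherwise
installed = vapply(needed, requireNamespace, logical(1), quietly = TRUE)

missing.pkgs = names(installed)[!installed]
if (length(missing.pkgs) > 0) {
  message('Missing packages: ', paste(missing.pkgs, collapse = ', '),
          ' (they can be installed with install.packages())')
}

# Sys.which() returns an empty string when the executable is not on the PATH
java.path = Sys.which('java')
if (!nzchar(java.path)) {
  message('No java executable was found on the PATH')
}
```

If both messages stay silent, the rest of the post should be reproducible on your machine.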

    How does it work?

    First, the function

    .jinit()

    is used to initialize the Java Virtual Machine. It has to be called before any other function of the package. Then,

    .jaddClassPath('clustering.jar')

    adds the jar file clustering.jar to the class path. Finally, the function J can be used to call a Java method. To see which Java class you have to pass to this function, you can list the contents of the jar file with the following command in a terminal (if you are a Linux or a Mac OS X user)

    jar -tf clustering.jar

    which gave me

    META-INF/MANIFEST.MF
    org/apiacoa/graph/clustering/DoCluster.class
    org/apiacoa/graph/clustering/GraphClusteringParameters.class
    org/apiacoa/graph/clustering/SignificanceMergePriorizer.class
    org/apiacoa/graph/clustering/MergePriorizer.class
    org/apiacoa/graph/Graph.class
    gnu/trove/TIntObjectHashMap.class
    ...

    giving me a clue (well, really, giving Damien a clue) that the main class might be called ‘org.apiacoa.graph.clustering.DoCluster’ (the path of the class file, with slashes replaced by dots and the .class extension dropped). Hence, I can use this jar file in R by

    J('org.apiacoa.graph.clustering.DoCluster', 'main', c(...))

    where c(...) is the character vector of parameters passed to the jar program, as described in the documentation of the program:

    J('org.apiacoa.graph.clustering.DoCluster', 'main', c('-graph', graph.file, '-part', tmp.part, '-recursive', '-mod', tmp.mod, '-random', '100'))

    for instance.
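The step from the jar listing to the class name passed to J can be automated. Here is a small sketch, using only base R; the helper name jar.entries.to.classes is hypothetical (it is not part of rJava), and the entries are copied from the listing above.

```r
# Turn jar entries such as 'org/apiacoa/graph/Graph.class' into the
# dotted class names that J() expects.
jar.entries.to.classes = function(entries) {
  # keep only compiled classes, ignoring the manifest and other resources
  classes = grep('\\.class$', entries, value = TRUE)
  # drop the '.class' suffix, then replace the path separators with dots
  gsub('/', '.', sub('\\.class$', '', classes), fixed = TRUE)
}

entries = c('META-INF/MANIFEST.MF',
            'org/apiacoa/graph/clustering/DoCluster.class',
            'org/apiacoa/graph/Graph.class')
class.names = jar.entries.to.classes(entries)
print(class.names)

# A jar file is a zip archive, so its entries can also be listed from R
# (only attempted if clustering.jar is in the working directory).
if (file.exists('clustering.jar')) {
  print(jar.entries.to.classes(unzip('clustering.jar', list = TRUE)$Name))
}
```

This does not tell you which class holds the main method, it only lists the candidates; the manifest or the documentation of the program remains the reliable source for that.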

    Finally, how to use it?

    In my case, the jar file takes as input a text file (containing the edge list of the graph: graph.file in the example above) and produces one or two text files (containing the clustering and the modularity values: tmp.part and tmp.mod, respectively, in the example above). So I used it as follows:

    • I extracted the list of edges using the function get.edgelist (igraph) and exported it to a text file (in the working directory);
    • I created one or two temporary file names using the function tempfile() to receive the results;
    • once the Java program had run, I read the temporary files from R and deleted them using the function unlink.

    This finally gave me the following function, which exposes most of the options of the initial jar file directly in R:

    ## Requires rJava, igraph
    library(rJava)
    library(igraph)
    do.hierarchical.clustering = function(a.graph, reduction=0.25, verbose=0, debug=0, random=NULL, recursive=FALSE, termination='significance', minsize=4, recrandom=50, weights=NULL) {
      if (is.null(weights)) {
        el = get.edgelist(a.graph)
      } else {
        el = data.frame(get.edgelist(a.graph),get.edge.attribute(a.graph,weights))
      }
      write.table(el,row.names=FALSE,col.names=FALSE,file='tmp.el.txt')
      tmp.part = tempfile()
    
      .jinit()
      .jaddClassPath('clustering.jar')
      if (is.null(random)) {
        if (recursive) {
           J('org.apiacoa.graph.clustering.DoCluster', 'main', c('-graph', 'tmp.el.txt', '-part', tmp.part, '-reduction', reduction, '-verbose', verbose, '-debug', debug, '-recursive', '-termination', termination, '-minsize', minsize, '-recrandom', recrandom))
        } else {
          J('org.apiacoa.graph.clustering.DoCluster', 'main', c('-graph', 'tmp.el.txt', '-part', tmp.part, '-reduction', reduction, '-verbose', verbose, '-debug', debug))
        }
      } else {
        tmp.mod = tempfile()
        if (recursive) {
           J('org.apiacoa.graph.clustering.DoCluster', 'main', c('-graph', 'tmp.el.txt', '-part', tmp.part, '-reduction', reduction, '-verbose', verbose, '-debug', debug, '-random', random, '-mod', tmp.mod, '-recursive', '-termination', termination, '-minsize', minsize, '-recrandom', recrandom))
        } else {
          J('org.apiacoa.graph.clustering.DoCluster', 'main', c('-graph', 'tmp.el.txt', '-part', tmp.part, '-reduction', reduction, '-verbose', verbose, '-debug', debug, '-random', random, '-mod', tmp.mod))
        }
      }
    
      mod = NULL
      part = read.table(tmp.part,row.names=1)
      part = part+1
      names(part) = paste('h',1:ncol(part),sep='')
      unlink(tmp.part)
      if (!is.null(random)) {
        mod = read.table(tmp.mod,stringsAsFactors=FALSE)
        unlink(tmp.mod)
        names(mod) = c('modularity','type')
      }
      unlink('tmp.el.txt')
      list('part'=part,'mod'=mod)
    }
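The four calls to J in the function above differ only in the optional arguments they append. As a possible variation (a sketch, not the way the function above is written), the argument vector can be built step by step and passed to a single call. The helper build.cluster.args and its parameter names graph.file, part.file and mod.file are hypothetical; the command-line flags are the ones used above.

```r
# Build the command-line arguments for DoCluster, in the same order as in
# the four J() calls of do.hierarchical.clustering.
build.cluster.args = function(graph.file, part.file, reduction = 0.25,
                              verbose = 0, debug = 0, random = NULL,
                              mod.file = NULL, recursive = FALSE,
                              termination = 'significance', minsize = 4,
                              recrandom = 50) {
  # arguments that are always present; c() coerces the numbers to strings
  args = c('-graph', graph.file, '-part', part.file,
           '-reduction', reduction, '-verbose', verbose, '-debug', debug)
  if (!is.null(random)) {
    # the significance test also needs a file to store the modularities
    stopifnot(!is.null(mod.file))
    args = c(args, '-random', random, '-mod', mod.file)
  }
  if (recursive) {
    args = c(args, '-recursive', '-termination', termination,
             '-minsize', minsize, '-recrandom', recrandom)
  }
  args
}

args = build.cluster.args('tmp.el.txt', 'part.txt', random = 100,
                          mod.file = 'mod.txt', recursive = TRUE)
print(args)
# With rJava initialised as above, the call would then be:
# J('org.apiacoa.graph.clustering.DoCluster', 'main', args)
```

Keeping the construction of the arguments in a plain R function also makes it possible to check them without starting the Java Virtual Machine.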

    This function can be used to cluster the vertices of my Facebook network (the igraph object is called fbnet in this Rdata file; it models an unweighted graph, so the argument weights of the R function must be left equal to NULL) by

    # basic clustering
    res1 = do.hierarchical.clustering(fbnet, verbose=1)
    # basic clustering with significance test
    res2 = do.hierarchical.clustering(fbnet, verbose=1, random=100)
    # hierarchical clustering with significance test (results in a hierarchy with two levels)
    res3 = do.hierarchical.clustering(fbnet, random=100, recursive=TRUE, recrandom=100)
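Each of these results is a list with a data frame part (one row per vertex, named after the vertex, and one column h1, h2, … per level) and, when random is used, a data frame mod. As a rough illustration of how part can be explored, here is a sketch on a toy data frame; the values are made up and only mimic the structure returned by the function, they are not taken from my network.

```r
# Toy stand-in for res3$part: six vertices and two columns (made-up values)
part = data.frame(h1 = c(1, 1, 1, 2, 2, 2),
                  h2 = c(1, 1, 2, 1, 2, 2),
                  row.names = paste0('v', 1:6))

# number of vertices in each cluster of the first column
sizes = table(part$h1)
print(sizes)

# names of the vertices in each cluster of the first column
members = split(rownames(part), part$h1)
print(members)
```

The same two lines applied to res3$part give the sizes and the members of the clusters found on the real graph.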

    The last clustering can be interpreted by

    by(res3$mod$modularity,res3$mod$type,max)
    res3$mod$type: Original
    [1] 0.5307591
    ------------------------------------------------------------------------------------- 
    res3$mod$type: Random
    [1] 0.2525655

    (showing that the clustering is actually significant compared to random graphs with a similar degree distribution: a modularity of 0.53 against at most 0.25) and

    library(RColorBrewer)
    my.pal = brewer.pal(8,"Set2")
    par(mar=rep(0,4))
    # cluster of each vertex (first column of res3$part), in the order of the vertices of fbnet
    vertex.clusters = res3$part[match(V(fbnet)$name,rownames(res3$part)),1]
    plot(fbnet,layout=layout.fruchterman.reingold, vertex.size=5, vertex.color=my.pal[vertex.clusters], vertex.frame.color=my.pal[vertex.clusters], vertex.label=V(fbnet)$initial, vertex.label.color="black", vertex.label.cex=0.7)

    which displays the graph as shown at the beginning of this post.
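The colouring used in the plot relies on match to align the rows of res3$part with the vertices of the graph. The sketch below isolates that step in a hypothetical helper, cluster.colors, and adds one variation that is not in the code above: the palette is recycled when there are more clusters than colours, instead of producing missing colours. The data are toy values.

```r
# Colour of each vertex, given the clustering and a palette.
cluster.colors = function(vertex.names, part, palette, level = 1) {
  # row of 'part' describing each vertex, in the order used by the graph
  idx = match(vertex.names, rownames(part))
  clusters = part[idx, level]
  # recycle the palette when there are more clusters than colours
  palette[(clusters - 1) %% length(palette) + 1]
}

# Toy clustering with four clusters but only three colours (made-up values)
part = data.frame(h1 = c(1, 2, 3, 4), row.names = paste0('v', 1:4))
cols = cluster.colors(c('v3', 'v1', 'v4'), part, c('red', 'green', 'blue'))
print(cols)
```

With the objects of this post, the call would be cluster.colors(V(fbnet)$name, res3$part, my.pal), and its result could be passed to both vertex.color and vertex.frame.color.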
