Scraping the Mathematics Genealogy Project with R

Posted in R and tagged genealogy, igraph, R, XML on Sep 13, 2013

Having had a look at Erwan’s mathematical genealogy tree, I wanted a full overview of my own mathematical ancestors as well. This is fairly easy with the Mathematics Genealogy Project, at least:

once you have found one of your mathematical ancestors in this database (I am lucky enough to have Henri Caussinus as my mathematics grandpa, and this is lucky not only because, thanks to him, I was able to build my mathematics genealogy tree, but also because he is one of the most optimistic men you could dream of and a great philosopher with a lovely accent from Toulouse);
(Video: Portrait d'Henri Caussinus, mathématicien… by Universite_de_Toulouse)
if you know how to scrape this web site automatically. I explain in this post how to do that with R.

Scraping the Mathematics Genealogy Project

First, I scraped the Mathematics Genealogy Project using the R package XML. Basically, I started from Henri Caussinus’s MGP id (157063) and proceeded recursively to:

search for the person’s name on the current page;
find the number of advisors he/she (actually, always he…) had, carefully handling all the cases (this was the trickiest part because several different cases have to be taken into account: persons with no advisor, persons with several advisors or with several theses…);
extract the first MGP ids found on the web page, as many as there are supervisors: these give access to the supervisors’ web pages.

The process stopped when no more advisors were found on the visited web pages, and I just had to add myself and my mathematics daddy (Louis Ferré) at the beginning of the data.

## MGP scraping
library(XML)

base.url = "http://genealogy.math.ndsu.nodak.edu/id.php?id="
start.id  = 157063
genealogy = data.frame("id"=1,"mgp.id"=start.id,"name"=NA)
supervise = NULL

cur.person = 1
while (any(is.na(genealogy$name))) {
  print(cur.person)
  # create the url for the current person and extract the data from the web
  cur.mgp.id = genealogy$mgp.id[cur.person]
  cur.url = paste(base.url,cur.mgp.id,sep="")
  cur.page = htmlTreeParse(cur.url, useInternalNodes = TRUE, encoding="utf-8")
  
  # search for the mathematician's name
  cur.name = xpathSApply(cur.page, "//h2", xmlValue)[1]
  # remove line breaks and surrounding white space
  cur.name = gsub("^\\s+|\\s+$","",gsub("\n","",cur.name))
  print(cur.name)
  genealogy$name[genealogy$id==cur.person] = cur.name
  
  # search for the number of supervisors
  nbadv = grep("Advisor",xpathApply(cur.page,"//p",xmlValue),value=TRUE)
  countadv = sum(sapply(nbadv, function(x) sum(unlist(sapply(1:10,function(ind)
                                  grep(as.character(ind),x))))))
  if (countadv==0) {
    if (length(grep("Unknown",nbadv))==0) countadv = 1
  }
  
  # search for the supervisors ids
  if (countadv>0) {
    advisors = xpathSApply(cur.page, "//a[contains(@href, 'id.php?id')]",
                            xmlGetAttr, "href")
    advisors = advisors[1:countadv]
    # keep only the id that follows "id.php?id=" in each link
    all.ids = unname(sapply(advisors,function(x) sub("^.*id\\.php\\?id=","",x)))
    # removed already existing supervisors
    existing.ids = all.ids[all.ids%in%genealogy[,2]]
    all.ids = setdiff(all.ids,existing.ids)
    if (length(all.ids)>0) {
      adv.data = data.frame("id"=seq(max(genealogy$id)+1,
                                      max(genealogy$id)+length(all.ids),by=1),
                             "mgp.id"=all.ids,"name"=rep(NA,length(all.ids)))
      # update supervise
      supervise = rbind(supervise,cbind(rep(cur.person,countadv),
                                         c(genealogy$id[match(existing.ids,
                                                              genealogy$mgp.id)],
                                           seq(max(genealogy$id)+1,
                                               max(genealogy$id)+length(all.ids),
                                              by=1))))
      # add new advisors
      genealogy = rbind(genealogy,adv.data)
    } else {
      # if no new advisors, just update supervise
      supervise = rbind(supervise,cbind(rep(cur.person,countadv),
                                         genealogy$id[match(existing.ids,
                                                              genealogy$mgp.id)]))
    }
  }
  
  cur.person = cur.person+1
}


save(supervise,genealogy,file="mygenealogy.rda")

# add Louis and me at the beginning of the list
genealogy$id = genealogy$id+2
supervise = supervise+2
starting.persons = data.frame("id"=c(1,2),"mgp.id"=c(NA,NA),
                               name=c("Nathalie Vialaneix","Louis Ferré"))
genealogy = rbind(starting.persons,genealogy)
supervise = rbind(rbind(c(1,2),c(2,3)),supervise)
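
The id extraction step can be checked on its own, away from the web site. Here is a minimal sketch, assuming that the links to persons' pages look like `id.php?id=<digits>`; the helper name `extract_mgp_ids` and the sample links are made up for the illustration and are not part of the script above.

```r
# Hypothetical helper (not in the original script): extract the numeric
# MGP ids from hrefs of the form "id.php?id=12345".
extract_mgp_ids <- function(hrefs) {
  # keep only the links that really point to a person's page
  hrefs <- hrefs[grepl("id\\.php\\?id=[0-9]+", hrefs)]
  # keep the digits that follow "id=", drop everything else
  ids <- sub("^.*id\\.php\\?id=([0-9]+).*$", "\\1", hrefs)
  unique(ids)
}

# made-up links, for illustration only
hrefs <- c("id.php?id=108295", "id.php?id=17865", "index.php",
           "id.php?id=108295")
ids <- extract_mgp_ids(hrefs)
print(ids)
```

The ids are kept as character strings, so that they can be compared directly with the mgp.id column whatever its type.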

Using data to display your genealogy tree

Then, I used the R package igraph to build an igraph object from the collected data (persons’ names and supervising relations). None of the layouts available in the package was suited to displaying a genealogy tree, so I tried to work one out by myself. The result was pretty messy and did not strictly respect the genealogical order, so I gave up in this direction and simply exported the igraph object to a dot file (using the names as “label” attributes):

# define the tree as an igraph object
library(igraph)
genealogy.tree = graph.data.frame(cbind(supervise[,2],supervise[,1]),
                                  directed=TRUE,vertices=genealogy)
save(supervise,genealogy,genealogy.tree,file="mynewgenealogy.rda")

# export it for graphviz
V(genealogy.tree)$label = V(genealogy.tree)$name
write.graph(genealogy.tree,file="genealogyTree.dot",format="dot")

The resulting file can be read by Graphviz:

dot -Tpng genealogyTree.dot > genealogyTree.png
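
If igraph is not available, a dot file of the same kind can probably be written by hand. The sketch below is only an illustration in base R, not what I actually used: the helper `write_dot` and the toy data are hypothetical, and it assumes that names contain no double quotes.

```r
# Hypothetical helper: write a supervision graph to a dot file without igraph.
# 'people' needs the columns id and name; 'edges' has one row per
# (student id, advisor id) pair, like the 'supervise' matrix.
write_dot <- function(people, edges, file) {
  nodes <- sprintf('  %d [label="%s"];', people$id, people$name)
  # arrows go from the advisor to the student
  arcs <- sprintf("  %d -> %d;", edges[, 2], edges[, 1])
  writeLines(c("digraph genealogy {", nodes, arcs, "}"), file)
}

# toy data, for illustration only
people <- data.frame(id = 1:3,
                     name = c("Student", "Advisor", "Grand-advisor"),
                     stringsAsFactors = FALSE)
edges <- rbind(c(1, 2), c(2, 3))
dot.file <- tempfile(fileext = ".dot")
write_dot(people, edges, dot.file)
dot.lines <- readLines(dot.file)
print(dot.lines)
```

The edge direction (advisor towards student) matches the one used when building the igraph object, so Graphviz places the oldest ancestors at the top.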

So… there it is

Among my prestigious mathematics ancestors are Émile Borel, Pierre Simon de Laplace, Siméon Denis Poisson, Joseph Louis Lagrange, Leonhard Euler (but of course not Évariste Galois, with whom I have been secretly in love for years)… not that I can compare in any way to them!

This post is dedicated to my real grandpa, whose life has always nourished my political opinions and who passed away last spring.

R web user interface with shiny

Posted in R and tagged package R, R, shiny on Aug 17, 2013

This post follows this post and this other post, which described, respectively, how to create a graphical user interface with R and the package RGtk2, and how to build an R package that includes this graphical user interface. Here, I reuse the previous functions (which aim at helping lazy students do easily what their teachers want: the package provides basic statistics for all variables of a chosen dataset supplied as a ‘csv’ file) to create a new version of the original wnaetw package. This version now includes a web user interface built with the R package shiny.

Update! The application is now hosted on rstudio server and can be directly used online here (since I wrote this post, I have also added boxplots and a button to download the data file and the results; also, I updated the code in July 2013 to fit the new syntax of shiny). You can have a look at the code on my github

git clone https://github.com/tuxette/wnaetw.git

I thank Nicolas, who pointed the package out to me.

Foreword…

I programmed and documented several online applications with shiny. The ones that are currently running are:

the toy example ‘wnaetw’ which is used to provide simple statistics on a given dataset. It can be tested with this file (that contains basic statistics on my former first year students) and the source code is provided in this post as well as on github, using:
```
git clone https://github.com/tuxette/wnaetw.git
```
a shiny application to visualize and download your facebook data. The source code is provided on github, using:
```
git clone https://github.com/tuxette/fbs.git
```
the more serious ‘NiLeDAM’ WUI, which is a graphical user interface for the NiLeDAM package, currently hosted on R-Forge. This interface is dedicated to geologists to help them date monazites from electron microprobing. The source code is provided on github, using:
```
git clone https://github.com/tuxette/niledam.git
```
an application to illustrate some important concepts of my course M1102 (univariate descriptive statistics, DUT STID): view application. The source code is provided on github, using:
```
git clone https://github.com/tuxette/m1102.git
```

Installing shiny

First, you have to install shiny. As the package is not an official CRAN package, you first have to add the rstudio repository to your repository list. In R, run the following command line:

options(repos=c(RStudio='http://rstudio.org/_packages', getOption('repos')))

install.packages('shiny')

and choose the first repository of the list (its name starts with ‘0’).

Preparing the shiny interface

In the directory inst of your package source, create a subdirectory shiny-WUI where you will create two files ui.R and server.R. The ui.R file contains the description of the user interface. Mine is simply

# Set the working directory and list all csv files inside
# Define UI for dataset viewer application
shinyUI(pageWithSidebar(
  # Application title
  headerPanel("Welcome lazy student! You're currently using the program
              'What Nicolas's teacher wants' "),
  
  # Left hand side panel
  sidebarPanel(
    h2("Data importation"),
    
    # Button to import data
    fileInput('file1', 'Choose CSV/TXT File',
              accept=c('text/csv', 'text/comma-separated-values,text/plain')),
    # Various checkboxes and input fields to specify the data file format
    checkboxInput('header', ' Header?', TRUE),
    checkboxInput('rownames', ' Row names?', FALSE),
    selectInput('sep', 'Separator:',
                c(Comma=',',Semicolon=';',Tab='\t', Space=' '), 'Comma'),
    selectInput('quote', 'Quote:',
                c(None='','Double Quote'='"','Single Quote'="'"),
                 'Double Quote'),
    selectInput('dec', 'Decimal mark', c(Dot='.', Comma=','), 'Dot'),
    br(),
    # Simple texts
    p(HTML("Get the sources on github: <code>git clone
           https://github.com/tuxette/wnaetw.git</code>")),
    p(HTML("This application is kindly provided by
           Natty with the generous help of
           Nicholas,
           Arthur P. and
           John ;-). It is distributed under
           the licence WTFPL."))
  ),

  # Main panel (on the right hand side)
  mainPanel(
    tabsetPanel(
      tabPanel("Data",
               h3("Basic user guide"),
               p(HTML("To run the application, import your data set using the 
                       import button on the left hand side panel. Your data must
                       be supplied in the form of a text/csv file. An example
                       of a properly formatted file is provided <a href=
'http://owncloud.nathalievialaneix.eu/apps/files_sharing/get.php?token=a4ccfca90d9c7928ceb6153929d4212bd90badc5'
                       >here</a> (it contains simple data on my former 
                       first-year students): this file is formatted using the 
                       default options of the left panel. If the importation is
                       done properly, the data are displayed on the main panel
                       and analyzed on the two other panels.")),
               p(HTML("Warning! 'wnaetw' is a free program provided
                       without any guarantee: please note that it does not
                       replace your brain. In particular, the dev team is not
                       responsible if a lazy student is not able to interpret
                       the function's outputs properly!!! (and if he thinks
                       that an average zip code is somehow informative...)")),
               br(),
               h3("The dataset you want to use is displayed below:"),
               p("(only the first 50 rows if the dataset contains more
                  than 50 rows, and the first 10 columns if the dataset contains
                  more than 10 columns)"),
               tableOutput("view")
      ),
      
      # Numerical summary of the dataset,
      # coming from the function output$summary in server.R
      tabPanel("Summary",downloadButton('downloadSummary', 'Download Summary'),
               br(),br(),tableOutput("summary")),
      
      # Graphic
      # coming from the function output$boxplots in server.R
      tabPanel("Boxplots",
               textInput("main",strong("Graphic title:"), "Boxplots"),
               textInput("xlab",strong("X axis label:"), "Variables"),
               textInput("ylab",strong("Y axis label:"), ""),
               textInput("color","Color:","pink"),
               checkboxInput(inputId = "scale", label = " Scale variables?",
                             value = TRUE),
               plotOutput("boxplots")
      )
  ))
))

The server.R file contains the processing that is run when the user acts on the interface; it sends the output$... results back to ui.R. Mine is:

source("scripts/scripts.R")
library(e1071)
library(ineq)

shinyServer(function(input, output) {
  # Function that imports the data file
  dInput = reactive({
    in.file = input$file1
    
    if (is.null(in.file))
      return(NULL)
    
    if (input$rownames) {
      read.table(in.file$datapath, header=input$header, sep=input$sep,
               quote=input$quote, row.names=1, dec=input$dec)
    } else {
      read.table(in.file$datapath, header=input$header, sep=input$sep,
                 quote=input$quote, dec=input$dec)
    }
  })
  
  # Function that renders the data file and passes it to ui.R
  output$view = renderTable({
    d.input = dInput()
    if (is.null(d.input)) return(NULL)
    # keep at most the first 10 columns and the first 50 rows
    if (ncol(d.input)>10) d.input = d.input[,1:10]
    head(d.input, n=50)
  })
  
  # Function that calculates the output sent to the main panel in ui.R
  output$summary = renderTable({
    d.input = dInput()
    # nothing to summarize until a file has been imported
    if (is.null(d.input)) return(NULL)
    t(apply.wmtw(d.input))
  })
  
  # Function that creates a download button
  output$downloadSummary = downloadHandler(
    filename = "summary.csv",
    content = function(file) {
      write.csv(apply.wmtw(dInput()), file)
  })

  # Function that makes a boxplot for the numeric variables in the data set
  output$boxplots = renderPlot({
    make.boxplot(dInput(),main=input$main,xlab=input$xlab,ylab=input$ylab,
                 scale=input$scale, col=input$color)
  })
})
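
The preview logic of output$view (at most 10 columns and 50 rows) can be tested outside shiny. Here is a minimal sketch in base R; the helper name `preview_data` and its arguments are hypothetical and do not exist in the package.

```r
# Hypothetical helper: truncate a data frame for display.
preview_data <- function(d, max.rows = 50, max.cols = 10) {
  # no file imported yet
  if (is.null(d)) return(NULL)
  # the comparison must be made on ncol(d), not inside ncol()
  if (ncol(d) > max.cols) d <- d[, seq_len(max.cols), drop = FALSE]
  head(d, n = max.rows)
}

# toy data: 60 rows and 12 columns
wide <- as.data.frame(matrix(0, nrow = 60, ncol = 12))
small <- preview_data(wide)
print(dim(small))

# a narrow dataset is returned unchanged
narrow <- preview_data(data.frame(a = 1:3))
print(dim(narrow))
```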

You can test your package in R by loading the shiny package and running the following command line:

runApp("PATH-TO-R-PACKAGE/inst/shiny-WUI/")

Your web browser should open with the following interface (in the image below, a dataset has already been selected and thus WhatMyTeacherWants has been used):

Finishing the package

To finish the package, you simply have to add in your R/ directory a function that enables the application to be started:

calculateWUI
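
The body of the function is not reproduced here. As a guess at what such a launcher could look like, assuming the application directory is inst/shiny-WUI as above (so that it is installed as shiny-WUI at the top level of the package):

```r
# A sketch only: the original function body is not shown in this post.
calculateWUI <- function() {
  if (!requireNamespace("shiny", quietly = TRUE))
    stop("the 'shiny' package is required to start the interface")
  # files placed in inst/ are installed at the top level of the package
  app.dir <- system.file("shiny-WUI", package = "wnaetw")
  if (app.dir == "")
    stop("shiny-WUI directory not found: is 'wnaetw' installed?")
  shiny::runApp(app.dir)
}
```

Looking the directory up with system.file means that the function does not depend on where the package has been installed.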

Make sure that this function is properly documented in the man/ directory with a corresponding .Rd file. Then, the package can be built and used. Do you want to test it? Simply download it below:

Note (July, 2013): I haven’t updated this package with the new shiny syntax yet. It is probably not working anymore, though I did not test it.

wnaetw: What Nicolas’s Teacher Wants

This package does what Nicolas’s teacher wants with numerical variables. It seems pretty clear with just the title.

Version: 2.0
Date: 2012-12-02
Author: Nicolas A. Edwards, Arthur Gomez, Jonathan Mahe and Nathalie Villa-Vialaneix
Maintainer: Nathalie Villa-Vialaneix
License: WTFPL (>=2.0)
Depends: e1071, ineq, graphics, stats

Downloads:

Package source: wnaetw_2.0.tar.gz
Windows binary: wnaetw_2.0.zip
Reference manual: wnaetw.pdf

Start R and run:

library(wnaetw)
calculateWUI()

Enjoy, lazy student 😉

Pros and Cons

In summary, to create a graphical user interface, which solution is the best: RGtk2 or shiny?

Pro shiny: shiny is much, much easier to use than RGtk2. Coding with shiny is straightforward. Also, shiny does not require the Gtk2 environment to be installed on your computer which makes it easier to use for Windows and Mac users at least.
Cons shiny: shiny ~~is not~~ is now an official CRAN package and thus can ~~not~~ be listed as a dependency for packages you want to put on CRAN (and thus, this has to be removed from the “cons” section). Also, the functionalities offered by shiny are far less flexible than the ones you have with RGtk2.

… make your choice!

Filtering viruses and spam on your mail server with amavis and spamassassin

Posted in Linux, Ubuntu serveur and tagged amavis, email, postfix, spam, spamassassin, ubuntu server, virus on Jul 26, 2013

How to configure a mail server running Ubuntu Server 12.04 so that e-mails are checked for viruses and spam.

This tutorial follows on from previous posts.

It shows how to use amavis, an e-mail content filter, clamav, a virus scanner, and spamassassin, a spam detector. It is largely inspired by Christoph Hass’s excellent tutorial, which you can read here for more technical details, and by this link for the spamassassin configuration.

Install the following packages:

sudo apt-get install amavisd-new spamassassin clamav-daemon lha arj unrar-free zoo \
  nomarch cpio lzop cabextract p7zip rpm

amavis is configured by editing the files in the directory /etc/amavis/conf.d/. Start by editing the file 50-user, adding or modifying the following lines:

$sa_spam_subject_tag="[MY-SPAM] ";
$sa_tag_level_deflt=3.0;
$sa_tag2_level_deflt=3.0;
$final_spam_destiny=D_PASS;
$spam_quarantine_to=undef;

The first line defines the tag that is added to e-mails detected as spam: a dedicated sieve rule is then enough to redirect them to a suitable folder. The second and third lines set the level (spamassassin score) from which amavis tags e-mails as spam, either in the headers only or in the subject. The fourth line says that messages detected as spam are still delivered to their recipients, with the extra tag added (other policies, which do not deliver the e-mail to its recipients, can be chosen). Finally, the last line defines where messages marked as spam are sent; the value “undef” leaves recipients free to set up their own e-mail handling rules and sends spam to no particular folder.
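
To make the role of the two levels concrete, here is a small illustration written in R. It is not amavis code: it only mimics the behaviour described above, under the assumption that a message is tagged as soon as its score is at or above the level. The function name `spam_tags` and its arguments are made up.

```r
# Illustration only (not amavis code): which tags a message would get
# for a given spamassassin score, with the levels of 50-user as defaults.
spam_tags <- function(score, tag.level = 3.0, tag2.level = 3.0,
                      subject.tag = "[MY-SPAM] ") {
  list(add.headers = score >= tag.level,    # spam information in the headers
       tag.subject = score >= tag2.level,   # message flagged as spam
       subject.prefix = if (score >= tag2.level) subject.tag else "")
}

# a clearly spammy message and a clean one
spam <- spam_tags(7.019)
ham <- spam_tags(1.2)
print(spam$subject.prefix)
print(ham$tag.subject)
```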

Then restart amavis and check that it is running with:

/etc/init.d/amavis restart
netstat -nap | grep 10024

which should answer:

tcp        0      0 127.0.0.1:10024         0.0.0.0:*               LISTEN      28222/amavisd

Communication between amavis and postfix

The next step is to make amavis and postfix talk to each other; start by declaring amavis as the content filter:

postconf -e content_filter=smtp-amavis:[127.0.0.1]:10024
postconf -e receive_override_options=no_address_mappings

These options are added to the file /etc/postfix/main.cf; the second line says that the content filter must see the original addresses of the e-mail rather than the results of redirections, virtual host handling, etc. Then edit the file /etc/postfix/master.cf to add:

smtp-amavis unix -      -       n     -       2  smtp
    -o smtp_data_done_timeout=1200
    -o smtp_send_xforward_command=yes
    -o disable_dns_lookups=yes
    -o max_use=20
127.0.0.1:10025 inet n  -       -     -       -  smtpd
    -o content_filter=
    -o local_recipient_maps=
    -o relay_recipient_maps=
    -o smtpd_restriction_classes=
    -o smtpd_delay_reject=no
    -o smtpd_client_restrictions=permit_mynetworks,reject
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o smtpd_data_restrictions=reject_unauth_pipelining
    -o smtpd_end_of_data_restrictions=
    -o mynetworks=127.0.0.0/8
    -o smtpd_error_sleep_time=0
    -o smtpd_soft_error_limit=1001
    -o smtpd_hard_error_limit=1000
    -o smtpd_client_connection_count_limit=0
    -o smtpd_client_connection_rate_limit=0
    -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks
    -o local_header_rewrite_clients=

then reload postfix:

sudo service postfix reload

FIX IT! How can amavis be configured so that it does not scan the content of outgoing messages?

Communication between amavis and clamav

To let amavis use clamav to filter viruses, the clamav user has to be added to the amavis group:

adduser clamav amavis
/etc/init.d/clamav-daemon restart

Configuring Spamassassin

amavis uses spamassassin by calling the spamassassin perl module directly. If you already have a training set of personal e-mails, spam and non-spam (preferably several thousand e-mails), it is recommended to use it to train spamassassin. To enable spamassassin, edit the file /etc/default/spamassassin and change the following line:

ENABLED=1

My e-mails are stored in the Maildir directory of my IMAP folder: spam in a .Junk directory and everything else in .INBOX.XXX directories; spamassassin is then trained with:

sa-learn --spam /home/mail/domain-name.org/tuxette/Maildir/.Junk
sa-learn --ham /home/mail/domain-name.org/tuxette/Maildir/.INBOX.*

To finish…

Restart everything:

/etc/init.d/amavis restart
/etc/init.d/clamav-daemon restart
/etc/init.d/spamassassin restart
/etc/init.d/postfix restart

The configuration has been taken into account if the file /var/log/mail.log contains the following lines:

Aug  5 07:59:01 hostname amavis[27182]: ANTI-VIRUS code      loaded
Aug  5 07:59:01 hostname amavis[27182]: ANTI-SPAM code       loaded
Aug  5 07:59:01 hostname amavis[27182]: ANTI-SPAM-SA code    loaded

The configuration can be tested by sending a sample file shipped with spamassassin:

sendmail tuxette@domain-name.org < /usr/share/doc/spamassassin/examples/sample-spam.txt

The e-mail received is tagged with [MY-SPAM] in the subject and contains information like this in its headers:

X-Virus-Scanned: Debian amavisd-new at domain-name.org
X-Spam-Flag: YES
X-Spam-Score: 7.019
X-Spam-Level: *******
X-Spam-Status: Yes, score=7.019 required=6.31 tests=[DKIM_SIGNED=0.1,
	DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_IMAGE_RATIO_04=0.61,
	HTML_MESSAGE=0.001, MIME_HTML_ONLY=1.105, SPF_PASS=-0.001,
	TVD_RCVD_SPACE_BRACKET=0.001, T_SURBL_MULTI1=0.01,
	T_URIBL_BLACK_OVERLAP=0.01, UNPARSEABLE_RELAY=0.001,
	URIBL_BLACK=1.775, URIBL_JP_SURBL=1.948, URIBL_WS_SURBL=1.659]
	autolearn=no
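
To see which tests contribute most to the score, the X-Spam-Status header can be parsed. Here is a minimal sketch in R, assuming the header has the layout shown above (folded over several lines, tests listed between square brackets); the helper `parse_spam_status` is hypothetical and the sample header is shortened.

```r
# Hypothetical helper: extract the score, the threshold and the individual
# test scores from an X-Spam-Status header given as a vector of lines.
parse_spam_status <- function(header) {
  # unfold the header: continuation lines start with white space
  flat <- gsub("[\r\n\t ]+", " ", paste(header, collapse = " "))
  score <- as.numeric(sub(".*score=(-?[0-9.]+).*", "\\1", flat))
  required <- as.numeric(sub(".*required=(-?[0-9.]+).*", "\\1", flat))
  # the list of tests sits between "tests=[" and "]"
  tests <- sub(".*tests=\\[([^]]*)\\].*", "\\1", flat)
  pairs <- strsplit(strsplit(tests, ",")[[1]], "=")
  values <- as.numeric(sapply(pairs, `[`, 2))
  names(values) <- gsub(" ", "", sapply(pairs, `[`, 1))
  list(score = score, required = required, tests = values)
}

# shortened version of the header above
header <- c("X-Spam-Status: Yes, score=7.019 required=6.31 tests=[DKIM_SIGNED=0.1,",
            "\tDKIM_VALID=-0.1, URIBL_BLACK=1.775]",
            "\tautolearn=no")
status <- parse_spam_status(header)
print(sort(status$tests, decreasing = TRUE))
```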

sexy.rgtk: an R package for programming RGtk2 GUI in a user-friendly manner

Posted in R and tagged GUI, package, R, RGtk2, sexy-rgtk on Jun 28, 2013

sexy.rgtk is an R package for easily programming RGtk2 GUIs. sexy.rgtk code can be mixed with RGtk2 code and is based on a function/option syntax that makes the code more intuitive.

Maintainer: Damien Leroux (Unité MIAT, INRA of Toulouse)
Authors: Damien Leroux, Nathalie Villa-Vialaneix

Useful links for `sexy.rgtk`

The package (beta version, no documentation) can be downloaded on the mulcyber forge (INRA Toulouse)
The package has also been presented at the 2èmes rencontres R, Lyon (France), on June 28th, 2013. The article (abstract) can be downloaded here. You can also have a look at the slides:

Using `sexy.rgtk`

If you want to use sexy.rgtk, you first need to install RGtk2 and thus a proper Gtk+ environment on your computer. This post gives indications on how to install RGtk2 on your system (Linux/Windows/Mac OS X). Then, you may want to have a look at the demo:

library("sexy.rgtk")
demo(wnaetw)

that will start the GUI described in this post (originally created with plain RGtk2 code). The demo will show you the sexy.rgtk code and a few comments.

Thank you to Nicolas (AE) for giving me the original idea of the GUI and also for having tested sexy.rgtk on Mac OS X.

Using the WordPress “Limit Login Attempts” plugin to feed a blacklist

Posted in Apache/PHP/MySQL, Lignes de commande, Linux, R (misc), Ubuntu serveur and tagged apache, limit login attempts, modsecurity, sécurité, ubuntu server, wordpress on Jun 2, 2013

This post explains how to combine “Limit Login Attempts” (a WordPress plugin) and Modsecurity (an apache module) to build a blacklist of IPs that are denied access to the whole apache server.

My WordPress blog is regularly targeted by fraudulent login attempts, and the Limit Login Attempts plugin lets me keep track of them. When an IP address is banned, I receive an e-mail like this one:

2 tentatives d'accès ont échouées (1 bloqué(s)) depuis l'adresse IP: XX.XX.XXX.XX
Dernière tentative de l'utilisateur: admin
L'adresse IP a été bloquée pour 9999 minutes

Usually there are a few lockouts per day, but I recently suffered an attack of about 1600 lockouts within two hours. This post explains how to use the information sent by e-mail to feed the blacklist set up in the Modsecurity module, as described in this post.

Retrieving the list of IPs to ban

I retrieve the list of IPs to ban directly from the directories of my IMAP server with the command:

grep "depuis l'adresse IP" bannedIP > all_blocked.txt

where bannedIP is the IMAP folder containing the e-mails sent by the WordPress plugin.

Merging the IPs to ban with the file `blacklist.txt`

The rest is done with R: it merges the list of addresses blocked by the WordPress plugin with the list of already blacklisted addresses, blacklist.txt, located in the directory /etc/modsecurity. To do so, the following script is run:

old.ips = read.table("blacklist.txt",stringsAsFactors=FALSE)
export.ips = unique(old.ips$V1)
emails.ips = read.table("all_blocked.txt",sep=" ",stringsAsFactors=FALSE)
emails.ips = unique(emails.ips$V5)
emails.ips = paste("/",emails.ips,"/",sep="")
export.ips = unique(union(export.ips,emails.ips))
write.table(export.ips,file="new_blacklist.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)

which creates a new file new_blacklist.txt containing all the IPs to blacklist, each surrounded by the “/” character as explained in this post. I then simply upload this file to the /etc/modsecurity directory on my server to replace blacklist.txt.
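
The script above relies on the IP being the fifth space-separated field of each line. A variant that does not depend on the position of the field is sketched below, assuming only that the addresses are IPv4; the helpers `extract_ips` and `merge_blacklist` are hypothetical, and the addresses are made up.

```r
# Hypothetical helper: pull every IPv4-looking address out of the lines.
extract_ips <- function(lines) {
  pattern <- "[0-9]{1,3}(\\.[0-9]{1,3}){3}"
  unique(unlist(regmatches(lines, gregexpr(pattern, lines))))
}

# Hypothetical helper: merge the new IPs, wrapped in "/", with the old list.
merge_blacklist <- function(old, lines) {
  new <- paste("/", extract_ips(lines), "/", sep = "")
  union(old, new)
}

# made-up lines and addresses, for illustration only
lines <- c("(1 lockout) depuis l'adresse IP: 192.0.2.10",
           "(2 lockouts) depuis l'adresse IP: 192.0.2.11",
           "(1 lockout) depuis l'adresse IP: 192.0.2.10")
blacklist <- merge_blacklist(c("/192.0.2.11/", "/198.51.100.7/"), lines)
print(blacklist)
```

In practice, the lines would come from readLines("all_blocked.txt") and the old list from blacklist.txt, and the result would be written with writeLines.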

tuxette-chix

a girly blog about linux and free software
