... for everything else, there are publication databases

MASTODONS DECADE - "Découverte et exploitation des connaissances pour l'aide à la décision en chimie thérapeutique" (Discovery and exploitation of knowledge for decision support in medicinal chemistry)

The objective of the DECADE project, financed in the context of the AAP MASTODONS 2017 (CNRS), is to develop a system that can exploit both knowledge derived from data and knowledge from domain experts in pharmaceutical chemistry. This requires studying new approaches to data mining, such as instantaneous pattern discovery, learning user preferences and constraints, and integrating those preferences into the mining process.

The concrete test bed for the approach is the identification and characterization of PAINS (Pan-Assay Interference Compounds), but the overall goal is to develop general approaches that can be applied in other problem domains as well.


The consortium brings together a number of French computer science labs: GREYC (Caen) - project leader, IRISA (Rennes), LI (Blois), LIFO (Orléans), LIRIS (Lyon), LORIA (Nancy), as well as two research institutes in pharmaceutical chemistry: CERMN (Caen) and ICOA (Orléans).


During the first year, we addressed the PAINS prediction and characterization problem, using a recent data set from the biochemical literature. The mined fragments and decision tree predictors have been integrated into a tool that makes predictions for new compounds and visualizes the fragments involved in each prediction.
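To make the idea of a fragment-based decision tree with visible reasoning more concrete, here is a minimal Python sketch. It is not the PrePeP implementation: the tree structure, the `predict` function, and the fragment names `frag_A` and `frag_B` are all hypothetical, and a compound is simply assumed to be represented by the set of fragments it contains.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted class at this leaf


@dataclass
class Node:
    fragment: str    # fragment whose presence is tested (hypothetical name)
    present: "Tree"  # subtree followed when the fragment occurs in the compound
    absent: "Tree"   # subtree followed otherwise


Tree = Union[Leaf, Node]


def predict(tree, fragments):
    """Return (label, path) for a compound given as a set of fragment names.

    `path` lists the (fragment, was_present) tests taken on the way to the
    leaf, i.e. the fragments involved in the prediction.
    """
    path = []
    while isinstance(tree, Node):
        hit = tree.fragment in fragments
        path.append((tree.fragment, hit))
        tree = tree.present if hit else tree.absent
    return tree.label, path


# Toy tree, written by hand purely for illustration (not learned from data).
toy_tree = Node(
    "frag_A",
    present=Leaf("PAINS"),
    absent=Node("frag_B", present=Leaf("PAINS"), absent=Leaf("non-PAINS")),
)

label, path = predict(toy_tree, {"frag_B", "frag_C"})
print(label, path)
```

Returning the path alongside the label is what would let a tool highlight the fragments that drove a given prediction.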

The abstract of the poster presenting the approach at the 8es journées de la Société Française de Chémoinformatique (SFCi2017).

The PrePeP prototype for download.

Building this tool requires addressing a number of different challenges: sub-sampling the very imbalanced data set, performing unsupervised stratified data sampling, reducing a very large chemical descriptor space, and efficiently finding descriptor combinations (conjunctive subgroup descriptions or subgraphs) that are typical for (sub)classes, all while reducing redundancy among them.
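Two of these challenges, sub-sampling an imbalanced data set and shrinking the descriptor space, can be illustrated with a small Python sketch. It uses generic techniques (random under-sampling of the majority class and a simple support filter on binary descriptors) chosen for illustration; they are not claimed to be the methods used in the project. The function names, parameters, and toy data are assumptions.

```python
import random
from collections import defaultdict


def undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly drop rows so that no class keeps more than
    ratio * (size of the smallest class) rows."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    smallest = min(len(ix) for ix in by_class.values())
    cap = max(1, int(ratio * smallest))
    keep = []
    for ix in by_class.values():
        keep.extend(ix if len(ix) <= cap else rng.sample(ix, cap))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def drop_uninformative(rows, min_support=2):
    """Keep binary descriptor columns that are set in at least `min_support`
    rows but not in every row; return the reduced rows and kept column indices."""
    n_cols = len(rows[0])
    support = [sum(r[j] for r in rows) for j in range(n_cols)]
    kept = [j for j in range(n_cols) if min_support <= support[j] < len(rows)]
    return [[r[j] for j in kept] for r in rows], kept


# Toy data: 6 compounds, 3 binary descriptors, 4 negatives and 2 positives.
rows = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0], [1, 1, 1]]
labels = [0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = undersample(rows, labels)
reduced_rows, kept_columns = drop_uninformative(rows, min_support=2)
```

In this toy example the first descriptor is set for every compound, so it carries no information and is removed, while under-sampling leaves two compounds per class.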

Not all of those capabilities are present in PrePeP yet (hence a prototype, which will evolve during the lifetime of the project and possibly afterwards), but the consortium members have worked, and continue to work, on techniques that will be incorporated into PrePeP.

Meetings organized in the context of DECADE

Mini-symposium 08/11/2017 - 10/11/2017 -- LIRIS, Lyon, building Blaise Pascal

Wednesday, 08/11/17

Thursday, 09/11/17

Friday, 10/11/17

Meeting 29/08/2017 -- INRIA Paris, 2 Rue Simone Iff, 75012 Paris
  • 10h00 Maksim Koptelov: A method for finding substructural alerts for frequent hitters, and how to use them for prediction
  • 11h00 Esther Galbrun: Redescription mining for relating numerical and structural descriptors
  • 12h00 Lunch
  • 13h30 Discussion of how to proceed scientifically (sampling etc.)
  • 15h00 Discussion of how to proceed logistically (symposium etc.)

Kick-Off, May 17/18, Orléans, LIFO - Bâtiment IIIA, Rue Léonard de Vinci, B.P. 6759, F-45067 ORLEANS Cedex 2, 1st floor, communication area
Information on how to get to LIFO is here

Wednesday, 17/05/17

  • 12h30: lunch at the RU l'Agora
  • 14h00: welcome
  • 14h15: introduction, some remarks regarding Phantom PAINS: Problems with the Utility of Alerts for Pan-Assay INterference CompoundS (A. Zimmermann)
  • 15h00: a short description of the available data (P. Bonnet)
  • 15h15 - 18h00: free-flow discussion with a focus on data acquisition, selection, preparation

Dinner at the restaurant l'Ardoise (map), 20h00

Thursday, 18/05/17

  • 9h00: Caractérisation interactive de classes dans des données non-étiquetées par échantillonnage de motifs [Interactive characterization of classes in unlabeled data by pattern sampling] (A. Giacometti)
  • 11h00: Two contributions to Humans in the loop in constrained clustering (Christel Vrain)
  • 12h15: Round-up
  • 12h30: lunch at the RU l'Agora for those who wish