Mastodons DECADE - "Découverte et exploitation des connaissances pour l'aide à la décision en chimie thérapeutique"
The objective of the DECADE project, financed under the CNRS MASTODONS 2017 call for projects (AAP), is to develop a system that exploits both knowledge derived from data and knowledge from domain experts in pharmaceutical chemistry. This requires studying new approaches to data mining, such as instantaneous pattern discovery, learning user preferences and constraints, and integrating those preferences into the mining process. The concrete test bed of the approach is the identification and characterization of PAINS (Pan-Assay Interference Compounds), but the overall goal is to develop general approaches that can be applied in other problem domains as well.
Consortium
The consortium brings together a number of French computer science labs: GREYC (Caen) - project leader, IRISA (Rennes), LI (Blois), LIFO (Orléans), LIRIS (Lyon), LORIA (Nancy), as well as two research institutes in pharmaceutical chemistry: CERMN (Caen) and ICOA (Orléans).
Results
During the first year, we addressed the PAINS prediction and characterization problem, using a recent data set from the biochemical literature. The mined fragments and decision tree predictors have been integrated into a tool that makes predictions for new compounds and visualizes the fragments involved in each prediction.
The abstract of the poster presenting the approach at the 8es journées de la Société Française de Chémoinformatique (SFCi2017).
The PrePeP prototype for download.
Building this tool requires addressing a number of challenges: sub-sampling the very imbalanced data set, performing unsupervised stratified data sampling, reducing a very large chemical descriptor space, and efficiently finding descriptor combinations (conjunctive subgroup descriptions or subgraphs) that are typical of (sub)classes, all while reducing redundancy among them.
Not all of those capabilities are present in PrePeP yet (hence a prototype, which will evolve during the lifetime of the project and possibly afterwards), but the consortium members have worked, and continue to work, on techniques that will be incorporated into PrePeP.
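To make the first of the challenges listed above more concrete, here is a minimal sketch of random under-sampling of an imbalanced data set. It is an illustration only and does not describe how PrePeP itself samples its data; the function name `undersample`, the compound identifiers, and the "PAINS" / "non-PAINS" labels are hypothetical placeholders.

```python
import random
from collections import defaultdict


def undersample(records, label_of, seed=0):
    """Randomly down-sample every class to the size of the rarest class.

    records  -- a list of arbitrary items (here: toy compound records)
    label_of -- a function mapping a record to its class label
    seed     -- fixed seed so that the sample is reproducible
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    # Every class is reduced to the size of the smallest one.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label in sorted(by_class):
        # rng.sample draws `target` records without replacement.
        balanced.extend(rng.sample(by_class[label], target))

    rng.shuffle(balanced)
    return balanced


# Toy data (made up): 5 compounds labelled "PAINS" against 95 "non-PAINS".
compounds = [(f"mol{i}", "PAINS" if i < 5 else "non-PAINS") for i in range(100)]
balanced = undersample(compounds, label_of=lambda rec: rec[1])
```

Plain random under-sampling discards most of the majority class, which is one reason the stratified sampling mentioned above is of interest: it aims to keep the retained examples representative of the whole data set.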
Meetings organized in the context of DECADE
Mini-symposium 08/11/2017 - 10/11/2017 -- LIRIS, Lyon, building Blaise Pascal
Wednesday, 08/11/17
- 14h00 Albrecht Zimmermann: welcome
- 14h15 Srinivasan Parthasarathy: Scalable Data Analytics: The Role of Stratified Data Sharding
- 15h30 Aimene Belfodil: Mining Convex Polygon Patterns with Formal Concept Analysis
- 16h00 Coffee break
- 16h30 Vincent Leroy: Large-scale graph mining
- 17h00 Bertrand Cuissart: Computation of 2D pharmacophores from a dataset of molecules annotated with biological activities
- 17h30 Siegfried Nijssen: Ordering data rows & columns using convolution
Thursday, 09/11/17
- 10h00 Ian Davidson: Human-guided machine learning
- 11h15 Coffee break
- 11h45 Thomas Lampert: Constrained Clustering: Why and How?
- 12h30 Lunch
- 14h00 Jefrey Lijffijt: Personalised Pattern Mining
- 15h15 Julien Velcin: Towards interpretable topic models
- 16h00 Coffee break
- 16h30 Lakhdar Sais: Towards cross-fertilization between data mining and constraints
- 17h00 Jordan Fréry: Efficient top rank optimization with gradient boosting for supervised anomaly detection
- 17h30 Marie Le Guilly: SQL query completion for data exploration
Friday, 10/11/17
- 09h00 Adnene Belfodil: Flashpoints: Discovering Exceptional Pairwise Behavior in Vote or Rating Datasets
- 09h30 Clément Gautrais: Purchase Signatures of Retail Customers
- 10h00 Coffee break
- 10h30 Moustafa Bensafi: What is neuro-science?
- 11h45 End
- 12h00 Lunch
Meeting 29/08/2017 -- INRIA Paris, 2 Rue Simone Iff, 75012 Paris
- 10h00 Maksim Koptelov: A method for finding substructural alerts for frequent hitters, and how to use them for prediction
- 11h00 Esther Galbrun: Redescription mining for relating numerical and structural descriptors
- 12h00 Lunch
- 13h30 Discussion of how to proceed scientifically (sampling etc.)
- 15h00 Discussion of how to proceed logistically (symposium etc.)
Kick-Off, May 17/18, Orléans, LIFO - Bâtiment IIIA, Rue Léonard de Vinci, B.P. 6759, F-45067 ORLEANS Cedex 2, 1st floor, communication area (espace communication)
Information on how to get to LIFO is here
Wednesday, 17/05/17
- 12h30: lunch at the RU l'Agora
- 14h00: welcome
- 14h15: introduction, some remarks regarding Phantom PAINS: Problems with the Utility of Alerts for Pan-Assay INterference CompoundS (A. Zimmermann)
- 15h00: a short description of the available data (P. Bonnet)
- 15h15 - 18h00: free-flow discussion with a focus on data acquisition, selection, preparation
Dinner at the restaurant l'Ardoise (map), 20h
Thursday, 18/05/17
- 9h00: Interactive characterization of classes in unlabeled data through pattern sampling (A. Giacometti)
- 11h00: Two contributions to Humans in the loop in constrained clustering (Christel Vrain)
- 12h15: Round-up
- 12h30: Lunch at the RU l'Agora for those who wish