Working with CDK to predict Toxicity

CDK (the Chemistry Development Kit), as we all know (do we?), is a widely used open-source Java library for modelling molecular data. I have been using it to discover overlaps between toxic molecules. The idea is to find toxicity-causing groups of atoms by comparing toxic molecules with each other.

The first part of the project was to find these overlapping structures. We call them MCS (Maximum Common Substructure). The MCS is found using a graph theory approach: the two molecular structures are represented as graphs, and a list of graph matches between them is computed.

The graphs can be derived in two ways from the molecular structure.

1) We can treat the atoms of the molecule as the nodes of the graph, so that the connections between the nodes reflect the actual bonds between the atoms in the molecule.

2) We can treat the bonds as the nodes of the graph, so that the connections in the graph represent the relationships between the bonds in the molecule (for example, two bonds that share an atom).
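The two representations above can be sketched in plain Java. This is only an illustration and not the CDK data model: the class name `MoleculeGraphs`, the encoding of bonds as pairs of atom indices, and the rule that two bonds are connected when they share an atom are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: bonds are pairs of atom indices. Not the CDK data model.
class MoleculeGraphs {

    // Approach 1: atoms are the nodes; the adjacency lists follow the bonds.
    static List<List<Integer>> atomGraph(int atomCount, int[][] bonds) {
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adjacency.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacency.get(bond[0]).add(bond[1]);
            adjacency.get(bond[1]).add(bond[0]);
        }
        return adjacency;
    }

    // Approach 2: bonds are the nodes; two bonds are connected when they
    // share an atom (an assumed definition of "relationship between bonds").
    static List<List<Integer>> bondGraph(int[][] bonds) {
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < bonds.length; i++) {
            adjacency.add(new ArrayList<>());
        }
        for (int i = 0; i < bonds.length; i++) {
            for (int j = i + 1; j < bonds.length; j++) {
                if (sharesAtom(bonds[i], bonds[j])) {
                    adjacency.get(i).add(j);
                    adjacency.get(j).add(i);
                }
            }
        }
        return adjacency;
    }

    private static boolean sharesAtom(int[] a, int[] b) {
        return a[0] == b[0] || a[0] == b[1] || a[1] == b[0] || a[1] == b[1];
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol, C0-C1-O2: three atoms, two bonds.
        int[][] bonds = { {0, 1}, {1, 2} };
        System.out.println(atomGraph(3, bonds)); // [[1], [0, 2], [1]]
        System.out.println(bondGraph(bonds));    // [[1], [0]]
    }
}
```

The same molecule gives a three-node graph in the first representation and a two-node graph in the second, which is why the two approaches can produce different matches.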

We are currently using the first approach, although to be perfectly rigorous about it we should be doing both and comparing the results. Using the first approach we found the list of overlaps between the two molecules being compared. Such pairwise comparisons were done throughout the database of molecules: we compared each molecule to every other and generated a list of all the overlaps.
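The all-against-all comparison might look roughly like the sketch below. The MCS search itself is not shown: `findOverlaps` is a placeholder for it, and the toy `sharedLabels` method (which treats a molecule as a list of fragment labels) is a hypothetical stand-in used only to make the example runnable. All names here are assumptions, not part of CDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

class PairwiseOverlaps {

    // Applies the comparison to every unordered pair (i < j) exactly once and
    // gathers all overlaps into one list. `findOverlaps` stands in for the
    // real MCS search.
    static <M, S> List<S> compareAll(List<M> molecules,
                                     BiFunction<M, M, List<S>> findOverlaps) {
        List<S> overlaps = new ArrayList<>();
        for (int i = 0; i < molecules.size(); i++) {
            for (int j = i + 1; j < molecules.size(); j++) {
                overlaps.addAll(findOverlaps.apply(molecules.get(i), molecules.get(j)));
            }
        }
        return overlaps;
    }

    // Number of unordered pairs among n molecules: n(n-1)/2.
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }

    // Toy stand-in for an MCS search: the labels both molecules contain.
    static List<String> sharedLabels(List<String> a, List<String> b) {
        List<String> shared = new ArrayList<>(a);
        shared.retainAll(b);
        return shared;
    }

    public static void main(String[] args) {
        List<List<String>> molecules = Arrays.asList(
                Arrays.asList("nitro", "phenyl"),
                Arrays.asList("nitro", "amine"),
                Arrays.asList("phenyl", "amine", "nitro"));
        List<String> overlaps = compareAll(molecules, PairwiseOverlaps::sharedLabels);
        System.out.println(overlaps);     // [nitro, nitro, phenyl, nitro, amine]
        System.out.println(pairCount(3)); // 3
    }
}
```

The number of comparisons grows quadratically with the size of the database, and the raw list contains repeats (here "nitro" shows up three times).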

This list was parsed to remove duplicates, leaving a set of all the unique overlap sections. These overlaps were then used to generate a fingerprint for each molecule. This was done using CDK again: the SMARTSQueryTool is a handy tool for checking how many times a substructure occurs inside a molecule. Using it we generated a table of sorts, with one row per molecule and one column per overlap, where each entry is '0' for no occurrences or an integer giving the number of occurrences.
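A minimal sketch of the de-duplication and the count table follows. It does not call CDK: the counter is passed in as a function, and the toy `labelCount` method is a hypothetical stand-in for a real SMARTS match count. The sketch also assumes each overlap has already been written as a canonical string, so that equal structures compare equal as text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.ToIntBiFunction;

class SubstructureFingerprints {

    // Drops repeated overlaps while keeping first-seen order. Assumes equal
    // substructures are represented by equal (canonical) strings.
    static List<String> uniqueOverlaps(List<String> overlaps) {
        return new ArrayList<>(new LinkedHashSet<>(overlaps));
    }

    // One row per molecule, one column per unique substructure; each cell
    // holds the occurrence count reported by `countOccurrences`.
    static <M> int[][] countTable(List<M> molecules, List<String> substructures,
                                  ToIntBiFunction<M, String> countOccurrences) {
        int[][] table = new int[molecules.size()][substructures.size()];
        for (int row = 0; row < molecules.size(); row++) {
            for (int col = 0; col < substructures.size(); col++) {
                table[row][col] = countOccurrences.applyAsInt(
                        molecules.get(row), substructures.get(col));
            }
        }
        return table;
    }

    // Toy stand-in for a substructure counter: how often a label is listed.
    static int labelCount(List<String> molecule, String label) {
        return Collections.frequency(molecule, label);
    }

    public static void main(String[] args) {
        List<String> columns = uniqueOverlaps(
                Arrays.asList("nitro", "phenyl", "nitro", "amine"));
        List<List<String>> molecules = Arrays.asList(
                Arrays.asList("nitro", "nitro", "phenyl"),
                Arrays.asList("amine"),
                Arrays.asList("phenyl", "amine"));
        int[][] table = countTable(molecules, columns, SubstructureFingerprints::labelCount);
        System.out.println(columns);                     // [nitro, phenyl, amine]
        System.out.println(Arrays.deepToString(table));  // [[2, 1, 0], [0, 0, 1], [0, 1, 1]]
    }
}
```

Each row of the table is the fingerprint of one molecule. Keeping the column order fixed matters, since every molecule has to be described by the same sequence of substructures.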

We intend to throw this data at a neural network to see if any patterns can be detected. The objective is to use such pattern recognition to predict toxicities of molecules outside the initial database.

