Highlighting fingerprints with RDKit#

← Previous | Next →
July 21, 2022 () by Anders Lervik | Category: chemistry
Tagged: cheminformatics | chemistry | fingerprint | jupyter | molecules | python | rdkit

Introduction#

Fingerprints are interesting molecular descriptors, and RDKit can calculate many different fingerprints. Sometimes it can be informative to depict the on bits for fingerprints, and RDKit has methods for this. These methods focus on the environment of the central atom of the bit and do not show the whole molecule. Here is an example of how the whole molecule can be depicted together with the active bit(s), as in this image:

A molecule with an active bit depicted

Example: Drawing fingerprint bits#

https://mybinder.org/badge_logo.svg

We start by importing RDKit (and matplotlib for showing some colors):

[1]:
from rdkit import Chem
from rdkit.Chem import (
    AllChem,
    rdCoordGen,
)
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
from IPython.display import SVG

plt.style.use("seaborn-notebook")
IPythonConsole.ipython_useSVG = True  # Use higher quality images for molecules

I will use niacinamide as an example. Let us create it from smiles and draw the molecule with RDKit:

[2]:
niacinamide = Chem.MolFromSmiles("c1cc(cnc1)C(=O)N")
rdCoordGen.AddCoords(niacinamide)
niacinamide
[2]:

And we calculate the Morgan fingerprint with a radius of 2:

[3]:
info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(niacinamide, radius=2, bitInfo=info)

RDKit already has methods for depicting bits from fingerprints:

[4]:
on_bits = [(niacinamide, i, info) for i in fp.GetOnBits()]
labels = [str(i[1]) for i in on_bits]
Draw.DrawMorganBits(on_bits, molsPerRow=5, legends=labels)  # Draw the on bits
[4]:
posts_2022_drawfingerprint_drawfingerprint_8_0.svg

Now, we will draw the full molecule and highlight the atoms/bonds that are part of the bit.

First, we define the colors to use for the highlighting. Here, I use the colormap of Okabe & Ito:

[5]:
COLOR_TUPLES = [
    (230, 159, 0),
    (86, 180, 233),
    (0, 158, 115),
    (240, 228, 66),
    (0, 114, 178),
    (213, 94, 0),
    (204, 121, 167),
    (204, 204, 204),  # Gray, added by me
]
COLOR_FRAC = [tuple(x / 255 for x in color) for color in COLOR_TUPLES]
COLOR_MAP = {
    "Center atom": COLOR_FRAC[1],
    "Atom in a ring": COLOR_FRAC[6],
    "Aromatic atom": COLOR_FRAC[3],
    "Other atoms": COLOR_FRAC[7],
    "Bonds": COLOR_FRAC[7],
}

fig, ax = plt.subplots(constrained_layout=True)
for i, (txt, color) in enumerate(COLOR_MAP.items()):
    ax.text(0, i, txt, va="center", fontsize="xx-large", backgroundcolor=color)
ax.text(0, 5, "Colors:", fontsize="xx-large")
ax.set_xlim(0.0, 0.6)
ax.set_ylim(-1, 6)
ax.axis("off");
posts_2022_drawfingerprint_drawfingerprint_10_0.png

Next, we define some methods that will add colors for us:

[6]:
def get_atom_colors(molecule, atoms, centers=None):
    """Define some colors for different atoms."""
    colors = {}
    for atom in atoms:
        if centers is not None and atom in centers:
            colors[atom] = COLOR_MAP["Center atom"]
        else:
            if molecule.GetAtomWithIdx(atom).GetIsAromatic():
                colors[atom] = COLOR_MAP["Aromatic atom"]
            elif molecule.GetAtomWithIdx(atom).IsInRing():
                colors[atom] = COLOR_MAP["Atom in a ring"]
            else:
                colors[atom] = COLOR_MAP["Other atoms"]
    return colors


def get_bond_colors(bonds):
    """Define colors for bonds."""
    return {bond: COLOR_MAP["Bonds"] for bond in bonds}

Let us take a look at the fingerprint info we get from the method generating the fingerprint:

[7]:
print(info[1043])
((0, 2),)

The bits are a tuple of tuples on the form: ((center, radius),) where center is the index of the center atom, and radius is the radius used in the fingerprint. For depicting the fingerprint, we will first grab the environment of the center atom within the radius.

[8]:
def get_environment(molecule, center, radius):
    """Get the environment of a certain radius around a center atom."""
    if not molecule.GetNumConformers():
        rdDepictor.Compute2DCoords(mol)
    env = Chem.FindAtomEnvironmentOfRadiusN(molecule, radius, center)
    atoms = set([center])
    bonds = set([])
    for bond in env:
        atoms.add(molecule.GetBondWithIdx(bond).GetBeginAtomIdx())
        atoms.add(molecule.GetBondWithIdx(bond).GetEndAtomIdx())
        bonds.add(bond)
    atoms = list(atoms)
    bonds = list(bonds)

    atom_colors = get_atom_colors(molecule, atoms, centers=set([center]))
    bond_colors = get_bond_colors(bonds)

    return atoms, bonds, atom_colors, bond_colors

Let us try this on bit 1043 where the center was 0 and the radius was 2:

[9]:
atoms, bonds, _, _ = get_environment(niacinamide, 0, 2)
print("Atoms:", atoms)
print("Bonds:", bonds)
Atoms: [0, 1, 2, 4, 5]
Bonds: [0, 8, 4, 1]

Let us check that we got the correct atoms and bonds, by labeling them by their index:

[10]:
mol = Chem.Mol(niacinamide)
for i in atoms:
    mol.GetAtomWithIdx(i).SetProp("atomNote", f"A-{i}")
for i in bonds:
    mol.GetBondWithIdx(i).SetProp("bondNote", f"B-{i}")
mol.GetAtomWithIdx(0).SetProp("atomNote", f"A-0 (center)")
canv = Draw.rdMolDraw2D.MolDraw2DSVG(250, 250)  # or MolDraw2DSVG
canv.DrawMolecule(
    mol,
    highlightAtoms=atoms,
    highlightBonds=bonds,
    highlightBondColors={i: (0.8, 0.8, 0.8) for i in bonds},
)
canv.FinishDrawing()
SVG(canv.GetDrawingText())
[10]:
posts_2022_drawfingerprint_drawfingerprint_20_0.svg

So, we are able to pick out the correct atoms and bonds. We can also compare with the bit-drawing from RDKit:

[11]:
Draw.DrawMorganBit(niacinamide, 1043, info)
[11]:
posts_2022_drawfingerprint_drawfingerprint_22_0.svg

OK, so that seems to work fine. Let us put all that into a method that will draw a selected bit. Since each bit can occur several times in the same molecule, we will draw a grid to handle that:

[12]:
def draw_molecule_and_bit_info_grid(
    molecule, info, bit, max_examples=None, mols_per_row=3
):
    """Highlight a bit in a given molecule."""
    atoms_to_highlight = []
    bonds_to_highlight = []
    atoms_to_highlight_colors = []
    bonds_to_highlight_colors = []
    molecules_to_draw = []
    for i, example in enumerate(info[bit]):
        if max_examples is not None and i + 1 > max_examples:
            break
        center, radius = example
        atoms, bonds, atom_colors, bond_colors = get_environment(
            molecule, center, radius
        )
        atoms_to_highlight.append(atoms)
        bonds_to_highlight.append(bonds)
        atoms_to_highlight_colors.append(atom_colors)
        bonds_to_highlight_colors.append(bond_colors)
        molecules_to_draw.append(molecule)

    options = Draw.rdMolDraw2D.MolDrawOptions()
    options.prepareMolsForDrawing = True
    options.fillHighlights = True
    return Draw.MolsToGridImage(
        molecules_to_draw,
        molsPerRow=min(mols_per_row, len(molecules_to_draw)),
        subImgSize=(200, 200),
        highlightAtomLists=atoms_to_highlight,
        highlightBondLists=bonds_to_highlight,
        highlightAtomColors=atoms_to_highlight_colors,
        highlightBondColors=bonds_to_highlight_colors,
        drawOptions=options,
    )
[13]:
draw_molecule_and_bit_info_grid(niacinamide, info, 1043)
[13]:
posts_2022_drawfingerprint_drawfingerprint_25_0.svg

Let us also make a variant that will show all on bits, but we keep it to a single example per bit:

[14]:
def show_all_on_bits(molecule, info):
    """Draw all on bits"""
    atoms_to_highlight = []
    bonds_to_highlight = []
    atoms_to_highlight_colors = []
    bonds_to_highlight_colors = []
    molecules_to_draw = []
    legends = []
    for key in sorted(info.keys()):
        center, radius = info[key][0]
        legends.append(f"Bit {key}")
        atoms, bonds, atom_colors, bond_colors = get_environment(
            molecule, center, radius
        )
        atoms_to_highlight.append(atoms)
        bonds_to_highlight.append(bonds)
        atoms_to_highlight_colors.append(atom_colors)
        bonds_to_highlight_colors.append(bond_colors)
        molecules_to_draw.append(molecule)

    options = Draw.rdMolDraw2D.MolDrawOptions()
    options.prepareMolsForDrawing = True
    options.fillHighlights = True
    return Draw.MolsToGridImage(
        molecules_to_draw,
        molsPerRow=min(5, len(molecules_to_draw)),
        subImgSize=(150, 150),
        legends=legends,
        highlightAtomLists=atoms_to_highlight,
        highlightBondLists=bonds_to_highlight,
        highlightAtomColors=atoms_to_highlight_colors,
        highlightBondColors=bonds_to_highlight_colors,
        drawOptions=options,
    )
[15]:
show_all_on_bits(niacinamide, info)
[15]:
posts_2022_drawfingerprint_drawfingerprint_28_0.svg

Summary#

The code above can be used to depict Morgan fingerprint bits together with a molecule. This can make it easier to understand what the bit is measuring the presence of.