During my time as a data scientist at the Vienna-based pharma-tech startup Phenaris, one of my first projects was the development of an advanced similarity search algorithm that returns meaningful results as a (similar) compound-target interaction matrix (as shown below).

Requirements of the algorithm:
- Strong focus on keeping the scaffold: hits with the same scaffold are preferred over hits with a similar scaffold
- Find variations in side chains and functional groups
- Find as many pharmacologically/toxicologically meaningful results as possible
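
To make the "same > similar scaffold" requirement more concrete, here is a minimal sketch of a scaffold-first ranking. It is not the actual Phenaris algorithm, only an illustration under a few assumptions: every compound already comes with a scaffold key and with fingerprints stored as sets of on-bits (in practice these would come from a cheminformatics toolkit), and the names `Compound`, `scaffold_first_search` and `scaffold_cutoff`, as well as all bit sets, are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    scaffold: str           # hypothetical scaffold key, e.g. a scaffold SMILES
    scaffold_fp: frozenset  # on-bits of a fingerprint of the scaffold alone
    fp: frozenset           # on-bits of a fingerprint of the whole molecule


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def scaffold_first_search(query, library, scaffold_cutoff=0.7):
    """Rank hits: same scaffold (tier 0) before similar scaffold (tier 1).

    Compounds whose scaffold is neither identical nor similar enough are
    dropped. Inside a tier, hits are sorted by whole-molecule similarity,
    so side-chain and functional-group variations are ranked by closeness.
    """
    hits = []
    for cpd in library:
        if cpd.scaffold == query.scaffold:
            tier = 0
        elif tanimoto(cpd.scaffold_fp, query.scaffold_fp) >= scaffold_cutoff:
            tier = 1
        else:
            continue
        # Negative similarity so that a plain ascending sort puts the
        # most similar compound first within each tier.
        hits.append((tier, -tanimoto(cpd.fp, query.fp), cpd.name))
    return [(name, tier, -neg_sim) for tier, neg_sim, name in sorted(hits)]


# Toy data (invented bit sets, not real fingerprints).
query = Compound("query", "c1ccccc1", frozenset({1, 2, 3}),
                 frozenset({1, 2, 3, 10, 11}))
library = [
    Compound("A", "c1ccccc1", frozenset({1, 2, 3}),
             frozenset({1, 2, 3, 10, 12})),
    Compound("B", "c1ccccc1", frozenset({1, 2, 3}),
             frozenset({1, 2, 3, 10, 11, 13})),
    Compound("C", "c1ccncc1", frozenset({1, 2, 3, 4}),
             frozenset({1, 2, 3, 4, 10, 11})),
    Compound("D", "C1CCCCC1", frozenset({7, 8, 9}),
             frozenset({7, 8, 9, 10})),
]

hits = scaffold_first_search(query, library)
for name, tier, similarity in hits:
    print(name, tier, round(similarity, 3))
```

In this toy run, B and A share the query scaffold and are listed first, C follows with a similar scaffold, and D is dropped because its scaffold is unrelated. The cutoff of 0.7 is an arbitrary value chosen for the example.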
 
Extensive benchmarking of fingerprints on diverse datasets resulted in an advanced similarity search algorithm that was more than twice as accurate (as defined by the project) as basic similarity searches, while still producing a large number of hits.
As you can see in the example image above, the matrix is sparse because it only shows in vitro bioactivity values against the target. This sparked our interest in filling the matrix with in silico predictions using machine learning and led to the development of the “UNIVIE Toolbox for transporter modeling and off-target prediction”.
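
The idea of filling the sparse matrix can be sketched in a few lines. This is only an illustration, not the toolbox itself: `fill_matrix` and `target_mean_predictor` are hypothetical names, the numbers are invented, and the "model" is a trivial stand-in (the mean of the measured values per target) where a trained machine learning model would normally go.

```python
def fill_matrix(measured, compounds, targets, predict):
    """Keep measured values and fill the gaps with predictions.

    Every cell is tagged with its origin so that in vitro data and
    in silico predictions can never be confused later on.
    """
    filled = {}
    for cpd in compounds:
        for tgt in targets:
            value = measured.get((cpd, tgt))
            if value is not None:
                filled[(cpd, tgt)] = (value, "in vitro")
            else:
                filled[(cpd, tgt)] = (predict(cpd, tgt), "in silico")
    return filled


def target_mean_predictor(measured):
    """Placeholder model: predicts the mean measured value of a target."""
    def predict(cpd, tgt):
        values = [v for (_, t), v in measured.items() if t == tgt]
        return sum(values) / len(values) if values else None
    return predict


# Invented example values; only three of the six cells are measured.
measured = {
    ("cpd1", "target_A"): 6.0,
    ("cpd2", "target_A"): 7.0,
    ("cpd1", "target_B"): 5.0,
}
compounds = ["cpd1", "cpd2", "cpd3"]
targets = ["target_A", "target_B"]

filled = fill_matrix(measured, compounds, targets,
                     target_mean_predictor(measured))
for key in sorted(filled):
    print(key, filled[key])
```

Keeping the origin tag next to each value is a deliberate choice here: a predicted value should always be distinguishable from a measured one.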
Check out the finished product at: https://www.phenaris.com/products/toxphacts/
(The free version is not based on the advanced similarity search!)