
Silver Blog, June 2017

The world’s first protein database for Machine Learning and AI

dSPP is the world’s first interactive database of proteins for AI and Machine Learning, and it is fully integrated with Keras and TensorFlow. You can access the database at peptone.io/dspp.

By Kamil Tamiola, Founder of Peptone.

I am incredibly proud and excited to present the very first public product of Peptone, the Database of Structural Propensities of Proteins.

Database of Structural Propensities of Proteins (dSPP) is the world’s first interactive repository of structural and dynamic features of proteins with seamless integration into the leading Machine Learning frameworks, Keras and TensorFlow.

dSPP is based on peer-reviewed research from leading academic institutions around the world that use Nuclear Magnetic Resonance (NMR) spectroscopy to characterize protein structure and disorder. dSPP data are derived from solution and solid-state NMR spectroscopy experiments on 7200+ unrelated proteins studied under physiologically relevant conditions.

dSPP is a unique source of information on Intrinsically Disordered Proteins (IDPs), a class of proteins that is challenging to study. IDPs are implicated in numerous debilitating human pathologies, including Alzheimer’s, Parkinson’s and prion diseases, the molecular basis of cancer, HIV, HSV, HCV, ZIKVR, and many others.


Structural interpretation of the propensity score for the MOAG-4 protein, extracted from the dSPP database (https://peptone.io/dspp/entry/dSPP27058_0). MOAG-4 is known to enhance protein aggregation in animal brain models, thus accelerating the early onset of Parkinson’s disease.

dSPP data can be readily used by experimentalists to gain unique insight into the structural stability of secondary structure motifs, as well as by high-throughput computational techniques that aim to deliver realistic models of medically relevant proteins.

Unlike the binary (discrete) secondary structure assignments available in other protein datasets for experimentalists and the machine learning community, dSPP data report on protein structure and local dynamics at the residue level with atomic resolution, expressed as a continuous structural propensity in the range from -1.0 to 1.0.
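To illustrate why a continuous score carries more information than a binary label, here is a minimal Python sketch. The per-residue values are made up, the helper names are hypothetical, and the sign convention (positive for helix-like, negative for strand-like, values near zero for little stable secondary structure) is an assumption borrowed from common chemical-shift-based propensity scores. This post does not state that convention.

```python
# Hypothetical per-residue propensities in [-1.0, 1.0] (illustrative values, not dSPP data).
# ASSUMED convention: > 0 helix-like, < 0 strand-like, near 0 little stable structure.
propensities = [0.92, 0.45, 0.05, -0.03, -0.60, -0.88]


def check_range(values, low=-1.0, high=1.0):
    """Raise ValueError if any score falls outside the [-1.0, 1.0] range."""
    for i, v in enumerate(values):
        if not low <= v <= high:
            raise ValueError(f"residue {i}: {v} outside [{low}, {high}]")
    return values


def binarize(values, threshold=0.0):
    """Collapse continuous scores into a binary label (1 = helix-like),
    mimicking a binary secondary-structure assignment."""
    return [1 if v > threshold else 0 for v in values]


def weakly_structured(values, cutoff=0.1):
    """Indices of residues whose |score| is below a hypothetical cutoff."""
    return [i for i, v in enumerate(values) if abs(v) < cutoff]


labels = binarize(check_range(propensities))
print(labels)                           # [1, 1, 1, 0, 0, 0]
print(weakly_structured(propensities))  # [2, 3]
```

The binary labels treat residue 0 (score 0.92) and residue 2 (score 0.05) identically. The continuous scores, however, show that residue 2 has almost no stable secondary structure. The binary view discards exactly this distinction, which matters for disordered regions.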


dSPP experimental data were collected under physiologically relevant conditions, which makes them uniquely suited to structure and disorder prediction methods that aim to tackle protein folding and stability in biologically and medically relevant contexts.

dSPP comes with an intuitive user interface that offers seamless access to the relevant decision data, the original literature citations, and a uniform rendering of the Machine Learning data for each protein of interest.

dSPP integrates seamlessly with the Keras and TensorFlow machine learning frameworks via the dspp-keras Python package, which can be downloaded and set up in under 60 seconds. Thus, virtually anyone with a basic understanding of machine learning can start experimenting with protein structure prediction methods.
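Before any model can learn from per-residue propensities, each protein sequence has to be turned into numeric features paired with targets. The sketch below does not use the dspp-keras package, whose API is not described in this post. It is a hypothetical, standard-library-only illustration of one common encoding: a sliding window of one-hot encoded residues centered on each position, paired with that residue's propensity. The sequence, target values, window size, padding character, and function names are all assumptions.

```python
# Hypothetical sliding-window one-hot encoding for per-residue regression.
# NOT the dspp-keras API; window size and padding scheme are illustrative choices.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """20-element one-hot vector; unknown residues or padding map to all zeros."""
    vec = [0.0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1.0
    return vec


def windows(sequence, targets, half_width=2):
    """Yield one (features, target) pair per residue.

    Features are the concatenated one-hot vectors of the residue and its
    half_width neighbours on each side; positions beyond the sequence ends
    are padded with '-', which encodes as all zeros.
    """
    if len(sequence) != len(targets):
        raise ValueError("need exactly one propensity value per residue")
    padded = "-" * half_width + sequence + "-" * half_width
    for i, target in enumerate(targets):
        window = padded[i : i + 2 * half_width + 1]  # residue i sits in the middle
        features = [x for aa in window for x in one_hot(aa)]
        yield features, target


seq = "MKVLE"                            # hypothetical sequence
props = [0.10, 0.55, 0.80, 0.62, -0.05]  # hypothetical propensities
pairs = list(windows(seq, props))
X = [f for f, _ in pairs]
y = [t for _, t in pairs]
print(len(X), len(X[0]), y[2])           # 5 100 0.8
```

Once converted with `numpy.asarray`, `X` and `y` have the two-dimensional and one-dimensional shapes that a Keras `Dense` network accepts. A final `Dense(1, activation="tanh")` layer is one natural design choice here, because tanh outputs lie in (-1, 1) and therefore match the propensity range.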

dSPP is the first publicly available product by Peptone with an automated 14-day update cycle, made specifically for continuous-learning AI applications.
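In a continuous-learning pipeline, one simple way to follow a fixed update cadence is to check whether a locally cached snapshot is older than one cycle before retraining. The sketch below assumes only the 14-day period stated above. The helper names and the snapshot timestamp are hypothetical, and the code does not query Peptone's servers. `next_expected_update` also assumes that the snapshot was taken right at a release.

```python
from datetime import datetime, timedelta, timezone

UPDATE_CYCLE = timedelta(days=14)  # update period stated for dSPP


def is_stale(snapshot_time, now=None, cycle=UPDATE_CYCLE):
    """Return True if a cached snapshot is at least one update cycle old.

    Hypothetical helper: snapshot_time is when the local copy was downloaded.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - snapshot_time >= cycle


def next_expected_update(snapshot_time, cycle=UPDATE_CYCLE):
    """Earliest time a newer release is expected, assuming a fixed cadence
    that started when the snapshot was taken."""
    return snapshot_time + cycle


downloaded = datetime(2017, 6, 1, tzinfo=timezone.utc)
print(is_stale(downloaded, now=datetime(2017, 6, 10, tzinfo=timezone.utc)))  # False
print(is_stale(downloaded, now=datetime(2017, 6, 20, tzinfo=timezone.utc)))  # True
print(next_expected_update(downloaded).date())                               # 2017-06-15
```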

Scientific reference:

  • Structural Propensity Database Of Proteins. Kamil Tamiola, Matthew Michael Heberling, Jan Domanski. bioRxiv 144840; doi:https://doi.org/10.1101/144840 

Availability and Call to Action


Acknowledgments

  • We thank Dr. Wenwei Zheng (NIDDK, US), Dr. Ruud Scheek (University of Groningen, NL) and Dr. Xavier Periole (Aarhus University, DK) for insightful comments and editorial suggestions on our dSPP paper.
  • We gratefully acknowledge François Chollet (Keras / Google) for insightful feedback on the database interface and straightforward suggestions concerning Keras integration.
  • We extend sincere thanks to Alison Lowndes, Carlo Ruiz and Dr. Adam Grzywaczewski (NVIDIA Corporation) for facilitating the collaboration and access to the DGX-1 supercomputer.
  • We gratefully acknowledge Jon Wedell (BMRB) for technical support with retrieving NMR resonance assignments from BMRB.
  • We thank Dr. Frans A.A. Mulder (Aarhus University, DK) and Dr. Predrag Kukic (University of Cambridge, UK) for providing structural ensemble models of MOAG-4.
  • Lastly, we gratefully acknowledge Mark Berger (NVIDIA Corporation) for his tremendous support throughout this project.

Press release

About Peptone

Founded in 2016 in Amsterdam, the Netherlands, Peptone offers state-of-the-art solutions for protein biotechnology via Machine Learning and AI. We transform big data from public and private repositories into powerful predictive models and intuitive tools for protein production, stability, disorder, engineering, and directed evolution experiments, providing our clients with transparent, complementary software that saves time and yields precise research answers.

Original. Reposted with permission.

Bio: Kamil Tamiola is an entrepreneur and researcher with an extensive scientific background in supercomputing and structural biophysics of proteins.