Computational Pipeline

About PapayaDB

Repository of cathepsin-glycosaminoglycan in silico interactions

PapayaDB is a curated computational resource designed to organize structural models, molecular simulations and interaction descriptors for cathepsin–glycosaminoglycan complexes.

Why cathepsin–GAG interactions matter

Cysteine cathepsins are papain-like proteases involved in protein degradation, extracellular matrix remodeling and regulation of proteolytic activity. Their interactions with glycosaminoglycans (GAGs) can influence localization, stability, substrate recognition and enzymatic function.

As cathepsins and GAGs are both involved in extracellular matrix turnover, inflammation and tissue remodeling, their complexes are relevant to pathological processes such as bone resorption disorders, cancer progression, inflammatory diseases and connective tissue remodeling.

The structural data gap

Despite the biological relevance of cathepsin–GAG interactions, the number of experimentally resolved structures of these complexes remains limited. GAGs are periodic, flexible, chemically heterogeneous and highly charged molecules, which makes their binding modes difficult to capture using experimental structural biology alone.

PapayaDB was created to address this gap by providing a systematic in silico dataset of cathepsin–GAG complexes generated with molecular docking and molecular dynamics simulations.

What PapayaDB provides

Each PapayaDB record is organized by cathepsin, GAG class, oligosaccharide length and simulation method. Depending on dataset availability, records may include structures, trajectories, plots, contact maps, energy estimates, hydrogen-bond analyses and downloadable files.

Structural models

Structures of cathepsins, GAG oligosaccharides and predicted complexes.

Docking results

Binding poses and candidate interaction regions.

Molecular simulations

Trajectories of cathepsin–GAG complexes showing their behaviour over time.

Interaction descriptors

Quantitative summaries of each complex: RMSD, contact maps, hydrogen bonds, and MM-GBSA and LIE energy estimates.

Metadata

Cathepsin name, GAG class, chain length, simulation method, identifiers and available files.
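The record organization described above can be pictured as a small data structure. The sketch below is only an illustration: the class name `Record`, its field names, the helper `filter_records` and the example entries are hypothetical and do not reflect PapayaDB's actual schema or contents.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative, not PapayaDB's schema.
@dataclass(frozen=True)
class Record:
    cathepsin: str       # cathepsin name, e.g. "Cathepsin K"
    gag_class: str       # GAG class, e.g. "Heparin"
    chain_length: int    # degree of polymerization: 6 for dp6, 16 for dp16
    method: str          # "AA MD", "RS-REMD" or "CG MD"
    files: tuple = ()    # names of available files; may be empty


def filter_records(records, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


# Invented example entries, used only to exercise the filter.
records = [
    Record("Cathepsin K", "Heparin", 6, "AA MD", ("complex.pdb",)),
    Record("Cathepsin K", "Heparin", 16, "RS-REMD"),
    Record("Cathepsin B", "Heparin", 6, "CG MD"),
]

hits = filter_records(records, gag_class="Heparin", chain_length=6)
for r in hits:
    print(r.cathepsin, r.method)
```

Keeping the four organizing keys as plain attributes means any combination of them can be used as a filter without extra indexing.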

Three complementary simulation levels

PapayaDB combines several simulation strategies because no single representation captures every relevant feature of cathepsin–GAG recognition. All-atom simulations provide detailed local interactions, RS-REMD improves sampling for longer GAG chains, and coarse-grained simulations enable extended exploration of selected systems.

All-atom molecular dynamics

Represents every atom of the protein, GAG and solvent environment. In PapayaDB, all-atom simulations are used to describe short oligosaccharide complexes in high structural detail.

Coverage:
AA MD: cathepsin complexes with dp2, dp4 and dp6 GAGs.

Replica-exchange molecular dynamics

Used to improve conformational sampling and explore broader structural variability, especially for longer GAG chains.

Coverage:
RS-REMD: cathepsin complexes with dp16 GAGs.

Coarse-grained molecular dynamics

Reduces molecular detail to enable longer-timescale simulations and broader exploration of interaction patterns.

Coverage:
CG MD: cathepsin complexes with dp6 and dp16 heparin.

All-atom computational workflow

The all-atom dataset was generated through a multistage workflow combining electrostatic potential analysis, molecular docking, DBSCAN clustering, representative pose selection and molecular dynamics simulations.

Important note: This workflow applies to the all-atom molecular dynamics dataset only.

Figure: All-atom workflow used for docking, clustering and molecular dynamics simulation of cathepsin–GAG complexes.
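The clustering and representative-selection steps of the workflow can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's actual code. The distance measure, the `eps` and `min_samples` values, the toy pose positions and the scores are all assumptions, as is the reading of "most favorable scoring clusters" as ranking clusters by their best-scoring pose.

```python
def dbscan(dist, eps, min_samples):
    """Label points from a square distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n                  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1               # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)    # j is a core point: keep expanding
    return labels


def representatives(labels, scores, max_clusters=3):
    """Pick the best-scoring pose of each cluster, then keep the top clusters."""
    best = {}
    for idx, (label, score) in enumerate(zip(labels, scores)):
        if label == -1:
            continue                     # noise poses are never representatives
        if label not in best or score < scores[best[label]]:
            best[label] = idx
    return sorted(best.values(), key=lambda i: scores[i])[:max_clusters]


# Toy data: 1D "positions" stand in for a pose-to-pose distance measure.
positions = [0.0, 0.5, 1.0, 10.0, 10.4, 10.8, 50.0]
dist = [[abs(a - b) for b in positions] for a in positions]
scores = [-7.1, -8.3, -6.9, -9.0, -8.8, -7.5, -9.5]   # invented; lower = more favorable

labels = dbscan(dist, eps=1.0, min_samples=2)
reps = representatives(labels, scores)
print(labels, reps)
```

In this toy run the isolated pose has the most favorable score but is labeled as noise, so it is not selected. Discarding such outliers before choosing representatives is the reason for using a density-based method here.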

Technical protocol specifications

The datasets in PapayaDB were generated using method-specific computational protocols. Parameters are shown to make the records easier to interpret, compare and reuse.

All-atom MD

Docking software: AutoDock 3.05
Docking grid: 126 Å × 126 Å × 126 Å, 0.375 Å step
Pose selection: Top 50 docking poses
Clustering: DBSCAN
Representative selection: Up to three representative binding poses selected from the most favorable scoring clusters
MD engine: AMBER 20
Solvent: TIP3P water model
Force field: ff14SB for protein, GLYCAM06j for GAG
Production run: 50 ns
Post-processing: RMSD, contact maps, hydrogen bonds, MM-GBSA, LIE and per-residue energy decomposition

RS-REMD

Chain length: dp16
Coverage: Each cathepsin–GAG pair
Purpose: Enhanced conformational sampling
Protocol details: Detailed RS-REMD parameters will be provided with the corresponding dataset records.

Coarse-grained MD

Chain length: dp6, dp16
GAG class: Heparin
Representation: SUGRES-compatible heparin representation
Purpose: Extended-timescale exploration
Protocol details: Detailed coarse-grained simulation parameters will be provided with the corresponding dataset records.

Simulation-derived descriptors

PapayaDB records include simulation-derived descriptors that summarize structural stability, interaction persistence and approximate energetic properties of cathepsin–GAG complexes. These descriptors are intended to support comparative interpretation across cathepsins, GAG classes, chain lengths and simulation methods.

Descriptor | What it describes | How to interpret it
Molecular dynamics trajectory | Time-dependent evolution of the molecular system during simulation. | Allows inspection of how the cathepsin–GAG complex behaves over time, including movement, flexibility and stability of the binding mode.
Protein RMSD | Root mean square deviation of protein atomic positions relative to a reference structure. | Lower and stable RMSD values suggest that the protein structure remains close to the starting or reference conformation. Larger shifts may indicate conformational rearrangement.
GAG RMSD | Root mean square deviation of GAG atomic positions relative to a reference structure. | Helps assess GAG mobility and conformational variability during simulation. Higher values may reflect flexible or changing binding modes.
MM-GBSA binding free energy | Approximate binding free-energy estimate calculated from minimized molecular dynamics trajectory structures with implicit solvent treatment. | Useful as a qualitative descriptor of complex stability. Values should be interpreted comparatively, not as direct experimental affinities.
Per-residue MM-GBSA energy decomposition | Estimated contribution of individual protein residues to the MM-GBSA binding free energy. | Helps identify residues that contribute favorably or unfavorably to GAG binding and may indicate potential GAG-recognition regions on the cathepsin surface.
Linear interaction energy (LIE) | Approximate interaction-energy descriptor based on electrostatic and van der Waals interactions extracted directly from the molecular dynamics trajectory. | Provides an additional energetic summary of protein–GAG interaction strength and trends across related systems.
Hydrogen bonds | Polar interactions between donor and acceptor atoms, defined using geometric distance and angle criteria. | Persistent hydrogen bonds may indicate specific stabilizing contacts between GAG functional groups and protein residues.
Contact maps | Normalized representation of contacts between GAG units and protein residues during simulation. | Values close to 1 indicate contacts maintained for most or all of the simulation. Values close to 0 indicate rare or absent contacts.
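Two of the descriptors above, RMSD and normalized contact frequency, reduce to short calculations. The sketch below shows the general idea on toy coordinates. It is not PapayaDB's analysis code: the function names, the 4.0 Å cutoff and the single-point representation of a GAG unit and a residue are assumptions, and trajectory analyses normally superimpose structures before computing RMSD, a step omitted here.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) positions, no superposition."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))


def contact_occupancy(gag_positions, residue_positions, cutoff):
    """Fraction of frames in which the two positions lie within the cutoff distance."""
    hits = sum(1 for g, r in zip(gag_positions, residue_positions)
               if math.dist(g, r) <= cutoff)
    return hits / len(gag_positions)


# Toy example: two atoms, each displaced by 1 Å along z relative to the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))

# Toy example: one GAG unit and one residue followed over four frames.
gag_unit = [(0.0, 0.0, 0.0)] * 4
residue = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
occupancy = contact_occupancy(gag_unit, residue, cutoff=4.0)
print(occupancy)
```

The occupancy value is the kind of number a contact map cell holds: here the pair is within the cutoff in three of four frames, so the cell would read 0.75.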

Important interpretation note

Affinity estimators in PapayaDB, based on MM-GBSA and LIE calculations, are intended for qualitative and comparative interpretation. They should not be treated as direct quantitative measurements of experimental binding affinity.
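For orientation, MM-GBSA estimates are commonly written in the general form below. This is the textbook decomposition, given as background under the assumption that PapayaDB follows the usual scheme; the exact settings used for the records are not stated here. The symbols are generic: angle brackets denote averages over trajectory structures, E_MM is the molecular mechanics energy, and G_GB and G_SA are the polar (generalized Born) and nonpolar (surface area) solvation terms.

```latex
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{protein}} \rangle
  - \langle G_{\mathrm{GAG}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}}
```

A conformational entropy term is often left out of such estimates, and the solvent is treated implicitly. Both approximations are reasons to compare values across related systems rather than read them as absolute affinities.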

Database coverage

PapayaDB integrates multiple simulation datasets to support comparison across protein family members, GAG classes, chain lengths and simulation resolutions.

Dataset availability notice

The database coverage numbers describe the total number of simulations and complexes generated within the PapayaDB project. Not all generated records have been fully entered into the public web interface yet. Record upload and validation are ongoing, and the complete dataset is expected to be available by autumn 2026.

Human cysteine cathepsins: 11
GAG classes: 6
All-atom MD complexes: 198
All-atom MD simulations: ~1700
RS-REMD complexes: 132
RS-REMD simulations: ~1100
Coarse-grained MD complexes: 22
Coarse-grained MD simulations: ~2000

All-atom MD: cathepsin complexes with dp2, dp4 and dp6 GAGs.
RS-REMD: cathepsin complexes with dp16 GAGs.
Coarse-grained MD: cathepsin complexes with dp6 and dp16 heparin.
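The all-atom and coarse-grained complex counts are consistent with a full combination of cathepsins, GAG classes and chain lengths, which the short check below illustrates. The labels are placeholders, since the individual cathepsin and GAG class names are not listed in this section, and the full-combination reading is an assumption rather than a stated design. The RS-REMD count is not derived here because the section does not say what distinguishes its 132 complexes.

```python
from itertools import product

# Placeholder labels; the actual names are not listed in this section.
cathepsins = [f"cathepsin_{i}" for i in range(1, 12)]    # 11 human cysteine cathepsins
gag_classes = [f"gag_class_{i}" for i in range(1, 7)]    # 6 GAG classes
aa_lengths = ["dp2", "dp4", "dp6"]                       # all-atom MD chain lengths
cg_lengths = ["dp6", "dp16"]                             # coarse-grained MD chain lengths

# All-atom MD: every cathepsin with every GAG class at every short chain length.
aa_complexes = list(product(cathepsins, gag_classes, aa_lengths))

# Coarse-grained MD: every cathepsin with heparin at two chain lengths.
cg_complexes = list(product(cathepsins, ["heparin"], cg_lengths))

print(len(aa_complexes), len(cg_complexes))  # 198 22
```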

FAIR-oriented data organization

PapayaDB is organized to support findability, accessibility, interoperability and reuse of cathepsin–GAG simulation data.

F Findable

Records are organized by cathepsin, GAG class, chain length, simulation method and identifiers.

A Accessible

Available structures, plots, descriptors and downloadable files are grouped at the record level for direct inspection and reuse.

I Interoperable

Where available, records reference external identifiers and standards such as PDB, UniProt, ChEBI, GlyTouCan and GlycoCT.

R Reusable

Transparent protocols, metadata and descriptor definitions support comparison and reuse across related systems.

Data reuse

PapayaDB data are provided as a research-oriented computational resource for browsing, comparison and interpretation of cathepsin–glycosaminoglycan interaction models. The database includes computationally generated structural models, simulation-derived descriptors, plots, metadata and downloadable files.

The data may be used for academic, educational and non-commercial research purposes, provided that PapayaDB and the relevant associated publication or release are cited. Formal citation information will be provided with the first public release or associated publication.

Users should note that PapayaDB binding affinity records are based on in silico models. They are intended for qualitative and comparative interpretation across related systems and should not be compared directly with experimental binding affinities.

External identifiers and referenced resources, such as PDB, UniProt, ChEBI, GlyTouCan and GlycoCT, remain subject to their own database terms, licenses and citation requirements.

For reuse beyond academic browsing, citation, teaching or non-commercial research, please contact the PapayaDB team.

Reuse guidance

  • Academic use: Allowed for research, comparison, teaching and interpretation with proper citation.
  • Citation: Cite PapayaDB and the associated publication/release once available.
  • Interpretation: Treat simulation descriptors as qualitative and comparative, not as experimental affinities.
  • External resources: Follow original terms and citation rules for PDB, UniProt, ChEBI, GlyTouCan and GlycoCT.
  • Extended reuse: Contact the PapayaDB team for broader reuse or redistribution.

Selected references and related studies

A curated list of publications related to cathepsin–GAG interactions, molecular docking, molecular dynamics protocols, force fields and FAIR glycomics resources will be provided here.

Development and citation notice

PapayaDB is currently under active development. The associated manuscript has not yet been submitted. Manuscript submission is planned for October 2026. Until the manuscript and formal citation details are available, please do not cite PapayaDB as a published resource. For questions regarding citation, collaboration or early-stage use of the database, please contact the PapayaDB team at contact@papayadb.org.

Cathepsin–GAG interactions

  • Original manuscript in preparation. Submission planned for October 2026. Updates will be posted here.

Molecular docking and molecular dynamics

  • to be inserted

Force fields and simulation protocols

  • to be inserted

Glycomics identifiers and FAIR data resources

  • to be inserted