ElhabashyLab/PDB2Net

PDB2Net

PDB2Net automatically extracts Protein Interaction Networks (PINs) from PDB/mmCIF files and visualizes them as Cytoscape networks.
It uses Gemmi for structure parsing, SciPy cKDTree for distance-based interaction detection, and BLAST+ for UniProt annotation of unidentified chains.

Features

  • Automatic parsing of .pdb, .cif, and .mmCIF structures
  • Distance-based chain interaction detection
  • Protein-level and chain-level networks
  • Full UniProt annotation via SIFTS and BLAST+
  • Export of chain, protein, and combined networks (CX2 format)

System Requirements & Setup

1️⃣ Install Python 3.11 or 3.12

  • Recommended Version: Python 3.11
  • Download Python
  • Ensure that pip is installed:

python -m ensurepip --default-pip

2️⃣ Install Required Libraries

pip install -r requirements.txt

3️⃣ Install Cytoscape

  • Download Cytoscape 3.10.4 or newer:
    Cytoscape Download
  • Start Cytoscape once manually so that PDB2Net can launch it automatically later.
  • On headless servers, Cytoscape is automatically disabled (open_in_cytoscape = false).

4️⃣ Reference Data (required)

| File | Source | Purpose |
|---|---|---|
| pdb_seqres.txt | https://www.rcsb.org/downloads/fasta | PDB single-FASTA (chains) |
| pdb_chain_uniprot.tsv | https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html | PDB ⇄ UniProt mapping (SIFTS) |
| uniprot_sprot.fasta | https://www.uniprot.org/uniprotkb?query=reviewed:true | Swiss-Prot for building BLAST DB |
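As a rough illustration of how the SIFTS mapping can be used, the sketch below reads a tab-separated table in the layout of pdb_chain_uniprot.tsv and builds a (PDB ID, chain) to UniProt lookup. The function name `load_sifts` and the sample rows are hypothetical placeholders, and the column names (`PDB`, `CHAIN`, `SP_PRIMARY`) are assumed from the public SIFTS file; this is not necessarily how PDB2Net reads it.

```python
import csv
import io

# Illustrative excerpt in the assumed layout of pdb_chain_uniprot.tsv:
# a '#' comment line, a header row, then one row per mapped segment.
# The IDs below are placeholders, not real annotations.
SAMPLE_TSV = (
    "# illustrative excerpt\n"
    "PDB\tCHAIN\tSP_PRIMARY\tRES_BEG\tRES_END\tPDB_BEG\tPDB_END\tSP_BEG\tSP_END\n"
    "1abc\tA\tP00001\t1\t50\t1\t50\t1\t50\n"
    "1abc\tA\tP00002\t51\t90\t51\t90\t1\t40\n"
    "1abc\tB\tP00001\t1\t50\t1\t50\t1\t50\n"
)


def load_sifts(handle):
    """Return {(pdb_id, chain_id): [uniprot_accession, ...]} from a SIFTS-style TSV."""
    # Skip comment lines so DictReader sees the header row first.
    rows = (line for line in handle if not line.startswith("#"))
    mapping = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        key = (row["PDB"].lower(), row["CHAIN"])
        accessions = mapping.setdefault(key, [])
        # A chain can map to several segments; keep each accession once.
        if row["SP_PRIMARY"] not in accessions:
            accessions.append(row["SP_PRIMARY"])
    return mapping


sifts = load_sifts(io.StringIO(SAMPLE_TSV))
print(sifts.get(("1abc", "A")))
print(sifts.get(("9zzz", "B")))  # None: such a chain would fall back to BLAST+
```

For the real file, open it with `open(path, encoding="utf-8")` and pass the handle to the same function.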

5️⃣ Setting up BLAST for UniProt Matching

Download & Install BLAST+

  1. Go to the NCBI BLAST+ Download page:
    🔗 NCBI BLAST+ Download
  2. Download the correct version for your OS:
    • Windows: Download ncbi-blast-*-win64.exe
    • Linux: Download ncbi-blast-*-x64-linux.tar.gz
    • MacOS: Download ncbi-blast-*-universal-macosx.tar.gz
  3. Install BLAST+:
    • Windows: Run the .exe file and follow the installation wizard.
    • Linux/MacOS: Extract the archive (use the matching archive name on macOS) and copy the executables into /usr/local/bin:
      tar -xvzf ncbi-blast-*-x64-linux.tar.gz
      sudo cp ncbi-blast-*/bin/* /usr/local/bin/

6️⃣ Create the BLAST Database

Now, generate the BLAST database from the downloaded UniProt FASTA file.

  1. Open a terminal (Linux/Mac) or PowerShell/Git Bash (Windows).

  2. Run the following command:

    makeblastdb -in C:/blast_db/uniprot_sprot.fasta -dbtype prot -out C:/blast_db/uniprot_db

    Explanation:

    • -in → Input FASTA file.
    • -dbtype prot → Specifies a protein database.
    • -out → Output database name (uniprot_db).
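  3. Expected output (approximate; the exact wording, timestamp, and sequence count depend on your BLAST+ version and the Swiss-Prot release):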

    Building a new DB, current time: 03/16/2025 12:45:32
    New DB name:   C:/blast_db/uniprot_db
    Number of sequences: 570,000
    

    This confirms that BLAST has successfully created the database.
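Once the database exists, a query against it can be assembled roughly as follows. This is a minimal sketch, not PDB2Net's actual invocation: the helper names `build_blastp_command` and `parse_outfmt6_line`, the file name `chains.fasta`, the E-value cutoff, and the sample hit line are all assumptions made for illustration. The flags themselves (`-query`, `-db`, `-outfmt 6`, `-max_target_seqs`, `-evalue`, `-num_threads`) are standard BLAST+ options.

```python
def build_blastp_command(query_fasta, db_path, blastp="blastp", threads=1, evalue=1e-5):
    """Assemble a blastp call that returns the single best hit per query as tabular text."""
    return [
        blastp,
        "-query", str(query_fasta),
        "-db", str(db_path),
        "-outfmt", "6",            # 12 tab-separated columns per hit
        "-max_target_seqs", "1",
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def parse_outfmt6_line(line):
    """Parse one default outfmt 6 line (qseqid, sseqid, pident, ..., evalue, bitscore)."""
    fields = line.rstrip("\n").split("\t")
    subject_id = fields[1]
    parts = subject_id.split("|")
    # Swiss-Prot subject IDs usually look like 'sp|ACCESSION|ENTRY_NAME'.
    accession = parts[1] if len(parts) >= 3 else subject_id
    return {
        "query": fields[0],
        "accession": accession,
        "identity": float(fields[2]),
        "evalue": float(fields[10]),
        "bitscore": float(fields[11]),
    }


cmd = build_blastp_command("chains.fasta", "C:/blast_db/uniprot_db", threads=4)
print(" ".join(cmd))

# Illustrative hit line, not real BLAST output.
sample = "1abc_A\tsp|P69905|HBA_HUMAN\t100.000\t141\t0\t0\t1\t141\t2\t142\t1e-100\t285"
hit = parse_outfmt6_line(sample)
print(hit["accession"], hit["identity"])
```

The command list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, with each line of the captured output fed to the parser.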


⚙️ Configuration (Multi-Layer)

PDB2Net loads configuration in layers — later files override earlier ones:

  1. configs/config.base.json — shared defaults
  2. configs/config.{windows|linux|darwin}.json — OS-specific overrides
  3. configs/config.local.json — user machine settings (git-ignored)
  4. Environment variables — highest priority

🗂️ Paths support ~ and $VARS expansion.
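The layering described above can be pictured with the following sketch. It assumes that nested sections such as `workers` are merged key by key rather than replaced wholesale, which the README does not state; the function names `deep_merge`, `expand_path`, and `load_layers` are hypothetical and do not come from the PDB2Net code base.

```python
import json
import os
import platform
from pathlib import Path


def deep_merge(base, override):
    """Return a new dict in which `override` wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_path(value):
    """Expand $VARS first, then a leading ~, in a path-like string."""
    return os.path.expanduser(os.path.expandvars(value))


def load_layers(config_dir):
    """Merge base, OS-specific, and local JSON files in that order (missing files are skipped)."""
    os_name = platform.system().lower()  # 'windows', 'linux' or 'darwin'
    config = {}
    for name in ("config.base.json", f"config.{os_name}.json", "config.local.json"):
        path = Path(config_dir) / name
        if path.exists():
            with path.open(encoding="utf-8") as handle:
                config = deep_merge(config, json.load(handle))
    return config


base = {"workers": {"parsing": "auto", "blast_threads": "auto"}, "open_in_cytoscape": True}
local = {"workers": {"parsing": 4}, "output_path": "~/pdb2net/outputs"}
merged = deep_merge(base, local)
print(merged["workers"])
print(expand_path(merged["output_path"]))
```

Merging into a fresh dict at every level keeps the base defaults untouched, so the same defaults can be reused for several runs.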

Core keys (by file)

config.base.json (defaults):

{
  "networks": {
    "chain_per_pdb": true,
    "combined_chain_network": true,
    "protein_per_pdb": true,
    "combined_protein_network": true
  },
  "distance_thresholds": { "ca_radius": 15.0, "all_atoms_radius": 5.0 },
  "workers": { "parsing": "auto", "blast_threads": "auto" },
  "keep_last_n_networks": 46,
  "export_detailed_interactions": true
}
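To give a feel for what the two distance thresholds might do, here is a dependency-free sketch. It assumes that `ca_radius` acts as a coarse pre-filter on Cα atoms and `all_atoms_radius` as the final contact cutoff; the README does not spell out this two-stage logic, so treat it as one plausible reading. PDB2Net uses SciPy's cKDTree for the neighbour search; the brute-force loops below are a deliberately simple stand-in, and the coordinates are toy values.

```python
import math

CA_RADIUS = 15.0         # Å, default from config.base.json
ALL_ATOMS_RADIUS = 5.0   # Å, default from config.base.json


def any_within(coords_a, coords_b, radius):
    """True if any point in coords_a lies within `radius` of any point in coords_b."""
    return any(math.dist(a, b) <= radius for a in coords_a for b in coords_b)


def chains_interact(chain_a, chain_b, ca_radius=CA_RADIUS, all_atoms_radius=ALL_ATOMS_RADIUS):
    """Assumed two-stage test: cheap C-alpha pre-filter, then all-atom contact check."""
    if not any_within(chain_a["ca"], chain_b["ca"], ca_radius):
        return False
    return any_within(chain_a["atoms"], chain_b["atoms"], all_atoms_radius)


# Toy coordinates (x, y, z) in Å; each chain lists its C-alpha atoms and all atoms.
chain_a = {"ca": [(0.0, 0.0, 0.0)], "atoms": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]}
chain_b = {"ca": [(8.0, 0.0, 0.0)], "atoms": [(8.0, 0.0, 0.0), (6.0, 0.0, 0.0)]}
chain_c = {"ca": [(40.0, 0.0, 0.0)], "atoms": [(40.0, 0.0, 0.0)]}
chain_d = {"ca": [(12.0, 0.0, 0.0)], "atoms": [(12.0, 0.0, 0.0), (11.0, 0.0, 0.0)]}

print(chains_interact(chain_a, chain_b))  # closest atoms are 4.5 Å apart
print(chains_interact(chain_a, chain_c))  # fails the C-alpha pre-filter
print(chains_interact(chain_a, chain_d))  # passes the pre-filter, closest atoms 9.5 Å apart
```

The brute-force check costs one distance evaluation per atom pair, which is why a spatial index such as a k-d tree is the practical choice for real structures.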

OS examples (adjust to your system):

  • config.windows.json
{
  "input_folder_path": "E:/PDB_Files/Test500",
  "pdb_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_seqres.txt",
  "uniprot_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/uniprot_sprot.fasta",
  "sifts_tsv_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_chain_uniprot.tsv",
  "output_path": "D:/Networks",
  "cytoscape_path": "C:/Program Files/Cytoscape_v3.10.4/Cytoscape.exe",
  "blast_db_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/blast_db",
  "blastp_executable": "C:/Program Files/NCBI/blast-2.17.0+/bin/blastp.exe",
  "open_in_cytoscape": true
}
  • config.linux.json
{
  "input_folder_path": "/data/pdb_inputs",
  "pdb_fasta_path": "/data/reference/pdb_seqres.txt",
  "uniprot_fasta_path": "/data/reference/uniprot_sprot.fasta",
  "sifts_tsv_path": "/data/reference/pdb_chain_uniprot.tsv",
  "output_path": "/srv/pdb2net_outputs",
  "blast_db_path": "/data/reference/blast_db",
  "blastp_executable": "blastp",
  "open_in_cytoscape": false
}
  • config.darwin.json (macOS)
{
  "input_folder_path": "$HOME/pdb2net/pdb_inputs",
  "pdb_fasta_path": "$HOME/pdb2net/reference/pdb_seqres.txt",
  "uniprot_fasta_path": "$HOME/pdb2net/reference/uniprot_sprot.fasta",
  "sifts_tsv_path": "$HOME/pdb2net/reference/pdb_chain_uniprot.tsv",
  "output_path": "$HOME/pdb2net/outputs",
  "blast_db_path": "$HOME/pdb2net/reference/blast_db",
  "blastp_executable": "blastp",
  "open_in_cytoscape": true,
  "cytoscape_path": "/Applications/Cytoscape.app/Contents/MacOS/Cytoscape"
}

Environment variable overrides

You can override individual settings via ENV:

| ENV var | Maps to config key |
|---|---|
| PDB2NET_INPUT | input_folder_path |
| PDB2NET_OUTPUT | output_path |
| PDB2NET_PDB_FASTA | pdb_fasta_path |
| PDB2NET_UNIPROT_FASTA | uniprot_fasta_path |
| PDB2NET_SIFTS_TSV | sifts_tsv_path |
| PDB2NET_CYTO_PATH | cytoscape_path |
| PDB2NET_BLAST_DB | blast_db_path |
| PDB2NET_BLASTP | blastp_executable |
| PDB2NET_OPEN_IN_CYTOSCAPE | open_in_cytoscape (true/false/1/0/yes/no) |
| PDB2NET_WORKERS_PARSING | workers.parsing (auto or int) |
| PDB2NET_WORKERS_BLAST | workers.blast_threads (auto or int) |
| PDB2NET_CA_RADIUS | distance_thresholds.ca_radius |
| PDB2NET_ALL_ATOMS_RADIUS | distance_thresholds.all_atoms_radius |
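The mapping above could be applied along these lines. This is a sketch under assumptions: the helper names `coerce` and `apply_env_overrides` are hypothetical, and the type handling (booleans from true/false/1/0/yes/no, `auto` or an integer for workers, floats for radii) is inferred from the table rather than taken from the PDB2Net source.

```python
import os

ENV_MAP = {
    "PDB2NET_INPUT": "input_folder_path",
    "PDB2NET_OUTPUT": "output_path",
    "PDB2NET_PDB_FASTA": "pdb_fasta_path",
    "PDB2NET_UNIPROT_FASTA": "uniprot_fasta_path",
    "PDB2NET_SIFTS_TSV": "sifts_tsv_path",
    "PDB2NET_CYTO_PATH": "cytoscape_path",
    "PDB2NET_BLAST_DB": "blast_db_path",
    "PDB2NET_BLASTP": "blastp_executable",
    "PDB2NET_OPEN_IN_CYTOSCAPE": "open_in_cytoscape",
    "PDB2NET_WORKERS_PARSING": "workers.parsing",
    "PDB2NET_WORKERS_BLAST": "workers.blast_threads",
    "PDB2NET_CA_RADIUS": "distance_thresholds.ca_radius",
    "PDB2NET_ALL_ATOMS_RADIUS": "distance_thresholds.all_atoms_radius",
}
TRUE_WORDS = {"true", "1", "yes"}
FALSE_WORDS = {"false", "0", "no"}


def coerce(dotted_key, raw):
    """Convert the raw string from the environment to the type the key expects."""
    text = raw.strip()
    if dotted_key == "open_in_cytoscape":
        lowered = text.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if dotted_key.startswith("workers."):
        return "auto" if text.lower() == "auto" else int(text)
    if dotted_key.startswith("distance_thresholds."):
        return float(text)
    return text


def apply_env_overrides(config, environ=None):
    """Write every set PDB2NET_* variable into `config` (modified in place)."""
    environ = os.environ if environ is None else environ
    for env_name, dotted_key in ENV_MAP.items():
        if env_name not in environ:
            continue
        *parents, leaf = dotted_key.split(".")
        target = config
        for part in parents:  # walk or create nested sections such as "workers"
            target = target.setdefault(part, {})
        target[leaf] = coerce(dotted_key, environ[env_name])
    return config


config = {"workers": {"parsing": "auto", "blast_threads": "auto"}, "open_in_cytoscape": True}
fake_env = {"PDB2NET_OPEN_IN_CYTOSCAPE": "no", "PDB2NET_WORKERS_PARSING": "8", "PDB2NET_CA_RADIUS": "12.5"}
apply_env_overrides(config, fake_env)
print(config)
```

Passing a plain dict as `environ` keeps the example independent of the real process environment.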

Examples:

Windows PowerShell (setx stores the value for new terminal sessions; it does not change the current one):

setx PDB2NET_INPUT "E:\PDB_Files\Dataset"
setx PDB2NET_OUTPUT "E:\Networks"
setx PDB2NET_OPEN_IN_CYTOSCAPE "true"

Linux/macOS:

export PDB2NET_INPUT=~/pdb2net/pdb_inputs
export PDB2NET_OUTPUT=~/pdb2net/outputs
export PDB2NET_OPEN_IN_CYTOSCAPE=false

Run the Tool

Once all dependencies are installed, you can run the tool with:

python main.py

  • Output goes to a timestamped subfolder in output_path, e.g.: "/…/Networks/2025-10-20_18-32-45/"
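The folder name in the example matches the pattern `YYYY-MM-DD_HH-MM-SS`. A minimal sketch of creating such a run folder is shown below; the function name `make_run_folder` is hypothetical, and the format string is inferred from the example path rather than taken from the code.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_run_folder(output_path, now=None):
    """Create and return <output_path>/<YYYY-MM-DD_HH-MM-SS>/ for one run."""
    if now is None:
        now = datetime.now()
    run_dir = Path(output_path).expanduser() / now.strftime("%Y-%m-%d_%H-%M-%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Demo in a throwaway directory with a fixed timestamp.
with tempfile.TemporaryDirectory() as output_path:
    run_dir = make_run_folder(output_path, now=datetime(2025, 10, 20, 18, 32, 45))
    created = run_dir.is_dir()
    print(run_dir.name, created)
```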

User input

PDB2Net processes all valid PDB/mmCIF files found in input_folder_path.

Outputs

| File/Folder | Description |
|---|---|
| log.txt | Timing summary (parsing, classification, BLAST, interaction, exports) |
| *.cx2 | Cytoscape networks (Chain/Protein/Combined), portable CX2 |
| detailed_interactions.csv | Per-atom residue/atom distance pairs (if export_detailed_interactions: true) |
| error_in_batch_log/ | Batch/runtime logs |

Network types

PDB2Net generates several network representations:

  1. Chain Interaction Network (per PDB) — Nodes: chains; Edges: interactions
  2. Combined Chain Network — All chains across all PDBs
  3. Protein Network (per PDB) — Nodes: UniProt IDs; Edges aggregated over chains
  4. Combined Protein Network — UniProt nodes across all PDBs
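The step from chain-level to protein-level networks ("edges aggregated over chains") can be sketched as follows. The function name `aggregate_to_protein_edges`, the edge tuple layout, the choice to drop unannotated chains, and the sample data are assumptions for illustration; the actual aggregation in PDB2Net may carry more attributes.

```python
from collections import Counter


def aggregate_to_protein_edges(chain_edges, chain_to_uniprot):
    """Collapse chain-chain edges into UniProt-UniProt edges with a count of supporting chain pairs."""
    counts = Counter()
    for pdb_id, chain_a, chain_b in chain_edges:
        prot_a = chain_to_uniprot.get((pdb_id, chain_a))
        prot_b = chain_to_uniprot.get((pdb_id, chain_b))
        if prot_a is None or prot_b is None:
            continue  # assumption: chains without a UniProt ID are left out
        # Sort so that (X, Y) and (Y, X) count as the same undirected edge.
        counts[tuple(sorted((prot_a, prot_b)))] += 1
    return dict(counts)


# Placeholder tetramer: chains A and C share one UniProt ID, B and D another.
chain_to_uniprot = {
    ("1abc", "A"): "P69905", ("1abc", "C"): "P69905",
    ("1abc", "B"): "P68871", ("1abc", "D"): "P68871",
}
chain_edges = [("1abc", "A", "B"), ("1abc", "C", "D"), ("1abc", "A", "C"), ("1abc", "B", "D")]

protein_edges = aggregate_to_protein_edges(chain_edges, chain_to_uniprot)
print(protein_edges)
```

Pairs of chains that map to the same UniProt ID become self-loops here, which is one way to represent homo-oligomeric contacts.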

Cytoscape Behavior (important)

Headless / Server (open_in_cytoscape: false)
→ Only CX2 files are written (no .cyjs).
→ Deterministic positions and visual mappings are embedded.

Desktop (open_in_cytoscape: true)
→ Networks are created in Cytoscape via py4cytoscape and also exported as CX2.

Download the UniProt FASTA File

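The BLAST database (step 6 above) is built from the UniProt Swiss-Prot FASTA file, so complete this download before running makeblastdb.

  1. Download the latest UniProt Swiss-Prot database (uniprot_sprot.fasta).

  2. Create the BLAST database folder and move the downloaded file into it (adjust the path if necessary):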

    mkdir -p C:/blast_db   # Windows (Git Bash)
    mkdir -p ~/blast_db    # Linux/MacOS

Cite

Habitzreither, G., Gautam, Lupas, A., Elhabashy, H. PDB2Net: Automated extraction of biomolecular Interaction Networks from Three-Dimensional Structures. Manuscript in preparation.

Authors

  • Gregor Habitzreither
  • Hadeer Elhabashy

Contact

If you have any questions or inquiries, please feel free to contact Hadeer Elhabashy at Elhabashylab [@] gmail.com.

License

  • The PDB2Net code in this repository is licensed under the MIT License.
