PDB2Net automatically extracts Protein Interaction Networks (PINs) from PDB/mmCIF files and visualizes them as Cytoscape networks.
It uses Gemmi for structure parsing, SciPy cKDTree for distance-based interaction detection, and BLAST+ for UniProt annotation of unidentified chains.
- Automatic parsing of `.pdb`, `.cif`, and `.mmCIF` structures
- Distance-based chain interaction detection
- Protein-level and chain-level networks
- Full UniProt annotation via SIFTS and BLAST+
- Export of chain, protein, and combined networks (CX2 format)
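To make the idea of distance-based chain interaction detection concrete, here is a minimal sketch. It deliberately swaps the SciPy cKDTree lookup for a brute-force scan so that it runs with the standard library alone. The input layout (chain ID mapped to a list of atom coordinates) and the function names are assumptions for illustration, not PDB2Net's actual API.

```python
from itertools import combinations
from math import dist

# Hypothetical input layout: chain ID -> list of (x, y, z) atom coordinates in angstroms.
Chain = list[tuple[float, float, float]]

def chains_in_contact(a: Chain, b: Chain, radius: float = 5.0) -> bool:
    """Return True if any atom of `a` lies within `radius` of any atom of `b`."""
    return any(dist(p, q) <= radius for p in a for q in b)

def interacting_pairs(chains: dict[str, Chain], radius: float = 5.0) -> list[tuple[str, str]]:
    """Brute-force scan over every chain pair (a KD-tree avoids most of these comparisons)."""
    return [
        (id_a, id_b)
        for id_a, id_b in combinations(sorted(chains), 2)
        if chains_in_contact(chains[id_a], chains[id_b], radius)
    ]

# Toy coordinates, not taken from a real structure.
chains = {
    "A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "B": [(4.0, 0.0, 0.0)],
    "C": [(30.0, 0.0, 0.0)],
}
print(interacting_pairs(chains))  # [('A', 'B')]
```

The default radius of 5.0 mirrors the `all_atoms_radius` default in the configuration; how PDB2Net combines it with `ca_radius` is not shown here.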
- Recommended Version: Python 3.11
- Download Python
- Ensure that pip is installed:
python -m ensurepip --default-pip
- Install the dependencies:
pip install -r requirements.txt
- Download Cytoscape 3.10.4 or newer from the Cytoscape download page.
- Start it once manually so that PDB2Net can auto-launch it later.
- On headless servers, Cytoscape is automatically disabled (`open_in_cytoscape = false`).
| File | Source | Purpose |
|---|---|---|
| `pdb_seqres.txt` | https://www.rcsb.org/downloads/fasta | PDB single-FASTA (chains) |
| `pdb_chain_uniprot.tsv` | https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html | PDB ⇄ UniProt mapping (SIFTS) |
| `uniprot_sprot.fasta` | https://www.uniprot.org/uniprotkb?query=reviewed:true | Swiss-Prot for building BLAST DB |
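As an illustration of how the SIFTS mapping file can be consumed, the sketch below reads a tab-separated table into a `(pdb_id, chain_id)` lookup. It assumes the file starts with a `#` comment line followed by a header containing `PDB`, `CHAIN`, and `SP_PRIMARY` columns; the function name `load_sifts` and the in-memory sample are hypothetical and only stand in for the real file.

```python
import csv
import io

def load_sifts(handle) -> dict[tuple[str, str], set[str]]:
    """Map (pdb_id, chain_id) to the set of UniProt accessions listed for that chain."""
    # Skip comment lines; the header row is assumed to be the first non-comment line.
    rows = (line for line in handle if not line.startswith("#"))
    mapping: dict[tuple[str, str], set[str]] = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        key = (row["PDB"].lower(), row["CHAIN"])
        mapping.setdefault(key, set()).add(row["SP_PRIMARY"])
    return mapping

# Illustrative sample in the assumed layout (the comment line is made up).
sample = io.StringIO(
    "# illustrative header comment\n"
    "PDB\tCHAIN\tSP_PRIMARY\tRES_BEG\tRES_END\tPDB_BEG\tPDB_END\tSP_BEG\tSP_END\n"
    "101m\tA\tP02185\t1\t154\t0\t153\t1\t154\n"
)
sifts = load_sifts(sample)
print(sifts[("101m", "A")])
```

With the real file, the same function would be called as `load_sifts(open(sifts_tsv_path, encoding="utf-8"))`. Chains missing from the lookup are the ones that would need BLAST+ annotation.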
- Go to the NCBI BLAST+ download page (🔗 NCBI BLAST+ Download).
- Download the correct version for your OS:
  - Windows: `ncbi-blast-*-win64.exe`
  - Linux: `ncbi-blast-*-x64-linux.tar.gz`
  - macOS: `ncbi-blast-*-universal-macosx.tar.gz`
- Install BLAST+:
  - Windows: Run the `.exe` file and follow the installation wizard.
  - Linux/macOS: Extract the archive and move the executables from its `bin` folder to `/usr/local/bin`:
tar -xvzf ncbi-blast-*-x64-linux.tar.gz
sudo mv ncbi-blast-*/bin/* /usr/local/bin/
Now, generate the BLAST database from the downloaded UniProt FASTA file.
- Open a terminal (Linux/Mac) or PowerShell/Git Bash (Windows).
- Run the following command:
makeblastdb -in C:/blast_db/uniprot_sprot.fasta -dbtype prot -out C:/blast_db/uniprot_db
Explanation:
- `-in` → Input FASTA file.
- `-dbtype prot` → Specifies a protein database.
- `-out` → Output database name (`uniprot_db`).
- Expected output:
Building a new DB, current time: 03/16/2025 12:45:32
New DB name: C:/blast_db/uniprot_db
Number of sequences: 570,000
This confirms that BLAST has successfully created the database.
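The `makeblastdb` call above can also be scripted, which helps when the database has to be rebuilt after a Swiss-Prot update. The following is a minimal sketch using only the three flags shown above; the helper names `makeblastdb_command` and `build_db` are hypothetical and not part of PDB2Net.

```python
import shutil
import subprocess
from pathlib import Path

def makeblastdb_command(fasta: Path, db_prefix: Path) -> list[str]:
    """Build the makeblastdb call shown above as an argument list."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "prot", "-out", str(db_prefix)]

def build_db(fasta: Path, db_prefix: Path) -> bool:
    """Run makeblastdb if it is on PATH and the FASTA file exists; return True on success."""
    if shutil.which("makeblastdb") is None or not fasta.is_file():
        return False
    result = subprocess.run(makeblastdb_command(fasta, db_prefix), check=False)
    return result.returncode == 0

# Relative example paths; adjust them to your BLAST database folder.
cmd = makeblastdb_command(Path("blast_db/uniprot_sprot.fasta"), Path("blast_db/uniprot_db"))
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, such as `C:/Program Files/...` on Windows.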
PDB2Net loads configuration in layers; later layers override earlier ones:
- `configs/config.base.json`: shared defaults
- `configs/config.{windows|linux|darwin}.json`: OS-specific overrides
- `configs/config.local.json`: user machine settings (git-ignored)
- Environment variables: highest priority
🗂️ Paths support `~` and `$VARS` expansion.
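The layering described above can be sketched as a recursive dictionary merge followed by path expansion. This is an illustration under stated assumptions, not PDB2Net's loader: the function names are hypothetical, nested sections such as `distance_thresholds` are assumed to merge key by key, and expansion is applied to every string value for simplicity.

```python
import json
import os
import platform
from pathlib import Path

def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def expand(value):
    """Expand ~ and $VARS in strings, recursing into nested dicts."""
    if isinstance(value, str):
        return os.path.expanduser(os.path.expandvars(value))
    if isinstance(value, dict):
        return {k: expand(v) for k, v in value.items()}
    return value

def load_layers(config_dir: Path) -> dict:
    """Merge base, OS-specific, and local files in that order; missing files are skipped."""
    os_name = platform.system().lower()  # "windows", "linux", or "darwin"
    names = ["config.base.json", f"config.{os_name}.json", "config.local.json"]
    config: dict = {}
    for name in names:
        path = config_dir / name
        if path.is_file():
            config = deep_merge(config, json.loads(path.read_text(encoding="utf-8")))
    return expand(config)

base = {"distance_thresholds": {"ca_radius": 15.0, "all_atoms_radius": 5.0}, "open_in_cytoscape": True}
local = {"distance_thresholds": {"ca_radius": 12.0}, "open_in_cytoscape": False}
print(deep_merge(base, local))
```

Merging nested sections rather than replacing them lets a local file change a single threshold without repeating the whole `distance_thresholds` block.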
`config.base.json` (defaults):
{
"networks": {
"chain_per_pdb": true,
"combined_chain_network": true,
"protein_per_pdb": true,
"combined_protein_network": true
},
"distance_thresholds": { "ca_radius": 15.0, "all_atoms_radius": 5.0 },
"workers": { "parsing": "auto", "blast_threads": "auto" },
"keep_last_n_networks": 46,
"export_detailed_interactions": true
}
`config.windows.json`
{
"input_folder_path": "E:/PDB_Files/Test500",
"pdb_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_seqres.txt",
"uniprot_fasta_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/uniprot_sprot.fasta",
"sifts_tsv_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/pdb_chain_uniprot.tsv",
"output_path": "D:/Networks",
"cytoscape_path": "C:/Program Files/Cytoscape_v3.10.4/Cytoscape.exe",
"blast_db_path": "C:/Users/habit/Documents/Projekte/MPI_PDB2Net/Data/blast_db",
"blastp_executable": "C:/Program Files/NCBI/blast-2.17.0+/bin/blastp.exe",
"open_in_cytoscape": true
}
`config.linux.json`
{
"input_folder_path": "/data/pdb_inputs",
"pdb_fasta_path": "/data/reference/pdb_seqres.txt",
"uniprot_fasta_path": "/data/reference/uniprot_sprot.fasta",
"sifts_tsv_path": "/data/reference/pdb_chain_uniprot.tsv",
"output_path": "/srv/pdb2net_outputs",
"blast_db_path": "/data/reference/blast_db",
"blastp_executable": "blastp",
"open_in_cytoscape": false
}
`config.darwin.json` (macOS)
{
"input_folder_path": "$HOME/pdb2net/pdb_inputs",
"pdb_fasta_path": "$HOME/pdb2net/reference/pdb_seqres.txt",
"uniprot_fasta_path": "$HOME/pdb2net/reference/uniprot_sprot.fasta",
"sifts_tsv_path": "$HOME/pdb2net/reference/pdb_chain_uniprot.tsv",
"output_path": "$HOME/pdb2net/outputs",
"blast_db_path": "$HOME/pdb2net/reference/blast_db",
"blastp_executable": "blastp",
"open_in_cytoscape": true,
"cytoscape_path": "/Applications/Cytoscape.app/Contents/MacOS/Cytoscape"
}
You can override individual settings via environment variables (ENV):
| ENV var | Maps to config key |
|---|---|
| `PDB2NET_INPUT` | `input_folder_path` |
| `PDB2NET_OUTPUT` | `output_path` |
| `PDB2NET_PDB_FASTA` | `pdb_fasta_path` |
| `PDB2NET_UNIPROT_FASTA` | `uniprot_fasta_path` |
| `PDB2NET_SIFTS_TSV` | `sifts_tsv_path` |
| `PDB2NET_CYTO_PATH` | `cytoscape_path` |
| `PDB2NET_BLAST_DB` | `blast_db_path` |
| `PDB2NET_BLASTP` | `blastp_executable` |
| `PDB2NET_OPEN_IN_CYTOSCAPE` | `open_in_cytoscape` (true/false/1/0/yes/no) |
| `PDB2NET_WORKERS_PARSING` | `workers.parsing` (auto or int) |
| `PDB2NET_WORKERS_BLAST` | `workers.blast_threads` (auto or int) |
| `PDB2NET_CA_RADIUS` | `distance_thresholds.ca_radius` |
| `PDB2NET_ALL_ATOMS_RADIUS` | `distance_thresholds.all_atoms_radius` |
Windows PowerShell (values set with `setx` apply to new sessions, so open a new terminal afterwards):
setx PDB2NET_INPUT "E:\PDB_Files\Dataset"
setx PDB2NET_OUTPUT "E:\Networks"
setx PDB2NET_OPEN_IN_CYTOSCAPE "true"
Linux/macOS:
export PDB2NET_INPUT=~/pdb2net/pdb_inputs
export PDB2NET_OUTPUT=~/pdb2net/outputs
export PDB2NET_OPEN_IN_CYTOSCAPE=false
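To show how such overrides can be applied on top of a loaded configuration, here is a small sketch covering a few rows of the table. The parsers follow the accepted values listed there (true/false/1/0/yes/no, and auto or an integer); the names `ENV_MAP`, `parse_bool`, `parse_workers`, and `apply_env` are hypothetical and do not claim to match PDB2Net's implementation.

```python
import copy
import os

def parse_bool(text: str) -> bool:
    """Accept the spellings listed in the table: true/false/1/0/yes/no."""
    value = text.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise ValueError(f"not a boolean: {text!r}")

def parse_workers(text: str):
    """Accept 'auto' or a positive integer."""
    value = text.strip().lower()
    if value == "auto":
        return "auto"
    count = int(value)
    if count < 1:
        raise ValueError(f"worker count must be >= 1: {text!r}")
    return count

# A few rows of the table: ENV name -> (nested config keys, parser).
ENV_MAP = {
    "PDB2NET_INPUT": (("input_folder_path",), str),
    "PDB2NET_OPEN_IN_CYTOSCAPE": (("open_in_cytoscape",), parse_bool),
    "PDB2NET_WORKERS_PARSING": (("workers", "parsing"), parse_workers),
    "PDB2NET_CA_RADIUS": (("distance_thresholds", "ca_radius"), float),
}

def apply_env(config: dict, environ=None) -> dict:
    """Return a copy of `config` with every set variable applied on top."""
    environ = os.environ if environ is None else environ
    result = copy.deepcopy(config)
    for name, (keys, parser) in ENV_MAP.items():
        if name not in environ:
            continue
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})  # create nested sections on demand
        target[keys[-1]] = parser(environ[name])
    return result

config = {"open_in_cytoscape": True, "workers": {"parsing": "auto"}}
env = {"PDB2NET_OPEN_IN_CYTOSCAPE": "no", "PDB2NET_WORKERS_PARSING": "4", "PDB2NET_CA_RADIUS": "12.5"}
print(apply_env(config, env))
```

Rejecting unknown spellings with a `ValueError` surfaces typos such as `PDB2NET_OPEN_IN_CYTOSCAPE=flase` early instead of silently treating them as false.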
Once all dependencies are installed, you can run the tool with:
python main.py
- Input: valid PDB/mmCIF files found in `input_folder_path`
- Output: a timestamped subfolder in `output_path`, e.g. `/…/Networks/2025-10-20_18-32-45/`
| File/Folder | Description |
|---|---|
| `log.txt` | Timing summary (parsing, classification, BLAST, interaction, exports) |
| `*.cx2` | Cytoscape networks (Chain/Protein/Combined), portable CX2 |
| `detailed_interactions.csv` | Per-atom residue/atom distance pairs (if `export_detailed_interactions: true`) |
| `error_in_batch_log/` | Batch/runtime logs |
PDB2Net generates several network representations:
- Chain Interaction Network (per PDB) — Nodes: chains; Edges: interactions
- Combined Chain Network — All chains across all PDBs
- Protein Network (per PDB) — Nodes: UniProt IDs; Edges aggregated over chains
- Combined Protein Network — UniProt nodes across all PDBs
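The step from a chain network to a protein network can be sketched as collapsing chain IDs onto their UniProt accessions and counting how many chain pairs support each protein pair. This is a minimal illustration: the function name `protein_edges`, the decision to keep self-loops for homo-oligomers, and the toy chain contacts are all assumptions.

```python
from collections import Counter

def protein_edges(chain_edges, chain_to_uniprot):
    """Collapse chain-chain edges to UniProt-UniProt edges, counting supporting chain pairs."""
    counts: Counter = Counter()
    for chain_a, chain_b in chain_edges:
        prot_a = chain_to_uniprot.get(chain_a)
        prot_b = chain_to_uniprot.get(chain_b)
        if prot_a is None or prot_b is None:
            continue  # chains without an annotation cannot be placed in the protein network
        counts[tuple(sorted((prot_a, prot_b)))] += 1  # undirected: order does not matter
    return dict(counts)

# Toy contacts for a four-chain complex; the edge list is made up for illustration.
chain_edges = [("A", "B"), ("C", "D"), ("A", "C")]
chain_to_uniprot = {"A": "P69905", "B": "P68871", "C": "P69905", "D": "P68871"}
print(protein_edges(chain_edges, chain_to_uniprot))
```

Here chains A and C map to the same accession, so the A–C contact becomes a self-loop, while the A–B and C–D contacts merge into one edge with a count of 2.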
Headless / Server (open_in_cytoscape: false)
→ Only CX2 files are written (no .cyjs).
→ Deterministic positions and visual mappings are embedded.
Desktop (open_in_cytoscape: true)
→ Networks are created in Cytoscape via py4cytoscape and also exported as CX2.
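The choice between the two modes can be pictured as a small decision function. The sketch below is only a guess at how headless detection might work: it treats a Linux session without `DISPLAY` or `WAYLAND_DISPLAY` as headless and all other platforms as desktops. Both function names are hypothetical, and PDB2Net's actual check may differ.

```python
import os
import sys

def is_headless(environ=None, platform=None) -> bool:
    """Guess whether no GUI is available: on Linux, neither X11 nor Wayland display is set."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    return False  # assumption: Windows and macOS sessions are treated as desktops

def should_open_in_cytoscape(config: dict, environ=None, platform=None) -> bool:
    """Open Cytoscape only if the config asks for it and a display appears to be available."""
    return bool(config.get("open_in_cytoscape", False)) and not is_headless(environ, platform)

print(should_open_in_cytoscape({"open_in_cytoscape": True}, {}, "linux"))  # False
```

In either mode the CX2 files are still written, so a run on a server can be opened later on a desktop machine.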
The BLAST database will be built from a UniProt FASTA file.
- Download the latest UniProt Swiss-Prot database (manual download: UniProt Swiss-Prot).
- Create the BLAST database folder and move the file into it (adjust the path if necessary):
mkdir -p C:/blast_db   # Windows (Git Bash)
mkdir -p ~/blast_db    # Linux/macOS
Habitzreither, G., Gautam, Lupas, A., Elhabashy, H. PDB2Net: Automated extraction of biomolecular Interaction Networks from Three-Dimensional Structures. Manuscript in preparation.
- Gregor Habitzreither
- Hadeer Elhabashy
If you have any questions or inquiries, please feel free to contact Hadeer Elhabashy at Elhabashylab [@] gmail.com.
- The PDB2Net code in this repository is licensed under the MIT License.