KluDo (Diffusion Kernel-based Graph Node Clustering for Protein Domain Assignment) is an automatic framework that uses diffusion kernels on protein graphs as affinity measures between residues to decompose protein structures into structural domains.
Running KluDo with Python 3.6 or later is recommended, although it should work with any Python 3.x version. The following packages and programs must also be installed before using KluDo:
- numpy (Python)
- scipy (Python)
- atomium (Python)
- igraph (Python)
- sklearn (Python; installed as scikit-learn)
- tslearn (Python)
- prettytable (Python)
- unidip (Python)
- pyclustertend (Python)
- DSSP (Linux/Windows/Mac binary executable)
In addition, the single/multi-domain classifier must be placed in the main directory. Click here to download it from Dropbox.
The following commands assume a Linux operating system. Use the --help argument to see a description of all KluDo arguments:
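Before the first run, it can help to confirm that the requirements are in place. The following is a minimal sketch, not part of KluDo: it assumes the import names match the dependency list above, and that the DSSP executable is called either `dssp` or `mkdssp` (the second name is an assumption based on how recent DSSP releases name their binary). The helper names `missing_modules` and `find_dssp` are hypothetical.

```python
import importlib.util
import shutil

# Import names taken from the dependency list above.
REQUIRED_MODULES = [
    "numpy", "scipy", "atomium", "igraph", "sklearn",
    "tslearn", "prettytable", "unidip", "pyclustertend",
]


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_dssp(candidates=("dssp", "mkdssp")):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are assumptions; adjust them to your DSSP install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python packages:", ", ".join(missing) or "none")
    print("DSSP executable:", find_dssp() or "not found on PATH (use --dssppath)")
```

If DSSP is installed somewhere other than /usr/bin/dssp, the path reported here is the value to pass to --dssppath.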
$ python3 kludo.py --help
The arguments are as follows:
- --pdb (PDB file path)
- --chainid (chain ID)
- --dssppath (DSSP binary file path; default: /usr/bin/dssp)
- --clustering (the clustering method: "SP" or "KK"; default: "SP")
- --numdomains (the number of domains; default: automatic)
- --minsegsize (minimum segment size; default: 27)
- --mindomainsize (minimum domain size; default: 27)
- --maxalphahelix (maximum size of alpha-helix to merge; default: 30)
- --maxsegdomratio (maximum ratio of segment count to domain count; default: 1.5)
- --kernel (kernel type; default: LED)
- --dispall (display all candidate partitionings)
- --bw_a (coefficient of radius of gyration to calculate bandwidth parameter)
- --bw_b (exponent of radius of gyration to calculate bandwidth parameter)
Two of the arguments are mandatory: --pdb and --chainid. If you do not pass values for the remaining arguments, their default values are used. The arguments --bw_a and --bw_b must be passed together. They are the coefficient (a) and the exponent (b) of the radius of gyration (Rg), respectively, and determine the bandwidth parameter (β or t) of each kernel as a * Rg ^ b. Their default values depend on the kernel type and the clustering algorithm.
For the --kernel argument, users can choose one of the following options:
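To make the bandwidth formula a * Rg ^ b concrete, here is a small sketch. It is an illustration only, not KluDo's own code: it assumes an unweighted radius of gyration (root mean squared distance of the points from their centroid), and the coordinates and the values of a and b are toy numbers, not KluDo defaults. Which atoms KluDo actually uses for Rg is not stated here.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points.

    Assumption: every point has equal weight.
    """
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def bandwidth(rg, a, b):
    """Bandwidth parameter a * Rg ^ b, as described for --bw_a and --bw_b."""
    return a * rg ** b


# Toy coordinates and illustrative a, b values (not KluDo defaults).
toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(toy_coords)
print("Rg =", rg, "bandwidth =", bandwidth(rg, a=2.0, b=3.0))
```

Because the bandwidth scales with Rg, the same pair (a, b) yields a different kernel bandwidth for each protein chain.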
- LED (Laplacian Exponential Diffusion Kernel)
- MD (Markov Diffusion Kernel, default)
- MED (Markov Exponential Diffusion Kernel)
- RL (Regularized Laplacian Diffusion Kernel)
For the --clustering argument, the following options are available:
- SP (Spectral clustering, default)
- KK (Kernel k-means)
As an example, assuming that the PDB file 1cid.pdb is stored in the current directory, the following command runs KluDo with the minimal arguments:
$ python3 kludo.py --pdb 1cid.pdb --chainid A
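When processing many chains, it may be convenient to assemble the command line from a script. The sketch below is a hypothetical helper (`kludo_command` is not part of KluDo) that uses only the arguments listed above and enforces the rule that --bw_a and --bw_b are passed together. It handles only arguments that take a value, so a switch such as --dispall would need to be appended separately.

```python
import shlex


def kludo_command(pdb, chain_id, bw_a=None, bw_b=None, **options):
    """Build the argument list for one KluDo run.

    `options` maps argument names without the leading dashes
    (for example kernel="LED") to their values.
    """
    # The bandwidth arguments are only meaningful as a pair.
    if (bw_a is None) != (bw_b is None):
        raise ValueError("--bw_a and --bw_b must be passed together")
    cmd = ["python3", "kludo.py", "--pdb", pdb, "--chainid", chain_id]
    if bw_a is not None:
        cmd += ["--bw_a", str(bw_a), "--bw_b", str(bw_b)]
    for name, value in options.items():
        cmd += ["--" + name, str(value)]
    return cmd


cmd = kludo_command("1cid.pdb", "A", kernel="LED", clustering="KK")
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains kludo.py.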
KluDo is also available as a web application at: http://www.cbph.ir/tools/kludo
Taheri-Ledari M, Zandieh A, Shariatpanahi SP, Eslahchi C, "Assignment of structural domains in proteins using diffusion kernels on graphs", BMC Bioinformatics, vol. 23, no. 1, 2022, p. 369. Read the paper