DEEPScreen: Virtual Screening with Deep Convolutional Neural Networks Using Compound Images
DEEPScreen is a command-line prediction tool written in Python 3.7.1. It was developed and tested on macOS, but it should run on any Unix-like operating system. Run the commands below to install the requirements for model training and testing. The dependencies are listed in the requirements.txt file, which is located under the bin directory.
conda create -n deepscreen_env python=3.7
source activate deepscreen_env
pip install -r requirements.txt
bin folder includes the source code of DEEPScreen.
training_files folder includes the files directly used in the training and testing of the system:
chembl27_preprocessed_filtered_bioactivity_dataset.tsv.zip is an updated version of the preprocessed and filtered ChEMBL dataset, containing drug/compound-target interactions from the ChEMBL database (v27) after the application of multiple filtering operations to obtain a clean training set,
chembl27_training_target_list.txt is the list of target ChEMBL IDs,
target_training_datasets contains a folder (e.g. CHEMBL286) for each target where each target folder contains
chembl27_preprocessed_filtered_act_inact_comps_10.0_20.0_blast_comp_0.2.txt contains the active and inactive compound information for each target protein in ChEMBL, after the similarity-based negative training dataset enrichment process. In this file, there are two lines for each target, in the following format:
CHEMBL286_act CHEMBL1818056,CHEMBL2115367,CHEMBL344651,CHEMBL62054, ...
CHEMBL286_inact CHEMBL288434,CHEMBL584926,CHEMBL406111,CHEMBL151055, ...
The first tab-separated column identifies the target and the label (e.g., CHEMBL286_act), and the second column lists the active/inactive compounds for the corresponding target, separated by commas (e.g., CHEMBL1818056,C…),
chembl27_uniprot_mapping.txt contains the ID mapping between UniProt accessions and ChEMBL IDs for proteins, in tab-separated format (Target UniProt accession, Target ChEMBL ID, Target protein name and Target type),
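The two-line-per-target layout of the active/inactive compound file described above can be read with a few lines of Python. The following is a minimal sketch, not part of the DEEPScreen source: the function name parse_act_inact_lines is hypothetical, and it assumes that every row label ends in _act or _inact and that the two columns are separated by a single tab.

```python
from collections import defaultdict


def parse_act_inact_lines(lines):
    """Parse '<target>_act<TAB>id1,id2,...' rows into
    {target_id: {"act": [...], "inact": [...]}}.

    Hypothetical helper; assumes tab-separated columns and
    row labels ending in '_act' or '_inact'.
    """
    targets = defaultdict(lambda: {"act": [], "inact": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First column: target and label; second column: compound IDs.
        key, _, compounds = line.partition("\t")
        target_id, _, label = key.rpartition("_")
        if label not in ("act", "inact"):
            raise ValueError(f"unexpected row label: {key!r}")
        targets[target_id][label] = [
            c.strip() for c in compounds.split(",") if c.strip()
        ]
    return dict(targets)


# Two sample rows in the documented format (compound lists shortened).
sample = [
    "CHEMBL286_act\tCHEMBL1818056,CHEMBL2115367,CHEMBL344651,CHEMBL62054",
    "CHEMBL286_inact\tCHEMBL288434,CHEMBL584926,CHEMBL406111,CHEMBL151055",
]
parsed = parse_act_inact_lines(sample)
print(parsed["CHEMBL286"]["act"])
```

To read the real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.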
result_files folder contains results of various tests/analyses:
2-D images of:
Clone the Git Repository
Download the compressed file for the target that you want to train here
Locate the zipped target file under training_files/target_training_datasets and unzip it
Run the main_training.py script as shown below
--targetid: target to be trained (default: CHEMBL286)
--model: CNN architecture to be used (default: CNNModel1)
--fc1: number of neurons in the first fully-connected layer (default: 512)
--fc2: number of neurons in the second fully-connected layer (default: 256)
--lr: learning rate (default: 0.001)
--bs: batch size (default: 32)
--dropout: dropout rate (default: 0.1)
--epoch: number of epochs (default: 200)
--en: the name of the experiment (default: my_experiment)
python main_training.py --targetid CHEMBL286 --model CNNModel1 --fc1 256 --fc2 128 --lr 0.01 --bs 64 --dropout 0.25 --epoch 100 --en my_chembl286_training
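To make the argument list concrete, here is a minimal sketch of how these options and their documented defaults could be declared with Python's standard argparse module. It only illustrates the interface described above; the actual argument handling in main_training.py may differ, and the function name build_parser, the type choices, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented options with their documented defaults.

    Illustrative only; not taken from main_training.py.
    """
    parser = argparse.ArgumentParser(
        description="Train a DEEPScreen model for one target (sketch)."
    )
    parser.add_argument("--targetid", type=str, default="CHEMBL286",
                        help="target to be trained")
    parser.add_argument("--model", type=str, default="CNNModel1",
                        help="CNN architecture to be used")
    parser.add_argument("--fc1", type=int, default=512,
                        help="neurons in the first fully-connected layer")
    parser.add_argument("--fc2", type=int, default=256,
                        help="neurons in the second fully-connected layer")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="learning rate")
    parser.add_argument("--bs", type=int, default=32,
                        help="batch size")
    parser.add_argument("--dropout", type=float, default=0.1,
                        help="dropout rate")
    parser.add_argument("--epoch", type=int, default=200,
                        help="number of epochs")
    parser.add_argument("--en", type=str, default="my_experiment",
                        help="name of the experiment")
    return parser


# Parse the same options as the example command above.
example = (
    "--targetid CHEMBL286 --model CNNModel1 --fc1 256 --fc2 128 "
    "--lr 0.01 --bs 64 --dropout 0.25 --epoch 100 --en my_chembl286_training"
)
args = build_parser().parse_args(example.split())
print(args.targetid, args.fc1, args.fc2, args.lr, args.bs, args.en)
```

Any option left out of the command line falls back to its default, so running with no arguments would correspond to training CHEMBL286 with CNNModel1, 512 and 256 fully-connected neurons, a learning rate of 0.001, a batch size of 32, a dropout rate of 0.1, and 200 epochs.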
main_training.py creates a folder named <experiment_name> (given by the --en argument) under the result_files/experiments folder. Two files are created under result_files/experiments/<experiment_name>:
If you use DEEPScreen please consider citing:
Rifaioglu, A. S., Nalbat, E., Atalay, V., Martin, M. J., Cetin-Atalay, R., & Doğan, T. (2020). DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chemical Science, 11(9), 2531-2557.
DEEPScreen
Copyright © 2020 CanSyL
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.