ChemPSy

Chemical Prioritization System - ChemPSy

Stable v1.0.0
Python v>=2.7.12
License: GPL-3

The aim of ChemPSy (Chemical Prioritization System) is to develop an innovative approach based on several bioinformatics and biostatistics methodologies to analyze and integrate massive toxicogenomics datasets.

Specific objectives include: (1) the classification of chemicals based on transcriptional signatures, i.e. the set of genes whose expression is known to be positively or negatively altered after exposure to these compounds; (2) the association of classes with human pathologies or deleterious phenotypes, e.g. classes containing toxicants with well-known effects; (3) the prediction of novel reprotoxicants and/or endocrine disruptors based on transcriptional signature similarities with known chemicals affecting testis development and function.

##Data format
Each dataset is organized as follows:

+-- [Species]  
   +-- [Tissue]
		+--GSEXXX
			+--Experimental_conditions
			¦	+--Condition 1
			¦	+--Condition 2
			+--Individual_experiments
			¦	+--GSMXXXX.CEL.gz
			¦	+--GSMXXXX.CEL.gz  
			+--GSEXXX.txt  

[Species]: Binomial nomenclature for selected species (e.g. Homo sapiens for Human)

[Tissue]: Tested tissue or cell line, in upper case (e.g. LIVER, KIDNEY, MCF-7, HK-2, …)
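
The layout above can be checked before going further. The following is a minimal sketch in Python 3, not part of ChemPSy: the function name `check_gse_layout` is hypothetical, and it only tests the two items every GSEXXX directory is expected to contain at this stage (the description file and the Individual_experiments folder).

```python
import os


def check_gse_layout(gse_dir):
    """Return a list of problems found in one GSEXXX directory (empty if none)."""
    problems = []
    gse = os.path.basename(os.path.normpath(gse_dir))

    # The description file carries the same name as the directory.
    if not os.path.isfile(os.path.join(gse_dir, gse + ".txt")):
        problems.append("missing description file %s.txt" % gse)

    # Expression files live in Individual_experiments and must be gzipped.
    cel_dir = os.path.join(gse_dir, "Individual_experiments")
    if not os.path.isdir(cel_dir):
        problems.append("missing Individual_experiments directory")
    else:
        for name in sorted(os.listdir(cel_dir)):
            if not name.endswith(".CEL.gz"):
                problems.append("not a compressed CEL file: %s" % name)
    return problems
```

Calling `check_gse_layout("Homo sapiens/LIVER/GSEXXX")` returns an empty list when the directory matches the expected organization.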


###Step 1 - Describe your dataset
To describe your dataset, please use a tab-delimited .txt file with the following fields (keep the order):

| Fields | Description |
|---|---|
| Files | CEL file full name (e.g. GSM1223.CEL.gz) |
| Species | Binomial name of the species the results come from |
| Strain | Species strain (e.g. Sprague-Dawley). May be left unspecified * |
| Gender | Animal gender (male/female). May be left unspecified * |
| Experiment | Experimental setting (e.g. in vitro, in vivo, …). May be left unspecified * |
| Tissues/Cells | Name of the tissue or cells on which the experiment is performed |
| Age | Animal age. May be left unspecified * |
| Generation | Animal generation (for trans-generational studies). If not specified, please put 'F0'. |
| ChemicalName | Usual name or synonym of the chemical (only one name) |
| CAS | Chemical CAS number |
| MESH | Chemical MESH ID |
| Dose | Chemical exposure dose |
| Duration | Chemical exposure duration |
| Route | Chemical exposure route. May be left unspecified * |
| Vehicule | Chemical vehicle. May be left unspecified * |
| PMID | PubMed ID of the associated publication. May be left unspecified * |
| GSE | GEO dataset ID |
| GSM | GEO profile ID |
| GPL | GEO platform (GPL) used. May be left unspecified * |
| Mail | E-mail address of the dataset's corresponding author. May be left unspecified * |
| Paired | Paired data (Yes/No) |
| Replicates | Replicate number |
| Experiment type | Experiment type details (e.g. 'Expression profiling by array'). May be left unspecified * |
| Design | Experimental design. May be left unspecified * |
| Treatment protocol | Treatment protocol description. May be left unspecified * |
| Characteristics | Tissue/cell characteristics. May be left unspecified * |
| Extraction protocol | Extraction protocol description. May be left unspecified * |
| Link(s) | Cross-link(s) (e.g. GEO, database, personal website, …). May be left unspecified * |
| Data processing | Data processing description. May be left unspecified * |
| Sample Treated | Treated or Control sample. May be left unspecified |
| Associated Ctrl | Assign a unique number to each control and list all controls paired with each treated sample (e.g. control1 = 1, control2 = 2, …, treated_sample1 = 1,2 [this sample is paired with controls 1 and 2]). |

Do not leave any field empty: use ‘NA’ if a field is not specified.
‘*’: This field is required for TOXsIgN integration.
Each line must correspond to one unique sample.
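
The rules above (no empty field, one unique sample per line) are easy to verify automatically. Below is a minimal Python 3 sketch, not part of ChemPSy. It assumes that the first line of the file is a header row holding the field names and that the first column is `Files`; the function name `check_description` is hypothetical.

```python
import csv


def check_description(path):
    """Return a list of problems found in a tab-delimited description file."""
    problems = []
    seen = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)  # assumption: first line holds the field names
        if header is None:
            return ["empty file"]
        for number, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append("line %d: expected %d fields, found %d"
                                % (number, len(header), len(row)))
                continue
            # Unspecified values must be written as NA, never left blank.
            for name, value in zip(header, row):
                if not value.strip():
                    problems.append("line %d: empty field %s" % (number, name))
            # Each line must describe one unique sample (first column: Files).
            if row[0] in seen:
                problems.append("line %d: duplicate sample %s" % (number, row[0]))
            seen.add(row[0])
    return problems
```

An empty list means the file passed both checks; the sketch does not verify the field order or the content of individual fields.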

###Step 2 - Organize your data
In your GSEXXX directory, save your tab-delimited .txt file under the same name as the directory (GSEXXX.txt) and create a new folder called Individual_experiments.
Put all expression files associated with your study in this folder. Please make sure that all your .CEL files are compressed. If they are not, use the following command:

gzip *.CEL
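
Where the `gzip` command is not available, the same compression can be done from Python. This is a minimal sketch using only the standard library (Python 3); the function name `compress_cel_files` is hypothetical and not part of ChemPSy. Like `gzip`, it removes each original file once its compressed copy has been written.

```python
import glob
import gzip
import os
import shutil


def compress_cel_files(directory):
    """Gzip every uncompressed .CEL file in a directory and return the new paths."""
    compressed = []
    for path in sorted(glob.glob(os.path.join(directory, "*.CEL"))):
        target = path + ".gz"
        if os.path.exists(target):
            continue  # do not overwrite an existing compressed file
        with open(path, "rb") as source, gzip.open(target, "wb") as output:
            shutil.copyfileobj(source, output)
        os.remove(path)  # keep only the compressed copy
        compressed.append(target)
    return compressed
```

Files that already end in .CEL.gz do not match the `*.CEL` pattern and are left untouched.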

###Step 3 - Create conditions and treatment.info
To create the Experimental_conditions directory and all condition sub-directories, use the CreateTreatmentInfo.sh script. This script takes no arguments but loads a configuration file: ChemPSy.ini. Please modify this file, or change the configuration file loaded in the CreateTreatmentInfo.sh script:

source /home/genouest/irset/tdarde/projects/ChemPSy/20160321/script/ChemPSy_Human.ini

Next, adapt the loops to your datasets:

#!/bin/bash
source /home/genouest/irset/tdarde/projects/ChemPSy/20160321/script/ChemPSy_Human.ini
echo "Reading config...." >&2
#echo "Create treatment.info files for HEPATOCYTES"
for i in $HepatoList
    do
        python $scriptTreatment -p $i -t HEPATOCYTES -e $fileRemove -s True
done

echo "Create treatment.info files for HK-2"
for i in $HK2List
    do
        python $scriptTreatment -p $i -t HK-2 -e $fileRemove -s True
done

echo "Create treatment.info files for ISHIKAWA_CELLS"
for i in $IshikawaList
    do
        python $scriptTreatment -p $i -t ISHIKAWA_CELLS -e $fileRemove -s True
done

echo "Create treatment.info files for JURKAT_CELLS"
for i in $JurkatList
    do
        python $scriptTreatment -p $i -t JURKAT_CELLS -e $fileRemove -s True
done

echo "Create treatment.info files for MCF-7"
for i in $MCFList
    do
        python $scriptTreatment -p $i -t MCF-7 -e $fileRemove -s True
done

echo "Create treatment.info files for liver"
for i in $LiverList
    do
        python $scriptTreatment -p $i -t LIVER -e $fileRemove -s False
done

echo "Create treatment.info files for tg"
for i in $TgList
    do
        python $scriptTreatment -p $i -t THIGH-MUSCLE -e $fileRemove
done

If no error occurs, you should obtain the following directory organization:

+-- [Species]  
   +-- [Tissue]
		+--GSEXXX
			+--Experimental_conditions
			¦	+--Condition 1
			¦	¦	+--treatment.info
			¦	+--Condition 2
			¦	¦	+--treatment.info
			+--Individual_experiments
			¦	+--GSMXXXX.CEL.gz
			¦	+--GSMXXXX.CEL.gz  
			+--GSEXXX.txt  

In each treatment.info file, you will find the association between each treated sample (first column) and its control sample (second column):

003016029014.CEL.gz	003016029008.CEL.gz	0
003016029014.CEL.gz	003016029009.CEL.gz	0
003016029015.CEL.gz	003016029008.CEL.gz	0
003016029015.CEL.gz	003016029009.CEL.gz	0
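
To inspect these associations programmatically, a treatment.info file can be read into a mapping from each treated sample to its controls. The following is a minimal Python 3 sketch, not part of ChemPSy: it assumes the columns are tab-separated, as in the excerpt above, and it ignores the third column, whose meaning is not documented here. The function name `read_treatment_info` is hypothetical.

```python
from collections import OrderedDict


def read_treatment_info(path):
    """Map each treated CEL file to the list of its control CEL files."""
    pairs = OrderedDict()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            treated, control = fields[0], fields[1]
            # The third column is deliberately ignored (undocumented).
            pairs.setdefault(treated, []).append(control)
    return pairs
```

For the four lines shown above, the result maps both 003016029014.CEL.gz and 003016029015.CEL.gz to the controls 003016029008.CEL.gz and 003016029009.CEL.gz.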

##Run ChemPSy
Before running ChemPSy, please make sure that your data follow the directory structure described above and that every condition has its associated treatment.info file.

Next, run ChemPSy_data_prep.sh.
Like the previous script, it loads the configuration file, so please edit that file and/or change the path on the source line.

#!/bin/bash

#################################
#     Source .ini file          #
#################################

echo "##############################       ChemPSy       ##############################"


echo "--1-- Checking config file"
source /home/genouest/irset/tdarde/projects/ChemPSy/20160321/script/ChemPsy_processing/ChemPSy_Human.ini
echo "Reading config...." >&2
echo "Reading scriptPath: $scriptPath" >&2
echo "Reading Rscript: $Rscript" >&2
echo "Reading dataPath: $dataPath" >&2
echo "Reading processedPath: $processedPath" >&2

###Step 1 - Quality control
The first step of ChemPSy_data_prep.sh is a quality control. Several output files are created for each condition, including an image of each microarray.

function step_1 {
	echo "--2-- STEP_1 process_data"
	# Call Rlauncher.sh for every tissue/GSE directory that exists.
	for tissue in $tissues
		do
			echo "$tissue"
			outputT="$processedPath$tissue/"
			mkdir -p "$outputT"

			for gse in $gsePath
				do
					path="$dataPath$tissue/$gse/"
					if [ -d "$path" ]
						then
							output="$processedPath$tissue/$gse/Experimental_conditions/"
							mkdir -p "$output"
							scriptA="${scriptPath}Rlauncher.sh"
							"$scriptA" -p "$path" -t "$tissue" -o "$output" -c "$cdfpath"
					fi
			done
	done
	# Wait until no ChemPSy_ job is listed by qstat any more.
	while [ "$(qstat | grep -c "ChemPSy_")" -ne 0 ]
	do
		echo "Running --2-- STEP_1 process_data"
		sleep 7
	done
	echo "--2-- STEP_1 process_data finish"
}

To perform the quality control, please check the pictures one by one and discard any microarray with 20% or more hybridization error.
List all the samples to remove in removeCelFile.txt, then rerun Step 3 of the Data format section.
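
Before rerunning Step 3, it can help to preview which treated/control pairs would remain once the flagged samples are excluded. The sketch below (Python 3) is only an illustration and not part of ChemPSy: it assumes that removeCelFile.txt lists one CEL file name per line, which is not specified here, and that a pair is dropped as soon as either of its samples is flagged. Both function names are hypothetical.

```python
def read_removal_list(path):
    """Read the CEL file names to discard (assumed format: one name per line)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def surviving_pairs(pairs, removed):
    """Keep the (treated, control) pairs in which neither sample was flagged."""
    return [(treated, control) for treated, control in pairs
            if treated not in removed and control not in removed]
```

The pairs can be taken from the first two columns of a treatment.info file; the actual filtering is still performed by the Step 3 script.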

+-- [Species]  
   +-- [Tissue]
		+--GSEXXX
			+--Experimental_conditions
				+--Condition 1
				¦	+--contrastmatrix.txt 
				¦	+--normdata.txt 
				¦	+--designmatrix.txt
				¦	+--qc_boxplot_afternormalization.pdf
				¦	+--qc_boxplot_beforenormalization.pdf
				¦	+--filtration.txt 
				¦	+--log2fcchangedata.txt
				¦	+--qc_corrmatrix_afternormalization.pdf
				¦	+--mednormdata.txt
				¦	+--qc_image_003016029009.CEL.gz.png
				¦	+--qc_image_003016029015.CEL.gz.png
				¦	+--qc_image_003016029009.CEL.gz.png
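
A quick way to spot a condition whose processing did not complete is to look for the files listed above. The following Python 3 sketch is not part of ChemPSy: the list of expected names is copied from the tree above, and it assumes that every condition produces all of them, which may not hold for every dataset. The function name `missing_outputs` is hypothetical.

```python
import os

# File names taken from the directory listing above (per-array qc_image files excluded).
EXPECTED_OUTPUTS = [
    "contrastmatrix.txt",
    "normdata.txt",
    "designmatrix.txt",
    "qc_boxplot_afternormalization.pdf",
    "qc_boxplot_beforenormalization.pdf",
    "filtration.txt",
    "log2fcchangedata.txt",
    "qc_corrmatrix_afternormalization.pdf",
    "mednormdata.txt",
]


def missing_outputs(condition_dir):
    """Return the expected output files that are absent from a condition directory."""
    return [name for name in EXPECTED_OUTPUTS
            if not os.path.isfile(os.path.join(condition_dir, name))]
```

An empty list means that all nine files are present in the condition directory.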

###Step 2 - List all conditions
This step lists all the conditions