Data Organization
The data and code needed to run DeepChrInteract, available at https://github.com/lichen-lab/DeepChrInteract, are organized as follows.
DeepChrInteract
    The root folder of this project. All files should be stored in this directory and its subdirectories.
DeepChrInteract.py
    Main file that calls all functions. Use python3 DeepChrInteract.py -h to browse all functions and options.
data_preprocessing.py
    Preprocesses the data and generates npz and png files. Use python3 DeepChrInteract.py -p true -n [file name]
model.py
    Stores all models, including: onehot_cnn_one_branch, onehot_cnn_two_branch, onehot_embedding_dense, onehot_embedding_cnn_one_branch, onehot_embedding_cnn_two_branch, onehot_dense, onehot_resnet18, embedding_cnn_one_branch, embedding_cnn_two_branch
train.py
    Called by model.py to train the model. Use python3 DeepChrInteract.py -m [model name] -t train -n [file name]
test.py
    Called by model.py to test the model. Use python3 DeepChrInteract.py -m [model name] -t test -n [file name]
log.txt
    Stores the prediction results written by test.py. Each entry records the timestamp, the source gene of the model, the test target gene, the AUROC result, and the Pearson correlation coefficient.
embedding_matrix.npy
    Pretrained DNA2VEC embedding matrix from the hg19 human genome, with dimensions 4097*100 (6-mers, 4**6 = 4096, plus a first row that is an initial row of all zeros).
resnet.py
    ResNet library file. Includes resnet18, resnet34, resnet50, resnet101, and resnet152.
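The 4097*100 shape follows from the four nucleotides giving 4**6 = 4096 possible 6-mers, plus one all-zero first row. The sketch below illustrates how a sequence could be mapped to rows of such a matrix. The lexicographic k-mer ordering, the stride of 1, the use of row 0 for unknown k-mers, and the names KMER_TO_INDEX and sequence_to_indices are assumptions made for illustration, not necessarily what DeepChrInteract does; a zero placeholder stands in for the real file.

```python
from itertools import product

import numpy as np

K = 6
ALPHABET = "ACGT"

# Assumption: k-mers are indexed in lexicographic order starting at 1,
# so that index 0 points at the all-zero first row.
KMER_TO_INDEX = {
    "".join(kmer): i + 1 for i, kmer in enumerate(product(ALPHABET, repeat=K))
}


def sequence_to_indices(seq, k=K):
    """Map each overlapping k-mer to a row index; unknown k-mers (e.g. with N) map to 0."""
    seq = seq.upper()
    return [KMER_TO_INDEX.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


# Placeholder with the documented shape; in practice this would be
# np.load("embedding_matrix.npy").
embedding_matrix = np.zeros((len(KMER_TO_INDEX) + 1, 100), dtype=np.float32)

indices = sequence_to_indices("ACGTACGTN")
vectors = embedding_matrix[indices]  # one 100-dimensional row per k-mer
```

Reserving row 0 gives padding and ambiguous k-mers a neutral, all-zero vector without needing a separate mask.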
File path
data
    Stores DNA sequences from labelled chromatin interactions and non-chromatin interactions.
File path
Example: AD2.po
seq.anchor1.pos.txt
    DNA sequences for chromatin-interacting region 1. Each row is a sequence.
seq.anchor2.pos.txt
    DNA sequences for chromatin-interacting region 2. Each row is a sequence.
seq.anchor1.neg2.txt
    DNA sequences for non-chromatin-interacting region 1. Each row is a sequence.
seq.anchor2.neg2.txt
    DNA sequences for non-chromatin-interacting region 2. Each row is a sequence.
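The four files above can be read into labelled sequence pairs along the following lines. This is a minimal sketch, not the project's own loader: read_sequences and load_pairs are hypothetical helpers, and it assumes that row i of an anchor1 file pairs with row i of the matching anchor2 file, with pos files labelled 1 and neg2 files labelled 0.

```python
from pathlib import Path


def read_sequences(path):
    """Return the non-empty rows of a file, one DNA sequence per row."""
    with open(path) as handle:
        return [line.strip().upper() for line in handle if line.strip()]


def load_pairs(dataset_dir):
    """Load (anchor1, anchor2, label) triples from one dataset folder.

    Assumes row i of the anchor1 file pairs with row i of the anchor2 file.
    """
    dataset_dir = Path(dataset_dir)
    pairs = []
    for suffix, label in (("pos", 1), ("neg2", 0)):
        anchor1 = read_sequences(dataset_dir / f"seq.anchor1.{suffix}.txt")
        anchor2 = read_sequences(dataset_dir / f"seq.anchor2.{suffix}.txt")
        if len(anchor1) != len(anchor2):
            raise ValueError(f"anchor files for '{suffix}' differ in length")
        pairs.extend((a1, a2, label) for a1, a2 in zip(anchor1, anchor2))
    return pairs
```

Calling load_pairs on one dataset folder under data returns the positive pairs followed by the negative pairs; the length check catches anchor files that have drifted out of step.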
File path
h5_weights
    Saved weights for the neural networks.
File path
result
    Contains one folder per dataset, each of which holds one folder per model.
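As an illustration of this layout, the following sketch enumerates the dataset and model folders under result. It assumes a two-level result/[dataset]/[model] structure as described above; list_result_folders is a hypothetical helper and is not part of DeepChrInteract.

```python
from pathlib import Path


def list_result_folders(result_dir="result"):
    """Return sorted (dataset, model) folder-name pairs found under result_dir."""
    root = Path(result_dir)
    return sorted(
        (dataset.name, model.name)
        for dataset in root.iterdir() if dataset.is_dir()
        for model in dataset.iterdir() if model.is_dir()
    )
```

Plain files at either level are skipped, so only folders are reported.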