Author: | Ronak H Shah |
---|---|
Contact: | rons.shah@gmail.com |
Source code: | http://github.com/rhshah/iAnnotateSV |
License: | Apache License 2.0 |
iAnnotateSV is a Python library and command-line software toolkit to annotate and visualize structural variants detected from Next Generation DNA sequencing data. It is largely a rewrite of dRanger_annotate, a tool written in MATLAB by Mike Lawrence at the Broad Institute, but it adds functionality and gives more control over which transcripts are used for annotation. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina.
We are in the process of publishing a manuscript describing iAnnotateSV as part of the Structural Variant Detection framework. If you use this software in a publication, for now, please cite our website iAnnotateSV.
I would like to thank Mike Lawrence from the Broad Institute for sharing his code and Michael Berger for his insights into the dRanger_annotate tool.
We require that you install:
pandas: | v0.24.2 |
---|---|
biopython: | v1.65 |
Pillow: | v3.4.2 |
reportlab: | v3.3.0 |
coloredlogs: | v5.2 |
If you know Python, there is a small test script in the /iAnnotateSV/test directory. It runs the existing code on test input and compares the result with the expected output file.
python /path/to/iAnnotateSV.py -i svFile.txt -ofp outputfilePrefix -o /path/to/output/dir -r hg19 -d 3000
python /path/to/iAnnotateSV.py -i svFile.txt -ofp outputfilePrefix -o /path/to/output/dir -r hg19 -d 3000 -c canonicalTranscripts.txt
python /path/to/iAnnotateSV.py -i svFile.txt -ofp outputfilePrefix -o /path/to/output/dir -r hg19 -d 3000 -c canonicalTranscripts.txt -u uniprot.txt -p
    usage: iAnnotateSV.py [options]

    Annotate SV based on a specific human reference

    optional arguments:
      -h, --help            show this help message and exit
      -v, --verbose         make lots of noise [default]
      -r hg19, --refFileVersion hg19
                            Which human reference file to be used, hg18,hg19 or hg38
      -rf hg19.sv.table.txt, --refFile hg19.sv.table.txt
                            Human reference file location to be used
      -ofp test, --outputFilePrefix test
                            Prefix for the output file
      -o /somedir, --outputDir /somedir
                            Full Path to the output dir
      -i svfile.txt, --svFile svfile.txt
                            Location of the structural variants file to annotate
      -d 3000, --distance 3000
                            Distance used to extend the promoter region
      -a, --autoSelect      Auto Select which transcript to be used[default]
      -c canonicalExons.txt, --canonicalTranscripts canonicalExons.txt
                            Location of canonical transcript list for each gene.
                            Use only if you want the output for specific
                            transcripts for each gene.
      -p, --plotSV          Plot the structural variant in question
      -u uniprot.txt, --uniprotFile uniprot.txt
                            Location of UniProt list contain information for
                            protein domains. Use only if you want to plot the
                            structural variant
      -rr RepeatRegionFile.tsv, --repeatFile RepeatRegionFile.tsv
                            Location of the Repeat Region Bed File
      -dgv DGvFile.tsv, --dgvFile DGvFile.tsv
                            Location of the Database of Genomic Variants Bed File
      -cc CosmicConsensus.tsv, --cosmicConsensusFile CosmicConsensus.tsv
                            Location of the Cosmic Consensus TSV file
      -cct cosmic_fusion_counts.tsv, --cosmicCountsFile cosmic_fusion_counts.tsv
                            Location of the Cosmic Counts TSV file
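When the tool is driven from another Python script, the command line can be assembled programmatically. The following is only a sketch: ``build_command`` and ``run_annotation`` are hypothetical helper names, not part of iAnnotateSV, and the sketch uses only flags listed in the usage text above.

```python
import subprocess


def build_command(script, sv_file, prefix, out_dir, ref="hg19", distance=3000,
                  canonical=None, uniprot=None, plot=False):
    """Assemble an iAnnotateSV.py command line (hypothetical helper)."""
    cmd = ["python", script,
           "-i", sv_file,
           "-ofp", prefix,
           "-o", out_dir,
           "-r", ref,
           "-d", str(distance)]
    if canonical:
        cmd += ["-c", canonical]   # restrict output to listed transcripts
    if uniprot:
        cmd += ["-u", uniprot]     # protein domain information for plotting
    if plot:
        cmd.append("-p")           # request plots
    return cmd


def run_annotation(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


command = build_command("/path/to/iAnnotateSV.py", "svFile.txt",
                        "outputfilePrefix", "/path/to/output/dir",
                        canonical="canonicalTranscripts.txt")
print(" ".join(command))
```

Calling ``run_annotation(command)`` would execute the tool, which requires iAnnotateSV and its dependencies to be installed; the sketch itself only prints the command.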
Input file format is a tab-delimited file containing:
chr1 pos1 str1 chr2 pos2 str2
as the header and where:
Output file is a tab-delimited file containing:
chr1 pos1 str1 chr2 pos2 str2 gene1 transcript1 site1 gene2 transcript2 site2 fusion
as the header and where:
Example Plot: |
---|
The above plot shows the following:
The output file name for a plot is Gene1-Chromosome1_Position1_Gene2-Chromosome2_Position2_EventType.jpg. All plots are written into a folder called iAnnotateSVplots in the given output directory.
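To locate a plot after a run, the expected path can be rebuilt from this naming pattern. This is a minimal sketch: ``plot_path`` is a hypothetical helper, the example values are placeholders, and the exact way iAnnotateSV writes chromosome names and event types into the file name is an assumption.

```python
import posixpath


def plot_path(output_dir, gene1, chr1, pos1, gene2, chr2, pos2, event_type):
    """Build the expected plot path from the documented naming pattern.

    Hypothetical helper; the formatting of each field is assumed.
    """
    name = "{}-{}_{}_{}-{}_{}_{}.jpg".format(
        gene1, chr1, pos1, gene2, chr2, pos2, event_type)
    # Plots are written into the iAnnotateSVplots folder of the output dir.
    return posixpath.join(output_dir, "iAnnotateSVplots", name)


# Placeholder values, not real results.
example = plot_path("/path/to/output/dir",
                    "Gene1", "Chromosome1", "Position1",
                    "Gene2", "Chromosome2", "Position2", "EventType")
print(example)
```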
Examples of input and output files are in the /data/test directory, where /data/test/testData.txt is the input file and /data/test/testResult.txt is the output file.
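Before running the annotation, it can help to confirm that an input file carries the six header columns described earlier (chr1, pos1, str1, chr2, pos2, str2). The sketch below uses the standard library ``csv`` module rather than iAnnotateSV's own reader; ``read_sv_rows`` is a hypothetical helper and the sample row contains made-up placeholder values.

```python
import csv
import io

# Header columns named in the input file format description.
EXPECTED_HEADER = ["chr1", "pos1", "str1", "chr2", "pos2", "str2"]


def read_sv_rows(handle):
    """Read a tab-delimited SV file and check the required header columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    missing = [col for col in EXPECTED_HEADER if col not in fieldnames]
    if missing:
        raise ValueError("missing header columns: " + ", ".join(missing))
    return list(reader)


# Made-up sample content; real files live in /data/test.
sample = io.StringIO(
    "chr1\tpos1\tstr1\tchr2\tpos2\tstr2\n"
    "1\t1000\t0\t2\t2000\t1\n"
)
rows = read_sv_rows(sample)
print(len(rows), rows[0]["chr1"], rows[0]["pos2"])
```

For a file on disk, pass ``open("/data/test/testData.txt", newline="")`` instead of the in-memory sample.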
The reference file for the selected refFileVersion is chosen automatically from /data/references. Caution: this has only been tested on hg19. All these files were created using the UCSC Table Browser.
An example canonical transcripts file can also be found in /data/canonicalInfo. In general the file is tab-delimited containing:
Gene Transcripts
as the headers where:
The UniProt file for hg19 was created using the UCSC Table Browser (UniProt spAnnot track). The file for hg19 is in /data/UcscUniprotdomainInfo
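Returning to the canonical transcript file: a tab-delimited file with Gene and Transcripts headers can be loaded into a lookup table as sketched below. This is an illustration only. ``read_canonical`` is a hypothetical helper, the gene and transcript names are placeholders, and one transcript per gene row is an assumption about the file layout.

```python
import csv
import io


def read_canonical(handle):
    """Map each gene to its listed transcript (assumes one row per gene)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Gene"]: row["Transcripts"] for row in reader}


# Placeholder content; real examples are in /data/canonicalInfo.
sample = io.StringIO(
    "Gene\tTranscripts\n"
    "GENE_A\tTRANSCRIPT_A\n"
    "GENE_B\tTRANSCRIPT_B\n"
)
canonical = read_canonical(sample)
print(canonical["GENE_A"])
```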
iAnnotateSV contents

.. automodule:: iAnnotateSV
    :members:
    :undoc-members:
    :show-inheritance:

AnnotateEachBreakpoint module

.. automodule:: iAnnotateSV.AnnotateEachBreakpoint
    :members: FindATranscript, FindAllTranscript
    :undoc-members:
    :show-inheritance:
Example: | AnnotateEachBreakpoint(chr1,pos1,str1,refDF) |
---|
FindATranscript function

.. automodule:: iAnnotateSV.FindTranscrpit.FindATranscript
    :members:
    :undoc-members:
    :show-inheritance:
Example: | FindATranscript(queryDF,refDF) |
---|
FindAllTranscripts function

.. automodule:: iAnnotateSV.FindTranscrpit.FindAllTranscripts
    :members:
    :undoc-members:
    :show-inheritance:
Example: | FindAllTranscripts(queryDF,refDF) |
---|
FindCanonicalTranscript module

.. automodule:: iAnnotateSV.FindCanonicalTranscript
    :members:
    :undoc-members:
    :show-inheritance:
Example: | FindCT(geneList,transcriptList,siteList,zoneList,strandList,intronnumList,intronframeList,ctDict) |
---|
PredictFunction module

.. automodule:: iAnnotateSV.PredictFunction
    :members:
    :undoc-members:
    :show-inheritance:
This module will predict the function of each annotated breakpoint
Example: |
So ann1S & ann2S are series that will go to PredictFuntionForSV()
|
---|
AddExternalAnnotations module

.. automodule:: iAnnotateSV.AddExternalAnnotations
    :members: ReadSVFile
    :undoc-members:
    :show-inheritance:
- /data/repeat_region/hg19_repeatRegion.tsv
- /data/database_of_genomic_variants/hg19_DGv_Annotation.tsv
- /data/cosmic/cancer_gene_census.tsv
- /data/cosmic/cosmic_fusion_counts.tsv

Example: |
|
---|
AnnotateForRepeatRegion module

.. automodule:: iAnnotateSV.AnnotateForRepeatRegion
    :members: ReadRepeatFile,AnnotateRepeatRegion
    :undoc-members:
    :show-inheritance:
ReadRepeatFile
- This will read a tab-delimited file into a pandas dataframe
AnnotateRepeatRegion
- This will annotate the breakpoints for repeat regions.
Example: AnnotateRepeatRegion(verbose, count, svObject, repeatregionDict)
AnnotateForDGv module

.. automodule:: iAnnotateSV.AnnotateForDGv
    :members: ReadDGvFile,AnnotateDGv
    :undoc-members:
    :show-inheritance:
ReadDGv
- This will read a tab-delimited file into a pandas dataframe
AnnotateDGv
- This will annotate the breakpoints for the Database of Genomic Variants.
Example: AnnotateDGv(verbose, count, svObject, dgvDict)
AnnotateForCosmic module

.. automodule:: iAnnotateSV.AnnotateForCosmic
    :members: AnnotateFromCosmicCensusFile
    :undoc-members:
    :show-inheritance:
This module will annotate each breakpoint for Cosmic Census
Example: AnnotateFromCosmicCensusFile(comic_census_filename, verbose, count, svObject)
Example: AnnotateFromComicFusionCountsFile(comic_fusion_counts_filename, verbose, count, svObject)
helper module

.. automodule:: iAnnotateSV.helper
    :members: ReadFile,ExtendPromoterRegion,bp2str
    :undoc-members:
    :show-inheritance:

VisualizeSV module

.. automodule:: iAnnotateSV.VisualizeSV
    :members:
    :undoc-members:
    :show-inheritance:
Example: |
|
---|---|
Example Plot: |
iAnnotateSV main function

.. automodule:: iAnnotateSV.iAnnotateSV
    :members: processSV
    :undoc-members:
    :show-inheritance:
Here is the Usage again:
    usage: iAnnotateSV.py [options]

    Annotate SV based on a specific human reference

    optional arguments:
      -h, --help            show this help message and exit
      -v, --verbose         make lots of noise [default]
      -r hg19, --refFileVersion hg19
                            Which human reference file to be used, hg18,hg19 or hg38
      -ofp test, --outputFilePrefix test
                            Prefix for the output file
      -o /somedir, --outputDir /somedir
                            Full Path to the output dir
      -i svfile.txt, --svFile svfile.txt
                            Location of the structural variants file to annotate
      -d 3000, --distance 3000
                            Distance used to extend the promoter region
      -a, --autoSelect      Auto Select which transcript to be used[default]
      -c canonicalExons.txt, --canonicalTranscripts canonicalExons.txt
                            Location of canonical transcript list for each gene.
                            Use only if you want the output for specific
                            transcripts for each gene.
      -p, --plotSV          Plot the structural variant in question[default]
      -u uniprot.txt, --uniprotFile uniprot.txt
                            Location of UniProt list contain information for
                            protein domains. Use only if you want to plot the
                            structural variant
      -rr RepeatRegionFile.tsv, --repeatFile RepeatRegionFile.tsv
                            Location of the Repeat Region Bed File
      -dgv DGvFile.tsv, --dgvFile DGvFile.tsv
                            Location of the Database of Genomic Variants Bed File
      -cc CosmicConsensus.tsv, --cosmicConsensusFile CosmicConsensus.tsv
                            Location of the Cosmic Consensus TSV file
      -cct CosmicFusionCounts.tsv, --cosmicCountsFile CosmicConsensus.tsv
                            Location of the Cosmic Fusion Counts TSV file