compare

ingenannot compare establishes the list of shared and specific CDS between gene datasets.

usage

$ ingenannot.py -v 2 compare file.fof --export_upsetplot --export_same_cds

positional arguments:

FoF

File of files, <GFF/GTF>TAB<source>
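
As an illustration, a FoF can be generated with a few lines of Python. This is only a minimal sketch: the file name file.fof matches the usage line above, the two annotation paths and source labels are the ones appearing in the example run of the outputs section, and the helper write_fof is a hypothetical name, not part of ingenannot.

```python
import os
import tempfile


def write_fof(entries, path):
    """Write (annotation_file, source) pairs, one tab-separated pair per line."""
    with open(path, "w") as handle:
        for annotation_file, source in entries:
            handle.write(f"{annotation_file}\t{source}\n")


# Paths and source labels taken from the example run; adapt to your data.
entries = [
    ("../../test-data/compare.src1.gff3", "src1"),
    ("../../test-data/compare.src2.gff", "src2"),
]

# Written to a temporary directory here so the sketch has no side effects.
fof_path = os.path.join(tempfile.mkdtemp(), "file.fof")
write_fof(entries, fof_path)

with open(fof_path) as handle:
    fof_lines = handle.read().splitlines()
```

Each resulting line holds exactly one annotation file and its source label, separated by a single tab character.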

optional arguments:

-h, --help

show this help message and exit

--clutype CLUTYPE

Feature type used to clusterize: [gene, cds], default=cds

--clustranded

Same strand orientation required to cluster features, default=False

--export_same_cds

Export identical CDS shared by all annotations to the same_cds.gff3 file, default=False

--export_specific

Export the specific CDS of each annotation, with locus-specific and CDS-specific entries in separate files, default=False

--export_venn

Export CDS with their metagene code to build Venn diagrams, default=False

--export_upsetplot

Export upsetplot of CDS, default=False

--graphout GRAPHOUT

output filename of the graph, default=upsetplot.png

--graphtitle GRAPHTITLE

output title of the graph, default=Intersecting sets of CDS

inputs

File of Files (FoF) listing all files to compare, one per line, in the form <GFF/GTF>TAB<source>. With --clutype gene, clustering uses gene coordinates, so genes that overlap only through their UTRs are clustered together. Conversely, --clustranded requires the same strand orientation, so annotations on opposite strands are kept in separate clusters even if they overlap.
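
The effect of --clustranded can be pictured with a small sketch. It is not ingenannot's implementation, only an assumed model in which features are (start, end, strand) tuples on a single sequence and any feature overlapping a running cluster joins it; the function name cluster_features and the coordinates are hypothetical.

```python
def cluster_features(features, stranded=False):
    """Group overlapping (start, end, strand) features into clusters.

    With stranded=True, only features on the same strand may share a cluster.
    """
    # Split by strand when strandedness is required, else keep one group.
    groups = {}
    for feature in features:
        key = feature[2] if stranded else None
        groups.setdefault(key, []).append(feature)

    clusters = []
    for group in groups.values():
        current, current_end = [], None
        for feature in sorted(group):
            start, end, _strand = feature
            if current and start <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(feature)
                current_end = max(current_end, end)
            else:
                # No overlap: close the running cluster and start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [feature], end
        if current:
            clusters.append(current)
    return clusters


# Hypothetical coordinates: the first two features overlap on opposite strands.
features = [(100, 500, "+"), (400, 900, "-"), (1200, 1500, "+")]
unstranded = cluster_features(features)
stranded = cluster_features(features, stranded=True)
```

In this toy case the unstranded run yields two clusters and the stranded run three, because the overlapping "+" and "-" features are no longer merged. Under the same assumed model, --clutype gene would correspond to passing gene spans (UTRs included) rather than CDS spans.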

outputs

The example below compares two files, [file 1] and [file 2]. The positions of the genes and their associated mRNAs are shown here:

[Figure: upset plot]

We obtain these comparison metrics:

INFO:root:reading ../../test-data/compare.src1.gff3
INFO:root:parsing ../../test-data/compare.src1.gff3 as format:gff3
INFO:root:10 genes extracted from ../../test-data/compare.src1.gff3 - source: src1
INFO:root:reading ../../test-data/compare.src2.gff
INFO:root:parsing ../../test-data/compare.src2.gff as format:gff3
INFO:root:17 genes extracted from ../../test-data/compare.src2.gff - source: src2
INFO:root:27 genes extracted from 2 sources
INFO:root:28 transcripts extracted from 2 sources
INFO:root:Clustering genes based on 'cds' coordinates, strand orientation: 'False'
INFO:root:22 clusters for sequence: chr_1
INFO:root:22 clusters generated
ERROR:root:WARNING, Source: src2 have multiple same CDS for metagene 21, removing transcript: Zt09_model_chr_1_00016_dup
Number of Metagenes: 22
Number of different CDS: 24
Number of sources per CDS: {1: 21, 2: 3}
Number of CDS shared by all sources: 3
Number of MetaGenes with unique CDS: 20
Number of MetaGenes with unique CDS (nb sources): {1: 17, 2: 3}
Number of MetaGenes with multiple CDS: 2
Number of MetaGenes with multiple CDS (nb different CDS): {2: 2}
shared same CDS:
src1 - src1: 10
src1 - src2: 3
src2 - src2: 17
Number of specific CDS per source: {'src1': 6, 'src2': 13}
Number of specific CDS, with other CDS from other source at the same locus/metagene: {'src1': 1, 'src2': 1}
Number of specific loci/Metagene per source: {'src1': 6, 'src2': 12}
Number of sources of most representative CDS per Metagene: {1: 19, 2: 3}
INFO:root:Upsetplot exported in upsetplot.png
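
The counts in this report are mutually consistent, which a short check makes explicit. The sketch below copies six lines from the report and parses each value with ast.literal_eval; the variable names are illustrative and nothing here is part of ingenannot.

```python
import ast

# Lines copied verbatim from the report above.
report = """\
Number of Metagenes: 22
Number of different CDS: 24
Number of sources per CDS: {1: 21, 2: 3}
Number of CDS shared by all sources: 3
Number of MetaGenes with unique CDS: 20
Number of MetaGenes with multiple CDS: 2
"""

metrics = {}
for line in report.splitlines():
    # Split on the first ": " only, so dictionary values stay intact.
    label, _, value = line.partition(": ")
    metrics[label] = ast.literal_eval(value)

sources_per_cds = metrics["Number of sources per CDS"]

# Every distinct CDS is found in either one or two sources: 21 + 3 = 24.
total_cds = sum(sources_per_cds.values())
assert total_cds == metrics["Number of different CDS"]

# With two sources, a CDS seen in 2 sources is shared by all of them.
assert sources_per_cds[2] == metrics["Number of CDS shared by all sources"]

# Each metagene holds either one CDS or several: 20 + 2 = 22.
total_metagenes = (
    metrics["Number of MetaGenes with unique CDS"]
    + metrics["Number of MetaGenes with multiple CDS"]
)
assert total_metagenes == metrics["Number of Metagenes"]
```

The same totals also follow from the metagene breakdown: 20 metagenes with one CDS plus 2 metagenes with two CDS each give the 24 different CDS.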

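One CDS of src2 is duplicated, so the process removed the corresponding transcript and reported it with the following message. Although it is logged at the ERROR level, it is a warning and the comparison continues: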

ERROR:root:WARNING, Source: src2 have multiple same CDS for metagene 21, removing transcript: Zt09_model_chr_1_00016_dup
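
Such messages can be collected from a saved log to list the transcripts that were removed. This is a minimal sketch, assuming the message layout is exactly that of the line above; the regular expression and the variable names are illustrative, not part of ingenannot.

```python
import re

# Pattern mirroring the duplicate-CDS message shown above (assumed stable).
DUPLICATE_CDS = re.compile(
    r"Source: (?P<source>\S+) have multiple same CDS "
    r"for metagene (?P<metagene>\d+), removing transcript: (?P<transcript>\S+)"
)

log_line = (
    "ERROR:root:WARNING, Source: src2 have multiple same CDS for metagene 21, "
    "removing transcript: Zt09_model_chr_1_00016_dup"
)

match = DUPLICATE_CDS.search(log_line)
# Empty dict when the line is not a duplicate-CDS message.
removed = match.groupdict() if match else {}
```

Applied to every line of a log file, the same pattern gives the source, metagene number and transcript identifier of each removed duplicate.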