[05/07/24 08:00:27]: /venv/bin/funannotate update -i fun --cpus 16
[05/07/24 08:00:27]: OS: Debian GNU/Linux 10, 16 cores, ~ 791 GB RAM. Python: 3.8.12
[05/07/24 08:00:27]: Running 1.8.17
[05/07/24 08:00:28]: fasta version=36.3.8g path=/venv/bin/fasta
[05/07/24 08:00:28]: minimap2 version=2.26-r1175 path=/venv/bin/minimap2
[05/07/24 08:00:28]: tbl2asn version=25.8 path=/venv/bin/tbl2asn
[05/07/24 08:00:28]: hisat2 version=2.2.1 path=/venv/bin/hisat2
[05/07/24 08:00:28]: hisat2-build version=NA path=/venv/bin/hisat2-build
[05/07/24 08:00:28]: kallisto version=0.46.1 path=/venv/bin/kallisto
[05/07/24 08:00:28]: Trinity version=2.8.5 path=/venv/bin/Trinity
[05/07/24 08:00:28]: bedtools version=bedtools v2.31.1 path=/venv/bin/bedtools
[05/07/24 08:00:28]: java version=11.0.8-internal path=/venv/bin/java
[05/07/24 08:00:28]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl version=NA path=/venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl
[05/07/24 08:00:28]: /venv/opt/pasa-2.4.1/bin/seqclean version=NA path=/venv/opt/pasa-2.4.1/bin/seqclean
[05/07/24 08:00:28]: trimmomatic version=0.39 path=/venv/bin/trimmomatic
[05/07/24 08:00:28]: minimap2 version=2.26-r1175 path=/venv/bin/minimap2
[05/07/24 08:00:28]: blat version=BLAT v35 path=/venv/bin/blat
[05/07/24 08:00:28]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[05/07/24 08:00:28]: Found relevant files in fun/training, will re-use them:
	GFF3: fun/predict_results/Species_name.gff3
	Genome: fun/predict_results/Species_name.scaffolds.fa
	Forward reads: fun/training/left.fq.gz
	Reverse reads: fun/training/right.fq.gz
	Forward Q-trimmed reads: fun/training/trimmomatic/trimmed_left.fastq.gz
	Reverse Q-trimmed reads: fun/training/trimmomatic/trimmed_right.fastq.gz
	Forward normalized reads: fun/training/normalize/left.norm.fq
	Reverse normalized reads: fun/training/normalize/right.norm.fq
	Trinity results: fun/training/funannotate_train.trinity-GG.fasta
	PASA config file: fun/training/pasa/alignAssembly.txt
	BAM alignments: fun/training/funannotate_train.coordSorted.bam
	StringTie GTF: fun/training/funannotate_train.stringtie.gtf
[05/07/24 08:01:34]: Reannotating Species name, NCBI accession: None
[05/07/24 08:01:34]: Previous annotation consists of: 45,531 protein coding gene models and 695 non-coding gene models
[05/07/24 08:01:34]: Existing annotation: locustag=FUN_ genenumber=46226
[05/07/24 08:01:34]: Input reads: ('fun/training/left.fq.gz', 'fun/training/right.fq.gz', None)
[05/07/24 08:01:34]: Quality trimmed reads: ('fun/training/trimmomatic/trimmed_left.fastq.gz', 'fun/training/trimmomatic/trimmed_right.fastq.gz', None)
[05/07/24 08:01:34]: FASTQ headers seem compatible with Trinity
[05/07/24 08:01:34]: Normalized reads: ('fun/training/normalize/left.norm.fq', 'fun/training/normalize/right.norm.fq', None)
[05/07/24 08:01:34]: Long reads: (None, None, None)
[05/07/24 08:01:34]: Long reads FASTA format: (None, None, None)
[05/07/24 08:01:34]: Long SeqCleaned reads: (None, None, None)
[05/07/24 08:01:34]: /venv/opt/pasa-2.4.1/bin/seqclean trinity.fasta -c 16
[05/07/24 08:01:47]: seqclean running options:
seqclean trinity.fasta -c 16
Standard log file: seqcl_trinity.fasta.log
Error log file: err_seqcl_trinity.fasta.log
Using 16 CPUs for cleaning
-= Rebuilding trinity.fasta cdb index =-
Launching actual cleaning process:
psx -p 16 -n 1000 -i trinity.fasta -d cleaning -C '/trinity.fasta:ANLMS100:::11:0' -c '/venv/opt/pasa-2.4.1/bin/seqclean.psx'
Collecting cleaning reports
**************************************************
Sequences analyzed: 60028
-----------------------------------
valid: 59993 (738 trimmed)
trashed: 35
**************************************************
----= Trashing summary =------
by 'dust': 35
------------------------------
Output file containing only valid and trimmed sequences: trinity.fasta.clean
For trimming and trashing details see cleaning report : trinity.fasta.cln
--------------------------------------------------
seqclean (trinity.fasta) finished on machine 54c7c765a420 in , without a detectable error.
[05/07/24 08:01:47]: minimap2 -ax splice -t 16 --cs -u b -G 3000 fun/update_misc/genome.fa fun/update_misc/trinity.fasta.clean | samtools sort --reference fun/update_misc/genome.fa -@ 4 -o fun/update_misc/trinity.alignments.bam -
[05/07/24 08:02:21]: Converting transcript alignments to GFF3 format
[05/07/24 08:02:26]: Converting Trinity transcript alignments to GFF3 format
[05/07/24 08:02:31]: PASA database is SQLite: /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/training/pasa/Species_name_pasa
[05/07/24 08:02:31]: /venv/bin/cdbfasta fun/update_misc/genome.fa
[05/07/24 08:02:33]: 3574 entries from file fun/update_misc/genome.fa were indexed in file fun/update_misc/genome.fa.cidx
[05/07/24 08:02:33]: Running PASA annotation comparison step 1
[05/07/24 08:02:33]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/pasa/annotCompare.txt -g /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/genome.fa -t /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/trinity.fasta.clean -A -L --CPU 16 --annots /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/genome.gff3
[05/07/24 12:12:56]: Running PASA annotation comparison step 2
[05/07/24 12:12:56]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/pasa/annotCompare.txt -g /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/genome.fa -t /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/trinity.fasta.clean -A -L --CPU 16 --annots /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/pasa/Species_name_pasa.gene_structures_post_PASA_updates.595.gff3
[05/07/24 18:23:15]: copying final PASA GFF3 to output: fun/update_misc/pasa/Species_name_pasa.gene_structures_post_PASA_updates.515857.gff3
[05/07/24 18:23:15]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[05/07/24 18:23:15]: Building Kallisto index
[05/07/24 18:23:15]: /venv/opt/pasa-2.4.1/misc_utilities/gff3_file_to_proteins.pl fun/update_misc/pasa_final.gff3 fun/update_misc/genome.fa cDNA
[05/07/24 18:24:05]: 
[05/07/24 18:24:05]: kallisto index -i fun/update_misc/getBestModel/bestModel fun/update_misc/getBestModel/transcripts.fa
[05/07/24 18:26:06]: [build] loading fasta file fun/update_misc/getBestModel/transcripts.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10) from 6 target sequences
[build] warning: replaced 318 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 190732 contigs and contains 52501301 k-mers
[05/07/24 18:26:06]: Mapping reads using pseudoalignment in Kallisto
[05/07/24 18:26:06]: kallisto quant -i fun/update_misc/getBestModel/bestModel -o fun/update_misc/getBestModel/kallisto --plaintext -t 16 fun/training/trimmomatic/trimmed_left.fastq.gz fun/training/trimmomatic/trimmed_right.fastq.gz
[05/07/24 18:29:17]: [quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 51,089
[index] number of k-mers: 52,501,301
[index] number of equivalence classes: 78,080
[quant] running in paired-end mode
[quant] will process pair 1: fun/training/trimmomatic/trimmed_left.fastq.gz fun/training/trimmomatic/trimmed_right.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 28,638,946 reads, 21,693,697 reads pseudoaligned
[quant] estimated average fragment length: 172.18
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 983 rounds
[05/07/24 18:29:17]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[05/07/24 18:29:20]: Wrote 46,970 transcripts derived from 45,479 protein coding loci.
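The isoform-selection rule logged above (keep alternative-splicing transcripts expressed at at least 10.0% of the highest transcript per locus) can be sketched as follows. This is a minimal illustration under stated assumptions, not funannotate's own code: it assumes kallisto's plaintext `abundance.tsv` layout (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`) and a hypothetical `transcript_to_locus` mapping that the caller builds from the PASA GFF3.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def select_transcripts(
    abundance_lines: Iterable[str],
    transcript_to_locus: Dict[str, str],
    min_fraction: float = 0.10,
) -> Dict[str, List[str]]:
    """Per locus, keep every transcript whose TPM is at least
    min_fraction of the highest TPM observed at that locus.

    abundance_lines: lines of a kallisto abundance.tsv (header included).
    transcript_to_locus: hypothetical transcript ID -> locus ID mapping.
    """
    by_locus = defaultdict(list)
    for row in csv.DictReader(abundance_lines, delimiter="\t"):
        locus = transcript_to_locus.get(row["target_id"])
        if locus is None:  # transcripts with no known locus are ignored
            continue
        by_locus[locus].append((row["target_id"], float(row["tpm"])))

    selected = {}
    for locus, isoforms in by_locus.items():
        top = max(tpm for _, tpm in isoforms)
        # Note: if every isoform at a locus has TPM 0, all of them pass.
        selected[locus] = sorted(
            tid for tid, tpm in isoforms if tpm >= min_fraction * top
        )
    return selected
```

In this run the `-o` argument suggests the table would sit at `fun/update_misc/getBestModel/kallisto/abundance.tsv` (an assumption based on kallisto's default file naming), so a call might look like `with open(path) as fh: select_transcripts(fh, mapping)`. Passing an iterable of lines rather than a path keeps the function easy to test with in-memory data.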
[05/07/24 18:29:22]: bedtools intersect -sorted -v -a /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/genome.trna.gff3.sorted.gff3 -b /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~funannotate/test/fun/update_misc/bestmodels.gff3.sorted.gff3
[05/07/24 18:29:57]: Validating gene models (renaming, checking translations, filtering, etc)
[05/07/24 18:30:09]: Writing 46,164 loci to TBL format: dropped 0 overlapping, 0 too short, and 0 frameshift gene models
[05/07/24 18:30:10]: Converting to Genbank format
[05/07/24 18:30:10]: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i fun/update_misc/tbl2asn/genome.tbl -f fun/update_misc/genome.fa -o fun/update_misc/tbl2asn --sbt /venv/lib/python3.8/site-packages/funannotate/config/test.sbt -d fun/update_results/Species_name.discrepency.report.txt -s Species name -t -l paired-ends -v 1 -c 16
[05/07/24 18:37:17]: Collecting final annotation files
[05/07/24 18:38:23]: Comparing original annotation to updated
	original: fun/predict_results/Species_name.gff3
	updated: fun/update_results/Species_name.gff3
[05/07/24 18:57:01]: Updated annotation complete:
-------------------------------------------------------
Total Gene Models: 46,164
Total transcripts: 47,656
New Gene Models: 94
No Change: 32,837
Update UTRs: 13,010
Exons Changed: 197
Exons/CDS Changed: 26
Dropped Models: 4
CDS AED: 0.004
mRNA AED: 0.045
-------------------------------------------------------
[05/07/24 18:57:01]: Funannotate update is finished, output files are in the fun/update_results folder
[05/07/24 18:57:01]: There are 5 gene models that need to be fixed.
[05/07/24 18:57:01]: Manually edit the tbl file fun/update_results/Species_name.tbl, then run:
funannotate fix -i fun/update_results/Species_name.gbk -t fun/update_results/Species_name.tbl
[05/07/24 18:57:01]: After the problematic gene models are fixed, you can proceed with functional annotation.
[05/07/24 18:57:01]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i fun -m docker -c 16
Run antiSMASH:
funannotate remote -i fun -m antismash -e youremail@server.edu
Annotate Genome:
funannotate annotate -i fun --cpus 16 --sbt yourSBTfile.txt
-------------------------------------------------------
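As a quick sanity check on the "Updated annotation complete" summary, the per-category counts can be added up. Treating New Gene Models, No Change, Update UTRs, Exons Changed and Exons/CDS Changed as mutually exclusive categories is an assumption, not something the log states, but the five figures do sum exactly to the reported Total Gene Models. Dropped Models is left out on the assumption that dropped models are not part of the updated total.

```python
# Per-category counts copied from the "Updated annotation complete" summary.
categories = {
    "New Gene Models": 94,
    "No Change": 32_837,
    "Update UTRs": 13_010,
    "Exons Changed": 197,
    "Exons/CDS Changed": 26,
}
total_gene_models = 46_164  # "Total Gene Models" in the same summary

category_sum = sum(categories.values())
# prints "categories sum to 46,164; reported total is 46,164"
print(f"categories sum to {category_sum:,}; reported total is {total_gene_models:,}")
```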