[05/10/24 11:09:39]: /venv/bin/funannotate update -i fun --cpus 16
[05/10/24 11:09:39]: OS: Debian GNU/Linux 10, 16 cores, ~ 791 GB RAM. Python: 3.8.12
[05/10/24 11:09:39]: Running 1.8.17
[05/10/24 11:09:40]: fasta version=36.3.8g path=/venv/bin/fasta
[05/10/24 11:09:40]: minimap2 version=2.26-r1175 path=/venv/bin/minimap2
[05/10/24 11:09:40]: tbl2asn version=25.8 path=/venv/bin/tbl2asn
[05/10/24 11:09:40]: hisat2 version=2.2.1 path=/venv/bin/hisat2
[05/10/24 11:09:40]: hisat2-build version=NA path=/venv/bin/hisat2-build
[05/10/24 11:09:40]: kallisto version=0.46.1 path=/venv/bin/kallisto
[05/10/24 11:09:40]: Trinity version=2.8.5 path=/venv/bin/Trinity
[05/10/24 11:09:40]: bedtools version=bedtools v2.31.1 path=/venv/bin/bedtools
[05/10/24 11:09:40]: java version=11.0.8-internal path=/venv/bin/java
[05/10/24 11:09:40]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl version=NA path=/venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl
[05/10/24 11:09:40]: /venv/opt/pasa-2.4.1/bin/seqclean version=NA path=/venv/opt/pasa-2.4.1/bin/seqclean
[05/10/24 11:09:40]: trimmomatic version=0.39 path=/venv/bin/trimmomatic
[05/10/24 11:09:40]: minimap2 version=2.26-r1175 path=/venv/bin/minimap2
[05/10/24 11:09:40]: blat version=BLAT v35 path=/venv/bin/blat
[05/10/24 11:09:40]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[05/10/24 11:09:40]: Found relevant files in fun/training, will re-use them:
  GFF3: fun/predict_results/Plecoglossus_altivelis.gff3
  Genome: fun/predict_results/Plecoglossus_altivelis.scaffolds.fa
  Forward reads: fun/training/left.fq.gz
  Reverse reads: fun/training/right.fq.gz
  Forward Q-trimmed reads: fun/training/trimmomatic/trimmed_left.fastq.gz
  Reverse Q-trimmed reads: fun/training/trimmomatic/trimmed_right.fastq.gz
  Forward normalized reads: fun/training/normalize/left.norm.fq
  Reverse normalized reads: fun/training/normalize/right.norm.fq
  Trinity results: fun/training/funannotate_train.trinity-GG.fasta
  PASA config file: fun/training/pasa/alignAssembly.txt
  BAM alignments: fun/training/funannotate_train.coordSorted.bam
  StringTie GTF: fun/training/funannotate_train.stringtie.gtf
[05/10/24 11:10:43]: Reannotating Plecoglossus altivelis, NCBI accession: None
[05/10/24 11:10:43]: Previous annotation consists of: 44,168 protein coding gene models and 718 non-coding gene models
[05/10/24 11:10:43]: Existing annotation: locustag=FUN_ genenumber=44886
[05/10/24 11:10:43]: Input reads: ('fun/training/left.fq.gz', 'fun/training/right.fq.gz', None)
[05/10/24 11:10:43]: Quality trimmed reads: ('fun/training/trimmomatic/trimmed_left.fastq.gz', 'fun/training/trimmomatic/trimmed_right.fastq.gz', None)
[05/10/24 11:10:44]: FASTQ headers seem compatible with Trinity
[05/10/24 11:10:44]: Normalized reads: ('fun/training/normalize/left.norm.fq', 'fun/training/normalize/right.norm.fq', None)
[05/10/24 11:10:44]: Long reads: (None, None, None)
[05/10/24 11:10:44]: Long reads FASTA format: (None, None, None)
[05/10/24 11:10:44]: Long SeqCleaned reads: (None, None, None)
[05/10/24 11:10:44]: /venv/opt/pasa-2.4.1/bin/seqclean trinity.fasta -c 16
[05/10/24 11:10:54]: seqclean running options:
  seqclean trinity.fasta -c 16
  Standard log file: seqcl_trinity.fasta.log
  Error log file: err_seqcl_trinity.fasta.log
  Using 16 CPUs for cleaning
  -= Rebuilding trinity.fasta cdb index =-
  Launching actual cleaning process:
  psx -p 16 -n 1000 -i trinity.fasta -d cleaning -C '/trinity.fasta:ANLMS100:::11:0' -c '/venv/opt/pasa-2.4.1/bin/seqclean.psx'
  Collecting cleaning reports
  **************************************************
  Sequences analyzed: 60037
  -----------------------------------
  valid: 59997 (681 trimmed)
  trashed: 40
  **************************************************
  ----= Trashing summary =------
  by 'dust': 40
  ------------------------------
  Output file containing only valid and trimmed sequences: trinity.fasta.clean
  For trimming and trashing details see cleaning report : trinity.fasta.cln
  --------------------------------------------------
  seqclean (trinity.fasta) finished on machine 1b6f8ceb9855 in , without a detectable error.
[05/10/24 11:10:54]: minimap2 -ax splice -t 16 --cs -u b -G 3000 fun/update_misc/genome.fa fun/update_misc/trinity.fasta.clean | samtools sort --reference fun/update_misc/genome.fa -@ 4 -o fun/update_misc/trinity.alignments.bam -
[05/10/24 11:11:29]: Converting transcript alignments to GFF3 format
[05/10/24 11:11:34]: Converting Trinity transcript alignments to GFF3 format
[05/10/24 11:11:39]: PASA database is SQLite: /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/training/pasa/Plecoglossus_altivelis_pasa
[05/10/24 11:11:39]: /venv/bin/cdbfasta fun/update_misc/genome.fa
[05/10/24 11:11:40]: 3574 entries from file fun/update_misc/genome.fa were indexed in file fun/update_misc/genome.fa.cidx
[05/10/24 11:11:40]: Running PASA annotation comparison step 1
[05/10/24 11:11:40]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/pasa/annotCompare.txt -g /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/genome.fa -t /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/trinity.fasta.clean -A -L --CPU 16 --annots /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/genome.gff3
[05/10/24 15:30:21]: Running PASA annotation comparison step 2
[05/10/24 15:30:21]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/pasa/annotCompare.txt -g /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/genome.fa -t /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/trinity.fasta.clean -A -L --CPU 16 --annots /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/pasa/Plecoglossus_altivelis_pasa.gene_structures_post_PASA_updates.595.gff3
[05/10/24 21:45:35]: copying final PASA GFF3 to output: fun/update_misc/pasa/Plecoglossus_altivelis_pasa.gene_structures_post_PASA_updates.512138.gff3
[05/10/24 21:45:35]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[05/10/24 21:45:35]: Building Kallisto index
[05/10/24 21:45:35]: /venv/opt/pasa-2.4.1/misc_utilities/gff3_file_to_proteins.pl fun/update_misc/pasa_final.gff3 fun/update_misc/genome.fa cDNA
[05/10/24 21:46:25]:
[05/10/24 21:46:25]: kallisto index -i fun/update_misc/getBestModel/bestModel fun/update_misc/getBestModel/transcripts.fa
[05/10/24 21:48:18]:
  [build] loading fasta file fun/update_misc/getBestModel/transcripts.fa
  [build] k-mer length: 31
  [build] warning: clipped off poly-A tail (longer than 10) from 8 target sequences
  [build] warning: replaced 395 non-ACGUT characters in the input sequence with pseudorandom nucleotides
  [build] counting k-mers ... done.
  [build] building target de Bruijn graph ... done
  [build] creating equivalence classes ... done
  [build] target de Bruijn graph has 194151 contigs and contains 51646035 k-mers
[05/10/24 21:48:18]: Mapping reads using pseudoalignment in Kallisto
[05/10/24 21:48:18]: kallisto quant -i fun/update_misc/getBestModel/bestModel -o fun/update_misc/getBestModel/kallisto --plaintext -t 16 fun/training/trimmomatic/trimmed_left.fastq.gz fun/training/trimmomatic/trimmed_right.fastq.gz
[05/10/24 21:51:26]:
  [quant] fragment length distribution will be estimated from the data
  [index] k-mer length: 31
  [index] number of targets: 49,735
  [index] number of k-mers: 51,646,035
  [index] number of equivalence classes: 76,826
  [quant] running in paired-end mode
  [quant] will process pair 1: fun/training/trimmomatic/trimmed_left.fastq.gz fun/training/trimmomatic/trimmed_right.fastq.gz
  [quant] finding pseudoalignments for the reads ... done
  [quant] processed 28,638,946 reads, 21,219,188 reads pseudoaligned
  [quant] estimated average fragment length: 176.615
  [ em] quantifying the abundances ... done
  [ em] the Expectation-Maximization algorithm ran for 1,142 rounds
[05/10/24 21:51:27]: Parsing Kallisto results. Keeping alt-splicing transcripts if expressed at least 10.0% of highest transcript per locus.
[05/10/24 21:51:30]: Wrote 45,628 transcripts derived from 44,117 protein coding loci.
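The selection rule named in the "Parsing Kallisto results" entry (keep an alternative transcript only if its expression is at least 10.0% of the highest-expressed transcript at the same locus) can be illustrated with a minimal Python sketch. This is not funannotate's own code: the function name `select_transcripts`, the two input dictionaries, the toy TPM values, and the fallback for loci with no measured expression are all assumptions made for illustration.

```python
from collections import defaultdict


def select_transcripts(tpm_by_transcript, locus_of, min_fraction=0.10):
    """Keep transcripts whose TPM is >= min_fraction of the top TPM at their locus.

    Hypothetical helper: tpm_by_transcript maps transcript ID -> TPM,
    locus_of maps transcript ID -> locus ID.
    """
    # Group the TPM values by locus.
    by_locus = defaultdict(dict)
    for tid, tpm in tpm_by_transcript.items():
        by_locus[locus_of[tid]][tid] = tpm

    kept = []
    for tpms in by_locus.values():
        top = max(tpms.values())
        best = max(tpms, key=tpms.get)  # first transcript holding the top TPM
        for tid, tpm in tpms.items():
            # Assumption: the best transcript is always kept, so a locus with
            # zero expression still retains exactly one model.
            if tid == best or (top > 0 and tpm >= min_fraction * top):
                kept.append(tid)
    return kept


# Toy values, not taken from this run.
tpm = {"g1-T1": 100.0, "g1-T2": 12.0, "g1-T3": 5.0, "g2-T1": 0.0, "g2-T2": 0.0}
locus = {tid: tid.split("-")[0] for tid in tpm}
kept = select_transcripts(tpm, locus)
print(kept)
```

With the toy values, `g1-T2` passes the 10% cut-off (12.0 against a top of 100.0) while `g1-T3` does not, and the unexpressed locus `g2` keeps a single model.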
[05/10/24 21:51:31]: bedtools intersect -sorted -v -a /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/genome.trna.gff3.sorted.gff3 -b /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/annotation~Funannotate/fun/update_misc/bestmodels.gff3.sorted.gff3
[05/10/24 21:52:06]: Validating gene models (renaming, checking translations, filtering, etc)
[05/10/24 21:52:18]: Writing 44,826 loci to TBL format: dropped 0 overlapping, 1 too short, and 0 frameshift gene models
[05/10/24 21:52:19]: Converting to Genbank format
[05/10/24 21:52:19]: /venv/bin/python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i fun/update_misc/tbl2asn/genome.tbl -f fun/update_misc/genome.fa -o fun/update_misc/tbl2asn --sbt /venv/lib/python3.8/site-packages/funannotate/config/test.sbt -d fun/update_results/Plecoglossus_altivelis.discrepency.report.txt -s Plecoglossus altivelis -t -l paired-ends -v 1 -c 16
[05/10/24 21:58:40]: Collecting final annotation files
[05/10/24 21:59:45]: Comparing original annotation to updated
  original: fun/predict_results/Plecoglossus_altivelis.gff3
  updated: fun/update_results/Plecoglossus_altivelis.gff3
[05/10/24 22:17:25]: Updated annotation complete:
-------------------------------------------------------
Total Gene Models: 44,826
Total transcripts: 46,338
New Gene Models: 102
No Change: 31,535
Update UTRs: 12,962
Exons Changed: 198
Exons/CDS Changed: 29
Dropped Models: 3
CDS AED: 0.004
mRNA AED: 0.046
-------------------------------------------------------
[05/10/24 22:17:26]: Funannotate update is finished, output files are in the fun/update_results folder
[05/10/24 22:17:26]: There are 4 gene models that need to be fixed.
[05/10/24 22:17:26]: Manually edit the tbl file fun/update_results/Plecoglossus_altivelis.tbl, then run:
  funannotate fix -i fun/update_results/Plecoglossus_altivelis.gbk -t fun/update_results/Plecoglossus_altivelis.tbl
[05/10/24 22:17:26]: After the problematic gene models are fixed, you can proceed with functional annotation.
[05/10/24 22:17:26]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
  funannotate iprscan -i fun -m docker -c 16
Run antiSMASH:
  funannotate remote -i fun -m antismash -e youremail@server.edu
Annotate Genome:
  funannotate annotate -i fun --cpus 16 --sbt yourSBTfile.txt
-------------------------------------------------------
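The counts in the "Updated annotation complete" summary can be pulled into numbers for a quick sanity check. The following is a minimal Python sketch, assuming the summary text has been copied out of the log as a string; `parse_summary`, `FIELDS`, and `CATEGORIES` are hypothetical names, not part of funannotate. In this run the five per-category counts add up to the reported Total Gene Models, which is an observation about this log rather than a documented guarantee.

```python
import re

# Summary values copied from the log above.
SUMMARY = (
    "Total Gene Models: 44,826 Total transcripts: 46,338 "
    "New Gene Models: 102 No Change: 31,535 Update UTRs: 12,962 "
    "Exons Changed: 198 Exons/CDS Changed: 29 Dropped Models: 3"
)

FIELDS = [
    "Total Gene Models", "Total transcripts", "New Gene Models", "No Change",
    "Update UTRs", "Exons Changed", "Exons/CDS Changed", "Dropped Models",
]

# Categories that (in this run) partition the updated gene models.
CATEGORIES = [
    "New Gene Models", "No Change", "Update UTRs",
    "Exons Changed", "Exons/CDS Changed",
]


def parse_summary(text):
    """Return {field name: integer count} for every field found in text."""
    counts = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r":\s*([\d,]+)", text)
        if match:
            # Strip thousands separators before converting.
            counts[name] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_summary(SUMMARY)
category_total = sum(counts[name] for name in CATEGORIES)
print(counts)
print("categories sum to total:", category_total == counts["Total Gene Models"])
```

Because the parser matches each label followed by a colon, it works whether the summary is on one line or split across several.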