QC~seqkit

Simple statistics of FASTA/FASTQ files (computed with seqkit stats)

input_1:FASTA/FASTQ(.gz)

input_1/ecoli.fasta

input_1/fastq_runid_4de97058536f00a50a5594d603041572795f8954_0.fastq
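input_1 accepts FASTA or FASTQ files, optionally gzip-compressed. The Python sketch below shows what handling that input involves. It is a hypothetical helper, not the pipeline's or SeqKit's own code. It opens `.gz` files transparently and guesses the format from the first character (`>` for FASTA, `@` for FASTQ). It assumes the conventional unwrapped four-line FASTQ layout.

```python
import gzip
from typing import IO, Iterator, Optional, Tuple

# (name, sequence, quality); quality is None for FASTA records
Record = Tuple[str, str, Optional[str]]


def open_maybe_gzip(path: str) -> IO[str]:
    """Open a text file, decompressing on the fly if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_records(path: str) -> Iterator[Record]:
    """Yield records from a FASTA or FASTQ file; format is guessed from the first character."""
    with open_maybe_gzip(path) as handle:
        lines = (line.rstrip("\r\n") for line in handle)
        first = next((line for line in lines if line), None)
        if first is None:
            return  # empty file: no records
        if first.startswith(">"):
            # FASTA: a sequence may be wrapped over several lines
            name, chunks = first[1:], []
            for line in lines:
                if line.startswith(">"):
                    yield name, "".join(chunks), None
                    name, chunks = line[1:], []
                elif line:
                    chunks.append(line)
            yield name, "".join(chunks), None
        elif first.startswith("@"):
            # FASTQ: ASSUMPTION of unwrapped 4-line records (@header, seq, +, qual)
            header: Optional[str] = first
            while header is not None:
                try:
                    seq = next(lines)
                    next(lines)  # '+' separator line
                    qual = next(lines)
                except StopIteration:
                    raise ValueError(f"{path}: truncated FASTQ record {header!r}") from None
                yield header[1:], seq, qual
                header = next((line for line in lines if line), None)
        else:
            raise ValueError(f"{path}: neither FASTA ('>') nor FASTQ ('@')")
```

For example, `list(read_records("input_1/ecoli.fasta"))` would yield one `(name, sequence, None)` tuple per FASTA entry.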

Command

QC~seqkit -c 8 -m 32 input_1/

Output

seqkit.stats.txt

file                                                                   format  type  num_seqs     sum_len    min_len    avg_len    max_len   Q1         Q2     Q3  sum_gap        N50  Q20(%)  Q30(%)
input_1//ecoli.fasta                                                   FASTA   DNA          1   4,641,652  4,641,652  4,641,652  4,641,652    0  4,641,652      0        0  4,641,652       0       0
input_1//fastq_runid_4de97058536f00a50a5594d603041572795f8954_0.fastq  FASTQ   DNA      4,000  25,135,131          5    6,283.8    107,136  991      2,887  7,672        0     13,646   45.11   25.74
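seqkit.stats.txt is a whitespace-aligned table in which large numbers carry thousands separators (e.g. 25,135,131). For downstream comparisons it can help to load it into typed values. The Python sketch below does this. `parse_stats` and `to_number` are hypothetical helpers, not part of SeqKit or this pipeline. The sketch assumes that no field, including the file path, contains whitespace.

```python
from typing import Dict, List, Union

Value = Union[str, int, float]
TEXT_COLUMNS = {"file", "format", "type"}  # every other column is numeric


def to_number(token: str) -> Union[int, float]:
    """'25,135,131' -> 25135131, '6,283.8' -> 6283.8, '45.11' -> 45.11."""
    cleaned = token.replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)


def parse_stats(text: str) -> List[Dict[str, Value]]:
    """Parse the whitespace-aligned stats table into one dict per input file.

    ASSUMPTION: no field (file paths included) contains whitespace.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows: List[Dict[str, Value]] = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}: {line!r}")
        rows.append({
            name: tok if name in TEXT_COLUMNS else to_number(tok)
            for name, tok in zip(header, fields)
        })
    return rows
```

Usage: `rows = parse_stats(open("seqkit.stats.txt").read())`. `rows[1]["avg_len"]` would then be the float 6283.8 for the FASTQ file shown above.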


Log

pp QC~seqkit -c 8 -m 32 input_1/
Checking the realpath of input files.
0 input_1/
1 /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/QC~seqkit/input_1/ecoli.fasta
1 /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/QC~seqkit/input_1/fastq_runid_4de97058536f00a50a5594d603041572795f8954_0.fastq
/home/yoshitake.kazutoshi/files/m256y -> /suikou/files/m256y/yoshitake.kazutoshi/work
/home/yoshitake.kazutoshi/files/m256y/pp-dev -> /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev
/home/yoshitake.kazutoshi/files/m256y/pp-dev/yoshitake -> /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake
/home/yoshitake.kazutoshi/files/m256y/pp-dev/yoshitake/test -> /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test
/home/yoshitake.kazutoshi/files/m256y/pp-dev/yoshitake/test/QC~seqkit -> /suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/QC~seqkit
/suikou/files/m768/yoshitake.kazutoshi/work/ecoli
/suikou/files/m256y/yoshitake.kazutoshi/work
/suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev
/suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake
/suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test
/suikou/files/m256y/yoshitake.kazutoshi/work/pp-dev/yoshitake/test/QC~seqkit
centos:centos6 quay.io/biocontainers/seqkit:0.12.1--0
using docker
file                                                                   format  type  num_seqs     sum_len    min_len    avg_len    max_len   Q1         Q2     Q3  sum_gap        N50  Q20(%)  Q30(%)
input_1//ecoli.fasta                                                   FASTA   DNA          1   4,641,652  4,641,652  4,641,652  4,641,652    0  4,641,652      0        0  4,641,652       0       0
input_1//fastq_runid_4de97058536f00a50a5594d603041572795f8954_0.fastq  FASTQ   DNA      4,000  25,135,131          5    6,283.8    107,136  991      2,887  7,672        0     13,646   45.11   25.74

  1.  file      input file, "-" for STDIN
  2.  format    FASTA or FASTQ
  3.  type      DNA, RNA, Protein or Unlimit
  4.  num_seqs  number of sequences
  5.  sum_len   number of bases or residues       , with gaps or spaces counted
  6.  min_len   minimal sequence length           , with gaps or spaces counted
  7.  avg_len   average sequence length           , with gaps or spaces counted
  8.  max_len   maximal sequence length           , with gaps or spaces counted
  9.  Q1        first quartile of sequence length , with gaps or spaces counted
  10. Q2        median of sequence length         , with gaps or spaces counted
  11. Q3        third quartile of sequence length , with gaps or spaces counted
  12. sum_gap   number of gaps
  13. N50       N50. https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics#N50
  14. Q20(%)    percentage of bases with the quality score greater than 20
  15. Q30(%)    percentage of bases with the quality score greater than 30
  16. GC(%)     percentage of GC content

PID: 1838626
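The column legend above defines each statistic. As a rough cross-check, the Python sketch below recomputes several of them from in-memory sequences: num_seqs, sum_len, min/avg/max_len, N50, GC(%) and Q20(%)/Q30(%). It is not SeqKit's implementation, and it makes these assumptions:

* Quality strings use Phred+33 encoding.
* A base counts toward Q20/Q30 when its score is at least 20/30. The legend says "greater than", so SeqKit's exact boundary may differ.
* GC(%) is G+C over all characters.

The quartile columns (Q1 to Q3) are left out because their exact convention is not documented here. Note that SeqKit reports Q1 = Q3 = 0 for the single-sequence ecoli.fasta.

```python
from typing import Dict, Iterable, List, Optional


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no sequences


def basic_stats(seqs: Iterable[str], quals: Optional[Iterable[str]] = None,
                phred_offset: int = 33) -> Dict[str, float]:
    """Summary statistics in the spirit of the seqkit stats columns (a sketch)."""
    lengths: List[int] = []
    gc = 0
    for seq in seqs:
        lengths.append(len(seq))
        gc += sum(1 for base in seq.upper() if base in "GC")
    total = sum(lengths)
    stats: Dict[str, float] = {
        "num_seqs": len(lengths),
        "sum_len": total,
        "min_len": min(lengths, default=0),
        "avg_len": total / len(lengths) if lengths else 0.0,
        "max_len": max(lengths, default=0),
        "N50": n50(lengths),
        "GC(%)": 100.0 * gc / total if total else 0.0,  # ASSUMPTION: G+C over all characters
    }
    if quals is not None:  # FASTQ only
        n_bases = q20 = q30 = 0
        for qual in quals:
            for ch in qual:
                score = ord(ch) - phred_offset  # ASSUMPTION: Phred+33 by default
                n_bases += 1
                if score >= 20:
                    q20 += 1
                if score >= 30:
                    q30 += 1
        stats["Q20(%)"] = 100.0 * q20 / n_bases if n_bases else 0.0
        stats["Q30(%)"] = 100.0 * q30 / n_bases if n_bases else 0.0
    return stats
```

For a single sequence, N50 equals its length. This matches the 4,641,652 reported for ecoli.fasta. For FASTQ input, pass each record's sequence and quality strings.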