metagenome~metaWRAP

MetaWRAP binning module. This script should be run after assemble~megahit.

input_1: paired-end FASTQ(.gz) files, one <sample>_1.fastq / <sample>_2.fastq pair per sample

input_1/120304_AlnA4_10m_02-08_1.fastq

@SNL162:264:HHT3LBCXX:1:1104:3236:1913 1:N:0:TAAGGCGA+GCGTAAGA
GCTGTATAACATGCTTTTATAAAACCAGGTGATAAAATATTGGGTCTTGATTTATCTCATGGCGGACATCTTACTCATGGTTCTTCAGTTAATTTTAGTGG
+
DBDDD@G11<CGH1111<CCF1FEF0<<D<1<1<1<1111<1<1<1<1<1<FHFIIE1<@<?1/<<C/CEEF1G1<1FG11<1<1<C@<11<F11<DC<GE
@SNL162:264:HHT3LBCXX:1:1104:4529:1931 1:N:0:TAAGGCGA+GCGTAAGA
GGTCTGCACCGGTCAGGTTTGCGCCGGCCAGGTCCACACGGCCGAGTTGTGTATCAGTCAACCGTGCACTGGCCAGGTTCGCGTCGGTCAGGGTCGCGCTC
+
DDDDDEIGGH<C<C<DHGHI0D</DH//<CH1DGGE1@</</<<<F@GHIFEEFHG1<DFCHD/<1<11<<C1<FH1<1CHHDEHHH?GHH?ECEHIIIHE
@SNL162:264:HHT3LBCXX:1:1104:7853:1832 1:N:0:TAAGGCGA+GCGTAAGA
NCCTTATCTAGAAGAAATAAATAGATGCTCCTAATCTTTCTTAAAAGAATTTTTGTAGCTATACCTGTTCTTTTGGTAGTAACTAGTTTAACTTTTATTTT

input_1/120304_AlnA4_10m_02-08_2.fastq

@SNL162:264:HHT3LBCXX:1:1104:3236:1913 2:N:0:TAAGGCGA+GCGTAAGA
TCACAGAACTTTTCAAAATCCGGATCGCTTGAATGTTNCGTTGCAGCGGAACTTAGCAATTTTCGTTTTTCTTTTTGCGCTATCTCTTTATTTTTATCATA
+
0<0<011<D11<<D11111110/00/0011111<1<1#111<1011<//<//<<1111111<11<<11<<111<1<C1<///0<<111<11<1<1D1C?1E
@SNL162:264:HHT3LBCXX:1:1104:4529:1931 2:N:0:TAAGGCGA+GCGTAAGA
ATCCCCAACTGCATAGTGCGCAACTAGCCAGCCTGACNCTGCCGTCCGCTAACCGCTCCAATCAACCGTTGACTGATACACACCTCCCCCGTGTGCCCCTG
+
<D0D0E1D=<DHI?11<<C///</<1111111111<D#1111</0/</<</011///<<F11<<<F1/<CGCE?@11<11<10<<<1<CD/00<1<11<<<
@SNL162:264:HHT3LBCXX:1:1104:7853:1832 2:N:0:TAAGGCGA+GCGTAAGA
NCTTTCTCACTTAAAAAGGGTCCACCAGGAGCCAGGCNAATCAAAATAAAAGTTAAACTAGTTACTACCAAAAGAACAGGTATAGCTACAAAAATTCTTTT

input_1/120311_AlnA21_10m_02-08_1.fastq

@SN700395R:394:HHTGFBCXX:1:1105:1858:2228 1:N:0:AGGCAGAA+GCGTAAGA
GGTCCAATTCATGGGACGCTGCTGGAAAACCCATTGTTTGGGACATGATTCACTATGATGTCCAGCTGATTGGTGGAGTCGCCATGCACCAAGGTAAAATT
+
A@DDBIIIIIIHIHIIIIHHIIHHHIIIHHGHHHEIIEHHHIHHIIIIIIIIIHHIIIIIIIIIIIIIIIIIIHIIIIFIHIIFHIIIICHHHHIIIIIIH
@SN700395R:394:HHTGFBCXX:1:1105:2782:2250 1:N:0:AGGCAGAA+GCGTAAGA
ACTATGGATGGTCAAGATATTTCTTGGGATCAATTTAATTTAAATGTTAAGGCTCAGCTTAATGCGATGAAAGCTTCTAGCAAAAGAGTTGTTTTATTGCT
+
@D?@DHEIHIIIIICHHHHHHHIIHHIHHECHHEHGHHIIII?GHEHHCHEHCGHHHHIHE@CEFH=EHEHCGHHIHI?HH1<F1DGGHHHHIIHIIIIII
@SN700395R:394:HHTGFBCXX:1:1105:3513:2088 1:N:0:AGGCAGAA+GCGTAAGA
NTTGTGGATTGGTCTAATTCTTCACAGAACGTCGCCAAGACGCTGCGCGCGGGGAGCGGCGTTCTGTACAAGGTTTCGCCCCTGTCTCTTATACACATCTC

input_1/120311_AlnA21_10m_02-08_2.fastq

@SN700395R:394:HHTGFBCXX:1:1105:1858:2228 2:N:0:AGGCAGAA+GCGTAAGA
TCATCTACCAATACCGAGTCAATTTCATCGACTATGGCGTAGTGGTGGGGTTTCTGTACCAAATCGGTTGTGGAATGCGCCATATTATCCCGCAAGTAGTC
+
ADA?DHIIIIIHIIIHIHHIEH1GHHIHGIHHIHH?HHH<<?CFHI?E?HCEHIIIEHHHHHHDCEHGEHICEHIIGIDEGHEHHHHIIIIGIHHHEHHHI
@SN700395R:394:HHTGFBCXX:1:1105:2782:2250 2:N:0:AGGCAGAA+GCGTAAGA
CAATAAAACAACTCTTTTGCTAGAAGCTTTCATCGCATTAAGCTGAGCCTTAACATTTAAATTAAATTGATCCCAAGAAATATCTTGACCATCCATAGTCT
+
@DDDD?@HHHHI@HHEHHGHHIIGHHHCHHHHHHHHEEEHIHCH@@FHE?GHHHHIIH?EHIIIHHHIFI?1FG@HH1@D<1CGHI?FG?GHHH?<<FHGH
@SN700395R:394:HHTGFBCXX:1:1105:3513:2088 2:N:0:AGGCAGAA+GCGTAAGA
GGGCGAAACCTTGTACAGAACGCCGCTCCCCGCGCGCAGCGTCTTGGCGACGTTCTGTGAAGAATTAGACCAATCCACAATCTGTCTCTTATACACATCTG
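Binning relies on mate 1 and mate 2 being listed in the same order in the _1 and _2 files, as in the examples above (same read name, 1:N:0 vs 2:N:0). The following is a minimal sketch, not part of this module, that checks this before a run. It assumes strict 4-line FASTQ records and Casava 1.8+ style headers where the read name is the text before the first space; the function names are hypothetical.

```python
# Sketch (assumption): verify that two paired FASTQ files list their mates
# in the same order, as in the input_1 examples above.
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank trailing line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def read_name(header: str) -> str:
    """Read name = header text before the first space (Casava 1.8+ style)."""
    return header[1:].split(" ", 1)[0]


def check_pairs(r1: TextIO, r2: TextIO) -> int:
    """Return the number of read pairs; raise ValueError on any mismatch."""
    pairs = 0
    for rec1, rec2 in zip_longest(fastq_records(r1), fastq_records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_name(rec1[0]) != read_name(rec2[0]):
            raise ValueError(f"mate mismatch: {rec1[0]} vs {rec2[0]}")
        pairs += 1
    return pairs
```

For a real pair, open input_1/120304_AlnA4_10m_02-08_1.fastq and input_1/120304_AlnA4_10m_02-08_2.fastq and pass both handles to check_pairs; for .fastq.gz files, gzip.open(path, "rt") gives an equivalent text handle. Older headers ending in /1 and /2 would need that suffix stripped in read_name.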

input_2: assembled contigs in FASTA format (final.contigs.fa from assemble~megahit)

input_2/final.contigs.fa

>k119_168610 flag=1 multi=3.0000 len=309
GGTGAAGCCTGTTCCCGGTTAAATGTAGAAGTGCATGCCTATTGCCTACTTAAAGACGGTTATCATCTTGTCGTGAAGACTCCAGAGGCGAATCTTAGTCGCTTCATGCGTCAAGTCGACGGCCTATACACTCAACAGTATCAAAAAATGAAGGGTAGTGACGGGTCGCTTTTCAGAGGCCGTTACAAAGCCGTGCTAGTTCAAGCCGAGAAATACCTGCTGCCACTATCGAGCTTTGTACATACGCGAGTGAAGGCCTCTGAGCTGAATAGCTACCCATGGTCCAGCTACTGTCTCTTATACACATCT
>k119_89266 flag=0 multi=1.0000 len=317
GGGTGTTTAATTGGATCTTCTGAGTCATCCATTAATAGATTTTGTTCTGAAACATAAGCTTCATATGTAATTTCATCATTTTCGGCTAACAAATGATAAAAAGGCTGATCTTTTCTAGGACGAACATTCTTTGGAATAGATTGGTACCATTCTTCAGAATTATTAAATTCAAAATCTACATCATATATAACACCTCTAAACTCAAAGTGTTTATGTTTTACAATGTCTCCTATTGAAAATTTTGCTTTTTGAATTGCCATTGTTATTAATATGGGGGTTTAAAAATAAAAATCAATAAAAAACATTTATATTTTTTT
>k119_307462 flag=1 multi=2.0000 len=314
AAAGGGCCGTGCGACCGGCGCAATTTCAGAAAGTGGCCGAGTATTTTTGGCAGATCGGGTCATTTGCAATGGTGATCCACCAACCGTCTATGCTCAAATGTTGCCTGCGGACGGTAACCGCCGCAAGCGGTTGGCTTCCGATCGAATGACACAATATTCGATGGGGCTTTTCGTGTTGTTTTTTGGTACCAAGCGAAAATATCCCAAAATTGCACACCATACAATCTGGATGGGCCCTCGATATAAAGAACTACTTGCGGATATTTTTGACAAAAAAATACTAGCGGACGATTTTTCGCTTTATGTACACCGCC
>k119_138856 flag=0 multi=3.5735 len=255
CACCAAAACAAAAAAAGTTAAGTTTAAAGCTAAAATTGAAATTTTAAATAAAGCAAAAAAGCTAAGCCAAACCCCTATAGTTGCAATAGGGGGAATTAATATTAATAATTATAAAAAACTGTTATTGAACAATGCTAATTTATTAGCTATTTCAGGCTACGTTTGGAATAATAAAAAATATAAACCCTTAGAAACTATAGAAAAACTCAAATGAAAATTAATGCTGGAGAAATAAGAGTTGGAATGCTTTTAGAA
>k119_0 flag=0 multi=1.0000 len=213
TGCAATACATCTTGCATGCCCTCTATTTTCTTTCATATTTATAATTTCAATAGAGTTAATATTTTCAGTACTTTGATAATCATCAATTATCTGCTGAGAAGATGCATCGTTAATAACAATAATTGAAATCTCAGAATTGATACCTTCTATTTCAGAATTTATATTTTCAATTAGCTTATTTAAGCTCTCTCGATCGTTATAAATTGGAATTAA
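The contig headers written by MEGAHIT carry key=value fields (flag, multi, len) after the contig ID. A small sketch, assuming exactly the header layout shown above, that parses those fields and checks that len= matches the actual sequence length; it does not interpret flag or multi, and the function names are hypothetical.

```python
# Sketch (assumption): parse MEGAHIT-style headers such as
# ">k119_168610 flag=1 multi=3.0000 len=309" and validate len=.
from typing import Dict, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence); sequences may span lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_megahit_header(header: str) -> Dict[str, object]:
    """Split 'k119_0 flag=0 multi=1.0000 len=213' into typed fields."""
    contig_id, *fields = header.split()
    info = dict(field.split("=", 1) for field in fields)
    return {
        "id": contig_id,
        "flag": int(info["flag"]),
        "multi": float(info["multi"]),
        "len": int(info["len"]),
    }


def contig_table(handle: TextIO) -> List[Dict[str, object]]:
    """Parse every contig and raise if a declared length is wrong."""
    rows = []
    for header, seq in read_fasta(handle):
        meta = parse_megahit_header(header)
        if meta["len"] != len(seq):
            raise ValueError(f"{meta['id']}: len={meta['len']} but {len(seq)} bp")
        rows.append(meta)
    return rows
```

Applied to input_2/final.contigs.fa, contig_table returns one dictionary per contig, which can be filtered by length before binning if desired.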

Option

input_1/ input_2/final.contigs.fa
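The Option line passes the read folder (input_1/) and the contig file (input_2/final.contigs.fa). The commands this module actually runs are not shown on this page; the sketch below is a hypothetical reconstruction that only builds argument lists for the metaWRAP `binning` and `bin_refinement` subcommands. The choice of binners (MetaBAT2, MaxBin2, CONCOCT), the INITIAL_BINNING folder name and the thread count are assumptions; the `-c 50 -x 10` thresholds are inferred from the output folder name metawrap_50_10_bins.

```python
# Hypothetical sketch: turn the two inputs into metaWRAP command lines.
# Binner choice, folder names and thread count are assumptions.
import glob
import os
from typing import List


def metawrap_commands(read_dir: str, assembly: str, out_dir: str = ".",
                      threads: int = 8, min_completion: int = 50,
                      max_contamination: int = 10) -> List[List[str]]:
    # metaWRAP binning expects uncompressed reads named *_1.fastq / *_2.fastq;
    # sorting keeps each sample's _1 and _2 files next to each other.
    reads = sorted(glob.glob(os.path.join(read_dir, "*_1.fastq")) +
                   glob.glob(os.path.join(read_dir, "*_2.fastq")))
    binning_dir = os.path.join(out_dir, "INITIAL_BINNING")
    refine_dir = os.path.join(out_dir, "BIN_REFINEMENT")
    binning = ["metawrap", "binning", "-o", binning_dir, "-t", str(threads),
               "-a", assembly, "--metabat2", "--maxbin2", "--concoct", *reads]
    # bin_refinement merges the three bin sets and keeps bins meeting -c / -x.
    refinement = ["metawrap", "bin_refinement", "-o", refine_dir,
                  "-t", str(threads),
                  "-A", os.path.join(binning_dir, "metabat2_bins"),
                  "-B", os.path.join(binning_dir, "maxbin2_bins"),
                  "-C", os.path.join(binning_dir, "concoct_bins"),
                  "-c", str(min_completion), "-x", str(max_contamination)]
    return [binning, refinement]
```

Each list could be passed to `subprocess.run(cmd, check=True)` where metaWRAP is installed. Because the glob only matches uncompressed `*_1.fastq` / `*_2.fastq` files, `.fastq.gz` inputs would have to be decompressed first.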

Output

BIN_REFINEMENT/metawrap_50_10_bins/bin.1.fa

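One FASTA file per refined bin. The folder name metawrap_50_10_bins reflects the bin_refinement thresholds: minimum 50% completeness and maximum 10% contamination.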
>k119_138868
ATTTCTCATGATGCAGCTCATGGTGTTGCGGTAAAGAGTAAATTTTGGAATAAATTTTTATTTTCACTTAGTTTTAATTTACAAGGAAATAACGCCTATGTGTGGGGTAAAAACCACAATGAATCTCACCATTTATACACAAATATAGAAGGAAGCGACATAGATGTCTTAAATAATCCACTGCTAAGAATGACGGCAACACAGCCTTTAAAATCTTTTCATAAGTACCAATATCTTTACGCGCCGGTACTCAGCCTATTGTACTCTATTAATTGGTTTTTTATTCGTGATATCCTTATGCTTTTCAGAAAATCTAGTAGAACCATTAAAGTAGATATGCCTATTGTAGAGGTTGTTAAACTTGTTTTGTTTAAGCTACTATACATAGGCTACATGATTATATTACCCGCATATTTACTGCCCTTTGGAATCTATAATGTTATAATCGCGTTTATTTTAAATCACTTTATTGTTTCTATAATATTCACAAGTGTTTTAGGGGTTTCTCACCTCTCAGACTATGTAGCGCACCCTAGCCCTAATGAAGATGGGAAACTACCAATTAGTTGGCCAACCCTACAGATGACTACCTCTGTAGACTATAATGCTAATAGCACATTTCTAAACTGGACTTTAGGAGGGTTTAATGCCCATGCAATGCATCATATTCTACCTAATATAAGTCATGTCCATTATTTAAACATACTCCCTATATTTATTGATACGGCAAAAAAACATGGAGTAAACTACATGAATATGTCATATAAAGAATCTCTTAGCTCATACTATAGGTTTCTTTATAAAATGGGACGTCAATCATCAATAACACCACTTGTATATGAAAGCAAAGCATCTTAATATCCCTGCAGATAATGATTTGTATACCTTCATTTACAATCAAATAAAACTAAAACTACCACTAGACAAAAGCAAAGCAAATAGATACTTTGTCCTTAAGGGATTGTTATATATAACTATAACCATCAAAAGCTCTGTGCTTATTTAGACAATAAACAATCCTGT
>k119_257895
GCTTCACACTCTCTAAAATGGACCTAACAGGCTCTCCTGGATTAAAGACCACACTAGCTAAAAGGCCTTCCTTTACAAAAGGATCTAAATATGAAGCGAATGTCTCTCTTTTCTCCTCAGAGGCGTCTCCAACATGGACAAGAACCAATGATGCCTTAAACAATAAAGCAAGACGGGCGCTCTCGCAAATATTTGCTTTTAAGTTTGGTGAGAAGGTAACTCCAATGCCAATAGTGTTAAATTTTTTTTGGATCAAAACAATGAATTCTTAAGTGTGTTTTCTTTGACGCAAGATATTTTTGTGCCGAAAGGGAAAGATACGTCTTTTTTAAAAGTTTGGAAATATATTTTAAACAACCCGGTTTAAAAGCTTTAAATTATTCTTAAATAATTTTAACAATTCAAGGGTTCTTTCTACTTTGCATATAAATACAACACCTATGTTAGAACTTGCTGGAATTATCATCTTAGGAATTTTTGCACAATGGATTGCTTGGAAAGTAAAAATTCCAGCTATTTTACCTCTTATCTTAATTGGTTTGGCCGTTGGGCCTATGTCAACGTTTTACTCTGAAGACGGGACACAATGGATACAGCCTATATGGAACTCTCAAAAAGGGTTTTTCCCCGGAGAAAGTTTGTTTTATTTTGTTTCCTTGGCCATCGGAATTATACTGTTTGAAGGTGGTTTAACCCTTAAGATGACCGAAATTAAAAAAGTTGGTGGTGTTATTGGAAAGCTTATAAGCCTTGGCTCGATTGTTACTTTTTTTGGAGCGGGTGTGGCTGCACATTATTTTTTGGGCCTTGATTGGCAAGTTTCTTTTTTGTTCTCTGGGCTTATTATTGTTACAGGACCAACGGTAATAACACCTATTTTAAGAAACATTCCACTGCGCAAAGACGTTTCGGCAGTTTTGAAATGGGAGGGTATTTTAATAGACCCTATTGGGGCTCTTGTTGCGGTTTTAGTCTTTGGTTTTATTAGCGTTAATATACCCACCCAAGGGATTGAAGAAATAGCCACTCAGGCCCATGGCTCTGGAGGCAGCTATACAAAACACGCCCTTTTAGAATTTGGAAAAATTATAGTTATTGGATTTGCTTTTGGTCTAGCCGGAGGTTTTGCAATGTACCATGCTGTAAAAAGAAAAGTGATTCCTCACTATCTTTTAAATGTAGCCTCACTCTCTATGGTGCTGCTAATTTTTGTGTTGAGTGATTTGTTTGCTCACGAGTTTGTCCTATTGGCTGTAGTTGTGATGGGGATGTACCTAGGAAACAGCGATTTGCCTAACTTAAAAGAACTGTTATATTTTAAAGAATCTTTAAGCGTACTGCTTATATCCATCTTGTTTATTTTGCTTTCTGCAAACATTAGCATTGATGATTTACTACTTATATATAACTGGGAAACAGCGGTGTTATTTGCTGTGGTTA
>k119_119053
ATAAATGATTGCAGGCTTTGTGTTATTAAACAAAACACTAGTGAGGCTGCAAAAAGCATTCACTATACAAACTACCCTCTAGCGGCTTTGAAACTCAATGGGGTTAAATCTAACGCAAACAACTTCCTTTAGATAGCGGTATGCTATTTCTTTTTTGACCAGCCACAGCCCCGCCCTATTACATGTATAATAACAAAAATGCCTTCTGGAAAACCAGAAGGCATTTTATATATACTTAAAAATAGTAGCTTATGCTTCTACCTCTTGGGTTTTAACCTCTAGTAAGTCACTAGCTTTTTTCTTCACAGTATCATACATCACTGGGGTTGCAATAAACAGTGAAGAGTAGGTTCCTACTACAACACCTACCATAAGGGCAAATATAAGTCCTCGGATAGATTCTGCTCCAATAAAGAAGATACACAACAGTACAATTAATGTAGTTAAAGAGGTATTTAACGTACGGCTTAATGTACTATTTAATGCGCTGTTAATT
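Basic assembly statistics per bin (contig count, total size, N50, GC content) are a quick complement to the CheckM report. A minimal sketch, assuming each bin is a plain FASTA file like bin.1.fa above; the function names are hypothetical and not part of metaWRAP.

```python
# Sketch: contig count, total length, N50 and GC% for one bin FASTA.
from typing import Dict, Iterable, List, TextIO


def fasta_sequences(handle: TextIO) -> List[str]:
    """Return the sequences of a FASTA file (multi-line records joined)."""
    seqs, current = [], []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def bin_stats(sequences: List[str]) -> Dict[str, float]:
    lengths = [len(s) for s in sequences]
    total = sum(lengths)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {"contigs": len(lengths), "total_bp": total, "n50": n50(lengths),
            "gc_percent": 100 * gc / total if total else 0.0}
```

Looping over glob.glob("BIN_REFINEMENT/metawrap_50_10_bins/*.fa") and calling bin_stats(fasta_sequences(open(path))) gives one summary row per bin.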

output_checkM.txt

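CheckM quality estimates for each refined bin: the lineage-specific marker set used (Marker lineage), how many marker genes were found 0, 1, 2, 3, 4 or 5+ times, and the resulting completeness, contamination and strain heterogeneity (%).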
[2022-01-19 03:49:56] INFO: CheckM v1.1.3
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id            Marker lineage            # genomes   # markers   # marker sets    0     1    2    3   4   5+   Completeness   Contamination   Strain heterogeneity
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  bin.12   c__Gammaproteobacteria (UID4444)      263         505           231         2    500   3    0   0   0       99.71            0.70               0.00
  bin.2     f__Rhodobacteraceae (UID3340)         84         568           330         11   555   2    0   0   0       97.73            0.20               0.00
  bin.4      o__Actinomycetales (UID1593)         69         400           198         21   377   2    0   0   0       94.22            0.61              50.00
  bin.7     f__Rhodobacteraceae (UID3356)         67         615           329         42   552   21   0   0   0       93.25            3.57              76.19
  bin.9         s__algicola (UID2846)             47         571           303         64   489   18   0   0   0       87.34            2.98              61.11
  bin.5    c__Gammaproteobacteria (UID4443)      356         451           270         82   358   11   0   0   0       87.04            1.96              81.82
  bin.1         s__algicola (UID2847)             33         496           263        102   366   26   2   0   0       75.80            4.00              46.88
  bin.8      p__Proteobacteria (UID3880)         1495        242           151         72   165   5    0   0   0       70.56            2.98              40.00
  bin.10   c__Gammaproteobacteria (UID4443)      356         451           270        131   314   6    0   0   0       67.96            0.92              33.33
  bin.13   c__Gammaproteobacteria (UID4443)      356         451           270        149   289   13   0   0   0       60.64            2.61              61.54
  bin.15    f__Rhodobacteraceae (UID3356)         67         615           329        241   300   68   6   0   0       59.50            9.53               1.16
  bin.6           k__Archaea (UID2)              207         145           103         63    77   4    1   0   0       54.88            4.37              85.71
  bin.14        k__Bacteria (UID2569)            434         278           186        116   147   15   0   0   0       52.91            1.72              20.00
  bin.3    c__Gammaproteobacteria (UID4443)      356         451           270        198   229   24   0   0   0       50.35            5.90              37.50
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
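Completeness and contamination from this report are commonly compared with the MIMAG thresholds: high-quality draft above 90% completeness and below 5% contamination (MIMAG additionally requires 23S, 16S and 5S rRNA genes and at least 18 tRNAs, which CheckM does not report), medium quality at least 50% completeness and below 10% contamination. The sketch below assumes the whitespace-separated layout shown above, where the last 12 columns are numeric; the function names are hypothetical.

```python
# Sketch: parse the CheckM table above and apply MIMAG-style
# completeness/contamination thresholds (rRNA/tRNA checks not included).
from typing import Dict, Iterable, List


def parse_checkm_table(lines: Iterable[str]) -> List[Dict[str, object]]:
    bins = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 14:  # log line, dashed rule or blank
            continue
        try:
            numbers = [float(t) for t in tokens[-12:]]
        except ValueError:  # header row ("5+", "Completeness", ...)
            continue
        bins.append({
            "bin": tokens[0],
            "lineage": " ".join(tokens[1:-12]),
            "completeness": numbers[-3],
            "contamination": numbers[-2],
            "strain_heterogeneity": numbers[-1],
        })
    return bins


def mimag_tier(completeness: float, contamination: float) -> str:
    """'high' here means only the completeness/contamination criteria."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    return "contaminated"
```

Applied to the table above, bin.12, bin.2, bin.4 and bin.7 meet the completeness/contamination part of the high-quality tier, and the remaining ten bins fall in the medium tier.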

images/120304_AlnA4_10m_02-08.bin.15.png

TPM (transcripts per million) vs. contig length plot for each bin
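How this module computes the plotted TPM values is not stated here. Assuming the standard definition, TPM_i = (r_i / l_i) / Σ_j (r_j / l_j) × 10^6, where r_i is the number of reads mapped to contig i and l_i its length, a sketch of the values behind such a plot looks like this. The read counts are assumed to come from a prior mapping step, and the function names are hypothetical.

```python
# Sketch (assumption): standard TPM from per-contig read counts and lengths,
# then the (length, TPM) points for the contigs of one bin.
from typing import Dict, Iterable, List, Tuple


def tpm(read_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    # Reads per base for each contig, then rescale so all values sum to 1e6.
    rates = {c: read_counts[c] / lengths[c] for c in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {c: 0.0 for c in rates}
    return {c: rate / total * 1e6 for c, rate in rates.items()}


def plot_points(tpm_values: Dict[str, float], lengths: Dict[str, int],
                bin_contigs: Iterable[str]) -> List[Tuple[int, float]]:
    """(contig length, TPM) pairs for one bin, sorted by length."""
    return sorted((lengths[c], tpm_values[c]) for c in bin_contigs)
```

Because TPM divides by length before normalising, contigs from the same genome are expected to have similar TPM regardless of length, so a contig far from its bin's cloud of points is worth a closer look.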


Log