Links


Data analysis tools

DAM (DOTUR-ARB Matching)

The DAM (DOTUR-ARB Matching) program matches a list of query sequences (belonging to a discrete group [i.e. phylum-level cluster] as determined by ARB) to the distance-matrix-based OTU, as created by the DOTUR program, that encompasses all sequences in the query list. This allows determination of the percent sequence similarity at which a grouping of sequences is identified as a discrete OTU.
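The matching step described above can be sketched in Python. This is only an illustration, not the DAM source code. It assumes a DOTUR list file in which each line holds a distance label, the number of OTUs, and then one whitespace-separated field per OTU containing comma-separated sequence IDs, with lines ordered from the smallest distance to the largest. The function names are hypothetical.

```python
def parse_dotur_list(lines):
    """Yield (label, otus) for each list-file line; otus is a list of ID sets.

    Assumed layout per line: label, OTU count, then one field per OTU,
    each field being comma-separated sequence IDs.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        label, otu_fields = fields[0], fields[2:]
        yield label, [set(field.split(",")) for field in otu_fields]


def first_enclosing_distance(list_lines, query_ids):
    """Return (label, otu) for the first line where one OTU holds every query ID.

    Lines are assumed to be ordered by increasing distance, so the first hit
    is the smallest distance at which the query group forms a single OTU.
    Returns None if no OTU ever contains the whole query list.
    """
    query = set(query_ids)
    for label, otus in parse_dotur_list(list_lines):
        for otu in otus:
            if query <= otu:
                return label, otu
    return None
```

For a numeric distance label d, the corresponding percent sequence similarity would be (1 - d) x 100, so a hit at 0.03 corresponds to 97% similarity.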

DOTMAN (DOTUR Manipulation)

DOTMAN (DOTUR Manipulation) queries selected OTUs (based on DOTUR bins) against a sequence database and generates FASTA files from a user-supplied file. The program is given a range of DOTUR distance values, a DOTUR list file, and a FASTA-format file containing the sequences corresponding to the IDs in the list file. For each distance value d, DOTMAN makes one FASTA file for each of the n largest OTUs, where n is set by the user and is less than or equal to the total number of OTUs at distance d.
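A minimal Python sketch of this workflow follows. It is not the DOTMAN implementation. It assumes the same list-file layout as DOTUR output (distance label, OTU count, then one field of comma-separated IDs per OTU), takes the distance values as a collection of label strings rather than a numeric range, and returns FASTA text keyed by (label, rank) instead of writing files. All function names are hypothetical.

```python
def read_fasta(lines):
    """Return {sequence ID: sequence}; the ID is the first word after '>'."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            # Remove the spaces used in 10-base block layouts.
            chunks[name].append(line.replace(" ", ""))
    return {key: "".join(parts) for key, parts in chunks.items()}


def largest_otus(list_lines, labels, n):
    """Return {label: the n largest OTUs (lists of IDs)} for the wanted labels."""
    wanted = set(labels)
    result = {}
    for line in list_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] not in wanted:
            continue
        otus = [field.split(",") for field in fields[2:]]
        otus.sort(key=len, reverse=True)  # stable sort: ties keep file order
        result[fields[0]] = otus[:n]
    return result


def otu_fasta_texts(list_lines, fasta_lines, labels, n):
    """Return {(label, rank): FASTA text} for the n largest OTUs per label."""
    seqs = read_fasta(fasta_lines)
    texts = {}
    for label, otus in largest_otus(list_lines, labels, n).items():
        for rank, otu in enumerate(otus, start=1):
            texts[(label, rank)] = "".join(
                f">{sid}\n{seqs[sid]}\n" for sid in otu if sid in seqs
            )
    return texts
```

Each returned text could then be written to its own file, for example one named after the distance label and the OTU rank. IDs that appear in the list file but not in the FASTA file are skipped silently in this sketch.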

Example datafiles from KBS-LTER papers

FinalSet5000

Example of the complete list of sequences from the GC-fractionated clone library.

Sequences can also be downloaded from GenBank using accession numbers EU352912-EU357802.
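As a convenience, the accession range can be expanded into individual accession numbers before retrieval. The helper below is a hypothetical sketch that only builds the list of IDs; it assumes both endpoints share the same letter prefix and digit width, and it does not contact GenBank.

```python
import re


def expand_accession_range(first, last):
    """Return every accession from first to last, inclusive."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share one letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep any zero padding
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{number:0{width}d}" for number in range(start, stop + 1)]


accessions = expand_accession_range("EU352912", "EU357802")
```

The resulting list can be passed to whichever GenBank retrieval tool is preferred.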


>005.F8
GAGAGGTGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGTTTCTAAAG
GCGCTTTTGTAAGTTATTTTTCAAAGACCGAAGCTCAACTTCGGGAAGGG
AAGTAATACTGCAAGAGTTGAAATATTTCGGGGTTACTGGAACTATCGGT
GTAGGGGTGAAATCCGTTGATATCGATAGGAACTCCAAGGGCGAAGGCAG
GTAACTAGGAATTTTTTGACGCTGATGAACGAAAGCTAGGGGAGCGAAAG
GGATTAGAGACCCCTGTAGTCCTAGCCGTAAACTATGCTCGCTAGACCAG
TGGATTTATCTGCTGGACGTAAGCTAACGCGTGAAGCGAGCCGCCTGGGG
AGTACGACCGCAAGGTTA
>036.F8
GAGAGGTGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGTTTCTAAAG
GCGCTTTTGTAAGTTATTTTTCAAAGACCGAAGCTCAACTTCGGGAAGGG
AAGTAATACTGCAAGAGTTGAAATATTTCGGGGTTACTGGAACTATCGGT
GTAGGGGTGAAATCCGTTGATATCGATAGGAACTCCAAGGGCGAAGGCAG
GTAACTAGGAATTTTTTGACGCTGATGAACGAAAGCTAGGGGAGCGAAAG
GGATTAGAGACCCCTGTAGTCCTAGCCGTAAACTATGCTCGCTAGACCAG
TGGATTTATCTGCTGGACGTAAGCTAACGCGTGAAGCGAGCCGCCTGGGG
AGTACGACCGCAAGGTTA
>019.F8
GAGAGGTGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGTTTCTAAAG
GCGCTTTTGTAAGTTATTTTTCAAAGACCGAAGCTCAACTTCGGGAAGGG
AAGTAATACTGCAAGAGTTGAAATATTTCGGGGTTACTGGAACTATCGGT
GTAGGGGTGAAATCCGTTGATATCGATAGGAACTCCAAGGGCGAAGGCAG
GTAACTAGGAATTTTTTGACGCTGATGAACGAAAGCTAGGGGAGCGAAAG
GGATTAGAGACCCCTGTAGTCCTAGCCGTAAACTATGCTCGCTAGACCAG
TGGATTTATCTGCTGGACGTTAGCTAACGCGTGAAGCGAGCCGCCTGGGG
AGTACGACCGCAAGGTTA
>325.F4
GTGAGGTACAAGCGTTGCCCGGATTTACTGGGCGTAAAGTGTTTCGTAGG
CGCTTTTGTAAGTTATGCTTCAAAGACCGAGGCTCAACCTCGGGAAGGGG
TGTAATACTGCAAGAGTTGAGATATTTTGGGGCTACTGGAACTATCGGTG
TAGGGGTGAAATCCGTTGATATCGATAGGAACTCCAAGGGCGAAGGCAGG
TAGCTAAGAATATTCTGACGCTGAGGAACGACAGCTAGGGGAGTGAAAGG
GATTAGAGACCCCTGTAATCCTAGCCGTAAACTATGCTCGCTAGTCCCCC
GGAGTAATTTGGGGGACGTAAGCTAACGCGTGAAGCGAGCCGCCTGGGGA
GTACGGTCGCAAGACTA
>050.F10
GTGAGGTACAAGCGTTGCCCGGATTTACTGGGCGTAAAGCGTTTCGTAGG
CGCCCAATCGCATCTTCTTTCAAAGCCCGGAGCTTAACTTCGGAAAGGGA
GAAGAGATGGATTTGGGTTGAAATATTTCGGAGCTATTGGAACTATCGGT
GTAGGGGTGAAATCCGTTGATATCGATAGGAACTCCAAGGGCGAAGGCAG
ATAGCTAGGAATCATTTGACGCTGAGGAACGAAAGCTAGGGGAGCGAAAG
GGATTAGAGACCCCTGTAGTCCTAGCCGTAAACTATGCTCGCTACCCCGT
GGATTCGTTCACGGGGGTAAGCTAACGCGTGAAGCGAGCCGCCTGGGGAG
TACGGCCGCAAGGCTA
>093.F8
AGAGGTCACAAGCGTTATCCGGATTTATTGGGCGTAAAGCGTTTCGTAGG
TGGGTTCATAAGTTATCCTTTAAAGACTACGGCTCAACCGGAGGAAGGGG
GATAATACTGTRAGTCTTGATTTTTGGCGGGGCATCTGGAACTGATGGTG
TAGTAGTGAAATACGTTGATATCATCAGGAACTCCAAGGGCGAAGGCAGG
ATGCTAGCCAATTAATGACACTGAGGAACGACAGCTAGGGGAGCGAAAGG
GATTAGAGACCCCTGTAGTCCTAGCCGTAAACTATGCTCGCTAGGGATTT
TACCGTAAGGTTGAGTCCCGTAAGCTAACGCGTTAAGCGAGCCGCCTGGG
GAGTACGACCGCAAGGTTA
>355.F10
AGAGSTTWYAAGCGTTATCCGGATTTATTGGGCGTAAAGCGTTTCGTAGG
CGGATTTTTAAGTTACCCTTCAAAGACTACGGCTTAACCGGAGGAAGGGG
GGTAATACTGAAAGTCTTGATTTTTAGTGGGGTATCTGGAACTGATGGTG
TAGTAGTGAAATACGTTGATATCATCAGGAACTCCGAGGGCGAAGGCAGG
ATACTAACTATCATATGACGCTGAGGAACGACAGCTAGGGGAGCGAAAGG
GATTAGAGACCCCTGTAGTCCTAGCTGTAAACTTTGCTCGCTAGGGATTT
GGGAYKTATTCCGAGTTCCGTAAGCTAACGCGTTAAGCGAGCCGCCTGGG
GAGTACGACCGCAAGGTTA
>048.F8
AGAGACCTCAAGCGTTATCCGGAATCATTGGGCGTAAAGCGTACCGATAG
GTGGTTTACAAAGTCAGAAGTGAAATCTCTCAGCTTAACTGGGCGACTGT
CTTTTGAAACTTGTAAACTTGAGGGGCAAAGAGGAAGCTGGAACAAACGG
TGTAGTAGTGAAATGCGTTGATATCGTTTGGAACACCAATAGCGTAGGCA
GGCTTCTGAGTGCCACCTGACACTGCTAGGACGAAAGCGTGGGGAGCGAT
AAGGATTAGATACCCTTGTAGTCCACGCTGTAAACGATGATGATTAGGTG
CTAGAGAGTATCGACCCTCTTTAGTACCATAGCTAACGCGTTAAATCATC
CGCCTGGGGAGTACGGCCGCAAGGCTA

FinalSet500

Example of the first 500 sequences from the GC-fractionated clone library.


>001.F2    371  bp        rna
GGGGGGGGCA AGCGTTGTTC GGAATTACTG GGCGTAAAGG GCGCGTAGGC
GGTTTGCTAA GTTGGATGTG AAAACTCTGG GCTTAACCCG GAGCCTGCAT
CCAAAACTGG CAAACTTGAG TACTGGAGGG GAAAGCGGAA TTCCTGGTGT
AGCGGTGAAA TGCGTAGATA TCAGGAGGAA CACCGGTGGC GAAGGCGGCT
TTCTGGACAG TAACTGACGC TGAGGCGCGA AAGCTAGGGG AGCAAACAGG
ATTAGATACC CTGGTAGTCC TAGCCCTAAA CGATGGATAC TTGGTGTGAG
GGGGATTGAA TCCCTTCGTG CCGTAGCTAA CGCAATAAGT ATCCCGCCTG
GGGAGTACGG TCGCAAGGCT G
>004.F2    373  bp        rna
GTAGGGGGCA AGCGTTGTCC GGATTCATTG GGCGTAAAGA GCTCGTAGGC
GGCTTGGCAA GTCGGGTGTG AAAACTTCAG GCTCAACCTG GAGCGGCCAC
TCGATACTGC CATGGCTTGA GTCCGGTAGG GGACCACGGA ATTCCTGGTG
TAGCGGTGAA ATGCGCAGAT ATCAGGAGGA ACACCGGTGG CGAAGGCGGT
GGTCTGGGCC GGAACTGACG CTGAGGAGCG AAAGCGTGGG GAGCGAACAG
GATTAGATAC CCTGGTAGTC CACGCCGTAA ACGTTGGGCA CTAGGTGTGG
GGACCTATCG ACGGTTTCCG TGCCGTAGCT AACGCATTAA GTGCCCCGCC
TGGGGAGTAC GGCCGCAAGG CTA
>005.F2    374  bp        rna
GGAGGGTGCG AGCGTTGTCC GGAATCATTG GGCGTAAAGG GCGCGTAGGT
GGCCCGGTCA GTCTTTGGTG AAAGCGCGGG GCTCAACCCT GCGTCGGCCA
GGGATACTGC CGCGGCTCGA GCACTGTAGA GGCAGGCGGA ATTCCGGGTG
TAGCGGTGGA ATGCGTAGAG ATCCGGAAGA ACACCGGTGG CGAAGGCGGC
CTGCTGGGCA GTTTTGCTGA CACTGAGGCG CGACAGCGTG GGGAGCAAAC
AGGATTAGAT ACCCTGGTAG TCCACGCCGT AAACGATGGG CACTAGGCGC
TTGGGGGAGC GACCCCCCGA GGGCCGGCGC TAACGCATTA AGTGCCCCGC
CTGGGGAGTA CGGCCGCAAG GCTG
>006.F2    374  bp        rna
GGGGGGAGCA AGCGTTGTTC GGATTTACTG GGCGTAAAGG GCGCGTAGGC
GGCCACCGCA AGTCGACTGT GAAATCTCCG GGCTTAACTC GGAAAGGTCA
GCCGATACTG CGGGGCTAGA GTGCAGAAGG GGCAACTGGA ATTCTCGGTG
TAGCGGTGAA ATGCGTAGAT ATCGAGAGGA ACACCTGCGG CGAAGGCGGG
TTGCTGGGCT GACACTGACG CTGATTGTGC GAAAGCTAGG GGAGCGAACG
GGATTAGATA CCCCGGTAGT CCTAGCCTTA AACGATGAAT GCTTGGTGTC
TGGGGTTTTA TAGTCCCCGG GTGCCGCCGC TAACGCTTTA AGCATTCCGC
CTGGGGAGTA CGGTCGCAAG ACTG
>007.F2    371  bp        rna
AGAGGGTGCT AGCGTTGTTC GGAATCATTG GGCGTAAAGG GCGTGTAGGC
GGTTTGTTAA GTCATGTGTG AAATCCCTCG GCTCAACCGG GGAACGACGC
ATGAAACTGG CAAGCTAGAG TACCAAAGAG GGGGGTGGAA TTCCCGGTGT
AGCGGTGAAA TGCGTAGATA TCGGGAGGAA CACCTGTGGC GAAGGCGGCC
CCCTGGTTGG ATACTGACGC TGATACGCGA AAGCGTGGGG AGCAAACAGG
ATTAAATACC CTGGTAGTCC ACGCTGTAAA CGATGGGCAC TAGGTGTCCG
GGGTATTGAC CCCCTGGGTG CCGCAGCTAA CGCATTAAGT GCCCCGCCTG
GGGAGTACGG TCGCAAGATT A
>008.F2    371  bp        rna
GGAGGGGGCA AGCGTTACTC GGAATTATTG GGCGTAAAGG GCGCGTAGGC
GGTCGTGTGC GTCGGAGGTG AAATCCCCGG GCTTAACCCG GGAGCTGCCT
CCGATACGGC ATGGCTTGAG TCCGGGAGAG GGGAGCAGAA TTCCCAGTGT
AGCGGTGAAA TGCGTAGATA TTGGGAGGAA TACTGGTGGC GAAGGCGGCT
CCCTGGACCG GTACTGACGC TGAGGCGCGA AAGCGTGGGT AGCAAACAGG
ATTAGATACC CTGGTAGTCC ACGCCGTAAA CGATGGGTGC TTGGTGTCGG
GGGTATCGAC CCCTCCGGTG CCGAAGCTAA CGCATTAAGC ACCCCGCCTG
GGGAGTACGG TCGCAAGGCT G
>009.F2    373  bp        rna
GGGGGGGGCA AGCGTTGTTC GGAATTACTG GGCGTAAAGG GCGCGTAGGC
GGTCAGACCA AGTCGAGTGT GAAGTTCCAG GGCTTAACTC TGGCACGCTC
GCTCGATACT GGTCGGCTAG AGTGTGGAAG AGGATGCTGG AATTCCCGGT
GTAGCGGTGA AATGCGTAGA TATCGGGAGG AACACCAGTG GCGAAGGCGG
GCATCTGGGC CAACACTGAC GCTGAGGCGC GAAAGCTAGG GGAGCAAACA
GGATTAGATA CCCTGGTAGT CCTAGCCTTA AACGATGATG ACTTGGTGTG
TCGGGTTTGT AGTCCCGATG TGCCGGAGCT AACGCGTTAA GTCATCCGCC
TGGGGAGTAC GGTCGCAAGA CTG
>011.F2    371  bp        rna
GAAGGGTGCA AGCGTTAATC GGAATTACTG GGCGTAAAGG GTGCGTAGGC
GGCTGTTTAA GTCTGTCGTG AAATCCCCGG GCTCAACCTG GGAATGGCGA
TGGATACTGG GCAGCTAGAG TGTGTCAGAG GATGGTGGAA TTCCCGGTGT
AGCGGTGAAA TGCGTAGAGA TCGGGAGGAA CATCAGTGGC GAAGGCGGCC
ATCTGGGACA ACACTGACGC TGAAGCACGA AAGCGTGGGG AGCAAACAGG
ATTAGATACC CTGGTAGTCC ACGCCCTAAA CGATGCGAAC TGGATGTTGG
TCTCAACTCG GAGATCAGTG TCGAAGCAAA CGCGTTAAGT TCGCCGCCTG
GGGAGTACGG TCGCAAGACT G

nonGC500


Clone library sequences from the non-GC-fractionated library.

>KBS.197    371  bp        rna
GTAGGTGGCA AGCGTTGTCC GGATTTACTG GGCGTAAAGA GCGCGCAGGC
GGTCGTTTAA GTCGAATGTG AAAGCCCCCG GCTCAACTGG GGAGGGTCAT
TCGATACTGA TCGACTTGAA GGCAGGAGAG AGAAGTGGAA TTCCCGGTGT
AGTGGTGAAA TGCGCAGATA TCGGGAGGAA CACCAGTGGC GAAGGCGACT
TCCTGGCCTG TTCTTGACGC TGAGGCGCGA AAGCTAGGGT AGCAAACGGG
ATTAGATACC CCGGTAGTCC TAGCCGTAAA CGATGGACAC TAGGTGTTGG
TGGTATCAAC CCCGCCAGTG CCGTAGCTAA CGCATTAAGC GCCCCGCCTG
GGGAGTACGG CCGCAAGGCT A
>KBS.263    371  bp        rna
AGAGGGTGCA AGCGTTGTTC GGAATTATTG GGCGTAAAGC GCGTGTAGGC
GGCTTGGCAA GTCGGGTGTG AAATCCCTCA GCTTAACTGA GGAAGTGCGC
CCGAAACTGC CGAGCTTGAG TACCGGAGAG GGTGGCGGAA TTCCCCAAGT
AGAGGTGAAA TTCGTAGATA TGGGGAGGAA CACCGGTGGC GAAGGCGGCC
ACCTGGACGG ATACTGACGC TGAGACGCGA AAGCGTGGGG AGCAAACAGG
ATTAGATACC CTGGTAGTCC ACGCCGTAAA CGATGAGAAC TAGGTGTCGT
GGGTGTTGAC CCCTGCGGTG CCGTAGCTAA CGCATTAAGT TCTCCGCCTG
GGAAGTACGG CCGCAAGGCT A
>KBS.407    371  bp        rna
GTAGGTGGCA AGCGTTGTCC GGATTTACTG GGCGTAAAGA GCGCGCAGGC
GGTCGTTCAA GTCGCGTGTG AAAGCCCCCG GCTCAACTGG GGAGGGTCAC
GCGATACTGA TCGACTCGAA GGCAGGAGAG GGAAGTGGAA TTCCCGGTGT
AGTGGTGAAA TGCGTAGATA TCGGGAGGAA CACCAGTGGC GAAGGCGACT
TCCTGGCCTG TTCTTGACGC TGAGGCGCGA AAGCTAGGGG AGCAAACGGG
ATTAGATACC CCGGTAGTCC TAGCCGTAAA CGATGGACAC TAGGTGTTGG
TGGTATCAAC CCCGCCAGTG CCGAAGCTAA CGCATTAAGT GTCCCGCCTG
GGGAGTACGG CCGCAAGGCT A
>KBS.443    371  bp        rna
GTAGGTGACA AGCGTTGTCC GGATTTACTG GGCGTAAAGA GCGCGCAGGC
GGTCGATCAA GTCGAGTGTG AAAGCCCCCG GCTCAACTGG GGAGGGTCAT
TCGAAACTGG TCGACTCGAA GGCAGGAGAG GGTAGTGGAA TTCCCGGTGT
AGTGGTGAAA TGCGTAGATA TCGGGAGGAA CACCAGTGGC GAAGGCGACT
ACCTGGCCTG TTCTTGACGC TGAGGCGCGA AAGCTAGGGG AGCAAACGGG
ATTAGATACC CCGGTAGTCC TAGCCGTAAA CGATGGACAC TAGGTGTTGG
TGGTATCAAC CCCGCCAGTG CCGAAGCTAA CGCATTAAGT GTCCCGCCTG
GGGAGTACGG CCGCAAGGCT A
>KBS.505    371  bp        rna
GAAGGGTGCA AGCGTTACTC GGAATTACTG GGCGTAAAGC GTGCGTAGGT
GGTTTGTTAA GTCTGATGTG AAAGCCCTGG GCTCAACCTG GGAATTGCAT
TGGATACTGG CAGGCTAGAG TGCGGTAGAG GATGGCGGAA TTCCCGGTGT
AGCAGTGAAA TGCGTAGAGA TCGGGAGGAA CATCTGTGGC GAAGGCGGCC
ATCTGGACCA GCACTGACAC TGAGGCACGA AAGCGTGGGG AGCAAACAGG
ATTAGATACC CTGGTAGTCC ACGCCCTAAA CGATGCGAAC TGGATGTTGG
GAGCAATTAG GCTCTCAGTA TCGAAGCTAA CGCGTTAAGT TCGCCGCCTG
GGGAGTACGG TCGCAAGACT G
>KBS.540    371  bp        rna
GTAGGTGGCA AGCGTTGTCC GGATTTACTG GGCGTAAAGA GCGCGCAGGC
GGTCGTTCAA GTCGCGTGTG AAAGCCCCCG GCTCAACTGG GGAGGGTCAC
GCGATACTGA TCGACTCGAA GGCAGGAGAG GGAAGTGGAA TTCCCGGTGT
AGTGGTGAAA TGCGTAGATA TCGGGAGGAA CACCAGTGGC GAAGGCGACT
TCCTGGCCTG TTCTTGACGC TGAGGCGCGA AAGCTAGGGG AGCAAACGGG
ATTAGATACC CCGGTAGTCC TAGCCGTAAA CGATGGACAC TAGGTGTTGG
TGGTATCAAC CCCGCCAGTG CCGAAGCTAA CGCATTAAGT GTCCCGCCTG
GGGAGTACGG CCGCAAGGCT A
>KBS.673    372  bp        rna
AGAGGGTGCG AGCGTTGTTC GGAATTACTG GGCGTAAAGC GCGCGCAGGC
GGCTTCTTAA GTCTGGCGGT CAAATGCCGG GGCTCAACCC CGAGCGTGCC
GCGGATACTG GGGAGCTGGA GACGAGTAGA GGCAAGCGGA ATTCCGGGTG
TAGCGGTGGA ATGCGTAGAG ATCCGGAAGA ACACCGGAGG CGAAGGCGGC
TTGCTGGGCT CGGTCTGACG CTGAGGCGCG AAAGCGTGGG GAGCGAACAG
GATTAGATAC CCTGGTAGTC CACGCCGTAA ACGATGGGCA CTAGACGCGG
GGGGGATCGA CCCTCTCCGT GTCGAAGCTA ACGCGATAAG TGCCCCGCCT
GGGGAGTACG GCCGCAAGGC TG
>KBS.678    371  bp        rna
GAAGGGTGCA AGCGTTACTC GGAATTACTG GGCGTAAAGC GTGCGTAGGT
GGTTTGTTAA GTCTGATGTG AAAGCCCTGG GCTCAACCTG GGAATTGCAT
TGGATACTGG CAGGCTAGAG TGCGGTAGAG GATGGCGGAA TTCCCGGTGT
AGCAGTGAAA TGCGTAGAGA TCGGGAGGAA CATCTGTGGC GAAGGCGGCC
ATCTGGACCA GCACTGACAC TGAGGCACGA AAGCGTGGGG AGCAAACAGG
ATTAGATACC CTGGTAGTCC ACGCCCTAAA CGATGCGAAC TGGATGTTGG
GAGCAACTAG GCTCTCAGTA TCGAAGCTAA CGCGTTAAGT TCGCCGCCTG
GGGAGTACGG TCGCAAGACT G
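
The FinalSet500 and nonGC500 records above use headers of the form "name, length, bp, rna" and split each sequence into space-separated blocks of 10 bases, whereas the FinalSet5000 records use a bare name and unbroken lines. The following Python sketch reads both layouts and compares each declared length with the actual sequence length. The header pattern is inferred from the examples on this page, and the function names are hypothetical.

```python
import re

# Name, optionally followed by a declared length such as "371  bp".
HEADER = re.compile(r"^>(\S+)(?:\s+(\d+)\s+bp\b)?")


def parse_records(lines):
    """Return a list of (name, declared_length_or_None, sequence) tuples."""
    records = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER.match(line)
            name = match.group(1) if match else ""
            declared = int(match.group(2)) if match and match.group(2) else None
            records.append([name, declared, ""])
        elif records:
            # Join the space-separated blocks into one string.
            records[-1][2] += "".join(line.split())
    return [tuple(record) for record in records]


def length_mismatches(records):
    """Return (name, declared, actual) for records whose lengths disagree."""
    return [
        (name, declared, len(seq))
        for name, declared, seq in records
        if declared is not None and declared != len(seq)
    ]
```

Records without a declared length, such as those in FinalSet5000, are parsed but skipped by the length check.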
