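for i in *.bam; do
  (
    # Reads whose names do not start with M_SOLEXA: keep positive values from column 9 (template length)
    samtools view "$i" | grep -v "^M_SOLEXA" | cut -f 9 | awk '$1 > 0'
    # Reads whose names start with M_SOLEXA: pass column 10 (the sequence) to pipe_length.py
    samtools view "$i" | grep "^M_SOLEXA" | cut -f 10 | /mnt/solexa/bin/src/pipe_length.py
  ) | sort -n | uniq -c > "${i%.bam}.lengths"   # one sorted "count length" table per input file
done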
That bit of code took me the better part of a week to figure out, not to mention the downstream analysis it complicated for another couple of days.
It's the parentheses that do it. So simple, yet so elusive.
The code takes multiple BAM files of DNA sequences (and other info) and, for each one, makes another file that lists each sequence length and how many sequences have that length. Without the parentheses I had to make two files per input, one for each samtools pipeline. With them, the output of both pipelines goes through a single sort and uniq -c, and it all ends up in one file.
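Here is a minimal sketch of the same trick without any BAM files. The two printf commands are made-up stand-ins for the two samtools pipelines, and the numbers are invented for illustration. The final awk is only there to tidy up the padding that uniq -c adds.

```shell
# Commands grouped in ( ... ) run in one subshell and share one stdout,
# so a single sort | uniq -c sees the output of both of them.
counts=$(
  (
    printf '%s\n' 30 42 30      # stand-in for the first samtools pipeline
    printf '%s\n' 42 30 51      # stand-in for the second samtools pipeline
  ) | sort -n | uniq -c | awk '{ print $1, $2 }'   # normalize to "count length"
)
printf '%s\n' "$counts"
# prints:
# 3 30
# 2 42
# 1 51
```

Without the parentheses, only the last command's output would reach sort, which is why I ended up with two files. Braces, as in { cmd1; cmd2; }, group commands the same way without starting a subshell.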
Someone who knows what they're doing could do that in less than five minutes. They could even do it the two-file way I was doing it and still complete the analysis in a day, I'm sure.
I've got a lot to learn.