26 March 2012

Assemblies are measured by the size and accuracy of their contigs and scaffolds. Assembly size is usually given by statistics including maximum length, average length, combined total length, and N50. The contig N50 is the length of the smallest contig in the set that contains the fewest (largest) contigs whose combined length represents at least 50% of the assembly. The N50 statistics for different assemblies are not comparable unless each is calculated using the same combined length value. Assembly accuracy is difficult to measure. Some inherent measure of accuracy is provided by the degrees of mate-constraint satisfaction and violation . Alignment to reference sequences is useful whenever trusted references exist.

@x=sort{$b<=>$a}@x;
my $count=0; my ($n50,$n80,$n90);
my $len_mean=$tlen/$n; foreach my$i(@x)
{
$count+=$i;
$n50=$i if (($count>=$tlen*0.5)&&(!defined $n50));$n80=$i if (($count>=$tlen*0.8)&&(!defined$n80));
$n90=$i if (($count>=$tlen*0.9)&&(!defined $n90)); }  #/usr/bin/perl -w use strict; my ($len,$total)=(0,0); my @x; while(<>){ if(/^[\>\@]/){ if($len>0){
$total+=$len;
push @x,$len; }$len=0;
}
else{
s/\s//g;
$len+=length($_);
}
}
if ($len>0){$total+=$len; push @x,$len;
}
@x=sort{$b<=>$a} @x;
my ($count,$half)=(0,0);
for (my $j=0;$j<@x;$j++){$count+=$x[$j];
if (($count>=$total/2)&&($half==0)){ print "N50:$x[$j]\n";$half=$x[$j]
}elsif ($count>=$total*0.9){
print "N90: $x[$j]\n";
exit;
}
}


or run this command as before:
Code:

perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;[email protected],$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my$j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}elsif($count>=$total*0.9){print "N90:$x[\$j]\n";exit;}}' contigs.fa