Is there a recommended cutoff for Hardy-Weinberg equilibrium (HWE) and minor allele frequency for whole-genome analysis?
We are not aware of any universally accepted rule of thumb, so the cautious answer is that it depends. Some population structures produce large but legitimate departures from HWE, and regions of the genome prone to copy number deletions can also show large departures. Setting that aside and assuming the HWE p-values are uniformly distributed, a 500K dataset would yield about 500,000 * 0.01 = 5,000 p-values below 0.01 by chance alone. A threshold of 0.01 would therefore discard roughly 5,000 SNPs whose HWE p-value falls below 0.01 purely by chance. Some people use a 0.001 cutoff instead, which discards no more than about one tenth of one percent of the real data by chance (and perhaps a much larger share of the bad data). Again, if you have reason to believe the departures from HWE are real, you might adjust the threshold. It is worthwhile to count how many SNPs fail your threshold and compare that number with what statistical chance would predict. In a very high quality data s