CCEGA HAPMAP Simulator - Disease Model

This page describes the disease model file format.

We allow for three different disease model specification types. The absolute genotype (AG) format specifies P(genotype | disease) for each joint genotype of the L disease loci. There are a total of 3^L such joint genotypes. The AG probabilities should add to 1.0, although there is some allowance for rounding error. Another specification type is genotype relative risk (GRR), which specifies the values P(disease | genotype) / P(disease | referent genotype). GRR also requires that the user specify the overall probability of disease, and that at least one of the joint genotypes has a relative risk value of 1 (and is therefore the referent genotype). Finally, the disease model can be specified in terms of the absolute risk of disease (AR), which is P(disease | genotype) for each joint genotype. For GRR and AR, HAP-SAMPLE uses the HapMap data to automatically convert the values into the corresponding P(genotype | disease), from which disease genotypes are drawn.
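
The conversion from AR or GRR values to P(genotype | disease) can be understood through Bayes' rule. The following is a minimal Python sketch of that arithmetic, not HAP-SAMPLE's actual code: the function names are hypothetical, and the population joint-genotype frequencies (which HAP-SAMPLE derives from the HapMap data) are simply passed in as a list with illustrative values.

```python
def ar_to_ag(abs_risk, geno_freq):
    """Bayes' rule: P(g | disease) = P(disease | g) * P(g) / P(disease).

    abs_risk  -- P(disease | genotype) for each joint genotype
    geno_freq -- P(genotype) for the same joint genotypes, in the same order
                 (assumed input; HAP-SAMPLE obtains this from HapMap data)
    Returns (list of P(genotype | disease), implied disease prevalence).
    """
    if len(abs_risk) != len(geno_freq):
        raise ValueError("abs_risk and geno_freq must have the same length")
    joint = [r * f for r, f in zip(abs_risk, geno_freq)]
    prevalence = sum(joint)  # P(disease), implied by the risks and frequencies
    if prevalence <= 0:
        raise ValueError("implied prevalence must be positive")
    return [j / prevalence for j in joint], prevalence


def grr_to_ar(rel_risk, geno_freq, prevalence):
    """Scale relative risks to absolute risks that reproduce the prevalence."""
    mean_rr = sum(rr * f for rr, f in zip(rel_risk, geno_freq))
    baseline = prevalence / mean_rr  # absolute risk of the referent genotype
    return [baseline * rr for rr in rel_risk]


# Illustrative single-locus frequencies (not taken from HapMap).
freq = [0.49, 0.42, 0.09]
ag, implied_prev = ar_to_ag([0.01, 0.02, 0.04], freq)
ar = grr_to_ar([1, 2, 4], freq, implied_prev)
```

In this sketch the prevalence is an output of the AR conversion rather than an input, which is consistent with the prevalence line being ignored in AR format. For GRR the prevalence fixes the baseline risk; P(genotype | disease) itself depends only on the relative risks and the genotype frequencies.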
The 3^L joint genotypes are assumed to be ordered by sorting on the last of the disease loci, then sorting on the second-to-last disease locus, etc. (example below). The order in which disease loci are specified is entirely up to the user. Currently HAP-SAMPLE allows only one disease locus per chromosome.

File formats are given in more detail below, with comments following "#" characters. The comments should not appear in the actual disease model files.

Example 1: AG format, one disease SNP

    1           # L=number of disease loci
    rs868559    # the causal SNP. Must be in the HapMap CEU data
    0.005       # disease prevalence
    AG          # tells HAP-SAMPLE to use AG format
    0.4225      # P(genotypes | disease) start here
    0.455
    0.1225

This is for the genotypes in the following order:

    0
    1
    2

Example 2: AG format, two disease SNPs, one per chromosome

    2           # L=number of disease loci
    rs9439462   # first causal SNP
    rs4662920   # second causal SNP
    0.01        # disease prevalence
    AG          # tells HAP-SAMPLE to use AG format
    0.541696    # P(genotypes | disease) start here
    0.270848
    0.033856
    0.094208
    0.047104
    0.005888
    0.004096
    0.002048
    0.000256

This is for the genotypes in the following order: [genotype order table missing]

Example 3: Genotype Relative Risk

    2           # L=number of disease loci
    rs9439462   # first causal SNP
    rs868559    # second causal SNP
    0.001       # disease prevalence
    GRR         # tells HAP-SAMPLE to use GRR format
    1.1         # relative risks start here
    2.2
    3.3
    1           # at least one relative risk must be 1
    2
    3
    0.3
    0.6
    0.9

Example 4: Absolute Risk

    2           # L=number of disease loci
    rs9439462   # first causal SNP
    rs868559    # second causal SNP
    0.05        # disease prevalence (this value is actually ignored in AR format)
    AR          # tells HAP-SAMPLE to use AR format
    0.005       # P(disease|genotype) starts here
    0.02
    0.03
    0.005
    0.002
    0.003
    0.001
    0.002
    0.003
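
Before running a simulation it can help to sanity-check a disease model file against the rules described on this page. The sketch below is one possible checker in Python, not part of HAP-SAMPLE: the function name `parse_disease_model`, the returned dictionary keys, and the rounding tolerance of 1e-6 are all assumptions made for illustration (the page does not state the actual tolerance). It strips "#" comments for convenience, although real files should not contain them.

```python
def parse_disease_model(text, tol=1e-6):
    """Parse and validate disease model text laid out as in the examples.

    tol is an assumed rounding allowance for the AG sum-to-one check.
    """
    # Drop "#" comments, then collect whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())

    n_loci = int(tokens[0])
    snps = tokens[1:1 + n_loci]
    prevalence = float(tokens[1 + n_loci])
    fmt = tokens[2 + n_loci].upper()
    values = [float(t) for t in tokens[3 + n_loci:]]

    # Every format needs one value per joint genotype: 3^L in total.
    expected = 3 ** n_loci
    if len(snps) != n_loci or len(values) != expected:
        raise ValueError(f"expected {n_loci} SNPs and {expected} values")

    if fmt == "AG":
        if abs(sum(values) - 1.0) > tol:
            raise ValueError("AG probabilities should add to 1.0")
    elif fmt == "GRR":
        if not any(abs(v - 1.0) <= tol for v in values):
            raise ValueError("GRR needs at least one relative risk of 1")
    elif fmt == "AR":
        if any(v < 0.0 or v > 1.0 for v in values):
            raise ValueError("AR values are probabilities in [0, 1]")
    else:
        raise ValueError(f"unknown format: {fmt}")

    return {"snps": snps, "prevalence": prevalence,
            "format": fmt, "values": values}


example_1 = """\
1          # L=number of disease loci
rs868559   # the causal SNP
0.005      # disease prevalence
AG         # tells HAP-SAMPLE to use AG format
0.4225
0.455
0.1225
"""
model = parse_disease_model(example_1)
print(model["format"], model["snps"], len(model["values"]))
```

The checker does not verify that the SNPs exist in the HapMap CEU data or that they lie on different chromosomes; those constraints would need the HapMap data itself.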