Statistical Issues in the Analysis of Quantitative Traits in Combined Crosses
Fei Zou, Brian S. Yandell and Jason P. Fine
 Corresponding author: Fei Zou, Department of Statistics, 1210 W. Dayton St., Madison, WI 53706. Email: feizou{at}stat.wisc.edu
Abstract
We consider some practical statistical issues in QTL analysis where several crosses originate in multiple inbred parents. Our results show that ignoring background polygenic variation in different crosses may lead to biased interval mapping estimates of QTL effects or loss of efficiency. Threshold and power approximations are derived by extending earlier results based on the Ornstein-Uhlenbeck diffusion process. The results are useful in the design and analysis of genome screen experiments. Several common designs are evaluated in terms of their power to detect QTL.
QUANTITATIVE trait analysis has many applications in plant and animal breeding and in human genetics. Mapping quantitative trait loci (QTL) that influence agriculturally important traits such as grain yield in rice or milk production in cows can help scientists produce specimens with more desirable qualities. Complex human diseases, like breast cancer and diabetes, are known to have genetic etiologies. Animal models may be useful in studying their origins.
Most existing statistical methods have been developed for experimental designs with a single cross from two inbred parents (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1993, 1994; Jansen and Stam 1994). Doerge et al. (1997) provided a comprehensive review of methodologies for detecting and locating genes affecting quantitative traits in experimental breeding populations. However, quantitative traits are often influenced by several genes with large effects (major QTL) and many genes with relatively small effects (polygenes). In animal science, where outbred parental populations are available, the polygenic effect has been taken into consideration (Fernando and Grossman 1989). In horticulture, less attention has been paid to genes with small effects, perhaps because researchers are able to rely on simple crosses such as F_{2} or backcross (BC).
The effects of polygenes on standard approaches to major QTL mapping are not well understood. With a single cross, the progeny have identical relationships given the QTL genotypes, resulting in a compound symmetry structure (Yandell 1997, Ch. 25). Thus, unbiased estimates of QTL effects are still obtained when the polygenic effect is ignored, even though the power to detect the QTL is influenced by the magnitude of the polygenic effect. The situation is more complicated with several crosses, since the correlations may not be the same among all individuals. To avoid this difficulty, researchers may analyze data for each cross separately and then compare and combine the results in some fashion. Hence, some power to detect the QTL may be lost and estimates of QTL effects may be less precise.
Recently, methods were proposed to analyze all crosses simultaneously. Bernardo (1994) used Wright's relationship matrix A to accommodate differential correlations when analyzing diallel crosses. However, when closely related crosses are from a small number of inbred lines, it is more reasonable to treat the polygenic effect as fixed. Rebai et al. (1994a) extended the regression method of Haley and Knott (1992) to several F_{2}'s from a diallel design of multiple inbred lines with all effects fixed. Elston (1990) proposed models for discriminating among modes of inheritance, including one-locus, two-locus, polygenic, and mixed major locus/polygenic inheritance when considering the F_{1} and the reciprocal backcrosses derived from two inbred lines. The polygenic effect is treated as fixed and different phenotypic means and variances are used for different crosses. However, flanking marker information is not utilized and an estimate of the QTL position is not provided.
In this article, we consider an arbitrary number of crosses from multiple inbred lines. While we were preparing this manuscript, Liu and Zeng (2000) proposed a fixed-effect model to analyze combined crosses from multiple inbred lines (with or without overlapping inbred lines). Our model includes both QTL and polygenic effects and is a special case of their heteroscedastic model in the sense that the fixed effect and the variance component identify the polygenic effect. For this reason, we refer readers to Liu and Zeng (2000) for the analysis of combined lines. Our focus is the practical implications of the polygenic effects for QTL mapping, specifically bias and efficiency. Furthermore, we calculate threshold values for controlling the genomewise type I error rate. Theoretical approximations were developed to address threshold and power (Lander and Botstein 1989; Dupuis and Siegmund 1999; Rebai et al. 1994b, 1995) in some standard designs. However, these methods are either impractical or inappropriate with combined crosses. Our general formulas are widely applicable and easy to implement.
SIMULATION STUDY OF BIAS AND EFFICIENCY
If one combines different crosses simultaneously but ignores the different relationships among individuals, substantial bias may result. In this section, we show the effect of polygenes on the QTL estimates. We examine two crosses, BC1 and F_{2}, from common inbred parents P1 and P2. Although the design is simple, it illustrates the key issues. The additive effect of a single major QTL is set to either 0 (i.e., no QTL) or 5, with no dominance effect. Five markers are located at 0, 20, 40, 60, and 80 cM. The major QTL is located at 30 cM. The environmental errors are identically distributed for BC1 and F_{2} and are sampled from N(0, 25). One hundred individuals from BC1 and F_{2} are simulated without background polygenes or with 10 background polygenes. The 10 background polygenes are in coupling phase and have a common additive effect (i.e., allele substitution effect α_{k}, k = 1, 2,..., 10) of 1 or 2 (see Fernando et al. 1994). This leads to expected polygenic differences between F_{2} and BC1 of 5 and 10, respectively. We fit the model using Liu and Zeng (2000), hereafter called “model P.” In addition, we employed traditional interval mapping by ignoring the polygenic effects, hereafter called “model N.” For each parameter combination, 100 simulated datasets were analyzed. The results are presented in Table 1.
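The source of the bias in model N can be illustrated with a much simpler setting than the one simulated above. The following Python sketch is not the interval mapping analysis of this article: it assumes that the QTL genotype is observed directly (no markers), that the polygenes are unlinked, that genotypes are coded -1, 0, 1 with additive effect a, and that only the cross means (not the variances) differ. All function names are hypothetical. Under these assumptions, pooling BC1 and F_{2} without a cross-specific mean should inflate the estimated QTL effect, because the QTL genotype and the polygenic mean both shift with the cross.

```python
import numpy as np

def simulate_cross(rng, n, doses, a=5.0, alpha=1.0, n_poly=10, sigma_e=5.0):
    """Simulate one cross: doses=1 mimics BC1 (F1 x P1), doses=2 mimics F2.

    Assumption: every locus carries Binomial(doses, 1/2) copies of the P2
    allele, independently across loci (unlinked polygenes).
    """
    qtl = rng.binomial(doses, 0.5, size=n) - 1          # coded -1, 0, 1
    poly = rng.binomial(doses, 0.5, size=(n, n_poly))   # P2 allele counts
    y = a * qtl + alpha * poly.sum(axis=1) + rng.normal(0.0, sigma_e, size=n)
    return qtl, y

def qtl_estimate(qtl, y, cross, with_cross_mean):
    """Least-squares estimate of the QTL additive effect."""
    cols = [np.ones_like(y), qtl.astype(float)]
    if with_cross_mean:
        cols.append(cross.astype(float))    # separate mean for the F2 cross
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1]

def run_study(n_rep=100, n=100, seed=1, **kw):
    """Average both estimates over n_rep simulated BC1 + F2 data sets."""
    rng = np.random.default_rng(seed)
    cross = np.repeat([0, 1], n)            # 0 = BC1, 1 = F2
    est_n, est_p = [], []
    for _ in range(n_rep):
        q1, y1 = simulate_cross(rng, n, 1, **kw)
        q2, y2 = simulate_cross(rng, n, 2, **kw)
        qtl, y = np.concatenate([q1, q2]), np.concatenate([y1, y2])
        est_n.append(qtl_estimate(qtl, y, cross, False))  # cross mean ignored
        est_p.append(qtl_estimate(qtl, y, cross, True))   # cross mean fitted
    return float(np.mean(est_n)), float(np.mean(est_p))

mean_n, mean_p = run_study()
print(f"cross mean ignored: {mean_n:.2f}; cross mean fitted: {mean_p:.2f}")
```

With alpha = 1 and 10 polygenes, the expected polygenic difference between the two crosses is 5, matching the first polygenic setting described above. Setting alpha=0.0 in run_study removes the polygenes, in which case the two estimates should agree on average.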
We observe that when there are no polygenes both models consistently estimate the QTL effects. However, model P gives more accurate estimates than model N when there are polygenic effects. The bias of model N increases as the expected polygenic differences between BC1 and F_{2} increase. In summary, our simulations indicate that when analyzing combined crosses, the polygenic model produces more precise and less biased estimates than the traditional interval mapping method.
THRESHOLD AND POWER CALCULATIONS
On the basis of the simulations in the above section, fitting combined crosses (Liu and Zeng 2000) has many advantages. Calculating thresholds and power is an important practical issue in the design and analysis of such studies. The usual pointwise significance level based on the chi-square approximation is inadequate because the entire genome is tested for the presence of a QTL. Lander and Botstein (1989) showed that with an infinitely dense map, the LOD score may be approximated in large samples by an Ornstein-Uhlenbeck diffusion process for BC. Dupuis and Siegmund (1999) derived a similar result for F_{2}. These approximations provide formulas for the threshold and power.
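To make the single-cross approximation concrete, the sketch below evaluates the one-degree-of-freedom backcross form that is commonly quoted from this literature (for example in Dupuis and Siegmund 1999): P{max |Z| > b} ≈ 1 − exp{−2C[1 − Φ(b)] − 2βL b φ(b) ν(b{2βΔ}^{1/2})}, with ν(x) ≈ exp(−0.583x) for small x and ν = 1 for a dense map. It is not the combined-cross formula derived in this article. The value β = 0.02 per cM (Haldane map function) and the mouse-like genome of 20 chromosomes and 1600 cM are illustrative assumptions, and the function names are hypothetical.

```python
import math

def normal_tail(b):
    """Upper tail probability 1 - Phi(b) of the standard normal."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))

def normal_pdf(b):
    """Standard normal density phi(b)."""
    return math.exp(-0.5 * b * b) / math.sqrt(2.0 * math.pi)

def genomewise_error(b, n_chrom, length_cm, beta=0.02, spacing_cm=0.0):
    """Approximate P(max |Z| > b) for a backcross genome scan.

    spacing_cm = 0 corresponds to a dense map (nu = 1). The exponential
    form of nu is only meant for small arguments (roughly below 2).
    """
    nu = math.exp(-0.583 * b * math.sqrt(2.0 * beta * spacing_cm))
    rate = (2.0 * n_chrom * normal_tail(b)
            + 2.0 * beta * length_cm * b * normal_pdf(b) * nu)
    return 1.0 - math.exp(-rate)

def threshold(level, n_chrom, length_cm, beta=0.02, spacing_cm=0.0):
    """Solve genomewise_error(b) = level by bisection on [1, 10]."""
    lo, hi = 1.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if genomewise_error(mid, n_chrom, length_cm, beta, spacing_cm) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

dense = threshold(0.05, n_chrom=20, length_cm=1600.0)
sparse = threshold(0.05, n_chrom=20, length_cm=1600.0, spacing_cm=5.0)
for name, b in (("dense", dense), ("5 cM", sparse)):
    print(f"{name}: b = {b:.2f}, LOD = {b * b / (2.0 * math.log(10.0)):.2f}")
```

Under these assumptions the dense-map threshold is close to the LOD score of about 3.3 usually cited for a mouse backcross, and the 5-cM threshold is lower, in line with the observation that dense-map thresholds are the more stringent ones.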
For more general models (Liu and Zeng 2000), no such approximation is available. Churchill and Doerge (1994) used a randomization idea to calculate the threshold. The approach is applicable for all designs, with a dense or sparse map. However, the method is computationally intensive. In addition, since the thresholds depend on the observed data, it is unclear how to compare various designs. Rebai et al. (1994b, 1995) gave an upper bound for the threshold for BC and F_{2} based on Davies (1977, 1987). The calculation is formidable, even for an F_{2} population, and is not exact. Piepho (2001) proposed an efficient numerical method to compute the thresholds in Rebai et al. (1994b, 1995) for general designs.
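The randomization idea of Churchill and Doerge (1994) mentioned above can be sketched in a few lines. The example below is a simplified illustration rather than their full procedure: it assumes a single cross, complete marker data, and a single-marker statistic n r^2 (where r is the phenotype-marker correlation) in place of an interval mapping likelihood ratio. The function names and the simulated data are hypothetical.

```python
import numpy as np

def max_marker_stat(y, G):
    """Largest single-marker statistic n * r^2 over the columns of G."""
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    r = (Gc.T @ yc) / np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    return len(y) * float(np.max(r ** 2))

def permutation_threshold(y, G, n_perm=500, level=0.05, seed=0):
    """Empirical genomewise threshold from shuffled phenotypes.

    Shuffling y breaks any trait-marker association while keeping the
    marker correlation structure, so the maxima follow the null law.
    """
    rng = np.random.default_rng(seed)
    null_max = [max_marker_stat(rng.permutation(y), G) for _ in range(n_perm)]
    return float(np.quantile(null_max, 1.0 - level))

# Hypothetical data: 200 backcross individuals, 20 unlinked markers, no QTL.
rng = np.random.default_rng(42)
G = rng.binomial(1, 0.5, size=(200, 20)).astype(float)
y = rng.normal(size=200)
thr = permutation_threshold(y, G)
print(f"permutation threshold: {thr:.2f} (pointwise chi-square cutoff: 3.84)")
```

Each permutation requires a full rescan of the genome, which is the computational cost noted above, and the resulting threshold is tied to the particular data set at hand.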
Our approach extends the Ornstein-Uhlenbeck large sample approximations. It is quite simple and practically useful. Calculating the threshold and power under different map distances can be accomplished with closed-form expressions arising from the Ornstein-Uhlenbeck setup. Simulations shown below indicate this works well with realistic sample sizes.
Two inbred strains: In this section, we consider combined crosses from two inbred parents (P1 and P2), BC1, F_{2}, and BC2. Our goal is to extend Dupuis and Siegmund (1999). Generalizing the results to other designs (Liu and Zeng 2000) is straightforward. In the sequel, we assume an equispaced marker map. Suppose
The distribution of 2 LR(d) depends on
To demonstrate the Ornstein-Uhlenbeck equivalence, the covariances at different loci d_{1} and d_{2} are proved to be
General Results: The derivations above can be generalized to more complicated models, including those in Liu and Zeng (2000). Advanced crosses, such as F_{x}, x ≥ 2, and models with covariates are also possible. Our framework can be modified for a wide variety of designs.
As before, let the model be
The formula for power may also be obtained. However, it is quite complicated and is omitted here.
SIMULATION STUDY OF THRESHOLDS AND POWER
We investigated the performance of (1) with different marker distances and different polygenic backgrounds. Thresholds for the log-likelihood were based on interval mapping with combined BC1, F_{2}, and BC2 crosses. We set n_{1} = n_{2} = n_{3} = 100, giving 300 observations in total, with a chromosome length of 100 cM. The marker interval lengths are set at 10, 5, and 2 cM, respectively. Different polygenic effects are sampled, as reflected by models a–d (see legend of Table 2 for details; Table 3). The approximations from (1) with v(a{2βΔ}^{1/2}) are always smaller than the empirical thresholds derived in the simulations. However, as the interval length decreases, our approximations move closer to the empirical thresholds. In general, the dense map assumption (v = 1) produces conservative thresholds. Since more markers are likely to be typed around promising loci (Lander and Kruglyak 1995), the stringent thresholds based on a dense map should be used even with a sparse map. Also, the approximations provide conservative control of the genomewise type I error rate. Note that (1) gives upper and lower bounds for the threshold with v = 1 (assuming a dense map) and with v(a{2βΔ}^{1/2}) (using the true map distances), respectively.
Next, we evaluate the power with different proportions of BC1, F_{2}, and BC2. The power is calculated for dominant (δ_{1} = δ_{2}) and additive (δ_{2} = 2δ_{1}) models. We compare our results with those of Dupuis and Siegmund (1999) for the dominant model. We use the same values of the noncentrality parameter. In theory, as the proportion of F_{2} approaches 1, our power approximation should agree with Dupuis and Siegmund (1999). Other noncentrality parameters in Dupuis and Siegmund (1999) show the same pattern and are omitted. For the additive model, the comparisons are qualitatively similar.
Figures 1 and 2 exhibit the power curves. When the polygenes are in linkage equilibrium and have only additive effects, the phenotypic variation due to polygenes and environment satisfies
In Figure 1, the proportions of BC1 and BC2 are assumed equal. When the QTL is dominant, power is gained by using BC populations unless there is no polygenic effect (i.e., λ_{2} is close to 1). The larger the polygenic effects, the greater is the gain with BCs. However, when the QTL is additive, F_{2}'s tend to have more information for detecting a QTL than do BCs, unless σ_{P} ≫ σ_{e} (i.e., the polygenic effects are very large). Note that when the proportion of F_{2} approaches 1, our results again match those of Dupuis and Siegmund.
In Figure 2, we allow the proportions of BC1 and BC2 to be unequal with a dominant QTL. In this case, BC1 is more powerful than F_{2}, which is expected. When the QTL is additive, both BC1 and BC2 individuals have identical contributions in detecting the QTL, so only the total proportion of BC1 and BC2 influences the power, as shown in Figure 1a.
CONCLUSION
In this article, we addressed some important practical issues in the analysis of closely related crosses derived from multiple inbred lines when both QTL and polygenes influence a trait. We showed that biased and inefficient estimates of the QTL effects may occur if the polygenic effect is ignored. We derived simple and general approximations for the threshold and power to detect a QTL, allowing different designs to be compared.
Based on our power calculations, we find that the F_{2} population is more robust in detecting QTL than the two backcross populations. This confirms Liu and Zeng (2000). Thus if the goal is to detect the QTL, then using a large F_{2} population is highly recommended. However, scientists may not be able to produce enough F_{2} individuals or may for other reasons use different crosses. In this situation, analyzing all the data simultaneously is preferred. This strategy improves the power to detect major QTL. In addition, this is an opportunity to detect potential polygenic effects. The derivation of the threshold approximation is easily extended to other designs beyond the combination of BC1, F_{2}, and BC2. However, to our knowledge, the theoretical computation of thresholds involving multiple QTL is an open problem.
Acknowledgments
We thank anonymous reviewers for their critical reading of this manuscript. This research was supported in part by the U.S. Department of Agriculture Hatch project through the University of Wisconsin, College of Agricultural and Life Sciences.
APPENDIX
In this section, for combined crosses from two inbred parents, we prove that the likelihood ratio 2 LR(d) can be partitioned into the sum of the squares of two asymptotically independent Ornstein-Uhlenbeck processes through an orthogonal transformation. Define
Making the orthogonal transformation
For the same reason,
For BC1,
Footnotes

Communicating editor: Z-B. Zeng
 Received August 14, 2000.
 Accepted April 18, 2001.
 Copyright © 2001 by the Genetics Society of America