Pre-harvest Forecasting of County Wheat Yield and Wheat Quality Conditional on Weather Information

Byoung-Hoon Lee

Graduate Research Assistant

Department of Agricultural Economics

Oklahoma State University

Stillwater, OK 74078

bh.lee@https://www.sodocs.net/doc/a718389621.html,

Philip Kenkel

Professor and Fitzwater Chair for Cooperative Studies

Department of Agricultural Economics

Oklahoma State University

516 Agricultural Hall

Stillwater, OK 74078

phil.kenkel@https://www.sodocs.net/doc/a718389621.html,

B. Wade Brorsen

Regents Professor and Jean & Patsy Neustadt Chair

Department of Agricultural Economics

Oklahoma State University

414 Agricultural Hall

Stillwater, OK 74078

wade.brorsen@https://www.sodocs.net/doc/a718389621.html,

Selected Paper prepared for presentation at the Southern Agricultural Economics Association Annual Meeting, Corpus Christi, Texas, February 5-8, 2011.

Copyright 2011 by Byoung-Hoon Lee, Philip Kenkel, and B. Wade Brorsen. All rights reserved. Readers may make verbatim copies of this document for non-commercial purpose by any means, provided that this copyright notice appears on all such copies.

Abstract:

Wheat regression models that account for the effect of weather are developed to forecast wheat yield and quality. Spatial lag effects are included. Wheat yield, protein, and test weight level are strongly influenced by weather variables. The forecasting power of the yield and protein models was enhanced by adding the spatial lag effect. Out of sample forecasting tests confirm the models’ usefulness in accounting for the variations in average wheat yield and qualities.

Key words: prediction, protein, spatial lag, test weight, weather, wheat yield

1. Introduction

Winter wheat in the Southern Plains is mostly a dryland crop with substantial year-to-year variation in yields and quality due to rainfall, temperature, and other weather events. If the response of wheat yield and wheat quality to weather conditions could be predicted early and accurately, the information could be widely used. It could be particularly important to farmers optimizing late-season agronomic and marketing decisions and to grain elevators and millers making purchasing decisions. Thus, there has been increasing interest in the development and use of robust crop weather response models.

Numerous models have been estimated to predict crop yield based on weather conditions. The two main prediction approaches are simulation models and multiple regression models. A number of comprehensive agricultural simulation models are now available to predict the yield and variability of wheat. Jones and Kiniry (1986) suggested a model to simulate the effects of genotype and weather conditions on crop yield. Duchon (1986), Claborn (1998), Bannayan, Crout, and Hoogenboom (2003), and Tsvetsinskaya et al. (2003) predicted yields from weather forecasts and scenarios using the Crop Environment Resource Synthesis (CERES) simulation model. For the Great Plains, Easterling et al. (1998) and Wang et al. (2006) used the Erosion Productivity Impact Calculator (EPIC) model, and Easterling et al. (1998) found that spatial disaggregation of climate data enhances predictions. Using the CERES-Wheat model, Weiss et al. (2003) investigated the responses of wheat yield and end-use quality using nitrogen management and planting date data. The simulated results depended on spatial location and climate change, and soil water stress and nitrogen management strongly influenced yield distributions and kernel nitrogen content. Walker (1989) combined simulation and multiple regression to develop physiologically and regionally weighted drought indices from temperature and precipitation data. The forecasts showed that the indices explain well the inter-regional and annual variation in yield within a growing season.

A simulation model is designed to simulate crop yield using details about crop biology. However, as noted by Walker (1989), a simulation approach requires extensive information such as soil type, plant parameters, and weather data related to the crop development stage, which are often not readily available. Tannura, Irwin, and Good (2008) argue that an important limitation of crop simulation models is that they are likely to ignore the influence of technology development over time. Bechter and Rutner (1978) and Just and Rausser (1981) found that single-equation models forecast more accurately than large econometric models, and we should expect a similar result for agronomic models.

Thus, many previous studies have preferred a regression approach to a large simulation model when the goal is forecasting. Studies using the multiple regression approach include Yang, Koo, and Wilson (1992), Dixon et al. (1994), Kandiannan et al. (2002), and Chen and Chang (2005), who used various production functions to capture the effect of climate variables on the observed crop yield level and to predict crop yield. Irwin, Good, and Tannura (2008) and Tannura, Irwin, and Good (2008) modified Thompson's (1964) corn and soybean regression model and found crop yield strongly related to technology and to weather conditions such as temperature, rainfall, and other weather variables. As Tannura, Irwin, and Good (2008) and other studies have shown, multiple regression models have high explanatory power and can represent relationships between weather conditions and crop yield. Thus, the multiple regression approach is not only easier to use, it is also likely more accurate than the simulation approach.

Several studies investigated the influences of weather conditions, genotype, and their interaction on wheat quality. Stages in the crop maturation period, such as the milk development, heading, and ripening stages, are critical in determining wheat quality (FAO, 2002). Graybosch et al. (1995), Johansson and Svensson (1998), Smith and Gooding (1999), Guttieri et al. (2000), and Johansson, Prieto, and Gissen (2008) developed quality models showing that weather and environment strongly influenced the protein content and test weight of wheat. Smith and Gooding (1999) argued that predictions of grain quality before wheat harvest would be important information for grain buyers, and for farmers seeking to optimize agronomic activity, particularly a late application of nitrogen fertilizer to increase protein content (Woolfolk et al., 2002). Britt et al. (2002) estimated six cotton yield and quality response functions and profit functions as a function of weather information and input and output prices. Regnier, Holcomb, and Rayas-Duarte (2007) investigated the variations in flour and dough functionality traits associated with environmental factors and found the interaction between crop years and production regions was a significant factor for flour and dough qualities, since growing conditions and climate conditions differ among regions and across years.

Unlike previous yield regression models, most quality-related model studies did not measure the prediction performance of their models and used analysis of variance (ANOVA), Spearman rank correlation analysis, or simple regression models without precise diagnostic tests for model misspecification. Therefore, their methods may lead to biased and inconsistent estimates (McGuirk, Driscoll, and Alwang, 1993).

The extensive previous studies have limitations. One is that the regression studies cited have estimated the impacts on yield and on quality level separately and did not deal with agronomic tradeoffs between the yield and quality of wheat. Also, few focused on prediction, and most studies did not consider out-of-sample forecasts but measured in-sample fit. In-sample fit can be inaccurate because most models, including ours, are developed from pretesting over a large number of alternative specifications.

The other is that many of the above studies have either used data from a single location or have not used the extra information provided by spatial data. The increasing availability of spatial climate information makes it important to incorporate this new level of information to improve forecasts. Anselin (1988) explained that when using spatial data, the dependent variable at each location may be correlated with observations of the dependent variable at neighboring locations. This is defined as spatial contiguity (lag) effect. If this effect is ignored in a model specification, the estimates in the general model are likely to be biased. Therefore, in order to get more accurate forecasts, the crop response model using spatial data needs to include a spatial lag effect.

In addition, Oklahoma has two unique resources for examining the relationship between weather and wheat yields and quality. The Oklahoma Mesonet consists of 120 automated stations covering Oklahoma with one or more stations in each of Oklahoma's 77 counties. Plains Grains, Inc. (PGI) is a private, nonprofit wheat marketing organization based in Stillwater, Oklahoma. PGI evaluates wheat quality, including milling and baking quality from an extensive network of samples at the county level. These two unique data sets provide the opportunity to examine the ability to predict wheat yield and quality with weather data. These two data sets (meso-scale weather data and elevator scale quality data) are highly disaggregated. Thus, the disaggregated data sets could provide more precise wheat yield and quality predictions than was possible with the data sets used in past research.

The objective of the study is to develop wheat regression models to account for the impact of weather on wheat yield and quality and to predict (forecast) wheat yield and quality level accurately. In other words, the primary purpose of the study is to use weather information to predict wheat yield and wheat quality and to select variables and functional forms to estimate parameters and then measure how well the developed models forecast.

2. Conceptual framework

Previous studies have used knowledge about the biological development stages of crops to help select the explanatory variables. Dixon et al. (1994) and Kaufmann and Snell (1997) specified weather variables for their corn yield regression models based on the biophysical stages of corn.¹ On the other hand, Yang, Koo, and Wilson (1992) and others used planting season and growing season precipitation and average temperature. Hansen (1991), Tannura, Irwin, and Good (2008), and others estimated the effect of calendar month precipitation and temperature variables on soybean and corn yield during crucial development periods to forecast potential crop yield. Even though the biological stages of crops do not precisely correspond with calendar months, a number of previous regression response models have used weather variables defined on a monthly average calendar basis. Previous studies also assume every cross-sectional location has the same development stages, since it is very difficult to match the precise timing of crop development stages at every location. Another reason is that the estimated results using monthly weather variables were similar to those using stage-based variables. For example, Dixon et al. (1994) compared weather variables based on biological stages with variables based on fixed calendar months and found the forecasting performance and R² of the two models changed only slightly.

¹ The corresponding weather variables were specified based on the six weeks before and three weeks after the silking point rather than on a calendar month basis because corn is critically sensitive to precipitation in June and mid-July in the Midwestern U.S.

Weather strongly affects four stages² of wheat development that determine the wheat production level and quality (FAO, 2002). Aitken (1974), Miralles and Slafer (1999), and Acevedo et al. (2002) argued that temperature and precipitation are the main influences on wheat development; the most crucial stages for wheat yield are from double ridge to anthesis (flowering) (GS2) and from anthesis to maturity (GS3), since kernel number and weight are determined at that time (figure 1). Meanwhile, temperature and precipitation during grain filling are widely known to influence wheat quality characteristics. Graybosch et al. (1995), Johansson and Svensson (1998), Stone and Savin (1999), and Smith and Gooding (1999) found that weather strongly affects grain quality; for instance, increased temperatures during grain filling tend to increase protein and reduce mean grain weight. Stone and Savin (1999) argued that 70-80% of total protein is accumulated during the grain filling period.

Winter wheat in the southern Great Plains is typically planted from early September through the middle of November. In general, winter wheat harvest begins toward the end of May in southern Oklahoma and continues until about the middle of July (IPM Center, 2005). According to the crop weather summary for Oklahoma (DOA, 2000), wheat begins to double ridge and joint in February. Southwestern counties begin to head by the end of March. In April, anthesis begins and some wheat in southern Oklahoma enters the grain filling period, and finally wheat harvest begins approximately May 20th in the southern counties.

² The stages can be categorized as germination to emergence (E), from germination to double ridge (GS1), from double ridge to anthesis (GS2), and grain filling period from anthesis to maturity (GS3) (FAO, 2002).

Using the above described general relation of weather variables and wheat by growth stages, the study selects calendar months during GS2 and GS3 and specifies appropriate calendar month weather variables for growing periods that correspond to these biological wheat development stages.

Easterling et al. (1998) used a fine spatial scale to reduce statistical bias from aggregation and confirmed that the difference between observed and estimated yield was greatly reduced when the data scale was disaggregated to around 37 miles × 50 miles. Unfortunately, their method requires a very fine data scale and cannot be used with our data. On the other hand, Anselin (1988) assumed that the dependent variable or residual at each location may generally be correlated with the dependent variables or residuals at neighboring locations. For such spatially correlated data or residuals, the dependence is termed spatial autocorrelation or the spatial lag (contiguity) effect. It means that the dependent variables or residuals are spatially autocorrelated and therefore violate the general assumption of statistically independent observations. If the spatial lag effect is not considered, estimates will be biased and inconsistent.

In addition, in order to estimate crop response to weather conditions, previous studies have used regional models with regional cross-sectional data. However, regional data such as observed yield, quality level, and weather variables are generally aggregated considerably beyond the county level. If point estimates (weather, yield, quality) are observed near the border of neighboring regions, there is an opportunity for spatial autocorrelation. For instance, grain produced in one county could be shipped to an adjoining county (this would only affect the quality data since the yield data are based on ARS yields, which are in turn based on producer reports of harvested production). Some cropland will be closer to a weather station in a neighboring county than to weather stations in its own county, so weather measures in a neighboring county should help predict yield. Thus, a spatial lag model is superior, and ignoring this lag would cause parameter estimates to be biased and inconsistent.

Anselin et al. (2008) and Anselin and Bera (1998) express the neighbor relation with a spatial weights matrix $W$, whose elements $w_{ij}$ reflect the potential spatial relations between observations i and j. The spatial weights matrix can be based on binary contiguity (sharing a common border), distance contiguity (including nearest neighbor locations), or the inverse distance between two observations.

Anselin and Bera (1998) suggest two main alternative models of spatial autocorrelation: the spatial lag model, and the spatial error model. The main purpose of the former is to predict the spatial patterns such as cluster and random correlation, while the latter is to increase the efficiency of estimates (Bongiovanni and Lowenberg-DeBoer, 2001). A spatial lag model is used here since the explanatory variables in neighboring counties are expected to help predict our dependent variables. The general regression function can be expressed as:

(1)  $y = X\beta + \varepsilon$

where $y$ is a vector of dependent variables, $X$ is the matrix of independent variables, and $\varepsilon \sim N(0, \sigma^2 I)$ is a vector of stochastic error terms. The spatial lag model is

(2)  $y = \rho W y + X\beta + \varepsilon$

where $\rho$ is the spatial autoregressive coefficient and $W$ is the N × N spatial weight matrix (Greene, 2008). This is similar to including a lagged dependent variable in a time series model, except that endogeneity is created because the lagged effects go both directions. The weights matrix is standardized so that rows sum to 1, such that $w_{ij}^{s} = w_{ij} / \sum_{j} w_{ij}$, where $w_{ij}$ are elements of $W$. If $\rho > 0$, the dependent variable at each location is positively correlated with the dependent variables at other locations. Hence, the spatial lag model can be estimated with instrumental variables methods such as two-stage least squares (2SLS) and generalized method of moments (GMM), or with maximum likelihood (ML) (Lambert and Lowenberg-DeBoer, 2001), and 2SLS is used here (see appendix 1).
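The endogeneity of the spatially lagged dependent variable, and the reason spatially lagged regressors are natural instruments for it, can be seen from the reduced form of the spatial lag model in equation (2). The derivation below is a standard textbook step rather than something taken from the paper's appendix, and it assumes $|\rho| < 1$ with a row-standardized $W$ so that the series expansion converges.

```latex
\begin{align*}
y &= \rho W y + X\beta + \varepsilon
   \;\Longrightarrow\;
   y = (I - \rho W)^{-1}(X\beta + \varepsilon), \\
(I - \rho W)^{-1} &= I + \rho W + \rho^{2} W^{2} + \cdots
   \qquad (|\rho| < 1), \\
W y &= W (I - \rho W)^{-1} X\beta + W (I - \rho W)^{-1}\varepsilon, \\
\mathrm{E}[\,W y \mid X\,] &= W X\beta + \rho W^{2} X\beta + \rho^{2} W^{3} X\beta + \cdots
\end{align*}
```

The third line shows that $Wy$ depends on $\varepsilon$, so least squares on equation (2) is inconsistent. The fourth line shows that the leading terms of the conditional mean of $Wy$ are linear in $WX$ and $W^{2}X$, which is why those terms are commonly used as instruments in a 2SLS first stage.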

3. Data

The wheat yield data (1994-2009) are from 67 counties in Oklahoma and were obtained from the 'Crop Production Report' of the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). Oklahoma has 77 counties, but ten of them are excluded because they have little wheat acreage. The cross-sectional time-series data set comprises 1,072 observations (16 years × 67 counties).

The wheat quality data were obtained from Plains Grains, Inc. (PGI).³ PGI tests 96 samples that were collected on a "grainshed" basis from grain elevators when at least 30% of the local harvest was completed. The term "grainshed" was developed by PGI and represents regions within each state in which the majority of the wheat is marketed through a terminal elevator, river elevator, or train loading facility (figure 2). There are 8 grainsheds in Oklahoma. PGI collects representative wheat quality samples from country or terminal elevators. Generally, elevators take samples from each truckload arriving at the elevator, and the grain is sampled using a hand grain probe. Each elevator directly tests these samples for test weight and moisture content, and the samples then typically accumulate in a barrel. Lastly, the elevator's barrel is sampled by PGI's representative using a hand grain probe.

³ Plains Grains Inc. (PGI), located in Oklahoma, conducts a wheat quality survey and quality testing of hard red winter wheat to provide end-use quality information to wheat buyers and producers, and publishes the Wheat Quality Report (PGI, 2009).

The samples from country and terminal elevators are sent to the USDA, ARS Hard Winter Wheat Quality Lab in Manhattan, KS. Twenty-five quality parameters are analyzed in order to provide data that specifically describe the quality of the wheat (PGI, 2009). These quality data were used for the quality models and correlation analysis. The available historical quality data set covers the 2004 to 2010 crop years. For more precise analysis, the study matched each elevator's quality data with weather data from the closest Mesonet station. This means one weather station per elevator, not a county average, was used to estimate the wheat quality models. The wheat quality characteristics are protein content (% mb: moisture basis) and test weight (lb/bu); in 2010 the quality data came from 96 elevators (figure 3).

Weather data (from January 1, 1994 to May 31, 2010) were obtained from the Oklahoma Mesonet. Each of Oklahoma's 77 counties has one or more Mesonet stations. The selected daily data are daily rainfall (in), daily maximum and minimum air temperature (°F), daily average air temperature (°F), total solar radiation (MJ m⁻² d⁻¹), and growing degree days (GDD).⁴ For all Mesonet stations, the daily observations are aggregated to monthly averages. Generally there is one station per county. For counties with multiple stations, an average of all stations in the county is used for the yield models; the quality models, however, use only data from the closest weather station. Several weather stations were added during the study period, so the closest weather station sometimes varied by year.

4. Empirical model specification

To specify accurately the underlying relationships between the yield and quality variables and the weather variables, the study first examined the relationships between weather variables and yield and quality levels using correlation coefficients and graphical displays from proc GAM in SAS (SAS Institute Inc. 2004). GAM allows exploration of the data and visualization of structure, and is useful for investigating the relations between dependent and independent variables (see appendix table 1, figures 4-15). Appendix table 1 shows all weather variables have a high correlation with the dependent variables, yield and quality level, during the growing season. Precipitation shows less correlation with yield than does average temperature, although both variables are associated with yield. Maximum temperature and minimum temperature both have low, negatively signed correlation coefficients with protein and test weight; however, the two variables were statistically significant in the quality models. Even though solar radiation and GDD have high correlation coefficients, those variables were not statistically significant in the models and were therefore excluded from the model specification. That disagrees with Dixon et al. (1994), in whose model specification the solar radiation variable is essential. Precipitation is quadratically related to yield, whereas temperature has a linear relation with yield. Thus, the yield response model used linear and quadratic terms for precipitation and a linear term for temperature (see figures 4 and 5). In the quality response model, on the other hand, there is no evidence that weather variables have a nonlinear relation with quality. Therefore, the quality response model used a linear specification.

⁴ GDD = [(Tmax + Tmin)/2] - Tb, with 32°F or 39.2°F as the base temperature (Tb) for physiological processes in wheat (Cao and Moss, 1989). GDD vary with growing stage and allow a rough estimate of when a given growth stage will occur at a particular site.
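As a small illustration of the GDD definition in footnote 4 and of the aggregation of daily observations to monthly averages, the following Python sketch computes daily GDD and a monthly mean. The function names and the three sample days are hypothetical and are not Mesonet data; the sketch applies the footnote formula as written and does not truncate negative values, which some GDD conventions do.

```python
from statistics import mean


def daily_gdd(t_max_f, t_min_f, t_base_f=32.0):
    """Daily growing degree days (deg F): mean of daily max and min minus the base."""
    return (t_max_f + t_min_f) / 2.0 - t_base_f


def monthly_average_gdd(days, t_base_f=32.0):
    """Average daily GDD over a list of (t_max_f, t_min_f) pairs."""
    return mean(daily_gdd(hi, lo, t_base_f) for hi, lo in days)


# Hypothetical daily (max, min) temperatures in deg F, for illustration only.
sample_days = [(60.0, 40.0), (70.0, 50.0), (80.0, 60.0)]

gdd_base_32 = monthly_average_gdd(sample_days)          # base temperature 32 deg F
gdd_base_39 = monthly_average_gdd(sample_days, 39.2)    # base temperature 39.2 deg F
print(gdd_base_32, gdd_base_39)
```

Switching the base temperature from 32°F to 39.2°F shifts every daily value by the same 7.2 degrees, so the choice of base changes the level of the GDD series but not its variation across days.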

The study also considered several alternative functional forms: the parametric linear, Cobb-Douglas, translog, square root, and spline forms, and a semi-parametric method that does not assume a specific functional form.

The Cobb-Douglas and linear model estimates showed not only statistically significant individual coefficients but also a relatively high pseudo R² (variance ratio) between in-sample annual predicted yield and annual actual yield during 1994-2009. Therefore, we selected the linear and Cobb-Douglas forms for the yield response model, while the quality response model adopted a linear form. The models have the same individual fixed effect and random effect; the functional form can be written as⁵

(3)  $y_{it} = \alpha_i + X_{it}\beta + u_t + \varepsilon_{it}$   (linear form)

(4)  $\ln y_{it} = \alpha_i + X_{it}\beta + u_t + \varepsilon_{it}$   (Cobb-Douglas form)

and can also be expressed in spatial lag model form using a spatial lag term:

(3.1)  $y_{it} = \alpha_i + \rho \sum_{j} w_{ij} y_{jt} + X_{it}\beta + u_t + \varepsilon_{it}$

(4.1)  $\ln y_{it} = \alpha_i + \rho \sum_{j} w_{ij} \ln y_{jt} + X_{it}\beta + u_t + \varepsilon_{it}$

where $y_{it}$ is the wheat yield of county i at time t, $\alpha_i$ are individual fixed effects for counties, $X_{it}$ are the weather variables, $W = [w_{ij}]$ is an N × N spatial weights matrix for the cross-sectional dimension, $\varepsilon_{it} \sim N(0, \sigma^2)$ is a stochastic error term, and $u_t \sim N(0, \sigma_u^2)$ is a year random effect; these error terms are assumed to be independent and identically distributed. The yield response model is composed of a county fixed effect, a year random effect, and three weather variables from February to April (monthly average rainfall, squared average rainfall, and average temperature) that correspond to the period before and after anthesis in Oklahoma, because yield is mostly determined before the grain filling stage.

⁵ The linear form in equation (3) and the Cobb-Douglas form in equation (4) can be represented with matrices and vectors as $y = X\beta + \varepsilon$ and $\ln y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, and can also be rewritten in expected mean form as $\mathrm{E}(y) = X\beta$ and $\mathrm{E}(y) = \exp(X\beta + \sigma^2/2)$, respectively. Therefore, these mean forms are used when comparing predictions (expected values) between the two functional forms.
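The expected mean form in footnote 5 matters when a log-form prediction is converted back to yield units: exponentiating the fitted log value alone understates the expected yield under normal errors. The following Python sketch shows the adjustment; the fitted value and the residual variance are hypothetical numbers chosen for illustration, not estimates from the paper.

```python
import math


def expected_value_log_model(xb, sigma2):
    """E[y] when ln y = xb + e and e ~ N(0, sigma2): exp(xb + sigma2 / 2)."""
    return math.exp(xb + sigma2 / 2.0)


xb_log = 3.4     # hypothetical fitted value of ln(yield)
sigma2 = 0.04    # hypothetical residual variance of the log model

naive = math.exp(xb_log)                               # ignores the variance term
corrected = expected_value_log_model(xb_log, sigma2)   # expected mean form
print(naive, corrected, corrected / naive)
```

The ratio of the two predictions is exp(sigma2 / 2), so the size of the adjustment depends only on the residual variance and grows as the log model fits less tightly.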

As discussed, wheat quality depends on growth periods such as the milk development, heading, and ripening stages. In Oklahoma, the wheat growth stages during March to May, or June in the northern region, contribute to grain filling, which relates strongly to wheat quality. Additionally, the quality model incorporated the agronomic tradeoff between the yield and quality of wheat by using the predicted yield level from the yield response model, and can be expressed as

(5)  $q_{it} = \alpha_i + X_{it}\beta + \gamma \hat{y}_{it} + u_t + \varepsilon_{it}$

(5.1)  $q_{it} = \alpha_i + \rho \sum_{j} w_{ijt} q_{jt} + X_{it}\beta + \gamma \hat{y}_{it} + u_t + \varepsilon_{it}$

where $q_{it}$ is either protein content (12% mb: moisture basis) or test weight (lb/bu), $\hat{y}_{it}$ is the predicted yield from the yield response model, and $W_t = [w_{ijt}]$ is an N × N spatial weights matrix for the cross-sectional dimension at time t; because the number of elevators varies by year, the weight structure also varies from year to year. For protein, the weather variables are the monthly average maximum temperature and the monthly average rainfall from March to May. For test weight, the weather variables are monthly average rainfall for March, April, and May and maximum and minimum temperatures for April and May, based on the heading and ripening periods before and after anthesis.

Estimation method and procedure

The study first tests for spatial autocorrelation using proc VARIOGRAM in SAS (SAS Institute Inc. 2004). The most generally used test for spatial autocorrelation is Moran's I test⁶ (Griffith, 1987). Proc VARIOGRAM is used to calculate the Moran's I statistic, Z score, and p-value for testing the hypothesis of no spatial autocorrelation.

⁶ Moran's I statistic is $I_t = \frac{N}{S} \cdot \frac{z_t' W z_t}{z_t' z_t}$, where $z_t$ is the vector of dependent variable values in time period t (in deviations from their mean), $W$ is a spatial weights matrix, N is the number of observations, and S is the sum of all elements in $W$. In general, a Moran's I statistic that is positive and near one indicates positive autocorrelation, while one that is negative and near minus one indicates negative autocorrelation (ESRI 2006).

Second, the study adopts the maximum likelihood estimation method (Greene, 2008, p. 400) and tests for heteroskedasticity and nonnormality of the residuals using a likelihood ratio test and the Shapiro-Wilk test. If heteroskedasticity is found in the error terms of the wheat response models, multiplicative heteroskedasticity⁷ will be assumed (Greene, 2008, p. 170). If nonnormality is found, GMM or alternative estimation methods that do not require a specific distribution, or a transformation, can be used as a remedy.

⁷ If residuals are heteroskedastic, the residual term ($\varepsilon_i$) can be expressed in the general multiplicative heteroskedasticity form $\varepsilon_i \sim N(0, \sigma_i^2)$ with $\sigma_i^2 = \exp(x_i'\alpha)$, where $\alpha$ is a vector of parameters and $x_i$ is the i-th row of the matrix of independent variables.

If the dependent variable values are correlated with values at nearby locations based on the Moran's I results, the models will include the weighted dependent variable of equation (2) and be estimated using instrumental variables (see appendix 1). Using proc IML in SAS (SAS Institute Inc. 2004), first-order ($W$) and second-order ($W^2$) spatial weights matrices are constructed based on the inverse distance between two observations, with elements $w_{ij} = 1/d_{ij}$ for distances $d_{ij}$ up to a cutoff and 0 otherwise. GeoDa software (Anselin, 2004) was used to measure arc distances among observations. For yield, the cutoff distance using the Oklahoma counties is 49.6 miles. For quality observations, cutoff distances vary every year since the number of elevators differs by year, and therefore actual distances were used.

In addition, the developed models need to be evaluated for accuracy using out-of-sample forecasting tests rather than only a test of fit on historical data. Since the models were selected by pretesting, in-sample tests will overestimate their accuracy. To test the out-of-sample forecasting power of the developed models, the yield and quality forecasts will be evaluated for 2010 out of sample. The forecasts will also be benchmarked against the actual average of the previous six years. These tests are truly out-of-sample since the models were developed before the 2010 harvest. RMSE, MAE, and Theil's U1 coefficient⁸ were used as measures of forecasting accuracy to evaluate the forecasting performance of all developed models. The first two forecast error statistics (RMSE and MAE) depend on the scale of the dependent variable and so serve as relative measures for comparing forecasts of the same series. The Theil coefficient is scale invariant and always lies between zero and one, where zero means a perfect fit (Eviews 2000).
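The three accuracy measures can be sketched in a few lines of Python. The Theil coefficient below follows the form that lies between zero and one (root mean squared error divided by the sum of the root mean squares of the forecasts and the actual values); the function names and the three sample observations are hypothetical and are not the study's 2010 data.

```python
import math


def rmse(pred, actual):
    """Root mean squared forecast error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)


def mae(pred, actual):
    """Mean absolute forecast error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def theil_u1(pred, actual):
    """Theil inequality coefficient: 0 is a perfect fit, and the value never exceeds 1."""
    n = len(actual)
    rms_pred = math.sqrt(sum(p * p for p in pred) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rmse(pred, actual) / (rms_pred + rms_actual)


# Hypothetical county yields (bu/acre) and forecasts, for illustration only.
actual_yield = [30.0, 25.0, 35.0]
model_forecast = [28.0, 27.0, 35.0]

print(rmse(model_forecast, actual_yield))
print(mae(model_forecast, actual_yield))
print(theil_u1(model_forecast, actual_yield))
```

Because RMSE and MAE carry the units of the dependent variable, they are useful for ranking competing forecasts of the same series, for example a model forecast against a six-year-average benchmark, while the Theil coefficient can be compared across yield, protein, and test weight.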

5. Empirical Results

The study first tested the dependent variables for spatial autocorrelation. Table 1 shows a statistically significant spatial lag effect for the yield and protein data, with Moran's I statistics of 0.0078 and 0.0254 and p-values below 0.0001. For the test weight data, however, the p-value is 0.2642, indicating that the null hypothesis of no spatial lag effect ($H_0$: $\rho = 0$) could not be rejected. Therefore, the study needed to employ the yield response models in (3.1) and (4.1) and the protein response model in (5.1).

Table 1. Tests of No Spatial Autocorrelation for Wheat Yield, Protein, and Test Weight

              Moran's Index   Expected Index   SD         z-score   p-value
Yield          0.00784***     -0.00091         0.000512   17.09     <.0001
Protein        0.02540***     -0.00219         0.00106    26.03     <.0001
Test weight   -0.00101        -0.00219         0.00106     1.12     0.2642

Note: *** significant at 1%; H0: no spatial autocorrelation.
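The Moran's I values in Table 1 were computed with proc VARIOGRAM. As a rough illustration of what the statistic measures, the following Python sketch builds a row-standardized inverse-distance weights matrix with a cutoff and then computes Moran's I. The function names, the four-point layout, and the 15-mile cutoff are hypothetical and chosen only so the arithmetic is easy to follow; they are not the study's data or code.

```python
def inverse_distance_weights(dist, cutoff):
    """Row-standardized weights: 1/d within the cutoff, 0 beyond it and on the diagonal."""
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and 0.0 < dist[i][j] <= cutoff:
                w[i][j] = 1.0 / dist[i][j]
        row_sum = sum(w[i])
        if row_sum > 0.0:                      # leave isolated locations as zero rows
            w[i] = [v / row_sum for v in w[i]]
    return w


def morans_i(values, w):
    """Moran's I = (N / S) * (z'Wz) / (z'z), with z in deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s = sum(sum(row) for row in w)             # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s) * cross / sum(zi * zi for zi in z)


# Four hypothetical locations spaced 10 miles apart on a line.
dist = [[abs(i - j) * 10.0 for j in range(4)] for i in range(4)]
w = inverse_distance_weights(dist, cutoff=15.0)

i_trend = morans_i([1.0, 2.0, 3.0, 4.0], w)         # neighbors have similar values
i_alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)   # neighbors have opposite values
print(i_trend, i_alternating)
```

With a smooth trend across neighbors the statistic is positive, and with alternating high and low values it is negative, which matches the interpretation of the sign given in the Moran's I footnote.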

Second, the study estimated equations (3) – (5.1) using SAS proc MIXED (SAS Institute Inc. 2004), and the residuals of the estimated models were then tested for heteroskedasticity and nonnormality (appendix table 2). The test results showed that the linear yield models' LR statistics are smaller than the critical value at the 5% level (7.82); that is, the null hypothesis of homoskedasticity was not rejected for the linear yield models. The Cobb-Douglas yield models' calculated LR statistics were 19.1 for the general model and 16.1 for the spatial model, and thus the null hypothesis was rejected at the 5% level. Likewise, all quality models' LR statistics were greater than the critical value at the 5% level (9.47). Because the null that residuals are homoskedastic was rejected, we assume multiplicative heteroskedasticity (see Greene, 2008 p. 523). Nonnormality tests showed we can reject the null of normality for all models except test weight, the only linear yield model that did not have heteroskedasticity. As appendix table 2 shows, nonnormality of residuals is still present after correction for heteroskedasticity.⁹

⁸ $U_1 = \dfrac{\sqrt{\frac{1}{N}\sum_i (\hat{y}_i - y_i)^2}}{\sqrt{\frac{1}{N}\sum_i \hat{y}_i^2} + \sqrt{\frac{1}{N}\sum_i y_i^2}}$, where $\hat{y}_i$ and $y_i$ are the predicted value and the corresponding actual value for county i, respectively (Eviews, 2000, p. 337).

Comparing Yield Response Models and Spatial Yield Response Models

Table 2 shows the estimated yield response models and spatial yield response models. Log likelihood statistics were used to select the proper model; in this case (-2 log likelihood), smaller is better. An LR test was also used to test for a spatial lag effect in the models. The null hypothesis of no spatial lag effect ($\rho = 0$) was rejected. The estimated coefficients indicate how weather variables affect wheat yield. The weather variables were all significant at the 5% level for all yield models. Precipitation has a positive relation with yield, while squared precipitation and temperature are negatively related to yield. This is consistent with Yang et al. (1992). Finally, the spatial yield response models' log likelihood statistics indicated that the accuracy of the yield response models could be significantly improved by adding the spatially lagged dependent variable (appendix table 3).

⁹ However, proc MIXED does not provide for nonnormal residuals. Hence, in order to handle nonnormality and heteroskedasticity of the residuals, proc GLIMMIX in SAS (SAS Institute Inc. 2004) was used. If the EMPIRICAL option (FIRORES) is specified, the procedure provides MacKinnon and White's (1985) heteroscedasticity-consistent covariance matrix estimators (HCMM) to estimate standard errors. The GMM procedures in GLIMMIX only give OLS parameter estimates, and those are not efficient. We use proc MIXED and correct for heteroskedasticity to increase efficiency. Our standard errors are not adjusted for nonnormality, but that is of less concern here since our objective is forecasting.

Table 2. Yield Model and Spatial Yield Model Estimates, 1994-2009

                        Yield Response                          Spatial Yield Response
                        Linear              Cobb-Douglas        Linear              Cobb-Douglas
Variable                Coeff.    p-value   Coeff.    p-value   Coeff.    p-value   Coeff.    p-value
Intercept               98.621    <.0001    12.732    <.0001    66.790    0.0012     9.879    0.0004
Precipitation            0.569    <.0001     0.379    <.0001     0.301    0.0018     0.164    0.0196
Precipitation squared   -0.008    <.0001    -0.049    <.0001    -0.005    0.0002    -0.025    0.0225
Temperature             -1.610    <.0001    -2.605    <.0001    -1.362    <.0001    -2.547    <.0001
Spatial lag                                                      0.790    0.0002     0.915    <.0001
-2 Log Likelihood       6496.2              -696.8              6481.5              -717.4

Note: First-order and second-order spatial weight matrices were used to construct the instruments WX and W^2X for the spatial lag term.
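The likelihood ratio comparison described above can be sketched from the -2 log likelihood values in table 2. This is a minimal illustration, not the authors' SAS code: it assumes the models with and without the spatial lag are nested and fitted to the same sample, and that the test involves a single restriction (one degree of freedom). The function names are hypothetical.

```python
import math


def lr_statistic(neg2ll_restricted, neg2ll_unrestricted):
    """LR statistic: the drop in -2 log likelihood when the restriction is relaxed."""
    return neg2ll_restricted - neg2ll_unrestricted


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# -2 log likelihood values from table 2: (without spatial lag, with spatial lag)
models = {
    "Linear": (6496.2, 6481.5),
    "Cobb-Douglas": (-696.8, -717.4),
}

# Chi-square 5% critical value for 1 degree of freedom (assumed single restriction)
CRITICAL_5PCT_1DF = 3.841

results = {}
for name, (restricted, unrestricted) in models.items():
    lr = lr_statistic(restricted, unrestricted)
    results[name] = (lr, chi2_sf_1df(lr), lr > CRITICAL_5PCT_1DF)

for name, (lr, p_value, reject) in results.items():
    print(f"{name}: LR = {lr:.1f}, p = {p_value:.5f}, reject no-spatial-lag null: {reject}")
```

Under these assumptions, both differences in -2 log likelihood exceed the one-degree-of-freedom critical value, which agrees with the rejection of the no-spatial-lag null reported in the text.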

To measure readily how weather variables affect yield, elasticities for the weather variables were calculated and are reported in table 3. In the Cobb-Douglas form, the estimated coefficients are elasticities. Precipitation elasticity at mean precipitation is 0.067 to 0.12 for the yield response models and 0.069 to 0.144 for the spatial yield response models; that is, a 1% increase in precipitation would be expected to raise the average yield level by 0.067% to 0.12% in the yield response models and by 0.069% to 0.144% in the spatial yield response models. Temperature elasticity was -2.6 to -2.74 in the yield models; that is, a 1% rise in temperature decreases the average yield level by 2.6% to 2.74%. For the spatial yield models, temperature elasticity was estimated as -2.32 to -2.55. However, because the units on the variables are arbitrary, we cannot decide which variable affects the yield level of wheat more.
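The elasticity calculation can be illustrated with a short sketch. The functional forms here are assumptions, since the model equations are not shown in this section: the linear model is taken to be quadratic in precipitation in levels, and the Cobb-Douglas model is taken to be quadratic in log precipitation. The sample means are hypothetical placeholders, not values from the paper, so the printed precipitation elasticities are not expected to reproduce table 3.

```python
import math


def linear_quadratic_elasticity(b1, b2, x_mean, y_mean):
    """Elasticity at the means for y = a + b1*x + b2*x**2 + ... (assumed form)."""
    marginal_effect = b1 + 2.0 * b2 * x_mean
    return marginal_effect * x_mean / y_mean


def linear_elasticity(b, x_mean, y_mean):
    """Elasticity at the means for a variable that enters linearly."""
    return b * x_mean / y_mean


def log_quadratic_elasticity(b1, b2, x_mean):
    """Elasticity at the mean for ln y = a + b1*ln x + b2*(ln x)**2 + ... (assumed form)."""
    return b1 + 2.0 * b2 * math.log(x_mean)


# Hypothetical sample means (NOT from the paper), for illustration only
precip_mean = 25.0
temp_mean = 65.0
yield_mean = 35.0

# Coefficients of the non-spatial yield response models in table 2
print("Linear, precipitation:",
      linear_quadratic_elasticity(0.569, -0.008, precip_mean, yield_mean))
print("Linear, temperature:",
      linear_elasticity(-1.610, temp_mean, yield_mean))
print("Cobb-Douglas, precipitation:",
      log_quadratic_elasticity(0.379, -0.049, precip_mean))
# In a log-log specification, the coefficient itself is the elasticity
print("Cobb-Douglas, temperature:", -2.605)
```

The last line reflects why the Cobb-Douglas temperature elasticity in table 3 equals the temperature coefficient in table 2, while the precipitation elasticity depends on where it is evaluated because of the squared term.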

Table 3. The Elasticity of Weather Variables

                 Yield Response            Spatial Yield Response
                 Linear    Cobb-Douglas    Linear    Cobb-Douglas
Precipitation     0.116     0.067           0.144     0.069
Temperature      -2.738    -2.605          -2.316    -2.547

Forecast error statistics for all yield models are summarized in table 4. The calculated statistics showed that forecasts from the yield response models were similar to or slightly less accurate than those from the spatial yield response models. The linear model was slightly less accurate out of sample, just as it was in sample. The yield models performed better than the benchmark 5-year average.

Table 4. Out of Sample Forecast Error Statistics for Yield Models, 2010

                  Average (2005-2009)    Yield Response            Spatial Yield Response
Forecast Errors   w/o weather effects    Linear    Cobb-Douglas    Linear    Cobb-Douglas
RMSE              7.436                  4.420     4.087           4.172     4.035
MAE               6.136                  3.410     3.146           3.247     3.095
Theil U1          0.0081                 0.00379   0.00370         0.00364   0.00364

Note: County level data were not yet available, so these evaluations were done with the available district data.
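The forecast error statistics in table 4 can be computed along the following lines. This is a sketch under stated assumptions: the paper's exact Theil U1 formula is not shown in this section, so a common definition is used (RMSE divided by the sum of the root mean squares of the forecasts and the actual values), and the district yields below are hypothetical numbers chosen only to make the example run.

```python
import math


def rmse(actual, forecast):
    """Root mean squared forecast error."""
    errors = [f - a for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(actual, forecast):
    """Mean absolute forecast error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def theil_u1(actual, forecast):
    """Theil U1 (one common definition): 0 is a perfect forecast, 1 is the worst case."""
    n = len(actual)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rmse(actual, forecast) / (rms_forecast + rms_actual)


# Hypothetical district-level yields (NOT data from the paper)
actual = [30.0, 25.0, 35.0, 28.0]
model_forecast = [32.0, 24.0, 33.0, 29.0]  # stand-in for a weather-based forecast
benchmark = [36.0, 31.0, 30.0, 34.0]       # stand-in for a 5-year average

for label, forecast in [("model", model_forecast), ("benchmark", benchmark)]:
    print(label, rmse(actual, forecast), mae(actual, forecast), theil_u1(actual, forecast))
```

Lower values of all three statistics indicate more accurate forecasts, which is the sense in which the yield models are compared with the 5-year average benchmark.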

Protein Response Model and Weight Response Model

The estimated quality response models of equations (5) and (5.1) for the wheat characteristics protein and test weight are reported in table 5. Yield and weather variables were all significant at the 5% level. Precipitation and maximum temperature positively affect protein and test weight. Minimum temperature is negatively related to test weight. High yield reduces protein, whereas yield positively influences test weight. These relationships between weather variables and wheat quality are consistent with the findings of Johansson and Svensson (1998) and Smith and Gooding (1999), who found that warm temperature affects crude protein positively and that precipitation at the end of the season has a significant positive correlation with protein concentration. For test weight, temperature has a positive influence, and rainfall is also partially associated with it. Even though the spatial lag term was not significant
