For now, suppose we are only interested in national values. Taking the latest 5 years of data, BEAD would typically calculate the estimate of the ratio as

$$\widetilde{Z} = \dfrac{\overline{X}}{\overline{Y}} = \dfrac{\sum\limits_{t=1}^{T}X_t}{\sum\limits_{t=1}^{T}Y_t}$$

where $\widetilde{Z}$ is the ratio of sample averages of variables $X$ and $Y$. But what is the justification for this sample statistic?

Technical Definition of BEAD’s Target

BEAD needs a value that would represent a best guess for future values. Which would you say is the more appropriate estimand? The ratio of population averages

$$\nu_z = \dfrac{\mathbb{E}[X]}{\mathbb{E}[Y]}$$

or the population average ratio

$$\mu_z = \mathbb{E}\Bigg[\dfrac{X}{Y}\Bigg]$$

Note: expectations are taken over time, hence time series issues are central to our context. As a start, we assume stationarity (the population mean is finite and independent of time) and ergodicity (the time average equals the ensemble average).

Comparison of Estimands

Assume $X$ and $Y$ are jointly distributed $F(X,Y)$ with finite moments (mean & variance). When are these two estimands $\nu_z$ and $\mu_z$ the same? When are they different?
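To make the two candidates concrete, here is a minimal sketch of their sample analogues. The function names and the 5-year series are hypothetical and purely illustrative; they are not BEAD code or data.

```python
from statistics import fmean


def ratio_of_averages(x, y):
    """Sample analogue of nu_z = E[X]/E[Y]; equals sum(x)/sum(y)."""
    return sum(x) / sum(y)


def average_of_ratios(x, y):
    """Sample analogue of mu_z = E[X/Y]; mean of the yearly ratios."""
    return fmean([xt / yt for xt, yt in zip(x, y)])


# Hypothetical 5-year series (illustrative numbers only).
x = [10.0, 30.0, 20.0, 45.0, 25.0]
y = [100.0, 200.0, 100.0, 300.0, 100.0]

nu_hat = ratio_of_averages(x, y)
mu_hat = average_of_ratios(x, y)
print(f"ratio of averages: {nu_hat:.4f}")
print(f"average of ratios: {mu_hat:.4f}")
```

The ratio of averages implicitly weights each year by its denominator $Y_t$, while the average of ratios weights every year equally, which is why the two numbers can differ on the same data.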
Let's evaluate the difference, writing $Z = X/Y$ so that $X = ZY$:

\begin{align*}
\mu_z - \nu_z & = \mathbb{E}[Z]-\dfrac{\mathbb{E}[X]}{\mathbb{E}[Y]} \\
& = \dfrac{\mathbb{E}[Z]\mathbb{E}[Y]}{\mathbb{E}[Y]}-\dfrac{\mathbb{E}[ZY]}{\mathbb{E}[Y]} \\
& = -\dfrac{Cov(Z,Y)}{\mathbb{E}[Y]} \quad \qquad \text{(since } Cov(Z,Y)=\mathbb{E}[ZY]-\mathbb{E}[Z]\mathbb{E}[Y]\text{)}
\end{align*}

The Inequality Rule

So, for $\mathbb{E}[Y] > 0$, the inequality rule is

\begin{align}
\text{If} \quad Cov(Z,Y) > 0 \quad &\text{then} \quad \mu_z < \nu_z\\
\text{If} \quad Cov(Z,Y) < 0 \quad &\text{then} \quad \mu_z > \nu_z\\
\text{If} \quad Cov(Z,Y) = 0 \quad &\text{then} \quad \mu_z = \nu_z
\end{align}

So if the ratio $Z$ and the denominator variable $Y$ are uncorrelated (which mean independence implies), then the two estimands are the same. Do we have enough structure to expect covariance to be positive, negative, or zero?

Ratio $Z$ as a Share

If $Z$ is a “share” variable, then by definition we must have $Z \in [0,1]$. Said differently, we have the structure that $X\leq Y$, which suggests (but does not guarantee) that $Cov(X,Y)\geq 0$. If $Cov(X,Y)\geq 0$, then, because $\frac{1}{Y}$ is decreasing in $Y$, we would expect $Cov(X,\frac{1}{Y})\leq 0$.

Result: when this negative covariance is strong enough, it buys us $Cov(Z,Y)\geq 0$ and hence $\mu_z \leq \nu_z$.
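The identity $\mu_z - \nu_z = -Cov(Z,Y)/\mathbb{E}[Y]$ also holds exactly in any sample when the covariance uses divisor $T$ rather than $T-1$, so the inequality rule can be checked numerically. The sketch below does this on two hypothetical share series (illustrative numbers only); the helper names `pop_cov` and `gap_and_rhs` are assumptions introduced for this example.

```python
from statistics import fmean


def pop_cov(a, b):
    """Covariance with divisor T (not T-1), so the identity is exact in-sample."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])


def gap_and_rhs(x, y):
    """Return (mu_hat - nu_hat, -Cov(Z,Y)/mean(Y), Cov(Z,Y)) for Z = X/Y."""
    z = [xt / yt for xt, yt in zip(x, y)]
    cov_zy = pop_cov(z, y)
    gap = fmean(z) - fmean(x) / fmean(y)
    rhs = -cov_zy / fmean(y)
    return gap, rhs, cov_zy


# Two hypothetical share series with the same denominators (illustrative only).
y = [100.0, 200.0, 100.0, 300.0, 100.0]
x_neg = [10.0, 30.0, 20.0, 45.0, 25.0]  # share tends to fall as Y grows
x_pos = [10.0, 40.0, 15.0, 75.0, 20.0]  # share tends to rise as Y grows

for label, x in [("x_neg", x_neg), ("x_pos", x_pos)]:
    gap, rhs, cov_zy = gap_and_rhs(x, y)
    print(f"{label}: Cov(Z,Y)={cov_zy:+.3f}  mu-nu={gap:+.4f}  -Cov/E[Y]={rhs:+.4f}")
```

In both cases the gap and the right-hand side agree, and the sign of the gap is the opposite of the sign of the sample covariance.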
Proof:

\begin{align*}
Cov(Z,Y) & = \mathbb{E}[ZY]-\mathbb{E}[Z]\mathbb{E}[Y] \qquad \qquad \text{(definition of covariance)}\\[.15cm]
& = \mathbb{E}[X]-\mathbb{E}\Big[X\frac{1}{Y}\Big]\mathbb{E}[Y]\\[.15cm]
& = \mathbb{E}[X]-\mathbb{E}[Y]\Big( Cov\Big(X,\frac{1}{Y}\Big)+\mathbb{E}[X]\mathbb{E}\Big[\frac{1}{Y}\Big] \Big)\\[.15cm]
\dfrac{Cov(Z,Y)}{\mathbb{E}[Y]} & = \dfrac{\mathbb{E}[X]}{\mathbb{E}[Y]}- Cov\Big(X,\frac{1}{Y}\Big)-\mathbb{E}[X]\mathbb{E}\Big[\frac{1}{Y}\Big] \\[.15cm]
& = \mathbb{E}[X]\Big(\dfrac{1}{\mathbb{E}[Y]}-\mathbb{E}\Big[\frac{1}{Y} \Big] \Big)-Cov\Big(X,\frac{1}{Y}\Big)
\end{align*}

The last term, $-Cov(X,\frac{1}{Y})$, is nonnegative when $Cov(X,\frac{1}{Y})\leq 0$. The first term is nonpositive for $\mathbb{E}[X]\geq 0$ and $Y>0$, since Jensen's inequality gives $\mathbb{E}[\frac{1}{Y}]\geq \frac{1}{\mathbb{E}[Y]}$. Hence $Cov(Z,Y) \geq 0$ whenever the covariance term dominates. $\blacksquare$

Check Sample Covariance PCT & CAG

We likely want to err on the high side. But which estimand is on the high side depends on covariances, so let's check that for the PCT context. Below, I plot the sample covariances using Kynetec data at the state, pesticide type, AI, crop level between 1998 and 2023 from PctCropTreated_State_straight_NoSeedTreatPCT_2023.xlsx. Note: outliers are removed ad hoc by limiting the range of values.

The data suggest that there is no real difference between the ratio of (population) averages and the average of ratios. For PCT, whichever is higher is due to random noise.

Recommendation: calculate both and choose the higher one.

Conclusion

Strong foundation for choosing one estimand over another. Estimand preference = estimator preference unless we have non-random sampling errors.

Future work: consider non-random errors and evaluate sample covariances for the market share context.
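The recommendation to calculate both estimates and keep the higher one can be sketched as follows. This is only an illustration under stated assumptions: the function name `higher_estimate`, the group keys, and all numbers are hypothetical, and nothing here reads the Kynetec workbook.

```python
from statistics import fmean


def higher_estimate(x, y):
    """Compute both sample estimates and return (label, value) of the higher one."""
    nu_hat = sum(x) / sum(y)                           # ratio of averages
    mu_hat = fmean([xt / yt for xt, yt in zip(x, y)])  # average of ratios
    if mu_hat >= nu_hat:
        return "average_of_ratios", mu_hat
    return "ratio_of_averages", nu_hat


# Hypothetical groups keyed by (state, crop); the values are (X series, Y series).
groups = {
    ("StateA", "crop1"): ([10.0, 30.0, 20.0, 45.0, 25.0],
                          [100.0, 200.0, 100.0, 300.0, 100.0]),
    ("StateB", "crop1"): ([10.0, 40.0, 15.0, 75.0, 20.0],
                          [100.0, 200.0, 100.0, 300.0, 100.0]),
}

for key, (x, y) in groups.items():
    label, value = higher_estimate(x, y)
    print(key, label, round(value, 4))
```

Taking the maximum per group errs on the high side by construction, whichever sign the sample covariance happens to take in that group.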
