Explain the batch means for interval estimation in steady state simulation.


written 5.0 years ago by

• modified 5.0 years ago

Batch means for Interval Estimation in steady state simulation

One problem with the replication method is that some initial data must be deleted from every run, i.e. 'd' observations from each of the 'R' replications. $\therefore$ A total of 'dR' observations is thrown away, which is a loss of information.

This loss can be reduced by using an experimental design based on a single long replication, from which the initial data are deleted only once. The difficulty then lies in computing the standard error of the sample mean: the data are dependent, so the usual estimator is biased.

Using "BATCH MEANS" method the above problem can be solved.

Batch means method

Divide the output data from one replication (after appropriate deletion) into a few large batches.
Process the means of these batches as if they are independent.
Calculate the batch means $(\overline{y_j})$ based on the form of the raw output data

(a) Continuous time process:
- If the number of batches is 'k', then the batch size is $m = \frac{T_E}{k}$, where $T_E$ is the run length retained after deleting the initial period $T_0$
- Batch mean $\overline{y}_j = \frac{1}{m} \int_{(j-1)m}^{jm} y (t + T_0) \, dt \quad \text{for } j = 1,2,\dots,k$
- The $j^{th}$ batch mean is the time-weighted average of the process over the time interval $[T_0 + (j-1)m, T_0 + jm]$
(b) Discrete time process:
- If the number of batches is 'k', then the batch size is $m=\frac{n-d}{k}$
- Batch mean $\overline{y}_j = \frac{1}{m} \sum_{i = (j-1) m+1}^{jm} y_{i+d} \quad \text{for } j = 1,2,\dots,k$
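- With the first $d$ observations deleted, this equation assigns the retained observations to batches as follows: $y_{d+1}, \dots, y_{d+m}$ give $\overline{y}_1$; $y_{d+m+1}, \dots, y_{d+2m}$ give $\overline{y}_2$; and so on up to $y_{d+(k-1)m+1}, \dots, y_{d+km}$, which give $\overline{y}_k$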
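The discrete-time batching rule can be sketched in a few lines of Python. This is only an illustration: the function name `batch_means` and the sample data are made up here, and the choice to drop leftover observations when $n-d$ is not a multiple of $k$ is an assumption, not something the method prescribes.

```python
def batch_means(y, d, k):
    """Return the k batch means of y after deleting the first d observations.

    Assumption: if (n - d) is not divisible by k, the trailing leftover
    observations are dropped so that every batch has the same size m.
    """
    n = len(y)
    m = (n - d) // k  # batch size
    if m < 1:
        raise ValueError("not enough retained data to form k batches")
    retained = y[d:d + m * k]
    # Batch j (0-based) covers retained[j*m : (j+1)*m], i.e. y_{i+d} for
    # i = (j-1)m+1, ..., jm in the 1-based notation of the text.
    return [sum(retained[j * m:(j + 1) * m]) / m for j in range(k)]


# Hypothetical series: 2 warm-up points followed by 6 retained points.
y = [9.0, 8.0, 1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
print(batch_means(y, d=2, k=3))  # [2.0, 3.0, 7.0]
```

Here $m = (8-2)/3 = 2$, so the retained values are grouped as (1, 3), (2, 4) and (6, 8).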

Now estimate the variance of the sample mean:

$\frac{s^{2}}{k}=\frac{1}{k} \sum_{j=1}^{k} \frac{\left(\overline{y}_{j}-\overline{y}\right)^{2}}{k-1}$

$=\frac{\sum_{j=1}^{k} \overline{y}_{j}^{2}-k \overline{y}^{2}}{k(k-1)}$

Where $\overline{y}$ is the overall sample mean of data after deletion.
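As a rough illustration, the variance estimate $\frac{s^2}{k}$ can be computed directly from a list of batch means. The function names below are hypothetical, the batch means are made-up numbers, and the t quantile is passed in by the caller (for example, read from a t table) rather than computed. With equal-size batches covering all retained data, the mean of the batch means equals the overall sample mean $\overline{y}$, which is what the sketch assumes.

```python
import math


def variance_of_mean(means):
    """Return (ybar, s^2/k) for a list of batch means."""
    k = len(means)
    if k < 2:
        raise ValueError("need at least two batches")
    ybar = sum(means) / k
    s2 = sum((b - ybar) ** 2 for b in means) / (k - 1)
    return ybar, s2 / k


def half_width(means, t_quantile):
    """Half-width of the confidence interval: t * sqrt(s^2/k).

    t_quantile is supplied by the caller and should correspond to
    k - 1 degrees of freedom.
    """
    _, var = variance_of_mean(means)
    return t_quantile * math.sqrt(var)


means = [2.0, 3.0, 7.0]  # hypothetical batch means, k = 3
ybar, var = variance_of_mean(means)

# The second form of the formula gives the same value.
k = len(means)
shortcut = (sum(b * b for b in means) - k * ybar ** 2) / (k * (k - 1))
assert abs(var - shortcut) < 1e-12

print(ybar)  # 4.0
```

For these numbers the squared deviations sum to 14, so $s^2 = 7$ and $\frac{s^2}{k} = \frac{7}{3}$.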

Guidelines for choosing the batch size (m) or the number of batches (k)

For a fixed total sample size, the number of batches should satisfy $10 \lt k \le 30$.
The lag-1 autocorrelation $\rho_{1}=\operatorname{corr}\left(\overline{y}_{j}, \overline{y}_{j+1}\right)$ is used to examine the dependence between successive batch means.
If the total sample size is chosen sequentially, let the batch size and the number of batches grow as the run length increases.
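To make the dependence check concrete, here is a minimal sketch that estimates the lag-1 autocorrelation from a list of batch means using the usual sample estimator (sum of products of adjacent deviations divided by the sum of squared deviations). The function name and the data are illustrative assumptions.

```python
def lag1_autocorrelation(means):
    """Sample lag-1 autocorrelation of a sequence of batch means."""
    k = len(means)
    if k < 2:
        raise ValueError("need at least two batch means")
    ybar = sum(means) / k
    dev = [b - ybar for b in means]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("batch means are all equal; correlation undefined")
    num = sum(dev[j] * dev[j + 1] for j in range(k - 1))
    return num / denom


# Hypothetical batch means with an obvious upward trend.
rho1 = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(rho1, 3))  # 0.4
```

A value this far above zero suggests the batches are too short to be treated as independent.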

Current Strategy

Collect output data from a single replication and delete the initial data as appropriate. As a guideline, collect at least 10 times as much data as is deleted (at least 10d).
Calculate the batch means for $100 \le k \le 400$ batches and estimate the sample lag-1 autocorrelation of the batch means:

$$ \overline{\rho_1} = \frac{ \sum_{j=1}^{k-1}(\overline{y}_j - \overline{y})(\overline{y}_{j+1} - \overline{y})}{\sum_{j=1}^k (\overline{y}_j - \overline{y})^2}$$

Check if the correlation is sufficiently small

if $\overline{\rho_1} \le 0.2$, then
- Rebatch data into $30 \le k \le 40$ batches
- Form a confidence interval using $k-1$ degrees of freedom for the t-distribution, with $\frac{s^2}{k}$ as the estimate of the variance of the sample mean $\overline{y}$
if $\overline{\rho_1} \gt 0.2$, then
- Extend the replication by 50% to 100%
- Go to step 2
As an additional check on the confidence interval, test the batch means for independence by calculating the test statistic 'c', where

$$c=\sqrt{\frac{k^{2}-1}{k-2}}\left(\overline{\rho_{1}}+\frac{\left(\overline{y}_{1}-\overline{y}\right)^{2}+\left(\overline{y}_{k}-\overline{y}\right)^{2}}{2 \sum_{j=1}^{k}\left(\overline{y}_{j}-\overline{y}\right)^{2}}\right)$$
- If $c \lt z_\alpha$ then accept the independence of the batch means; otherwise extend the replication by 50% to 100% and go to step 2
- If it is difficult to extend replication, then rebatch the data into k = 10 batches
- Form a confidence interval using k - 1 degrees of freedom for the t-distribution and estimate the variance of the sample mean $\overline{y}$
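The independence check can be sketched as follows. This is a hedged illustration, not a reference implementation: the function names are hypothetical, the data are made up, and $z_\alpha$ is taken here to be the upper-$\alpha$ point of the standard normal distribution, which is an assumption about the notation.

```python
import math
from statistics import NormalDist


def independence_statistic(means):
    """Test statistic c computed from a list of batch means."""
    k = len(means)
    if k < 3:
        raise ValueError("need at least three batch means")
    ybar = sum(means) / k
    dev = [b - ybar for b in means]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("batch means are all equal; statistic undefined")
    # Sample lag-1 autocorrelation of the batch means.
    rho1 = sum(dev[j] * dev[j + 1] for j in range(k - 1)) / denom
    # Correction term built from the first and last batch means.
    end_term = (dev[0] ** 2 + dev[-1] ** 2) / (2 * denom)
    return math.sqrt((k * k - 1) / (k - 2)) * (rho1 + end_term)


def batches_look_independent(means, alpha=0.05):
    """Return True if c < z_alpha (assumed: upper-alpha normal quantile)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return independence_statistic(means) < z_alpha


trending = [1.0, 2.0, 3.0, 4.0, 5.0]          # strongly dependent
alternating = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]  # no positive dependence

print(batches_look_independent(trending))     # False
print(batches_look_independent(alternating))  # True
```

For the trending data, $\overline{\rho_1} = 0.4$ and the end term is also 0.4, so $c = 0.8\sqrt{8} \approx 2.26$, which exceeds $z_{0.05} \approx 1.645$; in the strategy above this would mean extending the replication or falling back to about 10 batches.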
