I just started reading up on random number tables, but I'm not really sure how to read the above table or interpret the data. How would I go about drawing a random sample of size $n=10$ without replacement from a numbered population of size $N=100$? And also a random sample of size $n=15$ without replacement from a numbered population of size $N=123$?
asked Jun 26, 2012 at 12:09

Is this a homework exercise or are you actually trying to generate random numbers in this way? If it is homework, please add the homework tag; if it's not homework my advice would be to put away the table and use a computer.
Commented Jun 26, 2012 at 12:18

A random number table is designed to create uniformly distributed values; this use is straightforward. The somewhat tricky part to do correctly and efficiently is to sample without replacement. I will describe this because it is a useful algorithm for any statistician to know: random permutations are very important (for resampling and bootstrapping, for instance) and, because they may be generated a huge number of times, efficiently generating them can be essential.
Usually, one designates a starting entry in the table and a rule for selecting sequences of digits. For instance, for sampling from $100$ objects, first number them from $0$ through $99$ in any order you like. Without referring to the table, you might stipulate that you will start in row 3, column 1 and pick every other digit, grouping them in pairs. This determines the sequence
89, 10, 58, 44, ...
Via your numbering system you will interpret this sequence as elements of the population.
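As a rough sketch of that selection rule in Python: the function name `digit_groups` and the digit row below are made up for illustration (the row is not copied from the table; it was constructed so that every other digit spells out 89, 10, 58, 44).

```python
def digit_groups(digits, start=0, step=2, group=2):
    """Take every `step`-th digit beginning at `start`, then join them `group` at a time."""
    picked = digits[start::step]
    usable = len(picked) - len(picked) % group  # drop an incomplete final group
    return [int(picked[i:i + group]) for i in range(0, usable, group)]


# Hypothetical row of table digits (an assumption, not the real row 3).
row = "8197130250864147"
print(digit_groups(row))  # [89, 10, 58, 44]
```

With the population numbered $0$ through $99$, each two-digit value is used directly as an index.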
When $N$ is not a power of $10$ you can proceed in several ways. Perhaps the most efficient is to partition the random digits into longish groups, interpret them as values in the interval $[0,1)$ by placing an implicit decimal point in front of them, multiply those by $N$, and round down. For example, with $N=123$ items numbered from $0$ through $122$ and again starting in row 3 column 1, this time grouping the digits in sixes, you would produce (after rounding down)
123 * .859414 = 105, 123 * .075682 = 9, 123 * .414020 = 50, 123 * .156114 = 19, ...
This can be performed in the field without any tools at all (apart from pencil and paper if you need those for the multiplications).
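If a computer is at hand, the same scaling can be sketched in a few lines of Python. The helper name `scaled_index` is mine; it keeps everything in integer arithmetic so the rounding down is exact.

```python
def scaled_index(group, N):
    """Map a digit group such as "859414" to floor(N * 0.859414)."""
    return N * int(group) // 10 ** len(group)


# The four six-digit groups quoted above, scaled to N = 123.
groups = ["859414", "075682", "414020", "156114"]
print([scaled_index(g, 123) for g in groups])  # [105, 9, 50, 19]
```

Passing the groups as strings preserves leading zeros, so "075682" is still read as six digits.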
The most straightforward way to sample without replacement is simply to remove duplicates as they are encountered in the sequence of indexes generated above. Two tricks are frequently employed when doing this manually (in the field) or by computer with very large populations.
The first is that when sampling more than half the population, you instead identify the elements that are not needed (again without replacement) and keep the rest as the sample.
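Here is a minimal Python sketch of duplicate removal combined with this first trick. The function name and its arguments are illustrative assumptions; `indices` stands for whatever stream of indexes in $0, \ldots, N-1$ you obtained from the table.

```python
def sample_without_replacement(indices, n, N):
    """Return n distinct indexes in 0..N-1, skipping duplicates in `indices`.

    When n is more than half of N, the N - n indexes to leave out are drawn
    instead, and everything else is returned (in increasing order).
    """
    target = n if 2 * n <= N else N - n
    picked, seen = [], set()
    for i in indices:
        if len(picked) == target:
            break
        if i not in seen:  # ignore an index that has already come up
            seen.add(i)
            picked.append(i)
    if len(picked) < target:
        raise ValueError("ran out of random indexes before the sample was complete")
    if 2 * n <= N:
        return picked
    return [i for i in range(N) if i not in seen]


print(sample_without_replacement([89, 10, 58, 10, 44, 89, 7], 4, 100))  # [89, 10, 58, 44]
print(sample_without_replacement([2, 2, 0], 3, 5))  # [1, 3, 4]
```

Note that the complement branch returns the sample in sorted order, so it gives a random subset rather than a random ordering.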
The second is an elegant algorithm that avoids searching for duplicates. It generates a random permutation (an ordered subset of the population of specified size); the ordering is useful in its own right. You begin by writing the identifiers of the population in any sequence you like. As an example, if we were to sample from a population of $N=10$ individuals named a, b, ..., j, we would begin with, say, this array preceded by the set of elements put into the sample so far (none of them):
<>, a b c d e f g h i j
Generate a sequence of random numbers as before. For this example let's use the previous sequence .859414, .075682, .414020, ... Because the array currently has $10$ elements in it, we use the first random number from the table to compute the random index 10 * 0.859414 = 8 (rounding down). Interpret this as an index into the array, remembering that array indexes start at $0$: index 8 designates i. Swap the first element of the array with the element at this index, keep the new first element for your sample, and then drop it from the array altogether, leaving this:
(i), b c d e f g h a j
Repeat, bearing in mind the array now has only $N-1=9$ elements. So we use the second random value from the table to compute 9 * 0.075682 = 0 (again rounding down). That happens to designate the first element of the remainder of the array, which is swapped with itself and also put into the growing sample:
(i, b), c d e f g h a j
Another repetition produces the index 8 * .414020 = 3, identifying the value f in the array, which is swapped with the initial c:
(i, b, f), d e c g h a j
You can see that the only actual changes made to the array are the swaps: the random permutation--here (i, b, f)--automatically appears in the first $k$ entries of the array after $k$ swaps are performed in this manner. In this fashion you can generate a random $k$-permutation by means of just $k$ (uniformly distributed) random numbers and $k$ swaps: a highly efficient procedure. Moreover, in cases where the sample size is not fixed in advance, this procedure can be iterated with the remaining non-sampled elements.
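For readers who want to check the walk-through by machine, here is a short Python sketch of the swapping procedure. The name `partial_shuffle` is my own label, and the function takes the random values in $[0,1)$ as an explicit argument so that the table values used above can be fed in directly.

```python
def partial_shuffle(items, fractions):
    """Build a random k-permutation of `items` from k values in [0, 1).

    Returns (sample, rest): the first k entries after k swaps, and the
    untouched remainder of the array.
    """
    arr = list(items)
    fractions = list(fractions)
    N, k = len(arr), len(fractions)
    if k > N:
        raise ValueError("cannot sample more elements than the population holds")
    for pos, u in enumerate(fractions):
        remaining = N - pos               # size of the not-yet-sampled part
        j = pos + int(remaining * u)      # random index into that part
        arr[pos], arr[j] = arr[j], arr[pos]
    return arr[:k], arr[k:]


sample, rest = partial_shuffle("abcdefghij", [0.859414, 0.075682, 0.414020])
print(sample)  # ['i', 'b', 'f']
print(rest)    # ['d', 'e', 'c', 'g', 'h', 'a', 'j']
```

The output reproduces the three steps shown above. To continue sampling later, call the function again on `rest` with fresh random values.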
BTW, to prove that all permutations are equally likely to occur in this algorithm, note that all elements of the population have equal chances of being the first chosen. The proof is finished by induction (because this is a recursive algorithm).
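To spell the induction out as a calculation (this expansion is mine, with $x_1, \ldots, x_k$ denoting any fixed ordered selection of distinct elements): at each step every remaining element is equally likely to be picked, so

$$\Pr(x_1, \ldots, x_k) = \frac{1}{N}\cdot\frac{1}{N-1}\cdots\frac{1}{N-k+1} = \frac{(N-k)!}{N!},$$

which does not depend on which elements were selected. In the example above, with $N=10$ and $k=3$, every ordered triple such as (i, b, f) has probability $1/(10\cdot 9\cdot 8) = 1/720$.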