Appendix 1: Description of the Survey


I. Sample Design

The sample design of the survey has changed over time, but it has always been representative of the U.S. general population age 12 and older and has always oversampled youths and young adults. The 1993 NHSDA employed a multistage area probability sample of 26,489 persons. The first stage of selection is a sample of 117 Primary Sampling Units (PSUs), each consisting of counties (administrative subdivisions of States) or groups of counties such as metropolitan areas. Within these PSUs, segments (such as city blocks or enumeration districts) are selected. In 1993, 3,124 segments were selected. In each segment, a listing of all addresses was made, from which a sample of 100,332 addresses was drawn. Of these, 89,962 were determined to be eligible sample units. Within these sample units (which can be either households or units within group quarters), sample persons were randomly selected with unequal probabilities, using a screening procedure carried out by interviewers.
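The stage counts above can be turned into a few summary ratios. The short Python sketch below is only a worked-arithmetic illustration using the figures reported in this section; the variable and function names are chosen here for illustration and do not come from the survey documentation.

```python
# Counts reported for the 1993 NHSDA sample design (taken from the text above).
PSUS = 117
SEGMENTS = 3_124
ADDRESSES_SAMPLED = 100_332
ELIGIBLE_UNITS = 89_962


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a float."""
    return numerator / denominator


# Average number of segments selected per PSU.
segments_per_psu = ratio(SEGMENTS, PSUS)

# Average number of addresses sampled per segment.
addresses_per_segment = ratio(ADDRESSES_SAMPLED, SEGMENTS)

# Share of sampled addresses found to be eligible sample units.
eligibility_rate = ratio(ELIGIBLE_UNITS, ADDRESSES_SAMPLED)

print(f"Average segments per PSU:      {segments_per_psu:.1f}")
print(f"Average addresses per segment: {addresses_per_segment:.1f}")
print(f"Eligibility rate:              {eligibility_rate:.1%}")
```

These are simple averages across the whole sample. Because selection probabilities differ across areas, individual PSUs and segments may depart substantially from them.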

The 1993 NHSDA sampled segments were allocated equally into four separate samples, one for each three-month period of the year, so that the survey is essentially continuously in the field. Oversampling of certain subpopulations of interest is accomplished by assigning the appropriate selection probabilities at the PSU, segment, and person levels. In 1993, these subpopulations were young people (age 12-34), African-Americans, Hispanics, and residents of six large metropolitan areas: New York, Washington, D.C., Miami, Chicago, Denver, and Los Angeles. Persons age 18-34 identified as current cigarette smokers by the household screening respondents were also oversampled.
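To illustrate how stage-level selection probabilities combine, the Python sketch below multiplies the probabilities at each stage to obtain an overall selection probability, and takes its inverse as a base weight. The inverse-probability weight is a common survey-sampling convention assumed here for illustration; it is not a description of the NHSDA analysis weight. All probabilities shown are hypothetical, and the function names are illustrative only.

```python
import math
from typing import Sequence


def overall_selection_probability(stage_probabilities: Sequence[float]) -> float:
    """Multiply the conditional selection probabilities of each stage."""
    for p in stage_probabilities:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"selection probability out of range: {p}")
    return math.prod(stage_probabilities)


def base_weight(stage_probabilities: Sequence[float]) -> float:
    """Inverse of the overall selection probability (assumed convention)."""
    return 1.0 / overall_selection_probability(stage_probabilities)


# Hypothetical (PSU, segment, person) probabilities; NOT actual NHSDA values.
general_adult = (0.5, 0.1, 0.01)
oversampled_youth = (0.5, 0.1, 0.04)  # higher person-level probability

for label, probs in [("general adult", general_adult),
                     ("oversampled youth", oversampled_youth)]:
    print(f"{label}: p = {overall_selection_probability(probs):.4f}, "
          f"base weight = {base_weight(probs):.0f}")
```

Under these assumptions, a person from an oversampled group has a higher overall selection probability and therefore a smaller base weight, so each sampled person in that group stands for fewer people in the population.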

The survey also includes persons living in noninstitutional group quarters when these units fall into the sample, although such persons are not oversampled. This group consists primarily of students living in dormitories, but also includes some homeless persons who are living in shelters at the time that the shelter addresses are selected.

II. Data Collection Methodology

The NHSDA collects data through in-person interviews with sample persons, using procedures designed to maximize respondents' cooperation and willingness to report honestly about their illicit drug use. Introductory letters are sent to sampled addresses, followed by an interviewer visit. A five-minute screening procedure involves listing all household members along with their basic demographic data, followed by the possible selection of one or more sample persons. This selection process is designed to provide the necessary sample sizes for specified population groups by selecting 0, 1, or 2 persons per household, depending on the composition of the household.

Interviewers attempt to conduct interviews in a private place, away from other household members. The interview averages about an hour and includes a combination of interviewer-administered and self-administered questions. In the self-administered portion, the answers to sensitive questions (such as those on illicit drug use) are recorded by the respondent on answer sheets and are not seen or reviewed by the interviewer. After the answer sheets are completed, the respondent places them in an envelope, which is sealed and mailed to the contractor, Research Triangle Institute, with no personal identifying information attached.

III. Data Processing

Upon receipt, questionnaires are checked for critical identification and demographic data, then keyed to disk. This creates a file consisting of one record for each completed interview. Machine editing routines, called logical imputation, perform extensive within-record consistency checks and resolve most inconsistencies and missing data. For some key variables that still have missing values after logical imputation, statistical imputation is used to replace the missing data with appropriate valid response codes. Two types of statistical imputation procedures are used. Hot-deck imputation replaces a missing value with a valid code taken from another respondent who is "similar" and has complete data. Logistic regression models are also used to determine replacement values for some variables.
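The general idea of hot-deck imputation can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used for the NHSDA: it assumes that "similar" means an exact match on a set of classing variables and that the donor is drawn at random from the matching respondents. The function name, the variable names, and the sample records are all hypothetical.

```python
import random
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]


def hot_deck_impute(records: List[Record], target: str,
                    class_vars: Sequence[str],
                    rng: Optional[random.Random] = None) -> List[Record]:
    """Fill missing `target` values (None) from a random donor that matches
    the recipient on every classing variable. Returns new records."""
    rng = rng or random.Random(0)

    # Build donor pools: observed target values grouped by imputation class.
    donors: Dict[tuple, list] = {}
    for rec in records:
        if rec.get(target) is not None:
            key = tuple(rec[v] for v in class_vars)
            donors.setdefault(key, []).append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # leave the input records unmodified
        if new_rec.get(target) is None:
            pool = donors.get(tuple(new_rec[v] for v in class_vars))
            if pool:  # if no donor exists, the value stays missing
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed


# Hypothetical records; None marks a missing response code.
sample = [
    {"age_group": "12-17", "sex": "F", "smoked_past_month": 2},
    {"age_group": "12-17", "sex": "F", "smoked_past_month": None},
    {"age_group": "18-25", "sex": "M", "smoked_past_month": 1},
    {"age_group": "26-34", "sex": "M", "smoked_past_month": None},
]
result = hot_deck_impute(sample, "smoked_past_month", ["age_group", "sex"])
```

In this sketch the second record receives its value from the first, its only matching donor, while the fourth record stays missing because no respondent with complete data shares its class. A production procedure would need a fallback for such cases, for example coarser imputation classes.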

Each record (i.e., respondent) is assigned an analysis weight which incorporates:

Data are generally released to the public about six months after the end of data collection. Public use data files are available 1-2 years after completion of data collection.

IV. Preliminary Versus Final Estimates

Estimates presented in this report are considered preliminary because they are based on the initial weighting, editing, and imputation procedures implemented immediately after data collection was completed (December 1993). Further analyses of the 1993 NHSDA data and evaluation of the estimation procedures are ongoing and may result in revisions in later data releases. However, if no such revisions are deemed necessary, final estimates will be the same as the preliminary estimates presented in this report. Final estimates will be published in Population Estimates, which will be available later this year, and in Main Findings, which will be published in 1995. SAMHSA will also release further analyses from the 1993 NHSDA through additional Advance Reports and other published reports.

