National Study of Youth and Religion, Wave 1 (2003)

The National Study of Youth and Religion (NSYR) is a nationally representative telephone survey of 3,290 English- and Spanish-speaking teenagers between the ages of 13 and 17, and their parents. The NSYR also includes an oversample of 80 Jewish households, which is not nationally representative, bringing the total number of completed NSYR cases to 3,370. The purpose of the NSYR is to research the shape and influence of religion and spirituality in the lives of American youth; to identify effective practices in the religious, moral, and social formation of the lives of youth; to describe the extent and perceived effectiveness of the programs and opportunities that religious communities are offering to their youth; and to foster an informed national discussion about the influence of religion in youth's lives, in order to encourage sustained reflection about and rethinking of our cultural and institutional practices with regard to youth and religion.

Data File
Cases: 3,370
Variables: 915
Wave 1 Weights
The NSYR1_806 and later versions of Wave 1 data include three weight variables: “rweight1”, “rweight2”, and “nweight2”. Rweight1 is a raw weight that adjusts for differential probabilities of selection into the sample. For the nationally representative sample, rweight1 was constructed using the number of phone numbers for the household and the number of teens in the household between the ages of 13 and 17 to determine the probability that a specific teen would be selected for the sample. For the Jewish oversample, rweight1 was constructed using only the number of teens in the household between the ages of 13 and 17. Since the oversample was drawn from a list of phone numbers, multiple phone numbers in the household were not assumed to increase the probability of selection. It is important to be aware that while rweight1 is a weight for probability of selection for all cases in the data, it was constructed differently for the oversample.
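
The logic of a selection-probability weight like rweight1 can be sketched as follows. This is only an illustration: the exact NSYR formula is not given here, and the function name, its arguments, and the simple teens-per-phone-line ratio are all assumptions. The sketch assumes that a household's chance of being dialed rises with its number of phone lines and that one teen is chosen from among the eligible teens, so the weight is the inverse of that relative probability.

```python
def selection_weight(n_teens, n_phones=1, from_list_sample=False):
    """Relative inverse-probability-of-selection weight (hypothetical sketch).

    n_teens          -- eligible teens (ages 13 to 17) in the household
    n_phones         -- phone numbers reaching the household
    from_list_sample -- True for cases drawn from a phone list (the oversample),
                        where extra phone lines are assumed not to matter
    """
    if n_teens < 1 or n_phones < 1:
        raise ValueError("n_teens and n_phones must be at least 1")
    # For a list-based sample, ignore the number of phone lines.
    phones = 1 if from_list_sample else n_phones
    # Relative P(selection) is phones / n_teens; the weight is its inverse.
    return n_teens / phones


# A teen with one eligible sibling in a two-line household, RDD sample.
print(selection_weight(n_teens=2, n_phones=2))
# The same household if it had been drawn from a phone list.
print(selection_weight(n_teens=2, n_phones=2, from_list_sample=True))
```

Under these assumptions, a teen in a household with more eligible teens counts for more, and in the RDD sample a teen in a household with more phone lines counts for less.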

Rweight2 is also a raw weight and is only available for the nationally representative portion of the sample. Rweight2 adjusts for differential probabilities of selection, like rweight1, but also incorporates adjustments for census region and household income to correct for sampling bias related to these variables. When unweighted NSYR data are compared to census data, there are slight differences in the distribution across census regions and household income brackets. Therefore, rweight2 incorporates a post-stratification adjustment to CPS totals for census region (West, Midwest, South, Northeast) and income (defined as <$20K, $20K-$40K, $40K-$60K, $60K-$100K, >$100K). (Note: There were no missing data on Census region. However, approximately 6 percent of NSYR observations were missing on income. Consequently, income was imputed using the following variables in this order of assigned importance: resident father figure's education, resident mother figure's education, marital status, homeownership status, and race.)
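
A post-stratification adjustment of the kind described above can be sketched as follows. The function name, the record layout, and the numbers are hypothetical, and the sketch assumes adjustment within joint region-by-income cells; whether NSYR adjusted to joint cells or to separate margins is not stated here.

```python
from collections import defaultdict


def poststratify(records, targets, weight_key="base_weight", cell_key="cell"):
    """Scale base weights so each cell's weighted total equals its target.

    records -- list of dicts, each with a cell label and a base weight
    targets -- dict mapping cell label to the population total for that cell
    Returns the adjusted weights in the same order as `records`.
    """
    # Sum of base weights observed in each cell.
    cell_sums = defaultdict(float)
    for r in records:
        cell_sums[r[cell_key]] += r[weight_key]

    adjusted = []
    for r in records:
        cell = r[cell_key]
        # Ratio of the population total to the weighted sample total.
        factor = targets[cell] / cell_sums[cell]
        adjusted.append(r[weight_key] * factor)
    return adjusted


# Made-up example: cells are (census region, income bracket) pairs.
sample = [
    {"cell": ("South", "<$20K"), "base_weight": 1.0},
    {"cell": ("South", "<$20K"), "base_weight": 3.0},
    {"cell": ("West", ">$100K"), "base_weight": 2.0},
]
population = {("South", "<$20K"): 400.0, ("West", ">$100K"): 100.0}
print(poststratify(sample, population))
```

After adjustment, the weights in each cell sum to that cell's population total while keeping their relative sizes within the cell.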

Nweight2 is a normalized version of rweight2. Normalization rescales the weights by a constant so that the weighted data have the same N as the unweighted sample. Raw weights, by contrast, weight the sample up to the size of the population. We recommend the use of raw weights when using software developed for analysis of survey data, e.g., Stata or SAS, especially when using commands designed for survey analysis such as "svymean" or "svyregress" in Stata. The only exception is when software documentation specifically requests that users normalize the weights before estimation. It is the data user’s responsibility to determine whether raw or normalized weights should be used in an analysis.

The weights provided for Wave 1 data are weights for U.S. teenagers, not for parents of teenagers, which would require a different, and as yet unconstructed, weight.
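
The relationship between a raw weight and its normalized version can be sketched as follows. The function name and the example values are hypothetical; the only assumption is the standard definition, in which every raw weight is multiplied by N divided by the sum of the raw weights.

```python
def normalize_weights(raw_weights):
    """Rescale raw weights so that they sum to the number of cases."""
    n = len(raw_weights)
    total = sum(raw_weights)
    if n == 0 or total <= 0:
        raise ValueError("need at least one case and a positive weight total")
    # A single constant factor preserves the relative size of every weight.
    factor = n / total
    return [w * factor for w in raw_weights]


# Three made-up raw weights that sum to a population of 600.
normalized = normalize_weights([100.0, 300.0, 200.0])
print(normalized)       # relative sizes are unchanged
print(sum(normalized))  # equals the number of cases
```

Because only a constant changes, weighted means and proportions are the same under either version; what differs is the weighted N, which matters for software that treats the sum of weights as the sample size.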
Data Collection
Date Collected: July 2002 to April 2003
Funded By
The Lilly Endowment, Inc.
Collection Procedures
The survey was conducted from July 2002 to April 2003 by researchers at the University of North Carolina at Chapel Hill using a random-digit-dial (RDD) method, employing a sample of randomly generated telephone numbers representative of all household telephones in the 50 United States, including Alaska and Hawaii. The national survey sample was arranged in replicates based on the proportion of working household telephone exchanges nationwide. This random-digit-dial method ensures equal representation of listed, unlisted, and not-yet-listed household telephone numbers. Eligible households included at least one teenager between the ages of 13 and 17 living in the household for at least six months of the year. To randomize the selection of respondents within households, and so to help attain representativeness of age and gender, interviewers asked to conduct the survey with the teenager in the household who had the most recent birthday. Parent interviews were conducted with either a mother or a father, as they were available, although interviewers asked to speak with mothers first, on the belief that they may be better qualified to answer questions about their families and teenagers. Step-parents, resident grandparents, resident partners of parents, and other resident parent-like figures were also eligible to complete the parent portion of the survey.

Sampling Procedures
An RDD telephone survey sampling method was chosen for this study because of the advantages it offers compared to alternative survey sampling methods. Unlike school-based sampling, for example, our RDD telephone method was able to survey not only school-attending youth, but also school dropouts, home-schooled youth, and students frequently absent from school. Using RDD, we were also able to ask numerous religion questions which many school principals and school boards often disallow on surveys administered in school.

Principal Investigators
Dr. Christian Smith
Department of Sociology
University of Notre Dame

Dr. Lisa Pearce
Department of Sociology
University of North Carolina, Chapel Hill
Related Publications
Smith, Christian and Melinda Lundquist Denton. 2003. “Methodological Design and Procedures for the National Survey of Youth and Religion (NSYR).” Chapel Hill, NC: The National Study of Youth and Religion.

Smith, Christian and Melinda Lundquist Denton. 2005. Soul Searching: The Religious and Spiritual Lives of American Teenagers. Oxford: Oxford University Press.

See for a list of publications.
All publications using NSYR data must contain the following acknowledgement:
“The National Study of Youth and Religion, whose data were used by permission here, was generously funded by Lilly Endowment Inc., under the direction of Christian Smith, of the Department of Sociology at the University of Notre Dame and Lisa Pearce, of the Department of Sociology at the University of North Carolina at Chapel Hill.”
Missing Data
With the exception of a few created variables, the standard “.” indicator of missing data has not been used. In the actual dataset (but not the codebook survey instrument), for all variables, DON’T KNOW=777, REFUSED=888, and NOT ASKED=999. The 999–NOT ASKED response indicates a valid skip of the question. In other words, a respondent does not have a response for that question because they were not asked the question as a result of the intended skip patterns in the survey. In a very few cases, there is a value of 666, which indicates an INVALID SKIP. These are cases where a respondent was incorrectly skipped out of a question due to a computer or programmer’s error. The use of these codes instead of traditional missing data indicators means that analysts must be very careful to be aware of these cases in their analyses. Stata will not recognize 777, 888 or 999 as missing data. Therefore, unless you tell it otherwise, the stats package will include 777, 888, and 999 as actual values in your analysis. Always pay attention to the value of skip code indicators.
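
As an illustration of the recoding step described above, the following sketch converts the special codes to missing values before computing a mean. The function names and the example responses are hypothetical. It applies the codes to every value it is given, so check the codebook first for any variable where 666, 777, 888, or 999 could be a legitimate response.

```python
# Special codes used in place of the standard missing-data indicator.
MISSING_CODES = {
    666: "invalid skip",
    777: "don't know",
    888: "refused",
    999: "not asked",
}


def recode_missing(values, codes=MISSING_CODES):
    """Replace special codes with None so they are not treated as data."""
    return [None if v in codes else v for v in values]


def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if nothing is left."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Made-up responses on a 1-to-5 scale, with special codes mixed in.
responses = [1, 777, 3, 999, 5, 888]
cleaned = recode_missing(responses)
print(cleaned)                         # special codes become None
print(mean_ignoring_missing(cleaned))  # mean of 1, 3, and 5 only
```

Without the recoding step, the mean of the same six values would be dominated by the special codes rather than by the actual responses.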
Religion Variables
The religion questions in the NSYR survey are complex. We have worked hard to create interpretable religion variables to be used in analysis. All of the original variables have been left in the dataset. However, many of these are incomplete because they were asked of only a subset of the respondents or because they do not include open-ended verbatim responses. For consistency across analyses, we ask that all analysts use the standard integrated religion variables created by NSYR as the starting point for their analysis. These variables include the following:

W1 Parent Integrated Religion Variables
(and many created dummy variables [prefix=“bnp”])

W1 Teen Integrated Religion Variables
(and many created dummy variables [prefix=“bnt”])

Reltrad is the variable categorizing teens into major religious types (similar to preltrad and the RELTRAD method in Steensland et al. 2001). Reltrad was created based on the type of religious congregation that the teen said they attend. This was not as straightforward as it was for parents, however, because teens were not asked their exact denomination. If the church type they provided was not sufficient to place the teen into a reltrad category, additional variables from both the parent and the teen were used to make a determination. Those variables included: attend, attoth2, churtype, othchat, othchur, pattend, pdenom, prace, preltrad, prlspatt, prlspaty, prlspchu, prlspchu, prltnaty, prltnch1, prltnch2, prltnch3, psex, psprelig, relig1, relraise, reltrad, and teenrace.

When there was not enough information from the above variables to make a conclusive decision, the teen was categorized as “indeterminate”.

Reltrad is the variable that will be used for most analyses. Note that teens who say they are "not religious" or who never attend services may still be categorized into one of the religious categories if that is the tradition they named or if they occasionally attend services in that tradition with their parents.

Relcats is identical to reltrad with the exception that all of the (1) never-attenders and (2) self-identified "not religious" teens were moved into "Not Religious." This boosts the relative religiosity of the remaining religious types. For example, if the question of interest is the effect of being Evangelical Protestant compared to Mainline Protestant on smoking, relcats removes those who do not think of themselves as Evangelical Protestant (by saying they are "NR") and who never attend religious services.

Therefore, which of these variables is best to use depends significantly on the analysis. Using the wrong variable could produce results driven not by empirical reality but by the construction of the measure. The default is reltrad, but certain analyses might call for relcats instead. Feel free to inquire with NSYR if you have questions.
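
The difference between the two variables can be sketched as follows. The category labels, the function name, and the two boolean inputs are hypothetical stand-ins; the actual NSYR coding uses the dataset's own attendance and self-identification variables.

```python
def derive_relcats(reltrad, never_attends, says_not_religious):
    """Return a relcats-style category from a reltrad-style category.

    Teens who never attend services, or who identify as not religious,
    are moved to "Not Religious"; everyone else keeps their category.
    """
    if never_attends or says_not_religious:
        return "Not Religious"
    return reltrad


# A teen placed in a tradition by reltrad who never attends services.
print(derive_relcats("Evangelical Protestant", True, False))
# A teen in the same tradition who attends and does not say "not religious".
print(derive_relcats("Evangelical Protestant", False, False))
```

The first teen counts as Evangelical Protestant under reltrad but as Not Religious under relcats, which is why comparisons across traditions can differ depending on the variable chosen.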
Jewish Oversample
In addition to the original Wave 1 national sample of 3,290 cases, the NSYR also conducted surveys with a modest oversample of Jewish households (80 Jewish oversample completes in all) in order to obtain a large enough number of cases with which to conduct meaningful statistical analyses of Jewish youth. For a complete description of the oversample see the “Methodological Design and Procedures for the National Survey of Youth and Religion (NSYR)” (Smith and Denton, 2003). This oversample is NOT nationally representative and is meant to be used primarily to bolster Jewish-specific analyses. Therefore, we generally recommend that the oversample be excluded from most analyses of the general sample.

Researchers using simple descriptive statistics to make claims about the characteristics (but not the size) of the Jewish population may include the Jewish oversample cases in analyses, but should use the simple probability of selection weight (rweight1) to correct for the number of teenagers in the household. Researchers using multivariate statistics with religious category variables who feel comfortable using the Jewish oversample data do not, in our view, need to use the national weight (rweight2) if they control for region and family income, which that weight adjusts for. They should, however, use the simple selection probability weight (rweight1) and include the Jewish oversample dummy variable in their models, to statistically remove any possible unidentified effect of sampling bias inherent in those cases, net of the other independent variables the models control for. Alternatively, they may simply exclude the Jewish oversample cases from analysis and work with a significantly lower number of Jewish cases.
School Variables
The fielding of the survey began in July 2002 and continued until April 2003. During the summer months of the survey, parents and teens were asked to answer school questions in reference to the school they attended during the previous school year. On September 20, 2002, the wording of the survey instrument was changed to ask parents and teens to answer questions about the current school year. Because parents and teens did not necessarily complete their surveys at the same time, there are some cases where the parent completed the survey prior to the wording change and the teen completed it after the wording change. The data contain a flag for all parent surveys completed prior to the wording change on 9/20/02 (“p_flag”) and a flag for all teen surveys completed prior to the wording change (“t_flag”). We have created a variable, “pschdate,” that indicates whether both parent and teen answered questions about the last school year, the parent answered about the last school year and the teen answered about the current year, or both answered about the current school year.

The parent survey included a question that asked about the teen’s grade in school, “pschgrad”. Note that parents who completed the survey prior to 9/20/02 reported the teen’s grade in school during the 2001-2002 school year, while those who completed it after 9/20/02 reported the teen’s grade during the 2002-2003 school year. Again, in some cases, teens may have answered school questions about a different school year than the one their parents reported on. We have created a new, consistent variable, “pschgra2,” that indicates the grade in school for which the teen answered the school questions. In cases where the parent and teen answered questions about different school years, we added one year to the parent’s answer about their teen’s grade in school.
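
The logic behind “pschdate” and “pschgra2” can be sketched as follows. The function names, the category labels, and the use of True to mean "completed before the 9/20/02 wording change" are assumptions made for illustration; the actual coding of “p_flag”, “t_flag”, and “pschdate” should be taken from the codebook.

```python
def school_year_pattern(parent_before_change, teen_before_change):
    """Classify which school year the parent and the teen reported on."""
    if parent_before_change and teen_before_change:
        return "both_last_year"
    if parent_before_change and not teen_before_change:
        return "parent_last_year_teen_current_year"
    if not parent_before_change and not teen_before_change:
        return "both_current_year"
    # Parent after the change, teen before it: not among the three
    # combinations described in the text.
    return "not_described"


def consistent_grade(parent_reported_grade, parent_before_change, teen_before_change):
    """Grade for the school year that the teen's answers refer to."""
    if parent_before_change and not teen_before_change:
        # Parent reported on last year, teen on this year: add one grade.
        return parent_reported_grade + 1
    return parent_reported_grade


# Parent interviewed in August 2002, teen interviewed in October 2002.
print(school_year_pattern(True, False))
print(consistent_grade(9, True, False))
```

In this made-up case the parent reported grade 9 for the 2001-2002 school year, so the grade matching the teen's answers about 2002-2003 is one year higher.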