The Amish have largely remained an enigma to social science researchers, due to a lack of large-scale data. By coding data from directories of Amish in Holmes County, Ohio, and the surrounding areas (which contain information on roughly one in every six Amish in the world), this project provides a new source of data that allows people to explore demographics, occupational shifts, and retention among a significant proportion of the Old Order Amish.

The investigator for this project, Benjamin McKune, was a graduate student at the Pennsylvania State University and a Research Associate at the ARDA. In March 2014, he tragically passed away before he could finish his Ph.D. This dataset contains the data that he collected for his dissertation.

Data File
Cases: 48,710
Variables: 675
Weight Variable: None
Data Collection
Date Collected: 1965, 1973, 1981, 1988, 1996, 2000, 2005, and 2010
Collection Procedures
The data come from directories of the Amish in Holmes County and the surrounding areas. These volumes contain detailed records of births, marriages, and the occupations of heads of households, and they are published and sold commercially as county directories.

This project focuses on the Holmes County area of Ohio for several reasons. First, Amish directories of this area contain alphabetical listings of each household and the occupation of each household head, which permits comparisons of occupational types within and between communities. The directories are an especially useful source of occupational data, as they contain the occupations of all living male household heads, with less than one percent of the listings containing incomplete information. Second, the directories have been published regularly, about every eight years since 1965, thus permitting an examination of occupational shifts for individual Amish men over time, as well as a study of the changing occupational structure of the Holmes County Amish population as a whole. Third, the directories contain data on approximately 85% of all Amish adherents in the Holmes County area, which is home to the largest Amish settlement of all, containing roughly one in every six of the Amish in the world. Fourth, the directories' listings of birth dates, marriage dates, parents, and children in each Amish household allow for a multigenerational analysis of occupational shifts.

The investigator used text-recognition software to convert scans of the directories into text. Because the software is at most 98% accurate, at least one in 50 characters on average would be wrong. After the data were checked and the errors corrected, a PHP script was used to convert the raw data into syntax that could be read into Personal Ancestral File, a genealogical software package. The program recognized duplicates based on: 1) first name, 2) last name, and 3) exact date of birth. Since parents and their children were listed systematically in the directories, it did not take long to match duplicated individuals and create a database that tracked the relationships of the people in the directories.
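
The three-part duplicate rule described above (first name, last name, exact date of birth) can be illustrated with a short sketch. This is not the investigator's PHP script or the genealogical program's own logic; the field names, the lowercase and whitespace normalization, and the example rows are all assumptions made for illustration.

```python
from collections import defaultdict

def match_key(record):
    """Build the matching key: first name, last name, exact date of birth.

    Lowercasing and stripping whitespace is an assumption added here;
    the source only says the three fields were compared.
    """
    return (
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["birth_date"],  # assumed ISO string, e.g. "1950-03-14"
    )

def group_duplicates(records):
    """Group records from different directory years that share a key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return groups

# Made-up rows; none of these are real directory entries.
records = [
    {"first_name": "John", "last_name": "Doe", "birth_date": "1950-03-14", "directory_year": 1988},
    {"first_name": "john ", "last_name": "Doe", "birth_date": "1950-03-14", "directory_year": 1996},
    {"first_name": "John", "last_name": "Doe", "birth_date": "1975-07-02", "directory_year": 1996},
]

groups = group_duplicates(records)

# For each person, list the directory years in which he appears.
linked = {key: [r["directory_year"] for r in rows] for key, rows in groups.items()}
```

Here the first two rows collapse into one person seen in 1988 and 1996, while the third row, which differs only in date of birth, stays separate. An exact match on date of birth is strict: a single misread digit would split one person into two, which is one reason the error checking had to precede this step.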

The database was then exported into Microsoft Excel. Another PHP script was written that converted the addresses into Google Maps queries. The investigator was then able to obtain the latitude and longitude for all Amish households listed in the directory. This geographical information was then transferred to ArcGIS to compute additional variables.
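
As a rough illustration of turning an address into a Google Maps query, the sketch below builds a search URL from address parts. It is written in Python rather than PHP, and the endpoint (the documented Google Maps URLs search action), the function name, and the example address are assumptions; the source does not say which query format the investigator used. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Google Maps URLs "search" action.
# The query format actually used for this project is not documented.
MAPS_SEARCH_URL = "https://www.google.com/maps/search/"

def maps_query_url(street, city, state="OH"):
    """Return a Maps search URL for one household address (hypothetical helper)."""
    address = f"{street}, {city}, {state}"
    # urlencode percent-encodes commas and turns spaces into "+".
    return MAPS_SEARCH_URL + "?" + urlencode({"api": 1, "query": address})

# Made-up address for illustration only.
url = maps_query_url("1234 Example Rd", "Millersburg")
print(url)
```

Obtaining latitude and longitude from such a query is a separate step that depends on the service's response and is not shown here.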
Principal Investigators
Benjamin McKune
Identifying Information
The original dataset contained information that could be used to identify specific Amish individuals and households, such as first name, last name, latitude, and longitude. Due to confidentiality concerns, the ARDA has excluded these variables from the dataset.
Outlying Values
Unfortunately, Benjamin McKune did not have the opportunity to clean all of the data before he passed away. The ARDA has recoded some values that were extreme outliers as missing. Scholars should take care when using this dataset to identify other outliers and potentially influential cases.
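
One common way to screen a variable for remaining outliers is the interquartile-range rule of thumb, sketched below. The criteria the ARDA used for its own recoding are not stated, so this is not a reproduction of that step; the variable, the values, and the multiplier k=1.5 are assumptions for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing (None)."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in observed if v < low or v > high]

# Made-up values for a hypothetical "number of children" variable.
children = [4, 6, 5, 7, 8, 5, 6, None, 7, 55]
flagged = iqr_outliers(children)
print(flagged)
```

With these values only 55 is flagged. Flagged cases are candidates for review rather than automatic deletion, since an extreme value may be a genuine record and not a text-recognition error.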
Church Codes for Children
The final set of variables in this dataset concerns the church codes for Amish individuals' children. The original dataset had variables for every directory year and for up to 20 children. Church codes were only recorded, however, for the first child (every directory year) and for the second child (every directory year except 2005). Those variables that did not contain any church codes have been removed from this dataset.

