Abstract
Statistics education reformers have for years called for the use of real data in teaching introductory statistics (Ballman, 1997; Garfield et al., 2004; Hogg, 1991). Instructors now have ready access to cases, textbook problems, and other exercises accompanied by well-documented sets of real or realistic data. Online portals and data libraries provide a huge array of real data sets keyed variously to substantive topics and statistical techniques suitable for introductory students. The vast majority of these data sets have already been cleaned up by their preparers. As enriching as these resources are, relatively few of them offer students first-hand experience with the essential messiness of “real” real data. There is a good case to be made that data cleaning and preparation belong in introductory courses (Burger & Leopold, 2001). Certainly, problems of missing, dirty, and incomplete data are important topics within the field (Hoyle, 1971; Rubin, 1976; Wagner, 2002). Using field data from the Wright Brothers’ 1904 experiments, this case leads introductory or intermediate students through five common steps in data preparation and cleaning: standardizing the format of data records, deciding how to treat ambiguously recorded data, converting measurements to a single standard unit, detecting and resolving issues with outliers, and imputing missing data.