Abstract
“Big data” is rarely ready for analysis when it arrives on your desktop. The issues are familiar, yet they aren’t often what comes to mind in discussions of the “Sexiest Job of the 21st Century.” These tasks consume a massive fraction of project time, yet receive comparatively little attention in textbooks, professional journals or blogs. The dirty work includes assembling a data set from disparate sources, exploring data for various forms of messiness, imputing and otherwise addressing missing data, identifying and dealing with outliers, and recoding observations to regularize inconsistencies and reduce dimensionality. Fortunately for JMP users, the latest versions of JMP provide intuitive and highly visual tools to perform diagnostics and automate some of the nastier janitorial chores in the analytics workflow. This presentation demonstrates some of the ways that the Query Builder, other Tables menu platforms and Column utilities can expedite the dirty work, using some large, publicly available data sources.