Trends, developments and what the past year of sports and politics taught us about variability and statistical predictions.
By James J. Swain
“It is difficult to make predictions, especially about the future.”
– Danish saying, variously attributed to Niels Bohr or Yogi Berra
We were reminded several times last year that variability can confound statistical predictions and that unlikely events do occur. Upsets in sports and politics are always news, since having the underdog beat the “sure thing” is surprising and noteworthy. What is exciting in sports is unexpected in politics, since we expect our predictions to do better when the business is serious. We certainly don’t expect to see another “Dewey Wins!” headline, but both the Brexit vote and Trump’s election clearly confounded consensus predictions. In the latter case, the actual margins in several key states were very small – but in politics as in sports a win is a win.
It was also noteworthy that while data-savvy campaign teams seemed to be the story in the previous election cycle, Trump’s campaign seemed to demonstrate that they weren’t essential. The savvy predictions may have been correct, yet an 80 percent chance of winning is not a certainty, and the less likely outcome is still possible.
Upsets in statistical prediction were not the only big story in statistics this year. The inability of researchers to replicate published experiments in several fields, such as psychology, has called published experimental results into question. It has also led to revisions in thinking about the old standby, the p-value. For instance, in one study of 100 articles in top psychology journals, only about 36 percent of the significant results were successfully replicated. Last May the American Statistical Association issued a statement condemning the use of any single measure, such as p-values, as a substitute for scientific reasoning. One journal, Basic and Applied Social Psychology, has eliminated their use altogether.
Problems with an overreliance on the p-value have been known for years. In traditional hypothesis testing, the p-value is the probability, computed under the null hypothesis, of observing a statistic at least as extreme as the one actually observed. The null hypothesis is rejected when the p-value is sufficiently small, under the assumption that the alternative is the more likely explanation. Of course, as the number of experiments grows, it becomes increasingly likely that at least one of them yields a “significant” result (i.e., one with a low p-value) by chance alone, as quantified by the Bonferroni inequality. That is why running many experiments and reporting only the “significant” ones distorts the actual p-value.
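The effect of running many tests can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the tests are independent and that every null hypothesis is true, and the function names are hypothetical rather than taken from any surveyed product. It compares the exact chance of at least one false positive with the Bonferroni upper bound, and shows the per-test level that the Bonferroni correction would require.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_bound(alpha: float, m: int) -> float:
    """Bonferroni upper bound on that chance (holds even without independence)."""
    return min(1.0, m * alpha)


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the overall false-positive chance at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20, 100):
        exact = familywise_error_rate(alpha, m)
        bound = bonferroni_bound(alpha, m)
        per_test = bonferroni_threshold(alpha, m)
        print(f"m={m:3d}  exact={exact:.3f}  bound={bound:.3f}  per-test level={per_test:.5f}")
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious “significant” result is already about 64 percent, which is why a lone reported p-value of 0.05 from such a batch means far less than it appears to.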
One way to deal with the uncertainty about what a p-value means is through experimental replication, which can either confirm the noteworthy result or fail to do so. In the latter case, the lack of a significant result in the replication suggests that the first was simply a “false positive.” Since journals generally prefer novel results to replication of existing results, there is little incentive for independent replication.
Software for Statistics
The goal in any statistical investigation is to bring some insight forth from the data, whether confirmation of a research hypothesis, or the reassurance that some process is still ticking along at the proper precision and regularity, or in building a usable model. To obtain these useful results, software must be able to perform a variety of functions including data acquisition and editing, presentation of results or relations among variables, transformations as needed, and computations to support the analysis.
Computers were once human, as the recent hit film “Hidden Figures” illustrates. At Langley, the best computers were prized for their insight into the underlying analysis and physical processes as well as for their computations [1]. The best modern software should provide the same assistance: both the computations that we choose and the tools to pursue further analyses that the initial results suggest. The investigation is usually iterative, using one result to suggest alternative approaches and further experiments.
Software will also include the ability to compute critical values from reference sampling distributions such as the normal, t and F, from which p-values (for instance) can be computed. In fact, many of our critical mathematical and statistical tables were first computed by human computers in the early part of the last century. This is noted in another book about human computers, “When Computers Were Human” [2].
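As a small illustration of what once required printed tables, the following sketch uses only the Python standard library to obtain a critical value and a p-value from the standard normal distribution. The helper names are hypothetical. The t and F distributions are not in the standard library, so they are left out here; a statistical package or a library such as SciPy would be needed for those.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Critical value of the standard normal for significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return STD_NORMAL.inv_cdf(1.0 - tail)


def z_p_value(z: float, two_sided: bool = True) -> float:
    """p-value for an observed z statistic.

    Two-sided: chance of a value at least as far from zero as z.
    One-sided: chance of a value at least as large as z (upper tail).
    """
    if two_sided:
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return 1.0 - STD_NORMAL.cdf(z)


if __name__ == "__main__":
    print(f"two-sided 5% critical value: {z_critical(0.05):.4f}")
    print(f"two-sided p-value for z = 2.5: {z_p_value(2.5):.4f}")
```

The first line reproduces the familiar table entry of roughly 1.96 for a two-sided test at the 5 percent level.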
Software offers more than simply computations. Exploratory analysis was in part designed to generate pictures of the data that could be assembled quickly and by hand – dot plots, stem-and-leaf displays and box plots, for instance – trading complexity of computation for insight. Increasingly, multiple plots are provided in arrays or at the margins of other plots. For instance, box plots or histograms display the marginal distributions while the central plot provides the scatter plot. In multivariate investigations, a two-dimensional array of two-dimensional scatter plots helps the analyst visualize higher dimensional relationships. The best software lets the analyst manipulate plots interactively to identify points or sets of points that are noteworthy (e.g., outliers) or to transform the variables within a graph. This is a particular strength of the JMP software.
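To show how little computation these hand-drawn displays demand, here is a minimal sketch of a stem-and-leaf display. It assumes a non-empty list of non-negative integers, with the tens as stems and the units as leaves; the function name is hypothetical, and real packages offer far more flexible versions.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display.

    Assumes a non-empty collection of non-negative integers;
    the stem is the tens part and the leaf is the units digit.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


if __name__ == "__main__":
    sample = [12, 15, 21, 34, 30, 58, 33, 27]  # made-up illustrative data
    print("\n".join(stem_and_leaf(sample)))
```

The display keeps every data value readable while still showing the shape of the distribution, which is the trade that made it attractive before plotting software was common.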
Software provides a greatly enhanced range of graphical displays. Graphics are an excellent way to visualize data – to see distributions and commonalities across variables or in location. Data can also be summarized geographically. A recent popular-interest article in The New York Times is representative of the possibilities. In the 2016 presidential election results, voting for Donald Trump was more highly correlated with certain popular television shows than with presidential voting in the last election. The article mapped the popularity of 50 television shows across the counties of the United States and correlated those patterns with the election results, paralleling the cultural divide remarked upon during the election. The correlation is more easily understood graphically than numerically [3].
Finally, good statistical software can assist in the design of experiments. A good analysis, often in the context of the old PDCA cycle of “plan, do, check and act,” begins with a question and a plan for the collection of experimental data. Software can be used to assist in sample size computations through power analysis, or to generate specialized designs for experiments in one or more variables.
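A sample size computation of this kind can be sketched in a few lines. The example below is an approximation, not the method of any particular package: it assumes a two-sided comparison of two group means with equal group sizes, uses the normal approximation rather than the t distribution, and expresses the difference to be detected as a standardized effect size (difference in means divided by the common standard deviation). The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test.

    Uses the normal approximation
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / effect_size) ** 2,
    where effect_size is the standardized difference in means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```

Because the normal approximation slightly understates the requirement for small samples, dedicated software that works with the t distribution will typically report a marginally larger number.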
Modern software has the additional advantage that it opens analysis to a wider circle of individuals who would not otherwise be able to perform the computations themselves. Since hand computation is less of a requirement, introductions to statistics are accessible to a wide array of individuals. The American Statistical Association sponsors teacher clinics for classes and poster competitions at the K-12 level, and AP statistics courses are growing quickly as well.
Software Survey Products
This survey of products is an update of the survey published in 2015. The biennial statistical software products survey in this issue provides capsule information about 19 products selected from 13 vendors. The tools range from general packages that cover the important techniques of inference and estimation to specialized products for activities such as nonlinear regression, forecasting and design of experiments. The product information contained in the survey was obtained from product vendors and is summarized in the following tables to highlight general features, capabilities and computing requirements, and to provide contact information. Many of the vendors have their own websites for further, detailed information, and many provide demonstration programs that can be downloaded from these sites. No attempt was made to evaluate or rank the products, and the information provided comes from the vendors themselves. The survey will be available on the Lionheart Publishing website (http://www.orms-today.org/surveys/sa/sa-survey.html). Vendors that were unable to make the publishing deadline will be added to the online survey.
Products that provide statistical add-ins for use with spreadsheets remain popular and provide enhanced specialized capabilities for spreadsheets. The spreadsheet is the primary computational tool in a wide variety of settings, familiar and accessible to all. Many procedures of data summarization, estimation, inference, basic graphics and even regression modeling can be added to spreadsheets in this way. An example is the Unistat add-in for Excel. The functionality of products for use with spreadsheets continues to grow and now includes risk analysis and Monte Carlo sampling, as in Oracle Crystal Ball.
Dedicated general and special purpose statistical software generally has a wider variety and depth of analysis than is available in the add-in software. For many specialized techniques such as forecasting, design of experiments and so forth, a statistical package is appropriate. In general, statistical software plays a distinct role on the analyst’s desktop, and provided that data can be freely exchanged among applications, each part of an analysis can be made with the most appropriate (or convenient) software tool.
An important feature of statistical programs is the importation of data from as many sources as possible, to eliminate the need for data entry when data is already available from another source. Most programs have the ability to read from spreadsheets and selected data storage formats. Within the survey we observe several specialized products, such as STAT::FIT, which are more narrowly focused on distribution fitting than general statistics, but of particular use to developers of models for stochastic systems, reliability and risk.
James J. Swain ([email protected]) is a professor in the Department of Industrial and Systems Engineering and Engineering Management at the University of Alabama in Huntsville. He is a member of ASA, INFORMS, IIE and ASEE.
References
1. Margot Lee Shetterly, 2016, “Hidden Figures,” William Morrow.
2. David Alan Grier, 2005, “When Computers Were Human,” Princeton University Press.
3. Josh Katz, 2016, “‘Duck Dynasty’ vs. ‘Modern Family’: 50 Maps of the U.S. Cultural Divide,” The New York Times, The Upshot, Dec. 27. Available online at: https://www.nytimes.com/interactive/2016/12/26/upshot/duck-dynasty-vs-modern-family-television-maps.html?_r=0