Problematic Statistics: Why Gathering Data Remains a Barrier

Beyond the applicant data that Emory would, reasonably, be unable to provide, this project runs into several other data problems. Here are just a few.

1. We could not reasonably expect Emory to release applicant information (at least for recent years), even anonymously. Released data on accepted students could reveal details of the selection process that could very easily be called into question. This lack of information prevents any project such as this one from being grounded in statistically significant results.

Additionally, because the college admissions process is so heavily guarded, it would create a major headache if data on admitted versus rejected students were ever leaked to the public. Especially in the current social environment around college acceptances, such information could create a basis for lawsuits.

2. To account for all the variables within this project, we need to look at socioeconomic data for our focus areas (Buckhead, College Park, etc.). However, even after narrowing our locations to these small areas, there is still massive variation. Let us, for a moment, suppose that it is possible to gather the necessary data, that all of it is truthful, and that there is enough of it to account for statistical variance. Even with these three factors guaranteed, it would still be difficult to decide on a single number that uniformly and fairly represents an entire area. If we decide on the mean, ultra-high-income families or those with special circumstances would pull the figure upward, well above what a typical household earns. If we decide on the median, we are effectively ignoring the upper and lower echelons of the data, meaning that our final conclusion would be less accurate the farther we moved from the central data point.

3. Within this project, we also require government data on spending per student in certain areas. This, of course, brings in another complication as we decide how to deal with school districts within a county. As with household income, different districts receive different amounts of funding. How the individual schools actually use and report this money has shaky credibility at best. And even if we could bring all this data together, we would still need to differentiate between the sources of funding: private and public money are important to distinguish.
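To make the mean-versus-median problem from point 2 concrete, here is a minimal Python sketch. The income figures are invented purely for illustration and do not describe Buckhead, College Park, or any real area; the `summarize` helper is likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical household incomes (USD) for a single area.
# These numbers are made up for illustration only.
incomes = [28_000, 35_000, 41_000, 47_000, 52_000, 58_000, 64_000, 2_500_000]


def summarize(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


# Summary of every household, including the one ultra-high earner.
summary = summarize(incomes)

# Summary of the same area with the single highest income left out.
without_top = summarize(incomes[:-1])

print("All households:      ", summary)
print("Without top household:", without_top)
```

In this made-up sample, one very high income moves the mean far above what any other household earns, while the median barely shifts. The median, in turn, says nothing about how extreme that top household is, which is the trade-off described above.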

All of these challenges point towards a need to avoid sweeping empirical conclusions. In our investigation of trends in admissions, we must take care to justify how a certain set of data supports a certain conclusion. It may be more productive to form separate admissions conclusions based on very specific sets of requirements. Deciding how the data will be interpreted should be an ongoing, evolving process.
