Big Data: What Can Be Learned from How We Learn?

July 19, 2014

Viktor Mayer-Schönberger and Kenneth Cukier’s book Big Data: A Revolution That Will Transform How We Live, Work, and Think opens with a fascinating example: Google Flu Trends, which Google built to track the spread of the flu in 2009. Google’s algorithm compared five years of web logs (50 million of the most common search terms) with CDC data and produced a predictive model that “proved to be a more useful and timely indicator than government statistics.” Google Flu Trends could also make these assessments in real time, unlike the CDC, whose reports lag by a week or two.
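To make the idea of comparing search terms with CDC data concrete, here is a minimal sketch that ranks terms by how closely their weekly search volume tracks a flu rate. It is not Google's actual method: the Pearson correlation, the function names, and the toy numbers are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, flu_rate):
    """Sort search terms from most to least correlated with flu_rate."""
    scores = {term: pearson(counts, flu_rate) for term, counts in term_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly figures, invented purely for illustration.
flu_rate = [1.0, 1.5, 2.5, 4.0, 3.0, 1.5]
term_counts = {
    "flu symptoms": [10, 16, 24, 41, 29, 14],
    "basketball scores": [30, 28, 31, 29, 30, 32],
    "sunscreen": [40, 35, 20, 5, 15, 33],
}

ranked = rank_terms(term_counts, flu_rate)
for term, score in ranked:
    print(f"{term}: {score:+.2f}")
```

A term that merely rises and falls with the season can score as well as one that is actually about the flu, which is one reason a model built this way can drift.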

You would think this would be an unqualified success for big data, a real-world demonstration of its power.

Well…not so much.

Google Flu Trends missed the swine flu pandemic, overestimated seasonal flu in 2012 by 50% and, according to Science, has overestimated the prevalence of flu in 100 of the last 108 weeks. In the Science article, Harvard researchers cautioned that “the core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”

So what does the supposed Big Data era hold for education? Mayer-Schönberger and Cukier spend only two paragraphs or so on the subject, sketching two opposing outcomes. First, there’s the possibility of big data as a knee-jerk pseudo-solution for poor test scores:

“Education seems on the skids? Push standardized tests to measure performance and penalize teachers or schools that by this measure aren’t up to snuff. Whether the tests actually capture the abilities of schoolchildren, the quality of teaching, or the needs of a creative, adaptable modern workforce is an open question, but one that the data does not admit.”

Second, there’s big data used as a diagnostic tool to course-correct teaching:

“[Udacity, Coursera, and edX] track the web interactions of students to see what works best pedagogically. Class sizes have been at the level of tens of thousands of students, producing extraordinary amounts of data. Professors can now see if a large percentage of students have rewatched a segment of a lecture, which might suggest they weren’t clear on a certain point. In teaching a Coursera class on machine learning, the Stanford professor Andrew Ng noted that around 2,000 students got a particular homework question wrong, but produced the exact same incorrect answer. Clearly, they were all making the same error. But what was it?

“With a little bit of investigation, he figured out that they were inverting two algebraic equations in an algorithm. So now, when other students make the same error, the system doesn’t simply say they’re wrong; it gives them a hint to check their math. The system applies big data, too, by analyzing every forum post that students have read and whether they complete their homework correctly to predict the probability that a student who has read a given post will produce correct results, as a way to determine which forum posts are most useful for students to read. These are things that were utterly impossible to know before, and which could change teaching and learning forever.”
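The two analyses described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Coursera's actual system: the function names, the min_readers threshold, and the sample data are all hypothetical, and the "probability" here is just the observed share of a post's readers who answered correctly.

```python
from collections import Counter, defaultdict


def most_common_wrong_answer(submissions, correct_answer):
    """Return (answer, count) for the most frequent incorrect answer, or None."""
    wrong = Counter(ans for ans in submissions if ans != correct_answer)
    return wrong.most_common(1)[0] if wrong else None


def post_success_rates(reads, got_it_right, min_readers=2):
    """Share of each post's readers who answered the homework correctly.

    reads: iterable of (student_id, post_id) pairs.
    got_it_right: dict mapping student_id to True/False.
    Posts with fewer than min_readers readers are skipped as too noisy.
    """
    readers = defaultdict(set)
    for student, post in reads:
        readers[post].add(student)
    rates = {}
    for post, students in readers.items():
        if len(students) >= min_readers:
            correct = sum(got_it_right.get(s, False) for s in students)
            rates[post] = correct / len(students)
    return rates


# Hypothetical sample data, invented purely for illustration.
submissions = ["42", "24", "24", "42", "24", "7"]
top_error = most_common_wrong_answer(submissions, correct_answer="42")

reads = [("ana", "p1"), ("ben", "p1"), ("cy", "p1"),
         ("ben", "p2"), ("dee", "p2"), ("ana", "p3")]
got_it_right = {"ana": True, "ben": True, "cy": False, "dee": False}
rates = post_success_rates(reads, got_it_right)

print(top_error)
print(rates)
```

A high rate for a post shows only that its readers tended to answer correctly, not that the post caused the improvement; stronger students may simply read more of the forum.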

How Educational Institutions Are Using Big Data

Education technology expert Ellen Wagner directs the Predictive Analytics Reporting Framework. She uses data from 1.8 million students to help predict whether a college freshman will drop out or graduate, and she analyzes the patterns to see which interventions could increase the number of college graduates.

Every incoming student at the University of Texas at Austin is entered into the Dashboard (a giant data set based on students from the past decade). The system runs 16 different analyses to see which freshmen need extra support. Southern Illinois University analyzed applicants’ grade point averages and test scores to get a better picture of likely college success, an effort that has resulted in minor improvements in both retention and GPAs. The University of Hawaii used data to learn that students who took at least 15 credits were more likely to eventually graduate, which led to the creation of a customized set of courses for incoming freshmen.

According to the 2014 Wall Street Journal article Big Data Enters the Classroom: Technological Advances and Privacy Concerns Clash, Renaissance Learning has “data on 10.7 million students across the country, who regularly take quizzes through the company's portal. Chief Executive Jack Lynch says he believes soon it will be possible for the country to drill down to find out which states or districts are doing best at setting up their curricula or teaching fractions.”

But the collection of student data isn’t universally viewed as an altruistic cure-all for education’s ills. In April 2014, inBloom (a non-profit service for managing student data) lost its last client, the state of New York, over privacy concerns. Other states had already dropped the service after protests from parents and education activists worried that student data would be exploited, either for commercial purposes or through hacking. According to a recent study by Fordham University Law School, 95 percent of schools and districts rely on a mix of third-party cloud providers for data storage and internal data mining, with less than 7 percent of these vendors restricting the commercial use of student information. Separately, the University of Maryland and Indiana University suffered data breaches that exposed student Social Security numbers and other personal information.

Yet the authors of Big Data maintain that, despite the concerns, the big data phenomenon is here to stay, and that “the ideal of identifying causal mechanisms is a self-congratulatory illusion; big data overturns this.” We’re harvesting an unprecedented amount of information with no concrete plan for how to use or manage it. Still, companies and educational entities sense the value (and power) of this data, even if its most practical uses are only truly revealed in the years ahead.

As data piles up, the authors assert, correlation has replaced causation and, with it, the scientific revolution. But human progress has come from posing meaningful questions (the why?) and from careful experimentation and analysis to reach conclusions. So even if big data is poised to upend all of that, we should think twice before rushing past observations in our relentless pursuit of practical conclusions.