Visualizing Matrices

Matrices are everywhere in Data Science. In fact, they are everywhere in Computational Science. Stanford’s Margot Gerritsen has a wonderful video on Visualizing Matrices.

First, she walks us through the basic structure of a matrix. Yes, a matrix consists of rows and columns. But it should be viewed as a system of equations with an underlying structural relationship among the coefficients.

Then, Gerritsen shows us a neat visualization trick called a “spy plot”: we turn every nonzero entry into a dot. This quickly reveals possible relationships among the coefficients.
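In that spirit, a spy plot takes only a few lines of matplotlib. (This is my own sketch of the technique; the matrix below is a synthetic tridiagonal example, not one from the video.)

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Build a synthetic 50x50 tridiagonal matrix, typical of 1-D physical systems.
n = 50
A = np.zeros((n, n))
i = np.arange(n)
A[i, i] = 2.0            # main diagonal
A[i[:-1], i[1:]] = -1.0  # superdiagonal
A[i[1:], i[:-1]] = -1.0  # subdiagonal

# The spy plot: a dot for every nonzero entry reveals the banded structure.
plt.spy(A, markersize=3)
plt.savefig("spy_plot.png")
```

The banded pattern jumps out immediately, which is exactly the point of the trick.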

Next, Gerritsen takes a matrix and represents it as a graph structure consisting of nodes and edges. Then comes the grand finale:

“Here’s an idea:

Give each node an electrical charge.
Make each edge a spring

Drop the whole system on the floor
and let it “wobble” until it finds its
optimal (minimal energy) state.”

Voila! The matrix now comes alive, revealing a range of beautiful forms. This is one of my favorite videos on the Internet.
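Gerritsen’s charge-and-spring recipe can even be sketched numerically: nodes repel like electrical charges, edges pull like springs, and we iterate until the layout settles. The toy implementation below is my own sketch of the idea, not code from the video.

```python
import numpy as np

# A small graph: 4 nodes and 4 edges (the nonzero pattern of a matrix).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
rng = np.random.default_rng(1)
pos = rng.standard_normal((4, 2))  # random initial "drop on the floor"

for _ in range(200):
    # Repulsion: every pair of nodes pushes apart like equal charges.
    delta = pos[:, None, :] - pos[None, :, :]
    dist2 = (delta ** 2).sum(axis=-1) + 1e-9
    np.fill_diagonal(dist2, np.inf)           # no self-repulsion
    force = (delta / dist2[..., None]).sum(axis=1)
    # Attraction: each edge acts as a spring pulling its endpoints together.
    for i, j in edges:
        force[i] -= 0.1 * (pos[i] - pos[j])
        force[j] -= 0.1 * (pos[j] - pos[i])
    pos += 0.05 * force                       # let the system "wobble"
```

After the loop, `pos` holds a low-energy layout of the graph, which is the essence of the force-directed drawings in the video.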

On Adaptive Learning: A Partial Response to Audrey Watters

In “The Algorithmic Future of Education” Audrey Watters offers a sweeping critique of adaptive learning, arguing that “robot tutors” (her phrase) don’t benefit learners, that they are nothing new under the sun, and that, worst of all, they represent a cunning ploy by industry (in league with administrators and managers) to “subjugate labor” and to create “austerity”. According to Watters, adaptive learning and “algorithms” propel us towards a mechanized world which devalues learning, devalues labor, and devalues “caring”. Her latest piece on adaptive learning is part of her general skepticism and ongoing criticism of the educational technology industry.

Watters is one of the few observers and critics struggling with some of the deeper questions about educational technology. What is its shape? Who benefits? Where is it going? Who controls it? What does it mean for privacy and autonomy? Of late, she has also set her sights on some of the utterly nonsensical claims coming out of the adaptive learning world. (No researcher in their right mind would make such claims; nevertheless, the claims are out there and unfortunately cast a shadow over the entire industry.) Audrey Watters is a gadfly in the best sense of the term and her arguments need to be taken seriously.

If Watters is asking the right questions, her answers are at times questionable. In a future post I hope to provide a fuller response to “The Algorithmic Future of Education.” In this post I have a narrower aim: to point out that Watters’s characterization of adaptive systems rests on a factual error, which leads her to sloppy generalizations:

“What makes ed-tech programming “adaptive” is that the AI assesses a student’s answer (typically to a multiple choice question), then follows up with the “next best” question, aimed at the “right” level of difficulty. This doesn’t have to require a particularly complicated algorithm, and the idea actually based on “item response theory” which dates back to the 1950s and the rise of the psychometrician. Despite the intervening decades, quite honestly, these systems haven’t become terribly sophisticated, in no small part because they tend to rely on multiple choice tests.”

The assertion is simply incorrect. It’s also not the first time Watters has portrayed adaptive systems as based primarily on multiple-choice questions, powered algorithmically by item response theory (IRT), and unchanged in the “intervening decades”. She repeats the claim in her TEDxNYED talk, and the theme runs as a current through her other publications.

Why is this incorrect? First, Aleks, one of McGraw-Hill Education’s adaptive learning platforms, has never used multiple-choice questions, and it has been around for nearly a decade. Second, the algorithmic theory behind Aleks (Knowledge Space Theory) is unrelated to Item Response Theory (IRT). Third, unlike Knewton, Aleks is not a black box and never has been. Its algorithmic basis was developed by mathematicians and cognitive scientists working at UC Irvine and the University of Brussels. As a result, there is an extensively published and peer-reviewed research trail on how it works and how it aims to advance learning outcomes.
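The core idea of Knowledge Space Theory can be conveyed in a few lines: a knowledge structure is a family of feasible knowledge states, and a learner’s “outer fringe” is the set of items they are ready to learn next. The following is a toy illustration of that idea only — not Aleks’s actual engine, whose structures contain vastly more items and states.

```python
# Toy knowledge structure over three items {a, b, c}: the family of
# feasible knowledge states a learner can be in.
structure = [set(), {"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}]

def outer_fringe(state, structure):
    """Items the learner is ready to learn next: those whose addition
    to the current state yields another feasible state."""
    fringe = set()
    for other in structure:
        if len(other) == len(state) + 1 and state < other:
            fringe |= other - state
    return fringe

print(outer_fringe({"a"}, structure))  # -> {'b'}
```

Notice there is no notion of item difficulty or multiple-choice scoring here; the adaptivity comes from the combinatorial structure of what states are feasible.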

The same can be said of modern Intelligent Tutoring Systems (ITS). Even a cursory investigation of ITS reveals that a shift occurred in the 1970s and 1980s away from Computer Assisted Instruction (CAI) systems, which were based on behaviorist assumptions, to ITS which tried to incorporate advances in computer science, cognitive psychology, and artificial intelligence. Intelligent Tutoring Systems have never been about multiple-choice questions nor have they drawn primarily on IRT as their underlying pedagogical framework.

As a cultural critic and cultural anthropologist, Watters will, I am sure, appreciate the importance of genealogy. IRT emerged in response to the increasing emphasis on high-stakes testing in the US. IRT has also been associated with regimes for measuring “intelligence”. Aleks, and systems like it, have an entirely different genealogy. They were designed to provide learners with feedback. The world of IRT is the world of summative assessments. The world of well-designed adaptive systems such as Aleks, on the other hand, is the world of formative assessments. There is overwhelming evidence in learning science that formative assessments are among the most important levers we have for improving learning outcomes. Testing is not Learning. Researchers working in this space know it and get it. They have also known it for a long time.

The best research also shows that our goal should never be to replace teachers. Our goal should be to empower teachers and support student-centered learning environments. Some of us believe that technology has a place in realizing this goal. Some of us also believe that which technologies are effective, and in which contexts, should be demonstrated by research and evidence, not anecdotes. 

[Note: The views stated in this blog are my own. I am not speaking on behalf of my employer, McGraw-Hill Education. This post has also not been reviewed or cleared by my employer.]

Lorena Barba’s Keynote: Computational Thinking is Computational Learning

“If you can do mathematics with a dynamic technology instead of with a static one, then perhaps you can do real mathematics instead of denatured mathematics and thereby open the possibility of a Samba School effect.” —Seymour Papert

The term computational thinking was coined by MIT’s Seymour Papert and popularized by Carnegie Mellon’s Jeannette Wing.  In a series of papers and talks Wing, now Corporate Vice President at Microsoft Research, has argued that the approach to problem solving and design which underlies computer science training is a “universally applicable attitude and skill set that everyone, not just computer scientists, should be eager to learn and use.” Wing’s ideas have gained some traction in the educational community, but not as much as I think they deserve.

In a brilliant keynote at SciPy2014 Lorena Barba riffs off Jeannette Wing but also brings us full circle to a number of Papert’s important ideas in educational design. Barba suggests that computational thinking is itself a form of learning, and that with tools such as the IPython Notebook we can finally begin to shape educational experiences in science and mathematics that are truly interactive and based on sharing and collaboration. Barba is first and foremost a scientist. But we look forward to seeing her emerge as a powerful and eloquent voice in education.

Taking Control of Your Data: HTTP with Accountability (“HTTPA”)


In the world of Big Data there is increasing concern about privacy. But the conventional wisdom, even among many “experts”, is that privacy is a relic of the past. The thinking goes: “Companies and organizations can make a best effort at securing private information but in the Internet Age a person’s data trail, once it is in external hands, is no longer subject to control.” As Cory Doctorow points out in “The Curious Case of Internet Privacy”, we have all been led to accept the Faustian bargain that being on the Internet means trading privacy for services.

Does it really have to be this way?

Tim Berners-Lee, the inventor of the World-Wide Web, and his team of researchers at MIT are taking this issue head-on by paving the way for a new protocol called “HTTPA”. HTTPA stands for “HTTP with Accountability”.

The basic idea is that each item of private data would be tagged with a “uniform resource identifier” (URI) specifying conditions of use of that data: “Here you go. I am giving this information about me, but I am also telling you how you can use it.” As the data winds itself through the bowels of various databases the URI also provides the basis for constructing an audit trail:

“When the data owner requests an audit, the servers work through the chain of derivations, identifying all the people who have accessed the data, and what they’ve done with it.”
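To make the idea concrete, here is a conceptual sketch in Python of data tagged with a usage-conditions URI and an append-only audit trail. The class and names are hypothetical illustrations of the concept only — this is not the actual HTTPA protocol or wire format.

```python
# Conceptual sketch: NOT the actual HTTPA specification.
# It only illustrates two ideas from the article: (1) data carries a URI
# describing its conditions of use, and (2) every access is recorded so
# the owner can later request an audit.
class TaggedData:
    def __init__(self, value, usage_uri):
        self.value = value
        self.usage_uri = usage_uri   # URI stating how the data may be used
        self.audit_trail = []        # who accessed the data, and why

    def access(self, who, purpose):
        self.audit_trail.append((who, purpose))
        return self.value

record = TaggedData("alice@example.com",
                    "https://example.org/terms/no-third-party-sharing")
record.access("analytics-server", "aggregate statistics")
record.access("mail-service", "account recovery")
print(len(record.audit_trail))  # 2 accesses recorded for the owner's audit
```

In the real proposal the audit trail would be reconstructed across many servers rather than held in one object, but the shape of the accountability is the same.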

If it can be made to work, HTTPA will be a groundbreaking technology in support of privacy. Berners-Lee’s team sees the new protocol as voluntary:

“It would be up to software developers to adhere to its specifications when designing their systems. But HTTPA compliance could become a selling point for companies offering services that handle private data.”

Where and how will these audit trails be stored? Doesn’t that open up a major point of vulnerability if there is a central database that records everything I do? The technology behind HTTPA is intriguing. Its security rests on distributed hash tables, which are at the heart of peer-to-peer networks such as BitTorrent. This means, in part, that there is no central or “national” record of my activities and that all transactions would be secured with high-grade cryptography.
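The flavor of a distributed hash table fits in a few lines: keys and node names are hashed into one shared space, and each key lives on the “nearest” node, so no single server holds everything. This is a toy sketch of the concept, not a real DHT protocol such as Kademlia.

```python
import hashlib

def h(s):
    """Map a string into a large shared hash space."""
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

def node_for(key, nodes):
    """Store each key on the node whose hash is nearest the key's hash,
    so lookups need no central index."""
    return min(nodes, key=lambda n: abs(h(n) - h(key)))

nodes = ["node-a", "node-b", "node-c"]
print(node_for("alice-audit-record", nodes))
```

Any party can compute where a record lives from the key alone, which is what lets the network dispense with a central database.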

Of course, the devil is in the details. If HTTPA can be made to work, we can finally begin to unwind ourselves from the Internet’s seemingly inherent Faustian bargain. As Cory Doctorow has noted, letting users control their data does not have to destroy business or slow down analytics. Progressive companies will seize the duty to secure privacy as an opportunity to create value:

“Right now, the users and the analytics people are in a shooting war, but only the analytics people are armed. There’s a business opportunity for a company that wants to supply arms to the rebels instead of the empire.”


Measuring Education Quality: A First Look at US College Dropout Statistics

Turn on, Tune in, Drop Out

In the introduction to College Unbound: The Future of Higher Education and What It Means for Students, Jeffrey J. Selingo opens with Samantha Dietz’s story. A top student in high school, Samantha earns a 3.9 Grade Point Average (GPA), takes Advanced Placement and International Baccalaureate courses, and participates in the debate club, Harvard Model Congress, and the student newspaper. Although she is the first in her family to go to college, Samantha by all indicators is on the road to college success.

After applying to more than half a dozen schools Samantha enrolls at Fairleigh Dickinson University in New Jersey, primarily because it offers her “the most financial aid, nearly all of it in grants that wouldn’t have to be paid back”. Samantha struggles her entire first year and eventually drops out. Selingo writes:



“The story of Samantha Dietz is not unique. It reflects a broad, national trend in American higher education, where some 400,000 students drop out every year. (Note: I believe the number is much higher.)

“What Dietz failed to examine was Fairleigh Dickinson’s graduation rate. In 2006, only 38 percent of its students graduated within six years, a rate well below all of the other schools she had considered. The two other local schools on her list, Rutgers and Drew, graduated more than 70 percent of their students within six years. Though Fairleigh Dickinson was giving Dietz a boatload of money, her chances of emerging at the other end with a degree were pretty dismal.” 


What is an Acceptable Graduation Rate?

Let’s consider Samantha Dietz’s story writ large. Would you send your child to a 4-year college or university whose six-year graduation rate is below ten percent? As a student, would you incur a large debt knowing beforehand that your institution’s graduation rate is below ten percent? As an investor (i.e. taxpayer), would you continue to invest year in, year out in institutions whose graduation rates fall below ten percent? As an accreditor, would you continue to validate institutions whose graduation rates fall below ten percent? Would your decision be any different if the graduation rate were twenty-five percent? What about fifty percent? Where would you set the threshold? Do graduation rates matter in assessing education quality? If so, should graduation rate data be easily accessible for every institution? And how should the data be used and interpreted?

This is the first in a series of posts towards understanding how to measure education quality through data analysis. My goal is to stimulate dialogue among educators and policy makers. Another goal is to catalyze a community of practice among data analysts and educators interested in examining educational data.

In this first post I examine graduation rates at 4-year colleges and universities.

(The dataset in this post derives from the Delta Cost Project, which in turn is based on Integrated Postsecondary Education Data System (IPEDS) data as made available by the National Center for Education Statistics. I plan to share my code (Python) and analyzed datasets at my Github web site. If you notice any inaccuracies in the data or my analysis, please contact me at: alfred(dot)essa(at)gmail(dot)com.)

Graduation Rates and Variability

Let’s begin our analysis of US graduation rates with a frequently cited statistic:

In the US on average less than 60 percent of students seeking a bachelor’s degree at a 4-year institution complete that degree within six years.

Statistical thinking begins with averages. But every schoolboy knows that an average by itself doesn’t tell us very much.  How can we deepen our understanding of this statistic?  A good first approach is to consider the spread or “variability” in the data.

A Box Plot visualization shows at a glance not only the median (red line) but also the range of the middle 50% of observations (the top and bottom boundaries of the box, or rectangle). The whiskers indicate the maximum and minimum. (A Box Plot organizes the data set into quartiles: Q1, Q2, Q3, Q4.) Figure 1 below is a Box Plot of graduation rates for the years 2006–2009. We can see that the median is approximately .5 and that only a quarter of institutions have graduation rates above .6, or 60%.
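For readers who want to reproduce this kind of figure, a box plot takes only a few lines of matplotlib. The data below is synthetic (drawn from a Beta distribution centered near .5), since the IPEDS extract isn’t bundled here.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Synthetic graduation rates in [0, 1] standing in for the real IPEDS data.
rng = np.random.default_rng(0)
rates = rng.beta(5, 5, size=1500)

fig, ax = plt.subplots()
ax.boxplot(rates)  # box = middle 50% (Q1 to Q3), center line = median
ax.set_ylabel("Six-year graduation rate")
fig.savefig("gradrates_boxplot.png")

# The same quartiles the plot displays, computed directly:
q1, median, q3 = np.percentile(rates, [25, 50, 75])
```

Swapping in the real dataset (one column per year, or one per sector) produces Figures 1 and 2.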

Figure 1: Graduation Rates

Figure 2 below is a Box Plot which breaks down graduation rates for the year 2009 by category of institution: 4-Year Public, 4-Year Private Non-Profit, and 4-Year Private For-Profit. The respective medians are .46, .55, and .29.

Figure 2: Graduation Rates by Category

Graduation Rates and Education Quality

Let’s deepen our intuition of the dataset by displaying graduation rates not in the aggregate but for each institution. Each bubble is a 4-year institution. The size of the bubble is FTE (full-time-equivalent) enrollment. Private non-profit institutions are represented as green. Public institutions are blue. Private for-profit institutions are brown.

I have also taken the liberty of indicating the area in the graph corresponding to high-performers and low-performers.

The underlying intuition is that one measure of education quality is tuition versus graduation rate: relatively high graduation rates combined with low tuition would be one indicator of high performance. Conversely, relatively high tuition combined with low graduation rates would be one indicator of low performance.

The second visualization presents the same set of institutions, but the size of the bubble now represents Pell Grant dollars received. The Federal Pell Grant Program provides need-based grants to low-income students to promote access to postsecondary education. The amount of aid available in 2011 was $35.8 billion.


We return to our initial question: what is an acceptable graduation rate? We can see with these visualizations that a significant number of institutions have graduation rates below .25 and a number of them fall below .10.  In addition some of the institutions have large enrollments, relatively high tuition and are recipients of significant federal funding in the form of Pell grants.

Graduation Rates and Economic Cost – First Approximation

What is the economic cost of these low graduation rates? Let’s create a “toy model” to derive a first approximation, beginning with the economic cost of first-year attrition.

Suppose at University X we begin with a full-time freshman cohort of 10,000 students. Let’s assume a first-year retention rate of 75%, meaning that only 7,500 students return as sophomores. (The underlying data supports this assumption. The greatest attrition takes place during the first year.) Let’s further assume that taxpayers subsidize on average, either through federal or state funding, $2,500 per student per year. (The underlying data also shows that this is a very conservative estimate.) Given our two assumptions we can estimate that $6.25M of taxpayer money is lost in one year ($2,500 per student x 2,500 students) at this university and associated with this cohort.

If we scale the numbers, the wasted investment gets very large very quickly. Think about how many colleges and universities there are in each state. For 25,000 first-year dropouts it’s $62.5M. For 250,000 it reaches $625M. For 2,500,000 it’s $6.25B. That’s B for Billion and B for Big.
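The toy model is simple enough to put directly into code. (Note that the scaled dollar figures track counts of first-year dropouts: 25,000 dropouts × $2,500 = $62.5M, and so on.)

```python
def first_year_attrition_cost(dropouts, subsidy_per_student=2_500):
    """Taxpayer subsidy lost to students who never return for year two."""
    return dropouts * subsidy_per_student

# One cohort of 10,000 with 75% first-year retention -> 2,500 dropouts.
cohort, retention_rate = 10_000, 0.75
dropouts = int(cohort * (1 - retention_rate))
print(first_year_attrition_cost(dropouts))   # $6.25M for this one cohort

# Scaling up the number of first-year dropouts:
for d in (25_000, 250_000, 2_500_000):
    print(d, first_year_attrition_cost(d))
```

Every assumption is deliberately conservative, so the real waste is almost certainly larger.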

According to our model, this means that each year the cost to taxpayers of first-year attrition alone is in the hundreds of millions of dollars, and more likely in the billions. If we factor in attrition in subsequent years the economic cost gets even larger.

Based on a very simple calculation we can estimate that the annual economic cost (i.e. waste) of low graduation rates is easily in the tens of billions of dollars. The opportunity cost is much higher.

We also have not considered social costs. Consider Samantha Dietz’s story playing itself out hundreds of thousands of times each year.

Let’s add one more assumption to our toy model. Let’s assume that the student herself pays $2,500 per year towards her education. We then have a symmetrical set of escalating costs due to first-year attrition ($6.25M, $62.5M, $625M). This set of costs is borne by the student and translates into student debt. (A recent study by the Consumer Financial Protection Bureau indicates that student loans held or guaranteed by the federal government have crossed the astonishing $1 trillion mark. Student loan debt is exceeded only by mortgage debt and is now even greater than credit card debt.)

Top, Bottom, and Value Institutions – Year 2009

Below is a chart of institutions with lowest graduation rates (2009 data). For purposes of comparison I have included sector (1=public; 2=private, non-profit; 3=private, for-profit), first-year retention, tuition, and federal pell grant received by the institution. (I will provide the full data set in spreadsheet format so that you can run your own analysis.)


Next is a list of institutions with the highest graduation rates in 2009. The list is not surprising. It’s mostly the Ivy League schools and elite liberal arts colleges.


I have also included a list of “value institutions”. I define a value institution as having relatively high graduation rates and relatively low tuition and fees.



Preliminary Conclusions

Good data analysis is inherently Socratic. We pose an initial set of questions which lead to further questions. While we can’t reach any definitive conclusions from our initial analysis we can begin to pose a number of questions for further exploration.

First, a significant number of colleges have graduation rates below 25% and some even lower than 10%. An obvious next question is to overlay and correlate student preparation for college with success rates. It can be argued that institutions with high graduation rates enroll students who are well-prepared for college. Can we normalize the data to show which institutions achieve relatively high graduation rates with less well prepared students?

Second, most of the top performers, according to one measure (high graduation rate and low tuition), are public institutions. This seems to contradict the charge that public institutions are inherently inefficient compared to the private sector. Here also we need to take a deeper look. What do costs, efficiencies, and performance look like when we take into account state subsidies for public institutions?

Third, private for-profit institutions tend to fare the worst in terms of graduation rates, in some cases lower than 10%. A number of them also charge relatively high tuition. Are all private for-profit institutions in the same boat? Do some fare better than others? Should we be investing more in public institutions based on this data? Should we strengthen incentives for private institutions to spur further competition? Or both?

Fourth, a significant amount of federal grant money in the form of Pell grants goes to institutions with very low success rates. This was the most mind-boggling “information” hidden in the data. We seem to be wasting tens of billions of dollars in the name of providing greater access to higher education, but the money is simply not benefiting the students. Do Pell grants need to be reformed? What additional data do we need to cast more light on public spending in higher education?

Finally, student loan debt now exceeds $1 trillion. The statistic on its own is alarming. But if we overlay it with the assumption — validated provisionally by our analysis — that the debt, in the majority of cases, turns out not to be an investment but bad debt, then the signs point to further erosion of the middle class and the American Dream.

(Note: I am grateful to the Delta project for making available the data sets for this analysis.)

Can We Improve Retention Rates by Giving Students Chocolates?

Course Signals’ Retention Claim

Course Signals is an early-warning alert system for identifying at-risk students developed at Purdue University and made available commercially by Ellucian. It has been claimed that use of Course Signals “boosts graduation rate by 21 percent”.  The claim is suspect and continues to be repeated without scrutiny.  I wrote a simulation to test the claim.

My conclusion from looking at the simulation data: the direct causality attributed to Course Signals is erroneous. In fact, the causation is the reverse of what is claimed. Students who take Course Signals courses are not more likely to graduate than non-Course Signals students (at least not directly and at the rates suggested), rather students who graduate are more likely to take Course Signals courses.  Recall Euthyphro’s dilemma.

This is a classic example of correlation being used to make claims about causality. X correlates with Y. Therefore, X causes Y. An obvious fallacy. In this case, X = students taking two or more Course Signals courses and Y = increased graduation rates.

What the simulation shows, first of all, is that if X indeed caused Y, then we could get the same retention effect by giving students chocolates. The chocolates given don’t even have to be very many or all that expensive. A few Hershey’s Kisses will do. But, of course, we can’t improve retention rates by giving a couple of Hershey’s Kisses to students. What the simulation also shows is that if we were to give Hershey’s Kisses to students randomly, those who graduate are more likely to have more chocolates.

Note: I owe the insight to Mike Caulfield who pointed out anomalies in the data and re-framed, correctly I believe, how the Course Signals data should be viewed. The aim of the simulation is to provide some data to back up Caulfield’s insight.

Simulation Design and Approach

The simulation is based on a model which tracks a cohort of students over four years. We begin with a sample cohort (e.g. 10,000) who enter the university as freshmen. A randomly chosen subset of the students (e.g. 20%) each year take courses where they are given chocolates. This is the analog of taking Course Signals courses. Each year a randomly chosen subset of students (e.g. 10%) drop out of the institution. The base parameters of the model (e.g. cohort size, percentage of chocolate dispensing courses, drop out rate) can be changed by the user.

The model tracks chocolates dispensed and compares retention rates between students who received no chocolates, students who received at least one chocolate, and students who received more than two chocolates. The simulation demonstrates that students who receive two or more chocolates consistently have significantly higher retention rates than students who received no chocolates. From the simulation data it would be erroneous, therefore, to conclude that we can significantly improve retention rates by merely giving chocolates to students. The simulation data illustrates that students who graduate are more likely to receive more chocolates, and not the reverse.
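A minimal version of the simulation looks like this (a sketch using the default parameters above; the notebook on GitHub is the definitive version). Chocolates are dispensed at random and dropout is entirely independent of chocolates, yet the two-or-more-chocolates group still shows a large retention “gain”.

```python
import random

def simulate(cohort=10_000, choc_course_rate=0.20, dropout_rate=0.10,
             years=4, seed=42):
    """Chocolates are handed out at random, independently of dropping out."""
    rng = random.Random(seed)
    chocolates = [0] * cohort
    active = [True] * cohort
    for _ in range(years):
        for i in range(cohort):
            if not active[i]:
                continue
            if rng.random() < choc_course_rate:  # takes a "chocolate" course
                chocolates[i] += 1
            if rng.random() < dropout_rate:      # drops out this year
                active[i] = False

    def retention(group):
        return sum(active[i] for i in group) / len(group)

    none = [i for i in range(cohort) if chocolates[i] == 0]
    two_plus = [i for i in range(cohort) if chocolates[i] >= 2]
    return retention(none), retention(two_plus)

ret_none, ret_two_plus = simulate()
print(f"no chocolates: {ret_none:.2f}, two or more: {ret_two_plus:.2f}")
```

The gap appears purely because students who survive more years get more random chances to collect chocolates, which is exactly the reverse-causality artifact described above.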


The simulation, and the model upon which it is based, has not been verified; therefore, the assumptions, the code, and the conclusions might be erroneous. The code is made available in my GitHub library in the form of an IPython notebook for review and criticism. It should also be noted that the simulation is not intended as a general criticism of the Course Signals software, which was groundbreaking in learning analytics. No doubt Course Signals has many benefits, including improving course-level grades. The simulation is offered as a test of the claim that Course Signals directly leads to significant gains in retention.

Sample Results from the Simulation

The following are some results from the simulation. The first row displays retention rates for students who received no chocolates. The second row displays retention rates for students who received at least one chocolate. The last row shows students who received two or more chocolates. Why track students who received two or more chocolates? Because the authors of the study claim that two is the “magic number” where significant retention gains kick in.

The simulation data shows us that the retention gain for students is not a real (i.e. causal) gain but an artifact of the simple fact that students who stay longer in college are more likely to receive more chocolates. So the answer to the question we started off with is no: you can’t improve retention rates by giving students chocolates.



The Rwandan Tragedy: Data Analysis with 7 Lines of Simple Python Code

There is a lot of discussion these days about Big Data, Machine Learning Algorithms, and Advanced Statistics. But with this case study I hope to illustrate that you don’t have to be a “Rocket Data Scientist” to take advantage of the tools now becoming available for data analysis.

The availability of “open datasets”, when combined with tools such as Python/Pandas, can empower historians, policy makers, and ordinary citizens and students to form powerful insights into a wide range of phenomena.

The Rwandan Tragedy Case Study shows via data that during a 50-year period (corresponding to the World Bank dataset on life-expectancy and mortality) Rwanda was the epicenter of one of the worst catastrophes and episodes of barbarity in human history.
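In that spirit, the shape of the analysis fits in a handful of lines of pandas. The numbers below are placeholders for illustration only; the actual notebook reads the World Bank life-expectancy dataset.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display

# Placeholder values for illustration only -- the notebook loads the real
# World Bank series, e.g. with pd.read_csv(...).
life = pd.Series({1960: 42.0, 1980: 46.0, 1990: 33.0, 1992: 30.0,
                  1994: 26.0, 1996: 32.0, 2000: 46.0, 2010: 62.0},
                 name="Rwanda: life expectancy at birth (years)")

worst_year = life.idxmin()   # the year the series collapses
print(worst_year)            # with these placeholder values: 1994
life.plot()                  # the dip around 1994 tells the story at a glance
```

A handful of lines, and the catastrophe is visible in the data; no advanced statistics required.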

The IPython notebook and dataset for this tutorial are available at my GitHub site in the pda (Python for Data Analysis) repository:


Joseph Blitzstein and “The Soul of Statistics”

Reasoning with “uncertainty” is at the core of analytics. The science of uncertainty itself has two faces: probability and statistics. Probability allows us to calculate the “likelihood” and “unlikelihood” of events. If God, Death, and Taxes are the only certainties, then the rest of life lies squarely in the realm of probability. Statistics allows us to reason from what is known to what is unknown. Taken together the two disciplines allow us to cast a beam for peering across space and time.

Given the importance of the two disciplines, not just for modern science but also for critical thinking, it’s a wonder they remain marginal to the curriculum.

In a delightful video Harvard’s Joseph Blitzstein gives us a flavor of the “soul of statistics”. As his example of “missing data” illustrates, ordinary reasoning is full of pitfalls.



Blitzstein’s complete lecture course (Stat 110), available on both YouTube and iTunes University, is a tour de force. It’s not easy going, but it is among the best open content materials on the web. Blitzstein notes in an interview that “I talk a lot about paradoxes and results which at first seem counterintuitive, since they’re fun to think about and insightful once you figure out what’s going on.”


Smart Education Meets Moneyball (Part I)

Wired Magazine. Innovation Insights

John Baker, Desire2Learn, 04.09.13

Smart colleges and universities are beginning to use predictive analytics to transform massive amounts of data into active intelligence, using it to help their customers – i.e., students – learn their course material more effectively and boost their grades.

Analytics in education is empowering the learner in every step of their journey, not just with course success. Which courses should I select? What should I major in? What is the quickest and least costly path to graduation? What is the right career for me? Smart companies can learn from education analytics by extending their own “big data” efforts to identify and groom hidden talent (a plotline of the movie “Moneyball”) and to create a culture that emphasizes lifelong learning built around collaboration and teamwork.

Read More



The Learning Registry: How to Liberate Learning Resources

How can I discover valuable learning resources? This is still an unsolved problem from a technical infrastructure perspective.

Funded by the US Department of Education and led by Dr. Marie Bienkowski, the Learning Registry is an exciting project for “liberating” learning resources from the web. The Learning Registry is not another Learning Object Repository (LOR). Rather, it’s an ambitious “store-and-forward” data exchange network that incorporates “paradata” (contextual learning data) about each learning resource.

The following presentation by Dr. Marie Bienkowski provides an overview: