### Turn On, Tune In, Drop Out

In the introduction to *College Unbound: The Future of Higher Education and What It Means for Students*, Jeffrey J. Selingo opens with Samantha Dietz’s story. A top student in high school, Samantha earns a 3.9 Grade Point Average (GPA), takes Advanced Placement and International Baccalaureate courses, and participates in the debate club, Harvard Model Congress, and the student newspaper. Although she is the first in her family to go to college, Samantha is by all indicators on the road to college success.

After applying to more than half a dozen schools, Samantha enrolls at Fairleigh Dickinson University in New Jersey, primarily because it offers her “the most financial aid, nearly all of it in grants that wouldn’t have to be paid back.” Samantha struggles throughout her first year and eventually drops out. Selingo writes:

*“The story of Samantha Dietz is not unique. It reflects a broad, national trend in American higher education, where some 400,000 students drop out every year.”* (Note: I believe the number is much higher.)

*“What Dietz failed to examine was Fairleigh Dickinson’s graduation rate. In 2006, only 38 percent of its students graduated within six years, a rate well below all of the other schools she had considered. The two other local schools on her list, Rutgers and Drew, graduated more than 70 percent of their students within six years. Though Fairleigh Dickinson was giving Dietz a boatload of money, her chances of emerging at the other end with a degree were pretty dismal.”*

### What is an Acceptable Graduation Rate?

Let’s consider Samantha Dietz’s story writ large. Would you send your child to a 4-year college or university whose six-year *graduation rate* is below *ten percent*? As a student, would you incur a large *debt* knowing beforehand that your institution’s graduation rate is below ten percent? As an investor (i.e. taxpayer), would you continue to *invest* year in, year out in institutions whose graduation rates fall below ten percent? As an accreditor, would you continue to *validate* institutions whose graduation rates fall below ten percent? Would your decision be any different if the graduation rate were *twenty-five percent*? What about fifty percent? Where would you set the threshold? Do graduation rates matter in assessing *education quality*? If so, should graduation rate data be easily *accessible* for every institution? And how should the data be used and interpreted?

This is the first in a series of posts towards understanding how to measure *education quality* through data analysis. My goal is to stimulate dialogue among educators and policy makers. Another goal is to catalyze a community of practice among data analysts and educators interested in examining educational data.

In this first post I examine graduation rates at 4-year colleges and universities.

(The dataset in this post derives from the Delta Cost Project, which in turn is based on Integrated Postsecondary Education Data System (IPEDS) data made available by the National Center for Education Statistics. I plan to share my code (Python) and analyzed datasets on my GitHub site. If you notice any inaccuracies in the data or my analysis, please contact me at: alfred(dot)essa(at)gmail(dot)com.)

### Graduation Rates and Variability

Let’s begin our analysis of US graduation rates with a frequently cited statistic:

In the US on average less than 60 percent of students seeking a bachelor’s degree at a 4-year institution complete that degree within six years.

Statistical thinking begins with averages. But every schoolboy knows that an average by itself doesn’t tell us very much. How can we deepen our understanding of this statistic? A good first approach is to consider the spread or “variability” in the data.

A Box Plot visualization shows at a glance not only the median (red line) but also the range of the middle 50% of observations (the top and bottom boundaries of the box or rectangle). The whiskers indicate the maximum and minimum. (A Box Plot divides the data set into four quarters, separated by the quartiles Q1, Q2 (the median), and Q3.) Figure 1 below is a Box Plot of graduation rates for the years 2006–2009. We can see that the median is approximately .5 and that only a quarter of institutions have graduation rates above .6, or 60%.
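For readers who want to reproduce the numbers behind a Box Plot, here is a minimal sketch of the five-number summary. The function name `five_number_summary` and the sample rates are my own illustrations, not values from the dataset, and the quartile interpolation method is an assumption (plotting libraries differ slightly in how they compute quartiles).

```python
import statistics


def five_number_summary(rates):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(rates)
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    # method="inclusive" treats the data as the full population (an assumption).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical six-year graduation rates, NOT taken from the dataset.
sample_rates = [0.10, 0.30, 0.50, 0.60, 0.90]

low, q1, median, q3, high = five_number_summary(sample_rates)
print(f"min={low} Q1={q1} median={median} Q3={q3} max={high}")
```

The box spans Q1 to Q3, the line inside it is the median, and the whiskers reach to the minimum and maximum.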

Figure 2 below is a Box Plot which breaks down graduation rates for the year 2009 by category of institution: 4-Year Public, 4-Year Private Non-Profit, and 4-Year Private For-Profit. The respective medians are: .46, .55 and .29.
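The per-category medians can be computed by grouping institutions by sector first. The sketch below shows one way to do that with the standard library. The records are made-up values chosen for illustration, and `median_rate_by_sector` is a hypothetical helper, not the code used to produce Figure 2.

```python
from collections import defaultdict
from statistics import median


def median_rate_by_sector(records):
    """Group (sector, graduation_rate) pairs and return each sector's median."""
    grouped = defaultdict(list)
    for sector, rate in records:
        grouped[sector].append(rate)
    return {sector: median(rates) for sector, rates in grouped.items()}


# Hypothetical records, NOT taken from the dataset.
records = [
    ("public", 0.40),
    ("public", 0.46),
    ("public", 0.58),
    ("private non-profit", 0.50),
    ("private non-profit", 0.60),
    ("private for-profit", 0.29),
]

for sector, mid in median_rate_by_sector(records).items():
    print(f"{sector}: {mid:.2f}")
```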

### Graduation Rates and Education Quality

Let’s deepen our intuition of the dataset by displaying graduation rates not in the aggregate but for each institution. Each bubble is a 4-year institution. The size of the bubble is proportional to FTE (full-time-equivalent) enrollment. Private non-profit institutions are represented as green. Public institutions are blue. Private for-profit institutions are brown.

I have also taken the liberty of indicating the area in the graph corresponding to high-performers and low-performers.

The underlying intuition is that one measure of education quality is tuition versus graduation rate. Relatively high graduation rates and low tuition would be one indicator of high performance. Conversely, relatively high tuition and low graduation rates would be one indicator of low performance.
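The high-performer and low-performer regions can be expressed as a simple rule on the two axes. The sketch below is only an illustration: the cutoffs `rate_cut` and `tuition_cut` are assumed values, not the boundaries actually drawn on the chart, and `classify_performance` is a hypothetical helper.

```python
def classify_performance(grad_rate, tuition, rate_cut=0.50, tuition_cut=15_000):
    """Label an institution by where it falls on the tuition vs graduation-rate plane.

    rate_cut and tuition_cut are illustrative assumptions, not values from the post.
    """
    if grad_rate >= rate_cut and tuition <= tuition_cut:
        return "high-performer"   # high graduation rate, low tuition
    if grad_rate < rate_cut and tuition > tuition_cut:
        return "low-performer"    # low graduation rate, high tuition
    return "mixed"                # everything in between


# Hypothetical institutions, NOT taken from the dataset.
print(classify_performance(grad_rate=0.80, tuition=9_000))
print(classify_performance(grad_rate=0.12, tuition=22_000))
print(classify_performance(grad_rate=0.85, tuition=40_000))
```

Moving the cutoffs changes which institutions land in each region, so any such labeling should be read as one indicator rather than a verdict.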

The second visualization presents the same set of institutions, but the size of the bubble is proportional to the Pell Grant funding the institution receives. The Federal Pell Grant Program provides need-based grants to low-income students to promote access to postsecondary education. The amount of aid available in 2011 was $35.8 billion.

We return to our initial question: what is an acceptable graduation rate? We can see with these visualizations that a significant number of institutions have graduation rates below .25 and a number of them fall below .10. In addition some of the institutions have large enrollments, relatively high tuition and are recipients of significant federal funding in the form of Pell grants.

### Graduation Rates and Economic Cost – First Approximation

What is the economic cost of these low graduation rates? Let’s create a “toy model” to derive a first approximation, beginning with the economic cost of first-year attrition.

Suppose at University X we begin with a full-time freshman cohort of 10,000 students. Let’s assume a first-year retention rate of 75%, meaning that only 7,500 students return as sophomores and 2,500 do not. (The underlying data supports this assumption. The greatest attrition takes place during the first year.) Let’s further assume that taxpayers subsidize on average, either through federal or state funding, $2,500 per student per year. (The underlying data also shows that this is a very conservative estimate.) Given our two assumptions, we can estimate that $6.25M of taxpayer money is lost in one year ($2,500 per student × 2,500 students) at this university for this cohort.

If we scale the numbers, the wasted investment gets very large very quickly. Think how many colleges and universities are in each state. For 25,000 students lost to first-year attrition it’s $62.5M. For 250,000 students it reaches $625M. For 2,500,000 students it’s $6.25B. That’s B for Billion and B for Big.
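The toy model is simple enough to write down in a few lines. Here is a minimal sketch using the two assumptions above (75% retention, $2,500 subsidy per student per year). The function names are my own, and the scaling loop reuses the student counts from the previous paragraph.

```python
def students_lost(cohort_size, retention_rate):
    """Number of students who do not return after the first year."""
    return round(cohort_size * (1 - retention_rate))


def attrition_cost(cohort_size, retention_rate, subsidy_per_student):
    """Taxpayer subsidy spent on students who leave after the first year."""
    return students_lost(cohort_size, retention_rate) * subsidy_per_student


# University X: 10,000 freshmen, 75% retention, $2,500 subsidy per student.
print(f"University X: ${attrition_cost(10_000, 0.75, 2_500):,}")

# Scale by the number of students lost, at the same subsidy per student.
for lost in (2_500, 25_000, 250_000, 2_500_000):
    print(f"{lost:>9,} students lost -> ${lost * 2_500:,}")
```

Both inputs are assumptions of the toy model, so the output is a first approximation only. Changing the retention rate or the subsidy scales the result linearly.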

According to our model this means that *each year* the cost to taxpayers of *first-year attrition alone* is in the hundreds of millions of dollars and more likely in the billions. If we factor in attrition in subsequent years the economic cost gets even larger.

Based on a very simple calculation we can estimate that the annual economic cost (i.e. waste) of low graduation rates is easily in the tens of billions of dollars. The opportunity cost is much higher.

We also have not considered social costs. Consider Samantha Dietz’s story playing itself out hundreds of thousands of times each year.

Let’s add one more assumption to our toy model. Let’s assume that the student herself pays $2,500 per year towards her education. We then have a symmetrical set of escalating costs per year due to first-year attrition ($6.25M, $62.5M, $625M). This set of costs is borne by the student and translates into student debt. (A recent study by the Consumer Financial Protection Bureau indicates that student loans held or guaranteed by the federal government have crossed the astonishing $1 trillion mark. Student loan debt is exceeded only by mortgage debt and is now even greater than credit card debt.)

### Top, Bottom, and Value Institutions – Year 2009

Below is a chart of institutions with the lowest graduation rates (2009 data). For purposes of comparison I have included sector (1=public; 2=private, non-profit; 3=private, for-profit), first-year retention, tuition, and Federal Pell Grant funding received by the institution. (I will provide the full data set in spreadsheet format so that you can run your own analysis.)

Next is a list of institutions with the highest graduation rates in 2009. The list is not surprising. It’s mostly the Ivy League schools and elite liberal arts colleges.

I have also included a list of “value institutions”. I define a value institution as having relatively high graduation rates and relatively low tuition and fees.
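One way to turn this definition into a reproducible list is a filter on both columns, sorted by graduation rate. The sketch below is illustrative only: the thresholds `min_rate` and `max_tuition` are assumed values (the definition above says "relatively high" and "relatively low" without fixing numbers), and the institutions are placeholders.

```python
from typing import NamedTuple


class Institution(NamedTuple):
    name: str
    grad_rate: float   # six-year graduation rate, 0.0 to 1.0
    tuition: float     # annual tuition and fees in dollars


def value_institutions(institutions, min_rate=0.70, max_tuition=10_000):
    """Keep institutions with a high graduation rate and low tuition.

    min_rate and max_tuition are illustrative assumptions, not the post's cutoffs.
    """
    matches = [
        inst for inst in institutions
        if inst.grad_rate >= min_rate and inst.tuition <= max_tuition
    ]
    # Highest graduation rate first.
    return sorted(matches, key=lambda inst: inst.grad_rate, reverse=True)


# Placeholder institutions, NOT taken from the dataset.
sample = [
    Institution("College A", 0.82, 8_500),
    Institution("College B", 0.91, 42_000),
    Institution("College C", 0.74, 9_900),
    Institution("College D", 0.15, 6_000),
]

for inst in value_institutions(sample):
    print(f"{inst.name}: rate={inst.grad_rate:.2f}, tuition=${inst.tuition:,.0f}")
```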

### Preliminary Conclusions

Good data analysis is inherently Socratic. We pose an initial set of questions which lead to further questions. While we can’t reach any definitive conclusions from our initial analysis we can begin to pose a number of questions for further exploration.

First, a significant number of colleges have graduation rates below 25% and some even lower than 10%. An obvious next step is to overlay and correlate student preparation for college with success rates. It can be argued that institutions with high graduation rates enroll students who are well-prepared for college. Can we normalize the data to show which institutions achieve relatively high graduation rates with less well-prepared students?

Second, most of the top performers, according to one measure (high graduation rate and low tuition), are public institutions. This seems to contradict the charge that public institutions are inherently inefficient compared to the private sector. Here also we need to take a deeper look. What do costs, efficiencies, and performance look like when we take into account state subsidies for public institutions?

Third, private for-profit institutions tend to fare the worst in terms of graduation rates, in some cases lower than 10%. A number of them also charge relatively high tuition. Are all private for-profit institutions in the same boat? Do some fare better than others? Should we be investing more in public institutions based on this data? Should we strengthen incentives for private institutions to spur further competition? Or both?

Fourth, a significant amount of federal grant money in the form of Pell grants goes to institutions with very low success rates. This was the most mind-boggling “information” hidden in the data. We seem to be wasting tens of billions of dollars in the name of providing greater access to higher education. But the money is simply not benefiting the students. Do Pell grants need to be reformed? What additional data do we need to cast more light on public spending in higher education?

Finally, student loan debt now exceeds $1 trillion. The statistic on its own is alarming. But if we overlay it with the assumption — validated provisionally by our analysis — that the debt, in the majority of cases, turns out not to be an investment but bad debt, then the signs point to further erosion of the middle class and the American Dream.

(Note: I am grateful to the Delta Cost Project for making available the data sets for this analysis.)