Data Analysis Comes to Python: The Pandas Library

The giant panda is renowned throughout the world. Not as well known, but fast gaining in popularity, is Pandas, a Python library for data analysis.

Among programming languages, Python has emerged as the language of choice for a broad array of applications in scientific computing. Python is easy to learn and easy to use, yet flexible enough to handle difficult computational tasks. It lends itself well to those who wish to focus their energy on solving domain-specific problems instead of mastering the arcane twists and turns of programming and programming languages. Python’s greatest strength is the growing set of re-usable computational routines in the form of open source libraries such as SciPy and NumPy.

By comparison, data analysis capabilities in Python have lagged behind. Python has been very strong for data wrangling but, compared to domain-specific languages such as R, quite weak for data analysis and modeling. That’s starting to change with the Pandas library. Pandas release 0.10.1 came out in January 2013. The project’s founder, Wes McKinney, has also recently authored a fine book, Python for Data Analysis, published by O’Reilly. For Python + data analysis enthusiasts these are welcome developments.

I will write more about Python + Pandas for data analysis in future postings.

Reflections on Lev Gonick’s The Year Ahead in IT, 2013

Each year I look forward to reading Lev Gonick’s “The Year Ahead in IT”.  Gonick is CIO at Case Western Reserve University, a strategic thinker, and deep analyst of ICT trends in higher education. This year’s analysis doesn’t disappoint. I highlight some key themes in Gonick’s article and finish by posing a question.


Gonick cites the French philosopher Jacques Ellul, who fifty years ago in The Technological Society anticipated that one day soon all information would become available instantaneously:

“There will no longer be any need of reading or learning mountains of useless information; everything will be received and registered according to the needs of the moment.”

If alive today Ellul would be writing about the perils and promises of technology for the likes of Wired Magazine.  Although not as well known or influential as Marshall McLuhan, Ellul also had the singular ability to synthesize trends in different domains and to project their far-reaching implications for society.

As a philosopher and theologian Ellul was especially sensitive to the ethical dimensions of an accelerating “technical” society in which machines represent new devices in our world but also new ways of thinking about the world. And much like McLuhan, but more exacting and ruthless in his logic, Ellul drives home the point that how we think about the world acts back and changes who we are as moral agents.

(It’s a shame that Ellul is not more widely known or widely read. The documentary “The Betrayal by Technology” provides a good introduction to his work.)

Ellul envisioned a society in the not too distant future where information is stored in massive “electronic banks” and transmitted effortlessly to our brains through a type of intravenous system. Ellul’s vision is no longer science fiction. In a way he was perfectly describing the emerging convergence of digital content and learning platforms on the Internet.


Open content, in repositories such as Wikipedia and MIT’s Open Courseware, is growing rapidly. Paralleling this growth in the availability of rich content, the Internet is now enabling more sophisticated forms of learning that go beyond simple information look-up and retrieval.

Semantic search engines coupled with sophisticated learning analytics, as embedded in learning platforms, will move us towards a type of “nervous system” for instantaneously acquiring information. Companies such as Desire2Learn, Coursera, and Udacity, non-profits such as Khan Academy, and educational consortia such as EdX are building out massively scalable platforms for personalizing learning. These learning platforms will serve as major nodal points of the nervous system forecast by Ellul.

Some combinations are merely additive while others are seeds for germinating Kurzweilian singularities. As Gonick observes, we are rapidly approaching the second generation of open content. The phenomenon of open content and congruent technologies such as MOOCs will inevitably make learning accessible to more and more people around the globe:

“Today, millions of students are experimenting with first-generation open content. Within the next year or two, more than 50 million diverse open educational learners will find compelling motives to access the single largest, dynamic body of student-centered learning materials available.”

Both Ellul and Gonick recognize that mere information is not knowledge. The construction of knowledge from information requires social mediation. The second and third generations of open content and learning platforms, therefore, will squarely need to attack the social mediation of learning.


Social learning is a theme that George Siemens, Stephen Downes, and Dave Cormier have emphasized from the very beginning in their pragmatic commentaries on learning. These Canadians not only pioneered MOOCs but recognized early on that the design of learning systems must intrinsically take into account the social dimension of learning. Their time is now.

This brings us full circle to Ellul. A core insight, I believe, of Gonick’s article is that the learning enterprise, as well as the technology systems used to support it, must have spaces for self-reflection as part of their design:

“The emerging learning enterprise is about designing and creating experiences that provide opportunities to discover and gain 21st-century competencies based on assembly, synthesis, perspective, critique, and interconnected systems thinking. It is precisely the role anticipated by Ellul to create opportunities for conscious self-reflection.”

In the context of learning, social interaction and self-reflection are allied concepts. Those familiar with Eric Mazur’s pioneering work at Harvard recognize that the core of “interactive learning” is “peer instruction”, which provides a mechanism for surfacing self-reflection.


I want to close with a question for reflection. As we fully approach the world of “instantaneous information” some of the core functions of the University are rapidly becoming obsolete.

The traditional library as the “repository” of information has disappeared. The lecturer as the “transmitter” of information will soon disappear. Universities must confront the following question head-on if they are to survive the next decade:

Which capabilities must the University shed, which must it retain, and which must it evolve?

The march of technology is inevitable. As Gonick observes, “the year ahead will remain turbulent for universities and opportunistic for learners.” Intrepid universities will seize the opportunity. Fearful ones will continue to dream the past.

Alfred Essa

What is Analytics?

Analytics can be defined in various ways. In this presentation I suggest that there are three levels of Analytics capability or maturity:

  • Analytics I is data about the Past and, at best, data about the Present. Analytics I is the realm of traditional reporting and traditional Business Intelligence.
  • Analytics II is data about the Future. Analytics II is the realm of Predictive Analytics and Forecasting.
  • Analytics III is data about the Desired Future. Analytics III is the realm of Optimization, namely finding the best path among alternatives.
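
The three levels can be illustrated with a small sketch. This is only one possible reading of the distinction, not a method from the presentation: the enrollment figures, the intervention names, and their assumed effects are all hypothetical, and a simple least-squares trend line stands in for whatever forecasting model an institution would actually use.

```python
# Hypothetical term-by-term enrollment counts (illustrative data only).
past_enrollment = [1000, 1040, 1090, 1120, 1170]


def describe(history):
    """Analytics I: report on the past and the present."""
    return {"latest": history[-1], "mean": sum(history) / len(history)}


def forecast(history, steps_ahead=1):
    """Analytics II: project the future with a least-squares trend line."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def optimize(history, options, target):
    """Analytics III: choose the alternative that lands closest to the desired future.

    `options` maps a hypothetical intervention name to its assumed effect
    on next term's enrollment.
    """
    baseline = forecast(history)
    return min(options, key=lambda name: abs(baseline + options[name] - target))


# Hypothetical interventions and assumed effects (not from the source).
interventions = {"no_change": 0, "advising_pilot": 40, "new_program": 120}

print(describe(past_enrollment))
print(forecast(past_enrollment))
print(optimize(past_enrollment, interventions, target=1250))
```

Each function answers a different question about the same data: what happened, what is likely to happen, and which available choice best moves the outcome toward a stated goal.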

Exclusive: Desire2Learn buys Bill Gates-backed virtual guidance counselor Degree Compass



Canadian ed tech company Desire2Learn has acquired Degree Compass, a course recommendation engine developed at Austin Peay State University with funding from the Bill and Melinda Gates Foundation.

For most college students, picking classes involves reading up on options or chatting with classmates and professors. But, if Canadian ed tech company Desire2Learn has its way, more students will consult a data-driven “virtual guidance counselor.”

On Thursday, the company, which is a stone’s throw away from Research in Motion in Ontario’s Kitchener-Waterloo area, announced that it acquired Degree Compass, a predictive analytics tool developed with funding from the Bill and Melinda Gates Foundation.




Case Study: This case study from the EDUCAUSE book Game Changers describes the Degree Compass course recommendation engine.

How It Works: This New York Times infographic displays how Degree Compass works.

Changing the Financial Calculus: In this video Bill Gates discusses Degree Compass in the context of improving student success but also changing the financial calculus.

Making Informed Choices: This EDUCAUSE video features Tristan Denley, provost and vice president for academic affairs at Austin Peay State University. He answers the question: What are the various ways that analytics can be used to inform student choices?



“No More Excuses”: Michael M. Crow on Analytics

Michael M. Crow, President of Arizona State University, understands the strategic importance of Analytics in education. Read this insightful interview by EDUCAUSE CEO and President Diana Oblinger in Educause Review:

Michael M. Crow became the 16th president of Arizona State University in July 2002, with the goal of transforming ASU into what he calls a “New American University”—an institution combining the highest levels of academic excellence, inclusiveness to a broad demographic, and maximum societal impact. His view included increasing graduate numbers, graduation rates, and freshman-retention rates while also expanding ethnic and economic diversity.

Today, ASU has established more than a dozen new transdisciplinary schools and large-scale research initiatives and has nearly tripled research expenditures. Enrollment increased 30 percent since 2002, to a record 72,254 undergraduate and graduate students in fall 2011. Minority enrollment as a percentage of the total student population increased 52 percent, to 31 percent of the total student body. ASU awarded 17,090 degrees in 2011, up 51.5 percent from 2002. Undergraduate degrees increased 42.4 percent during that time. The six-year graduation rate for the freshman cohort entering in fall 2004 was 58.7 percent, up 19.3 percent from the rate for the cohort that entered in fall 1995. Freshman persistence in fall 2011 increased to 84 percent, 9 percent higher than in fall 2002. In addition, the number of first-time, full-time, low-income Arizona freshmen increased 647 percent from FY2003 through FY2011.

President Crow attributes much of this success to the use of analytics. Recently, EDUCAUSE President Diana Oblinger talked with Crow about why the university moved toward using analytics, how ASU has implemented analytics, where there could be problems, and what he sees ahead.



Alfred Essa: The State of Analytics in Higher Education (Interview)

I’ve known Alfred Essa for a couple of years. From the start, Al struck me as one of those people who are truly focussed on improving higher education. After a career at various organizations in the US, including MIT and Minnesota State Colleges and Universities, Essa now serves as the Director of Innovation and Analytics Strategy at Desire2Learn, the Canadian education technology company (which recently accepted an 80 million dollar investment from OMERS).


Full Interview in Keith Hampson’s Higher Education Management Blog

Learning From Data – [Prof. Yaser Abu-Mostafa, Caltech]

Learning From Data is a great Open Course on Machine Learning. It covers “basic theory, algorithms, and applications of machine learning—the discipline that deals with enhancing the ability of computational systems to learn from data, enabling them to improve their performance with experience. Examples of machine learning applications include systems used by banks to determine whether to approve applications for credit cards based on financial data, and the Netflix system that tries to anticipate how much a given subscriber will enjoy a particular movie.”

Prerequisites: Basic probability, matrices, and calculus.


Is Education Research a Form of Alchemy? – [Aaron Sloman]

Aaron Sloman is Honorary Professor of Artificial Intelligence and Cognitive Science at the University of Birmingham. Here he reflects on what is known about learning, and on the difficulties that exist in understanding it.

Is education research a form of alchemy?

Alchemists did masses of data collection, seeking correlations. In the process they learnt a great many useful facts – but lacked deep explanations. Searching for correlations can produce results of limited significance when studying processes with an underlying basis of mechanisms with astronomical generative power. But this correlation-seeking approach characterises much educational research.

Accelerated progress in chemistry came from developing a deep explanatory theory about the hidden structure of matter and the processes such structure could support (atoms, subatomic particles, valence, constraints on chemical reactions, etc.). Thus deep research requires (among other things) the ability to invent powerful explanatory mechanisms, often referring to unobservables.

My experience of researchers in education, psychology, social science and similar fields is that the vast majority of the ones I have encountered have had no experience of building, testing, and debugging, deep explanatory models of any working system. So their education does not equip them for a scientific study of education, a process that depends crucially on the operations of the most sophisticated information processing engines on the planet, many important features of which are still unknown.



Peter Norvig: The 100,000-student classroom

Via Seb Schmoller. Fortnightly Mailing.

Transcript of talk:

Everyone is both a learner and a teacher. This is me being inspired by my first tutor, my mom, and this is me teaching Introduction to Artificial Intelligence to 200 students at Stanford University.

Now the students and I enjoyed the class, but it occurred to me that while the subject matter of the class is advanced and modern, the teaching technology isn’t. In fact, I use basically the same technology as this 14th-century classroom. Note the textbook, the sage on the stage, and the sleeping guy in the back. Just like today.

So my co-teacher, Sebastian Thrun, and I thought, there must be a better way. We challenged ourselves to create an online class that would be equal or better in quality to our Stanford class, but to bring it to anyone in the world for free.

We announced the class on July 29th, and within two weeks, 50,000 people had signed up for it. And that grew to 160,000 students from 209 countries. We were thrilled to have that kind of audience, and just a bit terrified that we hadn’t finished preparing the class yet.

So we got to work. We studied what others had done, what we could copy and what we could change. Benjamin Bloom had showed that one-on-one tutoring works best, so that’s what we tried to emulate, like with me and my mom, even though we knew it would be one-on-thousands. Here, an overhead video camera is recording me as I’m talking and drawing on a piece of paper.

A student said, “This class felt like sitting in a bar with a really smart friend who’s explaining something you haven’t grasped, but are about to.” And that’s exactly what we were aiming for.

Now, from Khan Academy, we saw that short 10-minute videos worked much better than trying to record an hour-long lecture and put it on the small-format screen. We decided to go even shorter and more interactive. Our typical video is two minutes, sometimes shorter, never more than six, and then we pause for a quiz question, to make it feel like one-on-one tutoring. Here, I’m explaining how a computer uses the grammar of English to parse sentences, and here, there’s a pause and the student has to reflect, understand what’s going on and check the right boxes before they can continue.

Students learn best when they’re actively practicing. We wanted to engage them, to have them grapple with ambiguity and guide them to synthesize the key ideas themselves. We mostly avoid questions like, “Here’s a formula, now tell me the value of Y when X is equal to two.” We preferred open-ended questions.

One student wrote, “Now I’m seeing Bayes networks and examples of game theory everywhere I look.” And I like that kind of response. That’s just what we were going for. We didn’t want students to memorize the formulas; we wanted to change the way they looked at the world. And we succeeded. Or, I should say, the students succeeded.

And it’s a little bit ironic that we set about to disrupt traditional education, and in doing so, we ended up making our online class much more like a traditional college class than other online classes. Most online classes, the videos are always available. You can watch them any time you want. But if you can do it any time, that means you can do it tomorrow, and if you can do it tomorrow, well, you may not ever get around to it.

So we brought back the innovation of having due dates. You could watch the videos any time you wanted during the week, but at the end of the week, you had to get the homework done. This motivated the students to keep going, and it also meant that everybody was working on the same thing at the same time, so if you went into a discussion forum, you could get an answer from a peer within minutes. Now, I’ll show you some of the forums, most of which were self-organized by the students themselves.

From Daphne Koller and Andrew Ng, we learned the concept of “flipping” the classroom. Students watched the videos on their own, and then they come together to discuss them. From Eric Mazur, I learned about peer instruction, that peers can be the best teachers, because they’re the ones that remember what it’s like to not understand. Sebastian and I have forgotten some of that. Of course, we couldn’t have a classroom discussion with tens of thousands of students, so we encouraged and nurtured these online forums.

And finally, from Teach For America, I learned that a class is not primarily about information. More important is motivation and determination. It was crucial that the students see that we’re working hard for them and they’re all supporting each other.

Now, the class ran 10 weeks, and in the end, about half of the 160,000 students watched at least one video each week, and over 20,000 finished all the homework, putting in 50 to 100 hours. They got this statement of accomplishment.

So what have we learned? Well, we tried some old ideas and some new and put them together, but there are more ideas to try. Sebastian’s teaching another class now. I’ll do one in the fall. Stanford, Coursera, Udacity, MITx and others have more classes coming. It’s a really exciting time.

But to me, the most exciting part of it is the data that we’re gathering. We’re gathering thousands of interactions per student per class, billions of interactions altogether, and now we can start analyzing that, and when we learn from that, do experimentations, that’s when the real revolution will come. And you’ll be able to see the results from a new generation of amazing students.