Data Analysis Comes to Python: The Pandas Library

The giant panda is renowned throughout the world. Less well known, but fast gaining in popularity, is Pandas, a Python library for data analysis.

Among programming languages, Python has emerged as the language of choice for a broad array of applications in scientific computing. Python is easy to learn and easy to use, yet flexible enough to handle difficult computational tasks. It lends itself well to those who wish to focus their energy on solving domain-specific problems instead of mastering the arcane twists and turns of programming and programming languages. Python’s greatest strength is the growing set of reusable computational routines in the form of open source libraries such as SciPy and NumPy.

By comparison, data analysis capabilities in Python have lagged behind. Python has been very strong for data wrangling but, compared to domain-specific languages such as R, quite weak for data analysis and modeling. That’s starting to change with the Pandas library. Pandas release 0.10.1 came out in January 2013. The project’s founder, Wes McKinney, has also recently authored a fine book, Python for Data Analysis, published by O’Reilly. For Python + data analysis enthusiasts, these are welcome developments.

I will write more about Python + Pandas for data analysis in future postings.
