Pandas Cookbook

This is the code repository for Pandas Cookbook, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way.

The pandas library is massive, and it’s common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter.

Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

The code will look like the following:

>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> max_dept_salary = employee.groupby('DEPARTMENT')['BASE_SALARY'].max()
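To try the same group-and-aggregate pattern without the data files, here is a minimal sketch that builds a small DataFrame in memory. The department names and salary figures are invented for illustration and are not taken from the book's employee dataset; only the column names match the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for the employee dataset; all values are invented.
employee = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Police', 'Library', 'Library'],
    'BASE_SALARY': [60000.0, 75000.0, 40000.0, 52000.0],
})

# Group the rows by department, select the salary column,
# and keep the largest salary within each group.
max_dept_salary = employee.groupby('DEPARTMENT')['BASE_SALARY'].max()
print(max_dept_salary)
```

The result is a Series indexed by department, so a single value can be looked up with, for example, max_dept_salary['Police'].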

Pandas is a third-party package for the Python programming language and, as of the printing of this book, is at version 0.20. Currently, Python has two major supported releases, versions 2.7 and 3.6. Python 3 is the future, and all scientific computing users of Python are strongly encouraged to use it, as Python 2 will no longer be supported in 2020. All examples in this book have been run and tested with pandas 0.20 on Python 3.6.

In addition to pandas, you will need the visualization libraries matplotlib (version 2.0) and seaborn (version 0.8) installed. A major dependency of pandas is the NumPy library, which forms the basis of most of the popular Python scientific computing libraries.
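One way to check which of these libraries are present in your environment is sketched below. The helper name installed_versions is hypothetical and not part of the book's code; it relies only on the standard importlib module and on the __version__ attribute that these packages conventionally expose.

```python
import importlib


def installed_versions(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            versions[name] = None
        else:
            # Fall back to 'unknown' if the module does not define __version__.
            versions[name] = getattr(module, '__version__', 'unknown')
    return versions


report = installed_versions(['pandas', 'numpy', 'matplotlib', 'seaborn'])
for name, version in report.items():
    print(f'{name}: {version or "not installed"}')
```

Compare the printed versions with the ones listed above; newer releases may behave differently from the versions the recipes were tested with.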

There is a wide variety of ways to install pandas and the other libraries mentioned above on your computer, but by far the simplest method is to install the Anaconda distribution. Created by Continuum Analytics, it packages all the popular libraries for scientific computing in a single downloadable file available for Windows, macOS, and Linux. Visit the download page to get the Anaconda distribution (https://www.anaconda.com/download).

In addition to all the scientific computing libraries, the Anaconda distribution comes with Jupyter Notebook, a browser-based program for developing in Python, among many other languages. All of the recipes for this book were developed inside a Jupyter Notebook, and the individual notebooks for each chapter are available for you to use.

It is possible to install all the necessary libraries for this book without the Anaconda distribution. If you are interested, visit the pandas Installation page (http://pandas.pydata.org/pandas-docs/stable/install.html).

The repository contains the following folders and files:

data
.gitattributes
.gitignore
Chapter 01 Pandas Foundations.ipynb
Chapter 02 Essential DataFrame Operations.ipynb
Chapter 03 Beginning Data Analysis.ipynb
Chapter 04 Selecting Subsets of Data.ipynb
Chapter 05 Boolean Indexing.ipynb
Chapter 06 Index Alignment.ipynb
Chapter 07 Grouping for Aggregation, Filtration and Transformation.ipynb
Chapter 08 Restructuring Data into Tidy Form.ipynb
Chapter 09 Combining Pandas Objects.ipynb
Chapter 10 Time Series Analysis.ipynb
Chapter 11 Visualization with Matplotlib, Pandas and Seaborn.ipynb
LICENSE
README.md
dataset_descriptions.ipynb
