Pandas is one of the most widely used Python libraries for data science, and it has played a major part in making Python a favourite language for data science and data analytics. Today we will explore in depth what the library has to offer and why it is used so extensively.
If you are following our blog series, you know we are using Jupyter Notebook for our data analysis.
Pandas comes preinstalled with the Anaconda distribution.
Pandas is built on top of NumPy and uses NumPy arrays and functions under the hood.
Essentials of Pandas:
- High-performance, open-source library for data analysis
- Creates tabular data from different sources such as CSV, JSON, databases and Excel files
- Provides utilities for descriptive statistics, aggregation, and handling missing and duplicate data; basic plotting is also supported
- Summarizing data by a classification variable
- Time series analysis
- Merging and concatenating two datasets
Understanding Series and DataFrame:
Series: a Series is a one-dimensional labeled array (values plus index labels); each column of a DataFrame is a Series.
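A minimal sketch of a Series, using made-up income values for three states (the numbers are hypothetical). It also confirms that selecting a single column from a DataFrame gives back a Series.

```python
import pandas as pd

# Hypothetical values: one income figure per state, labeled by state name
income_2002 = pd.Series([100, 200, 300],
                        index=["Alabama", "Alaska", "Arizona"],
                        name="Y2002")
print(income_2002)
print(income_2002["Alaska"])  # look up a value by its index label

# A single column taken from a DataFrame is a Series
df = pd.DataFrame({"State": ["Alabama", "Alaska"], "Y2002": [100, 200]})
print(type(df["Y2002"]))
```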
DataFrame: a DataFrame is a two-dimensional data structure in which data is aligned in a tabular form with rows and columns.
Features of a DataFrame:
- Allows indexing
- Exporting and merging of tables
- Mathematical operations are possible
Creating a DataFrame from Series:
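One way to do this, sketched with hypothetical data: pass a dictionary of Series to `pd.DataFrame`. Each Series becomes a column, and rows are aligned on the shared index labels.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels
states = pd.Series(["Alabama", "Alaska", "Arizona"], index=["a", "b", "c"])
income = pd.Series([100, 200, 300], index=["a", "b", "c"])

# Dictionary keys become column names; rows are matched by index label
df = pd.DataFrame({"State": states, "Income": income})
print(df)
```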
Creating a DataFrame with random values:
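A short sketch that fills a DataFrame with random numbers from NumPy. The column names `A`, `B`, `C` and the seed are arbitrary choices for illustration; seeding the generator just makes the output reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded generator, so reruns give the same numbers

# 5 rows x 3 columns drawn from a standard normal distribution
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
print(df)
```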
Let's explore DataFrames further using a publicly available dataset:
Loading a CSV file:
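A self-contained sketch of `pd.read_csv`. Since the dataset file itself isn't reproduced here, a small hypothetical CSV string (read through `io.StringIO`) stands in for it; the `Index` column matches the one dropped later in this post, while the other columns and all values are made up. With a file on disk you would pass its path instead, e.g. `pd.read_csv("income.csv")` (file name assumed).

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = """Index,State,Y2002,Y2003
A,Alabama,100,110
A,Alaska,200,
A,Arizona,300,330
C,California,400,440
"""

# The empty Y2003 field for Alaska is read as NaN (a missing value)
income_data = pd.read_csv(io.StringIO(csv_text))
print(income_data.head())
```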
Getting information about the columns in the dataset:
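A sketch of the usual calls, run on a small hypothetical stand-in for the dataset: `info()` prints each column's non-null count and dtype, and `dtypes` returns the types as a Series.

```python
import io
import pandas as pd

# Hypothetical stand-in for the loaded dataset
csv_text = """Index,State,Y2002,Y2003
A,Alabama,100,110
A,Alaska,200,
A,Arizona,300,330
"""
income_data = pd.read_csv(io.StringIO(csv_text))

income_data.info()          # column names, non-null counts, dtypes, memory usage
print(income_data.dtypes)   # just the dtype of each column
```

Note that `Y2003` is read as a float column because it contains a missing value, which NumPy-backed integer columns cannot hold.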
Descriptive analysis with pandas:
We have already imported the dataset into the DataFrame income_data:
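A minimal sketch of descriptive statistics with `describe()`, using a tiny hypothetical stand-in for `income_data`. By default `describe()` summarizes only the numeric columns (count, mean, standard deviation, min, quartiles, max).

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A", "C"],
    "State": ["Alabama", "Alaska", "Arizona", "California"],
    "Y2002": [100, 200, 300, 400],
    "Y2003": [110, np.nan, 330, 440],
})

summary = income_data.describe()   # numeric columns only by default
print(summary)
print(income_data["Y2002"].mean())  # 250.0
```

The `count` row for `Y2003` is 3, not 4, because missing values are skipped.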
Try it yourself:
income_data.head(7)  # shows the first 7 rows
income_data.tail(7)  # shows the last 7 rows
To check the columns in the dataset:
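A quick sketch using a hypothetical stand-in for `income_data`: the `columns` attribute holds the column labels, and wrapping it in `list()` gives plain Python strings.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A"],
    "State": ["Alabama", "Alaska"],
    "Y2002": [100, 200],
    "Y2003": [110, 210],
})

print(income_data.columns)        # an Index object holding the column labels
print(list(income_data.columns))  # ['Index', 'State', 'Y2002', 'Y2003']
print(len(income_data.columns))   # number of columns
```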
Selecting a few columns:
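A sketch on hypothetical data: indexing with a list of names returns a DataFrame, while a single name returns a Series.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A"],
    "State": ["Alabama", "Alaska", "Arizona"],
    "Y2002": [100, 200, 300],
})

subset = income_data[["State", "Y2002"]]  # list of names -> DataFrame
state = income_data["State"]              # single name -> Series
print(subset)
```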
Difference between loc and iloc:
loc selects rows (or columns) by their labels in the index, whereas iloc selects rows (or columns) by their integer positions, so it works with positions rather than labels.
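The difference only shows when index labels differ from positions, so this sketch gives a hypothetical DataFrame the index `10, 20, 30`. It also shows that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.

```python
import pandas as pd

# Hypothetical data with index labels that are NOT 0..n-1
df = pd.DataFrame({"State": ["Alabama", "Alaska", "Arizona"],
                   "Y2002": [100, 200, 300]},
                  index=[10, 20, 30])

by_label = df.loc[20, "State"]   # row labeled 20 -> "Alaska"
by_position = df.iloc[0, 0]      # first row, first column -> "Alabama"

loc_slice = df.loc[10:30]        # end label included -> 3 rows
iloc_slice = df.iloc[0:2]        # end position excluded -> 2 rows
print(by_label, by_position, len(loc_slice), len(iloc_slice))
```

Here `df.loc[0]` would raise a `KeyError`, because no row is labeled `0`.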
Checking the count of null values in each column of income_data:
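A sketch of the common idiom, assuming a small hypothetical `income_data` with one missing value. `isnull()` marks each missing cell as `True`, and `sum()` counts those per column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for income_data with one missing value
income_data = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "Y2002": [100, 200, 300],
    "Y2003": [110, np.nan, 330],
})

null_counts = income_data.isnull().sum()  # number of missing values per column
print(null_counts)
```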
Dropping missing values:
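A sketch with hypothetical data: by default `dropna()` removes every row that contains a missing value, while `axis=1` removes columns instead. Both return a new DataFrame and leave the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A", "C"],
    "State": ["Alabama", "Alaska", "Arizona", "California"],
    "Y2002": [100, 200, 300, 400],
    "Y2003": [110, np.nan, 330, 440],
})

cleaned = income_data.dropna()             # drop rows with any missing value
cleaned_cols = income_data.dropna(axis=1)  # drop columns with any missing value
print(cleaned)
```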
Renaming a column for better understanding:
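A sketch of `rename`, which takes a dictionary mapping old column names to new ones. The new name `Income_2002` is a hypothetical choice for illustration.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({"State": ["Alabama", "Alaska"], "Y2002": [100, 200]})

# Map old name -> new name; columns not listed keep their names
renamed = income_data.rename(columns={"Y2002": "Income_2002"})
print(renamed.columns.tolist())
```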
Dropping columns from a DataFrame:
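A sketch on hypothetical data that drops the `Index` column. `axis=1` tells `drop` to look for the label among the columns; `drop(columns=["Index"])` is an equivalent spelling.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A"],
    "State": ["Alabama", "Alaska"],
    "Y2002": [100, 200],
    "Y2003": [110, 210],
})

without_index = income_data.drop("Index", axis=1)
print(without_index.columns.tolist())  # ['State', 'Y2002', 'Y2003']
```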
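As you can see, the Index column is no longer present.
The axis argument tells drop where to look for the labels:
- axis = 0 refers to rows (labels in the index)
- axis = 1 refers to columns (column labels)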
Dropping multiple columns:
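A sketch on hypothetical data: pass a list of labels to drop several columns at once. For contrast, the same method with `axis=0` drops rows by their index labels.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A", "C"],
    "State": ["Alabama", "Alaska", "Arizona", "California"],
    "Y2002": [100, 200, 300, 400],
    "Y2003": [110, 210, 330, 440],
})

trimmed = income_data.drop(["Index", "Y2003"], axis=1)  # drop two columns
fewer_rows = income_data.drop([0, 1], axis=0)           # drop rows labeled 0 and 1
print(trimmed)
```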
Selecting columns with categorical values:
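A sketch using `select_dtypes`, assuming (as in this dataset) that categorical values are stored as text. Excluding numeric types keeps every non-numeric column regardless of whether pandas stores the strings as `object` or as a dedicated string dtype. A column can also be converted explicitly with `astype("category")`.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "C"],
    "State": ["Alabama", "Alaska", "California"],
    "Y2002": [100, 200, 400],
})

categorical = income_data.select_dtypes(exclude="number")  # non-numeric columns
print(categorical.columns.tolist())

index_cat = income_data["Index"].astype("category")  # explicit categorical dtype
print(index_cat.cat.categories.tolist())
```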
Showing data of particular columns with the slicing operator:
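A few slicing patterns, sketched on hypothetical data: `[]` with a slice selects rows by position, `loc` slices column labels (end label included), and `iloc` slices rows and columns by position (end excluded).

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A"],
    "State": ["Alabama", "Alaska", "Arizona"],
    "Y2002": [100, 200, 300],
    "Y2003": [110, 210, 330],
})

first_rows = income_data[0:2]                    # rows at positions 0 and 1
year_cols = income_data.loc[:, "Y2002":"Y2003"]  # all rows, columns Y2002 through Y2003
block = income_data.iloc[0:2, 1:3]               # rows 0-1, columns at positions 1-2
print(block)
```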
Filtering rows based on a condition:
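A sketch of boolean filtering on hypothetical data. A comparison produces a True/False mask, and indexing with the mask keeps the matching rows. Combine conditions with `&` or `|` and wrap each one in parentheses; `query()` accepts the same condition as a string.

```python
import pandas as pd

# Hypothetical stand-in for income_data
income_data = pd.DataFrame({
    "Index": ["A", "A", "A", "C"],
    "State": ["Alabama", "Alaska", "Arizona", "California"],
    "Y2002": [100, 200, 300, 400],
})

high = income_data[income_data["Y2002"] > 150]
both = income_data[(income_data["Y2002"] > 150) & (income_data["Index"] == "A")]
picked = income_data[income_data["State"].isin(["Alabama", "Arizona"])]
via_query = income_data.query("Y2002 > 150")
print(both)
```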
Handling duplicate data in a DataFrame:
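A sketch on hypothetical data with one fully repeated row. `duplicated()` flags rows that repeat an earlier row, and `drop_duplicates()` removes them. The `subset` argument limits the comparison to chosen columns, and `keep="first"` retains the first occurrence.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; "Alaska" appears twice with different values
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Alabama", "Arizona", "Alaska"],
    "Y2002": [100, 200, 100, 300, 250],
})

print(df.duplicated())  # True only for row 2
deduped = df.drop_duplicates()  # removes exact duplicate rows
first_per_state = df.drop_duplicates(subset="State", keep="first")  # one row per state
print(first_per_state)
```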
Appending and merging DataFrames:
Appending DataFrames:
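A sketch of appending with `pd.concat` on hypothetical data. Older tutorials use `DataFrame.append`, but that method was deprecated in pandas 1.4 and removed in 2.0; `pd.concat` is the replacement. `ignore_index=True` renumbers the rows from 0.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns
df1 = pd.DataFrame({"State": ["Alabama", "Alaska"], "Y2002": [100, 200]})
df2 = pd.DataFrame({"State": ["Arizona"], "Y2002": [300]})

# Stack rows of df2 below df1 and build a fresh 0..n-1 index
appended = pd.concat([df1, df2], ignore_index=True)
print(appended)
```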
Merging DataFrames:
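A sketch of `pd.merge`, which joins two DataFrames on a shared key column much like a SQL join. The `regions` table is hypothetical. The `how` argument controls which keys survive: `inner` keeps keys present in both tables, `left` keeps every row of the left table, and `outer` keeps the union.

```python
import pandas as pd

# Hypothetical tables sharing the key column "State"
incomes = pd.DataFrame({"State": ["Alabama", "Alaska", "Arizona"],
                        "Y2002": [100, 200, 300]})
regions = pd.DataFrame({"State": ["Alabama", "Arizona", "Texas"],
                        "Region": ["South", "West", "South"]})

inner = pd.merge(incomes, regions, on="State", how="inner")  # states in both tables
left = pd.merge(incomes, regions, on="State", how="left")    # all rows of incomes
outer = incomes.merge(regions, on="State", how="outer")      # all states from either table
print(left)
```

In the left join, Alaska has no match in `regions`, so its `Region` is NaN.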
With this, we have tried to do justice to your time. Reference: https://www.listendata.com/2017/12/python-pandas-tutorial.html. In the next blog, we will look in detail at exploratory data analysis and descriptive analysis of data.
Till then, happy learning and sharing!!