What is Python Pandas?
Python Pandas is an open-source data science library built on the Python programming language. It is useful for data analysis and manipulation.
How to install Python Pandas?
To install Python Pandas, we use the Python package manager pip: running pip install pandas in a terminal downloads and installs the library.
Python Pandas lets its users work with data frames (DataFrames) easily and efficiently. A DataFrame can be likened to an Excel sheet: a table of data organized in rows and columns.
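To make the spreadsheet comparison concrete, here is a minimal sketch that builds a small DataFrame by hand. The column names mirror the dataset used later in this tutorial, but the values are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column and each list holds that column's
# values, much like a column in an Excel sheet. The values are made up.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "first_name": ["Ada", "Alan", "Grace"],
    "app_version": [1.0, 1.1, 1.0],
})

print(df)        # prints the table: one row per person, one column per field
print(df.shape)  # (number of rows, number of columns)
```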
Data analysis using Python Pandas
With Python Pandas we can easily import CSV files, run code on them to analyze the data, and then export the results. There are several ways to run code when working with data, and one of the most common is a Jupyter notebook.
Notebooks can also be used within VS Code, and all of the code in this tutorial works just as well in a regular Python script.
Notebooks are popular among data scientists because they make it easy to run each command separately. We will start by importing Pandas into the workspace under an alias.
By convention, Pandas is imported as pd (import pandas as pd), which saves us from typing the word pandas every time we use it. We can now import a CSV file into the workspace so that we can work on it.
The file that we are importing contains 1,000 rows of ‘fake’ (randomly generated) data. When importing a file, we assign the resulting data frame to a variable; in this case we will name it ‘df’.
However, in production code it should be given a more descriptive name. We then use the assignment operator (=) to store the result of pd.read_csv(‘ ’) in df, with the path of the CSV file between the quotes.
pd.read_csv() reads the data from the CSV file and stores it in the data frame named df. Now that all the information is stored in our data frame, we can start extracting information about the dataset. To do that we call df.info(), which prints the following summary:
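The original code for this step is not shown, so here is a minimal sketch of it. Because the tutorial's file name is not given, the sketch first writes a tiny stand-in CSV under the hypothetical name sample_users.csv (in a temporary directory), using the same six columns as the dataset summarized below with made-up rows, and then reads it back with pd.read_csv().

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the tutorial's CSV file: same columns, made-up rows.
csv_text = """id,first_name,last_name,email,ip_address,app_version
1,Ada,Lovelace,ada@example.com,192.0.2.1,1.0
2,Alan,Turing,alan@example.com,192.0.2.2,1.1
3,Grace,Hopper,grace@example.com,192.0.2.3,1.0
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # sample_users.csv is a hypothetical file name used only for this sketch.
    csv_path = os.path.join(tmp_dir, "sample_users.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(csv_text)

    # Read the CSV into a data frame named df, using the pd alias.
    df = pd.read_csv(csv_path)

print(df)
```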
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   id            1000 non-null   int64
 1   first_name    1000 non-null   object
 2   last_name     1000 non-null   object
 3   email         1000 non-null   object
 4   ip_address    1000 non-null   object
 5   app_version   1000 non-null   float64
dtypes: float64(1), int64(1), object(4)
memory usage: 47.0+ KB
As shown in the output above, df.info() gives us the essential characteristics of the data: the number of rows (1,000) and columns (6), how many non-null values each column has, and each column's data type. Here id is an integer (int64), app_version is a floating-point number (float64), and the four text columns are stored as object.
We can also explore the structure of the data and see what it actually looks like by calling df.head(), which returns the first rows of the data frame.
Passing the number of rows we want returned is optional; when no number is passed, as in df.head(), five rows are returned by default. This gives us a quick check of what the data looks like.
We can also check the bottom of the data with df.tail(), which likewise returns the last five rows by default.
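To reproduce this kind of summary without the original file, here is a sketch that calls df.info() on a small, made-up data frame with the same six columns. The attributes shape and dtypes expose the same information programmatically; the in-memory CSV text is an assumption standing in for the real file.

```python
import io

import pandas as pd

# Made-up rows with the same columns as the tutorial's dataset.
csv_text = """id,first_name,last_name,email,ip_address,app_version
1,Ada,Lovelace,ada@example.com,192.0.2.1,1.0
2,Alan,Turing,alan@example.com,192.0.2.2,1.1
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()         # prints row count, columns, non-null counts and dtypes
print(df.shape)   # (rows, columns)
print(df.dtypes)  # data type of each column
```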
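A minimal sketch of df.head() with and without an argument, using a hypothetical eight-row data frame of made-up values:

```python
import pandas as pd

# Hypothetical data frame with eight rows of made-up data.
df = pd.DataFrame({
    "id": list(range(1, 9)),
    "app_version": [1.0, 1.1, 1.0, 2.0, 1.1, 1.0, 2.0, 1.0],
})

first_five = df.head()    # no argument: the first 5 rows by default
first_three = df.head(3)  # explicit number of rows
print(first_five)
print(first_three)
```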
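The same idea applies to df.tail(); this sketch again uses a hypothetical eight-row data frame:

```python
import pandas as pd

# Hypothetical data frame with ids 1 to 8.
df = pd.DataFrame({"id": list(range(1, 9))})

last_five = df.tail()   # no argument: the last 5 rows by default
last_two = df.tail(2)   # explicit number of rows
print(last_five)
print(last_two)
```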
Creating a Pivot Table
Creating a pivot table is a relatively simple task in Excel. However, if we need to do it every day or every week, we could write a Python script that does it for us: it reads the input file, builds the pivot table, and exports the result.
Using a pivot table, we want to find out how the 1,000 people in the dataset are spread across the different versions of this app. To create a pivot table in Pandas, we store the result of pd.pivot_table() in a new data frame, which in this case we will name ‘pivot_df’.
Now we can specify the information that we want to go into our pivot table. The first argument to pd.pivot_table() is our main data frame, df, which holds all the information.
Next, we tell it which column to use as the index, that is, the labels that appear down the left-hand side of the pivot table. This plays the same role as the Rows area of an Excel pivot table.
The index is passed as a list because a pivot table can have multiple index columns. Here we use app_version, because we want to know how many people use each version of the app.
If we run the code below, pandas has to decide on its own which values to summarize, and here it assumes that we want the id column; this might not always be what we want.
pivot_df = pd.pivot_table(df, index=['app_version'])
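The id column is the right choice here, although in other datasets the column pandas picks might not be. More importantly, the numbers it returns are not counts: by default pd.pivot_table() aggregates with the mean (aggfunc='mean'), and the average id is of no use to us. In recent pandas releases, this call may instead raise a TypeError, because text columns such as first_name are no longer silently skipped when taking the mean.
Therefore we need to pass in more arguments. For example, we can explicitly tell it to summarize the id column by passing it to the values argument: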
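The default behaviour can be reproduced on a tiny, made-up data frame. This sketch deliberately keeps only the numeric columns id and app_version (an assumption, not the tutorial's dataset), so the call runs the same way on older and newer pandas releases and simply averages the ids per version.

```python
import pandas as pd

# Made-up data: five users spread across two app versions.
# Only numeric columns, so the default mean can be computed everywhere.
df = pd.DataFrame({
    "id": [10, 20, 30, 40, 50],
    "app_version": [1.0, 1.0, 1.0, 2.0, 2.0],
})

# No values or aggfunc given: pandas summarizes the remaining column (id)
# with its default aggregation, the mean, not a count.
pivot_df = pd.pivot_table(df, index=["app_version"])
print(pivot_df)
```

Version 1.0 gets the average of 10, 20 and 30, and version 2.0 the average of 40 and 50, which shows why the default result says nothing about how many users each version has.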
pivot_df = pd.pivot_table(df, index=['app_version'], values=['id'])
Again, we get the same data back, because pandas had already assumed that we wanted the id column; the difference is that we now state it explicitly. Since we want the number of ids for each version, not their sum, mean or any other summary, we also pass the ‘aggfunc’ (aggregation function) argument:
pivot_df = pd.pivot_table(df, index=['app_version'], values=['id'], aggfunc=['count'])
This time we get whole numbers: the count of ids, that is, how many different users are on each version. Building the same pivot table in Excel, with app_version as the rows and the count of id as the values, gives exactly the result we have obtained here. This is a small piece of data analysis with Python, and we now have a pivot table showing how many users are using each version of the app.
Suppose that we want to share this output with colleagues; in that case, we need to export it. Fortunately, Python Pandas makes this really simple. We call the to_csv() method on the data frame we want to export, in this case ‘pivot_df’, and give it a file name, which in this case will be ‘results.csv’: pivot_df.to_csv('results.csv').
The file ‘results.csv’ will be created in the current working directory of the script or notebook that runs the code.
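Here is a minimal sketch of the counting pivot table on a small, made-up data frame. It includes a text column to show that, once values=['id'] is given, only the id column is aggregated. Because aggfunc is passed as a list, the result has two-level column labels, ('count', 'id').

```python
import pandas as pd

# Made-up data: five users spread across two app versions.
df = pd.DataFrame({
    "id": [10, 20, 30, 40, 50],
    "first_name": ["Ada", "Alan", "Grace", "Linus", "Ken"],
    "app_version": [1.0, 1.0, 1.0, 2.0, 2.0],
})

# Count the ids per app version, as in the tutorial's final call.
pivot_df = pd.pivot_table(df, index=["app_version"], values=["id"], aggfunc=["count"])
print(pivot_df)

# A list-valued aggfunc produces two-level column labels: ('count', 'id').
counts = pivot_df[("count", "id")]
```

Passing aggfunc='count' as a plain string instead would give a single column labelled id, which is often easier to work with afterwards.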
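A minimal sketch of the export step, using made-up data. To keep it self-contained, it writes results.csv into a temporary directory instead of the working directory and then reads the file back to show what a colleague would receive. It uses aggfunc='count' as a string so the file has a single header row. By default, to_csv() also writes the index (app_version) as the first column.

```python
import os
import tempfile

import pandas as pd

# Made-up data: five users spread across two app versions.
df = pd.DataFrame({
    "id": [10, 20, 30, 40, 50],
    "app_version": [1.0, 1.0, 1.0, 2.0, 2.0],
})
pivot_df = pd.pivot_table(df, index=["app_version"], values=["id"], aggfunc="count")

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "results.csv")
    pivot_df.to_csv(out_path)           # index is written as the first column
    round_trip = pd.read_csv(out_path)  # read it back to inspect the file

print(round_trip)
```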
If you’d like to see more programming tutorials, check out our YouTube channel, where we have plenty of Python video tutorials in English.
In our Programming Tutorials series, you’ll find useful materials which will help you improve your programming skills and speed up the learning process.
- How to create perfect HTML tables?
- HTML color codes
- CSS background images
- Best Programming Books You Must Read in 2021
- Python for loop
- Creating a table using HTML and CSS
- Best way of using Java Arrays and ArrayLists
- Best way of using Python Sets
- Best ways of using a Python Dictionary
- Best way of using Python Classes
- Best way of using Python Range
- Best way of using Python if-else
- Best way of using Python RegEx
- Best way of using Python Lists
- Best way of using Python Enumerate
- Best way of using Python Functions
- Best way of using Python Split
- Best way of using Python Try-Except
- Best way of using Python Tuple
- Best way of using Python Arrays
- Best way of using Python Sort
- Best way of using Python DateTime
- How to download Python?
- Best way of using Python FileWrite
- Best way of using Python Lambda
- Best way of using Python ListAppend
- Best way of using Python ListComprehension
- Best way of using Python Map
- Best way of using Python Operators
- Best way of using Python Pandas
Would you like to learn how to code online? Come and try our first 25 lessons for free at the CodeBerry Programming School.