Learn Pandas for Data Analysis Beginners

learn pandas for data analysis beginners in python - forever code

If you want to learn pandas for data analysis as a beginner, you are in the right place. Pandas is one of the most widely used Python libraries for handling, cleaning, and transforming data. In this pandas tutorial in Python with examples, you will learn how to work with real datasets step by step.

In this pandas tutorial in Python, you will learn how to use pandas to read files, create DataFrames, and explore datasets. After that, we will move on to data transformation using pandas, where you will learn how to modify a DataFrame, filter data, and apply operations.

This end-to-end pandas tutorial in Python is designed especially for beginners who want practical knowledge with real examples. By the end of this guide, you should feel confident working with datasets and performing data analysis using pandas in Python.

Getting Started with Pandas for Data Analysis Beginners

Getting started with pandas for data analysis may feel confusing at first. However, once you understand the basics, you can quickly perform data cleaning using pandas, handle missing values, and prepare raw data for analysis. In fact, pandas makes data preprocessing simple and beginner friendly.

What is Pandas in Python?

Pandas is an open-source Python library used for data analysis and data manipulation. It helps you work with structured data easily.

  • In simple words, Pandas allows you to read, clean, transform, and analyze data with just a few lines of code.
  • Primarily, it is used for handling tabular data such as CSV files, Excel files, and databases.
  • The main data structure in Pandas is the DataFrame, which looks like a table with rows and columns. Therefore, it is very easy for beginners to understand.
  • Additionally, Pandas provides built-in functions for data cleaning, such as removing missing values, deleting duplicates, and fixing data types.
  • Moreover, it supports data preprocessing, which means you can prepare raw data before analysis.
  • For example, you can filter rows, sort values, group data, and modify a DataFrame without writing complex logic.
  • As a result, if you want to learn pandas for data analysis as a beginner, understanding Pandas basics is the first and most important step.

Why Learn Pandas for Data Analysis Beginners?

  • First of all, Pandas is beginner friendly and easy to understand, especially if you are new to data analysis in Python.
  • Because it uses simple syntax, you can perform complex data manipulation tasks with very little code.
  • Most importantly, Pandas helps you work with real-world datasets such as CSV and Excel files. Therefore, it is very useful for practical projects.
  • Another important reason is that Pandas is widely used in companies for data analysis, reporting, and business intelligence. As a result, learning it can improve your job opportunities.
  • Furthermore, it integrates well with other Python libraries like NumPy, Matplotlib, and Scikit-learn for complete data analysis projects.
  • Finally, if you are a beginner in data analysis, mastering Pandas will give you a strong foundation for advanced data science and machine learning.

How to Install Pandas in Python Step by Step

  • Before installing Pandas, make sure Python is already available on your system. Using a recent Python version generally improves performance and compatibility.
  • To begin the installation, open the command prompt or terminal, depending on your operating system.
  • To install Pandas, use the following command:
				
					pip install pandas
				
			
  • At this stage, press the Enter key to start the installation process. Pandas downloads and installs automatically.
  • After the installation completes, the library becomes available in your Python environment.
  • If you are using Anaconda, the following command can be used instead:
				
					conda install pandas
				
			
  • To confirm the installation, open Python and type:
				
					import pandas as pd
				
			
  • If no error appears, the setup is successful and you can continue learning pandas for data analysis.
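As an optional extra check, printing the installed version shows which pandas release Python is actually importing. This is a minimal sketch; the version number you see depends on your own environment.

```python
import pandas as pd

# Read the version string of the pandas release this environment imports
installed_version = pd.__version__
print("pandas version:", installed_version)
```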

How to Use Pandas in Python for the First Time

  • When starting for the first time, open your Python editor, such as VS Code, Jupyter Notebook, or any IDE you prefer.
  • To work with data in Python, the Pandas library must be imported:
				
					import pandas as pd
				
			
  • To understand how Pandas organizes data, create a simple DataFrame:
				
					data = {
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "City": ["Delhi", "Mumbai", "Pune"]
}

df = pd.DataFrame(data)
				
			
  • To view the data in tabular form, display the DataFrame:
				
					print(df)
				
			
  • At this point, notice how the data appears in rows and columns. This structure is known as a DataFrame in Pandas.
Name Age City
Rahul 23 Delhi
Amit 25 Mumbai
Sneha 22 Pune
  • As a basic operation, try selecting a single column:
				
					print(df["Name"])
				
			
  • Output looks like:
				
					0    Rahul
1     Amit
2    Sneha
Name: Name, dtype: object
				
			
  • You can also perform simple data manipulation with Pandas, such as adding a new column:
				
					df["Salary"] = [30000, 35000, 28000]
				
			
  • With regular practice, these steps help build confidence while you learn pandas for data analysis.
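The steps above can be combined into one short script. The sketch below rebuilds the same sample DataFrame, adds the Salary column, and then checks its size and column names with `shape` and `columns`, two standard DataFrame attributes.

```python
import pandas as pd

# Same sample data as in the steps above
data = {
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
}
df = pd.DataFrame(data)
df["Salary"] = [30000, 35000, 28000]

print(df.shape)          # (rows, columns) -> (3, 4)
print(list(df.columns))  # ['Name', 'Age', 'City', 'Salary']
```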

Basic Pandas Tutorial in Python – How to Use Pandas in Python

In this basic pandas tutorial in Python, you will learn how to use pandas step by step. Pandas is one of the most important libraries for working with structured data, especially for beginners who want to learn data analysis. It provides simple functions that help you read files, create DataFrames, explore datasets, and clean data.

Creating a DataFrame in Pandas with Example

				
					import pandas as pd
				
			
				
					data = {
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "City": ["Delhi", "Mumbai", "Pune"]
}
				
			
				
					df = pd.DataFrame(data)
				
			
				
					print(df)
				
			
Index Name Age City
0 Rahul 23 Delhi
1 Amit 25 Mumbai
2 Sneha 22 Pune

How to Read CSV and Excel Files Using Pandas

				
					import pandas as pd
				
			
				
					df = pd.read_csv("data.csv")

				
			
				
					print(df)
				
			
				
					df_excel = pd.read_excel("data.xlsx")
				
			
				
					print(df_excel)
				
			
				
					df_excel = pd.read_excel("data.xlsx", sheet_name="Sheet1")
				
			

If you want to learn more about automating Excel reports, you can read my detailed guide on automating Excel files using pandas.
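If you do not have a `data.csv` file at hand, you can still try `read_csv`. The sketch below is an illustration only: it reads CSV text from memory with `io.StringIO` instead of a file on disk, and the CSV content itself is made up for this example. It also shows the `usecols` parameter, which loads only the columns you name.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file such as data.csv
csv_text = """Name,Age,City
Rahul,23,Delhi
Amit,25,Mumbai
Sneha,22,Pune
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# usecols loads only the listed columns
names_only = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "City"])
print(names_only)
```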

Exploring Data in a Pandas DataFrame

You can download the sample dataset used in this tutorial (a CSV file for practice) and use it to follow along with the examples in this section.

Understanding head() in Pandas

				
					import pandas as pd
df = pd.read_csv("data.csv")
				
			
				
					df.head()
				
			
Actor Film Year Genre BoxOffice(INR Crore) IMDb
0 Shah Rukh Khan Pathaan 2023 Action 1050 7.2
1 Salman Khan Tiger Zinda Hai 2017 Action 565 6.0
2 Aamir Khan Dangal 2016 Biography 2024 8.4
3 Ranbir Kapoor Brahmastra 2022 Fantasy 431 5.6
4 Ranveer Singh Padmaavat 2018 Historical 585 7.0
				
					df.head(3)
				
			
Actor Film Year Genre BoxOffice(INR Crore) IMDb
0 Shah Rukh Khan Pathaan 2023 Action 1050 7.2
1 Salman Khan Tiger Zinda Hai 2017 Action 565 6.0
2 Aamir Khan Dangal 2016 Biography 2024 8.4

Using tail() to Inspect Data from the End

				
					df.tail()
				
			
Actor Film Year Genre BoxOffice(INR Crore) IMDb
7 Hrithik Roshan War 2019 Action 475 6.5
8 Akshay Kumar Good Newwz 2019 Comedy 318 7.0
9 Kartik Aaryan Bhool Bhulaiyaa 2 2022 Horror Comedy 266 5.9
10 Varun Dhawan Badrinath Ki Dulhania 2017 Romantic Comedy 201 6.1
11 Vicky Kaushal Uri: The Surgical Strike 2019 Action 342 8.2

Getting Dataset Overview with info()

				
					df.info()
				
			
				
					<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Actor                 12 non-null     object 
 1   Film                  12 non-null     object 
 2   Year                  12 non-null     int64  
 3   Genre                 12 non-null     object 
 4   BoxOffice(INR Crore)  12 non-null     int64  
 5   IMDb                  12 non-null     float64
dtypes: float64(1), int64(2), object(3)
memory usage: 708.0+ bytes
				
			
The info() function shows:
  • Number of rows
  • Column names
  • Data types of each column
  • Non-null (non-missing) values
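Each item that `info()` reports can also be read on its own. The sketch below uses a three-row stand-in built from rows shown in the `head()` output above, so you can run it without the CSV file.

```python
import pandas as pd

# Three-row stand-in built from the rows shown in the head() output
df = pd.DataFrame({
    "Actor": ["Shah Rukh Khan", "Salman Khan", "Aamir Khan"],
    "Film": ["Pathaan", "Tiger Zinda Hai", "Dangal"],
    "Year": [2023, 2017, 2016],
    "IMDb": [7.2, 6.0, 8.4],
})

print(len(df))             # number of rows
print(list(df.columns))    # column names
print(df.dtypes)           # data type of each column
print(df.notnull().sum())  # non-null count per column
```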

How to Select Rows and Columns in Pandas DataFrame

				
					import pandas as pd

data = {
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "City": ["Delhi", "Mumbai", "Pune"]
}

df = pd.DataFrame(data)
				
			

Selecting Columns in Pandas

				
					df["Name"]
				
			
				
					0    Rahul
1     Amit
2    Sneha
Name: Name, dtype: object
				
			
				
					df[["Name", "City"]]
				
			
				
						 Name	City
0	Rahul	Delhi
1	Amit	Mumbai
2	Sneha	Pune

				
			

Selecting Rows in Pandas

				
					df.iloc[0]
				
			
				
					Name    Rahul
Age        23
City    Delhi
Name: 0, dtype: object

				
			
				
					df.iloc[0:3]
				
			
Name Age City
0 Rahul 23 Delhi
1 Amit 25 Mumbai
2 Sneha 22 Pune
				
					df.loc[0]
				
			
				
					Name    Rahul
Age        23
City    Delhi
Name: 0, dtype: object

				
			
				
					df.loc[0:2, ["Name", "City"]]
				
			
Name City
0 Rahul Delhi
1 Amit Mumbai
2 Sneha Pune
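The examples above hint at an important difference: `iloc` selects by position and leaves out the end of a slice, while `loc` selects by label and includes it. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "City": ["Delhi", "Mumbai", "Pune"],
})

# iloc slices by position and excludes the end position
by_position = df.iloc[0:2]
print(len(by_position))  # 2 rows: positions 0 and 1

# loc slices by label and includes the end label
by_label = df.loc[0:2]
print(len(by_label))     # 3 rows: labels 0, 1 and 2

# After set_index, loc looks rows up by the new labels
indexed = df.set_index("Name")
print(indexed.loc["Amit", "City"])  # Mumbai
```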

Data Cleaning Using Pandas – How to Preprocess Data in Pandas

Before performing any analysis, raw data must be prepared properly. Data cleaning using pandas helps you remove errors, handle missing values, and correct inconsistent information inside a dataset. In fact, knowing how to preprocess data in pandas is one of the most important skills for beginners in data analysis.

Moreover, clean data improves accuracy and makes data transformation using pandas much easier. With simple functions, you can detect null values, remove duplicates, fix column names, and modify a DataFrame without writing complex code. This section guides you step by step through practical techniques that help you prepare real-world datasets for analysis. By the end, you should be able to handle messy data confidently in your own pandas projects. 🚀

Handling Missing Values in Pandas DataFrame

Missing values are common in real-world datasets. Therefore, handling them correctly is an important step in data cleaning and preprocessing with pandas.

Create Example Data with Missing Values

				
					import pandas as pd
import numpy as np

data = {
    "Name": ["Rahul", "Amit", "Sneha", "Karan"],
    "Age": [23, None, 22, 24],
    "Salary": [30000, 35000, np.nan, 28000]
}

df = pd.DataFrame(data)
print(df)
				
			
Index Name Age Salary
0 Rahul 23.0 30000.0
1 Amit NaN 35000.0
2 Sneha 22.0 NaN
3 Karan 24.0 28000.0

Detect Missing Values

				
					df.isnull().sum()
				
			
				
					Name      0
Age       1
Salary    1
dtype: int64
				
			

Remove Missing Values

				
					df_clean = df.dropna()
print(df_clean)
				
			
Index Name Age Salary
0 Rahul 23.0 30000.0
3 Karan 24.0 28000.0

Fill Missing Values

				
					df["Age"] = df["Age"].fillna(df["Age"].mean())
				
			
Name Age Salary
0 Rahul 23.0 30000.0
1 Amit 23.0 35000.0
2 Sneha 22.0 NaN
3 Karan 24.0 28000.0

Handling missing values is a crucial skill when you learn pandas for data analysis. Once you manage missing data properly, further analysis becomes much easier and more accurate.
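When several columns have gaps, `fillna` also accepts a dictionary with one fill value per column. The sketch below fills Age with its mean and Salary with its median. Which statistic suits each column is a choice made for this illustration, not a rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha", "Karan"],
    "Age": [23, None, 22, 24],
    "Salary": [30000, 35000, np.nan, 28000],
})

# One fill value per column: mean for Age, median for Salary
fill_values = {
    "Age": df["Age"].mean(),
    "Salary": df["Salary"].median(),
}

# fillna returns a new DataFrame; df itself is not changed
df_filled = df.fillna(fill_values)
print(df_filled)
```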

How to Remove Duplicate Data Using Pandas

Duplicate data often appears in real-world datasets. Therefore, removing duplicates is an important step in data cleaning using pandas and proper data preprocessing.

Create Example Data with Duplicates

				
					import pandas as pd

data = {
    "Name": ["Rahul", "Amit", "Sneha", "Rahul"],
    "Age": [23, 25, 22, 23],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"]
}

df = pd.DataFrame(data)
print(df)
				
			
Index Name Age City
0 Rahul 23 Delhi
1 Amit 25 Mumbai
2 Sneha 22 Pune
3 Rahul 23 Delhi

Detect Duplicate Rows

				
					print(df.duplicated())
				
			
				
					0    False
1    False
2    False
3     True
dtype: bool
				
			
				
					df.duplicated().sum()
				
			

Remove Duplicate Rows

				
					df_clean = df.drop_duplicates()
print(df_clean)
				
			
Index Name Age City
0 Rahul 23 Delhi
1 Amit 25 Mumbai
2 Sneha 22 Pune

Remove Duplicates Based on Specific Columns

				
					print(df.drop_duplicates(subset=["Name"]))
				
			
Name Age City
0 Rahul 23 Delhi
1 Amit 25 Mumbai
2 Sneha 22 Pune
Removing duplicate data is an essential step when you learn pandas for data analysis. Clean and unique data produces more reliable results.
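By default, `drop_duplicates` keeps the first copy of each repeated row. The `keep` parameter changes that behavior, as this short sketch with the same sample data shows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha", "Rahul"],
    "Age": [23, 25, 22, 23],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
})

# keep="last" keeps the final copy of each duplicated row
last_kept = df.drop_duplicates(keep="last")
print(list(last_kept.index))   # [1, 2, 3]

# keep=False drops every row that has a duplicate
no_repeats = df.drop_duplicates(keep=False)
print(list(no_repeats.index))  # [1, 2]
```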

Renaming Columns and Fixing Data Types in Pandas

Clear column names and correct data types make analysis easier. Therefore, renaming columns and fixing data types is an important step in data cleaning using pandas and proper data preprocessing.

Create Example Data

				
					import pandas as pd

data = {
    "emp_name": ["Rahul", "Amit", "Sneha"],
    "emp_age": ["23", "25", "22"],   # Age stored as string
    "emp_salary": ["30000", "35000", "28000"]  # Salary stored as string
}

df = pd.DataFrame(data)
print(df)

				
			
emp_name emp_age emp_salary
0 Rahul 23 30000
1 Amit 25 35000
2 Sneha 22 28000

Rename Columns

				
					df.rename(columns={
    "emp_name": "Name",
    "emp_age": "Age",
    "emp_salary": "Salary"
}, inplace=True)

print(df)
				
			
Name Age Salary
0 Rahul 23 30000
1 Amit 25 35000
2 Sneha 22 28000

Fix Data Types

				
					df["Age"] = df["Age"].astype(int)
df["Salary"] = df["Salary"].astype(float)

print(df.dtypes)
				
			
				
					Name       object
Age         int64
Salary    float64
dtype: object
				
			
Renaming columns and fixing data types helps make sure your dataset is ready for deeper analysis and more advanced pandas operations.
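Real datasets often contain entries that cannot be converted, and `astype(int)` raises an error when it meets one. One common alternative is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN. The messy data below is hypothetical and exists only to show the behavior.

```python
import pandas as pd

# Hypothetical messy column: one entry is not a number
df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": ["23", "unknown", "22"],
})

# errors="coerce" converts values that cannot be parsed into NaN
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

print(df["Age"].tolist())
print(df["Age"].isnull().sum())  # 1 value could not be converted
```

After this step, the NaN entries can be handled with the missing-value techniques covered earlier.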

Data Transformation Using Pandas to Modify DataFrame

After cleaning the dataset, the next important step is data transformation using pandas. In this stage, you reshape, filter, and update your data so it becomes more useful for analysis. While data cleaning focuses on fixing errors, transformation helps you organize and modify a DataFrame according to your analysis needs.

Moreover, proper transformation makes reports clearer and supports better decision-making. For example, you can create new columns, group data, apply calculations, or sort values. Therefore, understanding how to transform data is essential when you learn pandas for data analysis.

How to Modify DataFrame Using Pandas

Modifying a DataFrame allows you to update, add, or change data according to your analysis needs. In data transformation using pandas, these operations help you prepare datasets for deeper insights.

Create Example Data

				
					import pandas as pd

data = {
    "Name": ["Rahul", "Amit", "Sneha"],
    "Age": [23, 25, 22],
    "Salary": [30000, 35000, 28000]
}

df = pd.DataFrame(data)
print(df)
				
			
Name Age Salary
0 Rahul 23 30000
1 Amit 25 35000
2 Sneha 22 28000

Add a New Column

				
					df["Bonus"] = [2000, 2500, 1800]
print(df)
				
			
Name Age Salary Bonus
0 Rahul 23 30000 2000
1 Amit 25 35000 2500
2 Sneha 22 28000 1800

Update Existing Values

				
					df.loc[0, "Salary"] = 32000
print(df)
				
			
Name Age Salary Bonus
0 Rahul 23 32000 2000
1 Amit 25 35000 2500
2 Sneha 22 28000 1800

Apply a Calculation to a Column

				
					df["Salary"] = df["Salary"] + 1000
print(df)
				
			
Name Age Salary Bonus
0 Rahul 23 33000 2000
1 Amit 25 36000 2500
2 Sneha 22 29000 1800

Remove Columns

				
					df.drop("Bonus", axis=1, inplace=True)
print(df)
				
			
Name Age Salary
0 Rahul 23 33000
1 Amit 25 36000
2 Sneha 22 29000
As a result, knowing how to modify a DataFrame is a core pandas skill. Mastering these techniques makes data transformation faster and more efficient. 🚀
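A new column can also be calculated from existing ones. The sketch below uses `assign`, which returns a new DataFrame and leaves the original untouched. The column name `Total` is a hypothetical choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha"],
    "Salary": [30000, 35000, 28000],
    "Bonus": [2000, 2500, 1800],
})

# assign returns a new DataFrame; df keeps its original columns
# "Total" is a hypothetical column name chosen for this sketch
with_total = df.assign(Total=df["Salary"] + df["Bonus"])
print(with_total)
```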

Filtering and Sorting Data in Pandas

Filtering and sorting help you organize data in a meaningful way. In data transformation using pandas, these operations allow you to focus only on relevant records and arrange them properly for analysis.

Create Example Data

				
					import pandas as pd

data = {
    "Name": ["Rahul", "Amit", "Sneha", "Karan"],
    "Age": [23, 25, 22, 24],
    "Salary": [30000, 35000, 28000, 40000]
}

df = pd.DataFrame(data)
print(df)
				
			
Name Age Salary
0 Rahul 23 30000
1 Amit 25 35000
2 Sneha 22 28000
3 Karan 24 40000

Filtering Data in Pandas

Filtering allows you to select rows based on conditions.


Filter Rows Based on a Condition

				
					print(df[df["Salary"] > 30000])
				
			

Apply Multiple Conditions

				
					print(df[(df["Age"] > 22) & (df["Salary"] > 30000)])
				
			

Sorting Data in Pandas

				
					print(df.sort_values("Salary"))
				
			
				
					print(df.sort_values("Salary", ascending=False))
				
			
				
					print(df.sort_values(["Age", "Salary"]))
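Filtering and sorting are often combined in one step. The sketch below, using the same sample data, first keeps the rows with a Salary above 30000 and then sorts them from highest to lowest. It also shows `isin`, which filters on a list of allowed values.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rahul", "Amit", "Sneha", "Karan"],
    "Age": [23, 25, 22, 24],
    "Salary": [30000, 35000, 28000, 40000],
})

# Keep rows with Salary above 30000, then sort highest first
high_earners = df[df["Salary"] > 30000].sort_values("Salary", ascending=False)
print(high_earners["Name"].tolist())  # ['Karan', 'Amit']

# isin keeps rows whose value appears in the given list
selected = df[df["Name"].isin(["Rahul", "Sneha"])]
print(selected["Age"].tolist())       # [23, 22]
```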
				
			

GroupBy Operations in Pandas with Simple Example

GroupBy operations help you summarize and analyze data based on categories. In data transformation using pandas, this method allows you to combine similar records and perform calculations easily.

Create Example Data

				
					import pandas as pd

data = {
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Mobile", "Tablet"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [50000, 30000, 45000, 20000, 35000, 25000]
}

df = pd.DataFrame(data)
print(df)
				
			
Product Region Sales
0 Laptop North 50000
1 Mobile South 30000
2 Laptop East 45000
3 Tablet West 20000
4 Mobile North 35000
5 Tablet South 25000

Group by One Column

				
					print(df.groupby("Product")["Sales"].sum())
				
			
				
					Product
Laptop    95000
Mobile    65000
Tablet    45000
Name: Sales, dtype: int64
				
			

Group by Another Column

				
					print(df.groupby("Region")["Sales"].sum())
				
			
				
					Region
East     45000
North    85000
South    55000
West     20000
Name: Sales, dtype: int64
				
			

Apply Multiple Calculations

				
					print(df.groupby("Product")["Sales"].agg(["sum", "mean", "max"]))
				
			
Product sum mean max
Laptop 95000 47500.0 50000
Mobile 65000 32500.0 35000
Tablet 45000 22500.0 25000

Group by Multiple Columns

				
					print(df.groupby(["Product", "Region"], as_index=False)["Sales"].sum())
				
			
Product Region Sales
0 Laptop East 45000
1 Laptop North 50000
2 Mobile North 35000
3 Mobile South 30000
4 Tablet South 25000
5 Tablet West 20000
Therefore, understanding GroupBy operations is essential for anyone working with pandas. Once mastered, analyzing business data becomes much easier and more efficient. 🚀
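When you want readable column names in the summary, pandas also supports named aggregation. The sketch below uses the same sales data; the output names `total_sales` and `average_sales` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Laptop", "Mobile", "Laptop", "Tablet", "Mobile", "Tablet"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [50000, 30000, 45000, 20000, 35000, 25000],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("Product").agg(
    total_sales=("Sales", "sum"),
    average_sales=("Sales", "mean"),
)
print(summary)
```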

❓ Frequently Asked Questions (FAQ)

What is Pandas in Python used for?

Pandas is a powerful Python library that helps you analyze and manipulate data. You can use it to work with tables, CSV files, and Excel files easily.

Why should beginners learn Pandas for data analysis?

Beginners should learn Pandas because it simplifies data analysis tasks. You can clean data, transform data, and analyze datasets with simple code.

What is a DataFrame in Pandas?

A DataFrame is a table with rows and columns. You use it to store and analyze structured data.

How do I perform data cleaning using Pandas?

You can clean data by removing missing values, deleting duplicates, and fixing column names. Pandas provides built-in functions like dropna() and drop_duplicates().

Can I use Pandas for Excel files?

Yes, you can use Pandas to read and write Excel files. You can use read_excel() to load data and to_excel() to export data.

What is data transformation using Pandas?

Data transformation using pandas includes modifying columns, filtering data, grouping data, sorting values, and creating new calculated columns. It helps convert raw data into meaningful information.

Is Pandas enough for complete data analysis?

Pandas handles most data analysis tasks. However, you can combine it with libraries like Matplotlib and NumPy for visualization and advanced analysis.
