Data Visualization with Pure Pandas
Data visualization is a critical part of data analysis: it helps you understand the data and draw meaningful insights from it. Pandas, a popular Python library for data analysis, includes a built-in plot method that draws charts directly from DataFrames and Series, using Matplotlib as its default plotting backend.
In this article, we will explore the basics of data visualization using Pandas, including creating simple plots, customizing plots, and plotting with multiple data frames. We will be using a sample dataset to demonstrate the visualization techniques in this article.
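To start, we need to install Pandas and Matplotlib, which Pandas uses to draw its plots. The leading ! runs the command from a Jupyter notebook cell; drop it when installing from a terminal.
!pip install pandas matplotlib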
Next, we will import the necessary libraries and load the sample dataset.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv")
The sample dataset contains information about the passengers on the Titanic, including their class, age, fare, and survival status.
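Before plotting, it is worth taking a quick look at the column types and missing values, because the plot methods work on numeric columns and some passengers in this dataset have no recorded age. The sketch below uses a small hand-made DataFrame with a few of the Titanic columns (values made up for illustration) so it runs offline; with the real data, the same calls work on the df loaded above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data (made-up values) so this runs offline.
# With the real data, skip this and use the df loaded with pd.read_csv above.
df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0],
    "class": ["Third", "First", "Third", "First", "Third", "First"],
    "age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 8.46, 51.86],
})

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # column types; numeric columns can be plotted directly
print(df.isna().sum())  # count of missing values in each column
print(df.describe())    # count, mean, std, min, quartiles and max of numeric columns
```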
Creating Simple Plots
Pandas provides a simple way to create basic plots, such as line plots, histograms, bar plots, and scatter plots, using the plot method. Let's take a look at a few examples.
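Every plot call returns a Matplotlib Axes object, and each kind also has a matching accessor method, so df.plot(kind="scatter", ...) and df.plot.scatter(...) draw the same kind of chart. A minimal sketch, assuming a small made-up sample in place of the Titanic data and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 51.86],
})

# The kind= string and the accessor method draw the same kind of chart
ax_kind = df.plot(kind="scatter", x="age", y="fare")
ax_accessor = df.plot.scatter(x="age", y="fare")

# Both return a Matplotlib Axes, which later customization can work on
print(isinstance(ax_kind, plt.Axes), isinstance(ax_accessor, plt.Axes))
```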
Line Plot
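A line plot is used to represent trends in data over a continuous interval or time period. To create a line plot, we use the plot method on a pandas DataFrame and set the kind parameter to "line". Because a line plot connects the points in row order, we sort the rows by the x column first; otherwise the line zigzags back and forth across the x-axis.
df.sort_values("age").plot(kind="line", x="age", y="fare")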
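Individual fares are noisy, so a trend line is often clearer when it is drawn from aggregated values. The sketch below, using a small made-up sample and 20-year age bands chosen purely for illustration, averages the fare within each band with pd.cut and groupby and plots one point per band:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "age": [4.0, 8.0, 22.0, 26.0, 35.0, 38.0, 54.0, 58.0, np.nan],
    "fare": [16.70, 21.08, 7.25, 7.93, 53.10, 71.28, 51.86, 26.55, 8.46],
})

# Assign each age to a band (0, 20], (20, 40] or (40, 60]; missing ages get no band
bands = pd.cut(df["age"], bins=[0, 20, 40, 60], labels=["0-20", "20-40", "40-60"])

# Average fare per band; rows without a band are dropped by groupby
mean_fare = df.groupby(bands, observed=False)["fare"].mean()

# One point per band, joined in band order, so the line follows the trend
ax = mean_fare.plot(kind="line", marker="o")
ax.set_xlabel("Age band")
ax.set_ylabel("Mean fare")
```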
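Histogram
A histogram shows the distribution of a single numeric column by counting how many values fall into each range (bin). For one column, the Series hist method is a convenient shortcut: bins sets the number of bins, figsize the figure size in inches, and grid=False hides the background grid.
df["age"].hist(bins=20, figsize=(7.3, 4), grid=False)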
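Two histograms can share one set of axes when each plot call is passed the same ax. This sketch, assuming a small made-up sample, overlays the ages of survivors and non-survivors using identical bin edges (10-year bins, chosen for illustration) and alpha=0.5 so overlapping bars stay visible:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 0],
    "age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0, 20.0],
})

edges = list(range(0, 81, 10))  # shared bin edges 0, 10, ..., 80 so the bars line up
fig, ax = plt.subplots(figsize=(7.3, 4))

for value, label in [(1, "Survived"), (0, "Not Survived")]:
    # Select one group's ages and drop missing values before binning
    ages = df.loc[df["survived"] == value, "age"].dropna()
    ages.plot(kind="hist", bins=edges, alpha=0.5, label=label, ax=ax)

ax.set_xlabel("Age")
ax.legend()
```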
Bar Plot
A bar plot is used to represent categorical data, where each bar represents a specific category. To create a bar plot, we use the plot method on a pandas Series or DataFrame and set the kind parameter to "bar". Calling value_counts() first counts the passengers in each category, and rot=0 keeps the category labels horizontal.
df["embark_town"].value_counts().plot(kind="bar", rot=0)
df["class"].value_counts().plot(kind="bar")
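value_counts() handles one categorical column at a time; to compare two, pd.crosstab builds a table of counts that plot(kind="bar") draws as grouped bars, one group per row. A sketch with a small made-up sample standing in for the Titanic class and survival columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Second", "Third", "Second", "First"],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Rows = class (sorted), columns = survived (0 then 1), cells = passenger counts
counts = pd.crosstab(df["class"], df["survived"])
counts.columns = ["Not Survived", "Survived"]

# One group of bars per class, one bar per survival outcome
ax = counts.plot(kind="bar", rot=0)
ax.set_ylabel("Passengers")
```

Passing stacked=True to plot stacks the two bars within each class instead of placing them side by side.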
Scatter Plot
A scatter plot is used to represent the relationship between two continuous variables. To create a scatter plot, we use the plot method on a pandas DataFrame and set the kind parameter to "scatter".
df.plot(kind="scatter", x="age", y="fare")
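Fares are heavily right-skewed (most are low, a few are very high), so the points bunch near the bottom of the plot. The logy=True argument puts the y-axis on a log scale, which spreads them out; here is a sketch with made-up sample values. Note that a fare of zero cannot be placed on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 51.86, 21.08, 11.13],
})

# logy=True switches the y-axis to a logarithmic scale
ax = df.plot(kind="scatter", x="age", y="fare", logy=True)
```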
Customizing Plots
We can customize the appearance of the plots by using different parameters, such as the color, size, and shape of the markers, the labels, and the title. Let’s take a look at an example.
df.plot(kind="scatter", x="age", y="fare", color="red", alpha=0.5)
plt.xlabel("Age")
plt.ylabel("Fare")
plt.title("Relationship between Age and Fare")
plt.show()
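The marker size and shape mentioned above can be set in the same call: s is the marker size and marker the marker style, both passed through to Matplotlib's scatter. Setting the labels and title on the Axes that plot returns, rather than through plt, keeps them attached to the right chart when several figures are open. A sketch with made-up sample values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 51.86],
})

ax = df.plot(
    kind="scatter", x="age", y="fare",
    color="red", alpha=0.5,  # marker color and transparency
    s=40,                    # marker size in points squared
    marker="^",              # triangle markers instead of the default circles
    figsize=(8, 5),          # figure size in inches
)
# Label the returned Axes directly instead of relying on plt's "current" axes
ax.set(xlabel="Age", ylabel="Fare", title="Relationship between Age and Fare")
```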
Plotting with Multiple Data Frames
We can plot multiple data frames on the same plot, which is useful for comparing the data between different groups. Let’s take a look at an example.
df_survived = df[df["survived"] == 1]
df_not_survived = df[df["survived"] == 0]
plt.scatter(df_survived["age"], df_survived["fare"], color="green", label="Survived")
plt.scatter(df_not_survived["age"], df_not_survived["fare"], color="red", label="Not Survived")
plt.xlabel("Age")
plt.ylabel("Fare")
plt.title("Relationship between Age and Fare")
plt.legend()
plt.show()
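The example above calls Matplotlib's scatter directly. The same chart can be built with Pandas alone by passing the Axes returned by the first plot call to the second call through the ax parameter. A sketch, assuming a small made-up sample in place of the Titanic data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample (made-up values) standing in for the Titanic data
df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.93, 53.10, 51.86, 21.08],
})
df_survived = df[df["survived"] == 1]
df_not_survived = df[df["survived"] == 0]

# The first call creates the Axes; ax=ax makes the second call draw on the same one
ax = df_survived.plot(kind="scatter", x="age", y="fare", color="green", label="Survived")
df_not_survived.plot(kind="scatter", x="age", y="fare", color="red",
                     label="Not Survived", ax=ax)
ax.set(xlabel="Age", ylabel="Fare", title="Relationship between Age and Fare")
```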
While Pandas provides a simple way to create plots, other libraries such as Seaborn, Plotly, and Bokeh offer more advanced visualizations, such as heat maps, violin plots, and 3D plots, and Plotly and Bokeh also make charts interactive. These tools can further enhance your data analysis.
If you are interested in exploring more advanced data visualization techniques, I highly recommend checking out these libraries along with their official documentation and tutorials.
In conclusion, data visualization is an important tool in data analysis, and Pandas provides a convenient, easy-to-use interface for creating visualizations. Whether you are just getting started with data analysis or are an experienced data scientist, Pandas can help you create meaningful and insightful visualizations of your data.
We have covered a lot of ground in this article. We started with the importance of data visualization in data analysis and then loaded a sample dataset with Pandas. After that, we covered the basics of plotting with Pandas, including line plots, histograms, bar plots, and scatter plots. We also discussed how to customize plots and how to plot multiple data frames on the same chart.
By following along with the examples in this article, you should now have a solid foundation in data visualization using Pandas. I encourage you to keep practicing and experimenting with your own data to further develop your skills.
Thank you for reading, and I hope you found this article helpful.