Plotting Time Series in Python: A Complete Guide

Introduction

Time series data is data collected over time, usually at regular intervals. It can be used to analyze trends, patterns, and behaviors over time. To analyze time series data effectively, it is important to visualize it in a way that is easy to understand. This is where plotting comes in.

Python has several libraries that can be used for plotting time series data, including Matplotlib, Seaborn, and Pandas. These libraries provide a variety of tools for creating different types of visualizations such as line graphs, scatter plots, histograms, and more.

In this guide, we will cover the basics of plotting time series data using Matplotlib and Pandas. We will start with a brief overview of these libraries and then move on to some examples of how to plot time series data using each one. By the end of this guide, you should have a solid understanding of how to create effective visualizations of time series data in Python.

What is a Time Series?

A time series is a sequence of data points that are indexed in time order. It is a dataset where each observation corresponds to a specific point in time. Time series can be found in various fields such as economics, finance, weather forecasting, and more.

Time series differ from other types of datasets because they typically exhibit temporal dependence, meaning that the value of an observation at a given time often depends on previous values. This property makes time series analysis unique and challenging, as it requires specific techniques and tools to analyze and interpret the data.

In Python, there are several libraries that provide powerful tools for working with time series data. One of the most popular libraries is pandas, which provides high-performance data manipulation and analysis tools for Python. Pandas has built-in support for handling time series data and provides many functions for resampling, shifting, rolling windows, and more.

To work with time series data in Python, you need to ensure that your data is properly formatted. The index of your DataFrame or Series should be a DatetimeIndex object, or a PeriodIndex object if you are working with periods rather than timestamps. Once your data is properly formatted, you can start exploring and analyzing it using the various tools provided by pandas.


import pandas as pd

# create a DataFrame with a DatetimeIndex
data = {'sales': [100, 200, 150, 300],
        'date': ['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01']}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# print the DataFrame
print(df)

Output:

            sales
date
2021-01-01    100
2021-02-01    200
2021-03-01    150
2021-04-01    300

In the example above, we converted the ‘date’ column to datetime values using the `pd.to_datetime` function and then set that column as the index, which gives the DataFrame a DatetimeIndex. Now that our data is properly formatted, we can use pandas to analyze and plot our time series data.
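
Once the index holds datetimes, the resampling, shifting, and rolling-window tools mentioned earlier become available. The following is a minimal sketch that reuses the toy sales figures from the example above; the variable names (`quarterly`, `change`, `moving_avg`) and the choice of a quarterly frequency and a two-month window are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Same toy monthly sales figures as above, indexed by date
sales = pd.Series(
    [100, 200, 150, 300],
    index=pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01']),
    name='sales',
)

# Resample: aggregate the monthly values into quarterly totals
# ('QS' labels each quarter by its start date)
quarterly = sales.resample('QS').sum()

# Shift: subtract the previous month's value to get month-over-month change
change = sales - sales.shift(1)

# Rolling window: two-month moving average
moving_avg = sales.rolling(window=2).mean()

print(quarterly)
print(change)
print(moving_avg)
```

The first entry of `change` and `moving_avg` is NaN because there is no earlier observation to compare against or average with.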

Importing Time Series Data into Python

Time series data is a type of data that is collected over time and can be used to analyze trends and patterns. Python has many libraries that can be used for time series analysis, including Pandas and Matplotlib.

To import time series data into Python, we first need to have the data in a format that Python can read. Common file formats for time series data include CSV, Excel, and JSON.

Once we have our data in a compatible format, we can use Pandas to read it into a DataFrame. A DataFrame is a 2-dimensional table-like data structure that is used in Pandas to represent tabular data.

To read CSV files, we can use the `read_csv()` function from Pandas. For example:


import pandas as pd

df = pd.read_csv('my_time_series_data.csv')

If our data is in an Excel file, we can use the `read_excel()` function instead:


df = pd.read_excel('my_time_series_data.xlsx')

If our data is in a JSON file, we can use the `read_json()` function:


df = pd.read_json('my_time_series_data.json')

After reading our time series data into a DataFrame, we can then use Matplotlib to plot it and visualize any trends or patterns. We’ll cover how to do this in more detail later in this guide.
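
When reading a file, it is often convenient to parse the dates and set the index in the same call. The sketch below assumes a CSV with hypothetical columns named `date` and `value`; an in-memory string stands in for the file so the example is self-contained.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the column names are hypothetical
csv_text = """date,value
2021-01-01,10
2021-01-02,12
2021-01-03,11
"""

# parse_dates converts the column to datetimes,
# index_col makes that column the index of the DataFrame
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(type(df.index))
print(df)
```

With a real file, you would pass the file path instead of the `io.StringIO` object.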

Cleaning and Preparing Time Series Data

Time series data is a sequence of observations that are recorded over time. This type of data is commonly used in finance, economics, and other fields to analyze trends and make predictions. However, before we can start analyzing time series data, we need to clean and prepare it.

The first step in cleaning time series data is to check for missing values. Missing values can occur due to various reasons such as equipment failure or human error. Missing values can be filled using interpolation techniques such as linear interpolation or forward filling.

The next step is to check for outliers. Outliers are extreme values that can skew the analysis results. Outliers can be detected using statistical methods such as the Z-score method or the Interquartile range (IQR) method. Once outliers are detected, they can be removed or replaced with more appropriate values.
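
As a rough illustration of the IQR method, the sketch below flags values that fall more than 1.5 times the IQR outside the quartiles and replaces them by linear interpolation. The data, the 1.5 multiplier (a common convention), and the choice to interpolate rather than drop the values are all assumptions made for the example.

```python
import pandas as pd

# Made-up daily readings with one obvious spike
values = pd.Series(
    [10, 12, 11, 13, 12, 95, 11, 12],
    index=pd.date_range('2021-01-01', periods=8, freq='D'),
)

# Interquartile range and the usual 1.5 * IQR fences
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Boolean mask marking the outliers
is_outlier = (values < lower) | (values > upper)
print(values[is_outlier])

# Replace outliers with NaN, then fill them by linear interpolation
cleaned = values.mask(is_outlier).interpolate()
print(cleaned)
```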

After cleaning the data for missing values and outliers, we need to ensure that the data is in a format that can be analyzed using time series techniques. This includes converting the data into a datetime format, setting it as the index of the DataFrame, and resampling it if necessary.

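Finally, we need to check whether the data meets the assumptions of the analysis we plan to run. Many classical time series models assume stationarity, and some statistical tests and models additionally assume normally distributed values or errors. Stationarity means that the statistical properties of the series, such as its mean and variance, remain constant over time. Normality means that the distribution of the data is Gaussian.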
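
One informal way to check for stationarity is to look at rolling statistics: if the rolling mean drifts, the series is probably not stationary. The sketch below uses a synthetic trending series and a 30-day window, both chosen only for illustration, and shows that first differencing removes a linear trend. It is a quick visual-style check, not a formal statistical test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range('2021-01-01', periods=200, freq='D')

# Synthetic series: a linear upward trend plus noise, so the mean drifts
series = pd.Series(0.5 * np.arange(200) + rng.normal(0, 1, 200), index=dates)

# Rolling mean and standard deviation over a 30-day window
rolling_mean = series.rolling(window=30).mean()
rolling_std = series.rolling(window=30).std()

# First difference: the change from one day to the next
diffed = series.diff().dropna()

print(rolling_mean.iloc[[29, -1]])   # first and last full-window means
print(rolling_std.iloc[[29, -1]])
print(diffed.mean())                 # close to the slope of the trend
```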

In summary, cleaning and preparing time series data involves checking for missing values and outliers, formatting the data for analysis, and ensuring that it meets the assumptions of time series analysis. By taking these steps, we can ensure that our analysis results are accurate and reliable.


# Example code for filling missing values with forward fill
import pandas as pd

# create sample DataFrame with missing values
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
                   'value': [10, None, 20, None]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# fill missing values with forward fill
# (fillna(method='ffill') is deprecated in recent pandas versions)
df = df.ffill()
print(df)
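
Forward filling repeats the last known value, while linear interpolation draws a straight line between the neighbouring observations. A small sketch comparing the two on the same sample values (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(
    [10, None, 20, None],
    index=pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04']),
    name='value',
)

# Forward fill: carry the last valid observation forward
forward_filled = s.ffill()

# Linear interpolation: fill gaps between known points;
# a trailing gap has no right-hand neighbour, so it takes the last known value
interpolated = s.interpolate()

print(pd.DataFrame({'ffill': forward_filled, 'interpolate': interpolated}))
```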

Visualizing Time Series Data

Time series data is a type of data that is recorded over time, such as stock prices, weather patterns, or website traffic. Visualizing time series data is a crucial step in understanding and analyzing it. In Python, there are several libraries available for plotting time series data, including Matplotlib, Seaborn, and Plotly.

Matplotlib is a popular library for creating static visualizations. It provides a wide range of customizable options for creating line plots, scatter plots, bar charts, and more. To plot a time series using Matplotlib, we can use the `plot` function and pass in the dates as the x-axis values and the corresponding values as the y-axis values.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Generate random data
dates = pd.date_range(start='2023-01-01', end='2023-06-30', freq='D')
values = np.random.rand(len(dates))

# Create a DataFrame from the generated data
data = pd.DataFrame({'value': values}, index=dates)

# Plot time series data
plt.plot(data.index, data['value'])

# Add axis labels and title
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Time Series Plot')

# Show plot
plt.show()
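
Pandas can also draw the plot itself: a Series or DataFrame with a DatetimeIndex has a `plot()` method that uses Matplotlib under the hood and puts the dates on the x-axis. The sketch below plots the same kind of random daily data together with a 7-day moving average; the fixed seed and the window length are assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random daily data, seeded so the figure is reproducible
rng = np.random.default_rng(42)
dates = pd.date_range(start='2023-01-01', end='2023-06-30', freq='D')
data = pd.DataFrame({'value': rng.random(len(dates))}, index=dates)

fig, ax = plt.subplots(figsize=(10, 4))

# Series.plot() uses the DatetimeIndex as the x-axis
data['value'].plot(ax=ax, label='Daily value')

# Overlay a 7-day moving average to smooth out the noise
data['value'].rolling(window=7).mean().plot(ax=ax, label='7-day mean')

ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Time Series Plot with pandas')
ax.legend()
```

In a script, finish with `plt.show()` to display the figure; in a notebook it usually renders automatically.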

Plotting Multiple Time Series

When working with time series data, it is often useful to plot multiple time series on the same graph for comparison. In Python, we can achieve this using the Matplotlib library.

Let’s consider an example where we have three time series – monthly sales data for three different products. We want to plot all three time series on the same graph.

First, we need to import the necessary libraries:


import pandas as pd
import matplotlib.pyplot as plt

Next, we can read in our data and create a Pandas DataFrame:


data = {'month': ['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01'],
        'product_1': [1000, 1200, 800, 1500, 2000, 1800],
        'product_2': [800, 900, 1000, 1100, 1200, 1300],
        'product_3': [500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data)
df['month'] = pd.to_datetime(df['month'])
df.set_index('month', inplace=True)

We convert the ‘month’ column to a datetime format and set it as the index of the DataFrame.

Now we can plot all three time series on the same graph using Matplotlib:


plt.plot(df.index, df['product_1'], label='Product 1')
plt.plot(df.index, df['product_2'], label='Product 2')
plt.plot(df.index, df['product_3'], label='Product 3')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales by Product')
plt.legend()
plt.show()

We use the `plot()` function to plot each time series, specifying the x-axis as the index of the DataFrame and the y-axis as the column containing the data for each product. We also add labels for the x- and y-axes, a title for the graph, and a legend to identify each time series.

This will generate a graph with all three time series plotted on the same axis. We can easily compare the sales trends for each product over time.
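
When the series have very different scales, or the lines overlap too much, one alternative is to give each series its own panel with a shared x-axis. The following sketch rebuilds the same sales data and uses `plt.subplots` with `sharex=True`; the layout and figure size are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same monthly sales data as above
df = pd.DataFrame(
    {'product_1': [1000, 1200, 800, 1500, 2000, 1800],
     'product_2': [800, 900, 1000, 1100, 1200, 1300],
     'product_3': [500, 600, 700, 800, 900, 1000]},
    index=pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01',
                          '2021-04-01', '2021-05-01', '2021-06-01']),
)

# One row per product, all rows sharing the same date axis
fig, axes = plt.subplots(nrows=len(df.columns), sharex=True, figsize=(8, 6))

for ax, column in zip(axes, df.columns):
    ax.plot(df.index, df[column])
    ax.set_title(column)
    ax.set_ylabel('Sales')

axes[-1].set_xlabel('Month')
fig.tight_layout()
```

As before, call `plt.show()` at the end of a script to display the figure.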

Customizing Time Series Plots

After creating a basic time series plot, you may want to customize it to better suit your needs. Here are some ways to do that:

1. Changing the Figure Size
You can change the size of the figure by passing in the `figsize` parameter when creating the plot. For example, `plt.subplots(figsize=(10, 5))` will create a plot with a width of 10 inches and a height of 5 inches.

2. Adding Labels and Titles
To add a title to your plot, use the `plt.title()` method. You can also add labels to the x and y axis using `plt.xlabel()` and `plt.ylabel()`, respectively. For example:


plt.title('Sales over Time')
plt.xlabel('Date')
plt.ylabel('Sales (USD)')

3. Changing Colors and Line Styles
You can change the color and line style of your time series plot by passing in additional parameters to the `plot()` method. For example:


# reuse the DataFrame from the previous section
plt.plot(df.index, df['product_1'], color='green', linestyle='dashed')


This will create a dashed green line for your plot.

4. Adding Grid Lines
To add grid lines to your plot, use the `plt.grid()` method. For example:


plt.grid(True)


This will add grid lines along both the x and y axes.

5. Changing Tick Labels
You can customize tick labels on the x and y axes using the `plt.xticks()` and `plt.yticks()` functions, respectively. For example:


plt.xticks(rotation=45)


This will rotate the x tick labels by 45 degrees.

By customizing your time series plots, you can make them more visually appealing and easier to understand for your audience.
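
The customizations above can be combined in a single figure. The sketch below uses Matplotlib's object-oriented interface (`fig, ax = plt.subplots(...)`) instead of the `plt.*` functions and adds month-based date ticks through `matplotlib.dates`; the sales numbers and the tick format are made up for the example.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative monthly data
dates = pd.date_range('2021-01-01', periods=6, freq='MS')
sales = [1000, 1200, 800, 1500, 2000, 1800]

# 1. Figure size
fig, ax = plt.subplots(figsize=(10, 5))

# 3. Color and line style
ax.plot(dates, sales, color='green', linestyle='dashed', marker='o')

# 2. Title and axis labels
ax.set_title('Sales over Time')
ax.set_xlabel('Date')
ax.set_ylabel('Sales (USD)')

# 4. Grid lines
ax.grid(True)

# 5. Tick labels: one tick per month, formatted like "Jan 2021", rotated 45 degrees
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.tick_params(axis='x', rotation=45)

fig.tight_layout()
```

Call `plt.show()` at the end of a script to display the result.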

Analyzing Time Series Data with Statistics

Time series data can be analyzed using a variety of statistical techniques to gain insights into trends, patterns, and relationships between variables. In Python, there are several libraries that provide useful tools for time series analysis, including NumPy, Pandas, and Matplotlib.

One common technique for analyzing time series data is to calculate summary statistics such as the mean, median, and standard deviation. This can give you a sense of the central tendency and variability of the data over time. For example, you might calculate the average daily temperature over a year to see how it changes from season to season.
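
A short sketch of the temperature example, using synthetic data: daily temperatures are generated from a seasonal curve plus noise (an assumption made purely for illustration), summarized with `describe()`, and then averaged by month with `resample()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range('2022-01-01', '2022-12-31', freq='D')

# Synthetic daily temperatures: cold in winter, warm in summer, plus noise
day = np.arange(len(dates))
temps = pd.Series(
    15 - 10 * np.cos(2 * np.pi * day / 365) + rng.normal(0, 2, len(dates)),
    index=dates,
    name='temp_c',
)

# Overall summary statistics: count, mean, std, min, quartiles, max
print(temps.describe())

# Average temperature per calendar month
monthly_mean = temps.resample('MS').mean()
print(monthly_mean.round(1))
```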

Another technique is to look at the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the data. The ACF measures how strongly the series correlates with a lagged copy of itself, and the PACF measures the correlation at a given lag after the effect of shorter lags has been removed. Together they can reveal significant lags, such as seasonal cycles, that may be useful for modeling. For example, if you are analyzing stock prices over time, you might use the ACF and PACF to check whether past values carry information that could be used to make predictions about future prices.
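
Pandas offers `Series.autocorr()` for computing the autocorrelation at a single lag, which is enough for a quick look at the ACF. The sketch below builds a synthetic series with a weekly pattern (an assumption for illustration) and computes the autocorrelation for lags 1 to 14. The partial autocorrelation usually requires an additional library such as statsmodels and is not shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
dates = pd.date_range('2023-01-01', periods=140, freq='D')

# Synthetic series with a weekly cycle: two high days out of every seven
weekly_pattern = np.tile([0, 0, 0, 0, 0, 5, 5], 20)
series = pd.Series(weekly_pattern + rng.normal(0, 1, 140), index=dates)

# Autocorrelation at lags 1..14, collected into a Series indexed by lag
acf = pd.Series({lag: series.autocorr(lag=lag) for lag in range(1, 15)})
print(acf.round(2))

# The largest values should appear at multiples of the 7-day cycle
print(acf.idxmax())
```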

Finally, you can use regression analysis to model relationships between variables in a time series. This involves fitting a linear or nonlinear model to the data and using it to make predictions about future values. Regression analysis can be especially useful when you have multiple variables that may be influencing the outcome of interest.
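
As a minimal sketch of regression on a time series, the example below fits a straight-line trend to synthetic monthly sales with `np.polyfit` and extrapolates it three months ahead. The data, the linear model, and the forecast horizon are assumptions chosen for illustration; real series often need seasonal terms or dedicated time series models.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
dates = pd.date_range('2021-01-01', periods=24, freq='MS')

# Synthetic monthly sales: linear growth plus noise
t = np.arange(len(dates))
sales = pd.Series(1000 + 50 * t + rng.normal(0, 30, len(dates)), index=dates)

# Fit sales = intercept + slope * t by least squares
slope, intercept = np.polyfit(t, sales.to_numpy(), deg=1)
print(f'Estimated growth per month: {slope:.1f}')

# Extrapolate the fitted line to the next three months
future_t = np.arange(len(dates), len(dates) + 3)
future_dates = pd.date_range(dates[-1] + pd.offsets.MonthBegin(1), periods=3, freq='MS')
forecast = pd.Series(intercept + slope * future_t, index=future_dates)
print(forecast.round(0))
```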

Overall, there are many different statistical techniques that can be used to analyze time series data in Python. By using these tools effectively, you can gain valuable insights into trends and patterns over time that can help inform decision-making in a wide range of fields.

Conclusion

In conclusion, time series data is an important type of data that can provide valuable insights into various phenomena over time. Python provides a wide range of libraries and tools for analyzing and visualizing time series data.

In this guide, we covered the basics of time series data: what a time series is, how to load it into a pandas DataFrame with a DatetimeIndex, and how to clean it by handling missing values and outliers.

We then used Matplotlib to create basic line plots, plot several series on the same axes, and customize figures with labels, titles, legends, colors, line styles, grid lines, and tick labels. We also reviewed statistical techniques such as summary statistics, autocorrelation, and regression.

Seaborn, which is built on top of Matplotlib, and Plotly, which produces interactive plots, are other popular options for visualizing time series data and are natural next steps once you are comfortable with the basics.

Overall, Python provides a rich ecosystem for working with time series data and creating compelling visualizations. By mastering these tools and techniques, you can gain valuable insights into your data and communicate your findings effectively to others.



Pierian Training