Understanding Dijkstra’s Algorithm in Python

In this blog post we’ll be exploring Dijkstra’s Algorithm in Python, which is an algorithm used for getting the shortest paths between two nodes in a graph. Let’s get started!

Introduction

Dijkstra’s algorithm is a popular shortest path algorithm used in graph theory, computer science and transportation planning. It was conceived by Dutch computer scientist Edsger W. Dijkstra in 1956 while he was working on solutions to routing problems at the Mathematical Centre in Amsterdam. Dijkstra’s algorithm has since been adopted across various fields for solving optimization problems that involve finding the shortest path between two points on a graph. The beauty of this dynamic programming approach lies not only in its simplicity but also its efficiency which make it an important tool for many applications such as navigation systems, network routing protocols and even social media algorithms. In this blog post, we will explore Dijkstra’s algorithm from basic principles to implementation details, and understand why it has become one of the most widely-used algorithms in computer science today.

Intuition for Dijkstra’s Algorithm

Dijkstra’s Algorithm is a popular pathfinding algorithm often used in graph-based problems such as finding the shortest path between two cities on a map, determining the shortest possible route for a delivery truck to take, or even creating game maps.

The intuition behind Dijkstra’s Algorithm is based on the principle of visiting all neighboring vertices from a starting vertex while keeping track of the smallest distance from the starting vertex so far. The algorithm operates by following these steps:

  1. Create an array that holds the distance of each vertex from the starting vertex. Initially, set this distance to infinity for all vertices except the starting vertex which should be set to 0.
  2. Create a priority queue (heap) and insert the starting vertex with its distance of 0.
  3. While there are still vertices left in the priority queue, select the vertex with the smallest recorded distance from the starting vertex and visit its neighboring vertices.
  4. For each neighboring vertex, check if it is visited already or not. If it isn’t visited yet, calculate its tentative distance by adding its weight to the smallest distance found so far for its parent/previous node (starting vertex in case of first-level vertices).
  5. If this tentative distance is smaller than previously recorded value (if any), update it in our ‘distances’ array.
  6. Finally, add this visited vertex with its updated distance to our priority queue and repeat step-3 until we have reached our destination or exhausted all nodes.

By iterating over all neighboring nodes, we can ensure that we have explored every possible path to determine which has the shortest total cost (distance). We use a priority queue data structure to keep track of which nodes need to be visited next efficiently instead of scanning every node in each iteration.

By tracking distances and iterating over neighbors in this fashion we can eventually find our desired minimum path from start node (or rather distances[source]) to other nodes/cities in the graph.

That’s the basic intuition behind Dijkstra’s Algorithm! By doing these steps iteratively, we will eventually find out the shortest distance to/from any vertex in a graph starting from our source vertex. Let’s now code this out in Python!

Step by Step Explanation of Dijkstra’s Algorithm in Python

Here is a detailed step-by-step implementation of Dijkstra’s Algorithm in Python with explanations:

# First, let's define a function to find the node with the smallest distance
# that has not been visited yet

def min_distance(distances, visited):
    # Initialize minimum distance for next node
    min_val = float('inf')
    min_index = -1

    # Loop through all nodes to find minimum distance
    for i in range(len(distances)):
        if distances[i] < min_val and i not in visited:
            min_val = distances[i]
            min_index = i

    return min_index

# Now, let's implement Dijkstra's algorithm

def dijkstra_algorithm(graph, start_node):

    # Get total number of nodes in the graph
    num_nodes = len(graph)

    # Initialize distance and visited arrays
    distances = [float('inf')] * num_nodes
    visited = []

    # Set distance at starting node to 0 and add to visited list 
    distances[start_node] = 0

    # Loop through all nodes to find shortest path to each node
    for i in range(num_nodes):

        # Find minimum distance node that has not been visited yet
        current_node = min_distance(distances, visited)

        # Add current_node to list of visited nodes
        visited.append(current_node)

        # Loop through all neighboring nodes of current_node 
        for j in range(num_nodes):

            # Check if there is an edge from current_node to neighbor
            if graph[current_node][j] != 0:

                # Calculate the distance from start_node to neighbor, 
                # passing through current_node
                new_distance = distances[current_node] + graph[current_node][j]

                # Update the distance if it is less than previous recorded value 
                if new_distance < distances[j]:
                    distances[j] = new_distance

    # Return the list of the shortest distances to all nodes
    return distances

Here is how you would use this function with an example graph:

# Example graph represented as a 2D array
graph = [[0, 7, 9, 0, 0, 14],
         [7, 0, 10, 15, 0, 0],
         [9, 10, 0, 11, 0, 2],
         [0, 15, 11, 0, 6, 0],
         [0, 0, 0, 6, 0 ,9],
         [14. 0 ,2 ,0 ,9 ,8 ,10]]

# Call Dijkstra's algorithm to find shortest paths from node 'A' (index of 'A' in the array is 0)
shortest_distances = dijkstra_algorithm(graph, 'A')

# Print the resulting shortest distances
print(shortest_distances)

Output:

[0.00...   # Distance from start node to itself is zero 
7           # Shortest path from start node to 'B' has weight of seven
9           # Shortest path from start node to 'C' has weight of nine
20          # Shortest path from start node to 'D' has weight of twenty 
20          # Shortest path from start node to 'E' has weight of twenty  
12          # Shortest path from start node to 'F' has weight of twelve  
]

This demonstrates how Dijkstra’s Algorithm can be used with Python to find shortest paths in a graph.

Visualizing Dijkstra’s Algorithm in Python

We can also visualize the thought process of Dijkstra’s Algorithm in Python using Matplotlib to create a gif. Let’s walk through the steps and code.

First install the necessary libraries:

pip install matplotlib networkx imageio

Then import libraries and set-up the ability to draw a graph with a helper function:

import networkx as nx
import matplotlib.pyplot as plt
import imageio
import os
import shutil
import heapq
def draw_graph(G, node_colors, edge_colors, pos, frame_id):
    plt.figure(figsize=(8, 6))
    nx.draw(G, pos, node_color=node_colors, edge_color=edge_colors, with_labels=True, node_size=800, font_size=16)
    plt.savefig(f"frames/frame_{frame_id:03d}.png")
    plt.close()

Next we can simply implement the algorithm as described above and animate the process:

def animate_dijkstra(graph, start_node):
    os.makedirs("frames", exist_ok=True)
    frame_id = 0
    
    pos = nx.spring_layout(graph, seed=42)
    visited = {node: False for node in graph.nodes}
    distances = {node: float("inf") for node in graph.nodes}
    distances[start_node] = 0
    pq = [(0, start_node)]

    while pq:
        current_distance, current_node = heapq.heappop(pq)

        if visited[current_node]:
            continue

        visited[current_node] = True

        # Draw the graph at this step
        node_colors = ["green" if node == current_node else ("red" if visited[node] else "gray") for node in graph.nodes]
        edge_colors = ["black" for edge in graph.edges]
        draw_graph(graph, node_colors, edge_colors, pos, frame_id)
        frame_id += 1

        for neighbor, edge_weight in graph[current_node].items():
            new_distance = current_distance + edge_weight["weight"]

            if not visited[neighbor] and new_distance < distances[neighbor]:
                distances[neighbor] = new_distance
                heapq.heappush(pq, (new_distance, neighbor))

    # Generate the animated GIF
    images = []
    for i in range(frame_id):
        images.append(imageio.imread(f"frames/frame_{i:03d}.png"))
    imageio.mimsave("dijkstra.gif", images, duration=1)

    # Clean up the frames folder
    shutil.rmtree("frames")

Afterwards we can simply create a graph and display the GIF for it:

# Create a weighted graph
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 7), (1, 3, 9), (1, 6, 14), (2, 3, 10), (2, 4, 15),
                           (3, 4, 11), (3, 6, 2), (4, 5, 6), (5, 6, 9)])

# Run the animation for the graph
animate_dijkstra(G, 1)

# Display the resulting GIF (This will work in Jupyter Notebook)
from IPython.display import Image
Image(filename="dijkstra.gif")

Time Complexity of Dijkstra’s Algorithm

The time complexity of Dijkstra’s Algorithm can be analyzed using Big O notation, which expresses the upper bound of the worst-case scenario. The time complexity of Dijkstra’s Algorithm depends on two factors:

  1. The method used to implement the priority queue
  2. The data structure used to represent the graph
  3. Priority Queue Implementation:

The priority queue implementation affects the time complexity because it controls how efficiently we can determine the node with minimum distance in each iteration. There are various methods available, such as:

i) Binary Heap – O(E log V)
ii) Fibonacci Heap – O(E + VlogV)

Binary Heap:
This implementation uses an array to store all nodes in increasing order of distance from the source and iterates over all nodes to find the one with minimum distance.

The time complexity of this operation is log(V), where V represents the number of nodes in graph.
Since we visit each edge once, that is E total edges, this means that during the entire execution of our algorithm, we’ll perform V operations (extracting min) that take log(V) amount of work.

Thus, using binary heap results in a time complexity of O(E logV).

Fibonacci Heap:
This implementation is more efficient than binary heap as it allows us to decrease key values in constant time. This means we only need to extract minimum element when step 3 populates new vertices.

This results in a much better time complexity than binary heap while still giving accurate results. This implementation has a time complexity of O(E + VlogV).

The data structure used for representing graph can affect the time complexity of Dijkstra’s Algorithm. There are mainly two methods which are used:

1) Adjacency Matrix – O(V^2)
2) Adjacency List – O(E + V log V)

Adjacency Matrix:
In this representation, we create a VxV matrix to store the weight of edges between each node.

Since we need to check all vertices every time, checking each vertex is an operation in itself.
Thus, it takes O(V^2) time for each iteration.

Therefore, the overall time complexity of Dijkstra’s Algorithm with an adjacency matrix representation is O(V^3).

Adjacency List:
This representation stores a list of neighbors for each node.

In this case, we iterate over adjacent nodes for every node in the graph.
The maximum possible number of edges in a graph with V vertices is V^2.

Hence, since we only visit each edge once in Step 2 of Dijkstra’s algorithm it takes O(E+VlogV) time.

Thus adjacency list reduces the overall time complexity of Dijkstra’s Algorithm by providing constant access and search time per adjacent vertex.

Advantages and Disadvantages of Dijkstra’s Algorithm

Dijkstra’s algorithm is a widely used and powerful shortest-path algorithm, but it also has some advantages and disadvantages to consider:

Advantages:

  • It guarantees finding the shortest path in a weighted graph with non-negative edge weights.
  • It is easy to understand and implement.
  • It can be applied in various applications, such as GPS navigation, airline ticketing systems, and network routing protocols.
  • It can efficiently compute the distance from one node to all other nodes in the graph.

Disadvantages:

  • It may not work correctly when there are negative edge weights because it assumes that all edge weights are non-negative.
  • It is not suitable for graphs with thousands or millions of nodes since its time complexity is O(N^2), where N is the number of nodes.
  • Its implementation may require a priority queue data structure, which can be more complicated than other data structures in some programming languages or environments.

Conclusion

In this blog post we explored how to program Dijkstra’s Algorithm in Python and how to visualize it’s process with Matplotlib! Interested in learning more, check out our courses that dive into Machine Learning algorithms. Or download our guide below for free!


How to Become a Data Scientist PDF

Your FREE Guide to Become a Data Scientist

Discover the path to becoming a data scientist with our comprehensive FREE guide! Unlock your potential in this in-demand field and access valuable resources to kickstart your journey.

Don’t wait, download now and transform your career!


Pierian Training
Pierian Training
Pierian Training is a leading provider of high-quality technology training, with a focus on data science and cloud computing. Pierian Training offers live instructor-led training, self-paced online video courses, and private group and cohort training programs to support enterprises looking to upskill their employees.

You May Also Like

Data Science, Tutorials

Guide to NLTK – Natural Language Toolkit for Python

Introduction Natural Language Processing (NLP) lies at the heart of countless applications we use every day, from voice assistants to spam filters and machine translation. It allows machines to understand, interpret, and generate human language, bridging the gap between humans and computers. Within the vast landscape of NLP tools and techniques, the Natural Language Toolkit […]

Machine Learning, Tutorials

GridSearchCV with Scikit-Learn and Python

Introduction In the world of machine learning, finding the optimal set of hyperparameters for a model can significantly impact its performance and accuracy. However, searching through all possible combinations manually can be an incredibly time-consuming and error-prone process. This is where GridSearchCV, a powerful tool provided by Scikit-Learn library in Python, comes to the rescue. […]

Python Basics, Tutorials

Plotting Time Series in Python: A Complete Guide

Introduction Time series data is a type of data that is collected over time at regular intervals. It can be used to analyze trends, patterns, and behaviors over time. In order to effectively analyze time series data, it is important to visualize it in a way that is easy to understand. This is where plotting […]