Formula One (F1) stands as the pinnacle of motorsport, epitomizing the fusion of cutting-edge automotive technology, driver prowess, and strategic team management. Since its inception in 1950, the sport has undergone profound transformations influenced by technological advancements, regulatory changes, and evolving competitive dynamics. In recent years, significant developments such as the introduction of hybrid power units in 2014, the implementation of budget caps in 2021 to level the competitive field, and adaptations due to global events like the COVID-19 pandemic have further reshaped the landscape of F1.
The primary objective of this project is to analyze and visualize the evolution of key performance metrics in Formula One (F1) over the period from 1950 to 2023. By examining metrics such as mean career win rates for drivers and points per race for constructors, the project aims to uncover trends, assess the impact of regulatory changes, and highlight significant shifts in team and driver performances. These visualizations will provide valuable insights for fans, analysts, and teams to understand the dynamics that have shaped the sport over the decades.
I first wanted to be able to compare and evaluate the performance of F1 constructors across several years and seasons. At first, I chose mean points per race per season as the key performance indicator for this task. I made this choice to account for the different number of races different seasons could have. I initially used a multi-line graph for the task of identifying constructors that outperformed both their competitors and their own past performances.
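As a rough sketch of this metric (not the project's actual code), mean points per race can be computed by summing a constructor's points in a season and dividing by the number of races it entered. The row layout, team names, and the `mean_points_per_race` helper are assumptions made for illustration. Dividing by races entered rather than by races held that season is also an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (season, race_id, constructor, points scored in that race).
results = [
    (2022, 1, "Team A", 25.0),
    (2022, 1, "Team B", 18.0),
    (2022, 2, "Team A", 18.0),
    (2022, 2, "Team B", 0.0),
    (2023, 1, "Team A", 10.0),
    (2023, 1, "Team B", 25.0),
]


def mean_points_per_race(rows):
    """Return {(season, constructor): total points / number of races entered}."""
    points = defaultdict(float)
    races = defaultdict(set)
    for season, race_id, constructor, pts in rows:
        key = (season, constructor)
        points[key] += pts
        # A set is used so that two cars in the same race count as one race.
        races[key].add(race_id)
    return {key: points[key] / len(races[key]) for key in points}


ppr = mean_points_per_race(results)
```

Normalizing by race count keeps a 16-race season and a 22-race season on the same scale, although it does nothing about changes to the points system itself.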
However, I ran into a few major problems. While the line graph was somewhat effective in identifying the best performers, the comparison was not always fair. F1 has modified its system for awarding points several times over its history, so comparing the performance of teams across these different eras is a challenge. The graph also tended to feel cluttered, with too much data being displayed at once. Moreover, identifying the worst performing teams was essentially impossible, as there were far too many teams that had scored 0 mean points per race over several seasons.
Next, I wanted to examine which nationalities and eras of F1 drivers were the most successful. I initially chose total career race wins as the success metric here. I started by creating two bar graphs, one that displayed the nationalities with the most career race wins and another that displayed the driver cohorts, denoted by the decade a driver was born in, with the most career race wins. I had to settle for using birth decade here instead of debut decade (when they joined F1) because the dataset I used did not contain this information.
I quickly noticed that the most successful countries were usually the ones whose drivers had the highest combined number of race starts. Therefore, I decided to use mean win rate instead to account for this. This is a country's total wins divided by that country's total number of race starts. While multiple drivers from the same country can take part in a race, only one can win it. Nevertheless, the number of race starts essentially represents the number of opportunities a country has to win a race. I did the same thing with the graph for birth decade. Dividing by the number of drivers would not have made sense, as different drivers take part in different numbers of races. I also added a filter to remove drivers with fewer than the selected number of race starts from the calculations. This gives users the option to reduce the noise and outliers introduced by drivers who may have participated in only a few races, or just a single one, in their entire F1 career.
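The win rate calculation and the race start filter can be sketched as follows. This is a minimal illustration under assumed data: the tuple layout, the placeholder driver and country names, and the `win_rate_by_country` helper are hypothetical and not taken from the project.

```python
from collections import defaultdict

# Hypothetical per-driver career totals: (driver, nationality, starts, wins).
drivers = [
    ("Driver A", "Country X", 200, 50),
    ("Driver B", "Country X", 100, 10),
    ("Driver C", "Country Y", 1, 1),
    ("Driver D", "Country Y", 99, 9),
]


def win_rate_by_country(rows, min_starts=0):
    """Total wins divided by total starts per country, skipping short careers."""
    wins = defaultdict(int)
    starts = defaultdict(int)
    for _driver, country, n_starts, n_wins in rows:
        if n_starts < min_starts:
            continue  # drop drivers below the selected race start threshold
        starts[country] += n_starts
        wins[country] += n_wins
    # Countries left with zero starts are omitted to avoid dividing by zero.
    return {c: wins[c] / starts[c] for c in starts if starts[c] > 0}


all_drivers = win_rate_by_country(drivers)
filtered = win_rate_by_country(drivers, min_starts=10)
```

With the threshold set to 10, the one-race career of Driver C no longer contributes to Country Y's rate. The same function works for birth decades if the second field holds the decade instead of the nationality.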
The next thing I wanted to evaluate was the budget cap that was first instituted in 2021. While originally planned to start at $175 million in 2021, the economic impact of COVID-19 reduced this figure to $145 million for that year. I wanted to see if the budget cap had been successful in its goal of reducing the gap between the top and bottom teams in F1.
For this task, I chose the distribution of mean points per race as the metric for comparison. Two periods would be compared: 2018-2020 and 2021-2023. Because the cost cap has only been in effect since 2021, only three capped seasons were available, so I compared the combined three-year points distribution before and after its introduction. Mean points per race is essentially unaffected by points system changes here, as the only one in this time period was the single point for the fastest lap in a race, introduced in 2019, which shouldn't skew results by a noticeable margin.
An issue that I immediately noticed was that by choosing to aggregate the results for all teams during these two periods, the team-specific changes resulting from the cost cap were hidden. Therefore, I decided to make another box plot with the constructors that participated during these periods on the x-axis, with the period now being denoted by color hue instead.
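One way to prepare data for such a grouped box plot is to label each season with its period and collect the values per constructor and period. The sketch below assumes a hypothetical list of (season, constructor, mean points per race) rows. The helper names are illustrative, and the plotting step itself is left out.

```python
from collections import defaultdict


def period_label(season):
    """Map a season to its comparison period, or None if outside both."""
    if 2018 <= season <= 2020:
        return "2018-2020"
    if 2021 <= season <= 2023:
        return "2021-2023"
    return None


# Hypothetical rows: (season, constructor, mean points per race that season).
rows = [
    (2017, "Team A", 30.0),
    (2018, "Team A", 31.0),
    (2019, "Team A", 35.0),
    (2021, "Team A", 27.0),
    (2019, "Team B", 1.0),
    (2022, "Team B", 3.0),
    (2023, "Team B", 5.0),
]


def group_by_constructor_and_period(data):
    """Collect values per (constructor, period); each list feeds one box."""
    groups = defaultdict(list)
    for season, constructor, ppr in data:
        label = period_label(season)
        if label is not None:  # seasons outside both periods are ignored
            groups[(constructor, label)].append(ppr)
    return dict(groups)


groups = group_by_constructor_and_period(rows)
```

Each key then corresponds to one box: the constructor selects the x-axis position and the period selects the color.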
Another key performance indicator I wanted to examine was the consistency of a constructor throughout a given season in relation to their competition. I would define this metric as the interquartile range (75th Percentile - 25th Percentile) of the points that a given constructor scored per race during the selected season. I chose to use box plots once again to visualize this metric. To allow for interactive exploration across the widest range of data, I implemented the ability to select a given season from 1950 to 2023.
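The consistency metric can be illustrated with Python's standard library. This is only a sketch: the per-race points are made up, and the "inclusive" quartile method is an assumption, since different quartile conventions give slightly different values on small samples.

```python
from statistics import quantiles


def points_iqr(points_per_race):
    """Interquartile range (75th minus 25th percentile) of per-race points.

    Needs at least two races. The "inclusive" method is an assumed convention.
    """
    q1, _median, q3 = quantiles(points_per_race, n=4, method="inclusive")
    return q3 - q1


# Hypothetical per-race points for two constructors in one season.
season_points = {
    "Team A": [8, 0, 4, 6, 2],
    "Team B": [10, 10, 10, 10, 10],
}

# A smaller IQR means the constructor's results varied less across the season.
consistency = {team: points_iqr(pts) for team, pts in season_points.items()}
```

Note that a low IQR only indicates steadiness, not strength: a team that scores nothing at every race has an IQR of zero as well.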
In recent times, we have heard that some teams prioritize the drivers' title over the constructors' as it may be more 'marketable'. So, I wanted to find out the extent to which F1 driver champions were responsible for the success of the teams they drove for. I decided that a Sankey Diagram was the appropriate idiom to use for this task.
However, I ran into a problem. The same constructor in some cases had raced under different names in the past so they were recorded as different constructors in the dataset. Brabham and Lotus were the biggest offenders in this case. I had to address this by programmatically identifying these 'duplicates' and merging them together during the data processing.
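The exact merging logic is not described here, so the following is only one possible approach, not the method actually used. It assumes that variant names share the team name before a hyphen (for example, a chassis name followed by an engine supplier) and handles other cases with an explicit alias table. The names in `ALIASES` and `links` are illustrative placeholders rather than entries confirmed from the dataset.

```python
from collections import Counter

# Hypothetical explicit aliases for names the hyphen rule cannot catch.
ALIASES = {"Team Lotus": "Lotus"}


def canonical_constructor(name, aliases=ALIASES):
    """Map a recorded constructor name to a single canonical team name."""
    if name in aliases:
        return aliases[name]
    # Assumed convention: the text before the first hyphen is the team name.
    return name.split("-", 1)[0].strip()


# Hypothetical (champion driver, constructor as recorded) pairs.
links = [
    ("Driver A", "Brabham-Repco"),
    ("Driver B", "Brabham"),
    ("Driver C", "Lotus-Climax"),
    ("Driver D", "Team Lotus"),
    ("Driver E", "Ferrari"),
]

# Count links per canonical team so duplicates collapse into one Sankey node.
merged = Counter(canonical_constructor(team) for _driver, team in links)
```

The hyphen rule would wrongly truncate any team whose real name contains a hyphen, so a rule like this needs checking against the full list of constructor names before being trusted.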
I also decided to implement filters, so that users would have the interactive freedom to explore multiple-time champions or drivers who won with many teams vs just a single one.
Coming back to the topic of nationalities, I wanted to reexamine country-specific performance using the mean finishing position metric instead of just race wins. To achieve this, I opted to display the data on a choropleth map which I believed would make identifying both extremes of the spectrum easier.
I quickly noticed a peculiar problem. The best performing country ended up being a country with just a single driver that had participated in F1... To account for this possibly anomalous result, I added filters that allow users to disregard countries with fewer than the chosen minimum number of drivers, as well as drivers with fewer than the chosen minimum number of race starts. I also implemented the ability to pan across or zoom into the map to allow users to focus on specific regions with lots of countries packed into a smaller area, such as Europe.
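The two filters can be sketched together as below. The data layout and the `mean_finish_by_country` helper are hypothetical. Two details are assumptions rather than facts about the project: the country mean is taken over every race result of the remaining drivers, and the driver count is checked after the race start filter has been applied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical careers: driver -> (nationality, list of finishing positions).
careers = {
    "Driver A": ("Country X", [1, 3, 5, 7]),
    "Driver B": ("Country X", [10, 12]),
    "Driver C": ("Country Y", [2]),
}


def mean_finish_by_country(data, min_drivers=1, min_starts=1):
    """Mean finishing position per country after applying both filters."""
    by_country = defaultdict(list)
    for country, finishes in data.values():
        if len(finishes) >= min_starts:  # drop drivers with too few starts
            by_country[country].append(finishes)
    return {
        country: mean(pos for finishes in driver_lists for pos in finishes)
        for country, driver_lists in by_country.items()
        if len(driver_lists) >= min_drivers  # drop countries with too few drivers
    }


unfiltered = mean_finish_by_country(careers)
two_plus_drivers = mean_finish_by_country(careers, min_drivers=2)
three_plus_starts = mean_finish_by_country(careers, min_starts=3)
```

Without filters, Country Y leads on the strength of a single race by a single driver. Either filter removes it from the map.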
Finally, I wanted to address what I believe is one of the most divisive questions that has a stranglehold on the sport: who is the greatest F1 driver of all time? As we have seen previously, points are not an effective metric for answering questions like this due to the numerous changes in the points system throughout F1 history. While the number of drivers' championships a driver has earned might be a simpler popular alternative, what do you do if two drivers have the same number of titles? Therefore, I wanted to employ the mean career finishing position metric to answer this question. I chose to compare mean career finishing position to total career races. For the choice of visualization idiom, I elected to utilize a scatter plot for this task as I believe that this is likely the best way to handle the very large size of the data (755 drivers).
I instantly identified an odd issue. The driver that was the best according to the chosen metric was one who had only driven a single F1 race in their entire career... To mitigate this oddity, I added the ability to filter out drivers who had fewer than the selected number of race starts, reducing the amount of noise and anomalies in the data. I also implemented filtering by nationality to allow users the interactive freedom to explore the best (or worst) drivers from any country of their choosing. Finally, I displayed the distribution statistics for the portion of the data that was currently selected through the filters.
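The filtering and the summary of the current selection could look roughly like this. The rows, the `summarize` helper, and the particular statistics reported are assumptions for illustration, since the exact statistics displayed are not listed above.

```python
from statistics import mean, median

# Hypothetical rows: (driver, nationality, starts, mean career finishing position).
drivers = [
    ("Driver A", "Country X", 1, 2.0),
    ("Driver B", "Country X", 150, 6.0),
    ("Driver C", "Country Y", 80, 9.0),
    ("Driver D", "Country Y", 40, 12.0),
]


def summarize(rows, min_starts=0, nationalities=None):
    """Summary statistics of mean finishing position for the filtered drivers."""
    selected = [
        pos
        for _name, nat, starts, pos in rows
        if starts >= min_starts
        and (nationalities is None or nat in nationalities)
    ]
    if not selected:
        return None  # nothing left to summarize after filtering
    return {
        "count": len(selected),
        "mean": mean(selected),
        "median": median(selected),
        "min": min(selected),
        "max": max(selected),
    }


experienced = summarize(drivers, min_starts=10)
country_y = summarize(drivers, nationalities={"Country Y"})
```

Raising the minimum to 10 starts removes Driver A, whose single race would otherwise make them the best driver in the sample.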