{"id":228,"date":"2020-09-30T00:56:02","date_gmt":"2020-09-29T21:26:02","guid":{"rendered":"http:\/\/mahshad.pro\/?p=228"},"modified":"2020-09-30T00:56:02","modified_gmt":"2020-09-29T21:26:02","slug":"formula-1-comprehensive-dashboard-part-i-creating-a-sql-database","status":"publish","type":"post","link":"https:\/\/mahshad.pro\/?p=228","title":{"rendered":"Formula 1 Comprehensive dashboard \u2014 Part I: Creating a SQL database"},"content":{"rendered":"\n<p> This tutorial is also published on\u00a0<a href=\"https:\/\/medium.com\/@mahshadn\/formula-1-comprehensive-dashboard-part-i-creating-a-sql-database-1e0fb78eecce?source=friends_link&amp;sk=7240a39cb10764910522b49a629aabe1\">Medium.com<\/a>. <\/p>\n\n\n\n<p> I have always been a Formula 1 enthusiast. Last year I wrote a short story on what I learned from it which you can read it here if interested: <\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-mahshad-nejati\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"mGyJc2tbeA\"><a href=\"http:\/\/mahshad.pro\/?p=71\">What I have learned from Formula 1 to utilize in real world<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;What I have learned from Formula 1 to utilize in real world&#8221; &#8212; Mahshad Nejati\" src=\"http:\/\/mahshad.pro\/?p=71&#038;embed=true#?secret=q4HbgMM5fK#?secret=mGyJc2tbeA\" data-secret=\"mGyJc2tbeA\" width=\"520\" height=\"293\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p> A few weeks ago, I found an interesting dataset of Formula 1 races\u00a0<a href=\"https:\/\/www.kaggle.com\/cjgdev\/formula-1-race-data-19502017\">here<\/a>\u00a0that includes all the data for the races from 1950 up until 2017 season, and I have decided to make an informative dashboard from this dataset. The dataset is offered in 13 CSV files, each contains a series of data such as circuits, races, seasons, lap times, etc. For this project, I will create a SQL Server database from the dataset and import each file as a table into the SQL database. In the next step, will start creating the dashboard in Microsoft Power BI using the data stored in the SQL database. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"d691\">Creating a SQL Server database<\/h4>\n\n\n\n<p>We will be creating a blank SQL database in SSMS and name it \u201cF1stats\u201d which will be home to our dataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE DATABASE F1stats\n\n<\/code><\/pre>\n\n\n\n<p> Now let\u2019s import the CSV files into our database: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"817\" height=\"742\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql0.png\" alt=\"\" class=\"wp-image-229\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql0.png 817w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql0-300x272.png 300w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql0-768x697.png 768w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql0-400x363.png 400w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql0-800x727.png 800w\" sizes=\"auto, (max-width: 817px) 100vw, 817px\" \/><figcaption> <br>Importing a CSV file using the import wizard <\/figcaption><\/figure>\n\n\n\n<p>After doing the same process for all of the CSV files, our database will look like this: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"297\" height=\"409\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql1.png\" alt=\"\" class=\"wp-image-230\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql1.png 297w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql1-218x300.png 218w\" sizes=\"auto, (max-width: 297px) 100vw, 297px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"9b0e\">Data validation<\/h4>\n\n\n\n<p>To make sure that the data has been stored correctly, let\u2019s write a SQL query to return the circuits that hosted the most number of F1 races. In below query, I join two tables of \u201craces\u201d and \u201ccircuits\u201d, since I need the data from both tables:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>select circuitRef, circuits.country, count(*) as 'Number of Races'\nfrom\ncircuits inner join races\non circuits.circuitId = races.circuitId\ngroup by circuitRef, country\norder by [Number of Races] desc<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"920\" height=\"689\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql2.png\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql2.png 920w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql2-300x225.png 300w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql2-768x575.png 768w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql2-400x300.png 400w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql2-800x599.png 800w\" sizes=\"auto, (max-width: 920px) 100vw, 920px\" \/><\/figure>\n\n\n\n<p> As can be seen in the results screen shot, the \u201cMonza\u201d circuit in Italy have hosted F1 races for 68 times, which is correct considering the creation date of the original dataset. Here we noticed that in some countries there are multiple circuits, so what if we want to know that how many races each country and each circuit have been hosted? To answer this, we can use window functions, which are very useful in data analytics, as below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>select circuitRef, country,\ncount(raceId) over (partition by circuitRef) as 'Circuit Races',\ncount(raceId) over (partition by country) as 'Country Races'\nfrom [dbo].[circuits]\ninner join\n[dbo].[races]\non circuits.circuitId = races.circuitId\norder by [Country Races] desc<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p> Above code will count the number of races for each circuit and country and returns the result by showing the country with the most races hosted on top. If we execute the code then the result will look like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"599\" height=\"563\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql3.png\" alt=\"\" class=\"wp-image-232\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql3.png 599w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql3-300x282.png 300w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql3-400x376.png 400w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/figure>\n\n\n\n<p> But this result is not desired since it keeps repeating the same values each for the number of occurrences. To solve this, we can use common table expression and manage it in the way that only shows each circuit only once:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>with cte as (\nselect circuitRef, country,\ncount(raceId) over (partition by circuitRef) as 'Circuit Races',\ncount(raceId) over (partition by country) as 'Country Races',\nROW_NUMBER() over (partition by circuitRef order by country) as 'rownbr'\nfrom [dbo].[circuits]\ninner join\n[dbo].[races]\non circuits.circuitId = races.circuitId\n)\nselect circuitRef, country, [Circuit Races], [Country Races]\nfrom cte\nwhere rownbr = 1\norder by [Country Races] desc\n\n<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"672\" height=\"670\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql4.png\" alt=\"\" class=\"wp-image-233\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql4.png 672w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql4-150x150.png 150w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql4-300x300.png 300w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql4-400x399.png 400w\" sizes=\"auto, (max-width: 672px) 100vw, 672px\" \/><\/figure>\n\n\n\n<p> We assigned Row Numbers to the first query and consequently we filter \u201crownbr\u201d to show each once. The returned result is now more acceptable than previous one.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"f572\">Troubleshooting<\/h4>\n\n\n\n<p>In above code, if instead of\u00a0<strong>inner join<\/strong>\u00a0we use\u00a0<strong>full outer join<\/strong>, we will notice a one record difference between two results. To find out what is the missing record, we can use a\u00a0<strong>full outer join<\/strong>\u00a0and filter the result as below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>select *\nfrom [dbo].[circuits]\nfull outer join\n[dbo].[races]\non circuits.circuitId = races.circuitId\nwhere circuits.circuitId is null or races.circuitId is null\n<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"358\" src=\"http:\/\/mahshad.pro\/wp-content\/uploads\/sql5-1024x358.png\" alt=\"\" class=\"wp-image-234\" srcset=\"https:\/\/mahshad.pro\/wp-content\/uploads\/sql5-1024x358.png 1024w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql5-300x105.png 300w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql5-768x268.png 768w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql5-400x140.png 400w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql5-800x279.png 800w, https:\/\/mahshad.pro\/wp-content\/uploads\/sql5.png 1177w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p> By using the above code, we fully checked both tables and returned any record that has a null value for circuitId. It shows that we have \u201cPort Imperial Circuit\u201d in our database but hosted no Formula 1 races which is actually true, due to some financial difficulties as per\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Port_Imperial_Street_Circuit\">Wikipedia<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p> In the next part, we will connect the data to Power BI to proceed with creating the Formula 1 dashboard.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to create a dashboard of Formula 1 races with Power BI from the data stored in a SQL database<\/p>\n","protected":false},"author":1,"featured_media":235,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[28,40],"tags":[41],"class_list":["post-228","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","category-works","tag-power-bi","has-thumbnail"],"_links":{"self":[{"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/posts\/228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mahshad.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=228"}],"version-history":[{"count":2,"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/posts\/228\/revisions"}],"predecessor-version":[{"id":237,"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/posts\/228\/revisions\/237"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mahshad.pro\/index.php?rest_route=\/wp\/v2\/media\/235"}],"wp:attachment":[{"href":"https:\/\/mahshad.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mahshad.pro\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mahshad.pro\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}