Data Analysis using SQL

Agenda 

This script will serve as an introduction to advanced data analysis utilizing the SQL language, which should be a necessary tool for every data scientist, data engineer, and machine learning engineer to gain access to data. The idea underlying SQL is fairly similar to that of any other language or tool used for data analysis (excel, Pandas), thus it should be very intuitive for individuals who have experience working with data.

Loading Data into https://sqliteonline.com/

Open Website in Browser

Click on File and select on Open DB

Select the file database.sqlite which is downloaded from the download section and click on Open

Table will be imported 

List of countries

Query:PIE-select name, id from Country GROUP by name; 

Click on Run Button to execute the query

List of leagues and their country

SELECT * FROM League JOIN Country ON Country.id = League.country_id;

List of teams

SELECT * FROM Team ORDER BY team_long_name LIMIT 10;

List of matches

SELECT Match.id, Country.name AS country_name, League.name AS league_name,  season,  stage,  date, HT.team_long_name AS  home_team, AT.team_long_name AS away_team, home_team_goal, away_team_goal  FROM Match JOIN Country on Country.id = Match.country_id JOIN League on League.id = Match.league_id LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id WHERE country_name = ‘Spain’ ORDER by date LIMIT 10;

Let’s do some basic analytics

SELECT Country.name AS country_name,
League.name AS league_name,
season,
count(distinct stage) AS number_of_stages,
count(distinct HT.team_long_name) AS number_of_teams,
avg(home_team_goal) AS avg_home_team_scors,
avg(away_team_goal) AS avg_away_team_goals,
avg(home_team_goal-away_team_goal) AS avg_goal_dif,
avg(home_team_goal+away_team_goal) AS avg_goals,
sum(home_team_goal+away_team_goal) AS total_goals
FROM Match
JOIN Country on Country.id = Match.country_id
JOIN League on League.id = Match.league_id
LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
WHERE country_name in (‘Spain’, ‘Germany’, ‘France’, ‘Italy’, ‘England’)
GROUP BY Country.name, League.name, season
HAVING count(distinct stage) > 10
ORDER BY Country.name, League.name, season DESC
;

SELECT CASE
WHEN ROUND(height)<165 then 165
WHEN ROUND(height)>195 then 195
ELSE ROUND(height)
END AS calc_height,
COUNT(height) AS distribution,
(avg(PA_Grouped.avg_overall_rating)) AS avg_overall_rating,
(avg(PA_Grouped.avg_potential)) AS avg_potential,
AVG(weight) AS avg_weight
FROM PLAYER
LEFT JOIN (SELECT Player_Attributes.player_api_id,
avg(Player_Attributes.overall_rating) AS avg_overall_rating,
avg(Player_Attributes.potential) AS avg_potential
FROM Player_Attributes
GROUP BY Player_Attributes.player_api_id)
AS PA_Grouped ON PLAYER.player_api_id = PA_Grouped.player_api_id
GROUP BY calc_height
ORDER BY calc_height
;

By Bhavesh