What Makes a Top 10 NBA Draft Pick?

Analyzing Player Attributes from Top 10 NBA First Round Draft Picks

Authors: Matthew Muccio, William Miller

Project Abstract

Every June, the National Basketball Association (NBA) holds a draft, where each of the thirty teams have an oppurtunity to select two top prospects to join their organization. With only two rounds in the draft - and only two chances per team - it is crucial that a team does proper research, scouting, and analysis to ensure that their draft picks have a significant impact on their odds at winning a championship.

Using the NBA data API for all players currently in the league, we will examine top 10 draft picks and see how they compare to all players in the league. Then we will make use of various individual player attributes to determine what properties correlate to top NBA draft picks. After having read through our data analysis, we hope that you will understand the importance of research and scouting that NBA teams undergo in selecting top draft picks.

Project Outline

  1. Project Introduction
    • A. Libraries and Dependencies
    • B. Data Sources
    • C. Importing and Examining the Dataset
    • D. Getting Top 10 Draft Picks
    • E. Various Data Trends of Top 10 Draft Picks
  2. Exploring Top 10 Draft Picks
    • A.How Many Current NBA Players Are Top 10 Draft Picks?
    • B. How Many Current NBA Players Are Not Top 10 Draft Picks?
    • C. Top 10 Draft Picks by Position
    • D. Top 10 Draft Picks by Size
    • E. Top 10 Draft Picks by Place of Origin
  3. Analysis of Top 10 Draft Picks' Attributes
    • A. What Attributes Matter Most?
    • B. Reducing and Comparing Attributes
  4. Finding Key Attributes Using Multiple Linear Regression
    • A. Null Hypothesis Testing
    • B. Using SciKit-Learn and StatsModel for Regression Model
  5. Predicting the Ideal Draft Pick Based on Player Attributes with ML
    • A. Training and Testing
  6. Project Conclusion
    • A. Closing Statement About Attributes
    • B. Closing Statement About Draft Pick Prediction
    • C. Final Thoughts & Other Resources

1. Project Introduction

1A. Libraries and Dependencies

  • Python Standard Library Modules:

    • JSON
  • Third-Party Library Modules:

    • Matplotlib - pyplot
    • Pandas
    • Requests
    • Scikit-learn
    • Seaborn
    • StatsModels - API
In [1]:
import json

import matplotlib.pyplot as plt
import pandas as pd
import requests
import seaborn
from sklearn import linear_model
from sklearn import model_selection
from statsmodels import api as sm

1B. Data Sources

The dataset used in our analysis includes all information from almost 500 current players in the NBA. The NBA releases an updated version of this data everyday. It contains information such as player names, height, weight, college, draft number, and country of origin. We will be looking at information from only the players who were drafted in the top 10 of their class.

The dataset can be found here. It comes directly from the NBA website.

1C. Importing and Examining the Dataset

Loading the JSON into a Pandas dataframe. We organize the dataframe into only the columns of data we will use for analysis. These include:

- First Name
- Last Name
- Position
- Height (Feet)
- Height (Inches)
- Weight (Pounds)
- Date of Birth (Year)
- Date of Birth (Month)
- Date of Birth (Day)
- NBA Debut Year
- Number of Years in NBA
- College
- Last affiliation (College or Location)
- Country
- Draft Round Number
- Draft Pick Number
- Draft Year
- Team
In [2]:
# Prepping player data from NBA API to be added to a Pandas DataFrame.
endpoint = "http://data.nba.net/10s/prod/v1/2018/players.json"
r = requests.get(endpoint)
raw_data = r.json()

player_data = raw_data["league"]["standard"]

players = []

for player in player_data:
    players.append(
    {
        "firstName": player["firstName"],
        "lastName": player["lastName"],
        "pos": player["pos"],
        "heightFeet": int(player["heightFeet"]) if not player["heightFeet"] == "-" else "",
        "heightInches": int(player["heightInches"]) if not player["heightInches"] == "-" else "",
        "heightTotal": player["heightFeet"] + " " + player["heightInches"],
        "weightPounds": int(player["weightPounds"]) if not player["weightPounds"] == "" else "",
        "birthYear": int(player["dateOfBirthUTC"][:4:]) if not player["dateOfBirthUTC"][:4:] == "" else "",
        "birthMonth": int(player["dateOfBirthUTC"][5:7:]) if not player["dateOfBirthUTC"][5:7:] == "" else "",
        "birthDay": int(player["dateOfBirthUTC"][8:10:]) if not player["dateOfBirthUTC"][8:10:] == "" else "",
        "nbaDebutYear": player["nbaDebutYear"],
        "yearsPro": player["yearsPro"],
        "collegeName": player["collegeName"],
        "lastAffiliation": player["lastAffiliation"],
        "country": player["country"],
        "roundNum": int(player["draft"]["roundNum"]) if not player["draft"]["roundNum"] == "" else "",
        "pickNum": int(player["draft"]["pickNum"]) if not player["draft"]["pickNum"] == "" else "",
        "draftYear": int(player["draft"]["seasonYear"]) if not player["draft"]["seasonYear"] == "" else "",
        "teamId": player["draft"]["teamId"]
    })

df = pd.DataFrame(players)
df.head()
Out[2]:
birthDay birthMonth birthYear collegeName country draftYear firstName heightFeet heightInches heightTotal lastAffiliation lastName nbaDebutYear pickNum pos roundNum teamId weightPounds yearsPro
0 1 8 1993 Spain 2013 Alex 6 6 6 6 Spain/Spain Abrines 2016 32 G 2 1610612760 200 2
1 4 5 1996 St. Bonaventure USA Jaylen 6 2 6 2 St. Bonaventure/USA Adams 2018 G 190 0
2 20 7 1993 Pittsburgh New Zealand 2013 Steven 7 0 7 0 Pittsburgh/New Zealand Adams 2013 12 C 1 1610612760 265 5
3 18 7 1997 Kentucky USA 2017 Bam 6 10 6 10 Kentucky/USA Adebayo 2017 14 C-F 1 1610612748 255 1
4 5 6 1993 Illinois State Trinidad and Tobago DeVaughn 6 6 6 6 Illinois State/Trinidad and Tobago Akoon-Purcell 2018 G-F 200 0

We will use pandas describe() to show some statistics about the dataset. First, we can see there is 498 players in the current NBA league. There is some pretty interesting results to look at here. Like the top 'lastAffiliation' is Kentucky (University of Kentucky). We will see a lot of these results play in later as we look deeper into the dataset. Another interesting result is, the 'firstName' and 'lastName' columns.

In [3]:
df.describe()
Out[3]:
birthDay birthMonth birthYear collegeName country draftYear firstName heightFeet heightInches heightTotal lastAffiliation lastName nbaDebutYear pickNum pos roundNum teamId weightPounds yearsPro
count 498 498 498 498 498 498 498 498 498 498 498 498 498 498 498 498 498 498 498
unique 32 13 23 156 47 21 383 4 13 20 245 414 21 60 7 3 31 85 21
top 5 5 1994 USA Tyler 6 10 6 10 Kentucky/USA Williams 2017 G 1 220 0
freq 23 54 49 56 383 106 6 449 54 53 21 7 80 106 194 277 106 32 89

Since team identification for each player is stored as a number instead of a team name, we need to convert the id's to team names. To do this we will be looking at another page of data that the NBA provides. The data page can be found here and includes all of the information of teams that are associated with the NBA, including the team id's. We can use this data to match the team id with an actual team name in our DataFrame.

In [4]:
# Prepping team data from NBA API to be added to a teams dictionary
# to replace teamId with actual team names.
endpoint = "http://data.nba.net/"
r = requests.get(endpoint)
raw_data = r.json()

team_data = raw_data["sports_content"]["teams"]["team"]

teams = {}

for team in team_data:
    if team["is_nba_team"]:
        teamId = team["team_id"]
        teamName = team["team_name"] + " " + team["team_nickname"]
        teams[teamId] = teamName
In [5]:
# Converting teamId column to actual team name.
# Converting collegeName column to None if blank.
for i, row in df.iterrows():
    teamId = row["teamId"]
    if teamId:
        df.at[i, "teamId"] = teams[int(row["teamId"])]
    else:
        df.at[i, "teamId"] = "None"
    
    def convert_to_none(col):
        colName = row[col]
        if not colName:
            df.at[i, col] = "None"
    def convert_to_zero(col):
        colName = row[col]
        if not colName:
            df.at[i, col] = 0
            
    convert_to_none("collegeName")
    convert_to_none("draftYear")
    convert_to_none("nbaDebutYear")
    convert_to_zero("pickNum")
    convert_to_zero("roundNum")
df
Out[5]:
birthDay birthMonth birthYear collegeName country draftYear firstName heightFeet heightInches heightTotal lastAffiliation lastName nbaDebutYear pickNum pos roundNum teamId weightPounds yearsPro
0 1 8 1993 None Spain 2013 Alex 6 6 6 6 Spain/Spain Abrines 2016 32 G 2 Oklahoma City Thunder 200 2
1 4 5 1996 St. Bonaventure USA None Jaylen 6 2 6 2 St. Bonaventure/USA Adams 2018 0 G 0 None 190 0
2 20 7 1993 Pittsburgh New Zealand 2013 Steven 7 0 7 0 Pittsburgh/New Zealand Adams 2013 12 C 1 Oklahoma City Thunder 265 5
3 18 7 1997 Kentucky USA 2017 Bam 6 10 6 10 Kentucky/USA Adebayo 2017 14 C-F 1 Miami Heat 255 1
4 5 6 1993 Illinois State Trinidad and Tobago None DeVaughn 6 6 6 6 Illinois State/Trinidad and Tobago Akoon-Purcell 2018 0 G-F 0 None 200 0
5 19 7 1985 Texas USA 2006 LaMarcus 6 11 6 11 Texas/USA Aldridge 2006 2 F 1 Chicago Bulls 260 12
6 29 10 1997 None USA None Rawle 6 5 6 5 USA/USA Alkins None 0 G 0 None 225 0
7 8 10 1995 Duke USA 2018 Grayson 6 5 6 5 Duke University/USA Allen 2018 21 G 1 Utah Jazz 198 0
8 21 4 1998 Texas USA 2017 Jarrett 6 11 6 11 Texas/USA Allen 2017 22 C 1 Brooklyn Nets 237 1
9 21 9 1990 Wake Forest USA 2010 Al-Farouq 6 9 6 9 Wake Forest/USA Aminu 2010 8 F 1 Los Angeles Clippers 220 8
10 19 11 1993 Virginia USA 2015 Justin 6 6 6 6 Virginia/USA Anderson 2015 21 G-F 1 Dallas Mavericks 230 3
11 20 9 1993 UCLA USA 2014 Kyle 6 9 6 9 UCLA/USA Anderson 2014 30 F 1 San Antonio Spurs 230 4
12 6 5 1988 California USA 2008 Ryan 6 10 6 10 California/USA Anderson 2008 21 F 1 Brooklyn Nets 240 10
13 16 10 1998 California-Los Angeles USA 2017 Ike 6 10 6 10 UCLA/USA Anigbogu 2017 47 C 2 Indiana Pacers 250 1
14 6 12 1994 Greece Greece 2013 Giannis 6 11 6 11 Greece/Greece Antetokounmpo 2013 15 F 1 Milwaukee Bucks 242 5
15 20 11 1997 Dayton Greece 2018 Kostas 6 10 6 10 University of Dayton/Greece Antetokounmpo None 60 F 2 Philadelphia 76ers 200 0
16 29 5 1984 Syracuse USA 2003 Carmelo 6 8 6 8 Syracuse/USA Anthony 2003 3 F 1 Denver Nuggets 240 15
17 17 7 1997 Indiana United Kingdom 2017 OG 6 8 6 8 Indiana/United Kingdom Anunoby 2017 23 F 1 Toronto Raptors 232 1
18 26 3 1994 Villanova USA None Ryan 6 3 6 3 Villanova/USA Arcidiacono 2017 0 G 0 None 200 1
19 30 6 1985 UCLA USA 2004 Trevor 6 8 6 8 UCLA/USA Ariza 2004 43 F 2 New York Knicks 215 14
20 10 11 1987 Texas USA 2008 D.J. 6 0 6 0 Texas/USA Augustin 2008 9 G 1 Charlotte Hornets 183 10
21 23 7 1998 Arizona Bahamas 2018 Deandre 7 1 7 1 University of Arizona/Bahamas Ayton 2018 1 C 1 Phoenix Suns 250 0
22 30 8 1995 Florida State USA 2017 Dwayne 6 7 6 7 Florida State/USA Bacon 2017 40 G-F 2 New Orleans Pelicans 221 1
23 14 3 1999 Duke USA 2018 Marvin 6 11 6 11 Duke University/USA Bagley III 2018 2 F 1 Sacramento Kings 234 0
24 30 3 1993 Wichita State USA None Ron 6 4 6 4 Wichita State/USA Baker 2016 0 G 0 None 220 2
25 29 3 1996 Vanderbilt USA 2016 Wade 6 4 6 4 Vanderbilt/USA Baldwin IV 2016 17 G 1 Memphis Grizzlies 200 2
26 27 10 1997 California-Los Angeles USA 2017 Lonzo 6 6 6 6 UCLA/USA Ball 2017 2 G 1 Los Angeles Lakers 190 1
27 12 5 1998 Texas USA 2018 Mo 7 0 7 0 University of Texas at Austin/USA Bamba 2018 6 C 1 Orlando Magic 221 0
28 26 6 1984 Northeastern Puerto Rico None J.J. 6 0 6 0 Northeastern/Puerto Rico Barea 2006 0 G 0 None 185 12
29 30 5 1992 North Carolina USA 2012 Harrison 6 8 6 8 North Carolina/USA Barnes 2012 7 F 1 Golden State Warriors 225 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
468 6 9 1990 Kentucky USA 2010 John 6 4 6 4 Kentucky/USA Wall 2010 1 G 1 Washington Wizards 210 8
469 10 6 1994 California USA 2016 Tyrone 6 5 6 5 California/USA Wallace 2017 60 G 2 Utah Jazz 198 1
470 25 7 1989 Pittsburgh USA None Brad 6 4 6 4 Pittsburgh/USA Wanamaker 2018 0 G 0 None 210 0
471 5 9 1993 North Carolina State USA 2014 T.J. 6 8 6 8 North Carolina State/USA Warren 2014 14 F 1 Phoenix Suns 215 4
472 13 10 1994 George Washington Japan None Yuta 6 9 6 9 George Washington/Japan Watanabe 2018 0 G 0 None 205 0
473 3 2 1996 California-Los Angeles USA 2018 Thomas 7 0 7 0 USA/USA Welsh 2018 58 C 2 Denver Nuggets 255 0
474 12 11 1988 UCLA USA 2008 Russell 6 3 6 3 UCLA/USA Westbrook 2008 4 G 1 Oklahoma City Thunder 200 10
475 2 7 1994 Colorado USA 2017 Derrick 6 4 6 4 Colorado/USA White 2017 29 G 1 San Antonio Spurs 190 1
476 13 8 1992 Florida State USA None Okaro 6 8 6 8 Florida State/USA White 2016 0 F 0 None 205 2
477 13 6 1989 Marshall USA 2010 Hassan 7 0 7 0 MarshallUSA/USA Whiteside 2010 33 C 2 Sacramento Kings 265 6
478 23 2 1995 Kansas Canada 2014 Andrew 6 8 6 8 Kansas/Canada Wiggins 2014 1 F-G 1 Cleveland Cavaliers 194 4
479 28 1 1993 California-Santa Barbara USA None Alan 6 8 6 8 California-Santa Barbara/USA Williams 2015 0 F-C 0 None 265 3
480 6 2 1990 North Carolina State USA None C.J. 6 5 6 5 North Carolina State/USA Williams 2017 0 G 0 None 226 1
481 22 5 1995 Gonzaga USA None Johnathan 6 9 6 9 Gonzaga/USA Williams 2018 0 F 0 None 228 0
482 2 12 1994 None USA None Kenrich 6 7 6 7 USA/USA Williams 2018 0 G-F 0 None 210 0
483 27 10 1986 South Gwinnett HS (GA) USA 2005 Lou 6 1 6 1 South Gwinnett HS (Snellville, GA)/USA Williams 2005 45 G 2 Philadelphia 76ers 175 13
484 19 6 1986 North Carolina USA 2005 Marvin 6 9 6 9 North CarolinaUSA/USA Williams 2005 2 F 1 Atlanta Hawks 237 13
485 30 12 1994 Indiana USA None Troy 6 7 6 7 Indiana/USA Williams 2016 0 F 0 None 218 2
486 17 10 1997 Texas A&M USA 2018 Robert 6 10 6 10 Texas A&M University/USA Williams III 2018 27 C-F 1 Boston Celtics 240 0
487 19 2 1996 Michigan USA 2017 D.J. 6 10 6 10 Michigan/USA Wilson 2017 17 F 1 Milwaukee Bucks 231 1
488 26 3 1996 Duke USA 2015 Justise 6 7 6 7 Duke/USA Winslow 2015 10 F 1 Miami Heat 225 3
489 27 9 1995 Nevada-Las Vegas USA None Christian 6 10 6 10 Nevada-Las Vegas/USA Wood 2015 0 F 0 None 214 2
490 26 4 1992 Utah USA 2015 Delon 6 5 6 5 Utah/USA Wright 2015 20 G 1 Toronto Raptors 183 3
491 17 12 1995 None France 2016 Guerschon 6 8 6 8 France/France Yabusele 2017 16 F 1 Boston Celtics 260 1
492 1 6 1985 Southern California USA 2007 Nick 6 7 6 7 USC/USA Young 2007 16 G-F 1 Washington Wizards 210 11
493 21 6 1988 Georgia Tech USA 2007 Thaddeus 6 8 6 8 Georgia Tech/USA Young 2007 12 F 1 Philadelphia 76ers 221 11
494 19 9 1998 Oklahoma USA 2018 Trae 6 2 6 2 University of Oklahoma/USA Young 2018 5 G 1 Dallas Mavericks 180 0
495 5 10 1992 Indiana USA 2013 Cody 7 0 7 0 Indiana/USA Zeller 2013 4 C 1 Charlotte Hornets 240 5
496 4 1 1997 None Croatia 2016 Ante 6 11 6 11 Croatia/Croatia Zizic 2017 23 C 1 Boston Celtics 254 1
497 18 3 1997 None Croatia 2016 Ivica 7 1 7 1 Mega Leks/Croatia Zubac 2016 32 C 2 Los Angeles Lakers 240 2

498 rows × 19 columns

We now have a tidy dataframe that is much easier to read and has the pertinent information. The dataframe also has now accounted for information that was missing and put 'None' its place.

1D. Getting Top 10 Draft Picks

Since our project focuses on Top 10 draft picks, we will organize our data to isolate only players who were drafted within the top ten of their class.

In [6]:
top10_df = df.loc[(df["pickNum"] <= 10) & (df["roundNum"] == 1)]
top10_df
Out[6]:
birthDay birthMonth birthYear collegeName country draftYear firstName heightFeet heightInches heightTotal lastAffiliation lastName nbaDebutYear pickNum pos roundNum teamId weightPounds yearsPro
5 19 7 1985 Texas USA 2006 LaMarcus 6 11 6 11 Texas/USA Aldridge 2006 2 F 1 Chicago Bulls 260 12
9 21 9 1990 Wake Forest USA 2010 Al-Farouq 6 9 6 9 Wake Forest/USA Aminu 2010 8 F 1 Los Angeles Clippers 220 8
16 29 5 1984 Syracuse USA 2003 Carmelo 6 8 6 8 Syracuse/USA Anthony 2003 3 F 1 Denver Nuggets 240 15
20 10 11 1987 Texas USA 2008 D.J. 6 0 6 0 Texas/USA Augustin 2008 9 G 1 Charlotte Hornets 183 10
21 23 7 1998 Arizona Bahamas 2018 Deandre 7 1 7 1 University of Arizona/Bahamas Ayton 2018 1 C 1 Phoenix Suns 250 0
23 14 3 1999 Duke USA 2018 Marvin 6 11 6 11 Duke University/USA Bagley III 2018 2 F 1 Sacramento Kings 234 0
26 27 10 1997 California-Los Angeles USA 2017 Lonzo 6 6 6 6 UCLA/USA Ball 2017 2 G 1 Los Angeles Lakers 190 1
27 12 5 1998 Texas USA 2018 Mo 7 0 7 0 University of Texas at Austin/USA Bamba 2018 6 C 1 Orlando Magic 221 0
29 30 5 1992 North Carolina USA 2012 Harrison 6 8 6 8 North Carolina/USA Barnes 2012 7 F 1 Golden State Warriors 225 6
36 28 6 1993 Florida USA 2012 Bradley 6 5 6 5 Florida/USA Beal 2012 3 G 1 Washington Wizards 207 6
38 9 1 1989 Kansas State USA 2008 Michael 6 9 6 9 Kansas State/USA Beasley 2008 2 F 1 Miami Heat 235 10
42 17 11 1997 None Croatia 2016 Dragan 7 1 7 1 Croatia/Croatia Bender 2016 4 F 1 Phoenix Suns 225 2
47 28 8 1992 Lubumbashi, DR Congo Democratic Republic of the Congo 2011 Bismack 6 9 6 9 Baloncesto Fuenlabrada/Democratic Republic of ... Biyombo 2011 7 C 1 Sacramento Kings 255 7
61 30 8 1996 Villanova USA 2018 Mikal 6 7 6 7 Villanova University/USA Bridges 2018 10 F 1 Philadelphia 76ers 210 0
69 24 10 1996 California USA 2016 Jaylen 6 7 6 7 California/USA Brown 2016 3 G 1 Boston Celtics 220 2
76 12 11 1992 Michigan USA 2013 Trey 6 1 6 1 Michigan/USA Burke 2013 9 G 1 Minnesota Timberwolves 175 5
81 18 2 1993 Georgia USA 2013 Kentavious 6 5 6 5 Georgia/USA Caldwell-Pope 2013 8 G 1 Detroit Pistons 205 5
85 26 1 1977 North Carolina USA 1998 Vince 6 6 6 6 North Carolina/USA Carter 1998 5 F-G 1 Golden State Warriors 220 20
86 16 4 1999 Duke USA 2018 Wendell 6 10 6 10 Duke University/USA Carter Jr. 2018 7 F 1 Chicago Bulls 255 0
90 18 8 1993 Kentucky USA 2015 Willie 7 0 7 0 Kentucky/USA Cauley-Stein 2015 6 C 1 Sacramento Kings 240 3
93 2 10 1982 Dominquez H.S USA 2001 Tyson 7 1 7 1 Dominguez HS (CA)/USA Chandler 2001 2 C 1 Los Angeles Clippers 240 17
96 2 7 1997 Washington USA 2016 Marquese 6 10 6 10 Washington/USA Chriss 2016 8 F 1 Phoenix Suns 240 2
101 19 11 1997 Gonzaga USA 2017 Zach 7 0 7 0 Gonzaga/USA Collins 2017 10 F-C 1 Sacramento Kings 235 1
103 11 10 1987 Ohio State USA 2007 Mike 6 1 6 1 Ohio State/USA Conley 2007 4 G 1 Memphis Grizzlies 175 11
106 13 8 1990 Kentucky USA 2010 DeMarcus 6 11 6 11 Kentucky/USA Cousins 2010 5 C 1 Sacramento Kings 270 8
110 20 3 1980 Michigan USA 2000 Jamal 6 5 6 5 Michigan/USA Crawford 2000 8 G 1 Cleveland Cavaliers 185 18
114 14 3 1988 Davidson USA 2009 Stephen 6 3 6 3 Davidson/USA Curry 2009 7 G 1 Golden State Warriors 190 9
116 11 3 1993 Kentucky USA 2012 Anthony 6 10 6 10 Kentucky/USA Davis 2012 1 F-C 1 New Orleans Pelicans 253 6
123 16 4 1985 Duke United Kingdom 2004 Luol 6 9 6 9 Duke/United Kingdom Deng 2004 7 F 1 Phoenix Suns 237 14
124 7 8 1989 Southern California USA 2009 DeMar 6 7 6 7 USC/USA DeRozan 2009 9 G 1 Toronto Raptors 220 9
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
380 3 6 1993 Georgetown USA 2013 Otto 6 8 6 8 Georgetown/USA Porter Jr. 2013 3 F 1 Washington Wizards 198 5
382 2 8 1995 None Latvia 2015 Kristaps 7 3 7 3 Cajasol Sevilla/Latvia Porzingis 2015 4 F-C 1 New York Knicks 240 3
390 29 11 1994 Kentucky USA 2014 Julius 6 9 6 9 Kentucky/USA Randle 2014 7 F 1 Los Angeles Lakers 250 4
396 1 8 1992 Duke USA 2012 Austin 6 4 6 4 Duke/USA Rivers 2012 10 G 1 New Orleans Pelicans 200 6
404 4 10 1988 Memphis USA 2008 Derrick 6 3 6 3 Memphis/USA Rose 2008 1 G 1 Chicago Bulls 200 9
405 5 2 1991 Washington USA 2012 Terrence 6 7 6 7 Washington/USA Ross 2012 8 G-F 1 Toronto Raptors 206 6
407 21 10 1990 El Masnou, Spain Spain 2009 Ricky 6 4 6 4 FC Barcelona/Spain Rubio 2011 5 G 1 Minnesota Timberwolves 190 7
408 23 2 1996 Ohio State USA 2015 D'Angelo 6 5 6 5 Ohio State/USA Russell 2015 2 G 1 Los Angeles Lakers 198 3
416 4 1 1999 Alabama USA 2018 Collin 6 2 6 2 University of Alabama/USA Sexton 2018 8 G 1 Cleveland Cavaliers 190 0
420 20 7 1996 Louisiana State Australia 2016 Ben 6 10 6 10 Louisiana State/Australia Simmons 2017 1 G-F 1 Philadelphia 76ers 230 1
423 6 3 1994 Oklahoma State USA 2014 Marcus 6 4 6 4 Oklahoma State/USA Smart 2014 6 G 1 Boston Celtics 220 4
427 25 11 1997 North Carolina State USA 2017 Dennis 6 3 6 3 North Carolina State/USA Smith Jr. 2017 9 G 1 Dallas Mavericks 195 1
431 7 10 1993 Michigan Canada 2014 Nik 6 6 6 6 Michigan/Canada Stauskas 2014 8 G 1 Sacramento Kings 205 4
436 3 3 1998 Duke USA 2017 Jayson 6 8 6 8 Duke/USA Tatum 2017 3 F 1 Boston Celtics 208 1
446 13 3 1991 Texas Canada 2011 Tristan 6 10 6 10 Texas/Canada Thompson 2011 4 C-F 1 Cleveland Cavaliers 238 7
449 15 11 1995 Kentucky USA 2015 Karl-Anthony 7 0 7 0 Kentucky/USA Towns 2015 1 C 1 Minnesota Timberwolves 248 3
453 27 10 1988 Ohio State USA 2010 Evan 6 7 6 7 Ohio State/USA Turner 2010 2 G-F 1 Philadelphia 76ers 220 8
455 20 5 1987 Baylor USA 2010 Ekpe 6 10 6 10 Baylor/USA Udoh 2010 6 C-F 1 Golden State Warriors 245 6
457 6 5 1992 Utena, Lithuania Lithuania 2011 Jonas 7 0 7 0 Utena, Lithuania/Lithuania Valanciunas 2012 5 C 1 Toronto Raptors 265 6
461 24 8 1995 Indiana USA 2014 Noah 6 9 6 9 Indiana/USA Vonleh 2014 9 F 1 Charlotte Hornets 250 4
463 17 1 1982 Marquette USA 2003 Dwyane 6 4 6 4 Marquette/USA Wade 2003 5 G 1 Miami Heat 220 15
465 10 12 1991 Syracuse USA 2012 Dion 6 4 6 4 Syracuse/USA Waiters 2012 4 G 1 Cleveland Cavaliers 215 6
466 8 5 1990 Connecticut USA 2011 Kemba 6 1 6 1 Connecticut/USA Walker 2011 9 G 1 Charlotte Hornets 184 7
468 6 9 1990 Kentucky USA 2010 John 6 4 6 4 Kentucky/USA Wall 2010 1 G 1 Washington Wizards 210 8
474 12 11 1988 UCLA USA 2008 Russell 6 3 6 3 UCLA/USA Westbrook 2008 4 G 1 Oklahoma City Thunder 200 10
478 23 2 1995 Kansas Canada 2014 Andrew 6 8 6 8 Kansas/Canada Wiggins 2014 1 F-G 1 Cleveland Cavaliers 194 4
484 19 6 1986 North Carolina USA 2005 Marvin 6 9 6 9 North CarolinaUSA/USA Williams 2005 2 F 1 Atlanta Hawks 237 13
488 26 3 1996 Duke USA 2015 Justise 6 7 6 7 Duke/USA Winslow 2015 10 F 1 Miami Heat 225 3
494 19 9 1998 Oklahoma USA 2018 Trae 6 2 6 2 University of Oklahoma/USA Young 2018 5 G 1 Dallas Mavericks 180 0
495 5 10 1992 Indiana USA 2013 Cody 7 0 7 0 Indiana/USA Zeller 2013 4 C 1 Charlotte Hornets 240 5

124 rows × 19 columns

Now let's look at some data trends within the data set. There are many important attributes to a player like position, colleges, height, and weight. Many of these reasons go into why players would get drafted top 10 or not.

Expected results! Historically, the 'Guard' position is the top position to be drafted due to the all around playmaking ability the individual has to have to fill this role. Guards also have multiple positions on the court and many take up leadership roles on teams.

In [7]:
# Unique positions of Top 10 Draft Picks
top10_df["pos"].unique()
Out[7]:
array(['F', 'G', 'C', 'F-G', 'F-C', 'C-F', 'G-F'], dtype=object)

Top 10 draft picks come from colleges all over, even overseas! We will look at colleges more in detail later.

In [8]:
# Unique college names of Top 10 Draft Picks
top10_df["collegeName"].unique()
Out[8]:
array(['Texas', 'Wake Forest', 'Syracuse', 'Arizona', 'Duke',
       'California-Los Angeles', 'North Carolina', 'Florida',
       'Kansas State', 'None', 'Lubumbashi, DR Congo', 'Villanova',
       'California', 'Michigan', 'Georgia', 'Kentucky', 'Dominquez H.S',
       'Washington', 'Gonzaga', 'Ohio State', 'Davidson',
       'Southern California', 'Connecticut', 'Providence', 'Kansas',
       'Memphis', 'Georgia Tech', "Sant'Angelo Lodigiano, Italy",
       'Barcelona, Spain', 'Fresno State', 'Indiana', 'Georgetown',
       'Oklahoma', 'Arizona State', 'Wisconsin', 'Butler', 'Croatia',
       'SW Atlanta Christian Academy (GA)', 'Florida State',
       'Michigan State', 'St. Vincent-St. Mary HS (OH)', 'Maryland',
       'Weber State', 'Peoria Central HS (IL)', 'Stanford', 'UCLA',
       'Lehigh', 'Sao Carlos, Brazil', 'Wurzburg, Germany',
       'Louisiana-Lafayette', 'Utah', 'El Masnou, Spain', 'Alabama',
       'Louisiana State', 'Oklahoma State', 'North Carolina State',
       'Baylor', 'Utena, Lithuania', 'Marquette'], dtype=object)

Top 10 draft picks also come from all over the world! Even from the newest country in the world, South Sudan.

In [9]:
# Unique country of origin of Top 10 Draft Picks
top10_df["country"].unique()
Out[9]:
array(['USA', 'Bahamas', 'Croatia', 'Democratic Republic of the Congo',
       'United Kingdom', 'Slovenia', 'Cameroon', 'Australia', 'Italy',
       'Spain', 'Dominican Republic', 'Turkey', 'Ukraine', 'South Sudan',
       'Finland', 'Canada', 'Brazil', 'Germany', 'France', 'Austria',
       'Latvia', 'Lithuania'], dtype=object)

A cool little data fact about the University of Maryland basketball program. Yes, there is a player active in the NBA who was a top 10 draft pick!

In [10]:
# Top 10 NBA Draft Picks from University of Maryland
maryland_df = top10_df.loc[df["collegeName"] == "Maryland"]
maryland_df
Out[10]:
birthDay birthMonth birthYear collegeName country draftYear firstName heightFeet heightInches heightTotal lastAffiliation lastName nbaDebutYear pickNum pos roundNum teamId weightPounds yearsPro
275 16 6 1993 Maryland Ukraine 2013 Alex 7 1 7 1 Maryland/Ukraine Len 2013 5 C 1 Phoenix Suns 250 5

2. Exploring Top 10 Draft Picks

2A. How Many Current NBA Players Are Top 10 Draft Picks?

Now, let's look a bit deeper into our dataset and specifically active top 10 draft picks. Below, according to the top 10 draft picks Pandas DataFrame created in part 1D, there are 124 current NBA players that were top 10 NBA Draft picks.

In [11]:
# Getting the amount of top 10 draft picks currently in the NBA, 124.
num_top10_draft_picks = top10_df.shape[0]
print("Number of current NBA players who are top 10 draft picks:")
num_top10_draft_picks
Number of current NBA players who are top 10 draft picks:
Out[11]:
124

2B. How Many Current NBA Players Are Not Top 10 Draft Picks?

Now, let's look at how many active NBA players are not top 10 draft picks. According, to the Pandas DataFrame that was created based on the NBA API JSON data, there are 498 active NBA players and, out of those, 374 players were not top 10 draft picks. This is a large majority of active NBA players, a little over 75% of active players were not top 10 draft picks. This shows it is uncommon to be a top 10 draft pick.

In [12]:
# According to the Pandas DataFrame that was created based on the
# NBA API JSON data, there are 498 current NBA players.
num_nba_players = df.shape[0]
print("Number of current NBA players:")
num_nba_players
Number of current NBA players:
Out[12]:
498
In [13]:
# By finding the difference of the total number of current NBA players
# and the total number of current NBA players that were not top 10
# draft picks (374).
print("Number of current NBA players who are not top 10 draft picks:")
num_not_top10_draft_picks = num_nba_players - num_top10_draft_picks
num_not_top10_draft_picks
Number of current NBA players who are not top 10 draft picks:
Out[13]:
374
In [14]:
# There are 374 current NBA players who were not top 10 draft picks, or
# approximately 75% of all NBA players.
percent_not_top10_draft_picks = num_not_top10_draft_picks / num_nba_players
print("Percentage of current NBA players who are not top 10 draft picks:")
percent_not_top10_draft_picks
Percentage of current NBA players who are not top 10 draft picks:
Out[14]:
0.751004016064257
In [15]:
# Plotting bar graph of total current players in NBA, current players
# who are top 10 NBA draft picks, and current players who are not 
# top 10 NBA draft picks.

current_nba_data = [
    {
        "Group": "Total NBA Players",
        "Count": num_nba_players
    },
    {
        "Group": "Total Non-Top 10 Draft Picks",
        "Count": num_not_top10_draft_picks
    },
    {
        "Group": "Total Top 10 Draft Picks",
        "Count": num_top10_draft_picks
    }
]

current_nba_df = pd.DataFrame(current_nba_data)

plt.figure(figsize=(13,7))
plt.title("How Many Current NBA Players Are Top 10 Draft Picks?", fontsize=18)
seaborn.barplot(data=current_nba_df, x="Group", y="Count")
plt.show()

2C. Top 10 Draft Picks by Position

By looking at the chart below for sorting top 10 draft picks by position, it again, shows the 'Guard' position holds the top spot in terms of positions in the top 10 active draft picks in the league.

In [16]:
by_pos_df = top10_df.groupby(top10_df["pos"]).count().reset_index()
by_pos_df = by_pos_df[["pos", "pickNum"]]
by_pos_df = by_pos_df.sort_values("pickNum", ascending=False)
by_pos_df.columns = ["Position", "Count"]
by_pos_df
Out[16]:
Position Count
5 G 46
2 F 37
0 C 18
3 F-C 10
1 C-F 5
4 F-G 4
6 G-F 4
In [17]:
# Plotting top 10 draft picks by position.
plt.figure(figsize=(13,7))
plt.title("Top 10 NBA Draft Picks by Position", fontsize=18)
seaborn.barplot(data=by_pos_df, x="Position", y="Count")
plt.show()

2D. Top 10 Draft Picks by Size

We can also look at the top 10 draft picks dataset by strictly size. This includes height and weight. If we look at the chart below we can see there is an obvious middle ground between 220 - 250 pounds. There is obviously a less of cluster as you get further away from this range. From this information, we also see there are few players that lie below 200 pounds. This contradicts our data from sorting top 10 draft picks by position. The 'Guard' position, historically, are the lowest weight players in the NBA, averaging around 190 pounds. However, as seen above, the 'Guard' position is the highest occuring position.|

In [18]:
# Top 10 Draft Picks by Weight
by_wt_df = top10_df.groupby(top10_df["weightPounds"]).count().reset_index()
by_wt_df = by_wt_df[["weightPounds", "pickNum"]]
by_wt_df = by_wt_df.sort_values("pickNum", ascending=False)
by_wt_df.columns = ["Weight", "Count"]
by_wt_df
Out[18]:
Weight Count
21 220 12
34 250 10
30 240 8
5 190 7
11 200 7
23 225 6
32 245 5
19 215 5
13 205 4
39 265 4
24 230 4
17 210 4
0 175 4
4 185 3
9 195 3
27 235 3
37 255 3
22 221 2
31 242 2
28 237 2
40 270 2
10 198 2
15 207 2
2 183 1
41 275 1
36 253 1
35 251 1
3 184 1
33 248 1
38 260 1
12 201 1
6 192 1
29 238 1
14 206 1
26 234 1
25 232 1
7 193 1
8 194 1
1 180 1
20 218 1
18 214 1
16 208 1
42 279 1

As we can see in the bar graph, the weight varies quite a bit, but there is a small cluster between ~220 - 250 pounds.

In [19]:
# Graphing the top 10 NBA draft picks by weight.
plt.figure(figsize=(13,7))
plt.title("Top 10 NBA Draft Picks by Weight", fontsize=18)
seaborn.barplot(data=by_wt_df, x="Weight", y="Count")
plt.show()

Height is also included in size. Based on just the table, we can already see top 10 NBA draft picks are quite tall. Height is another major attribute when drafting players. A very tall player with quality coordination can dominate opponents inside the paint.

In [20]:
# Top 10 Draft Picks by Height
by_ht_df = top10_df.groupby(top10_df["heightTotal"]).count().reset_index()
by_ht_df = by_ht_df[["heightTotal", "pickNum"]]
by_ht_df = by_ht_df.sort_values("pickNum", ascending=False)
by_ht_df.columns = ["Height", "Count"]
by_ht_df
Out[20]:
Height Count
12 7 0 14
6 6 4 13
11 6 9 13
2 6 10 12
3 6 11 12
9 6 7 11
10 6 8 11
5 6 3 10
7 6 5 7
8 6 6 7
13 7 1 5
1 6 1 4
0 6 0 2
4 6 2 2
14 7 3 1

Based on this bar graph, top 10 NBA Draft picks have a slight tendency to be taller on average. Unlike the previous graph, which was by weight, this graph's data is less sporadic. A majority of top 10 draft picks are 6' 7" or taller.

In [21]:
# Bar graph for top 10 NBA draft picks by height.
plt.figure(figsize=(13,7))
plt.title("Top 10 NBA Draft Picks by Height", fontsize=18)
seaborn.barplot(data=by_ht_df, x="Height", y="Count")
plt.show()

2E. Top 10 Draft Picks by Place of Origin

Another attribute to graph by is place of origin. Many NBA rookies, and eventually NBA stars, are drafted from major college basketball programs. The more money college basketball programs have, the more NBA prodigy's they can churn out. The results below are very expected. It is logical that the colleges which have historic basketball programs, which tend to have the most success each year as well as are perennial championship contenders, and which recruit the top prospects out of high school, are represented the most significantly in the Top 10 NBA Draft Pick data. Schools like Kentucky, Duke, Arizona, Indiana, Syracuse, and UNC are not only among the winningest college basketball teams, they also attract the most talented high school prospects. This results in the school also sending a fair amount of their most-talented players to the NBA. For example, the University of Kentucky is famous for their one-and-done basketball legacy. This means top recruited basketball players out of high school to the University of Kentucky only spend a year playing NCAA basketball, then get recruited in the top of their NBA class. This has been very frowned upon in the recent years due to schools seeming to focus more on what makes them money, sports, then academics, expecially for those who are recruited and stay in the college system for only one year. They lack any sort of education if their NBA career fails. However, the schools at the top of the list below are famous for winning seasons and winning the National Championship for college basketball. As an NBA team, why wouldn't you want to recruit from the teams who consistently win National Championships and produce the best basketball players?

In [22]:
# Top 10 Draft Picks by Place of Origin (College if attended, or city/country of residence). 
by_origin_df = top10_df.groupby(top10_df["lastAffiliation"]).count().reset_index()
by_origin_df = by_origin_df[["lastAffiliation", "pickNum"]]
by_origin_df = by_origin_df.sort_values("pickNum", ascending=False)
by_origin_df.columns = ["Place of Origin", "Count"]
by_origin_df
Out[22]:
Place of Origin Count
38 Kentucky/USA 10
15 Duke/USA 6
2 Arizona/USA 4
32 Indiana/USA 4
63 Syracuse/USA 3
78 Washington/USA 3
49 North Carolina/USA 3
51 Ohio State/USA 3
9 Connecticut/USA 3
26 Georgetown/USA 3
65 Texas/USA 3
66 UCLA/USA 3
47 Michigan/USA 2
36 Kansas/USA 2
23 Florida/USA 2
80 Wisconsin/USA 2
44 Memphis/USA 2
13 Duke University/USA 2
77 Wake Forest/USA 2
56 Peoria Central HS (IL)USA/USA 1
79 Weber State/USA 1
48 North Carolina State/USA 1
50 North CarolinaUSA/USA 1
76 Villanova University/USA 1
75 Vasco de Gama/Brazil 1
52 Oklahoma State/USA 1
74 Utena, Lithuania/Lithuania 1
53 Oklahoma/Bahamas 1
54 Oklahoma/USA 1
55 Olimpia Milano/Italy 1
... ... ...
16 Duke/United Kingdom 1
14 Duke/Australia 1
12 Dominguez HS (CA)/USA 1
10 Croatia/Croatia 1
21 Florida State/USA 1
8 California/USA 1
7 Cajasol Sevilla/Latvia 1
6 Butler/USA 1
5 Baylor/USA 1
4 Baloncesto Fuenlabrada/Democratic Republic of ... 1
3 Australia/Australia 1
20 Fenerbahce Ulker/Turkey 1
22 Florida/Domincan Republic 1
43 Maryland/Ukraine 1
34 Kansas/Cameroon 1
42 Marquette/USA 1
41 Louisiana-Lafayette/USA 1
1 Arizona/Finland 1
39 Lehigh/USA 1
37 Kentucky/Canada 1
35 Kansas/Canada 1
33 Kansas State/USA 1
24 France/France 1
31 Guangdong/DRC 1
30 Gonzaga/USA 1
29 Germany/Germany 1
28 Georgia/USA 1
27 Georgia Tech/USA 1
25 Fresno State/USA 1
40 Louisiana State/Australia 1

81 rows × 2 columns

3. Analysis of Top 10 Draft Picks' Attributes

3A. What Attributes Matter Most?

After we have gotten more comfortable with our dataset, we can continue to data analysis. First, we want to determine what attributes are important and which ones are not when selection a top 10 NBA draft pick. The first cell below shows all of the attributes we are currently holding.

In [23]:
# Viewing all the columns, or attributes, that we can use from the dataset.
attributes = top10_df.columns.tolist()
for a in attributes:
    print(a)
birthDay
birthMonth
birthYear
collegeName
country
draftYear
firstName
heightFeet
heightInches
heightTotal
lastAffiliation
lastName
nbaDebutYear
pickNum
pos
roundNum
teamId
weightPounds
yearsPro

Not all of the player attributes captured from the NBA API are pertinent to our analysis. We can remove unimportant player information that does not factor into whether a player is a top 10 draft pick, from the Top 10 Draft Picks DataFrame. We chose:

- First Name
- Last Name
- Position
- Height (Feet)
- Height (Inches)
- Weight (Pounds)
- Date of Birth (Year)
- Date of Birth (Month)
- Date of Birth (Day)
- College
- Draft Round Number
- Draft Pick Number
- Draft Year
In [24]:
# Not all of the player attributes captured from the NBA API are needed
# for our analysis.
edited_top10_df = top10_df[["firstName", "lastName", "pickNum", \
                            "birthDay", "birthMonth", "birthYear", \
                           "collegeName", "country", "heightFeet", \
                           "heightInches", "pos", "weightPounds", \
                           "draftYear"]]
edited_top10_df.head()
Out[24]:
firstName lastName pickNum birthDay birthMonth birthYear collegeName country heightFeet heightInches pos weightPounds draftYear
5 LaMarcus Aldridge 2 19 7 1985 Texas USA 6 11 F 260 2006
9 Al-Farouq Aminu 8 21 9 1990 Wake Forest USA 6 9 F 220 2010
16 Carmelo Anthony 3 29 5 1984 Syracuse USA 6 8 F 240 2003
20 D.J. Augustin 9 10 11 1987 Texas USA 6 0 G 183 2008
21 Deandre Ayton 1 23 7 1998 Arizona Bahamas 7 1 C 250 2018

We must reduce our table to numerical values in order to run some kind of regression on it. We will have to measure our players based on objective numerical data, mainly values related to size, such as age, height, and weight, but also position and country of origin (as numerical values). View the columns on the far right-hand side of the DataFrame below to see these new columns. The follows the comments below.

In [25]:
# Converts position strings to corresponding numbers.
# Guard - 3
# Forward - 2
# Center - 1

def convert_position_to_num(pos):
    pos_str = pos[0]
    if pos_str == "G":
        return 3
    elif pos_str == "F":
        return 2
    elif pos_str == "C":
        return 1

# Converts country strings to corresponding numbers.
# USA = 1
# Other country = 0

def convert_country_to_num(country):
    if country == "USA":
        return 1
    else:
        return 0

edited_top10_df.loc[:, "Name"] = edited_top10_df["firstName"] + " " + edited_top10_df["lastName"] 
edited_top10_df.loc[:, "Draft Pick"] = edited_top10_df["pickNum"]
edited_top10_df.loc[:, "Age"] = edited_top10_df["draftYear"] - edited_top10_df["birthYear"]
edited_top10_df.loc[:, "Height"] = edited_top10_df["heightFeet"] * 12 - edited_top10_df["heightInches"]
edited_top10_df.loc[:, "Weight"] = edited_top10_df["weightPounds"]
edited_top10_df.loc[:, "Position"] = edited_top10_df.loc[:, "pos"].apply(convert_position_to_num)
edited_top10_df.loc[:, "Country"] = edited_top10_df.loc[:, "country"].apply(convert_country_to_num)
edited_top10_df
/home/matthewmuccio/.local/lib/python3.6/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/home/matthewmuccio/.local/lib/python3.6/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
Out[25]:
firstName lastName pickNum birthDay birthMonth birthYear collegeName country heightFeet heightInches pos weightPounds draftYear Name Draft Pick Age Height Weight Position Country
5 LaMarcus Aldridge 2 19 7 1985 Texas USA 6 11 F 260 2006 LaMarcus Aldridge 2 21 61 260 2 1
9 Al-Farouq Aminu 8 21 9 1990 Wake Forest USA 6 9 F 220 2010 Al-Farouq Aminu 8 20 63 220 2 1
16 Carmelo Anthony 3 29 5 1984 Syracuse USA 6 8 F 240 2003 Carmelo Anthony 3 19 64 240 2 1
20 D.J. Augustin 9 10 11 1987 Texas USA 6 0 G 183 2008 D.J. Augustin 9 21 72 183 3 1
21 Deandre Ayton 1 23 7 1998 Arizona Bahamas 7 1 C 250 2018 Deandre Ayton 1 20 83 250 1 0
23 Marvin Bagley III 2 14 3 1999 Duke USA 6 11 F 234 2018 Marvin Bagley III 2 19 61 234 2 1
26 Lonzo Ball 2 27 10 1997 California-Los Angeles USA 6 6 G 190 2017 Lonzo Ball 2 20 66 190 3 1
27 Mo Bamba 6 12 5 1998 Texas USA 7 0 C 221 2018 Mo Bamba 6 20 84 221 1 1
29 Harrison Barnes 7 30 5 1992 North Carolina USA 6 8 F 225 2012 Harrison Barnes 7 20 64 225 2 1
36 Bradley Beal 3 28 6 1993 Florida USA 6 5 G 207 2012 Bradley Beal 3 19 67 207 3 1
38 Michael Beasley 2 9 1 1989 Kansas State USA 6 9 F 235 2008 Michael Beasley 2 19 63 235 2 1
42 Dragan Bender 4 17 11 1997 None Croatia 7 1 F 225 2016 Dragan Bender 4 19 83 225 2 0
47 Bismack Biyombo 7 28 8 1992 Lubumbashi, DR Congo Democratic Republic of the Congo 6 9 C 255 2011 Bismack Biyombo 7 19 63 255 1 0
61 Mikal Bridges 10 30 8 1996 Villanova USA 6 7 F 210 2018 Mikal Bridges 10 22 65 210 2 1
69 Jaylen Brown 3 24 10 1996 California USA 6 7 G 220 2016 Jaylen Brown 3 20 65 220 3 1
76 Trey Burke 9 12 11 1992 Michigan USA 6 1 G 175 2013 Trey Burke 9 21 71 175 3 1
81 Kentavious Caldwell-Pope 8 18 2 1993 Georgia USA 6 5 G 205 2013 Kentavious Caldwell-Pope 8 20 67 205 3 1
85 Vince Carter 5 26 1 1977 North Carolina USA 6 6 F-G 220 1998 Vince Carter 5 21 66 220 2 1
86 Wendell Carter Jr. 7 16 4 1999 Duke USA 6 10 F 255 2018 Wendell Carter Jr. 7 19 62 255 2 1
90 Willie Cauley-Stein 6 18 8 1993 Kentucky USA 7 0 C 240 2015 Willie Cauley-Stein 6 22 84 240 1 1
93 Tyson Chandler 2 2 10 1982 Dominquez H.S USA 7 1 C 240 2001 Tyson Chandler 2 19 83 240 1 1
96 Marquese Chriss 8 2 7 1997 Washington USA 6 10 F 240 2016 Marquese Chriss 8 19 62 240 2 1
101 Zach Collins 10 19 11 1997 Gonzaga USA 7 0 F-C 235 2017 Zach Collins 10 20 84 235 2 1
103 Mike Conley 4 11 10 1987 Ohio State USA 6 1 G 175 2007 Mike Conley 4 20 71 175 3 1
106 DeMarcus Cousins 5 13 8 1990 Kentucky USA 6 11 C 270 2010 DeMarcus Cousins 5 20 61 270 1 1
110 Jamal Crawford 8 20 3 1980 Michigan USA 6 5 G 185 2000 Jamal Crawford 8 20 67 185 3 1
114 Stephen Curry 7 14 3 1988 Davidson USA 6 3 G 190 2009 Stephen Curry 7 21 69 190 3 1
116 Anthony Davis 1 11 3 1993 Kentucky USA 6 10 F-C 253 2012 Anthony Davis 1 19 62 253 2 1
123 Luol Deng 7 16 4 1985 Duke United Kingdom 6 9 F 237 2004 Luol Deng 7 19 63 237 2 0
124 DeMar DeRozan 9 7 8 1989 Southern California USA 6 7 G 220 2009 DeMar DeRozan 9 20 65 220 3 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
380 Otto Porter Jr. 3 3 6 1993 Georgetown USA 6 8 F 198 2013 Otto Porter Jr. 3 20 64 198 2 1
382 Kristaps Porzingis 4 2 8 1995 None Latvia 7 3 F-C 240 2015 Kristaps Porzingis 4 20 81 240 2 0
390 Julius Randle 7 29 11 1994 Kentucky USA 6 9 F 250 2014 Julius Randle 7 20 63 250 2 1
396 Austin Rivers 10 1 8 1992 Duke USA 6 4 G 200 2012 Austin Rivers 10 20 68 200 3 1
404 Derrick Rose 1 4 10 1988 Memphis USA 6 3 G 200 2008 Derrick Rose 1 20 69 200 3 1
405 Terrence Ross 8 5 2 1991 Washington USA 6 7 G-F 206 2012 Terrence Ross 8 21 65 206 3 1
407 Ricky Rubio 5 21 10 1990 El Masnou, Spain Spain 6 4 G 190 2009 Ricky Rubio 5 19 68 190 3 0
408 D'Angelo Russell 2 23 2 1996 Ohio State USA 6 5 G 198 2015 D'Angelo Russell 2 19 67 198 3 1
416 Collin Sexton 8 4 1 1999 Alabama USA 6 2 G 190 2018 Collin Sexton 8 19 70 190 3 1
420 Ben Simmons 1 20 7 1996 Louisiana State Australia 6 10 G-F 230 2016 Ben Simmons 1 20 62 230 3 0
423 Marcus Smart 6 6 3 1994 Oklahoma State USA 6 4 G 220 2014 Marcus Smart 6 20 68 220 3 1
427 Dennis Smith Jr. 9 25 11 1997 North Carolina State USA 6 3 G 195 2017 Dennis Smith Jr. 9 20 69 195 3 1
431 Nik Stauskas 8 7 10 1993 Michigan Canada 6 6 G 205 2014 Nik Stauskas 8 21 66 205 3 0
436 Jayson Tatum 3 3 3 1998 Duke USA 6 8 F 208 2017 Jayson Tatum 3 19 64 208 2 1
446 Tristan Thompson 4 13 3 1991 Texas Canada 6 10 C-F 238 2011 Tristan Thompson 4 20 62 238 1 0
449 Karl-Anthony Towns 1 15 11 1995 Kentucky USA 7 0 C 248 2015 Karl-Anthony Towns 1 20 84 248 1 1
453 Evan Turner 2 27 10 1988 Ohio State USA 6 7 G-F 220 2010 Evan Turner 2 22 65 220 3 1
455 Ekpe Udoh 6 20 5 1987 Baylor USA 6 10 C-F 245 2010 Ekpe Udoh 6 23 62 245 1 1
457 Jonas Valanciunas 5 6 5 1992 Utena, Lithuania Lithuania 7 0 C 265 2011 Jonas Valanciunas 5 19 84 265 1 0
461 Noah Vonleh 9 24 8 1995 Indiana USA 6 9 F 250 2014 Noah Vonleh 9 19 63 250 2 1
463 Dwyane Wade 5 17 1 1982 Marquette USA 6 4 G 220 2003 Dwyane Wade 5 21 68 220 3 1
465 Dion Waiters 4 10 12 1991 Syracuse USA 6 4 G 215 2012 Dion Waiters 4 21 68 215 3 1
466 Kemba Walker 9 8 5 1990 Connecticut USA 6 1 G 184 2011 Kemba Walker 9 21 71 184 3 1
468 John Wall 1 6 9 1990 Kentucky USA 6 4 G 210 2010 John Wall 1 20 68 210 3 1
474 Russell Westbrook 4 12 11 1988 UCLA USA 6 3 G 200 2008 Russell Westbrook 4 20 69 200 3 1
478 Andrew Wiggins 1 23 2 1995 Kansas Canada 6 8 F-G 194 2014 Andrew Wiggins 1 19 64 194 2 0
484 Marvin Williams 2 19 6 1986 North Carolina USA 6 9 F 237 2005 Marvin Williams 2 19 63 237 2 1
488 Justise Winslow 10 26 3 1996 Duke USA 6 7 F 225 2015 Justise Winslow 10 19 65 225 2 1
494 Trae Young 5 19 9 1998 Oklahoma USA 6 2 G 180 2018 Trae Young 5 20 70 180 3 1
495 Cody Zeller 4 5 10 1992 Indiana USA 7 0 C 240 2013 Cody Zeller 4 21 84 240 1 1

124 rows × 20 columns

3B. Reducing and Comparing Attributes

Now, that we have limited our attributes, we will create a new dataframe to look at only the attributes that are most important.

In [26]:
# Creating a new reduced dataframe.
reduced_top10_df = edited_top10_df[["Name", "Draft Pick", "Age", "Height", "Weight", "Position", "Country"]]
reduced_top10_df.head()
Out[26]:
Name Draft Pick Age Height Weight Position Country
5 LaMarcus Aldridge 2 21 61 260 2 1
9 Al-Farouq Aminu 8 20 63 220 2 1
16 Carmelo Anthony 3 19 64 240 2 1
20 D.J. Augustin 9 21 72 183 3 1
21 Deandre Ayton 1 20 83 250 1 0

We are further reducing our DataFrame such that each individual value will become a rating of that category on a 0-100 scale. Simply divide the value by the maximum value in its column to obtain the rating. For the Position rating, it is more favorable for a player to be a Guard, then a Forward, than a Center in order to be selected higher, according to the data trends we saw above. (Thus, 1 = G, 0.66 = F, 0.33 = C) For the Country rating, it is more favorable for a player to be from the USA in order to be selected higher, according to the data trends we saw above. (Thus, 1 = USA, 0 = Any other country).

In [27]:
final_top10_df = reduced_top10_df.copy()
final_top10_df.loc[:, "Draft Pick"] = 100 * (1 - ((final_top10_df["Draft Pick"] - 1) / final_top10_df["Draft Pick"].max()))
final_top10_df.loc[:, "Age"] = 100 * (final_top10_df["Age"] / final_top10_df["Age"].max())
final_top10_df.loc[:, "Height"] = 100 * (final_top10_df["Height"] / final_top10_df["Height"].max())
final_top10_df.loc[:, "Weight"] = 100 * (final_top10_df["Weight"] / final_top10_df["Weight"].max())
final_top10_df.loc[:, "Position"] = 100 * (final_top10_df["Position"] / final_top10_df["Position"].max())
final_top10_df.loc[:, "Country"] = 100 * (final_top10_df["Country"] / final_top10_df["Country"].max())
final_top10_df.head()
Out[27]:
Name Draft Pick Age Height Weight Position Country
5 LaMarcus Aldridge 90 91.3043 72.619 93.19 66.666667 100.0
9 Al-Farouq Aminu 30 86.9565 75 78.853 66.666667 100.0
16 Carmelo Anthony 80 82.6087 76.1905 86.0215 66.666667 100.0
20 D.J. Augustin 20 91.3043 85.7143 65.5914 100.000000 100.0
21 Deandre Ayton 100 86.9565 98.8095 89.6057 33.333333 0.0

4. Finding Key Attributes Using Multiple Linear Regression

4A. Null Hypothesis Testing

We are trying to view the effects of five different, or aggregate player attributes (Age, Weight, Height, Position, and Country) on the overall player draft pick.

Null Hypothesis: None of the player attributes have a veritable impact on the player draft pick. In order to test the null hypothesis, we are going to perform Multiple Linear Regression on the dataset using SciKit-Learn.

4B. Using SciKit-Learn and StatsModels for Regression Model

We will use our reduced Top 10 Draft Pick DataFrame with the numerical player attributes for the multiple regression model. We will create a new DataFrame for the features of the regression. We will also create a new DataFrame for the target of the regression. The features act as the independent variables, which include Age, Weight, Height, Position, and Country when drafted. The target acts as the dependent variable, which is the player's overall draft pick.

In [28]:
columns = ["Age", "Weight", "Height", "Position", "Country"]
features = final_top10_df[columns]
target = final_top10_df[["Draft Pick"]]

Defines X and y for use in the LinearRegression() function from SciKit-Learn. We will then fit the linear regression model.

In [29]:
X, y = features, target["Draft Pick"]
lin_model = linear_model.LinearRegression()
model = lin_model.fit(X, y)

Our R-squared score is supposed to test how well the variance is explained by the model. As values range from 0 to 1, the 0.066 value means that none of the variance can be explained by the model.

In [30]:
lin_model.score(X, y)
Out[30]:
0.06646315824053861

We will now find the coefficients from the model in order to determine which attributes had the least or most significant impact overall. It seem as though weight has the largest impact overall, and not much else seems to have a correlation. However, in order to determine with certainty which player attributes have the most impact on the overall draft pick and to test the null hypothesis correctly, we must calculate the p-values using StatsModels.

In [31]:
sklearn_coefficients = lin_model.coef_.tolist()
for i in range(len(columns)):
    print("Player Attribute: {0}, Coefficient: {1}".format(columns[i], sklearn_coefficients[i]))
    print()
Player Attribute: Age, Coefficient: -1.183334963400501

Player Attribute: Weight, Coefficient: 0.34881396034339474

Player Attribute: Height, Coefficient: -0.3270327185286698

Player Attribute: Position, Coefficient: 0.020243605293690153

Player Attribute: Country, Coefficient: -0.023549391367973516

We can reuse the Features (X) and Target (y) variables that we created in the last step to create a regression with stats model. We simply must add a constant in StatsModels. The model uses the method of Ordinary Least Squares. Its objective is to minimize the sum of squared distances between the actual numerical values in the dataset and the generated predicted values in the regression.

In [32]:
sm_X = X
sm_y = y

sm_X = sm.add_constant(X)

ols_model = sm.OLS(y.astype(float), X.astype(float)).fit()
ols_model.summary()
Out[32]:
OLS Regression Results
Dep. Variable: Draft Pick R-squared: 0.807
Model: OLS Adj. R-squared: 0.799
Method: Least Squares F-statistic: 99.67
Date: Sat, 15 Dec 2018 Prob (F-statistic): 7.48e-41
Time: 17:17:50 Log-Likelihood: -587.57
No. Observations: 124 AIC: 1185.
Df Residuals: 119 BIC: 1199.
Df Model: 5
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Age -0.3195 0.455 -0.702 0.484 -1.220 0.581
Weight 1.0175 0.321 3.165 0.002 0.381 1.654
Height -0.1535 0.303 -0.507 0.613 -0.754 0.447
Position 0.2345 0.134 1.749 0.083 -0.031 0.500
Country -0.0264 0.063 -0.419 0.676 -0.151 0.099
Omnibus: 43.680 Durbin-Watson: 2.030
Prob(Omnibus): 0.000 Jarque-Bera (JB): 7.400
Skew: -0.073 Prob(JB): 0.0247
Kurtosis: 1.812 Cond. No. 38.6


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The only player attributes that seem to have a meaningful impact on the model is weight and position. As seen in the "P > |t|" column below, only two of the values are close to the critical value of 5% (p-value of 0.05), 0.002 and 0.083, for weight and position, respectively. We can partially reject the null hypothesis, because it is clear that some player attributes contribute to the overall draft pick. Our R-squared value, 0.807 is fair, as it shows that we did not overfit the model. We will be able to figure out more regarding our R-squared value in the next step with training and testing.

5. Predicting the Ideal Draft Pick Based on Player Attributes with ML

5A. Training and Testing

We will be able to re-use the features (X) and target (y) from the SciKit-Learn regression model. We will split the dataset into training data and testing data for both of the variables. Variables, training_X and training_y are used to generate, or train, the regression model. Then, the testing_X data is used with the model in order to make predictions for the predicted overall draft picks. Then, the predictions are compared to the actual draft picks in testing_y. We decided to split up training and testing data in a 75%/25% split, in favor of training. A majority of the dataset should be used for training the model, but since our dataset is not significantly large, we decided on 75% over 80%, 90%, or higher. We will display the first 15 results of the predicted draft picks.

In [33]:
training_X, testing_X, training_y, testing_y = model_selection.train_test_split(X, y, test_size=0.25)

lin_model = linear_model.LinearRegression()
model = lin_model.fit(training_X, training_y)

predictions = lin_model.predict(testing_X)

for p in predictions[0:15]:
    print(p)
45.609138541413955
64.40331011525883
58.18968850622154
66.41595408077855
64.96677510088088
65.3990736493505
63.586677934535274
54.078909573592426
55.61003041588606
63.58296514613826
65.83137219782014
53.9110483227645
48.31870341036114
53.74725863943901
58.38993675692603

We will then plot the predicted draft pick data from the linear regression model against the actual values from the dataset in the testing_y variable. We will add a trend/identity line to manifest how closely the predictions from the new linear regression model are to the actual player draft picks. If the predictions are accurate, the plot points will follow the trend line.

In [34]:
plt.figure(figsize=(6, 4))
plt.title("Predicted Values vs. Actual Values for Top 10 Player Draft Picks", fontsize=15)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.scatter(testing_y, predictions)
plt.plot(testing_y, testing_y, color="Red")
plt.show()

As we can see plot points do not follow the trend line very well. This is, again, proves only limited attributes have relationships.

Project Conclusion

6A. Closing Statement About Attributes

From our models, it is clearly difficult to correlate player attributes to the overall draft pick. This only adds to the difficulty that Coaches, General Managers, Owners of NBA franchises face when the NBA Draft season comes around each year.

The only attribute that truly seemed to correlate with a higher draft pick was the weight of the player. It seems as though more research must be done in order to determine which player attributes matter most in regard to drafting a top 10 pick. It is surprising that other attributes such as height, colleges, or place of origin did not correlate more with the data. These attributes where among the strongest when analyzing different parts of the data.

There are many other attributes that can be added or subtracted from this dataset. We did not touch any gameplay statistics from players. This could be the logical next step to creating an accurate model. One, the dataset would be much larger, since there are many, many gameplay statistics kept on each player. Two, we might be able to find that those attributes, meaning high gameplay statistics, would correlate better with players who were top 10 NBA draft picks and players who were not top 10 NBA draft picks.

6B. Closing Statement About Draft Pick Prediction

We now have a couple conclusions from our models. One, we can see that these attributes, while they seem strong on the surface, do not correlate closely to why a player is a top 10 draft pick. Two, these results give us a next step to look at other player attributes such as gameplay statistics from a top 10 draft pick before they were drafted, or looking at their performance in the league after they were drafted. These could give insight into what makes a top 10 NBA draft pick, while also making the sample size larger.

Even though the results where weak in terms of correlation, this does bring up a few good points. For example, there stereotypes like 'because a player played for University of Kentucky they will be a top 10 draft pick'. However, we can see that they almost do not correlate at all. From these results we can see that many aspects of the data can be improved and added upon to make a more accurate model and truly find 'What Makes a Top 10 NBA Draft Pick?'.

6C. Final Thoughts & Other Resources

Thank you very much for reading through our project and data analysis on Top 10 NBA Draft Picks. Please feel free to contact us with any comments, concerns, or feedback regarding the project, data analysis, or decriptions.

If you would like to learn more about similar topics and research about NBA statistics please visit the links below:

The Length and Success of NBA Careers:Does College Production Predict Professional Outcomes?

A Starting Point for Analyzing Basketball Statistics