Diabetic Classification: A Rigorous Analysis

3 minute read

Published:

Under Construction.

Description

Contents

  • Dataset
  • Codes
    • Mount Google Drive
    • Required libraries
    • Load Data Function
    • Algorithm Function
    • Show Result Function
    • Run the program
    • Download Code

Dataset

Demo

No.  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age  Outcome
1    6            148      72             35             0        33.6  0.627                     50   1
2    1            85       66             29             0        26.6  0.351                     31   0
3    8            183      64             0              0        23.3  0.672                     32   1
4    1            89       66             23             94       28.1  0.167                     21   0
5    0            137      40             35             168      43.1  2.288                     33   1
6    5            116      74             0              0        25.6  0.201                     30   0
7    3            78       50             32             88       31    0.248                     26   1
8    10           115      0              0              0        35.3  0.134                     29   0
9    2            197      70             45             543      30.5  0.158                     53   1
10   8            125      96             0              0        0     0.232                     54   1

🔗👉Download Dataset

Mount Google Drive

Mounting Google Drive in Colab makes the files stored in the drive, including the dataset, readable under /content/gdrive.

from google.colab import drive
drive.mount('/content/gdrive')

Required libraries

The program uses pandas and NumPy for data handling and scikit-learn for splitting the data, the KNN classifier, and the evaluation metrics. StandardScaler is imported but not used in the code below.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import zero_one_loss
from sklearn.preprocessing import StandardScaler

Load Data Function

load_data reads diabetes.csv from the mounted drive, replaces zero readings with the column mean in the columns where zero means "not measured", and splits the data into 80% train data and 20% test data.

def load_data():
  # Load the data
  dataset = pd.read_csv('/content/gdrive/MyDrive/Robotech/ML/week_2/diabetes.csv')

  # In these columns a zero means the value was not measured
  zero_not_accepted = ['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Insulin']

  for column in zero_not_accepted:
    # Mark the zero values as NaN
    dataset[column] = dataset[column].replace(0, np.nan)

    # Mean of the measured values (NaN skipped), truncated to an integer
    mean = int(dataset[column].mean(skipna=True))

    # Fill the NaN values with that mean
    dataset[column] = dataset[column].fillna(mean)

  # Features: the first eight columns
  data = dataset.iloc[:, 0:8]

  # Label: the ninth column
  label = dataset.iloc[:, 8]

  # Split into 80% train data and 20% test data
  # x_train: train features, x_test: test features
  # y_train: train labels,   y_test: test labels
  x_train, x_test, y_train, y_test = train_test_split(data, label, test_size=0.2, random_state=42)

  return x_train, x_test, y_train, y_test
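
The zero-replacement step in load_data can be tried on a small in-memory table before running it on the full file. The sketch below is only an illustration: the helper name impute_zeros_with_mean and the three sample rows are assumptions, and unlike load_data it keeps the mean as a float instead of truncating it to an integer.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (an assumption); a 0 in Insulin stands for "not measured"
sample = pd.DataFrame({
    "Glucose": [183, 89, 137],
    "Insulin": [0, 94, 168],
})


def impute_zeros_with_mean(df, columns):
    """Hypothetical helper: return a copy of df in which zeros in `columns`
    are replaced by the mean of the non-zero values of that column."""
    out = df.copy()
    for column in columns:
        observed = out[column].replace(0, np.nan)        # zeros become NaN
        out[column] = observed.fillna(observed.mean())   # mean() skips NaN by default
    return out


imputed = impute_zeros_with_mean(sample, ["Insulin"])
print(imputed)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the values before and after imputation.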

Algorithm Function

algorithm trains a k-nearest neighbors (KNN) classifier with 11 neighbors and Euclidean distance on the train data and returns the fitted model. It reads x_train and y_train from the global scope, so load_data must run first.

def algorithm():
  # Train the model
  clf = KNeighborsClassifier(n_neighbors=11, metric="euclidean")
  clf.fit(x_train, y_train)

  return clf
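
KNN with Euclidean distance compares raw feature values, so a column with a large range can dominate the distance. The post imports StandardScaler but never applies it; the sketch below shows one possible way to combine the two with a scikit-learn pipeline. It is not the author's method, and the synthetic data and the name scaled_knn are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): two features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = (X[:, 0] > 0).astype(int)  # only the small-scale feature determines the label

# The scaler is fitted inside the pipeline, so its statistics come only from
# the data passed to fit()
scaled_knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=11, metric="euclidean"),
)
scaled_knn.fit(X, y)

predictions = scaled_knn.predict(X)
print(predictions[:10])
```

Because the pipeline exposes fit and predict like a plain classifier, it could stand in for clf in the rest of the program without other changes.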

Show Result Function

show_result predicts labels for the test data and prints the accuracy and the zero-one loss as percentages. It relies on the global clf, x_test, and y_test.

def show_result():

  y_pred = clf.predict(x_test)

  #Accuracy
  acc = accuracy_score(y_test, y_pred)
  print("Accuracy: {:.2f}".format(acc*100))

  #Loss
  loss = zero_one_loss(y_test, y_pred)
  print("Loss: {:.2f}".format(loss*100))
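
The two numbers printed by show_result are complements: with its default settings, zero_one_loss returns the fraction of wrong predictions, which is one minus the accuracy. A minimal sketch with made-up labels (y_true and y_hat are assumptions, not values from the dataset):

```python
from sklearn.metrics import accuracy_score, zero_one_loss

# Made-up labels for illustration: 3 of the 5 predictions are correct
y_true = [1, 0, 1, 1, 0]
y_hat = [1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_hat)   # fraction of correct predictions
loss = zero_one_loss(y_true, y_hat)   # fraction of wrong predictions

print("Accuracy: {:.2f}%".format(acc * 100))  # prints "Accuracy: 60.00%"
print("Loss: {:.2f}%".format(loss * 100))     # prints "Loss: 40.00%"
```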

Run the program

Call the three functions in order: load the data, train the classifier, then print the results.

x_train, x_test, y_train, y_test = load_data()

clf = algorithm()

show_result()
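
The functions above communicate through global variables, so they only work when called in this order. One possible alternative is to pass the data explicitly. The sketch below assumes hypothetical helpers train_knn and evaluate, and uses synthetic data from make_classification in place of diabetes.csv so that it runs without the mounted drive.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, zero_one_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_knn(x_train, y_train, n_neighbors=11):
    """Hypothetical helper: fit a KNN classifier on the given train data."""
    model = KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean")
    model.fit(x_train, y_train)
    return model


def evaluate(model, x_test, y_test):
    """Hypothetical helper: return (accuracy, zero-one loss) as fractions."""
    y_pred = model.predict(x_test)
    return accuracy_score(y_test, y_pred), zero_one_loss(y_test, y_pred)


# Synthetic stand-in for the dataset (an assumption): 500 rows, 8 features
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = train_knn(x_tr, y_tr)
accuracy, loss = evaluate(model, x_te, y_te)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Loss: {:.2f}%".format(loss * 100))
```

With explicit arguments, each function can be tested on its own and reused with a different split or a different value of n_neighbors.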

Code

🔸 Download code
🔸 Colab