This analysis focuses on the Pima Indians Diabetes Database (the data is [here]). It was created by the National Institute of Diabetes and Digestive and Kidney Diseases. Our aim is to learn the age of a patient given the following parameters [paper]:
- preg. Number of times pregnant
- plas. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- pres. Diastolic blood pressure (mm Hg)
- skin. Triceps skin fold thickness (mm)
- test. 2-Hour serum insulin (mu U/ml)
- mass. Body mass index (weight in kg/(height in m)^2)
- pedi. Diabetes pedigree function
- class. Class variable (0 or 1)
We will train a machine learning model on a random sample of the data, then use it to predict the age of the remaining patients and see how close we get. Let's first read in the data:
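    import numpy as np
    import pandas as pd
    # train_test_split lives in sklearn.model_selection
    # (the old sklearn.cross_validation module has been removed)
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor

    # Load the dataset; the column names used below come from the CSV header row
    ver = pd.read_csv("di.csv")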
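The code in this post selects columns by name, so it is worth confirming that the CSV really has a header row with those names. Here is a minimal sketch using only the standard library. The `EXPECTED` list mirrors the column names used in this post, `missing_columns` is a hypothetical helper (not part of pandas), and the sample text is illustrative rather than the real file.

```python
import csv
import io

# Column names this post relies on (the seven inputs plus class and age).
EXPECTED = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "class", "age"]

def missing_columns(csv_text, expected=EXPECTED):
    """Return the expected column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the text has no rows at all
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]

# Illustrative two-line sample, not the real file.
sample = "preg,plas,pres,skin,test,mass,pedi,age,class\n6,148,72,35,0,33.6,0.627,50,1\n"
print(missing_columns(sample))
```

For the real file, the same check could be run on the text returned by `open("di.csv").read()`. An empty list means every expected column is present.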
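Now let's split the data, holding back 33% of the rows for testing and training the system on the remaining 67%. The rows are assigned at random (random_state=1 makes the split repeatable), and the class column is not used as an input: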
    train, test, y_train, y_test = train_test_split(
        ver[["preg", "plas", "pres", "skin", "test", "mass", "pedi"]],
        ver["age"],
        test_size=0.33,
        random_state=1,
    )
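To see how many rows end up on each side of the split, we can do the arithmetic by hand. This is a sketch under two assumptions: that the dataset has 768 rows (the commonly cited size, not stated in this post), and that a fractional `test_size` is rounded up to a whole number of rows, which is how scikit-learn appears to behave. `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test share up."""
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test

# 768 rows is an assumption about the dataset size.
n_train, n_test = split_sizes(768, 0.33)
print(n_train, n_test)  # 514 254
```

The 254 test rows agree with the total in the results listing (104 successes plus 150 failures).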
Now we will fit a model using the random forest method:
    model = RandomForestRegressor(n_estimators=100, min_samples_leaf=10)
    model.fit(train, y_train)
We should now have a trained model. Let's make predictions on the test data:
    predictions = model.predict(test)
This gives us a predicted age for each test patient. Let's compare each prediction with the actual age, and count it as a success if the magnitude of the error is 6 years or less:
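    # The counters must exist before the loop increments them
    success = 0
    failure = 0
    # Compare against the held-back ages (y_test), which line up with the predictions;
    # ver["age"][x] would read the first rows of the full dataset instead
    actual = y_test.values
    for x in range(len(predictions)):
        predicted_age = int(predictions[x])  # int() drops the fractional part
        error = abs(predicted_age - actual[x])
        if error <= 6:
            result = "Success"
            success = success + 1
        else:
            result = "Failed!"
            failure = failure + 1
        print('%4d %4d %4d %s' % (predicted_age, actual[x], error, result))
    print('Success: %3d Fail: %3d' % (success, failure))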
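The loop prints one line per patient, but the two numbers we usually care about are the share of predictions within the tolerance and the average size of the error. Here is a minimal sketch that computes both from plain Python lists. `summarize` is a hypothetical helper name, and the sample values are the first five Pred/Act pairs from the results listing in this post.

```python
def summarize(predicted, actual, tolerance=6):
    """Return (success_rate, mean_absolute_error) for two paired lists of ages."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    hits = sum(1 for e in errors if e <= tolerance)
    return hits / len(errors), sum(errors) / len(errors)

# First five Pred/Act pairs from the listing, for illustration only.
pred = [47, 30, 34, 30, 28]
act = [50, 31, 32, 21, 33]
rate, mae = summarize(pred, act)
print(f"{rate:.0%} within 6 years, mean absolute error {mae:.1f}")
```

With the model from this post, it could be called as `summarize(list(predictions), list(y_test))`.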
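Here are the results from one run. The Act column in this listing follows the first 254 rows of the dataset in file order rather than the shuffled test rows, so the counts are indicative only: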
    Pred  Act Diff
    ----------------------
      47   50    3 Success
      30   31    1 Success
      34   32    2 Success
      30   21    9 Failed!
      28   33    5 Success
      25   30    5 Success
      32   26    6 Success
      23   29    6 Success
      25   53   28 Failed!
      25   54   29 Failed!
      43   30   13 Failed!
      24   34   10 Failed!
      34   57   23 Failed!
      43   59   16 Failed!
      24   51   27 Failed!
      49   32   17 Failed!
      25   31    6 Success
      40   31    9 Failed!
      26   33    7 Failed!
      33   32    1 Success
      33   27    6 Success
      39   50   11 Failed!
      43   41    2 Success
      45   29   16 Failed!
      23   51   28 Failed!
      40   41    1 Success
      28   43   15 Failed!
      49   22   27 Failed!
      25   57   32 Failed!
      39   38    1 Success
      40   60   20 Failed!
      33   28    5 Success
      23   22    1 Success
      30   28    2 Success
      24   45   21 Failed!
      29   33    4 Success
      23   35   12 Failed!
      26   46   20 Failed!
      44   27   17 Failed!
      41   56   15 Failed!
      25   26    1 Success
      24   37   13 Failed!
      24   48   24 Failed!
      28   54   26 Failed!
      23   40   17 Failed!
      27   25    2 Success
      30   29    1 Success
      33   22   11 Failed!
      23   31    8 Failed!
      28   24    4 Success
      26   22    4 Success
      26   26    0 Success
      44   30   14 Failed!
      22   58   36 Failed!
      42   42    0 Success
      25   21    4 Success
      31   41   10 Failed!
      46   31   15 Failed!
      47   44    3 Success
      34   22   12 Failed!
      42   21   21 Failed!
      28   39   11 Failed!
      38   36    2 Success
      28   24    4 Success
      28   42   14 Failed!
      24   32    8 Failed!
      38   38    0 Success
      30   54   24 Failed!
      45   25   20 Failed!
      46   27   19 Failed!
      43   28   15 Failed!
      29   26    3 Success
      37   42    5 Success
      23   23    0 Success
      44   22   22 Failed!
      44   22   22 Failed!
      45   41    4 Success
      39   27   12 Failed!
      32   26    6 Success
      41   24   17 Failed!
      41   22   19 Failed!
      29   22    7 Failed!
      40   36    4 Success
      27   22    5 Success
      27   37   10 Failed!
      43   27   16 Failed!
      39   45    6 Success
      43   26   17 Failed!
      28   43   15 Failed!
      29   24    5 Success
      23   21    2 Success
      26   34    8 Failed!
      25   42   17 Failed!
      28   60   32 Failed!
      30   21    9 Failed!
      44   40    4 Success
      25   24    1 Success
      35   22   13 Failed!
      28   23    5 Success
      24   31    7 Failed!
      31   33    2 Success
      26   22    4 Success
      46   21   25 Failed!
      24   24    0 Success
      26   27    1 Success
      41   21   20 Failed!
      42   27   15 Failed!
      45   37    8 Failed!
      24   25    1 Success
      25   24    1 Success
      29   24    5 Success
      24   46   22 Failed!
      36   23   13 Failed!
      26   25    1 Success
      35   39    4 Success
      25   61   36 Failed!
      39   38    1 Success
      26   25    1 Success
      28   22    6 Success
      28   21    7 Failed!
      36   25   11 Failed!
      46   24   22 Failed!
      44   23   21 Failed!
      28   69   41 Failed!
      24   23    1 Success
      38   26   12 Failed!
      38   30    8 Failed!
      41   23   18 Failed!
      37   40    3 Success
      28   62   34 Failed!
      42   33    9 Failed!
      27   33    6 Success
      29   30    1 Success
      33   39    6 Success
      28   26    2 Success
      24   31    7 Failed!
      36   21   15 Failed!
      43   22   21 Failed!
      24   29    5 Success
      41   28   13 Failed!
      33   55   22 Failed!
      24   38   14 Failed!
      42   22   20 Failed!
      37   42    5 Success
      40   23   17 Failed!
      45   21   24 Failed!
      25   41   16 Failed!
      34   34    0 Success
      28   65   37 Failed!
      33   22   11 Failed!
      26   24    2 Success
      46   37    9 Failed!
      49   42    7 Failed!
      31   23    8 Failed!
      44   43    1 Success
      38   36    2 Success
      42   21   21 Failed!
      47   23   24 Failed!
      27   22    5 Success
      28   47   19 Failed!
      29   36    7 Failed!
      28   45   17 Failed!
      25   27    2 Success
      23   21    2 Success
      43   32   11 Failed!
      41   41    0 Success
      37   22   15 Failed!
      37   34    3 Success
      39   29   10 Failed!
      46   29   17 Failed!
      42   36    6 Success
      36   29    7 Failed!
      35   25   10 Failed!
      24   23    1 Success
      30   33    3 Success
      44   36    8 Failed!
      51   42    9 Failed!
      25   26    1 Success
      37   47   10 Failed!
      36   37    1 Success
      42   32   10 Failed!
      32   23    9 Failed!
      27   21    6 Success
      50   27   23 Failed!
      23   40   17 Failed!
      42   41    1 Success
      42   60   18 Failed!
      29   33    4 Success
      41   31   10 Failed!
      39   25   14 Failed!
      37   21   16 Failed!
      28   40   12 Failed!
      28   36    8 Failed!
      41   40    1 Success
      31   42   11 Failed!
      36   29    7 Failed!
      47   21   26 Failed!
      24   23    1 Success
      32   26    6 Success
      33   29    4 Success
      43   21   22 Failed!
      41   28   13 Failed!
      30   32    2 Success
      24   27    3 Success
      47   55    8 Failed!
      45   27   18 Failed!
      33   57   24 Failed!
      42   52   10 Failed!
      25   21    4 Success
      44   41    3 Success
      23   25    2 Success
      30   24    6 Success
      43   60   17 Failed!
      49   24   25 Failed!
      24   36   12 Failed!
      40   38    2 Success
      43   25   18 Failed!
      23   32    9 Failed!
      25   32    7 Failed!
      25   41   16 Failed!
      24   21    3 Success
      27   66   39 Failed!
      24   37   13 Failed!
      32   61   29 Failed!
      32   26    6 Success
      27   22    5 Success
      31   26    5 Success
      22   24    2 Success
      23   31    8 Failed!
      46   24   22 Failed!
      44   22   22 Failed!
      43   46    3 Success
      24   22    2 Success
      25   29    4 Success
      25   23    2 Success
      29   26    3 Success
      30   51   21 Failed!
      42   23   19 Failed!
      48   32   16 Failed!
      45   27   18 Failed!
      28   21    7 Failed!
      38   22   16 Failed!
      24   22    2 Success
      45   33   12 Failed!
      28   29    1 Success
      25   49   24 Failed!
      34   41    7 Failed!
      34   23   11 Failed!
      27   34    7 Failed!
      28   23    5 Success
      35   42    7 Failed!
      29   27    2 Success
      34   24   10 Failed!
      44   25   19 Failed!
    Success: 104 Fail: 150
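As a baseline, we can estimate the success rate if we simply guessed an age of 26 every time. Any actual age from 20 to 32 would then count as a success, which covers 13 values. The ages here run from 20 to 69, which is 50 different values, so if ages were spread evenly across that range the constant guess would succeed about 26% of the time (13/50). This is only a rough yardstick, since the ages are not spread evenly. We achieved 41% (104 out of 254) ... and that's machine learning!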
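The baseline arithmetic can be checked in a few lines. This is a sketch rather than part of the original analysis: `uniform_baseline` and `constant_guess_rate` are hypothetical helper names, and the ten sample ages are the first ten Act values from the listing, used only for illustration. Counting both ends, 20 to 69 covers 50 ages.

```python
def uniform_baseline(guess, tolerance, low, high):
    """Share of the integer ages in [low, high] that lie within tolerance of guess."""
    covered = sum(1 for age in range(low, high + 1) if abs(age - guess) <= tolerance)
    return covered / (high - low + 1)

def constant_guess_rate(ages, guess, tolerance):
    """Share of observed ages that lie within tolerance of a fixed guess."""
    return sum(1 for age in ages if abs(age - guess) <= tolerance) / len(ages)

# Evenly spread ages: 13 of the 50 values from 20 to 69 are within 6 of 26.
print(uniform_baseline(26, 6, 20, 69))  # 0.26

# First ten Act values from the listing (illustrative sample only).
sample_ages = [50, 31, 32, 21, 33, 30, 26, 29, 53, 54]
print(constant_guess_rate(sample_ages, 26, 6))  # 0.6
```

The second function gives a fairer baseline when it is run on the actual test ages, because it uses the ages that really occur instead of assuming they are evenly spread.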
Demo
Here is the demo: