Cyber&Data: Introduction to ML (Part 2)

This lab provides a foundation in the use of machine learning in Splunk.
Predicting Categorical Fields

1. Select the "Splunk Machine Learning Toolkit App", and then select "Experiments". Next, select "Predict Categorical Fields", create a new experiment, and name it "Firewall". Once it is created, add the firewall traffic dataset:

| inputlookup firewall_traffic.csv
How many records are there?
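Inside Splunk, appending `| stats count` to the `inputlookup` search gives the record count. The same check can be done outside Splunk on an exported copy of the file. The sketch below is minimal and rests on some assumptions: the lookup has been saved locally as a CSV with a header row, the helper name `count_records` is hypothetical, and the sample columns are made up.

```python
import csv
import io
from typing import TextIO


def count_records(stream: TextIO) -> int:
    """Count data rows in a CSV stream; DictReader consumes the header row itself."""
    return sum(1 for _ in csv.DictReader(stream))


# Tiny in-memory sample standing in for the exported lookup (made-up columns and values).
sample = io.StringIO("src_port,dst_port,used_by_malware\n80,443,no\n22,8080,yes\n")
print(count_records(sample))  # 2
```

For a local export, open it with `open("firewall_traffic.csv", newline="")` and pass the file object to `count_records`.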
| inputlookup firewall_traffic.csv | head 5000

Next, we will use logistic regression to make a prediction. For the algorithm, select "LogisticRegression". Select "used_by_malware" as the field to predict, and then select all the other fields as the fields to use for predicting. Confirm that 70% of the data is used for training (the remaining 30% is held back for testing), and then select "Fit Model".
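MLTK hides the mechanics, but the underlying steps are simple: shuffle the records, hold back 30% for testing, fit on the other 70%, then predict on the held-back part. The sketch below is a minimal pure-Python illustration of those steps using logistic regression trained by batch gradient descent. It is not MLTK's implementation (MLTK relies on scikit-learn). The data is made up and has a single feature, and the helper names, seed, learning rate and epoch count are all hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # (feature vector, 0/1 label)


def train_test_split(rows: Sequence[Row], train_fraction: float = 0.7,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows with a fixed seed, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                 epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(model: Tuple[List[float], float], X: List[List[float]]) -> List[int]:
    """Label 1 when the predicted probability is at least 0.5, else 0."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]


# Made-up, well-separated data: label 0 for x in [-2, -1], label 1 for x in [1, 2].
data = [([-1.0 - i / 10], 0) for i in range(11)] + [([1.0 + i / 10], 1) for i in range(11)]
train, test = train_test_split(data)
model = fit_logistic([x for x, _ in train], [label for _, label in train])
predictions = predict_logistic(model, [x for x, _ in test])
correct = sum(p == label for p, (_, label) in zip(predictions, test))
print(f"train={len(train)} test={len(test)} accuracy={correct / len(test):.2f}")
```

The fixed seed makes the 70/30 split reproducible, so different algorithms can be compared on the same held-out records.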
What are the values in the confusion matrix?
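A confusion matrix counts each pair of (actual class, predicted class). The diagonal cells are the correct predictions and every off-diagonal cell is a specific kind of mistake. The sketch below is minimal: it assumes labels such as "yes"/"no" for illustration (the real values of `used_by_malware` may differ), and the helper names are hypothetical. Rows are actual classes and columns are predicted classes.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix(actual: Sequence[Hashable],
                     predicted: Sequence[Hashable]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Count (actual, predicted) pairs; cells with actual == predicted are correct."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


def print_matrix(cells: Dict[Tuple[Hashable, Hashable], int], labels: List[Hashable]) -> None:
    """Print one row per actual class and one column per predicted class."""
    print("actual\\predicted", *labels, sep="\t")
    for a in labels:
        print(a, *(cells.get((a, p), 0) for p in labels), sep="\t")


# Made-up labels for illustration only.
actual = ["yes", "no", "no", "yes", "no", "no"]
predicted = ["yes", "no", "yes", "no", "no", "no"]
cells = confusion_matrix(actual, predicted)
print_matrix(cells, ["no", "yes"])
```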
Next change the model to "SVM", and recompute your model:
What are the values in the confusion matrix?
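An SVM looks for the separating boundary with the widest margin between the classes. The sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method. It assumes labels coded as -1/+1 and a linear kernel, so it is a simplification and may differ from the kernelised SVM that MLTK uses. The function names and the `lam` and `epochs` values are hypothetical. A constant 1.0 is appended to each row so the last weight acts as the bias.

```python
import random
from typing import List, Sequence


def fit_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                   epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels must be -1 or +1. The appended constant feature makes the last weight
    a bias term, which is regularised along with the other weights."""
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink weights: regularisation step
            if margin < 1.0:  # inside the margin or misclassified: push towards the point
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict_svm(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[int]:
    """Sign of the decision function (weights . x + bias), returned as -1 or +1."""
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1 for x in X]


# Made-up, well-separated one-feature data.
X = [[-1.0 - i / 10] for i in range(11)] + [[1.0 + i / 10] for i in range(11)]
y = [-1] * 11 + [1] * 11
w = fit_linear_svm(X, y)
print(predict_svm(w, [[-3.0], [3.0]]))
```

Only the points inside the margin change the weights, so the final boundary depends on the records closest to it. Logistic regression, by contrast, is influenced by every record.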
Next, change the model to "RandomForestClassifier", and recompute your model:
What are the values in the confusion matrix?
Try the other available models, and determine which one is the best for this dataset:
Which is the best model?
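Which model is "best" depends on the metric. Accuracy counts all correct predictions, while F1 balances precision and recall for the positive class, which matters when one class is rare. The sketch below compares models from their binary confusion-matrix counts. The counts are placeholders, not results from this dataset, and the names `Counts` and `best_model` are hypothetical.

```python
from typing import Callable, Dict, NamedTuple


class Counts(NamedTuple):
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Counts) -> float:
    """Fraction of all records that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def best_model(results: Dict[str, Counts], metric: Callable[[Counts], float] = f1) -> str:
    """Name of the model with the highest score under the chosen metric."""
    return max(results, key=lambda name: metric(results[name]))


# Placeholder counts only: replace them with the confusion matrices from your own runs.
results = {
    "model_a": Counts(tp=40, fp=10, fn=5, tn=45),
    "model_b": Counts(tp=30, fp=2, fn=15, tn=53),
}
for name, c in results.items():
    print(f"{name}: accuracy={accuracy(c):.3f} f1={f1(c):.3f}")
print("best by F1:", best_model(results))
```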
2. First, select "Predict Categorical Fields" and give the experiment a name (such as cars). Read the data in with:

| inputlookup track_day.csv

Answer the following:
How many records are there?
Once the data is populated, select "LogisticRegression" as the algorithm. Then use the numeric fields as the fields for predicting, with 70% of the data used for training and the remaining 30% used to test the prediction.
Precision:
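Precision for a class is TP / (TP + FP): of the records predicted as that class, the fraction that really belong to it. Recall is TP / (TP + FN): of the records that truly belong to the class, the fraction that were found. With more than two classes, both are computed per class, treating each class in turn as the positive one, and a single summary figure is usually some average of these. The sketch below is minimal. It uses made-up vehicle labels for illustration, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def per_class_precision_recall(actual: Sequence[Hashable],
                               predicted: Sequence[Hashable]) -> Dict[Hashable, Tuple[float, float]]:
    """Return {class: (precision, recall)}, treating each class in turn as the positive one."""
    labels = sorted(set(actual) | set(predicted), key=str)
    scores = {}
    for label in labels:
        tp = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
        predicted_pos = sum(1 for p in predicted if p == label)  # TP + FP
        actual_pos = sum(1 for a in actual if a == label)        # TP + FN
        precision = tp / predicted_pos if predicted_pos else 0.0
        recall = tp / actual_pos if actual_pos else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up labels for illustration only.
actual = ["car", "car", "truck", "van", "truck", "car"]
predicted = ["car", "truck", "truck", "van", "car", "car"]
for label, (p, r) in per_class_precision_recall(actual, predicted).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```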
While the success rate for true positives is fairly good, next try "RandomForestClassifier" (an ensemble made up of a number of decision trees). What are the results?
Precision:
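A random forest trains many decision trees on bootstrap resamples of the training data, each tree seeing a random subset of the features. It then predicts by majority vote, so individual trees' errors tend to cancel out. The sketch below is deliberately simplified and is not MLTK's RandomForestClassifier. It assumes depth-one trees (decision stumps) split on Gini impurity, and it draws one random feature subset per tree rather than per split. The names and settings (`n_trees`, the square-root feature rule) and the data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, str, str]  # (feature index, threshold, label if <= threshold, label if >)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 0.0 for a pure group, higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: List[List[float]], y: List[str], features: List[int]) -> Stump:
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, (f, threshold, majority(left), majority(right)))
    if best is None:  # no usable split: always predict the majority label
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1]


def fit_forest(X: List[List[float]], y: List[str], n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # number of features each tree may use
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: sample with replacement
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest: List[Stump], row: List[float]) -> str:
    """Each stump votes; the most common label wins."""
    votes = [lo if row[f] <= t else hi for f, t, lo, hi in forest]
    return majority(votes)


# Made-up two-feature data with two well-separated classes.
X = [[0.2 * i, 0.1 * i] for i in range(6)] + [[5 + 0.2 * i, 5 + 0.1 * i] for i in range(6)]
y = ["A"] * 6 + ["B"] * 6
forest = fit_forest(X, y)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [6.0, 6.0]))
```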
Predicting Numeric Fields

3. First, select "Predict Numeric Fields" and give the experiment a name (such as cancerdr_experiment). Read the data in with:

| inputlookup df.csv

Now train a model to predict Cancer DR from the other parameters, and determine the best model to use:
Model 1:
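For a numeric target there is no confusion matrix, so models are compared with error measures instead. R² is the share of the target's variance explained (higher is better, and 0 means no better than always predicting the mean). RMSE is the typical error in the target's own units (lower is better). The sketch below is minimal and assumes a single numeric predictor with made-up values. The function names are hypothetical, and this is not how MLTK fits its models.

```python
import math
from typing import Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept with a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 is no better than predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up values: compare a fitted line against a baseline that always predicts the mean.
# (Scored on the same points for brevity; in practice, score on held-out test data.)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_simple_linear(x, y)
candidates = {
    "linear": [slope * xi + intercept for xi in x],
    "mean baseline": [sum(y) / len(y)] * len(y),
}
for name, pred in candidates.items():
    print(f"{name}: R^2={r_squared(y, pred):.3f} RMSE={rmse(y, pred):.3f}")
```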