Learn Machine Learning and Splunk

Splunk is one of the most successful packages for cybersecurity analytics, and defines eight main elements for machine learning (Figure 1):

  • Preprocessing: This defines how the data is prepared before learning, such as rescaling numerical values to a common range. A typical method is StandardScaler.
  • Feature Extraction: This defines a method to extract the key features that the machine learns from. Typical methods are PCA (Principal Component Analysis) and TFIDF (term frequency-inverse document frequency).
  • Analysing data: This involves analysing the correlations within the data. Typical methods include ACF (autocorrelation function) and PACF (partial autocorrelation function).
  • Classification: This involves classifying data into groups. Typical methods include SVM and RandomForestClassifier.
  • Group events: This normally involves clustering. KMeans and BIRCH are typical methods.
  • Detection of outliers: This identifies anomalies within the data sets, and can be used in anomaly detection. A typical method is OneClassSVM.
  • Prediction: This makes predictions on the data given a set of known inputs, and can either be numerical predictions (such as using linear regression, random forest regression, lasso, and decision tree regression) or categorical (such as with logistic regression).
  • Forecasting: This predicts future data values from the history of the data. Typical methods are ARIMA (autoregressive integrated moving average) and KalmanFilter.
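
As a rough illustration of two of the elements above (preprocessing and analysing data), the following plain-Python sketch shows the arithmetic behind standard scaling and the autocorrelation function. It is not Splunk's implementation; the function names `standardize` and `autocorrelation` and the sample values are assumptions made for this example only.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance.

    Hypothetical helper: this mirrors the idea behind standard scaling,
    using the population standard deviation.
    """
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column has no spread, so map every value to zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def autocorrelation(values, lag):
    """Sample autocorrelation of a series at one lag (one point of the ACF)."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Sum the products of deviations that are `lag` steps apart.
    num = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


# Made-up sample values, for illustration only.
scaled = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# An alternating series correlates negatively at lag 1 and positively at lag 2.
series = [1, 2, 1, 2, 1, 2]
print(autocorrelation(series, 1), autocorrelation(series, 2))
```

After scaling, the values have a mean of 0 and a standard deviation of 1, so features measured in different units can be compared on the same footing. The autocorrelation at lag 0 is always 1, and the sign flip between lags 1 and 2 is what a plot of the ACF would show for data that repeats every two steps.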

Figure 1: Machine Learning Ref: https://docs.splunk.com/images/2/20/Machine-learning-quick-ref-guide.pdf

Here is a tutorial: