Machine learning has far-reaching applications in mobility, especially in enterprise mobile apps, as companies increasingly rely on big data to achieve greater cost efficiency and to solve complex business problems through the innovative use of technology.

Predictive Predicament

With the adoption of IoT (“internet of things”) devices and sensors, modern datasets have become very rich in information. They have also become high-dimensional, with hundreds or thousands of features that exhibit high degrees of relevance and correlation. Presented with such a huge range of intersecting data points, models take far longer to train. They also run the risk of overfitting as the number of features grows. Overfitting is a modelling error that occurs when a function fits a limited set of data points too closely, capturing noise along with the underlying pattern and therefore generalizing poorly to new data.
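To see why this matters, consider a minimal sketch (not from the original article; the synthetic dataset and decision-tree model are illustrative assumptions) in which features vastly outnumber samples, so the model memorizes the training data while generalizing poorly:

    # Illustrative only: synthetic data where features vastly outnumber samples.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # 100 samples, 1,000 features, of which only 10 carry real signal.
    X, y = make_classification(n_samples=100, n_features=1000,
                               n_informative=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", model.score(X_train, y_train))  # typically 1.0
    print("test accuracy:", model.score(X_test, y_test))     # typically far lower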

Feature Selection, also known as “variable selection” or “attribute selection,” is an important step in the machine learning workflow. It involves the automatic selection of the attributes in a dataset that matter most to the predictive model being developed. When used properly, feature selection methods remove irrelevant or redundant attributes from the data, i.e. attributes that do not contribute to the accuracy of the predictive model.

Feature Selection Techniques

Here are the three major feature selection techniques and their traits:

1. Filter Method

In this method, a data scientist applies a statistical measure to assign a score to each feature. The features are then ranked by score and either kept in or removed from the dataset. The filter method is usually univariate, meaning that it considers each feature independently, without analyzing its relationships with other features. It simply takes the data, summarizes it, and looks for a pattern (a short sketch follows the list below).

The patterns that are usually found in univariate data include:

  • Central Tendency (mean, median, or mode).
  • Dispersion (range, variance, interquartile range, and standard deviation).
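As an illustration, here is a minimal sketch of the filter method in scikit-learn (the synthetic dataset and the choice of the ANOVA F-value as the scoring statistic are assumptions for the example): each feature is scored independently, and only the top-k are retained.

    # Illustrative only: univariate filter scoring with scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=500, n_features=50,
                               n_informative=5, random_state=0)

    # Score every feature independently (ANOVA F-value), keep the 5 best.
    selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
    print("selected feature indices:", selector.get_support(indices=True))
    print("reduced shape:", selector.transform(X).shape)  # (500, 5)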

2. Wrapper Method

In this method, a data scientist first prepares several subsets in which different combinations of features are created, evaluated, and compared. Each subset is used to train a model that is then tested on a hold-out set. Based on the number of errors made on the hold-out set, each subset is given a predictive score. A predictive model based on this method compares the different subsets and picks the best-performing one for a specific search problem.

The two main statistical methods applied are:

  • Heuristic: Sequential selection adds or removes a single feature at each step of the search. Either we start from the full set of features and gradually remove them (backward elimination), or we start from an empty set and go on adding features one by one (forward selection). A sketch of both wrapper strategies appears below.
  • Recursive Feature Elimination: A greedy algorithm that repeatedly creates subsets by eliminating the worst-performing feature and ranks the subsets accordingly.

However, the wrapper method is computationally very expensive and becomes impractical for very large datasets.
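Both wrapper strategies are available in scikit-learn; the sketch below (the logistic-regression estimator and the synthetic dataset are assumptions for the example) shows forward selection and Recursive Feature Elimination side by side:

    # Illustrative only: the two wrapper strategies described above.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE, SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=0)
    estimator = LogisticRegression(max_iter=1000)

    # Forward selection: starts empty and adds one feature per step,
    # scoring each candidate subset by cross-validation.
    sfs = SequentialFeatureSelector(estimator, n_features_to_select=5,
                                    direction="forward").fit(X, y)
    print("forward selection:", sfs.get_support(indices=True))

    # Recursive Feature Elimination: refits the model and greedily drops
    # the weakest feature until only 5 remain.
    rfe = RFE(estimator, n_features_to_select=5).fit(X, y)
    print("RFE:", rfe.get_support(indices=True))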

3. Embedded Method

This method combines the best qualities of the filter and wrapper methods by building feature selection into the learning algorithm itself. An embedded method learns which features contribute most to the accuracy of the model while the model is being created. It typically relies on regularization, which introduces additional constraints into the predictive algorithm to reduce overfitting.

The most commonly used method is LASSO (least absolute shrinkage and selection operator), a regression analysis method that adds a penalty equal to the absolute value of the magnitude of the coefficients, shrinking some of them exactly to zero and thereby selecting features. Another method is Ridge Regression, which improves prediction error by shrinking large regression coefficients to reduce overfitting. However, because it never shrinks coefficients all the way to zero, Ridge Regression does not perform feature selection and does not make the model more interpretable.
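The contrast between the two can be seen in a minimal sketch (the synthetic dataset and the penalty strength alpha are assumptions for the example): LASSO’s L1 penalty drives many coefficients to exactly zero, while Ridge only shrinks them.

    # Illustrative only: LASSO selects features during training; Ridge does not.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=200, n_features=30,
                           n_informative=5, noise=10.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: some coefficients hit zero
    ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks, never zeroes

    print("LASSO keeps:", np.sum(lasso.coef_ != 0), "of 30 features")
    print("Ridge keeps:", np.sum(ridge.coef_ != 0), "of 30 features")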

Conclusion

Feature selection techniques in machine learning are highly contextual and depend largely on the characteristics of the dataset. While there is no last word on feature selection, data scientists should try each method and understand its practical implications vis-à-vis the project’s objectives.