Powering innovations with smart algorithms
Smart algorithms on big data power innovations leading to speedier transformation
Unsupervised machine learning relies on an algorithm’s discretion to traverse large datasets and to select, extract and report compelling relationships. In contrast to supervised machine learning, which guides the program towards convergence using labelled examples, these systems are left to navigate the intricacies of the data with little support. As the program searches through a plethora of patterns, it often discovers novel associations, which are prioritized using statistical metrics before being reported.
To manage such a large number of patterns, a potential starting point is to focus on itemsets. Broadly speaking, an itemset is a specific type of pattern within a given database. For example, consider a retail dataset of customers purchasing four types of breakfast products: Cereal, Milk, Bread, and Egg. For a given day, the top row could record the items purchased by the first customer, such as (bran, skim, brown, small). This would mean that this customer bought bran cereal, skim milk, brown bread and small sized eggs. Similarly, the second row could be (wheat, regular, white, large), and the third row might be (wheat, skim, white, small). In summary, each of the above entries represents the transaction recorded after a sale to one of the three customers.
In this simplistic scenario, an example of an itemset could be (bran, X, X, small), which describes all shoppers who purchased both bran cereal and small sized eggs, regardless of what else they bought. The symbol ‘X’ is a wild card that matches every value in that position of our simplified dataset. As this itemset is matched only by the first transaction, its frequency is one. Another itemset could be (X, skim, X, small), which describes all customers who bought skim milk and small sized eggs irrespective of other products. As this matches the first and the third transactions, the count of this itemset is two.
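The counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a production method: the names `TRANSACTIONS`, `WILDCARD`, `matches` and `count` are made up for this example, and the data is the three-row toy dataset from the text.

```python
# Toy transactions from the text; column order: cereal, milk, bread, egg.
TRANSACTIONS = [
    ("bran", "skim", "brown", "small"),
    ("wheat", "regular", "white", "large"),
    ("wheat", "skim", "white", "small"),
]

WILDCARD = "X"  # matches any value in its position


def matches(itemset, transaction):
    """Return True if every non-wildcard position of the itemset equals the transaction's value."""
    return all(p == WILDCARD or p == v for p, v in zip(itemset, transaction))


def count(itemset, transactions=TRANSACTIONS):
    """Count the transactions matched by the itemset (its frequency)."""
    return sum(matches(itemset, t) for t in transactions)


print(count(("bran", "X", "X", "small")))  # 1: only the first transaction
print(count(("X", "skim", "X", "small")))  # 2: the first and third transactions
```

On real retail data the same count would normally be computed by a dedicated library or database query rather than by a linear scan per itemset.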
Itemsets that occur frequently provide valuable information, especially in the retail sector, as they represent clusters of products that are repeatedly bought together. However, despite being useful, what they lack on their own is an element of predictability. In other words, a frequent itemset such as (X, skim, X, small) does not tell us whether customers buying skim milk are more likely to buy small sized eggs, or whether customers who buy small sized eggs have a higher propensity to buy skim milk. In summary, a frequent itemset represents a group of products that are regularly bought together without attempting to anticipate customer behaviour.
Interestingly, during the mid-nineties, a group of IBM researchers led by Rakesh Agrawal exploited the mathematical properties of itemsets and combined them with conditional probabilities to transform them into association rules. Association rules are written in the format IF (condition) THEN (action). A rule from our example could be IF (small eggs) THEN (skim milk). This rule links small sized eggs with skim milk and predicts that a customer who buys small sized eggs is highly likely to also purchase skim milk. It is important to note that the reverse of this relationship may not necessarily hold. In fact, the key power of an association rule is that it is directional: the right-hand side of the rule predicts what is likely to happen if the left-hand side takes place. The researchers called this approach market basket analysis. It identifies a multitude of association rules, which are then shortlisted based on statistical measures.
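To make the directional nature of a rule concrete, the sketch below estimates the conditional probability behind a rule on the three toy transactions. The text only speaks of "statistical measures"; using the measure usually called confidence (the share of transactions matching the left-hand side that also match the right-hand side) is an assumption made here, and the names `support_count` and `confidence` are hypothetical helpers rather than part of any library.

```python
# Toy transactions from the text, keyed by product type.
TRANSACTIONS = [
    {"cereal": "bran", "milk": "skim", "bread": "brown", "egg": "small"},
    {"cereal": "wheat", "milk": "regular", "bread": "white", "egg": "large"},
    {"cereal": "wheat", "milk": "skim", "bread": "white", "egg": "small"},
]


def support_count(condition, transactions):
    """Number of transactions that contain every (product, value) pair in condition."""
    return sum(all(t[k] == v for k, v in condition.items()) for t in transactions)


def confidence(lhs, rhs, transactions=TRANSACTIONS):
    """Estimate P(rhs | lhs) for the rule IF lhs THEN rhs; None if lhs never occurs."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        return None
    return support_count({**lhs, **rhs}, transactions) / lhs_count


# IF (small eggs) THEN (skim milk): both small-egg buyers also bought skim milk.
print(confidence({"egg": "small"}, {"milk": "skim"}))    # 1.0

# Direction matters: the same pair of items gives different values each way.
print(confidence({"cereal": "bran"}, {"milk": "skim"}))  # 1.0
print(confidence({"milk": "skim"}, {"cereal": "bran"}))  # 0.5
```

With only three rows these numbers carry no statistical weight. They simply show that a rule and its reverse are measured separately, which is why the two can differ.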
The application of market basket analysis has transformed the retail sector, causing a strategic shift in how retailers manage item placement, store layout, marketing, sales, recommendations, warehousing and the supply chain. The foundation is provided by association rules, which often connect disparate products found within large volumes of retail transactions. Strongly coupled goods are placed near each other for better visibility, and this principle is applied across all products, leading to an effective placement of goods within the store. Additionally, association rules can be used to develop customer profiles for specific products. For example, we could learn that a given age bracket is prone to buy a specific combination of items. This information helps with bundling, marketing and advertising of targeted goods to the respective customer clusters. Moreover, association rules enhance revenue through cross-selling of strongly linked items, and they also drive the recommendation engines that are a popular feature of modern online stores. Finally, as groups of products that are sold together tend to run out of stock together, the supply chain process can replenish them collectively and store them close to each other for efficient warehouse usage.
Besides retail, many other sectors use association rules for analytics, for several reasons. Firstly, association rules are versatile because they do not require domain knowledge to implement. In fact, their true power lies in discovering trends that are derived purely from the statistical properties of the data rather than from prior expectations. Secondly, association rules promote innovation by providing an automated collection of hypotheses that can be shortlisted and put to the test for validation. This is particularly useful in domains with large volumes of data and a dearth of subject-matter experts. Thirdly, the approach has a well-established set of algorithms and heuristics that make it particularly scalable for large data sets. Finally, the generated rules are independent of one another, so they can be implemented selectively without the need to apply all of them.
On a global scale, data-driven disruptions are fundamentally changing how organisations discover new products, operate, trade and grow their businesses. As the size of data increases to the level of petabytes, the data itself has become a rich source from which novel hypotheses are drawn, tested and resolved. In this scenario, association rules are well suited to sift through large volumes of data in an unsupervised manner and return prospective theories for experts to examine. This contrasts with practices of the past, when specialists would form a hypothesis and then collect experimental data to test it before drawing conclusions. Therefore, as the use of smart algorithms on big data enhances the process of innovation, it will in turn drive organisations towards rapid transformation.