Consider a credit card company that has a system to detect any suspicious transactions to protect customers from credit card fraud. This can be done by paying special attention to card usage that are rather different from typical cases. For example, If a purchase amount is too big for card owner and location of transaction is far from owner’s resident city, then system contacts card owner for verification. WHAT DATA MINING TECHNIQUES CAN HELP TO DETECT SUSPICIOUS TRANSACTIONS?
Most transaction are normal, However, if card is stolen, then transaction pattern changes significantly. Location, items purchased are different from authentic card owner. To detect credit card fraud detection this idea is used to identify transactions that are very different from the normal.
Outliers detection is the process of finding data objects that behaves differently from expectation. Such objects are called anomalies or outliers. Outlier detection is important in may applications other than fraud detection such as medical industry, public security, and intrusion detection.
Let us first know, What Are Outliers?
What Are Outliers?
Consider a statistical process that generates a set of data objects. An outliers is a data object that deviates significantly from other objects. Outliers are different from Noisy data. Noise is random error or variance. In general noise is not that important in data analysis, including outlier detection. Outliers are interesting because they are suspected of being generated by other mechanisms than other data.
Outlier detection is also related to novelty detection in evolving data sets. Consider monitoring a social media app where new content is incoming, novelty detection will identify new topics and trends in timely manner.
Types of Outliers
Global outliers
A data object is said to be global outlier if it deviates significantly from other data object. Global outliers are also called as point anomalies. These are simplest outliers
Contextual outliers
A data object is said to be Contextual outlier if it deviates significantly with respect to a specific context of the object. Contextual outliers are also called as conditional outliers as they are conditional for a chosen context. For example, “The temperature today is 2’C. Is it exceptional?”
Collective outliers
A subset of data object is said to be collective outlier if object as whole deviates significantly from entire data set.
Some Challenges for Outlier Detection
- Understanding : – In some cases, user may want to not only detect outliers, but also find out why the detected objects are outliers. To achieve this understanding , an outlier detection system must also provide justification of the detection.
- System specific Outlier detection :- Choosing a measure and relationship model to describe data set is system dependent. Different system may have different requirements.
- Handling Noise :- As Outliers are different from noise, we need to handle noise while performing outlier detection. Isolating noise from data is a challenge for outlier detection.
- Modeling Outliers and Normal objects :- Quality of Outlier detection depends highly on the modeling of normal objects and outliers. The separation between data normality and abnormality is often not very precise.
More on Outlier Detection in Upcoming posts where we discuss methods to detect outliers.
Keep Learning! This is The Way!
๐