Data Mining in a Nutshell

In today's business world, information about the customer is a must for a business attempting to maximize its profits. A new and important tool for gaining this understanding is data mining. Data mining is a set of automated processes used to find previously unknown patterns and relationships in data. Once extracted, these patterns and relationships can be used to make valid predictions about customer behaviour. Data mining is usually used for four major tasks: (1) to improve the process of acquiring new customers and retaining existing ones; (2) to reduce fraud; (3) to identify internal inefficiencies in operations and address them; and (4) to chart unexplored areas of the internet (Cavoukian).

These tasks are easier to accomplish if appropriate data has been collected and that data is kept in a data warehouse. According to Stanford University, "A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources since they're generated...This makes it a lot easier and more efficient to run queries over data that originally came from different sources." When data about an organization's practices is easier to access, it becomes more economical to mine. "Without the pool of validated and scrubbed data that a data warehouse provides, the data mining procedure requires considerable additional effort to pre-process the data" (SAS Institute).

Several different types of models and algorithms are used to "mine" the data. These include, but are not limited to, neural networks, decision trees, rule induction, boosting, and genetic algorithms. Neural networks are physical cellular systems which can acquire, store, and utilize experiential knowledge (Zurada). Neural networks offer a way to efficiently model large and complex problems.
Decision trees are diagrams used for making decisions in business or computer programming. Branches represent choices with their associated risks, costs, results, or probabilities. Rule induction is a way of deriving a set of rules to classify cases (Two Crows). These rules differ from those in a decision tree in that they are independent of one another. Boosting is a technique in which multiple random samples of data are taken and a...
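To make the decision-tree description concrete, here is a minimal sketch in Python of a tree whose branches carry costs and probabilities, evaluated by expected value. The class names, the mailing-campaign scenario, and every number in it are hypothetical illustrations, not taken from any of the cited sources.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    """Leaf node: a final result with a monetary payoff."""
    label: str
    payoff: float


@dataclass
class Chance:
    """Chance node: each branch is (probability, child node)."""
    label: str
    branches: List[Tuple[float, "Node"]]


@dataclass
class Decision:
    """Decision node: each option is (cost of choosing it, child node)."""
    label: str
    options: List[Tuple[float, "Node"]]


Node = Union[Outcome, Chance, Decision]


def expected_value(node: Node) -> float:
    """Roll the tree back from the leaves to get its expected value."""
    if isinstance(node, Outcome):
        return node.payoff
    if isinstance(node, Chance):
        # Weight each branch by its probability.
        return sum(p * expected_value(child) for p, child in node.branches)
    # Decision node: take the option with the best value net of its cost.
    return max(expected_value(child) - cost for cost, child in node.options)


def best_option(decision: Decision) -> str:
    """Return the label of the option with the highest net expected value."""
    _, child = max(decision.options,
                   key=lambda option: expected_value(option[1]) - option[0])
    return child.label


# Hypothetical example: should a business run a mailing campaign?
campaign = Chance("mail all customers", [
    (0.25, Outcome("customer responds", 200.0)),
    (0.75, Outcome("customer ignores mail", 0.0)),
])
tree = Decision("campaign choice", [
    (20.0, campaign),                  # mailing costs 20 per customer (assumed)
    (0.0, Outcome("do nothing", 0.0)),
])

print(best_option(tree), expected_value(tree))
```

With these assumed figures, mailing has a net expected value of 0.25 × 200 − 20 = 30 per customer, so the sketch picks that branch over doing nothing.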