Today, businesses can increase their profit season by season when consistent procedures are applied. Carrying out a data mining process can therefore assist decision making within the business. This paper describes in detail the importance and the applications of data mining, which is adopted in various fields depending on the objectives, goals and reasons for conducting the study within the organization. Three main areas are taken as examples, namely hotels, libraries and hospitals, to show how data mining works in these fields.
Keywords: Data Mining, KDD Process, Decision Trees, Ant Colony Clustering Algorithm, Association Rules, Neural Network, Rough Set
As we know, a business that conducts transactions keeps a significant number of documents or data in a specific database for later retrieval. The data are combined from several departments that carry out different processes, each functioning in line with the mission and vision of the organization. According to Imberman (2001), the number of fields in large databases can approach magnitudes of 10^2 to 10^3. It is therefore important to make proper decisions or do strategic planning using the existing data, which plays an important role in ensuring that any action taken does not have a negative effect, especially a loss, on the organization. In addition, data easily becomes outdated as it keeps changing, because user requirements shift with factors such as trends, money, needs and so forth.
One way to analyze data is to use a data mining technique, which helps the organization by laying out several steps that produce valuable results in a short time compared with traditional methods, which may involve several methodologies and take longer to inspect the data. In business, action must be taken quickly in order to compete with other firms and to improve performance both in delivering services and in producing high-quality products. Furthermore, interpreting the results involves a group of people who contribute creativity and synthesis, which can lead to alternative solutions to the situation or task.
Clearly, data mining helps a great deal in various domains, with different purposes depending on the objectives to be achieved. The rest of this paper is organized as follows. Section 2 discusses the definition of data mining. Section 3 establishes the importance of data mining. Section 4 explains the use of data mining in various fields. Section 5 draws the conclusions.
Researchers and academics have given a broad range of definitions according to their views and the studies they have carried out. These definitions provide an understanding of the concept before data mining techniques are discussed in more depth.
Basically, the main purpose of data mining is to manipulate large amounts of data, whether existing or stored in databases, by determining appropriate variables that contribute to the quality of the predictions that will be used to solve a problem. As defined by Gargano & Raggad (1999):
"Data mining searches for hidden relationships, patterns, correlations, and interdependencies in large databases that traditional information gathering methods (e.g. report creation, pie and bar graph generation, user querying, decision support systems (DSSs), etc.) might overlook".
Another author agrees with this view of data mining as a way to find hidden patterns, orientations and trends. Palace (1996) adds to the previous definition:
"Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases".
Moreover, data mining is also defined as a process of extracting knowledge or information, using an appropriate framework or model to analyze the data until it produces an output that helps fulfil the goal of the analysis. Imberman (2001) describes it:
"As knowledge extraction, information discovery, information harvesting, exploratory data analysis, data archeology, data pattern processing, and functional dependency analysis".
The statement above agrees, and it follows that the framework or model adopted is meant to reveal the true situation. As defined by Ma, Chou & Yen (2000):
"Data mining is the process of applying artificial intelligence techniques (such as advanced modeling and rule induction) to a large data set in order to determine patterns in the data".
On the other hand, data mining proceeds through several steps during analysis, and the steps depend on the methodology chosen. The methodologies do not differ much from one another. According to Forcht & Cochran (1999):
"Data mining is an interactive process that involves assembling the data into a format conducive to analysis. Once the data are configured, they must be cleaned by checking for obvious errors or flaws (such as an item that is an extreme outlier) and removing them".
As discussed above, data mining benefits many parties at multiple levels of the business, as the model or framework applied can reduce time and cost. The results then allow the responsible knowledge workers to turn them into valuable information effectively by critically analyzing the outcome.
The process should be carried out carefully so that no useful variable or algorithm is removed or left out when reliable data are extracted. Data mining techniques can help select a part of the data, using appropriate tools to filter outliers and anomalies within the data set. According to Gargano & Raggad (1999), data mining is important for several further reasons, including:
· To facilitate the explication of previously hidden information, which includes the capabilities to discover rules, classify, partition, associate and optimize.
According to Goebel & Gruenwald (1999), in order to find patterns in data, several methodologies are used to clarify vagueness as well as to identify the relationships between one variable and others within the database, and the results guide decision making or forecast the impact of an action under consideration. The methodology should be chosen so that it suits the rules and conditions of the data to be analyzed. The methodologies include:
- Statistical Methods: focused mainly on testing of preconceived hypotheses and on fitting models to data.
- Case-Based Reasoning (CBR): technology that attempts to solve a given problem by making direct use of past experiences and solutions.
- Neural Networks: formed from large numbers of simulated neurons, connected to each other in a manner similar to brain neurons, which enables the network to "learn".
- Decision Trees: each non-terminal node represents a test or decision on the considered data item; a tree can also be interpreted as a special form of a rule set, characterized by its hierarchical organization of rules.
- Rule Induction: rules state a statistical correlation between the occurrence of certain attributes in a data item, or between certain data items in a data set.
- Bayesian Belief Networks: graphical representations of probability distributions derived from co-occurrence counts in the set of data items.
- Genetic Algorithms / Evolutionary Programming: formulate hypotheses about dependencies between variables, in the form of association rules or some other internal formalism.
- Fuzzy Sets: constitute a powerful approach to deal not only with incomplete, noisy or imprecise data, but may also be helpful in developing uncertain models of the data that provide smarter and smoother performance than traditional systems.
- Rough Sets: rough sets are a mathematical concept dealing with uncertainty in data, used as a stand-alone solution or combined with other methods such as rule induction, classification, or clustering methods.
· The ability to seamlessly automate and embed some of the mundane, repetitive, tedious decision steps that do not require continuous human intervention.
Several steps are taken when operating on or analyzing selected data; the process involves filtering, transforming, testing, modeling, visualizing and documenting the results, or storing them in databases or a data warehouse. Each step functions differently and is responsible for carrying out part of the process, with the purpose of making it easier to produce high-quality predictions by generating them automatically for specific conditions. For example, a data warehouse also keeps past analyses, which makes it possible to eliminate redundant results at certain steps. Ma, Chou & Yen (2000) stress the characteristics of data mining that help complete the analysis process. They comprise:
- Data pattern determination: Data-access languages or data-manipulation languages (DMLs) identify the specific data that users want to pull into the program for processing or display. They also allow users to input query specifications. Users then simply select the desired information from the menus, and the system builds the SQL command automatically.
- Formatting capability: It generates raw data formats, tabular, spreadsheet form, multidimensional display and visualization.
- Content analysis capability: Data mining also has a strong content analysis capability that enables the user to process the specifications written by the end users.
- Synthesis capability: Data mining allows data synthesis to be executed in a timely manner.
· Simultaneously reducing the cost and the potential errors encountered in the decision making process.
Basically, data mining can reduce forecasting errors when the steps of the chosen technique are followed properly, which avoids delays in decision making that would have a big impact on the business. Care must therefore be taken in handling the data throughout the steps involved, and the strategic plan should take into consideration the objectives of the analysis, the amount of data, the variables, the relationships between variables, the tests adopted, and so forth. Moreover, if there is a need to discuss the study with an expert, this should be included in the planning. Within a firm, a unit or group of people is usually given the responsibility of discovering the hidden patterns for another department. Hence, regular meetings should be held between the experts and the analysts to ensure that the final result satisfies their requirements and improves the performance of the workers, the department and the business.
In terms of reducing cost, traditional research takes time to acquire data from respondents, and this depends on the methodologies used and the sample size. A questionnaire can be administered quickly and is less time consuming, but interviews take time, and the researcher has to meet the respondent more than once when there is ambiguity or the answers do not meet the requirements. For some studies, the sample is drawn from different locations, which requires the researcher to travel in order to obtain genuine opinions, and this costs a lot in accommodation, food, travel tickets and so forth. Data mining uses existing data (for example, customer transaction data, student registration data, data on patients undergoing surgery and so on) kept in a data warehouse, which reduces the cost of acquiring data. In addition, once the objective has been determined at the start of the study, the researcher first searches the data warehouse for earlier studies, because previous analyses are stored there. If a match is found, some steps can be skipped or chosen more easily, which confirms that data mining can reduce both cost and time. According to Gargano & Raggad (1999), data mining also yields long-term benefits that exceed the costs incurred in the development, implementation, and maintenance of such systems by a wide margin.
Nowadays, data mining is widely used, especially by organizations that focus on consumer orientation, for instance retail, financial, communication, and marketing organizations (Palace, 1996). The medical field also benefits from applying data mining to its daily operations. These various fields show that each organization carries out different transactions, all the details of which are kept in databases, which makes it possible to analyze them for multiple goals such as increasing revenue, gaining more customers, improving customer satisfaction and others. Furthermore, again according to Palace (1996), existing data make it possible to determine relationships among internal factors, such as price, product positioning or staff skills, and external factors, such as economic indicators, competition and customer demographics.
Hence, three applications of data mining in different areas are presented, namely the hotel sector, the library field and the hospital, with the goal of reducing or eliminating weaknesses by using properly interpreted results to assist decision making and find the best solutions. The examples are as follows:
· A data mining approach to developing the profiles of hotel customers.
A study by Min, Min & Ahmed Emam (2002) aimed to target some of the valued customers for special treatment based on their expected future profitability to the hotel. Several questions arise regarding customer profiling:
- Which customers are likely to return to the same hotel as repeat guests?
- Which customers are at greatest risk of defecting to other competing hotels?
- Which service attributes tend to be important to which customers?
- How to segment the customer population into profitable or unprofitable customers?
- Which segment of the customers best fits the current service capacities of the hotel?
The researchers adopted decision trees from the broad range of data mining techniques to analyze the data, because of their ability to generate appropriate rules with visualization and simplicity. The technique follows three steps:
- Data collection: the process of selecting data that suit the purpose from the previous survey. In addition, unwanted data are removed from the database by filtering the Excel records.
- Data formatting: the process of converting all data in the spreadsheet to Statistical Package for the Social Sciences (SPSS) format for the purpose of classification accuracy.
- Rule induction: the process of selecting an algorithm for building decision trees, here C5.0, to generate sets of rules that give important indications for the hotel manager to take further action.
As a result, the researchers found that "if-then" rules are useful in formulating a customer retention strategy, with predictive accuracy ranging from 80.9% to 93.7%. The predictive accuracy, expressed as a ratio, depends on the conditions of each rule.
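The rule induction step can be illustrated with a minimal sketch. It is not C5.0 and does not use the study's data: it picks a single split by plain information gain (C5.0 uses a more elaborate criterion and builds full trees), and the guest records, attribute names and outcome labels are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical guest records: (purpose of stay, loyalty member, outcome).
records = [
    ("business", "yes", "return"),
    ("business", "yes", "return"),
    ("business", "no", "return"),
    ("leisure", "yes", "return"),
    ("leisure", "no", "defect"),
    ("leisure", "no", "defect"),
    ("business", "no", "defect"),
    ("leisure", "yes", "return"),
]
ATTRIBUTES = {"purpose": 0, "member": 1}  # attribute name -> column index


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Reduction in outcome entropy obtained by splitting on one column."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[column] for r in rows}:
        subset = [r[-1] for r in rows if r[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows):
    """Name of the attribute with the highest information gain."""
    return max(ATTRIBUTES, key=lambda name: information_gain(rows, ATTRIBUTES[name]))


def rules_for(rows, name):
    """One if-then rule per attribute value: value -> (outcome, accuracy)."""
    column = ATTRIBUTES[name]
    rules = {}
    for value in sorted({r[column] for r in rows}):
        labels = [r[-1] for r in rows if r[column] == value]
        outcome, hits = Counter(labels).most_common(1)[0]
        rules[value] = (outcome, hits / len(labels))
    return rules


split = best_split(records)
for value, (outcome, accuracy) in rules_for(records, split).items():
    print(f"IF {split} = {value} THEN {outcome} (accuracy {accuracy:.0%})")
```

Each printed rule carries its own accuracy on the records it covers, which is the kind of per-rule figure a manager could use when deciding which customers to target.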
· Using data mining technology to provide a recommendation service in the digital library.
A study by Chen & Chen (2006) aimed to provide a recommendation system architecture to promote digital library services in digital libraries. Digital publications come in a wide range of formats such as audio, video, picture, etc., which makes it difficult to analyze or define the keywords and content needed to gain information from the user and improve the service in digital libraries.
In the methodology section, two data mining models were selected:
o Ant Colony Clustering Algorithm
This model is able to find the shortest path, or to reduce the time needed to find the best output that matches the conditions existing in the organization. Each step has a different function that helps reveal the relationships among the variables. It involves the following steps:
Step 0: Set parameters and initialize pheromone trails.
Step 1: Each ant constructs its solution
Step 2: Compute the scores of all solutions
Step 3: Update the pheromone trails.
Step 4: If the best solution has not changed after some predefined iterations, terminate the algorithm; otherwise go to Step 2.
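The steps above can be sketched as a simplified pheromone-based clustering loop. This is an illustrative assumption, not the implementation used in the cited study: each ant assigns every point to a cluster with probability proportional to a pheromone value, solutions are scored by within-cluster sum of squared distances, and the iteration-best solution is reinforced. The parameter names (n_ants, evaporation, patience, max_iter) and the sample points are hypothetical, and the loop returns to solution construction on every iteration.

```python
import random


def sse(points, labels, k):
    """Within-cluster sum of squared distances to centroids (lower is better)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return total


def ant_colony_clustering(points, k, n_ants=10, evaporation=0.1,
                          patience=20, max_iter=500, seed=0):
    rng = random.Random(seed)
    # Step 0: set parameters and initialize pheromone trails uniformly.
    pheromone = [[1.0] * k for _ in points]
    best_labels, best_score, stale = None, float("inf"), 0
    for _ in range(max_iter):
        # Step 1: each ant constructs a solution (one cluster label per point).
        solutions = [
            [rng.choices(range(k), weights=row)[0] for row in pheromone]
            for _ in range(n_ants)
        ]
        # Step 2: compute the scores of all solutions and keep the best one.
        iter_score, iter_labels = min((sse(points, s, k), s) for s in solutions)
        # Step 3: update the trails: evaporate, then reinforce the best solution.
        for i, row in enumerate(pheromone):
            for c in range(k):
                row[c] *= 1.0 - evaporation
            row[iter_labels[i]] += 1.0 / (1.0 + iter_score)
        # Step 4: stop once the best solution has not improved for `patience` rounds.
        if iter_score < best_score:
            best_labels, best_score, stale = iter_labels, iter_score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_labels, best_score


# Hypothetical usage: six 2-D points, two clusters.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, score = ant_colony_clustering(points, k=2)
print(labels, round(score, 2))
```

Because the search is stochastic, the result depends on the seed and on the parameter values; the sketch only guarantees a valid assignment, not the optimal one.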
o Association rules to discover hidden patterns.
This model finds co-purchased items and helps uncover relationships in the form of association rules. There are two main steps, as follows:
Step 1: Find all large item sets.
Step 2: Use the large item sets generated in the first step to generate all the effective association rules.
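The two steps can be illustrated with a small level-wise (Apriori-style) sketch. The transactions, the support threshold and the confidence threshold below are hypothetical, and the code is a minimal illustration rather than the system described in the cited study.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    size = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        size += 1
        # Join frequent sets of the current size to form the next candidates.
        candidates = list({a | b for a in level for b in level if len(a | b) == size})
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive (antecedent, consequent, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules


# Hypothetical loan records from a digital library (one set per user session).
transactions = [
    {"ebook", "audiobook"},
    {"ebook", "audiobook", "video"},
    {"ebook", "video"},
    {"audiobook"},
    {"ebook", "audiobook"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, confidence in association_rules(frequent, min_confidence=0.7):
    print(f"{sorted(antecedent)} -> {sorted(consequent)} (confidence {confidence:.2f})")
```

A rule such as "users who borrow an ebook also tend to borrow an audiobook" is the kind of output a recommendation service could act on.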
As a result, these two models produce more than one solution and yield many recommendations that can be applied to the various problems that arise in running digital libraries, as well as promoting their use among users at multiple levels through the right tools and appropriate services.
· Using a KDD process to forecast the duration of surgery.
A study by Combas, Meskens & Vandamme (2007) aimed to identify classes of surgery likely to take different lengths of time according to the patient's profile, and to allow the use of the operating theatre to be better scheduled. Several issues in this field led to the study. For example, an endoscopy unit uses endoscopy tubes (shared resources) during surgery. However, their availability is limited because it takes 30-45 min to clean and sterilize each one. The scheduling of endoscopies (and all other operating theatre procedures) must obviously take into account the availability of these different resources.
The researchers chose the Knowledge Discovery in Databases (KDD) process to analyze this large volume of data from the databases. The steps are as follows:
- Step 1: Data preparation, in which the selected data must fulfil the requirements, including additional diagnoses, "Previous active history" and system affected.
- Step 2: Data cleaning, in which the data are filtered to surgical procedures that had been performed at least 40 times (at least 20 times for combinations involving both a procedure and a specific surgeon).
- Step 3: Data mining, in which an appropriate method is chosen to test on part of the data; here this involves rough sets and neural networks.
- Step 4: Validation and comparison, which consists of interpreting and comparing the results of the two data analysis methods in order to observe the rate of good classification.
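The frequency thresholds in the data cleaning step can be sketched as a simple filter. The record layout (dictionaries with "procedure" and "surgeon" keys), the helper name filter_by_frequency and the sample counts are assumptions made for illustration; only the thresholds of 40 and 20 come from the description above.

```python
from collections import Counter


def filter_by_frequency(records, key, min_count):
    """Keep only the records whose key value occurs at least min_count times."""
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= min_count]


# Hypothetical operating theatre records.
records = (
    [{"procedure": "gastroscopy", "surgeon": "S1"}] * 30
    + [{"procedure": "gastroscopy", "surgeon": "S2"}] * 15
    + [{"procedure": "colonoscopy", "surgeon": "S1"}] * 10
)

# Procedures performed at least 40 times.
by_procedure = filter_by_frequency(records, lambda r: r["procedure"], 40)
# Procedure and surgeon combinations observed at least 20 times.
by_pair = filter_by_frequency(records, lambda r: (r["procedure"], r["surgeon"]), 20)

print(len(by_procedure), len(by_pair))
```

Passing the grouping key as a function lets the same filter serve both thresholds without duplicating the counting logic.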
The researchers then added three further steps to fit the proposed objective and to produce the best results for forecasting the duration of surgery:
- Step 5: Measuring the impact of predicting the duration of surgery on planning. In this step the duration of surgery provided by the prediction models (empirical laws, rule-based laws, etc.) based on information stored in the databases is used to feed a series of algorithms and heuristics for planning purposes.
- Step 6: Simulation, which makes it possible to simulate the activity of the various theatre suites in terms of the operating sequence determined by the planning methods, under both scenarios, namely operating data and the patient's profile.
- Step 7: Validation and selection of the best model, in which the results provided by the simulation model should make it possible to assess the quality of the scheduling based on performance indicators such as the amount of time that the operating theatres are not in use, the number of potential additional hours, and errors in predicting the duration of surgery.
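The performance indicators named in Step 7 can be computed with a short sketch. It assumes a single theatre session of fixed length, operations run back to back, and durations in minutes; the function name, the session length and the sample durations are hypothetical, and effects such as cleaning time between operations are ignored.

```python
def schedule_indicators(predicted, actual, session_minutes):
    """Idle time, overtime and mean absolute prediction error for one session."""
    used = sum(actual)
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return {
        "idle": max(0, session_minutes - used),       # theatre not in use
        "overtime": max(0, used - session_minutes),   # additional time needed
        "mean_abs_error": sum(errors) / len(errors),  # prediction error
    }


# Hypothetical predicted and observed durations for three operations.
predicted = [60, 90, 120]
actual = [70, 80, 150]

print(schedule_indicators(predicted, actual, session_minutes=480))
print(schedule_indicators(predicted, actual, session_minutes=240))
```

Comparing these indicators across prediction models gives one simple way to rank the models, in the spirit of the selection step described above.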
As a result, the analyses were not particularly satisfactory. The main problem seems to be the choice of variable grouping, which may have an effect on prediction quality.
In summary, data mining can be considered an effective and reliable way to uncover hidden information in databases that store large amounts of data, using the right tools to help analyze, synthesize and transform the content of the data for various purposes, which often depend on the core business carried out and the target identified.
From the discussion above, it can be seen that data mining offers many advantages, especially in business, where it allows the organization to predict trends, customer requirements, relationships and so on. Early preparation can then be made to find one or more alternatives, so that the firm can continue its daily operations if it determines that it does not agree with the results obtained.
To produce an outcome that satisfies the organization and minimizes problems, the information must be implemented successfully in business transactions. The key variables should be assigned properly to suit the aim of the study, since steps have to be repeated when problems are found, and the decision making process then cannot be completed within the timeline.
Chen, Chia-Chen & Chen, An-Pin. (2006). Using data mining technology to provide a recommendation service in the digital library. The Electronic Library. 25(6): 711-734.
Combas, C., Meskens, N. & Vandamme, J. P. (2007). Using a KDD process to forecast the duration of surgery. International Journal of Production Economics. 112: 279-293.
Forcht, Karen A. & Cochran, Kevin. (1999). Using data mining and datawarehousing techniques. Industrial Management & Data Systems. 99(5), 189-196.
Gargano, Michael L. & Raggad, Bel G. (1999). Data mining - a powerful information creating tool. OCLC Systems & Services. 15(2), 81-90.
Goebel, Michael & Gruenwald, Le. (1999). A survey of data mining and knowledge discovery software tools. ACM SIGKDD Explorations Newsletter. 1: 20-33.
Imberman, Susan P. (2001). Effective Use of the KDD Process and Data Mining for Computer Performance Professionals. In International Computer Measurement Group Conference. Anaheim: USA, 611-620.
Ma, Catherine, Chou, David C. & Yen, David C. (2000). Data warehousing, technology assessment and management. Industrial Management & Data Systems. 100(3), 125-135.
Min, Hokey, Min, Hyesung & Emam, Ahmed. (2002). A data mining approach to developing the profiles of hotel customers. International Journal of Contemporary Hospitality Management. 14(6): 274-285.
Palace, Bill. (1996, Spring). Data Mining: What is Data Mining? Retrieved March 2, 2010, from: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm