Thursday 26 July 2018

Data Mining using R Language

Why Data Mining?

- I have this financial data with me, and I need to find out whether any of the transactions are fraudulent.
- I have this email data with me, and I need to check how many of the emails are spam.
- I have this telecom data with me, and I need to find out how many of the customers will churn.

Data Mining to the rescue!
How do I obtain knowledge from this data?
→ Hey, you can use data mining techniques to find interesting insights in the data.

What is Data Mining?
→ Data Mining is the computing process of discovering patterns in large datasets, involving methods at the intersection of machine learning, statistics, and database systems.

What should the mined information be like?

New :- The extracted information should reveal new patterns and relationships among the data entities.

Correct :- Just as not everything that glitters is gold, not all of the mined information will be correct or valid. It needs to be evaluated for its correctness before we use it for any other purpose.

Potentially useful :- Just as we extract useful products such as petrol and diesel from crude oil, the information mined from raw data should be useful and relevant to us.

Knowledge Discovery in Databases (KDD)

Tasks in KDD
1. Data Selection :-
      a) Data from
      b) Data Warehouse
      c) Target Data

2. Data Pre-Processing :-
      a) The selected data must be appropriate for mining tasks
      b) Simple operations such as summarization, aggregation, and normalization can be applied to transform and consolidate the data so that it is suitable for mining.

3. Data Mining :-
       a) This is the most important step in the KDD process.
       b) Intelligent operations such as clustering, classification, and regression are applied in order to extract patterns.

4. Pattern Evaluation :-
         Once the data mining techniques have been applied, the obtained results need to be evaluated for their accuracy.

5. Knowledge Representation :-
         The identified patterns must be represented using simple, aesthetic graphs.
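
As a rough illustration, the five tasks above could be sketched in R as follows. This is only a minimal sketch using base R: the built-in iris dataset, the choice of k-means with three clusters, the seed value, and the "purity" measure used for evaluation are all assumptions made for the example, not something prescribed by the KDD process itself.

```r
# 1. Data Selection: pick the target data (assumption: the four numeric
#    measurements of the built-in iris dataset).
target <- iris[, 1:4]

# 2. Data Pre-Processing: aggregate (mean per species) and normalize
#    (each column rescaled to mean 0 and standard deviation 1).
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
scaled <- scale(target)

# 3. Data Mining: cluster the observations with k-means.
set.seed(42)  # arbitrary seed, only so that reruns give the same clusters
fit <- kmeans(scaled, centers = 3, nstart = 25)

# 4. Pattern Evaluation: compare the clusters with the known species.
#    Purity = share of observations that belong to the majority species
#    of their cluster (an illustrative measure chosen for this sketch).
confusion <- table(cluster = fit$cluster, species = iris$Species)
purity <- sum(apply(confusion, 1, max)) / nrow(iris)

# 5. Knowledge Representation: a simple scatter plot coloured by cluster.
plot(iris$Petal.Length, iris$Petal.Width,
     col = fit$cluster, pch = 19,
     xlab = "Petal length", ylab = "Petal width",
     main = "k-means clusters on iris")
```

Printing species_means, confusion and purity shows the aggregated data and how well the clusters line up with the species. With your own data, the selection and pre-processing steps would usually need far more care than this sketch suggests.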
