Studies in computational intelligence; 29 (Berlin; Heidelberg, 2006). - CONTENTS
Sumathi S. Introduction to data mining and its applications / S.Sumathi, S.N.Sivanandam. - Berlin; Heidelberg: Springer, 2006. - xxii, 828 p.: ill. - (Studies in computational intelligence; 29). - Bibliogr.: p.799-828. - ISBN-10 3-540-34350-4; ISBN-13 978-3-540-34350-9; ISSN 1860-949X
 

Holding location: 02 | Branch of ГПНТБ СО РАН | Novosibirsk

Contents
 
1    Introduction to Data Mining Principles ..................... 1
1.1  Data Mining and Knowledge Discovery ........................ 2
1.2  Data Warehousing and Data Mining - Overview ................ 5
     1.2.1  Data Warehousing Overview ........................... 7
     1.2.2  Concept of Data Mining .............................. 8
1.3  Summary ................................................... 20
1.4  Review Questions .......................................... 20

2    Data Warehousing, Data Mining, and OLAP ................... 21
2.1  Data Mining Research Opportunities and Challenges ......... 23
     2.1.1  Recent Research Achievements ....................... 25
     2.1.2  Data Mining Application Areas ...................... 27
     2.1.3  Success Stories .................................... 29
     2.1.4  Trends that Affect Data Mining ..................... 30
     2.1.5  Research Challenges ................................ 31
     2.1.6  Test Beds and Infrastructure ....................... 33
     2.1.7  Findings and Recommendations ....................... 33
2.2  Evolving Data Mining into Solutions for Insights .......... 35
     2.2.1  Trends and Challenges .............................. 36
2.3  Knowledge Extraction Through Data Mining .................. 37
     2.3.1  Data Mining Process ................................ 39
     2.3.2  Operational Aspects ................................ 50
     2.3.3  The Need and Opportunity for Data Mining ........... 51
     2.3.4  Data Mining Tools and Techniques ................... 52
     2.3.5  Common Applications of Data Mining ................. 55
     2.3.6  What about Data Mining in Power Systems? ........... 56
2.4  Data Warehousing and OLAP ................................. 57
     2.4.1  Data Warehousing for Actuaries ..................... 57
     2.4.2  Data Warehouse Components .......................... 58
     2.4.3  Management Information ............................. 59
     2.4.4  Profit Analysis .................................... 60
     2.4.5  Asset Liability Management ......................... 60
2.5  Data Mining and OLAP ...................................... 61
     2.5.1  Research ........................................... 61
     2.5.2  Data Mining ........................................ 68
2.6  Summary ................................................... 72
2.7  Review Questions .......................................... 72

3    Data Marts and Data Warehouse ............................. 75
3.1  Data Marts, Data Warehouse, and OLAP ...................... 77
     3.1.1  Business Process Re-engineering .................... 77
     3.1.2  Real-World Usage ................................... 78
     3.1.3  Business Intelligence .............................. 78
     3.1.4  Different Data Structures .......................... 82
     3.1.5  Different Users .................................... 84
     3.1.6  Technological Foundation ........................... 86
     3.1.7  Data Warehouse ..................................... 87
     3.1.8  Informix Architecture .............................. 87
     3.1.9  Building the Data Warehouse/Data Mart Environment .. 88
     3.1.10 History ............................................ 91
     3.1.11 Nondetailed Data in the Enterprise Data Warehouse .. 92
     3.1.12 Sharing Data Among Data Marts ...................... 93
     3.1.13 The Manufacturing Process .......................... 93
     3.1.14 Subdata Marts ...................................... 95
     3.1.15 Refreshment Cycles ................................. 95
     3.1.16 External Data ...................................... 96
     3.1.17 Operational Data Stores (ODS) and Data Marts ....... 97
     3.1.18 Distributed Metadata ............................... 98
     3.1.19 Managing the Warehouse Environment ................ 100
     3.1.20 OLAP .............................................. 102
3.2  Data Warehousing for Healthcare .......................... 107
     3.2.1  A Data Warehousing Perspective for Healthcare ..... 107
     3.2.2  Adding Value to your Current Data ................. 107
     3.2.3  Enhance Customer Relationship Management .......... 108
     3.2.4  Improve Provider Management ....................... 109
     3.2.5  Reduce Fraud ...................................... 109
     3.2.6  Prepare for HEDIS Reporting ....................... 110
     3.2.7  Disease Management ................................ 110
     3.2.8  What to Expect When Beginning a Data Warehouse
            Implementation .................................... 110
     3.2.9  Definitions ....................................... 111
3.3  Data Warehousing in the Telecommunications Industry ...... 112
     3.3.1  Implementing One View ............................. 118
     3.3.2  Business Benefit .................................. 120
     3.3.3  A Holistic Approach ............................... 121
3.4  The Telecommunications Lifecycle ......................... 122
     3.4.1  Current Enterprise Environment .................... 122
     3.4.2  Getting to the Root of the Problem ................ 123
     3.4.3  The Telecommunications Lifecycle .................. 125
     3.4.4  Telecom Administrative Outsourcing ................ 127
     3.4.5  Choose your Outsourcing Partner Wisely ............ 127
     3.4.6  Security in Web-Enabled Data Warehouse ............ 128
3.5  Security Issues in Data Warehouse ........................ 129
     3.5.1  Performance vs Security ........................... 130
     3.5.2  An Ideal Security Model ........................... 131
     3.5.3  Real-World Implementation ......................... 131
     3.5.4  Proposed Security Model ........................... 136
3.6  Data Warehousing: To Buy or To Build a Fundamental
     Choice for Insurers ...................................... 140
     3.6.1  Executive Overview ................................ 140
     3.6.2  The Fundamental Choice ............................ 140
     3.6.3  Analyzing the Strategic Value of Data
            Warehousing ....................................... 141
     3.6.4  Addressing your Concerns .......................... 142
     3.6.5  Introducing FellowDSS ............................. 146
3.7  Summary .................................................. 148
3.8  Review Questions ......................................... 149

4    Evolution and Scaling of Data Mining Algorithms .......... 151
4.1  Data-Driven Evolution of Data Mining Algorithms .......... 152
     4.1.1  Transaction Data .................................. 153
     4.1.2  Data Streams ...................................... 154
     4.1.3  Graph and Text-Based data ......................... 155
     4.1.4  Scientific Data ................................... 156
4.2  Scaling Mining Algorithms to Large DataBases ............. 157
     4.2.1  Prediction Methods ................................ 157
     4.2.2  Clustering ........................................ 160
     4.2.3  Association Rules ................................. 161
     4.2.4  From Incremental Model Maintenance to Streaming
            Data .............................................. 162
4.3  Summary .................................................. 163
4.4  Review Questions ......................................... 164

5    Emerging Trends and Applications of Data Mining .......... 165
5.1  Emerging Trends in Business Analytics .................... 166
     5.1.1  Business Users .................................... 166
     5.1.2  The Driving Force ................................. 167
5.2  Business Applications of Data Mining ..................... 170
5.3  Emerging Scientific Applications in Data Mining .......... 177
     5.3.1  Biomedical Engineering ............................ 177
     5.3.2  Telecommunications ................................ 178
     5.3.3  Geospatial Data ................................... 180
     5.3.4  Climate Data and the Earth's Ecosystems ........... 181
5.4  Summary .................................................. 182
5.5  Review Questions ......................................... 183

6    Data Mining Trends and Knowledge Discovery ............... 185
6.1  Getting a Handle on the Problem .......................... 186
6.2  KDD and Data Mining: Background .......................... 187
6.3  Related Fields ........................................... 191
6.4  Summary .................................................. 194
6.5  Review Questions ......................................... 194

7    Data Mining Tasks, Techniques, and Applications .......... 195
7.1  Reality Check for Data Mining ............................ 196
     7.1.1  Data Mining Basics ................................ 196
     7.1.2  The Data Mining Process ........................... 197
     7.1.3  Data Mining Operations ............................ 199
     7.1.4  Discovery-Driven Data Mining Techniques ........... 201
7.2  Data Mining: Tasks, Techniques, and Applications ......... 204
     7.2.1  Data Mining Tasks ................................. 204
     7.2.2  Data Mining Techniques ............................ 206
     7.2.3  Applications ...................................... 209
     7.2.4  Data Mining Applications - Survey ................. 210
7.3  Summary .................................................. 215
7.4  Review Questions ......................................... 216

8    Data Mining: an Introduction - Case Study ................ 217
8.1  The Data Flood ........................................... 218
8.2  Data Holds Knowledge ..................................... 218
     8.2.1  Decisions From the Data ........................... 219
8.3  Data Mining: A New Approach to Information Overload ...... 219
     8.3.1  Finding Patterns in Data, which we can use to
            Better, Conduct the Business ...................... 219
     8.3.2  Data Mining can be Breakthrough Technology ........ 220
     8.3.3  Data Mining Process in an Information System ...... 221
     8.3.4  Characteristics of Data Mining .................... 222
     8.3.5  Data Mining Technology ............................ 223
     8.3.6  Technology Limitations ............................ 224
     8.3.7  BBC Case Study: The Importance of Business
            Knowledge ......................................... 225
     8.3.8  Some Medical and Pharmaceutical Applications of
            Data Mining ....................................... 228
     8.3.9  Why Does Data Mining Work? ........................ 228
8.4  Summary .................................................. 229
8.5  Review Questions ......................................... 229

9    Data Mining & KDD ........................................ 231
9.1  Data Mining and KDD - Overview ........................... 232
     9.1.1  The Idea of Knowledge Discovery in Databases
            (KDD) ............................................. 234
     9.1.2  How Data Mining Relates to KDD .................... 235
     9.1.3  The Data Mining Future ............................ 237
9.2  Data Mining: The Two Cultures ............................ 238
     9.2.1  The Central Issue ................................. 238
     9.2.2  What are Data Mining and the Data Mining
            Process? .......................................... 239
     9.2.3  Machine Learning .................................. 239
     9.2.4  Impact of Implementation .......................... 240
9.3  Summary .................................................. 241
9.4  Review Questions ......................................... 241

10   Statistical Themes and Lessons for Data Mining ........... 243
10.1 Data Mining and Official Statistics ...................... 244
     10.1.1 What is New in Data Mining is ..................... 244
     10.1.2 Goals and Tools of Data Mining .................... 244
     10.1.3 New Mines: Texts, Web, Symbolic Data? ............. 245
     10.1.4 Applications in Official Statistics ............... 246
10.2 Statistical Themes and Lessons for Data Mining ........... 246
     10.2.1 An Overview of Statistical Science ................ 248
     10.2.2 Is Data Mining "Statistical Deja Vu" (All Over
            Again)? ........................................... 252
     10.2.3 Characterizing Uncertainty ........................ 254
     10.2.4 What Can Go Wrong, Will Go Wrong .................. 256
     10.2.5 Symbiosis in Statistics ........................... 261
10.3 Summary .................................................. 262
10.4 Review Questions ......................................... 263

11   Theoretical Frameworks for Data Mining ................... 265
11.1 Two Simple Approaches .................................... 266
     11.1.1 Probabilistic Approach ............................ 267
     11.1.2 Data Compression Approach ......................... 268
11.2 Microeconomic View of Data Mining ........................ 268
11.3 Inductive Databases ...................................... 269
11.4 Summary .................................................. 270
11.5 Review Questions ......................................... 270

12   Major and Privacy Issues in Data Mining and Knowledge
     Discovery ................................................ 271
12.1 Major Issues in Data Mining .............................. 272
12.2 Privacy Issues in Knowledge Discovery and Data Mining .... 275
     12.2.1 Revitalized Privacy Threats ....................... 277
     12.2.2 New Privacy Threats ............................... 279
     12.2.3 Possible Solutions ................................ 281
12.3 The OECD Personal Privacy Guidelines ..................... 283
     12.3.1 Risks Privacy and the Principles of Data
            Protection ........................................ 284
     12.3.2 The OECD Guidelines and Knowledge Discovery ....... 286
     12.3.3 Knowledge Discovery about Groups .................. 288
     12.3.4 Legal Systems and other Guidelines ................ 289
12.4 Summary .................................................. 290
12.5 Review Questions ......................................... 291

13   Active Data Mining ....................................... 293
13.1 Shape Definitions ........................................ 295
13.2 Queries .................................................. 297
13.3 Triggers ................................................. 299
     13.3.1 Wave Execution Semantics .......................... 300
13.4 Summary .................................................. 302
13.5 Review Questions ......................................... 302

14   Decomposition in Data Mining - A Case Study .............. 303
14.1 Decomposition in the Literature .......................... 304
     14.1.1 Machine Learning .................................. 304
14.2 Typology of Decomposition in Data Mining ................. 305
14.3 Hybrid Models ............................................ 306
14.4 Knowledge Structuring .................................... 309
14.5 Rule-Structuring Model ................................... 310
14.6 Decision Tables, Maps, and Atlases ....................... 311
14.7 Summary .................................................. 312
14.8 Review Questions ......................................... 313

15   Data Mining System Products and Research Prototypes ...... 315
15.1 How to Choose a Data Mining System ....................... 316
15.2 Examples of Commercial Data Mining Systems ............... 318
15.3 Summary .................................................. 319
15.4 Review Questions ......................................... 320

16   Data Mining in Customer Value and Customer Relationship
     Management ............................................... 321
16.1 Data Mining: A Concept of Customer Relationship
     Marketing ................................................ 322
     16.1.1 Traditional Marketing Research .................... 322
     16.1.2 Relationship Marketing - the Modern View .......... 323
     16.1.3 Understanding the Background of Data Mining ....... 324
     16.1.4 Continuous Relationship Marketing ................. 326
     16.1.5 Developing the Data Mining Project ................ 327
     16.1.6 Further Research .................................. 328
16.2 Introduction to Customer Acquisition ..................... 328
     16.2.1 How Data Mining and Statistical Modeling Change
            Things ............................................ 329
     16.2.2 Defining Some Key Acquisition Concepts ............ 329
     16.2.3 It all Begins with the Data ....................... 331
     16.2.4 Test Campaigns .................................... 332
     16.2.5 Evaluating Test Campaign Responses ................ 333
     16.2.6 Building Data Mining Models Using Response
            Behaviors ......................................... 333
16.3 Customer Relationship Management (CRM) ................... 335
     16.3.1 Defining CRM ...................................... 335
     16.3.2 Integrating Customer Data into CRM Strategy ....... 335
     16.3.3 Strategic Data Analysis for CRM ................... 335
     16.3.4 Data Warehousing and Data Mining .................. 337
     16.3.5 Sharing Customer Data Within the Value Chain ...... 338
     16.3.6 CVM - Customer Value Management ................... 339
     16.3.7 Issues in Global Customer Management .............. 340
     16.3.8 Changing Systems .................................. 341
     16.3.9 Changing Customer Management - A Strategic View ... 342
16.4 Data Mining and Customer Value and Relationships ......... 348
     16.4.1 What is Data Mining? .............................. 349
     16.4.2 Relevance to a Business Process ................... 351
     16.4.3 Data Mining and Customer Relationship
            Management ........................................ 352
     16.4.4 How Data Mining Helps Database Marketing .......... 353
16.5 CRM: Technologies and Applications ....................... 356
     16.5.1 What is CRM ....................................... 357
     16.5.2 What is CRM Used for? ............................. 357
     16.5.3 Consequences of Implementation of CRM ............. 359
     16.5.4 Which Technologies are Used in CRM? ............... 360
     16.5.5 Business Rules .................................... 360
     16.5.6 Data Warehousing .................................. 360
     16.5.7 Data Mining ....................................... 361
     16.5.8 Real-Time Information Analysis .................... 362
     16.5.9 Reporting ......................................... 363
     16.5.10 Web Self-Service ................................. 363
     16.5.11 Market Overview .................................. 364
     16.5.12 Connection between ERP and CRM ................... 365
     16.5.13 Benefits of CRM to the Enterprise ................ 367
     16.5.14 Future of CRM .................................... 367
16.6 Data Management in Analytical Customer Relationship
     Management ............................................... 369
     16.6.1 The CRM Process Model ............................. 370
     16.6.2 Data Sources for Analytical CRM ................... 374
     16.6.3 Data Integration in Analytical CRM ................ 376
     16.6.4 Further Research .................................. 384
16.7 Summary .................................................. 385
16.8 Review Questions ......................................... 385

17   Data Mining in Business .................................. 387
17.1 Business Focus on Data Engineering ....................... 388
17.2 Data Mining for Business Problems ........................ 390
17.3 Data Mining and Business Intelligence .................... 396
17.4 Data Mining in Business - Case Studies ................... 399

18   Data Mining in Sales Marketing and Finance ............... 411
18.1 Data Mining can Bring Pinpoint Accuracy to Sales ......... 413
18.2 From Data Mining to Database Marketing ................... 414
     18.2.1 Data Mining vs. Database Marketing ................ 414
     18.2.2 What Exactly is Data Mining? ...................... 415
     18.2.3 Who is Developing the Technology? ................. 416
     18.2.4 Turning Business Problems into Business
            Solutions ......................................... 417
     18.2.5 A Possible Scenario for the Future of Data
            Mining ............................................ 419
18.3 Data Mining for Marketing Decisions ...................... 419
     18.3.1 Agent-Based Information Retrieval Systems ......... 421
     18.3.2 Applications of Data Mining in Marketing .......... 424
18.4 Increasing Customer Value by Integrating Data Mining ..... 425
     18.4.1 Some Definitions .................................. 425
     18.4.2 Data Mining Defined ............................... 426
     18.4.3 The Purpose of Data Mining ........................ 427
     18.4.4 Scoring the Model ................................. 427
     18.4.5 The Role of Campaign Management Software .......... 427
     18.4.6 The Integrated Data Mining and Campaign
            Management Process ................................ 429
     18.4.7 Data Mining and Campaign Management in the
            Real World ........................................ 430
     18.4.8 The Benefits of Integrating Data Mining and
            Campaign Management ............................... 431
18.5 Completing a Solution for Market-Basket Analysis - Case
     Study .................................................... 431
     18.5.1 Business Problem .................................. 432
     18.5.2 Case Studies ...................................... 432
     18.5.3 Data Mining Solutions ............................. 433
     18.5.4 Recommendations ................................... 434
18.6 Data Mining in Finance ................................... 435
18.7 Data Mining for Financial Data Analysis .................. 436
18.8 Summary .................................................. 437
18.9 Review Questions ......................................... 438

19   Banking and Commercial Applications ...................... 439
19.1 Bringing Data Mining to the Forefront of Business
     Intelligence ............................................. 441
19.2 Distributed Data Mining Through a Centralized Solution ... 441
     A Case Study ............................................. 442
     19.2.1 Background ........................................ 442
19.3 Data Mining in Commercial Applications ................... 444
     19.3.1 Data Cleaning and Data Preparation ................ 444
     19.3.2 Involving Business Users in the KDD Process ....... 445
     19.3.3 Business Challenges for the KDD Process ........... 446
19.4 Decision Support Systems - Case Study .................... 446
     19.4.1 A Functional Perspective .......................... 447
     19.4.2 Decisions ......................................... 450
19.5 Keys to the Commercial Success of Data Mining - Case
     Studies .................................................. 452
     19.5.1 Case Study 1: Commercial Success Criteria ......... 452
     19.5.2 Case Study 2: A Service Provider's View ........... 454
19.6 Data Mining Supports E-Commerce .......................... 458
     19.6.1 Data Mining Application Possibilities in Web
            Stores ............................................ 459
19.7 Data Mining for the Retail Industry ...................... 462
19.8 Business Intelligence and Retailing ...................... 463
     19.8.1 Applications of Data Warehousing and Data
            Mining in the Retail Industry ..................... 463
     19.8.2 Key Trends in the Retail Industry ................. 464
     19.8.3 Business Intelligence Solutions for the Retail
            Industry .......................................... 465
19.9 Summary .................................................. 471
19.10 Review Questions ........................................ 472

20   Data Mining for Insurance ................................ 473
20.1 Insurance Underwriting ................................... 474
     20.1.1 Data Mining and Insurance: Improving the
            Underwriting Decision-Making Process .............. 475
     20.1.2 What does an Insurance Underwriter Do? ............ 479
     20.1.3 How is the Underwriting Function Changing? ........ 485
     20.1.4 How can Data Mining Help Underwriters Make
            Better Business Decisions ......................... 485
20.2 Business Intelligence and Insurance ...................... 487
     20.2.1 Insurance Industry Overview and Major Trends ...... 487
     20.2.2 Business Intelligence and the Insurance Value
            Chain ............................................. 488
     20.2.3 Customer Relationship Management .................. 489
     20.2.4 Channel Management ................................ 491
     20.2.5 Actuarial ......................................... 493
     20.2.6 Underwriting and Policy Management ................ 493
     20.2.7 Claims Management ................................. 494
     20.2.8 Finance and Asset Management ...................... 495
     20.2.9 Human Resources ................................... 496
     20.2.10 Corporate Management ............................. 497
20.3 Summary .................................................. 497
20.4 Review Questions ......................................... 498

21   Data Mining in Biomedicine and Science ................... 499
21.1 Applications in Medicine ................................. 501
     21.1.1 HealthCare ........................................ 501
     21.1.2 Data Mining in Clinical Domains ................... 501
     21.1.3 Data Mining In Medical Diagnosis Problem .......... 502
21.2 Data Mining for Biomedical and DNA Data Analysis ......... 502
     21.2.1 Semantic Integration of Heterogeneous,
            Distributed Genome Databases ...................... 503
     21.2.2 Similarity Search and Comparison Among DNA
            Sequences ......................................... 503
     21.2.3 Association Analysis: Identification of
            Co-occurring Gene Sequences ....................... 504
     21.2.4 Path Analysis: Linking Genes to Different Stages
            of Disease Development ............................ 504
     21.2.5 Visualization Tools and Genetic Data Analysis ..... 504
21.3 An Unsupervised Neural Network Approach .................. 504
     21.3.1 Knowledge Extraction Through Data Mining .......... 505
     21.3.2 Traditional Difficulties in Handling Medical
            Data .............................................. 505
     21.3.3 An Illustrative Case Study ........................ 506
     21.3.4 Organizing Medical Data ........................... 506
     21.3.5 Building the Neural Network Tool .................. 508
     21.3.6 Applying Data Mining and Data Visualization
            Techniques ........................................ 509
21.4 Data Mining - Assisted Decision Support for Fever
     Diagnosis - Case Study ................................... 515
     21.4.1 Architecture for Fever Diagnosis .................. 516
     21.4.2 Medical Data Definition Component ................. 516
     21.4.3 Physician-System Interface ........................ 517
     21.4.4 Diagnostic Question Banque ........................ 517
     21.4.5 Pattern Extractor ................................. 519
     21.4.6 Rule Constructor .................................. 519
21.5 Data Mining and Science .................................. 520
21.6 Knowledge Discovery in Science as Opposed to Business-
     Case Study ............................................... 522
     21.6.1 Why is Data Mining Different? ..................... 522
     21.6.2 The Data Management Context ....................... 522
     21.6.3 Business Data Analysis ............................ 523
     21.6.4 Scientific Data Analysis .......................... 523
     21.6.5 Scientific Applications ........................... 524
     21.6.6 Example of Predicting Air Quality ................. 524
21.7 Data Mining in a Scientific Environment .................. 529
     21.7.1 What is Data Mining? .............................. 529
     21.7.2 Traditional Uses of Data Mining ................... 531
     21.7.3 Data Mining in a Scientific Environment ........... 532
     21.7.4 Examples of Scientific Data Mining ................ 533
     21.7.5 Concluding Remarks ................................ 533
21.8 Flexible Earth Science Data Mining System Architecture ... 534
     21.8.1 Design Issues ..................................... 534
     21.8.2 ADaM System Features .............................. 535
     21.8.3 ADaM Plan Builder Client .......................... 540
     21.8.4 Research Directions ............................... 541
21.9 Summary .................................................. 542
21.10 Review Questions ........................................ 543

22   Text and Web Mining ...................................... 545
22.1 Data Mining and the Web .................................. 547
     22.1.1 Resource Discovery ................................ 548
     22.1.2 Information Extraction ............................ 548
     22.1.3 Generalization .................................... 548
22.2 An Overview on Web Mining ................................ 549
     22.2.1 Taxonomy of Web Mining ............................ 550
     22.2.2 Database Approach ................................. 550
     22.2.3 Web Mining Tasks .................................. 552
     22.2.4 Mining Interested Content from Web Document ....... 553
     22.2.5 Mining Pattern from Web Transactions/Logs ......... 554
     22.2.6 Web Access Pattern Tree (WAP tree) ................ 557
22.3 Text Mining .............................................. 558
     22.3.1 Definition ........................................ 558
     22.3.2 S&T Text Mining Applications ...................... 559
     22.3.3 Text Mining Tools ................................. 560
     22.3.4 Text Data Mining .................................. 561
22.4 Discovering Web Access Patterns and Trends ............... 563
     22.4.1 Design of a Web Log Miner ......................... 565
     22.4.2 Database Construction from server log Files ....... 567
     22.4.3 Multidimensional Web log data cube ................ 568
     22.4.4 Data mining on Web log data cube and Web log
            database .......................................... 569
22.5 Web Usage Mining on Proxy Servers: A Case Study .......... 572
     22.5.1 Aspects of Web Usage Mining ....................... 573
     22.5.2 Data Collection ................................... 573
     22.5.3 Preprocessing ..................................... 574
     22.5.4 Data Cleaning ..................................... 574
     22.5.5 User and Session Identification ................... 575
     22.5.6 Data Mining Techniques ............................ 575
     22.5.7 E-metrics ......................................... 577
     22.5.8 The Data .......................................... 579
22.6 Text Data Mining in Biomedical Literature ................ 581
     22.6.1 Information Retrieval Task - Retrieve Relevant
            Documents by Making use of Existing Database ...... 582
     22.6.2 Naive Bayes Classifier ............................ 582
     22.6.3 Experimental results of Information Retrieval
            task .............................................. 583
     22.6.4 Text Mining Task - Mining MEDLINE by Combining
            Term Extraction and Association Rule Mining ....... 583
     22.6.5 Finding the Relations Between MeSH Terms and
            Substances ........................................ 584
     22.6.6 Finding the Relations Between Other Terms ......... 584
22.7 Related Work ............................................. 585
     22.7.1 Future Work: For the Information Retrieval Task ... 586
     22.7.2 For the Text Mining Task .......................... 587
     22.7.3 Mutual Benefits between Two Tasks ................. 587
22.8 Summary .................................................. 588
22.9 Review Questions ......................................... 589

23   Data Mining in Information Analysis and Delivery ......... 591
23.1 Information Analysis: Overview ........................... 592
     23.1.1 Data Acquisition .................................. 592
     23.1.2 Extraction and Representation ..................... 593
     23.1.3 Information Analysis .............................. 593
23.2 Intelligent Information Delivery - Case Study ............ 595
     23.2.1 Alerts Run Rampant ................................ 595
     23.2.2 What an Intelligent Information Delivery System
            is ................................................ 596
     23.2.3 Simple Example of an Intelligent Information
            Delivery Mechanism ................................ 597
23.3 A Characterization of Data Mining Technologies and
     Processes - Case Study ................................... 599
     23.3.1 Data Mining Processes ............................. 600
     23.3.2 Data Mining Users and Activities .................. 601
     23.3.3 The Technology Tree ............................... 602
     23.3.4 Cross-Tabulation .................................. 609
     23.3.5 Neural Nets ....................................... 610
23.4 Summary .................................................. 612
23.5 Review Questions ......................................... 613

24   Data Mining in Telecommunications and Control ............ 615
24.1 Data Mining for the Telecommunication Industry ........... 616
     24.1.1 Multidimensional Analysis of Telecommunication
            Data .............................................. 617
     24.1.2 Fraudulent Pattern Analysis and the
            Identification of Unusual Patterns ................ 617
     24.1.3 Multidimensional Association and Sequential
            Pattern Analysis .................................. 617
     24.1.4 Use of Visualization Tools in Telecommunication
            Data Analysis ..................................... 618
24.2 Data Mining Focus Areas in Telecommunication ............. 618
     24.2.1 Systematic Error .................................. 618
     24.2.2 Data Mining in Churn Analysis ..................... 620
24.3 A Learning System for Decision Support in
     Telecommunications ....................................... 621
24.4 Knowledge Processing in Control Systems .................. 623
     24.4.1 Preliminaries and General Definitions ............. 624
24.5 Data Mining for Maintenance of Complex Systems - A Case
     Study .................................................... 626
24.6 Summary .................................................. 627
24.7 Review Questions ......................................... 627

25   Data Mining in Security .................................. 629
25.1 Data Mining in Security Systems .......................... 630
25.2 Real Time Data Mining-Based Intrusion Detection Systems
     - Case Study ............................................. 631
     25.2.1 Accuracy .......................................... 632
     25.2.2 Feature Extraction for IDS ........................ 633
     25.2.3 Artificial Anomaly Generation ..................... 634
     25.2.4 Combined Misuse and Anomaly Detection ............. 635
     25.2.5 Efficiency ........................................ 636
     25.2.6 Cost-Sensitive Modeling ........................... 637
     25.2.7 Distributed Feature Computation ................... 639
     25.2.8 System Architecture ............................... 643
25.3 Summary .................................................. 646

Data Mining Research Projects ................................. 649
A.1  National University of Singapore: Data Mining Research
     Projects ................................................. 649
     A.1.1  Cleaning Data for Warehousing and Mining .......... 649
     A.1.2  Data Mining in Multiple Databases ................. 650
     A.1.3  Intelligent WEB Document Management Using Data
            Mining Techniques ................................. 650
     A.1.4  Data Mining with Neural Networks .................. 650
     A.1.5  Data Mining in Semistructured Data ................ 651
     A.1.6  A Data Mining Application - Customer Retention
            in the Port of Singapore Authority (PSA) .......... 651
     A.1.7  A Belief-Based Approach to Data Mining ............ 651
     A.1.8  Discovering Interesting Knowledge in Database ..... 652
     A.1.9  Data Mining for Market Research ................... 652
     A.1.10 Data Mining in Electronic Commerce ................ 652
     A.1.11 Multidimensional Data Visualization Tool .......... 653
     A.1.12 Clustering Algorithms for Data Mining ............. 653
     A.1.13 Web Page Design for Electronic Commerce ........... 653
     A.1.14 Data Mining Application on Web Information
            Sources ........................................... 654
     A.1.15 Data Mining in Finance ............................ 654
     A.1.16 Document Summarization ............................ 654
     A.1.17 Data Mining and Intelligent Data Analysis ......... 655
A.2  HP Labs Research: Software Technology Laboratory ......... 658
     A.2.1  Data Mining Research .............................. 658
A.3  CRISP-DM: An Overview .................................... 661
     A.3.1  Moving from Technology to Business ................ 661
     A.3.2  Process Model ..................................... 662
A.4  Data Mining Suite™ ....................................... 663
     A.4.1  Rule-based Influence Discovery .................... 665
     A.4.2  Dimensional Affinity Discovery .................... 665
     A.4.3  The OLAP Discovery System ......................... 665
     A.4.4  Incremental Pattern Discovery ..................... 665
     A.4.5  Trend Discovery ................................... 666
     A.4.6  Forensic Discovery ................................ 666
     A.4.7  Predictive Modeler ................................ 666
A.5  The Quest Data Mining System, IBM Almaden Research
     Center, CA, USA .......................................... 669
     A.5.1  Introduction ...................................... 669
     A.5.2  Association Rules ................................. 670
     A.5.3  Apriori Algorithm ................................. 670
     A.5.4  Sequential Patterns ............................... 672
     A.5.5  Time-series Clustering ............................ 673
     A.5.6  Incremental Mining ................................ 675
     A.5.7  Parallelism ....................................... 676
     A.5.8  System Architecture ............................... 676
     A.5.9  Future Directions ................................. 676
A.6  The Australian National University Research Projects ..... 676
     A.6.1  Applications of Inductive Learning ................ 676
     A.6.2  Logic in Machine Learning ......................... 677
     A.6.3  Machine-learning Summer Research Projects
            in Data Mining and Reinforcement Learning ......... 678
     A.6.4  Computational Aspects of Data Mining
            (3 Projects) ...................................... 678
     A.6.5  Data Mining the MACHO Database .................... 679
     A.6.6  Artificial Stereophonic Processing ................ 680
     A.6.7  Real-time Active Vision ........................... 680
     A.6.8  Web Teleoperation of a Mobile Robot ............... 680
     A.6.9  Autonomous Submersible Robot ...................... 681
     A.6.10 The SIT Project ................................... 682
A.7  Data Mining Research Group, Monash University Australia .. 682
     A.7.1  Current Projects .................................. 682
     A.7.2  ADELFI - A Model for the Deployment of High-
            Performance Solutions on the Internet and
            Intranets ......................................... 683
A.8  Current Projects, University of Alabama in Huntsville,
     AL ....................................................... 688
     A.8.1  Direct Mailing System ............................. 688
     A.8.2  A Vibration Sensor ................................ 688
     A.8.3  Current Status .................................... 689
     A.8.4  Data Mining Using Classification .................. 689
     A.8.5  Email Classification, Mining ...................... 690
     A.8.6  Data-based Decision Making ........................ 690
     A.8.7  Data Mining in Relational Databases ............... 691
     A.8.8  Environmental Applications and Machine Learning ... 691
     A.8.9  Current Research Projects ......................... 692
     A.8.10 Web Mining ........................................ 693
     A.8.11 Neural Networks Applications to ATM Networks
            Control ........................................... 693
     A.8.12 Scientific Topics ................................. 694
     A.8.13 Application Areas ................................. 695
A.9  Kensington Approach Toward Enterprise Data Mining Group .. 696
     A.9.1  Distributed Database Support ...................... 696
     A.9.2  Distributed Object Management ..................... 696
     A.9.3  Groupware, Security, and Persistent Objects ....... 697
     A.9.4  Universal Clients - User-friendly Data Mining ..... 697
     A.9.5  High-Performance Server ........................... 697

Data Mining Standards ......................................... 699
II.1 Data Mining Standards .................................... 700
     II.1.1 Process Standards ................................. 700
     II.1.2 XML Standards/OR Model Defining Standards ........ 704
     II.1.3 Web Standards ..................................... 707
     II.1.4 Application Programming Interfaces (APIs) ......... 711
     II.1.5 Grid Services ..................................... 716
II.2 Developing Data Mining Application Using Data Mining
     Standards ................................................ 719
     II.2.1 Application Requirement Specification ............. 719
     II.2.2 Design and Deployment ............................. 720
II.3 Analysis ................................................. 722
II.4 Application Examples ..................................... 723
     II.4.1 PMML Example ...................................... 723
     II.4.2 XMLA Example ...................................... 724
     II.4.3 OLEDB ............................................. 725
     II.4.4 OLEDB-DM Example .................................. 726
     II.4.5 SQL/MM Example .................................... 728
     II.4.6 Java Data Mining Model Example .................... 728
     II.4.7 Web Services ...................................... 730
II.5 Conclusion ............................................... 730

Intelligent Miner ............................................. 731
3A.1 Data Mining Process ...................................... 731
     3A.1.1 Selecting the Input Data .......................... 732
     3A.1.2 Exploring the Data ................................ 732
     3A.1.3 Transforming the Data ............................. 732
     3A.1.4 Mining the Data ................................... 733
3A.2 Interpreting the Results ................................. 733
3A.3 Overview of the Intelligent Miner Components ............. 734
     3A.3.1 User interface .................................... 734
     3A.3.2 Environment Layer API ............................. 734
     3A.3.3 Visualizer ........................................ 734
     3A.3.4 Data Access ....................................... 734
3A.4 Running Intelligent Miner Servers ........................ 734
3A.5 How the Intelligent Miner Creates Output Data ............ 736
     3A.5.1 Partitioned Output Tables ......................... 736
     3A.5.2 How the Partitioning Key is Created ............... 737
3A.6 Performing Common Tasks .................................. 737
3A.7 Understanding Basic Concepts ............................. 738
     3A.7.1 Getting Familiar with the Intelligent Miner Main
            Window ............................................ 738
3A.8 Main Window Areas ........................................ 738
     3A.8.1 Mining Base Container ............................. 738
     3A.8.2 Contents Container ................................ 739
     3A.8.3 Work Area ......................................... 739
     3A.8.4 Creating and Using Mining Bases ................... 739
3A.9 Conclusion ............................................... 740

Clementine .................................................... 741
3B.1 Key Findings ............................................. 741
3B.2 Background Information ................................... 742
3B.3 Product Availability ..................................... 743
3B.4 Software Description ..................................... 744
3B.5 Architecture ............................................. 745
3B.6 Methodology .............................................. 746
     3B.6.1 Business Understanding ............................ 746
     3B.6.2 Data Understanding ................................ 748
     3B.6.3 Data Preparation .................................. 749
     3B.6.4 Modeling .......................................... 750
     3B.6.5 Evaluation ........................................ 752
     3B.6.6 Deployment ........................................ 753
3B.7 Clementine Server ........................................ 753
3B.8 How Clementine Server Improves Performance on Large
     Datasets ................................................. 754
     3B.8.1 Benchmark Testing Results: Data Processing ........ 755
     3B.8.2 Benchmark Testing Results: Modeling ............... 755
     3B.8.3 Benchmark Testing Results: Scoring ................ 757
3B.9 Conclusion ............................................... 758

Crisp ......................................................... 761
3C.1 Hierarchical Breakdown ................................... 761
3C.2 Mapping Generic Models to Specialized Models ............. 762
     3C.2.1 Data Mining Context ............................... 762
     3C.2.2 Mappings with Contexts ............................ 763
3C.3 The CRISP-DM Reference Model ............................. 763
     3C.3.1 Business Understanding ............................ 765
3C.4 Data Understanding ....................................... 769
     3C.4.1 Collect Initial Data .............................. 769
     3C.4.2 Output Initial Data Collection Report ............. 770
     3C.4.3 Describe Data ..................................... 770
     3C.4.4 Explore Data ...................................... 771
     3C.4.5 Output Data Exploration Report .................... 771
     3C.4.6 Verify Data Quality ............................... 771
3C.5 Data Preparation ......................................... 771
     3C.5.1 Select Data ....................................... 771
     3C.5.2 Clean Data ........................................ 772
     3C.5.3 Construct Data .................................... 773
     3C.5.4 Generated Records ................................. 773
     3C.5.5 Integrate Data .................................... 773
     3C.5.6 Output Merged Data ................................ 773
     3C.5.7 Format Data ....................................... 773
     3C.5.8 Reformatted Data .................................. 774
3C.6 Modeling ................................................. 774
     3C.6.1 Select Modeling Technique ......................... 774
     3C.6.2 Outputs Modeling Technique ........................ 774
     3C.6.3 Modeling Assumptions .............................. 774
     3C.6.4 Generate Test Design .............................. 774
     3C.6.5 Output Test Design ................................ 775
     3C.6.6 Build Model ....................................... 775
     3C.6.7 Outputs Parameter Settings ........................ 775
     3C.6.8 Assess Model ...................................... 776
     3C.6.9 Outputs Model Assessment .......................... 776
     3C.6.10 Revised Parameter Settings ....................... 776
3C.7 Evaluation ............................................... 776
     3C.7.1 Evaluate Results .................................. 776
3C.8 Conclusion ............................................... 777

MineSet ....................................................... 779
3D.1 Introduction ............................................. 779
3D.2 Architecture ............................................. 779
3D.3 MineSet Tools for Data Mining Tasks ...................... 780
3D.4 About the Raw Data ....................................... 781
3D.5 Analytical Algorithms .................................... 781
3D.6 Visualization ............................................ 782
3D.7 KDD Process Management ................................... 783
3D.8 History .................................................. 784
3D.9 Commercial Uses .......................................... 785
3D.10 Conclusion .............................................. 786

Enterprise Miner .............................................. 787
3E.1 Tools For Data Mining Process ............................ 787
3E.2 Why Enterprise Miner ..................................... 788
3E.3 Product Overview ......................................... 789
3E.4 SAS Enterprise Miner 5.2 Key Features .................... 790
     3E.4.1 Multiple Interfaces ............................... 790
     3E.4.2 Scalable Processing ............................... 791
     3E.4.3 Accessing data .................................... 791
     3E.4.4 Sampling .......................................... 791
     3E.4.5 Data Partitioning ................................. 792
     3E.4.6 Filtering Outliers ................................ 792
     3E.4.7 Transformations ................................... 792
     3E.4.8 Data Replacement .................................. 792
     3E.4.9 Descriptive Statistics ............................ 792
     3E.4.10 Graphs/Visualization ............................. 793
3E.5 Enterprise Miner Software ................................ 793
     3E.5.1 The Graphical User Interface ...................... 794
     3E.5.2 The GUI Components ................................ 794
3E.6 Enterprise Miner Process for Data Mining ................. 796
3E.7 Client/Server Capabilities ............................... 796
3E.8 Client/Server Requirements ............................... 796
3E.9 Conclusion ............................................... 797

References .................................................... 799


  © 1997–2024 Branch of GPNTB SB RAS

Document modified: Wed Feb 27 14:26:24 2019. Size: 55,805 bytes.