Preface ...................................................... xvii
Acknowledgments ............................................... xix
Trademark Information ......................................... xxi
Chapter 1 Introduction ......................................... 1
1.1 Introduction ............................................... 1
1.2 What We Are Talking About .................................. 1
1.3 The Concise Summary ........................................ 3
1.4 Some Initial Thoughts ...................................... 3
References ...................................................... 8
Chapter 2 Basic Concepts of Expert Systems ..................... 9
2.1 What Are Expert Systems? ................................... 9
2.2 The Conceptual Design of an Expert System ................. 10
2.3 Knowledge and Knowledge Representation .................... 12
2.3.1 Rules .............................................. 12
2.3.2 Semantic Networks .................................. 14
2.3.3 Frames ............................................. 16
2.3.4 Advantages of Rules ................................ 18
2.3.4.1 Declarative Language ....................... 18
2.3.4.2 Separation of Business Logic and Data ...... 18
2.3.4.3 Centralized Knowledge Base ................. 18
2.3.4.4 Performance and Scalability ................ 19
2.3.5 When to Use Rules .................................. 19
2.4 Reasoning ................................................. 20
2.4.1 The Inference Engine ............................... 20
2.4.2 Forward and Backward Chaining ...................... 22
2.4.3 Case-Based Reasoning ............................... 22
2.5 The Fuzzy World ........................................... 24
2.5.1 Certainty Factors .................................. 24
2.5.2 Fuzzy Logic ........................................ 25
2.5.3 Hidden Markov Models ............................... 26
2.5.4 Working with Probabilities - Bayesian Networks ..... 27
2.5.5 Dempster-Shafer Theory of Evidence ................. 28
2.6 Gathering Knowledge - Knowledge Engineering ............... 29
2.7 Concise Summary ........................................... 31
References ..................................................... 32
Chapter 3 Development Tools for Expert Systems ................ 35
3.1 Introduction .............................................. 35
3.2 The Technical Design of Expert Systems .................... 35
3.2.1 Knowledge Base ..................................... 35
3.2.2 Working Memory ..................................... 35
3.2.3 Inference Engine ................................... 36
3.2.4 User Interface ..................................... 36
3.3 Imperative versus Declarative Programming ................. 37
3.4 List Processing (LISP) .................................... 40
3.5 Programming Logic (PROLOG) ................................ 41
3.5.1 PROLOG Facts ....................................... 41
3.5.2 PROLOG Rules ....................................... 42
3.6 National Aeronautics and Space Administration's (NASA's)
Alternative - C Language Integrated Production System
(CLIPS) ................................................... 43
3.6.1 CLIPS Facts ........................................ 44
3.6.2 CLIPS Rules ........................................ 45
3.7 Java-Based Expert Systems - JESS .......................... 47
3.8 Rule Engines - JBoss Rules ................................ 48
3.9 Languages for Knowledge Representation .................... 49
3.9.1 Classification of Individuals and Concepts
(CLASSIC) .......................................... 50
3.9.2 Knowledge Machine .................................. 51
3.10 Advanced Development Tools ................................ 53
3.10.1 XpertRule .......................................... 55
3.10.2 Rule Interpreter (RI) .............................. 56
3.11 Concise Summary ........................................... 57
References ..................................................... 58
Chapter 4 Dealing with Chemical Information ................... 61
4.1 Introduction .............................................. 61
4.2 Structure Representation .................................. 61
4.2.1 Connection Tables (CTs) ............................ 61
4.2.2 Connectivity Matrices .............................. 62
4.2.3 Linear Notations ................................... 63
4.2.4 Simplified Molecular Input Line Entry
Specification (SMILES) ............................. 63
4.2.5 SMILES Arbitrary Target Specification (SMARTS) ..... 64
4.3 Searching for Chemical Structures ......................... 64
4.3.1 Identity Search versus Substructure Search ......... 64
4.3.2 Isomorphism Algorithms ............................. 65
4.3.3 Prescreening ....................................... 66
4.3.4 Hash Coding ........................................ 66
4.3.5 Stereospecific Search .............................. 67
4.3.6 Tautomer Search .................................... 67
4.3.7 Specifying a Query Structure ....................... 68
4.4 Describing Molecules ...................................... 69
4.4.1 Basic Requirements for Molecular Descriptors ....... 70
4.4.1.1 Independency of Atom Labeling .............. 71
4.4.1.2 Rotational/Translational Invariance ........ 71
4.4.1.3 Unambiguous Algorithmically Computable
Definition ................................. 71
4.4.1.4 Range of Values ............................ 71
4.4.2 Desired Properties of Molecular Descriptors ........ 72
4.4.2.1 Reversible Encoding ........................ 73
4.4.3 Approaches for Molecular Descriptors ............... 73
4.4.4 Constitutional Descriptors ......................... 73
4.4.5 Topological Descriptors ............................ 74
4.4.6 Topological Autocorrelation Vectors ................ 74
4.4.7 Fragment-Based Coding .............................. 75
4.4.8 3D Molecular Descriptors ........................... 76
4.4.9 3D Molecular Representation Based on Electron
Diffraction ........................................ 77
4.4.10 Radial Distribution Functions ...................... 77
4.4.11 Finding the Appropriate Descriptor ................. 78
4.5 Descriptive Statistics .................................... 79
4.5.1 Basic Terms ........................................ 79
4.5.1.1 Standard Deviation (SD) .................... 79
4.5.1.2 Variance ................................... 79
4.5.1.3 Covariance ................................. 80
4.5.1.4 Covariance Matrix .......................... 80
4.5.1.5 Eigenvalues and Eigenvectors ............... 80
4.5.2 Measures of Similarity .............................. 81
4.5.3 Skewness and Kurtosis ............................... 83
4.5.4 Limitations of Regression ........................... 85
4.5.5 Conclusions for Investigations of Descriptors ....... 86
4.6 Capturing Relationships - Principal Components ............ 87
4.6.1 Principal Component Analysis (PCA) ................. 87
4.6.1.1 Centering the Data ......................... 89
4.6.1.2 Calculating the Covariance Matrix .......... 89
4.6.2 Singular Value Decomposition (SVD) ................. 91
4.6.3 Factor Analysis .................................... 94
4.7 Transforming Descriptors .................................. 95
4.7.1 Fourier Transform .................................. 95
4.7.2 Hadamard Transform ................................. 96
4.7.3 Wavelet Transform .................................. 96
4.7.4 Discrete Wavelet Transform ......................... 97
4.7.5 Daubechies Wavelets ................................ 98
4.7.6 The Fast Wavelet Transform ......................... 99
4.8 Learning from Nature - Artificial Neural Networks ........ 102
4.8.1 Artificial Neural Networks in a Nutshell .......... 103
4.8.2 Kohonen Neural Networks - The Classifiers ......... 105
4.8.3 Counterpropagation (CPG) Neural Networks -
The Predictors .................................... 107
4.8.4 The Tasks: Classification and Modeling ............ 109
4.9 Genetic Algorithms (GAs) ................................. 110
4.10 Concise Summary .......................................... 112
References .................................................... 115
Chapter 5 Applying Molecular Descriptors ..................... 119
5.1 Introduction ............................................. 119
5.2 Radial Distribution Functions (RDFs) ..................... 119
5.2.1 Radial Distribution Function ...................... 119
5.2.2 Smoothing and Resolution .......................... 120
5.2.3 Resolution and Probability ........................ 122
5.3 Making Things Comparable - Postprocessing of RDF
Descriptors .............................................. 123
5.3.1 Weighting ......................................... 123
5.3.2 Normalization ..................................... 124
5.3.3 Remark on Linear Scaling .......................... 124
5.4 Adding Properties - Property-Weighted Functions .......... 125
5.4.1 Static Atomic Properties .......................... 125
5.4.2 Dynamic Atomic Properties ......................... 126
5.4.3 Property Products versus Averaged Properties ...... 126
5.5 Describing Patterns ...................................... 128
5.5.1 Distance Patterns ................................. 129
5.5.2 Frequency Patterns ................................ 129
5.5.3 Binary Patterns ................................... 130
5.5.4 Aromatic Patterns ................................. 130
5.5.5 Pattern Repetition ................................ 130
5.5.6 Symmetry Effects .................................. 130
5.5.7 Pattern Matching with Binary Patterns ............. 131
5.6 From the View of an Atom - Local and Restricted RDF
Descriptors .............................................. 131
5.6.1 Local RDF Descriptors ............................. 132
5.6.2 Atom-Specific RDF Descriptors ..................... 132
5.7 Straight or Detour - Distance Function Types ............. 133
5.7.1 Cartesian RDF ..................................... 133
5.7.2 Bond-Path RDF ..................................... 133
5.7.3 Topological Path RDF .............................. 134
5.8 Constitution and Conformation ............................ 135
5.9 Constitution and Molecular Descriptors ................... 136
5.10 Constitution and Local Descriptors ....................... 139
5.11 Constitution and Conformation in Statistical
Evaluations .............................................. 140
5.12 Extending the Dimension - Multidimensional Function
Types .................................................... 145
5.13 Emphasizing the Essential - Wavelet Transforms ........... 147
5.13.1 Single-Level Transforms ........................... 150
5.13.2 Wavelet-Compressed Descriptors .................... 151
5.14 A Tool for Generation and Evaluation of RDF
Descriptors - ARC ........................................ 151
5.14.1 Loading Structure Information ..................... 153
5.14.2 The Default Code Settings ......................... 153
5.14.3 Calculation and Investigation of a Single
Descriptor ........................................ 154
5.14.4 Calculation and Investigation of Multiple
Descriptor Sets ................................... 155
5.14.5 Binary Comparison ................................. 155
5.14.6 Correlation Matrices .............................. 155
5.14.7 Training a Neural Network ......................... 155
5.14.8 Investigation of Trained Network .................. 157
5.14.9 Prediction and Classification for a Test Set ...... 157
5.15 Synopsis ................................................. 157
5.15.1 Similarity and Diversity of Molecules ............. 162
5.15.2 Structure and Substructure Search ................. 162
5.15.3 Structure-Property Relationships .................. 162
5.15.4 Structure-Activity Relationships .................. 162
5.15.5 Structure-Spectrum Relationships .................. 162
5.16 Concise Summary .......................................... 163
References .................................................... 165
Chapter 6 Expert Systems in Fundamental Chemistry ............ 167
6.1 Introduction ............................................. 167
6.2 How It Began - The DENDRAL Project ....................... 167
6.2.1 The Generator - CONGEN ............................ 168
6.2.2 The Constructor - PLANNER ......................... 168
6.2.3 The Testing - PREDICTOR ........................... 169
6.2.4 Other DENDRAL Programs ............................ 171
6.3 A Forerunner in Medical Diagnostics ...................... 171
6.4 Early Approaches in Spectroscopy ......................... 175
6.4.1 Early Approaches in Vibrational Spectroscopy ...... 176
6.4.2 Artificial Neural Networks for Spectrum
Interpretation .................................... 177
6.5 Creating Missing Information - Infrared Spectrum
Simulation ............................................... 178
6.5.1 Spectrum Representation ........................... 178
6.5.2 Compression with Fast Fourier Transform ........... 179
6.5.3 Compression with Fast Hadamard Transform .......... 179
6.6 From the Spectrum to the Structure - Structure
Prediction ............................................... 179
6.6.1 The Database Approach ............................. 181
6.6.2 Selection of Training Data ........................ 181
6.6.3 Outline of the Method ............................. 182
6.6.3.1 Preprocessing of Spectrum Information ..... 182
6.6.3.2 Preprocessing of Structure Information .... 182
6.6.3.3 Generation of a Descriptor Database ....... 182
6.6.3.4 Training .................................. 182
6.6.3.5 Prediction of the Radial Distribution
Function (RDF) Descriptor ................. 183
6.6.3.6 Conversion of the RDF Descriptor .......... 184
6.6.4 Examples for Structure Derivation ................. 184
6.6.5 The Modeling Approach ............................. 187
6.6.6 Improvement of the Descriptor ..................... 188
6.6.7 Database Approach versus Modeling Approach ........ 189
6.7 From Structures to Properties ............................ 190
6.7.1 Searching for Similar Molecules in a Data Set ..... 191
6.7.2 Molecular Diversity of Data Sets .................. 193
6.7.2.1 Average Descriptor Approach ............... 194
6.7.2.2 Correlation Approach ...................... 194
6.7.3 Prediction of Molecular Polarizability ............ 199
6.8 Dealing with Localized Information - Nuclear Magnetic
Resonance (NMR) Spectroscopy ............................. 201
6.8.1 Commercially Available Products ................... 201
6.8.2 Local Descriptors for Nuclear Magnetic Resonance
Spectroscopy ...................................... 202
6.8.3 Selecting Descriptors by Evolution ................ 205
6.8.4 Learning Chemical Shifts .......................... 206
6.8.5 Predicting Chemical Shifts ........................ 207
6.9 Applications in Analytical Chemistry ..................... 208
6.9.1 Gamma Spectrum Analysis ........................... 208
6.9.2 Developing Analytical Methods - Thermal
Dissociation of Compounds ......................... 209
6.9.3 Eliminating the Unnecessary - Supporting
Calibration ....................................... 215
6.10 Simulating Biology ....................................... 217
6.10.1 Estimation of Biological Activity ................. 217
6.10.2 Radioligand Binding Experiments ................... 218
6.10.3 Effective and Inhibitory Concentrations ........... 219
6.10.4 Prediction of Effective Concentrations ............ 221
6.10.5 Progestagen Derivatives ........................... 221
6.10.6 Calcium Agonists .................................. 223
6.10.7 Corticosteroid-Binding Globulin (CBG) Steroids .... 224
6.10.8 Mapping a Molecular Surface ....................... 226
6.11 Supporting Organic Synthesis ............................. 229
6.11.1 Overview of Existing Systems ...................... 230
6.11.2 Elaboration of Reactions for Organic Synthesis .... 232
6.11.3 Kinetic Modeling in EROS .......................... 233
6.11.4 Rules in EROS ..................................... 233
6.11.5 Synthesis Planning - Workbench for
the Organization of Data for Chemical
Applications (WODCA) .............................. 234
6.12 Concise Summary .......................................... 236
References .................................................... 239
Chapter 7 Expert Systems in Other Areas of Chemistry ......... 247
7.1 Introduction ............................................. 247
7.2 Bioinformatics ........................................... 247
7.2.1 Molecular Genetics (MOLGEN) ....................... 248
7.2.2 Predicting Toxicology - Deductive Estimation of
Risk from Existing Knowledge (DEREK) for
Windows ........................................... 249
7.2.3 Predicting Metabolism - Meteor .................... 251
7.2.4 Estimating Biological Activity - APEX-3D .......... 251
7.2.5 Identifying Protein Structures .................... 254
7.3 Environmental Chemistry .................................. 257
7.3.1 Environmental Assessment - Green Chemistry
Expert System (GCES) .............................. 257
7.3.2 Synthetic Methodology Assessment for Reduction
Techniques ........................................ 258
7.3.3 Green Synthetic Reactions ......................... 259
7.3.4 Designing Safer Chemicals ......................... 260
7.3.5 Green Solvents/Reaction Conditions ................ 261
7.3.6 Green Chemistry References ........................ 261
7.3.7 Dynamic Emergency Management - Real-Time Expert
System (RTXPS) .................................... 262
7.3.8 Representing Facts - Descriptors .................. 262
7.3.9 Changing Facts - Backward-Chaining Rules .......... 263
7.3.10 Triggering Actions - Forward-Chaining Rules ....... 263
7.3.11 Reasoning - The Inference Engine .................. 264
7.3.12 A Combined Approach for Environmental
Management ........................................ 265
7.3.13 Assessing Environmental Impact - EIAxpert ......... 266
7.4 Geochemistry and Exploration ............................. 267
7.4.1 Exploration ....................................... 267
7.4.2 Geochemistry ...................................... 268
7.4.3 X-Ray Phase Analysis .............................. 268
7.5 Engineering .............................................. 269
7.5.1 Monitoring of Space-Based Systems - Thermal
Expert System (TEXSYS) ............................ 269
7.5.2 Chemical Equilibrium of Complex Mixtures - CEA .... 270
7.6 Concise Summary .......................................... 271
References .................................................... 274
Chapter 8 Expert Systems in the Laboratory Environment ....... 277
8.1 Introduction ............................................. 277
8.2 Regulations .............................................. 277
8.2.1 Good Laboratory Practices ......................... 278
8.2.1.1 Resources, Organization, and Personnel .... 278
8.2.1.2 Rules, Protocols, and Written
Procedures ................................ 278
8.2.1.3 Characterization .......................... 278
8.2.1.4 Documentation ............................. 278
8.2.1.5 Quality Assurance ......................... 279
8.2.2 Good Automated Laboratory Practice (GALP) ......... 279
8.2.3 Electronic Records and Electronic Signatures
(21 CFR Part 11) .................................. 280
8.3 The Software Development Process ......................... 281
8.3.1 From the Requirements to the Implementation ....... 282
8.3.1.1 Analyzing the Requirements ................ 282
8.3.1.2 Specifying What Has to Be Done ............ 282
8.3.1.3 Defining the Software Architecture ........ 282
8.3.1.4 Programming ............................... 282
8.3.1.5 Testing the Outcome ....................... 283
8.3.1.6 Documenting the Software .................. 283
8.3.1.7 Supporting the User ....................... 283
8.3.1.8 Maintaining the Software .................. 283
8.3.2 The Life Cycle of Software ........................ 283
8.4 Knowledge Management ..................................... 287
8.4.1 General Considerations ............................ 287
8.4.2 The Role of a Knowledge Management System (KMS) ... 288
8.4.3 Architecture ...................................... 289
8.4.4 The Knowledge Quality Management Team ............. 290
8.5 Data Warehousing ......................................... 290
8.6 The Basis - Scientific Data Management Systems ........... 293
8.7 Managing Samples - Laboratory Information Management
Systems (LIMS) ........................................... 295
8.7.1 LIMS Characteristics .............................. 296
8.7.2 Why Use a LIMS? ................................... 297
8.7.3 Compliance and Quality Assurance (QA) ............. 297
8.7.4 The Basic LIMS .................................... 298
8.7.5 A Functional Model ................................ 298
8.7.5.1 Sample Tracking ........................... 298
8.7.5.2 Sample Analysis ........................... 299
8.7.5.3 Sample Organization ....................... 299
8.7.6 Planning System ................................... 299
8.7.7 The Controlling System ............................ 300
8.7.8 The Assurance System .............................. 300
8.7.9 What Else Can We Find in a LIMS? .................. 301
8.7.9.1 Automatic Test Programs ................... 301
8.7.9.2 Off-Line Client ........................... 301
8.7.9.3 Stability Management ...................... 301
8.7.9.4 Reference Substance Module ................ 302
8.7.9.5 Recipe Administration ..................... 302
8.8 Tracking Workflows - Workflow Management Systems ......... 302
8.8.1 Requirements ...................................... 303
8.8.2 The Lord of the Runs .............................. 303
8.8.3 Links and Logistics ............................... 304
8.8.4 Supervisor and Auditor ............................ 304
8.8.5 Interfacing ....................................... 305
8.9 Scientific Documentation - Electronic Laboratory
Notebooks (ELNs) ......................................... 305
8.9.1 The Electronic Scientific Document ................ 307
8.9.2 Scientific Document Templates ..................... 309
8.9.3 Reporting with ELNs ............................... 310
8.9.4 Optional Tools in ELNs ............................ 310
8.10 Scientific Workspaces .................................... 312
8.10.1 Scientific Workspace Managers ..................... 313
8.10.2 Navigation and Organization in a Scientific
Workspace ......................................... 315
8.10.3 Using Metadata Effectively ........................ 315
8.10.4 Working in Personal Mode .......................... 319
8.10.5 Differences of Electronic Scientific Documents .... 319
8.11 Interoperability and Interfacing ......................... 320
8.11.1 eXtensible Markup Language (XML) - Based
Technologies ...................................... 320
8.11.1.1 Simple Object Access Protocol (SOAP) ..... 321
8.11.1.2 Universal Description, Discovery, and
Integration (UDDI) ....................... 321
8.11.1.3 Web Services Description Language
(WSDL) ................................... 321
8.11.2 Component Object Model (COM) Technologies ......... 321
8.11.3 Connecting Instruments - Interface Port
Solutions ......................................... 322
8.11.4 Connecting Serial Devices ......................... 322
8.11.5 Developing Your Own Connectivity - Software
Development Kits (SDKs) ........................... 324
8.11.6 Capturing Data - Intelligent Agents ............... 325
8.11.7 The Inbox Concept ................................. 327
8.12 Access Rights and Administration ......................... 328
8.13 Electronic Signatures, Audit Trails, and IP Protection ... 329
8.13.1 Signature Workflow ................................ 329
8.13.2 Event Messaging ................................... 331
8.13.3 Audit Trails and IP Protection .................... 331
8.13.4 Hashing Data ...................................... 331
8.13.5 Public Key Cryptography ........................... 332
8.13.5.1 Secret Key Cryptography .................. 333
8.13.5.2 Public Key Cryptography .................. 333
8.14 Approaches for Search and Reuse of Data and
Information .............................................. 333
8.14.1 Searching for Standard Data ....................... 334
8.14.2 Searching with Data Cartridges .................... 334
8.14.3 Mining for Data ................................... 335
8.14.4 The Outline of a Data Mining Service for
Chemistry ......................................... 336
8.14.4.1 Search and Processing of Raw Data ........ 336
8.14.4.2 Calculation of Descriptors ............... 337
8.14.4.3 Analysis by Statistical Methods .......... 337
8.14.4.4 Analysis by Artificial Neural Networks ... 337
8.14.4.5 Optimization by Genetic Algorithms ....... 338
8.14.4.6 Data Storage ............................. 338
8.14.4.7 Expert Systems ........................... 338
8.15 A Bioinformatics LIMS Approach ........................... 338
8.15.1 Managing Biotransformation Data ................... 339
8.15.2 Describing Pathways ............................... 340
8.15.3 Comparing Pathways ................................ 342
8.15.4 Visualizing Biotransformation Studies ............. 343
8.15.5 Storage of Biotransformation Data ................. 344
8.16 Handling Process Deviations .............................. 344
8.16.1 Covered Business Processes ........................ 345
8.16.2 Exception Recording ............................... 346
8.16.2.1 Basic Information Entry .................. 346
8.16.2.2 Risk Assessment .......................... 346
8.16.2.3 Cause Analysis ........................... 347
8.16.2.4 Corrective Actions ....................... 347
8.16.2.5 Efficiency Checks ........................ 348
8.16.3 Complaints Management ............................. 348
8.16.4 Approaches for Expert Systems ..................... 349
8.17 Rule-Based Verification of User Input .................... 350
8.17.1 Creating User Dialogues ........................... 350
8.17.2 User Interface Designer (UID) ..................... 351
8.17.3 The Final Step - Rule Generation .................. 354
8.18 Concise Summary .......................................... 354
References .................................................... 358
Chapter 9 Outlook ............................................ 361
9.1 Introduction ............................................. 361
9.2 Attempting a Definition .................................. 361
9.3 Some Critical Considerations ............................. 362
9.3.1 The Comprehension Factor .......................... 363
9.3.2 The Resistance Factor ............................. 363
9.3.3 The Educational Factor ............................ 363
9.3.4 The Usability Factor .............................. 364
9.3.5 The Commercial Factor ............................. 365
9.4 Looking Forward .......................................... 365
Reference ..................................................... 366
Index ......................................................... 367