Programming Collective Intelligence
Building Smart Web 2.0 Applications

First Edition September 2007
ISBN 978-0-596-52932-1
360 pages
EUR32.00, SFR54.90



Index

	

A
advancedclassify.py
      dotproduct function, 203
      dpclassify function, 205
      getlocation function, 207, 208
      getoffset function, 213
      lineartrain function, 202
      loadnumerical function, 209
      matchcount function, 206
      matchrow class
            loadmatch function, 198
      milesdistance function, 207, 208
      nonlinearclassify function, 213
      rbf function, 213
      scaledata function, 210
      scaleinput function, 210
      yesno function, 206
agesonly.csv file, 198
Akismet, xvii, 138
akismettest.py, 138
algorithms, 4
      CART (see CART)
      collaborative filtering, 8
      feature-extraction, 228
      genetic (see genetic algorithms)
      hierarchical clustering, 35
      Item-based Collaborative Filtering Recommendation Algorithms, 27
      mass-and-spring, 111
      matrix math, 237
      other uses for learning, 5
      PageRank (see PageRank algorithm)
      stemming, 61
      summary, 277-306
            Bayesian classifier, 277-281
Amazon, 5, 53
      recommendation engines, 7
annealing
      defined, 95
      simulated, 95-96
articlewords dictionary, 231
artificial intelligence (AI), 3
artificial neural network (see neural network, artificial)
Atom feeds
      counting words in, 31-33
      parsing, 309
Audioscrobbler, 28

B
backpropagation, 80-82, 287
Bayes' Theorem, 125
Bayesian classification, 231
Bayesian classifier, 140, 277-281
      classifying, 279
      combinations of features, 280
      naïve, 279
      strengths and weaknesses, 280
      support-vector machines (SVMs), 225
      training, 278
Beautiful Soup, 45, 310
      crawler, 57
      installation, 311
      usage example, 311
bell curve, 174
best-fit line, 12
biotechnology, 5
black box method, 288
blogs
      clustering based on word frequencies, 30
      feeds
            counting words, 31-33
            filtering, 134-136
            (see also Atom feeds; RSS feeds)
Boolean operations, 84
breeding, 97, 251, 263

C
CART (Classification and Regression Trees), 145-146
categorical features
      determining distances using Yahoo! Maps, 207
      lists of interests, 206
      yes/no questions, 206
centroids, 298
chi-squared distribution, 130
classifiers
      basic linear, 202-205
      Bayesian (see Bayesian classifier)
      decision tree, 199-201
      decision tree (see decision tree classifier)
      naïve Bayesian (see naïve Bayesian classifier)
      neural network, 141
      persisting trained, 132-133
            SQLite, 132-133
      supervised, 226
      training, 119-121
classifying
      Bayesian classifier, 279
      documents, 118-119
            training classifiers, 119-121
click-training network, 74
closing price, 243
clustering, 29, 226, 232
      column, 40-42
      common uses, 29
      hierarchical (see hierarchical clustering)
      K-means, 248
      K-means clustering (see K-means clustering)
      word vectors (see word vectors)
clusters of preferences, 44-47
      Beautiful Soup, 45
      clustering results, 47
      defining distance metric, 47
      getting and preparing data, 45
      scraping Zebo results, 45
      Zebo, 44
clusters.py, 38
      bicluster class, 35
      draw2d function, 51
      drawdendrogram function, 39
      drawnode function, 39
      getheight function, 38
      hcluster function, 36
      printclust function, 37
      readfile function, 34
      rotatematrix function, 40
      scaledown function, 50
cocktail party problem, 226
collaborative filtering, 7
      algorithm, 8
      term first used, 8
collective intelligence
      defined, 2
      introduction, 1-6
column clustering, 40-42
conditional probability, 122, 319
      Bayes' Theorem, 125
content-based ranking, 64-69
      document location, 65
      normalization, 66
      word distance, 65, 68
      word frequency, 64, 66
converting longitudes and latitudes of two points into distance in miles, 208
cost function, 89-91, 109, 304
      global minimum, 305
      local minima, 305
crawler, 56-58
      Beautiful Soup API, 57
      code, 57-58
      urllib2, 56
crawling, 54
crossover, 97, 251, 263
cross-validation, 176-178, 294
      leave-one-out, 196
      squaring numbers, 177
      test sets, 176
      training sets, 176
cross-validation function, 219
cumulative probability, 185

D
data clustering (see clustering)
data matrix, 238
data, viewing in two dimensions, 49-52
dating sites, 5
decision boundary, 201
decision tree classifier, 199, 281-284
      interactions of variables, and, 284
      strengths and weaknesses, 284
      training, 281
decision tree modeling, 321
decision trees, 142-166
      best split, 147-148
      CART algorithm, 145-146
      classifying new observations, 153-154
      disadvantages of, 165
      displaying, 151-153
            graphical, 152-153
      early stopping, 165
      entropy, 148
      exercises, 165
      Gini impurity, 147
      introducing, 144-145
      missing data, 156-158, 166
      missing data ranges, 165
      modeling home prices, 158-161
            Zillow API, 159-161
      modeling hotness, 161-164
      multiway splits, 166
      numerical outcomes, 158
      predicting signups, 142-144
      pruning, 154-156
      real world, 155
      recursive tree binding, 149-151
      result probabilities, 165
      training, 145-146
      when to use, 164-165
del.icio.us, xvii, 314
      building link recommender, 19-22
            building dataset, 20
            del.icio.us API, 20
            recommending neighbors and links, 22
deliciousrec.py
      fillItems function, 21
      initializeUserDict function, 20
dendrogram, 34
      drawing, 38-40
            drawnode function, 39
determining distances using Yahoo! Maps, 207
distance metric
      defining, 47
distance metrics, 29
distributions, uneven, 183-188
diversity, 268
docclass.py
      classifier class
            catcount method, 133
            categories method, 133
            fcount method, 132
            incc method, 133
            incf method, 132
            setdb method, 132
            totalcount method, 133
      classifier class, 119, 136
            classify method, 127
            fisherclassifier method, 128
            fprob method, 121
            train method, 121
            weightedprob method, 123
      fisherclassifier class
            classify method, 131
            fisherprob method, 129
            setminimum method, 131
      getwords function, 118
      naivebayes class, 124
            prob method, 125
      sampletrain function, 121
document filtering, 117-141
      Akismet, 138
      arbitrary phrase length, 140
      blog feeds, 134-136
      calculating probabilities, 121-123
            assumed probability, 122
            conditional probability, 122
      classifying documents, 118-119
            training classifiers, 119-121
      exercises, 140
      Fisher method, 127-131
            classifying items, 130
            combining probabilities, 129
            versus naïve Bayesian filter, 127
      improving feature detection, 136-138
      naïve Bayesian classifier, 123-127
            choosing category, 126
      naïve Bayesian filter
            versus Fisher method, 127
      neural network classifier, 141
      persisting trained classifiers, 132-133
            SQLite, 132-133
      Pr(Document), 140
      spam, 117
      varying assumed probabilities, 140
      virtual features, 141
document location, 65
      content-based ranking
            document location, 67
dorm.py, 106
      dormcost function, 109
      printsolution function, 108
dot-product, 322
      code, 322
dot-products, 203, 290
downloadzebodata.py, 45, 46

E
eBay, xvii
eBay API, 189-195, 196
      developer key, 189
      getting details for item, 193
      performing search, 191
      price predictor, building, 194
      Quick Start Guide, 189
      setting up connection, 190
ebaypredict.py
      doSearch function, 191
      getCategory function, 192
      getHeaders function, 190
      getItem function, 193
      getSingleValue function, 190
      makeLaptopDataset function, 194
      sendrequest function, 190, 191
elitism, 266
entropy, 148, 320
      code, 320
Euclidean distance, 203, 316
      code, 316
      k-nearest neighbors (kNN), 293
      score, 10-11
exact matches, 84

F
Facebook, 110
      building match dataset, 223
      creating session, 220
      developer key, 219
      downloading friend data, 222
      matching on, 219-224
      other Facebook predictions, 225
facebook.py
      arefriends function, 223
      createtoken function, 221
      fbsession class, 220
            getfriends function, 222
      getinfo method, 222
      getlogin function, 221
      getsession function, 221
      makedataset function, 223
      makehash function, 221
      sendrequest method, 220
factorize function, 238
feature extraction, 226-248
      news, 227-230
feature-extraction algorithm, 228
features, 277
features matrix, 234
feedfilter.py, 134
      entryfeatures method, 137
feedforward algorithm, 78-80
feedparser, 229
filtering
      documents (see document filtering)
      rule-based, 118
      spam
            threshold, 126
            tips, 126
financial fraud detection, 6
financial markets, 2
Fisher method, 127-131
      classifying items, 130
      combining probabilities, 129
      versus naïve Bayesian filter, 127
fitness function, 251
flight data, 116
flight searches, 101-106
full-text search engines (see search engines)
futures markets, 2

G
Gaussian function, 174, 321
      code, 321
Gaussian-weighted sum, 188
generatefeedvector.py, 31, 32
      getwords function, 31
generation, 97
genetic algorithms, 97-100, 306
      crossover or breeding, 97
      generation, 97
      mutation, 97
      population, 97
      versus genetic programming, 251
genetic optimization stopping criteria, 116
genetic programming, 99, 250-276
      breeding, 251
      building environment, 265-268
      creating initial population, 257
      crossover, 251
      data types, 274
            dictionaries, 274
            lists, 274
            objects, 274
            strings, 274
      diversity, 268
      elitism, 266
      exercises, 276
      fitness function, 251
      function types, 276
      further possibilities, 273-275
      hidden functions, 276
      measuring success, 260
      memory, 274
      mutating programs, 260-263
      mutation, 251
      nodes with datatypes, 276
      numerical functions, 273
      overview, 250
      parse tree, 253
      playing against real people, 272
      programs as trees, 253-257
      Python and, 253-257
      random crossover, 276
      replacement mutation, 276
      RoboCup, 252
      round-robin tournament, 270
      simple games, 268-273
            Grid War, 268
            playing against real people, 272
            round-robin tournament, 270
      stopping evolution, 276
      successes, 252
      testing solution, 259
      tic-tac-toe simulator, 276
      versus genetic algorithms, 251
Geocoding, 207
      API, 207
Gini impurity, 147, 319
      code, 320
global minimum, 94, 305
Goldberg, David, 8
Google, 1, 3, 5
      PageRank algorithm (see PageRank algorithm)
Google Blog Search, 134
gp.py, 254-258
      buildhiddenset function, 259
      constnode class, 254, 255
      crossover function, 263
      evolve function, 265, 268
      fwrapper class, 254, 255
      getrankfunction function, 267
      gridgame function, 269
      hiddenfunction function, 259
      humanplayer function, 272
      mutate function, 261
      node class, 254, 255
            display method, 256
            exampletree function, 255
            makerandomtree function, 257
      paramnode class, 254, 255
      rankfunction function
            breedingrate, 266
            mutationrate, 266
            popsize, 266
            probexp, 266
            probnew, 266
      scorefunction function, 260
      tournament function, 271
grade inflation, 12
Grid War, 268
      player, 276
group travel cost function, 116
group travel planning, 87-88
      car rental period, 89
      cost function (see cost function)
      departure time, 89
      price, 89
      time, 89
      waiting time, 89
GroupLens, 25
      web site, 27
groups, discovering, 29-53
      blog clustering, 53
      clusters of preferences (see clusters of preferences)
      column clustering (see column clustering)
      data clustering (see data clustering)
      exercises, 53
      hierarchical clustering (see hierarchical clustering)
      K-means clustering (see K-means clustering)
      Manhattan distance, 53
      multidimensional scaling (see multidimensional scaling)
      supervised versus unsupervised learning, 30

H
heterogeneous variables, 178-181
      scaling dimensions, 180
hierarchical clustering, 33-38, 297
      algorithm for, 35
      closeness, 35
      dendrogram, 34
      individual clusters, 35
      output listing, 37
      Pearson correlation, 35
      running, 37
hill climbing, 92-94
      random-restart, 94
Holland, John, 100
Hollywood Stock Exchange, 5
home prices, modeling, 158-161
      Zillow API, 159-161
Hot or Not, xvii, 161-164
hotornot.py
      getpeopledata function, 162
      getrandomratings function, 162
HTML documents, parser, 310
hyperbolic tangent (tanh) function, 78

I
inbound link searching, 85
inbound links, 69-73
      PageRank algorithm, 70-73
      simple count, 69
      using link text, 73
independent component analysis, 6
independent features, 226-249
      alternative display methods, 249
      exercises, 248
      K-means clustering, 248
      news sources, 248
      optimizing for factorization, 249
      stopping criteria, 249
indexing, 54
      adding to index, 61
      building index, 58-62
      finding words on page, 60
      setting up schema, 59
      tables, 59
intelligence, evolving, 250-276
inverse chi-square function, 130
inverse function, 172
IP addresses, 141
item-based bookmark filtering, 28
Item-based Collaborative Filtering Recommendation Algorithms, 27
item-based filtering, 22-25
      getting recommendations, 24-25
      item comparison dataset, 23-24
      versus user-based filtering, 27

J
Jaccard coefficient, 14

K
Kayak, xvii, 116
      API, 101, 106
            data, 102
            firstChild, 102
            getElementsByTagName, 102
kayak.py, 102
      createschedule function, 105
      flightsearch function, 103
      flightsearchresults function, 104
      getkayaksession( ) function, 103
kernel
      best kernel parameters, 225
kernel methods, 197-225
      understanding, 211
kernel trick, 212-214, 290
      radial-basis function, 213
kernels
      other LIBSVM, 225
K-means clustering, 42-44, 248, 297-300
      function for doing, 42
k-nearest neighbors (kNN), 169-172, 293-296
      cross-validating, 294
      defining similarity, 171
      Euclidean distance, 293
      number of neighbors, 169
      scaling and superfluous variables, 294
      strengths and weaknesses, 296
      weighted average, 293
      when to use, 195

L
Last.fm, 5
learning from clicks (see neural network, artificial)
LIBSVM
      applications, 216
      matchmaker dataset and, 218
      other LIBSVM kernels, 225
      sample session, 217
LIBSVM library, 291
line angle penalization, 116
linear classification, 202-205
      dot-products, 203
      vectors, 203
LinkedIn, 110
lists of interests, 206
local minima, 94, 305
longitudes and latitudes of two points into distance in miles, converting, 208

M
machine learning, 3
      limits, 4
machine vision, 6
machine-learning algorithms (see algorithms)
Manhattan distance, 14, 53
marketing, 6
mass-and-spring algorithm, 111
matchmaker dataset, 197-219
      categorical features, 205-209
      creating new, 209
      decision tree algorithm, 199-201
      difficulties with data, 199
      LIBSVM, applying to, 218
      scaling data, 209-210
matchmaker.csv file, 198
mathematical formulas, 316-322
      conditional probability, 319
      dot-product, 322
      entropy, 320
      Euclidean distance, 316
      Gaussian function, 321
      Gini impurity, 319
      Pearson correlation coefficient, 317
      Tanimoto coefficient, 318
      variance, 321
      weighted mean, 318
matplotlib, 185, 313
      installation, 313
      usage example, 314
matrix math, 232-243
      algorithm, 237
      data matrix, 238
      displaying results, 240, 246
      factorize function, 238
      factorizing, 234
      multiplication, 232
      multiplicative update rules, 238
      NumPy, 236
      preparing matrix, 245
      transposing, 234
matrix, converting to, 230
maximum-margin hyperplane, 215
message boards, 117
minidom, 102
minidom API, 159
models, 3
MovieLens, using dataset, 25-27
multidimensional scaling, 49-52, 53, 300-302
      code, 301
      function, 50
      Pearson correlation, 49
multilayer perceptron (MLP) network, 74, 285
multiplicative update rules, 238
mutation, 97, 251, 260-263

N
naïve Bayesian classifier, 123-127, 279
      choosing category, 126
      strengths and weaknesses, 280
      versus Fisher method, 127
national security, 6
nested dictionary, 8
Netflix, 1, 5
network visualization
      counting crossed lines, 112
      drawing networks, 113
      layout problem, 110-112
network visualization, 110-115
neural network, 55
      artificial, 74-84
            backpropagation, 80-82
            connecting to search engine, 83
            designing click-training network, 74
            feeding forward, 78-80
            setting up database, 75-77
            training test, 83
neural network classifier, 141
neural networks, 285-288
      backpropagation, and, 287
      black box method, 288
      combinations of words, and, 285
      multilayer perceptron network, 285
      strengths and weaknesses, 288
      synapses, and, 285
      training, 287
      using code, 287
news sources, 227-230
newsfeatures.py, 227
      getarticlewords function, 229
      makematrix function, 230
      separatewords function, 229
      shape function, 237
      showarticles function, 241, 242
      showfeatures function, 240, 242
      stripHTML function, 228
      transpose function, 236
nn.py
      searchnet class, 76
            generatehiddennode function, 77
            getstrength method, 76
            setstrength method, 76
nnmf.py
      difcost function, 237
non-negative matrix factorization (NMF), 232-239, 302-304
      factorization, 30
      goal of, 303
      update rules, 303
      using code, 304
normalization, 66
numerical predictions, 167
numpredict.py
      createcostfunction function, 182
      createhiddendataset function, 183
      crossvalidate function, 177, 182
      cumulativegraph function, 185
      distance function, 171
      dividedata function, 176
      euclidian function, 171
      gaussian function, 175
      getdistances function, 171
      inverseweight function, 173
      knnestimate function, 171
      probabilitygraph function, 187
      probguess function, 184, 185
      rescale function, 180
      subtractweight function, 173
      testalgorithm function, 177
      weightedknn function, 175
      wineprice function, 168
      wineset1 function, 168
      wineset2 function, 178
NumPy, 236, 312
      installation on other platforms, 313
      installation on Windows, 312
      usage example, 313
      using, 236

O
online technique, 296
Open Web APIs, xvi
optimization, 86-116, 181, 196, 304-306
      annealing starting points, 116
      cost function, 89-91, 304
      exercises, 116
      flight searches (see flight searches)
      genetic algorithms, 97-100
            crossover or breeding, 97
            generation, 97
            mutation, 97
            population, 97
      genetic optimization stopping criteria, 116
      group travel cost function, 116
      group travel planning, 87-88
            car rental period, 89
            cost function (see cost function)
            departure time, 89
            price, 89
            time, 89
            waiting time, 89
      hill climbing, 92-94
      line angle penalization, 116
      network visualization
            counting crossed lines, 112
            drawing networks, 113
            layout problem, 110-112
      network visualization, 110-115
      pairing students, 116
      preferences, 106-110
            cost function, 109
            running, 109
            student dorm, 106-108
      random searching, 91-92
      representing solutions, 88-89
      round-trip pricing, 116
      simulated annealing, 95-96
      where it may not work, 100
optimization.py, 87, 182
      annealingoptimize function, 95
      geneticoptimize function, 98
            elite, 99
            maxiter, 99
            mutprob, 99
            popsize, 99
      getminutes function, 88
      hillclimb function, 93
      printschedule function, 88
      randomoptimize function, 91
      schedulecost function, 90

P
PageRank algorithm, 5, 70-73
pairing students, 116
Pandora, 5
parse tree, 253
Pearson correlation
      hierarchical clustering, 35
      multidimensional scaling, 49
Pearson correlation coefficient, 11-14, 317
      code, 317
Pilgrim, Mark, 309
polynomial transformation, 290
poplib, 140
population, 97, 250, 306
      diversity and, 257
Porter Stemmer, 61
Pr(Document), 140
prediction markets, 5
price models, 167-196
      building sample dataset, 167-169
      eliminating variables, 196
      exercises, 196
      item types, 196
      k-nearest neighbors (kNN), 169
      laptop dataset, 196
      leave-one-out cross-validation, 196
      optimizing number of neighbors, 196
      search attributes, 196
      varying ss for graphing probability, 196
probabilities, 319
      assumed probability, 122
      Bayes' Theorem, 125
      combining, 129
      conditional probability, 122
      graphing, 186
      naïve Bayesian classifier (see naïve Bayesian classifier)
      of entire document given classification, 124
product marketing, 6
public message boards, 117
pydelicious, 314
      installation, 314
      usage example, 314
pysqlite, 58, 311
      importing, 132
      installation on other platforms, 311
      installation on Windows, 311
      usage example, 312
Python
      advantages of, xiv
      tips, xv
Python Imaging Library (PIL), 38, 309
      installation on other platforms, 310
      usage example, 310
      Windows installation, 310
Python, genetic programming and, 253-257
      building and evaluating trees, 255-256
      displaying program, 256
      representing trees, 254-255
      traversing complete tree, 253

Q
query layer, 74
querying, 63-64
      query function, 63

R
radial-basis function, 212
random searching, 91-92
random-restart hill climbing, 94
ranking
      content-based (see content-based ranking)
      queries, 55
recommendation engines, 7-28
      building del.icio.us link recommender, 19-22
            building dataset, 20
            del.icio.us API, 20
            recommending neighbors and links, 22
      collaborative filtering, 7
      collecting preferences, 8-9
            nested dictionary, 8
      exercises, 28
      finding similar users, 9-15
            Euclidean distance score, 10-11
            Pearson correlation coefficient, 11-14
            ranking critics, 14
            which metric to use, 14
      item-based filtering, 22-25
            getting recommendations, 24-25
            item comparison dataset, 23-24
      item-based filtering versus user-based filtering, 27
      matching products, 17-18
      recommending items, 15-17
            weighted scores, 15
      using MovieLens dataset, 25-27
recommendations based on purchase history, 5
recommendations.py, 8
      calculateSimilarItems function, 23
      getRecommendations function, 16
      getRecommendedItems function, 25
      loadMovieLens function, 26
      sim_distance function, 11
      sim_pearson function, 13
      topMatches function, 14
      transformPrefs function, 18
recursive tree binding, 149-151
returning ranked list of documents from query, 55
RoboCup, 252
round-robin tournament, 270
round-trip pricing, 116
RSS feeds
      counting words in, 31-33
      filtering, 134-136
      parsing, 309
rule-based filters, 118

S
scaling and superfluous variables, 294
scaling data, 209-210
scaling dimensions, 180
scaling, optimizing, 181-182
scoring metrics, 69-73
      PageRank algorithm, 70-73
      simple count, 69
      using link text, 73
search engines
      Boolean operations, 84
      content-based ranking (see content-based ranking)
      crawler (see crawler)
      document search, long/short, 84
      exact matches, 84
      exercises, 84
      inbound link searching, 85
      indexing (see indexing)
      overview, 54
      querying (see querying)
      scoring metrics (see scoring metrics)
      vertical, 101
      word frequency
            bias, 84
      word separation, 84
searchengine.py
      addtoindex function, 61
      crawler class, 55, 57, 59
      createindextables function, 59
      distancescore function, 68
      frequencyscore function, 66
      getentryid function, 61
      getmatchrows function, 63
      gettextonly function, 60
      import statements, 57
      importing neural network, 83
      inboundlinkscore function, 69
      isindexed function, 58, 62
      linktextscore function, 73
      normalization function, 66
      searcher class, 65
            nnscore function, 84
            query method, 83
      searchnet class
            backPropagate function, 81
            trainquery method, 82
            updatedatabase method, 82
      separatewords function, 60
searchindex.db, 60, 62
searching, random, 91-92
self-organizing maps, 30
sigmoid function, 78
signups, predicting, 142-144
simulated annealing, 95-96, 305
socialnetwork.py, 111
      crosscount function, 112
      drawnetwork function, 113
spam filtering, 117
      method, 4
      threshold, 126
      tips, 126
SpamBayes plug-in, 127
spidering, 56 (see crawler)
SQLite, 58
      embedded database interface, 311
      persisting trained classifiers, 132-133
      tables, 59
squaring numbers, 177
stemming algorithm, 61
stochastic optimization, 86
stock market analysis, 6
stock market data, 243-248
      closing price, 243
      displaying results, 246
      Google's trading volume, 248
      preparing matrix, 245
      running NMF, 246
      trading volume, 243
      Yahoo! Finance, 244
stockfeatures.txt file, 247
stockvolume.py, 245, 246
      factorize function, 246
student dorm preference, 106-108
subtraction function, 173
supervised classifiers, 226
supervised learning methods, 29, 277-296
supply chain optimization, 6
support vectors, 216
support-vector machines (SVMs), 197-225, 289-292
      Bayesian classifier, 225
      building model, 224
      dot-products, 290
      exercises, 225
      hierarchy of interests, 225
      kernel trick, 290
      LIBSVM, 291
      optimizing dividing line, 225
      other LIBSVM kernels, 225
      polynomial transformation, 290
      strengths and weaknesses, 292
synapses, 285

T
tagging similarity, 28
Tanimoto coefficient, 47, 318
      code, 319
Tanimoto similarity score, 28
temperature, 306
test sets, 176
third-party libraries, 309-315
      Beautiful Soup, 310
      matplotlib, 313
            installation, 313
            usage example, 314
      NumPy, 312
            installation on other platforms, 313
            installation on Windows, 312
            usage example, 313
      pydelicious, 314
            installation, 314
            usage example, 314
      pysqlite, 311
            installation on other platforms, 311
            installation on Windows, 311
            usage example, 312
      Python Imaging Library (PIL), 309
            installation on other platforms, 310
            usage example, 310
            Windows installation, 310
      Universal Feed Parser, 309
trading behavior, 5
trading volume, 243
training
      Bayesian classifier, 278
      decision tree classifier, 281
      neural networks, 287
      sets, 176
transposing, 234
tree binding, recursive, 149-151
treepredict.py, 144
      buildtree function, 149
      classify function, 153
      decisionnode class, 144
      divideset function, 145
      drawnode function, 153
      drawtree function, 152
      entropy function, 148
      mdclassify function, 157
      printtree function, 151
      prune function, 155
      split_function, 146
      uniquecounts function, 147
      variance function, 158
trees (see decision trees)

U
uneven distributions, 183-188
      graphing probabilities, 185
      probability density, estimating, 184
Universal Feed Parser, 31, 134, 309
unsupervised learning, 30
unsupervised learning techniques, 296-302
unsupervised techniques, 226
update rules, 303
urllib2, 56, 102
Usenet, 117
user-based collaborative filtering, 23
user-based efficiency, 28
user-based filtering
      versus item-based filtering, 27

V
variance, 321
      code, 321
varying assumed probabilities, 140
vector angles, calculating, 322
vectors, 203
vertical search engine, 101
virtual features, 141

W
weighted average, 175, 293
weighted mean, 318
      code, 318
weighted neighbors, 172-176
      bell curve, 174
      Gaussian function, 174
      inverse function, 172
      subtraction function, 173
      weighted kNN, 175
weighted scores, 15
weights matrix, 235
Wikipedia, 2, 56
word distance, 65, 68
word frequency, 64, 66
      bias, 84
word separation, 84
word usage patterns, 226
word vectors, 30-33
      clustering blogs based on word frequencies, 30
      counting words in feed, 31-33
wordlocation table, 63, 64
words commonly used together, 40

X
XML documents, parser, 310
xml.dom, 102

Y
Yahoo! application key, 207
Yahoo! Finance, 53, 244
Yahoo! Groups, 117
Yahoo! Maps, 207
yes/no questions, 206

Z
Zebo, 44
      scraping results, 45
      web site, 45
Zillow API, 159-161
zillow.py
      getaddressdata function, 159
      getpricelist function, 160

	
