Building Scalable Web Sites
Building, scaling, and optimizing the next generation of web applications

First Edition June 2006
ISBN 978-0-596-10235-7
Pages 348
EUR38.00, SFR64.90


Index

A
active/active pairs, 209
active/passive redundant pairs, 209
AddSlashes( ) function, 112
alerting, 285-287
      low-watermark checks, 286
      resource-level monitoring, 286
      threshold checks, 286
      uptime checks, 285
Analog, 259
annotation logs, 31
Apache
      logfiles, 258
      statistics, 276
APIs, 89, 288
      abuse, 315-318
            caching, 317
            monitoring with API keys, 315
            throttling, 316
      API transports, 307-315
            REST, 307
            SOAP, 311-313
            XML-RPC, 308-310
      authentication, 318-321
            MAC (message authentication code), 320
            plain text, 319
            token-based systems, 320
      data feeds (see data feeds)
      mobile content, 300
            WAP (Wireless Application Protocol), 300
            XHTML mobile profile, 302
      UTF-8 and, 89
      web services, 304-307
application development
      coding standards, 64-66
      development environments (see development environments)
      source control and, 44
application monitoring, 267-287
      alerting (see alerting)
      collection of statistical data, 267
            Apache, 276
            bandwidth, 268
            custom visualizations, 283-285
            long-term statistics, 269
            memcached statistics, 278
            MySQL, 273-276
            Squid, 278-283
architecture, 3
      layered software architecture, 6-14
            layer functions, 13
      software interface design, 11-14
ARP (Address Resolution Protocol), 182
ASCII, UTF-8 compatibility, 79
Atom, 293
auth-param and auth-scheme, 143

B
BCP (business continuity planning), 22
BDB (Berkeley DB), 231
beacons, 259-262
BGP (Border Gateway Protocol), flapping, 209
binary safe, 82
blacklists, 101
blame logs, 31
BOM (byte order mark), 78
Bonnie and Bonnie++, 177
bottlenecks, 162
      CPU usage, 168
      databases, 188-201
            caching, 197-200
            denormalization, 200
            query and index optimization, 193-197
            query profiling, 191-192
            query spot checks, 189-191
      external services, 188
      general solutions, 174
      I/O (input/output), 175-184
            disk I/O, 175-179
            memory I/O, 184
            network I/O, 179-183
      identifying, 162
      memory and swap, 185-188
      templates, speeding up, 173
boundary testing, 68
branches, 32
bug tracking (see issue tracking)
Bugzilla, 58
business logic, 7
byte order mark (BOM), 78

C
caching, 253-255
      data, 254
      HTTP requests, 255
CADT (Cascade of Attention-Deficit Teenagers), 62
capacity planning, 207
carriage returns, filtering, 98
Carrier Sense Multiple Access with Collision Detection protocol, 182
Cascade of Attention-Deficit Teenagers (CADT), 62
character sets (see Unicode; UTF-8)
charset property, 80
clustering, 240
code examples, xv
code points, 76
code profiling, 170
code separation, 9
coding standards, 64-66
cold spares, 208
collisions, 182
co-location (colo), 18
concurrency, xi
Concurrent Versions System (CVS), 36-38
CPU usage, 168
cross-site scripting (see XSS)
CSMA/CD protocol, 182
cURL open source URL file transfer library, 144
CVS (Concurrent Versions System), 36-38

D
data feeds, 288-300
      Atom, 293
      feed authentication, 297-300
      feed auto-discovery, 294
      feed templating, 295-297
      OPML (outline processor markup language), 297
      RDF (Resource Description Framework), 292
      RSS, 289-291
data integrity, 90
      blacklists and whitelists, 101
      filtering control characters, 98
      filtering HTML, 99-102
      good, valid, and invalid, 92
      UTF-8 filtering, 93-98
            iconv extension, using, 96
            PHP state machine verifier, using, 96
            regular expressions, using, 94
data porn, 269
database partitioning, 240-244
database schema changes, 54
database stored procedures, 11
db_query( ) function, 189
degrading gracefully, 132
development environments, 27
      creating, 47
      source control (see source control)
      staging environments, 47
diffs, 29
disk I/O, 175-179
      benchmarking, 177
      Constant Linear Velocity (CLV), 177
      disk performance measures, 177
      read and write caches, 178
      Zoned Constant Angular Velocity (ZCAV), 177
document type definition (DTD), 154
documentation, 44
      recovery scenarios and, 208
DOM (Document Object Model), 154
DTD (document type definition), 154

E
email, 117
      character sets and encodings, 130
      injecting into web applications, 119
      MIME format, 121-125
      receiving, 117
      TNEF attachments, 125-127
      unit testing, 134
      user accounts, tying emails to, 132-134
      UTF-8, usage with, 85
      wireless carriers, received through, 127-130
encoded-words, 86
escape( ) function, 87
escape_utf8( ) function, 88
Expat, 155

F
federation, 242-244, 251-253
feed authentication, 297-300
feed auto-discovery, 294
feed templating, 295-297
filesystems, 246
fixed-width encoding, 75
flapping, 209
FogBugz, 57
fonts, 74
fsockopen( ) function, 137, 143

G
Ganglia, 270-273
gettext function, 72
global server load balancing (GSLB), 223
globalization, 70
glyphs, 73
good data, 92
graphemes, 76
GSLB (global server load balancing), 223

H
hardware components, 167
hardware platforms, 16-19
      co-located hardware, 18
      commodity hardware, 16
      dedicated hardware, 18
      growth, 19-22
            connectivity, 22
            importing, shipping, and staging, 20
            NOC facilities, 21
            power, 21
            space, 21
      self-hosting, 19
      shared hardware, 17
      software matching to, 15
      vendors, choosing, 20
hardware redundancy, 22
head, 32
header( ) function, 81
heap, 232
Hitbox, 262
horizontal scaling, 205-207
      administration costs, 206
      hardware utilization, 207
hot failover, 147
hot spares, 208
HTML
      allowing as input, security issues, 100
      filtering, 99-102
HtmlSpecialChars( ) function, 104
HTTP, 140-145
      authentication, 142
      Authorization header, 143
      cURL open source URL file transfer library, 144
      making requests, 143
      Perl LWP modules, 145
      request and response cycle, 140
      request and response header formats, 141
      response codes, 141

I
i18n, 70
InfiniBand, 212
InfiniteStorage, 251
interaction logic, 8
internationalization, 70
      early character sets, 70
      W3C portal, 73
      web applications, in, 70
            character set issues, 70
intval( ) function, 113
invalid data, 92
iostat utility, 175
ISO 10646, 76
issue tracking, 55-62
      bugs, 59
      CADT, 62
      features, 60
      issue management strategy, 60
      operations, 60
      support requests, 60

J
Java, UTF-8 usage in, 83
JavaScript, UTF-8 usage in, 87

K
Kickstart, 206

L
L10n, 70
layered architecture, 9
      separation, reasons for, 11
Lerdorf, Rasmus, 213
libcurl, 144
libxml, 155
ligatures, 76
Linux Virtual Server (LVS), 217
load balancing, 214-227
      GSLB, 223
      hardware, 215-216
            load balancer products, 216
      huge-scale balancing, 223-225
      layer 4, 218
      layer 7, 218-222
      non-HTTP traffic, 225-227
      software, 217
      web statistics tracking with load balancers, 264
locale, 71
localization, 70
      multiple frontends, 73
      multiple template sets, 72
      string substitution, 72
      web applications in, 71
logical components, 165
logs, 29
      analysis, 259
      rotating, 258
LVS (Linux Virtual Server), 217

M
MAC (message authentication code), 320
magic_quotes_gpc directive, 113
maint, 32
Mantis Bug Tracker, 57
markup, 9
markup layer, 8
Mason, 10
master-slave replication, 232
mbstring (multibyte string) extension, 83
memcached statistics, 278
memstat utility, 187
merge operations and conflicts, 30
merged config.php, 51
merging, 32
meta tag, 81
MIME format, 121-125
mirroring, 249
mod_log_spread, 264
modes, 249
modules, 28, 32
      module namespaces, 10
MRTG (Multi-Router Traffic Grapher), 268
MTBF (Mean Time Between Failures), 146
Multi-Router Traffic Grapher (MRTG), 268
multihoming, 183, 212
MVCC (Multi-Versioned Concurrency Control), 231
MyISAM, 230
MySQL, x, xiii
      log files, 274
      pre-compiled binaries, 17
      scaling, 227-244
            BDB, 231
            heap, 232
            InnoDB, 231
            MyISAM, 230
            replication (see MySQL replication)
            storage backends, 228
      statistics collection, 273-276
      string manipulation functions, 84
      UTF-8, usage with, 84
MySQL replication, 232-240
      master-master replication, 236-237
      master-slave replication, 232
      replication failure, 238
      replication lag, 239
      tree replication, 234-236
mysqli extension, 114

N
Nagios, 285
NAS (network attached storage), 251
.NET and UTF-8, 83
NetApp "filers", 251
network I/O, 179-183
      collisions, 182
      misconfiguration, 181
      netstat utility, 179
network operations center (NOC), 18
networking, 23-25
      OSI model, 23
      TCP/IP model, 23
NFS (network filesystem), 247
NOC (network operations center), 18, 21

O
OmniStor, 251
one step builds, 46-55
      build tools, 49-52
      live editing, 46
      non-automated production processes, 54
            database schema changes, 54
            software and hardware configuration changes, 55
      release management, 52
      release process, 49
      work environments, 47-49
opcode caching, 172
OPML (outline processor markup language), 297
optimization, premature, 163

P
page logic, 8
PDO, 116
Perforce, 40-41
Perl, xiii
      Mason, 10
      module namespaces, 10
      Template Toolkit, 10
      UTF-8 and, 83
Perlbal, 217
persistent storage, 6
personalization, 70
PHP, xiii
      Savant, 10
      Smarty, 10
      Unicode and iconv extension, 96
      UTF-8, usage with, 82
physical components, 165
Pound, 217
pre-built software, advantages of, 17
preg_replace( ) function, 98
premature optimization, 163
presentation layer, 9
projects, 28, 32
protocols for data storage, 247
pstree command, 186

R
RAID (Redundant Arrays of Independent Disks), 249
RCS (Revision Control System), 36
RDF (Resource Description Framework), 292
read caches, 178
real servers, 215
recovery scenarios, 208
Red Hat Kickstart, 206
Redundant Arrays of Independent Disks (RAID), 249
release management, 52
release process, 49
remote services, 136
      asynchronous systems, 149-152
            callbacks, 150
            tickets, 151
      bad user experiences, 149
      handling service failures, 136
      HTTP, 140-145
      lightweight protocols, 157-161
      memory usage, 158
      network speed, 158
      proprietary protocols, 160-161
      redundancy, 145-149
            hot failover, 147
            spare node calculation, 145
      sockets and, 137-140
      XML parsing speed, 159
      XML writing speed, 159
      XML, exchanging, 153-157
Report Magic, 259
repository, 28
Request Tracker, 58
request tracking (see issue tracking)
resource level monitoring, 286
REST protocol, 155, 307
Revision Control System (RCS), 36
rollback, 29
rotated daemon, 259
rotation of logfiles, 258
round robin databases, 268
RRDTool (round robin database tool), 270
RSS, 289-291

S
Samba, 248
SAN (storage area network), 251
Savant, 10
SAX (Simple API for XML), 154
scaling, 202
      data storage, 246-255
            caching (see caching)
            federation, 251-253
            filesystems, 246
            protocols, 247
            RAID, 249
            Samba, 248
      database partitioning, 240-244
            clustering, 240
            federation, 242-244
      hardware platforms, 204-211
            capacity planning, 207
            horizontal scaling, 205-207
            redundancy, 208-211
            vertical scaling, 204
      large database scaling, 244
            write-through cache layers, 245
      load balancing (see load balancing)
      matching cost of operation and computing power, 206
      MySQL, 227-244
            BDB, 231
            heap, 232
            InnoDB, 231
            MyISAM, 230
            replication (see MySQL replication)
            storage backends, 228
      networks, 211
      PHP, 212
      scalability criteria, 203
SCM (software configuration management), 28
security
      cross-site scripting (see XSS)
      HTML filtering, 99-102
            allowing HTML as input, 100
            balancing of tags, 101
            blacklists and whitelists, 101
      SQL injection attacks, 110-116
shared nothing architecture, 213
SHOW PROCESSLIST command, 273
SHOW SLAVE STATUS command, 273
SHOW STATUS command, 273
Simple Network Management Protocol (SNMP), 268
Smarty, 10
Smarty templates, 173
SNMP (Simple Network Management Protocol), 268
SOAP, 157, 311-313
sockets, 137-140
      failure checking, 137
software configuration management (SCM), 28
software interface design, 11-14
source code and development cycles, x
source control, 28-45
      application development and, 44
      branching, 32
      build tools, 45
      diffs, 29
      documentation, 44
      locking or reserved check outs, 31
      logs, 29
      merging, 32
      multiuser editing and merging, 29
            annotation or blame, 30
            merge conflicts, 30
      products, 36-43
            CVS, 36-38
            Perforce, 40-41
            RCS, 36
            Subversion, 39-40
            VSS, 42-43
      projects or modules, 32
      rollback, 29
      scaling of the development model, 63
      software configurations, 44
      tagging, 32
      utilities, 33-36
            commit databases, 35
            commit hooks, 35
            commit log mailing lists, 34
            commit log RSS feeds, 34
            shell and editor integration, 33
            web interfaces, 34
      versioning, 28
      what to leave out, 45
Spread toolkit, 263-264
SQL injection attacks, 110-116
Squid, 255, 278-283
sticky sessions, 215
storage, persistent, 6
stored procedures, 11
StorEdge, 251
String.charCodeAt( ) method, 87
striping, 249
Subversion, 39-40
surrogate pairs, 76
swap, 185
System Imager and System Configurator, 206

T
tagging, 32
Template Toolkit, 10
templating, 10, 12
      localization using, 72
      speeding up templates, 173
testing
      boundary testing, 68
      email functions, 134
thrashing, 187
threshold checks, 286
throttling, 316
TNEF attachments, 125-127
token-based authentication systems, 320
Trac, 59
tree replication, 234-236
trunk, 32

U
UBB (Ultimate Bulletin Board), x
undo history, 28
Unicode, 73-79
      ASCII and, 74
      code point formatting conventions, 75
      code points, 76
      encodings, 75
      graphemes, 76
      origins, 71
      Unicode categories, 77
      UTF-8 (see UTF-8)
Unicode Transformation Format 8 (see UTF-8)
uptime checks, 285
UTF-8, 79-89
      APIs and, 89
      ASCII compatibility, 79
      binary sorts, 80
      byte layout, 79, 93
      email, using with, 85
            encoded words, 86
      endian-ness and, 80
      filtering for data integrity, 93-98
      JavaScript, 87
      MySQL, 84
      Perl, Java, and .NET, 83
      PHP and, 82
            operations requiring Unicode support, 82
      web applications, 80
            input, 82
            output, 80
      XML documents and feeds, 81

V
valid data, 92
variable length encoding, 75
versioning, 28
vertical partitioning, 240
vertical scaling, 204
VIP (virtual IP), 215
virtual servers, 215
Visual SourceSafe (VSS), 42-43
VM (virtual memory) mode, 186
vmstat, 186
VSS (Visual SourceSafe), 42-43
vulnerability scanners, 104

W
WAE (Wireless Application Environment), 301
WAP (Wireless Application Protocol), 300
warm spares, 208
web applications, ix, 1-3
      APIs (see APIs)
      architecture, 3
            layered architecture, 6-14
      design, 3
      development, 4
      development and source control, 44
      enabling email for, 119
      hardware components, 167
      hardware, matching to, 15
      internationalization, 70
      languages, technologies, and DBMSes, choosing, 25
      localization, 71
      logical components, 165
      monitoring (see application monitoring)
      pre-built software, 17
      scaling, 14 (see also scaling)
      testing, 66
            manual testing, 67
            regression testing, 66
      testing environments, 47-48
Web Applications Scale of Stupidity, 13
web services, 304-307
web site localization, 71
web site usage tracking, 257
web statistics tracking, 257-266
      beacons, 259-262
      custom metrics, 265
      load balancers, 264
      Spread toolkit, 263-264
Webalizer, 259
web-based issue tracking tools, 56
WebSideStory, 262
Webtrends, 262
whitelists, 101
Wireless Application Environment (WAE), 301
wireless emails, 127-130
write caches, 178
write-through cache layers, 245

X
XML, 153
      CDATA segments, 153
      DTD (document type definition), 154
      exchanging over a connection, 153-157
            REST protocol, 155
            SOAP, 157
            XML-RPC, 156
      namespaces, 153
      parsing, 153-155
      parsing speed, 159
      writing speed, 159
      XML parsers, 154
XML-RPC, 156, 308-310
XSS (cross-site scripting), 102-110
      protocol filtering, 108
      tag and bracket balancing, 105
      user input holes, 105