NGINX + PHP5-FPM segfaults under high load
I have been dealing with this problem for a day, and it is driving me insane. Google results and searches here lead to dead ends. I hope you can work with me to provide a solution for myself and for future victims. Here we go.
I am running a popular website with over 3M page views a day. On average that is 34 page views per second, but more realistically, during peak hours, it gets over 300 page views per second. Think of these as requests.
I am running an Ubuntu 10.04 64-bit server with 2 E5620 CPUs, 12GB of RAM, and a Micron P300 6Gb/s SSD. During peak hours the CPU and memory load is average (20-30% CPU and half of the memory in use).
The software that powers this site is: nginx, MySQL, php5-fpm, php-apc, and memcached. OK, now for the meat of this post: here are the error logs. There are a bunch of these errors logged.
/var/log/php5-fpm
Jul 20 14:49:47.289895 [NOTICE] fpm is running, pid 29373
Jul 20 14:49:47.337092 [NOTICE] ready to handle connections
Jul 20 14:51:23.957504 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.
Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start
Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started
Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
Jul 20 14:51:41.897178 [NOTICE] [pool www] child 29598 started
Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start
Jul 20 14:51:41.903719 [NOTICE] [pool www] child 29600 started
Jul 20 14:51:41.907816 [WARNING] [pool www] child 29437 exited with code 1 after 114.601417 seconds from start
Jul 20 14:51:41.908253 [NOTICE] [pool www] child 29601 started
Jul 20 14:51:41.916002 [WARNING] [pool www] child 29513 exited with code 1 after 114.592514 seconds from start
Jul 20 14:51:41.916501 [NOTICE] [pool www] child 29602 started
Jul 20 14:51:41.916558 [WARNING] [pool www] child 29494 exited on signal 11 SIGSEGV after 114.597355 seconds from start
Jul 20 14:51:41.916873 [NOTICE] [pool www] child 29603 started
Jul 20 14:51:41.921389 [WARNING] [pool www] child 29502 exited with code 1 after 114.600405 seconds from start
/var/log/nginx/error.log
2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29578#0: *571695 readv() failed (104: Connection reset by peer) while reading upstream, client: 150.70.64.196, server: domain.com, request: "GET /page HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *571050 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.157.66, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *564892 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.161.214, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *456171 readv() failed (104: Connection reset by peer) while reading upstream, client: 93.223.33.135, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *471192 readv() failed (104: Connection reset by peer) while reading upstream, client: 74.90.33.142, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29580#0: *570132 readv() failed (104: Connection reset by peer) while reading upstream, client: 180.246.182.191, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
Finally, I want to point out that I did try disabling php-apc to see whether the bug was in the opcode cacher, but the segfaults still persisted. I also have php5-suhosin installed and disabled it too, but the errors still keep happening.
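To get a quick picture of how many children die from a segfault versus a plain exit code, the php5-fpm log can be tallied with a short script. This is only a rough sketch: the `tally_exits` name and the regular expression are my own assumptions, not anything that ships with PHP-FPM, and the sample lines are modeled on the log excerpt above.

```python
import re
from collections import Counter

# Matches "child <pid> exited with code <n>" and "child <pid> exited on signal <n>".
# The optional "with" and the case-insensitive flag are assumptions made to
# tolerate small differences in how the log text was copied.
EXIT_RE = re.compile(
    r"child (?P<pid>\d+) exited "
    r"(?:(?:with )?code (?P<code>\d+)|on signal (?P<signal>\d+))",
    re.IGNORECASE,
)


def tally_exits(lines):
    """Count child exits, grouped by exit code or signal number."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue  # not a child-exit line (e.g. "child 29597 started")
        if match.group("signal") is not None:
            counts["signal " + match.group("signal")] += 1
        else:
            counts["code " + match.group("code")] += 1
    return counts


# Sample lines modeled on the log excerpt in the question.
sample = [
    "Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start",
    "Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started",
    "Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start",
]

if __name__ == "__main__":
    for kind, count in sorted(tally_exits(sample).items()):
        print(kind, count)
```

Feeding it the real log file instead of `sample` (for example by passing an open file object to `tally_exits`) would show whether signal 11 exits dominate or whether most children simply exit with code 1.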
This issue happened to me too.
php5-fpm was segfaulting in its children. In my case, I had 0 bytes available on the hard disk. A quick log shredding stopped the segfaults.
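For anyone who wants to rule out a full disk quickly, here is a minimal sketch using Python's standard `shutil.disk_usage`. The `free_space_report` helper, the list of paths, and the 100 MB threshold are hypothetical choices of mine for illustration, so adjust them to your own setup.

```python
import os
import shutil


def free_space_report(path="/", min_free_bytes=100 * 1024 * 1024):
    """Return free space for the filesystem holding `path`.

    `min_free_bytes` is an arbitrary, hypothetical threshold (100 MB here).
    """
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "free_bytes": usage.free,
        "low": usage.free < min_free_bytes,
    }


if __name__ == "__main__":
    # Paths are assumptions: the root filesystem and the usual log directory.
    for path in ("/", "/var/log"):
        if not os.path.exists(path):
            continue
        report = free_space_report(path)
        status = "LOW" if report["low"] else "ok"
        print(f"{report['path']}: {report['free_bytes']} bytes free ({status})")
```

If this reports 0 bytes free on the filesystem that holds the logs or session files, freeing space there is worth trying before digging into PHP itself.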