Closed
BBS is hung with huge number of goroutines
Summary
Most of the time BBS performs very well, but occasionally the number of BBS goroutines increases from about 6K to 100K within 3 minutes (according to the firehose metric bbs.numGoRoutines), after which BBS stops responding to requests from cells.
Steps to Reproduce
This is an intermittent issue, and we have not been able to reproduce it on purpose.
Diego repo
BBS
Environment Details
diego/2.42.0 and stemcell ubuntu-xenial/621.64.1
Possible Causes or Fixes
The BBS goroutine dump contains many hung goroutines like the one in the trace below. If I understand this code correctly (https://github.com/cloudfoundry/lager/blob/master/writer_sink.go#L57), the sink has to acquire a lock to write every log line, so if that lock is held for a long time or is heavily contended, a very large number of goroutines can end up blocked waiting for it.
goroutine 276586954 [semacquire]:
sync.runtime_SemacquireMutex(0xc00028e31c, 0xc001734100, 0x1)
/var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00028e318)
/var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
/var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/sync/mutex.go:81
code.cloudfoundry.org/lager.(*prettySink).Log(0xc00028e300, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393530, 0x0, ...)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/writer_sink.go:57 +0x176
code.cloudfoundry.org/lager.(*truncatingSink).Log(0xc00028e320, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393530, 0x0, ...)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/truncating_sink.go:31 +0x1d0
code.cloudfoundry.org/lager.(*ReconfigurableSink).Log(0xc00028e340, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393500, 0x0, ...)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/reconfigurable_sink.go:28 +0x80
code.cloudfoundry.org/lager.(*logger).Info(0xc002948360, 0xe7d646, 0x8, 0x0, 0x0, 0x0)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:107 +0x33b
code.cloudfoundry.org/bbs/db/sqldb.(*SQLDB).StartActualLRP(0xc0003d4320, 0x100fbc0, 0xc0c1be8200, 0x10186e0, 0xc002948360, 0xc09d18da40, 0xc0d3fc43a0, 0xc0c1be8280, 0x0, 0x0, ...)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/db/sqldb/actual_lrp_db.go:272 +0x226
code.cloudfoundry.org/bbs/controllers.(*ActualLRPLifecycleController).StartActualLRP(0xc000142000, 0x100fbc0, 0xc0c1be8200, 0x10186e0, 0xc14888f800, 0xc09d18da40, 0xc0d3fc43a0, 0xc0c1be8280, 0x0, 0x0)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/controllers/actual_lrp_lifecycle_controller.go:116 +0x249
code.cloudfoundry.org/bbs/handlers.(*ActualLRPLifecycleHandler).StartActualLRP(0xc000464540, 0x10186e0, 0xc14888f800, 0x100cf80, 0xc18b53a460, 0xc200181b00)
/var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/handlers/actual_lrp_lifecycle_handler.go:74 +0x319
code.cloudfoundry.org/bbs/handlers/middleware.LogWrap.func3(0x100cf80, 0xc18b53a460, 0xc200181b00)