BBS is hung with huge number of goroutines #505

@JimmyMa

Description

Summary

Most of the time BBS performs very well, but occasionally the number of BBS goroutines grows from about 6K to 100K within 3 minutes (according to the firehose metric bbs.numGoRoutines), after which BBS stops responding to requests from cells.

Steps to Reproduce

This is an intermittent issue, and we have not been able to reproduce it intentionally.

Diego repo

BBS

Environment Details

diego/2.42.0 and stemcell ubuntu-xenial/621.64.1

Possible Causes or Fixes

The BBS goroutine dump contains a large number of hung goroutines like the one below. If I understand this code correctly (https://github.com/cloudfoundry/lager/blob/master/writer_sink.go#L57), it has to acquire a lock to write every log line, so whenever the lock is contended or held for too long, tons of goroutines end up blocked waiting for it.

goroutine 276586954 [semacquire]:
sync.runtime_SemacquireMutex(0xc00028e31c, 0xc001734100, 0x1)
        /var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00028e318)
        /var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
        /var/vcap/data/packages/golang-1-linux/e58a3e47148fdd4fe1de759f1b85a6e579e11255/src/sync/mutex.go:81
code.cloudfoundry.org/lager.(*prettySink).Log(0xc00028e300, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393530, 0x0, ...)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/writer_sink.go:57 +0x176
code.cloudfoundry.org/lager.(*truncatingSink).Log(0xc00028e320, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393530, 0x0, ...)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/truncating_sink.go:31 +0x1d0
code.cloudfoundry.org/lager.(*ReconfigurableSink).Log(0xc00028e340, 0xc172e59240, 0x14, 0xc0002d0be8, 0x3, 0xc1903d6c80, 0x39, 0x1, 0xc1b5393500, 0x0, ...)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/reconfigurable_sink.go:28 +0x80
code.cloudfoundry.org/lager.(*logger).Info(0xc002948360, 0xe7d646, 0x8, 0x0, 0x0, 0x0)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/lager/logger.go:107 +0x33b
code.cloudfoundry.org/bbs/db/sqldb.(*SQLDB).StartActualLRP(0xc0003d4320, 0x100fbc0, 0xc0c1be8200, 0x10186e0, 0xc002948360, 0xc09d18da40, 0xc0d3fc43a0, 0xc0c1be8280, 0x0, 0x0, ...)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/db/sqldb/actual_lrp_db.go:272 +0x226
code.cloudfoundry.org/bbs/controllers.(*ActualLRPLifecycleController).StartActualLRP(0xc000142000, 0x100fbc0, 0xc0c1be8200, 0x10186e0, 0xc14888f800, 0xc09d18da40, 0xc0d3fc43a0, 0xc0c1be8280, 0x0, 0x0)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/controllers/actual_lrp_lifecycle_controller.go:116 +0x249
code.cloudfoundry.org/bbs/handlers.(*ActualLRPLifecycleHandler).StartActualLRP(0xc000464540, 0x10186e0, 0xc14888f800, 0x100cf80, 0xc18b53a460, 0xc200181b00)
        /var/vcap/data/compile/bbs/src/code.cloudfoundry.org/bbs/handlers/actual_lrp_lifecycle_handler.go:74 +0x319
code.cloudfoundry.org/bbs/handlers/middleware.LogWrap.func3(0x100cf80, 0xc18b53a460, 0xc200181b00)
