BBS Crashing and Failing over #428


## Summary

In our very busy production environment, the BBS crashes and fails over. After a failover, the new BBS sometimes takes a while to recover and reach a performant state.


## Expected Result

When the BBS is running there should be no unexpected crashes and failovers.


## Actual Result

The BBS crashes :)

## Context

This is an IBM Public Cloud us-south deployment on SoftLayer with about 780 Cells. It runs diego 2.25.0 with CF-Deployment 6.8.0.

The VM is very large (16 cores). We use postgres as the backend with 400 db connections (min and max); we cannot change this due to limitations in the postgres deployment.
The file descriptor limit is 100000, as set in bpm.yml.


## Steps to Reproduce

We are unable to reproduce this in smaller envs, so it must be due to load or to particular errors in some queries.


## Possible Causes or Fixes (optional)


## Additional Text Output or Screenshots (optional)

I will be attaching logs (or sending them via Slack, as they are very large) in the next day or two.

This was already discussed here ...

https://cloudfoundry.slack.com/archives/C02FM2BPE/p1556639064033100

There are some initial goroutine dumps there (small parts), but I will try to get the rest shortly.

