What do I do if a RabbitMQ instance dies?
It depends on how and why it died. First work out whether the machine is really dead or whether it has only been cut off by a network partition. For a truly dead node, one quick fix is to take a backup machine, reinstall the OS and RabbitMQ, and then start RabbitMQ with the contents of the old Mnesia data directory (if the old disk is still OK, you could simply try slotting it into the new machine). Make sure the backup machine has the same hostname as the machine that died, because the node name and the data stored in Mnesia are tied to it.

If this works, you are in luck. If it does not, i.e. Mnesia does not seem to be recovering (the node hangs with the waiting_for_tables error message), you can delete the Mnesia directory, bring the node back up as a member of the cluster, and let it replicate its data from the other cluster members. Note that this will not restart queue processes that were running on this node before it crashed, but you can simply re-declare those queues.
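
The fallback path above (discard the Mnesia directory, then rejoin the cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not an official procedure: the helper names, the example node names and the choice to rename the directory rather than delete it are all made up for this sketch. The rabbitmqctl subcommands used (stop_app, join_cluster, start_app, forget_cluster_node) exist in recent RabbitMQ releases; older releases used a different clustering command, and whether the surviving nodes must first forget the dead node depends on the version, so check the documentation for the version you run.

```python
import shutil
import time
from pathlib import Path


def set_aside_mnesia_dir(mnesia_dir: Path) -> Path:
    """Rename the Mnesia directory instead of deleting it, so it can be restored later.

    The node must be stopped before this is called.
    """
    backup = mnesia_dir.with_name(f"{mnesia_dir.name}.bak-{int(time.time())}")
    shutil.move(str(mnesia_dir), str(backup))
    return backup


def rejoin_plan(dead_node: str, peer_node: str) -> list[tuple[str, list[str]]]:
    """Return (where to run it, command) pairs for rejoining the cluster.

    Assumption: the rebuilt node has already been started once with an empty
    data directory, so it is running as a fresh standalone node.
    """
    return [
        # May be needed so the cluster drops its stale record of the dead node.
        ("surviving node", ["rabbitmqctl", "forget_cluster_node", dead_node]),
        ("rebuilt node", ["rabbitmqctl", "stop_app"]),
        ("rebuilt node", ["rabbitmqctl", "join_cluster", peer_node]),
        ("rebuilt node", ["rabbitmqctl", "start_app"]),
    ]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for where, command in rejoin_plan("rabbit@node2", "rabbit@node1"):
        print(f"[{where}] {' '.join(command)}")
```

The sketch only prints the commands instead of executing them, so each step can be reviewed and run by hand. Renaming the directory keeps the old data around in case a later attempt to recover it succeeds.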