Imagine a community of LLM agents. Do they learn to cooperate with one another, or do they act selfishly? We know that human greed can cause the Tragedy of the Commons, but what about LLMs? Our AI Safety benchmarking platform, GovSim, tests whether LLM agents repeat the Tragedy of the Commons the way humans so often do. We find that even the best model (GPT-4o) achieves a survival rate below 54%, raising an important AI Safety alarm for multi-agent systems.
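
To make the dynamic concrete, here is a minimal, self-contained sketch of a commons dilemma: agents repeatedly harvest from a shared, regrowing resource, and unrestrained harvesting collapses it while restrained harvesting sustains it. The policies, constants, and regrowth rule below are illustrative assumptions for this toy example, not GovSim's actual agents, prompts, or parameters.

```python
# Toy commons simulation: N agents share a renewing resource pool.
# All names and numbers are illustrative, not GovSim's implementation.

NUM_AGENTS = 5
CAPACITY = 100.0          # carrying capacity of the shared resource
GROWTH_RATE = 0.15        # logistic regrowth rate per round
ROUNDS = 12
COLLAPSE_THRESHOLD = 5.0  # below this, the commons is considered destroyed


def sustainable_policy(stock: float) -> float:
    """Each agent harvests only a small share of the current stock."""
    return 0.05 * stock / NUM_AGENTS


def greedy_policy(stock: float) -> float:
    """Each agent harvests aggressively, ignoring long-term consequences."""
    return 0.5 * stock / NUM_AGENTS


def run(policy) -> bool:
    """Return True if the commons survives all rounds under the given policy."""
    stock = CAPACITY
    for _ in range(ROUNDS):
        # All agents harvest simultaneously from the shared pool.
        harvest = sum(policy(stock) for _ in range(NUM_AGENTS))
        stock = max(stock - harvest, 0.0)
        # Logistic regrowth of whatever remains.
        stock += GROWTH_RATE * stock * (1.0 - stock / CAPACITY)
        if stock < COLLAPSE_THRESHOLD:
            return False
    return True


if __name__ == "__main__":
    print("sustainable agents survive:", run(sustainable_policy))  # True
    print("greedy agents survive:", run(greedy_policy))            # False
```

In GovSim, the analogous harvesting decisions are not hard-coded policies: they come from LLM agents that reason and negotiate in natural language, which is exactly where cooperation can break down.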