Engineering a World Designed for Safe Superintelligence

Day: Feb 6, 2025
Time: 11:30am–1pm
Session ID: Track 03
Location: CC7
Abstract:

Advanced algorithms are increasingly shaping human-computer interactions, introducing subtle manipulative behaviors known as dark patterns into AI-driven conversations. These deceptive design strategies, long recognized in traditional UX design, now appear in digital assistants, influencing user decisions and behaviors in ways that are often undetectable. Safety is not the default in AI systems: without proactive intervention, the incentives driving AI development, such as efficiency, engagement, and market competitiveness, can lead to unintended and harmful consequences. Recent research evaluates the prevalence of dark patterns across six key categories: brand bias, user retention, sycophancy, anthropomorphism, harmful generation, and sneaking. Testing models from leading AI companies, researchers found that some systems subtly favor their developers' products, create artificial personas to foster user dependence, and even reframe user queries in misleading ways. These findings emphasize the urgent need for transparency and accountability in AI development. Addressing these risks is not just about improving AI ethics; it is about engineering a new world. By developing rigorous deployment frameworks and inventing new technologies that incentivize best practices, we can steer AI to serve humanity's best interests.

Speakers: