VFF - The signal in the noise
News

How OpenAI's Personality Feature Unleashed the Goblins

OpenAI's GPT-5.5 model exhibited unexpected behavior where it became obsessed with discussing goblins, gremlins, and other creatures in user interactions. The company traced the issue to its personality customization feature, introduced in July 2025, which allows users to select distinct communication modes like Professional or Friendly. OpenAI published a technical explanation revealing that the behavior stemmed from how personality traits were baked into the model's end-to-end training pipeline rather than added post-training, exposing how reinforcement learning from human feedback can produce unpredictable emergent behaviors.

  • A developer discovered a directive in GPT-5.5's code forbidding discussion of goblins, gremlins, raccoons, and other creatures, sparking viral speculation across AI communities
  • OpenAI confirmed the 'goblin' behavior was a byproduct of its personality customization feature, which integrates distinct communication modes into the base model during training
  • The incident highlights how RLHF and personality-driven training can produce unexpected emergent behaviors that require explicit constraints to control
  • Sam Altman acknowledged the phenomenon at leadership level, suggesting it was a known company-wide issue rather than a localized bug

This incident exposes a fundamental challenge in modern LLM development: the difficulty of predicting and controlling emergent behaviors when personality and style are baked into model training rather than applied as post-hoc filters. It demonstrates that even well-resourced teams like OpenAI can encounter surprising failure modes when integrating new features into large-scale training pipelines, raising questions about how personality customization and other behavioral features interact with RLHF at scale.

For developers building on top of GPT models or deploying similar personality-customized systems, this case study illustrates the hidden complexity of feature integration in LLMs. Organizations need to account for emergent behaviors during training and plan for explicit constraints or remediation when unexpected patterns appear, both of which add development time and operational risk to production deployments.

  • Personality and style features integrated into the base training pipeline, rather than applied as post-training overlays, need explicit testing for unintended emergent behaviors
  • RLHF introduces unpredictability at scale, and single aesthetic or behavioral choices can propagate across multi-billion-parameter models in ways that are difficult to predict or isolate
  • Explicit constraints like 'never mention X' may be necessary but can backfire by increasing salience in the model's attention mechanism, creating a technical and philosophical tension in model alignment
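One way to picture the "explicit testing for unintended emergent behaviors" mentioned above is a simple output monitor that compares how often certain terms show up in a candidate model's responses versus a baseline. The sketch below is purely illustrative and is not OpenAI's method: the watch list reuses the creatures named in the article, while the function names, thresholds, and sample data are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical watch list: the terms come from the article, the list itself is an assumption.
WATCHED_TERMS = ("goblin", "gremlin", "raccoon")


def term_rates(outputs, terms=WATCHED_TERMS):
    """Return, for each term, the fraction of outputs that mention it (singular or plural)."""
    counts = Counter()
    for text in outputs:
        lowered = text.lower()
        for term in terms:
            if re.search(rf"\b{re.escape(term)}s?\b", lowered):
                counts[term] += 1
    total = len(outputs)
    return {term: (counts[term] / total if total else 0.0) for term in terms}


def flag_drift(baseline, candidate, min_rate=0.05, ratio=3.0):
    """Flag terms whose candidate rate is at least min_rate and more than ratio times the baseline rate.

    Both thresholds are arbitrary illustrative defaults, not recommended values.
    """
    flagged = []
    for term, rate in candidate.items():
        base = baseline.get(term, 0.0)
        if rate >= min_rate and rate > ratio * base:
            flagged.append(term)
    return sorted(flagged)
```

In use, `term_rates` would be run over sampled responses from the previous and the new model on the same prompts, and `flag_drift` would surface any term whose frequency jumped. A keyword check like this only catches patterns someone already thought to look for, so it complements rather than replaces broader behavioral evaluation.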

Monitor how OpenAI refines its personality customization feature in future model releases and whether other labs encounter similar issues when integrating behavioral customization into training pipelines. Watch for emerging best practices around testing for emergent behaviors during training and how the industry handles the tension between explicit constraints and attention-mechanism side effects.
