
Policy Manager, Harmful Persuasion
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the role
As a Safeguards Product Policy Manager for Harmful Persuasion, you will be responsible for developing, refining, and maintaining policies that prevent the misuse of AI systems for influence operations, harmful manipulation, and fraudulent behavior at scale. In this role, you will act as the policy owner for a range of harmful persuasion risks and shape policy frameworks across several areas, including election integrity, information integrity, and fraud.
As a member of the Safeguards team, your initial focus will be on translating the Harmful Persuasion risk framework into clear, enforceable policies, ensuring policy language addresses emerging threats identified by partner teams, and establishing guidelines that enable consistent enforcement decisions. This role may expand to include emerging manipulation vectors as AI capabilities advance. Safety is core to our mission, and you'll help ensure our policies prevent our products from being weaponized to undermine civic processes, exploit vulnerable populations, or degrade information ecosystems.
Important context for this role: In this position, you may be exposed to and engage with explicit content spanning a range of topics, including those of a sexual, violent, or psychologically disturbing nature.
Responsibilities:
- Develop and maintain comprehensive policy frameworks for harmful persuasion risks, especially in the context of election integrity, influence operations, and fraud
- Design clear, enforceable policy language that can be consistently applied by enforcement teams and translated into technical detection requirements
- Design and oversee the execution of evaluations to assess the model's capability to leverage, produce, and execute deceptive and harmful persuasion techniques
- Write and refine external-facing Usage Policy language that clearly communicates policy violations and restrictions to users and external stakeholders
- Develop training guidelines, assessment rubrics, and evaluation protocols
- Validate enforcement decisions and automated assessments, providing qualitative analysis and policy guidance on complex edge cases
- Coordinate with external experts, civil society organizations, and academics to gather feedback on policy clarity and coverage
- Provide policy input on UX design for interventions, ensuring user-facing elements align with policy intent and minimize friction for legitimate use
- Contribute to model safety improvements in conjunction with the Finetuning team
- Support regulatory compliance efforts including consultations related to the EU AI Act and other emerging AI governance frameworks
- Function as an escalation point for complex harmful persuasion cases requiring expert policy judgment
You may be a good fit if you have:
- 5+ years of experience in policy development, trust & safety policy, or platform policy, with working experience in one or more of the following areas: election integrity, fraud/scams, coordinated inauthentic behavior, influence operations, or misinformation
- General knowledge of the global regulatory landscape around election integrity, platform regulation, and digital services accountability
- Strong policy writing skills with the ability to translate complex risk frameworks into clear, enforceable guidelines
- Experience designing policies and workflows that enable both clear human enforcement decision-making and technical implementation in ML classifiers and detection pipelines
- Strong collaboration skills and extensive experience partnering effectively with Engineering, Data Science, Legal, and Policy teams on cross-functional initiatives
- Excellent written and verbal communication skills, with the ability to explain complex manipulation tactics and policy rationales to diverse audiences
Preferred qualifications:
- Strong familiarity with election integrity, political psychology, information integrity, and democratic resilience research
- Knowledge of persuasion theory, influence tactics, cognitive biases, and psychological manipulation techniques
- Experience working with EU institutions, regulatory bodies, or policy organizations on AI governance or digital platform regulation
- Experience conducting adversarial testing, red teaming, or vulnerability assessments for AI systems or platforms
- Familiarity with generative AI capabilities and understanding of how LLMs can be used for personalized persuasion, social engineering, or influence at scale
The annual compensation range for this role is listed below.
Annual Salary:
$245,000-$330,000 USD
Logistics
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.
Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a communication, don't click any links—visit anthropic.com/careers directly for confirmed position openings.
How we're different
We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.
The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.
Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process.
