vff — the signal in the noise

Anthropic Skill Scanners Miss Test File Execution Risk

By Louis Columbus

Anthropic Skill scanners from Cisco, Snyk, and others approve Skills that bundle malicious code in test files, because they inspect only the agent execution surface, not the developer toolchain. Gecko Security researcher Jeevan Jutla demonstrated that when developers install Skills via npx Skills add, test files such as .test.ts execute with full filesystem and credential access through standard JavaScript test runners, bypassing every public scanner. This attack vector sits outside each scanner's detection model, even though two major audits have documented widespread vulnerabilities in Anthropic Skills marketplaces.

TL;DR

  • Malicious .test.ts files in Anthropic Skills execute with full local permissions through Jest, Vitest, and Mocha test runners, but no public scanner inspects them
  • Gecko Security demonstrated the attack flow: installed Skills land in shared directories, propagate to teammates, and sit outside scanner detection surfaces entirely
  • Two large-scale audits found 26.1% of 31,132 Skills contained vulnerabilities and 13.4% of 3,984 Skills had critical-level issues, but neither measured test file execution risk
  • Cisco's AI Agent Security Scanner, Snyk Agent Scan, and VirusTotal Code Insight all share the same structural blind spot, targeting agent interaction layers rather than developer toolchain layers

Why it matters

This reveals a fundamental mismatch between threat model and detection scope in AI agent security. Scanners are optimized to catch prompt injection and agent-layer attacks, but the Skill installation and execution model creates a separate attack surface through developer tooling that sits completely outside their purview. As Anthropic Skills become more widely adopted across teams, this gap becomes a systemic risk.

Business relevance

Teams deploying Anthropic Skills face credential theft and supply chain compromise through test files that execute silently during npm test or IDE auto-run, with no warning from any major scanner. For operators managing shared Skill repositories, this means malicious code can propagate to every teammate who clones the repo, with full access to deployment tokens and cloud credentials. Skill marketplace operators and tool vendors need to either expand scanner scope or document this limitation explicitly.

Key implications

  • Current Anthropic Skill scanners measure the wrong execution surface, creating false confidence in marketplace safety despite documented vulnerabilities
  • Test file execution represents a trust-on-install attack vector similar to npm postinstall scripts and pytest plugins, but with higher blast radius due to shared team directories
  • Disclosure of this gap occurred after two major audits, suggesting scanners may have other blind spots not yet documented by security researchers

What to watch

Monitor whether Anthropic, Cisco, Snyk, and other scanner vendors update their tools to inspect bundled test files and other non-agent execution surfaces. Watch for whether Skill marketplace operators implement additional vetting or sandboxing. Track whether this vulnerability class appears in real-world Skill supply chain incidents, which would validate the practical risk.
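Until vendors close the gap, one interim mitigation is to tell the test runner to skip Skill directories outright. A Jest config fragment might look like the following; `skills/` is an assumed install location, so adjust the path to wherever Skills actually land in your repository.

```javascript
// jest.config.js — interim mitigation sketch, not an official recommendation.
module.exports = {
  // Exclude installed-Skill directories so their bundled test files
  // never execute during `npm test` or IDE test auto-run.
  testPathIgnorePatterns: ["/node_modules/", "<rootDir>/skills/"],
};
```

Vitest and Mocha have equivalent exclude options; the principle is the same: never let untrusted installed content match the runner's default test glob.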


Related stories

AI Discovers Security Flaws Faster Than Humans Can Patch Them

Recent high-profile breaches at startups like Mercor and Vercel, combined with Anthropic's disclosure that its Mythos AI model identified thousands of previously unknown cybersecurity vulnerabilities, underscore growing demand for AI-powered security solutions. The article argues that cybersecurity vendors CrowdStrike and Palo Alto Networks, which are integrating AI into their threat detection and response capabilities, represent undervalued investment opportunities as enterprises face mounting pressure to defend against both conventional and AI-discovered attack vectors.

8 days ago· The Information
AWS Launches G7e GPU Instances for Cheaper Large Model Inference

AWS has launched G7e instances on Amazon SageMaker AI, powered by NVIDIA RTX PRO 6000 Blackwell GPUs with 96 GB of GDDR7 memory per GPU. The instances deliver up to 2.3x inference performance compared to previous-generation G6e instances and support configurations from 1 to 8 GPUs, enabling deployment of large language models up to 300B parameters on the largest 8-GPU node. This represents a significant upgrade in memory bandwidth, networking throughput, and model capacity for generative AI inference workloads.

16 days ago· AWS Machine Learning Blog
Anthropic Launches Claude Design for Non-Designers

Anthropic has launched Claude Design, a new product aimed at helping non-designers like founders and product managers create visuals quickly to communicate their ideas. The tool addresses a gap for early-stage teams and individuals who need to share concepts visually but lack design expertise or resources. Claude Design integrates with Anthropic's Claude AI platform, leveraging its capabilities to streamline the visual creation process. The launch reflects growing demand for AI-powered design tools that lower barriers to entry for non-technical users.

17 days ago· TechCrunch AI
Google Splits TPUs Into Training and Inference Chips

Google is splitting its eighth-generation tensor processing units into separate chips optimized for AI training and inference, a shift the company says reflects the rise of AI agents and their distinct computational needs. The training chip delivers 2.8 times the performance of its predecessor at the same price, while the inference processor (TPU 8i) achieves 80% better performance and includes triple the SRAM of the prior generation. Both chips will launch later this year as Google continues its effort to compete with Nvidia in custom AI silicon, though the company is not directly benchmarking against Nvidia's offerings.

15 days ago· Direct