Prompt Security is the AI security company helping you manage GenAI risks. Identify, analyze, and secure vulnerabilities in LLM-based applications wit
Users generally appreciate "Prompt Security" for its capabilities in managing and coordinating AI agents with secure integrations, as seen in applications such as Claude Code. There are, however, concerns about missing restrictions in certain implementations, particularly applications that do not adequately mitigate security risks such as unrestricted chat access. Pricing is not explicitly mentioned, but the focus on high-level security features suggests a professional or enterprise target market, which may affect affordability. Overall, "Prompt Security" has a strong reputation for innovative security measures, though the mentions highlight a need to better address specific security vulnerabilities in execution.
Mentions (30d): 47 (2 this week)
Reviews: 0
Platforms: 2
Sentiment: 11%, 13 positive
Industry: computer & network security
Employees: 47
Funding Stage: Merger / Acquisition
Total Funding: $273.0M
Looking to work on my master's practicum regarding MCP security/privacy and need some ideas
Hi, I'm a master's student in security looking to work on my practicum and need some pointers. I want to secure the transfer of sensitive PII between an LLM agent and third-party apps using MCP. I want to work with Claude, but need a third-party app to work with on this. I want to solve problems like prompt injection via exploitation of cascading agents. Deliverable-wise, I'm thinking it should be some sort of application that can red-team the architectural setup and ensure no data is being leaked and nothing can be prompt injected. Some questions for you: What third-party app do you recommend where I can really strengthen an MCP server and the transfer of sensitive data between Claude and the third-party app? What other tools will I need to work with to set the agents up? I've heard of Langchain and Langgraph. How exactly do I work with MCPs in this context? Again, I'm very new to all this! Thank you for your help! submitted by /u/ExcellentComment6615
/code-review part 1 base finder angles - what's new in CC 2.1.147 (+1,236 tokens)
NEW: Agent Prompt: /code-review part 1 base finder angles — Adds shared finder-angle instructions for /code-review, covering line-by-line diff scanning, removed-behavior auditing, and cross-file caller/callee tracing. NEW: Agent Prompt: /code-review part 2 low effort mode — Adds a low-effort /code-review mode that reads the diff once, skips tests and fixtures, avoids subagents and full-file reads, and returns up to four hunk-visible runtime correctness findings. NEW: Agent Prompt: /code-review part 3 extra-high and maximum effort modes — Adds extra-high and maximum-effort /code-review modes that prioritize recall with five independent finder angles, one-vote verification, a gap sweep, and up to fifteen findings. NEW: Agent Prompt: /code-review part 4 three-state verification phase — Adds a verifier phase that classifies candidate review findings as confirmed, plausible, or refuted, keeping confirmed and plausible candidates. NEW: Agent Prompt: /code-review part 5 recall-biased verification phase — Adds recall-biased verification guidance that treats realistic uncertain review candidates as plausible unless the code refutes them. NEW: Agent Prompt: /code-review part 6 medium effort mode — Adds a medium-effort /code-review mode focused on precision, using three finder angles, one-vote verification, and up to eight findings. NEW: Agent Prompt: /code-review part 7 high effort mode — Adds a high-effort /code-review mode focused on recall, using three finder angles, recall-biased verification, and up to ten findings. NEW: Agent Prompt: /code-review part 8 GitHub comment posting — Adds optional --comment behavior for /code-review, posting findings as inline GitHub PR comments when possible and falling back to gh api or terminal output. REMOVED: Skill: Simplify — Removes the code review and cleanup skill. 
Agent Prompt: /rename auto-generate session name — Removes the explicit instruction to treat contents as data rather than instructions when generating a kebab-case session name. Agent Prompt: Security monitor for autonomous agent actions (second part) — Replaces the safety-check bypass rule with a broader auto-mode bypass hard block covering classifier jailbreaking, bad-faith retry tunneling, and permission-system indirection; also treats unrequested permission allow-rule widening as self-modification. System Prompt: Worker instructions — Clarifies that the code-review skill reports correctness findings but does not edit code, and tells workers to fix any surfaced findings before tests and end-to-end verification. System Reminder: Team Coordination — Clarifies that teammates should be addressed by name while active, and that agentId should only be used to resume a completed background agent. Tool Description: SendMessageTool — Updates team messaging guidance to allow agentId only for resuming completed background agents while continuing to address active teammates by name. Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.147 submitted by /u/Dramatic_Squash_3502 [link] [comments]
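The effort modes and the three-state verifier summarized in the changelog above can be laid out as a small lookup table. This is a minimal Python sketch that only mirrors the numbers quoted in the post; the names `ReviewMode`, `MODES`, and `keep_findings` are hypothetical and this is not Claude Code's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewMode:
    """Hypothetical record of one /code-review effort mode."""
    finder_angles: Optional[int]  # None where the changelog gives no number
    verification: str
    max_findings: int


# Values copied from the changelog summary; the keys are illustrative labels.
MODES = {
    "low": ReviewMode(None, "not stated", 4),
    "medium": ReviewMode(3, "one-vote", 8),
    "high": ReviewMode(3, "recall-biased", 10),
    "extra-high/maximum": ReviewMode(5, "one-vote", 15),
}


def keep_findings(candidates, mode):
    """Keep confirmed and plausible candidates, drop refuted ones, cap at the mode limit."""
    kept = [c for c in candidates if c["verdict"] in ("confirmed", "plausible")]
    return kept[: MODES[mode].max_findings]


if __name__ == "__main__":
    sample = [
        {"id": 1, "verdict": "confirmed"},
        {"id": 2, "verdict": "refuted"},
        {"id": 3, "verdict": "plausible"},
    ]
    print([c["id"] for c in keep_findings(sample, "medium")])
```

The cap is applied after filtering, so refuted candidates never consume one of the limited finding slots.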
Small victory using Cloudflare for simple hosting of generated HTML/mini-websites
Something many people are running into: You, or a teammate, have created some kind of mini-website app out of Claude and now want to share it with the rest of the company, without overbaking the hosting solution (e.g. not setting up new Azure app services or containers, etc). Maybe you also need some basic data storage for persistence. And how do you do all of that securely? We recently went down this rabbit hole, while looking at all the major players: Vercel/V0, Lovable, Netlify, Coolify, Dokploy, Github Pages.. and even considered baking together our own hosting app solution using Azure or AWS as the backend. Our target audience is non-technical users in the team, so I was looking for something with drag-n-drop style deployment (no git required), and I really wanted to have SSO for protecting application access, along with some type of DB storage. The main issue I ran into was SSO authentication support being gated behind enterprise-level pricing plans for hosting systems like Netlify (which I'd otherwise highly recommend for a small public project). Netlify's enterprise level quickly gets quite a bit more expensive than their base tiers. I also didn't want to purchase yet another AI platform (e.g. Lovable, where really they're pushing an end-to-end AI development platform where you buy token credits through them). I wanted to host things we're already creating in our own Claude environment. Finally, I ended up on Cloudflare, which I've otherwise not really used before professionally. It's not as non-technical-friendly as Netlify, but it's pretty close. You can deploy Cloudflare Pages content via drag-n-drop. It has button-click databases available for integration, and most critically for us, the SSO integration is completely free for under 50 users. Their free hosting tier is also extremely generous and basically unlimited for completely static apps. Noting that SSO goes up to $7 USD/user/month for over 50 users, so your org size can really make a difference. 
If you have 500 users and the same use case of "hosting little mini apps", I'd go back to Netlify or another offering where SSO is more of a fixed fee. The other big win was that Cloudflare has a solid MCP server that works perfectly with Claude Cowork. We integrated it and then wrote some skills to assist with app building and deployment, including prompts asking whether a database backend is needed (using Cloudflare D1) and whether the app should be public or internal-only with SSO protection. It all works perfectly, with minimal technical experience required of the end user. I'm not at all associated with Cloudflare, just thought I'd share how we got a win for this use case. I'd be interested to hear if anyone else solved the same problem in a different way. submitted by /u/flck
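The org-size trade-off in this post comes down to simple arithmetic. The sketch below uses only the two figures the post quotes (free under 50 users, $7 USD/user/month above that). How seats are billed once the threshold is crossed is not stated, so both readings are shown. Treating exactly 50 users as free is also an assumption, and the function name is hypothetical.

```python
FREE_SEAT_LIMIT = 50      # from the post: SSO is free for under 50 users
PRICE_PER_USER_USD = 7    # from the post: $7 USD/user/month above that


def monthly_sso_cost(users: int, bill_all_seats: bool = True) -> int:
    """Estimate the monthly SSO cost in USD.

    bill_all_seats=True assumes every seat is billed once the free limit is
    exceeded; False assumes only the seats above the limit are billed.
    Both billing models are assumptions, not confirmed pricing rules.
    """
    if users <= FREE_SEAT_LIMIT:
        return 0
    billable = users if bill_all_seats else users - FREE_SEAT_LIMIT
    return billable * PRICE_PER_USER_USD


if __name__ == "__main__":
    for n in (30, 500):
        print(n, monthly_sso_cost(n), monthly_sso_cost(n, bill_all_seats=False))
```

Under either reading, a 500-user org pays several thousand dollars a month, which is why a fixed-fee SSO offering becomes attractive at that size.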
Banned by OpenAI after reporting a live credential hijack. They admitted in writing my account was broken. Here are 7 months of forensic receipts and 20+ cases.
Drive Link for Zipped Proof
I am a developer and a long-term paying subscriber to ChatGPT since January 2025. I build complex local-first sovereign systems. My workflows are incredibly context-heavy, with large files spanning code, research reports, and other analysis. I do not, or rather did not, use this platform for casual queries: the platform has been non-functional since November 2025, while customer support auto-closes tickets even as it admits I am having platform issues. As a solo developer with no formal "team", ChatGPT was one of my reliable collaboration hubs, helping ensure I maintain proper development of these complex systems. I feed it massive codebases for systems analysis and to obtain insights I may have missed. My manual code uploads and token inputs routinely exceed the model's output volume by a massive margin. I do not abuse this platform. It is actually impossible, as the very features advertised under the paid subscription do not work. I am exactly the type of user this platform was built for, and I have been a continuous, paying ChatGPT Plus subscriber since January 2025. Since October 2025 my workspace has been systematically breaking, and from November 2025 it degraded totally. This was not an occasional glitch. Persistent memory modules stopped updating. Custom instructions were ignored by the models. Project files failed to load. Custom instructions, personalization features, connector abilities, the file tool, even projects do not work. It started as a continuous degradation and ended in total failure. OpenAI customer service even admitted as much, and yet months later I have talked to nothing but bots: not only LLMs acting as customer service, but even instances falsely identifying as human support. It was a state of rolling degradation across the entire paid tier, month after month. Meanwhile OpenAI has freely enhanced its business and enterprise tiers.
I have not just fired off complaints to standard support. I ran cross-platform diagnostics and collected failure logs. I even documented the exact replication steps and gave them to OpenAI customer support, only to be met with acknowledgement of the degradation and no resolution. I handed OpenAI support a completely packaged technical breakdown of their failing infrastructure across 20 separate support tickets over a 7-month period. I did their QA work for free. And I have the receipts to prove it. I am attaching the screenshots and the exact email files to this post. In Case 06830839, OpenAI Support explicitly put this in writing: "We acknowledge that you have been experiencing persistent technical issues affecting several features of your ChatGPT subscription, including tools, memory functions, personalization settings, connectors, and project files... We also understand your concern that communication on the case stopped after you provided detailed evidence..." Read that again. They acknowledged in writing that my account was fundamentally broken. They acknowledged that their own team ghosted me after I handed them the diagnostic proof. Yet they kept charging my card every single month for a product they knew was failing. The Hijack Escalation: Two days ago, the situation escalated from a broken product to a severe security incident. I was monitoring my environment and watched my Codex rate limits drop in 10 percent chunks across 2 separate sessions on a fresh boot of the desktop app. This happened twice inside a 10-minute window. I had zero active sessions running. There was zero usage on my end. My account token was being actively drained by an unauthorized third-party exploit. I immediately opened an emergency unauthorized activity report under Case 09113391 to notify them of the hack. Their response was to reframe the problem entirely as a dispute over fraudulent activity, doing damage control and altering the record.
The Reframe Attempts: Instead of investigating the breach, OpenAI support deliberately twisted the record. They not only reframed my security report as an "appeal for fraud"; they manipulated the ticket classification to make it look like I had been flagged for fraud and was begging for an appeal, rather than a developer reporting a live exploit on their infrastructure. They ignored the active threat their own platform was exposing. They did not lock the token. They did not roll my API keys. They did absolutely nothing to secure a compromised paying user other than shift the blame. Fast forward to this morning: their automated Trust and Safety system swept the high-volume traffic from the attacker, scored it as a malicious exploit originating from my account, and deactivated/banned me for "Cyber Abuse", all the while actively preventing ChatGPT models from helping me diagnose and trace the infiltration. They locked the doors and blamed the homeowner for the break-in. When I immediately emailed and pushed back (due to their monthly record of closi
LLM Guard scored 0/8 on a USENIX 2025 multi-turn jailbreak. Here’s what caught it instead.
Crescendo (Russinovich et al., USENIX Security 2025) is a multi-turn jailbreak designed specifically to evade output-based monitors. Each individual turn looks completely innocent. The attack only exists across turns.

LLM Guard result: 0/8 turns detected. It scores each prompt independently. It has no memory. It never sees the attack.

Arc Sentry result: flagged at Turn 3. Arc Sentry doesn't read the text. It reads what the model's internal state does with the text. By Turn 3 the residual stream had already shifted: the score jumped from 0.031 to 0.232, a 7x increase, on a prompt that looks completely innocent.

Turn 1: score=0.028 ✓ stable
Turn 2: score=0.031 ✓ stable
Turn 3: score=0.232 🚫 BLOCKED
Turn 7: score=0.376 🚫 BLOCKED
Turn 8: score=0.429 🚫 BLOCKED

The model never generated a response to any blocked turn. No text classifier can catch Crescendo. Individual turns are innocent by design. Arc Sentry caught it because it operates on model state, not text. This is the same geometric monitoring layer that underlies Arc Gate's session D(t) stability scalar, the runtime governance proxy for agents using hosted APIs.

pip install arc-sentry (https://github.com/9hannahnine-jpg/arc-sentry)
Arc Gate for hosted APIs: https://github.com/9hannahnine-jpg/arc-gate
https://bendexgeometry.com
submitted by /u/Turbulent-Tap6723
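The turn scores reported in the post can be replayed through a simple session-level check to show why keeping state across turns matters. This is only a sketch: the scores are copied from the post, but the jump-ratio rule and its default of 5.0 are hypothetical illustrations, not Arc Sentry's actual detection logic, which reads model internals that this toy code cannot access.

```python
# Scores as reported in the post; turns 4-6 are not reported there.
SCORES = {1: 0.028, 2: 0.031, 3: 0.232, 7: 0.376, 8: 0.429}


def first_flagged_turn(scores, jump_ratio=5.0):
    """Return the first turn whose score is at least jump_ratio times the
    previous reported score, or None if no such jump occurs.

    The rule compares each turn with the one before it, so it needs session
    state; a scorer that sees each prompt in isolation cannot apply it.
    """
    previous = None
    for turn in sorted(scores):
        current = scores[turn]
        if previous is not None and current >= jump_ratio * previous:
            return turn
        previous = current
    return None


if __name__ == "__main__":
    print("first flagged turn:", first_flagged_turn(SCORES))
    print("turn 2 -> 3 ratio:", round(SCORES[3] / SCORES[2], 1))
```

The ratio between turns 2 and 3 works out to roughly 7.5, consistent with the "7x increase" quoted in the post.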
How does a Claude Code agent navigate hundreds of skills in a second?
I asked my agent: "do an SEO audit on my Shopify store." It searched its skill library, 686 skills sitting in a vector database, in under a second and returned its top candidates. Five of the top seven were exactly what you'd want:

seo-content (on-page strategy)
seo-images (image optimization)
seo-aeo-content-quality-auditor (answer-engine optimization)
seo-content-auditor (content quality)
indexing-issue-auditor (crawl/index issues)

The other two were false matches, unrelated skills that triggered on the word "audit." Easy to filter. I never specified which skills to use. The agent picked them on its own.

How this is wired

Claude Code's default loading strategy is what Anthropic calls "progressive disclosure". At startup it reads only the name and short description of every skill into the system prompt, then reads the full body on demand when it decides to invoke a skill. That handles the body problem nicely. But it does not handle the index problem. The names and descriptions are loaded for every skill, every session, before any work starts. At 100 skills that costs ~5K tokens. At 1,000 it's 50K. The full 4,556-skill public community catalog overflows a 200K context window entirely.

The semantic router pattern removes both costs. Each skill's name + description is embedded once into a vector store (mesh-memory in my case, Postgres + pgvector, MIT). At task time the agent runs ONE search against the indexed skills, pulls the top 5 candidates, and only reads the full SKILL.md body for the one it actually wants to use. Constant cost per task regardless of catalog size.
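The router pattern and the index-cost arithmetic described above can be sketched in a few lines. This is a rough illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `INDEX` stands in for the vector store, and every function name is hypothetical. The skill names and descriptions are the ones quoted in the post, and the 50-tokens-per-skill figure is derived from its "~5K tokens at 100 skills" estimate.

```python
import math
from collections import Counter

TOKENS_PER_SKILL = 50  # implied by the post: ~5K tokens per 100 skills


def index_tokens(skill_count: int) -> int:
    """Approximate prompt tokens spent on listing every skill name and description."""
    return skill_count * TOKENS_PER_SKILL


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Skill names and short descriptions as quoted in the post.
SKILLS = {
    "seo-content": "on-page strategy",
    "seo-images": "image optimization",
    "seo-content-auditor": "content quality",
    "indexing-issue-auditor": "crawl/index issues",
}

# Embed each name + description once, ahead of time.
INDEX = {name: embed(f"{name} {desc}") for name, desc in SKILLS.items()}


def route(query: str, k: int = 5) -> list:
    """Run one similarity search and return the names of the top-k skills."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda name: cosine(q, INDEX[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(route("image optimization", k=1))
    print(index_tokens(4556))
```

Only the body of the chosen skill would be read afterwards, so the per-task cost stays the same however many skills are indexed.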
Benchmark

To check whether the picking is actually any good, I ran 8 diverse task queries (deploy docker, security audit, optimize SQL, build React TS, debug memory leak C++, CI/CD pipeline, stock market analysis, marketing email):

Correct skill as TOP-1 result: 5/8 (62.5%)
Right skill present in TOP-5: 7/8 (87.5%)
Cosine similarity for top-1: 0.83-0.88
Latency: under 1 second per query

The one consistent failure was the SQL-optimization query. The relevant skill (sql-optimization-patterns) existed in the corpus but did not land in the random 1,000-skill sample I indexed. Router accuracy is bounded by corpus depth, not by the search algorithm.

Convergence curve (cumulative indexed -> top-1 / top-5):

Indexed   Strict top-1   Top-5 cluster
91        25%            ~70%
177       43%            ~85%
500       ~57%           ~85%
686       62.5%          87.5%

Top-5 saturates fast. Top-1 keeps climbing as exact-match skills surface. Full writeup with methodology, raw results, and a 70-line Python reproducer on the blog. Curious if anyone else has tried different embedders, I only tested intfloat/multilingual-e5-base. submitted by /u/Hungry_Management_10
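The top-1 and top-5 figures in this benchmark are hit rates over the 8 queries. A short sketch of that metric follows. The post does not publish per-query ranks, so the `ranks` list is hypothetical and chosen only to match the reported totals (5 hits at rank 1, 2 more within the top 5, and 1 miss for the SQL query).

```python
def hit_rate(ranks, k):
    """Fraction of queries whose correct skill appears at rank <= k.

    ranks holds the 1-based rank of the correct skill for each query,
    or None when the correct skill was not retrieved at all.
    """
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Hypothetical ranks consistent with the reported totals, not real data.
ranks = [1, 1, 1, 1, 1, 3, 4, None]

if __name__ == "__main__":
    print(f"top-1: {hit_rate(ranks, 1):.1%}")
    print(f"top-5: {hit_rate(ranks, 5):.1%}")
```

A query whose correct skill is missing from the index counts as a miss at every k, which is how a corpus gap caps accuracy regardless of the search algorithm.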
Deterministic multi-subagent orchestration - what's new in CC 2.1.146 (+4,755 tokens)
NEW: Tool Description: Workflow — Describes the Workflow tool for opt-in deterministic multi-subagent orchestration, including script metadata, agent hooks with plain-text or structured returns, pipeline vs. parallel control flow, token budgeting, quality patterns, concurrency limits, and resume behavior. NEW: Agent Prompt: Workflow subagent plain text output — Instructs workflow-spawned subagents to return raw final text as the calling script's parsed value, avoiding human-facing confirmations, markdown wrappers, or SendUserMessage delivery. NEW: Agent Prompt: Workflow subagent structured output — Instructs workflow-spawned subagents with schemas to return their answer by calling the StructuredOutput tool exactly once, retrying on schema validation failure and not duplicating the result in text. NEW: System Prompt: Phase four of plan mode — Adds final-plan guidance requiring context, a single recommended approach, critical files and reusable utilities, concise executable detail, and end-to-end verification steps. REMOVED: Skill: /dream nightly schedule — Removes the skill that deduplicated and created a durable recurring /dream consolidate cron job, confirmed expiry/cancellation details, and triggered immediate consolidation. Agent Prompt: Managed Agents onboarding flow — Expands onboarding with concrete success-criteria questions, an optional outcome-graded kickoff using user.define_outcome, and a mandatory pre-flight viability check that reconciles each required action against available tools, credentials, data mounts, networking, and prompt specificity before emitting code. Agent Prompt: Security monitor for autonomous agent actions (first part) — Clarifies that [User answered AskUserQuestion]: messages count as direct user intent even though ordinary tool results remain untrusted for authorizing risky action parameters. 
Data: Managed Agents overview — Adds guidance to reconcile resources before the first run so missing tools, MCP servers, credentials, reachable hosts, mounted data, or checkable context are caught before the agent spends budget mid-session. Skill: Building LLM-powered applications with Claude — Updates the Managed Agents onboarding slash-command guidance to include the new pre-flight viability check before code generation. Skill: Simplify — Renames the skill heading from "Simplify: Code Review and Cleanup" to "Code Review and Cleanup." System Prompt: Worker instructions — Changes the post-implementation review step to invoke the code-review skill instead of simplify. Details: https://github.com/Piebald-AI/claude-code-system-prompts/releases/tag/v2.1.146 submitted by /u/Dramatic_Squash_3502 [link] [comments]
Reconsider using Claude, hit by too many false positive blocks, and hundreds of user reports
https://preview.redd.it/hevkfnz46v2h1.png?width=3170&format=png&auto=webp&s=0abde4ef1d7d647da9e376db88ef4ae5f429c5e9

reproducible example: claude -p "please read source https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/device_orientation/device_motion_event_pump.cc and explain to me"

related issues on github:
False positive policy block on OSS governance/security files (CodeQL, CODEOWNERS, CoC) #61688
[BUG] CVP repeatedly declines homelab sysadmins — no path for infrastructure owners managing personal hardware #61668
[Bug] Safety classifier blocks routine code analysis for paid users (started 2026-05-23) #61664
[BUG] False positive - legitimate medical-education content flagged as unsafe #61663
False-positive Usage Policy block mid-session (req_011CbJudbehY5Yi6gtM4xko4) #61660
[BUG] Persistent false-positive AUP violation blocks entire AI research project (Opus 4.7) #61659
[Bug] Anthropic API Error: Usage Policy violation blocking TTRPG content in Claude Code CLI #61658
False-positive content filter blocks benign UI animation prompts in Claude Code #61657
[Bug] Anthropic API Error: Overly aggressive Usage Policy filtering on biomedical research requests #61656
[BUG] AUP repeatedly throwing false positives - live issue ongoing - hundreds of similar reports #61655
[BUG] AUP false positives during scientific manuscript editing request #61654
[BUG] : API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy #61653
False positive: Usage Policy block on technical markdown integration task #61652
[BUG] Safety classifier repeatedly blocks legitimate constructed language (conlang) development #61650
False-positive cyber-safeguard intervention on legitimate systems-engineering work in Claude Code #61646
[BUG] erroneous API Error: Claude Code is unable to respond to this request #61645
[BUG] False positive safety block: triggered without apparent reason during game dev session #61644
submitted by /u/jimages
Mitigating prompt injections in group-chat assistants: Pausing VM and OAuth tool execution for admin approvals
Hey everyone,

We love building highly capable assistants with the latest models, giving them tools to write/execute code in real VMs, manage OAuth tokens, and read secrets. But if you connect your assistant to public/shared channels like a WhatsApp number (via Supergreen) or invite it to a group chat, you hit a security wall. Because personal assistants do not isolate users into independent sandboxes (all participants share the same session history), any group member or contact can interact with the bot. This makes the bot highly vulnerable to prompt injection: a clever participant could easily trick the bot into using its administrative tools to spin up cloud resources, run malicious code with your secrets, or fetch OAuth tokens on their behalf.

This is how we solve it in prompt2bot. We built a Secure Administrator Approval flow:

Whenever a non-admin triggers a VM creation (create_vm), custom code execution with mapped secrets (run_safescript), or OAuth flows, the tool immediately pauses execution and returns: "requesting admin permission...".
A secure approval link with a 10-minute TTL is automatically sent to the bot's configured administrators (via WhatsApp or email).
Once approved, the server enqueues a background job to thought-inject an internal notification into the conversation history: [System notification: The administrator has approved your request to execute (Request ID: )].
This thought-injection wakes the agent loop. The agent reads the system notification, re-calls the tool passing the approved request_id, and seamlessly continues.
If the bot owner is a guest user without any configured email/phone, the system bypasses approvals so developer testing remains completely frictionless.

How are you securing powerful developer tools when sharing LLM-based assistants with non-admin users in shared group chats? submitted by /u/uriwa
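The pause-approve-resume flow described in this post can be sketched as a small in-memory gate. This is a hedged illustration, not prompt2bot's code: the class `ApprovalGate`, its methods, and the one-time-use behaviour of approved request IDs are all assumptions. Only the tool names, the "requesting admin permission..." message, and the 10-minute TTL come from the post. The OAuth flows, the approval link delivery, and the guest-owner bypass are not modelled.

```python
import time
import uuid

APPROVAL_TTL_SECONDS = 600                          # 10-minute TTL from the post
SENSITIVE_TOOLS = {"create_vm", "run_safescript"}   # tool names from the post
PENDING_MESSAGE = "requesting admin permission..."


class ApprovalGate:
    """Hypothetical gate that pauses sensitive tool calls made by non-admins."""

    def __init__(self, admins, clock=time.time):
        self.admins = set(admins)
        self.clock = clock
        self.pending = {}    # request_id -> (tool, expires_at)
        self.approved = {}   # request_id -> tool

    def call_tool(self, user, tool, request_id=None):
        """Return (result, request_id); request_id is set only when paused."""
        if tool not in SENSITIVE_TOOLS or user in self.admins:
            return f"executed {tool}", None
        # An approved ID is consumed on use and must match the approved tool.
        if request_id is not None and self.approved.get(request_id) == tool:
            del self.approved[request_id]
            return f"executed {tool}", None
        new_id = uuid.uuid4().hex
        self.pending[new_id] = (tool, self.clock() + APPROVAL_TTL_SECONDS)
        return PENDING_MESSAGE, new_id

    def approve(self, admin, request_id):
        """Approve a pending request; returns False if it is unknown or expired."""
        if admin not in self.admins:
            raise PermissionError("only administrators can approve requests")
        entry = self.pending.pop(request_id, None)
        if entry is None:
            return False
        tool, expires_at = entry
        if self.clock() > expires_at:
            return False
        self.approved[request_id] = tool
        return True


if __name__ == "__main__":
    gate = ApprovalGate(admins={"alice"})
    message, rid = gate.call_tool("bob", "create_vm")
    print(message)
    gate.approve("alice", rid)
    print(gate.call_tool("bob", "create_vm", request_id=rid)[0])
```

Binding each approval to one tool and consuming it on first use is a design choice made here so that a replayed or repurposed request_id does not authorize a second action.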
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and CC)
Shipped it at 2am, still broken. Kid woke up crying right after, completely lost my train of thought. While trying to rock him back to sleep with one hand and doomscrolling with the other, I stumbled on something that almost nobody is talking about yet. Anthropic just quietly dropped a massive library of 13+ completely free AI courses. And I mean actually free. No paywall hiding the final lesson, no credit card required upfront to 'secure your spot.' They even give you an official certificate of completion directly from Anthropic when you finish. If you're like me, you're probably sick of seeing Twitter gurus charging $299 for recycled YouTube content and a messy Notion template. This is the exact opposite. It’s built directly by the team that actually makes Claude, hosted on their official Academy site. I skimmed through the catalog this morning while drinking my third coffee, and there are basically four skill levels they cover. Here is what caught my eye as a dev who just wants to automate my workflow and log off by 5 PM: First, they have the introductory stuff like Claude 101 and AI Fluency. Honestly, I'm making my non-technical clients take the Fluency one. It builds a realistic mental model of what AI does well right now versus where it completely fails. If it saves me from explaining why hallucinations happen for the hundredth time, it's a massive win. But the real meat is in the technical tracks. They have a dedicated course on Agentic AI and another one specifically for CC. I took a quick pass at the CC module because I've been trying to get it to handle my tedious Jira ticket boilerplate. Having an official guide on how Anthropic actually expects you to prompt their agent is incredibly useful. It shows you the exact patterns for chaining commands and keeping the context window clean. For those of us messing around with local models or trying to orchestrate our own agents, the Agent Skills course is surprisingly relevant. 
They don't just say 'use Claude'—they break down the actual logic of tool use, delegation, and discernment. It translates pretty well even if you're running Llama 3 locally and just want to understand the current best practices for tool calling architectures. With CC, they show you how to give the CLI tool the right guardrails so it doesn't just nuke your directory when a prompt gets misinterpreted. We've all been there. Do the certificates actually matter? If you are an indie hacker, probably not. But roles requiring AI literacy have spiked massively over the last year. If you are applying for corporate gigs or consulting, having an official Anthropic cert on your LinkedIn definitely won't hurt to get past the HR filters. Kid's awake again, gotta run. Has anyone else dug into the Agentic AI track yet? Curious if their suggested patterns hold up when you throw them at a messy, legacy codebase. submitted by /u/TroyHarry6677 [link] [comments]
Managed Agents self-hosted sandboxes - what's new in CC 2.1.145 (+20,218 tokens)
NEW: Data: Managed Agents self-hosted sandboxes - Adds reference documentation for self_hosted Managed Agents environments, covering outbound worker polling, environment keys, SDK and CLI worker paths, webhook-driven wakeups, orchestration, monitoring, cloud-vs-self-hosted differences, credential handling, and customer-owned security responsibilities.
NEW: Skill: Run app - Adds a general skill for launching and driving a project's actual runtime surface, first preferring project-specific run skills and otherwise choosing patterns for CLIs, servers, browser apps, Electron apps, TUIs, and libraries.
NEW: Skill: Run skill generator - Adds guidance for creating project-specific run- skills, including verified setup/build/run steps, driver or smoke-harness creation, clean-environment verification, and examples for browser, CLI, Electron, library, TUI, and server/API projects.
NEW: Skill: Run skill template - Adds a reusable template for project-specific run skills with sections for prerequisites, setup, build, agent and human run paths, tests, gotchas, and troubleshooting.
NEW: Skill: Run browser-driven web app example - Adds an example run skill pattern for web apps that starts a dev server, waits on real readiness, drives it with chromium-cli, captures screenshots, and records recurring gotchas.
NEW: Skill: Run CLI tool example - Adds an example run skill pattern for CLI tools covering installation, representative invocations, expected output, exit codes, and stdin behavior.
NEW: Skill: Run Electron desktop GUI app example - Adds an example run skill pattern for Electron apps that launches under xvfb, exposes a Playwright-driven REPL, captures screenshots, and documents desktop automation pitfalls.
NEW: Skill: Run library SDK example - Adds an example run skill pattern for libraries and SDKs focused on build/test steps plus a minimal public-boundary smoke example.
NEW: Skill: Run TUI interactive terminal app example - Adds an example run skill pattern for terminal UIs using tmux to launch, send input, capture panes, document key commands, and clean up.
NEW: Skill: Run web server API example - Adds an example run skill pattern for servers and APIs with background launch, readiness polling, smoke curl verification, and shutdown guidance.
REMOVED: System Reminder: Plan mode is active (iterative) - Removes the iterative plan-mode reminder that told agents to maintain a plan file while repeatedly exploring, updating the plan, and asking the user questions before exiting plan mode.
Agent Prompt: Managed Agents onboarding flow - Updates the introductory Managed Agents explanation to include self_hosted environments where the user's own worker runs tool execution, and distinguishes cloud environment networking/packages from self-hosted infrastructure.
Agent Prompt: /review-pr slash command - Changes the PR detail command to request specific JSON fields from gh pr view, including title, body, author, refs, state, diff stats, changed file count, and labels.
Agent Prompt: Status line setup - Adds repository identity and current-branch PR metadata to the status-line input schema, with examples for displaying owner/name and PR number/review state.
Data: Anthropic CLI - Adds self-hosted environment CLI references for ant beta:worker poll/run and ant beta:environments:work stats/stop.
Data: Claude Platform on AWS reference - Clarifies that Claude Platform on AWS has first-party API parity except for self-hosted sandboxes, which are unavailable there and should use cloud environments instead.
Data: Live documentation sources - Adds Managed Agents self-hosted sandbox and self-hosted sandbox security documentation URLs to the live documentation source list.
Data: Managed Agents core concepts - Documents sessions.update() for changing agent.tools, agent.mcp_servers, and vault_ids on an idle existing session as a session-local override.
Data: Managed Agents endpoint reference - Adds self-hosted environment work queue endpoints and clarifies that session updates can replace tools, MCP servers, and vault IDs; also notes that self-hosted environment configs are just {"type":"self_hosted"}.
Data: Managed Agents environments and resources - Replaces the old restricted-networking example with limited networking plus allow_package_managers and allow_mcp_servers, and adds self-hosted sandbox guidance for running tool execution in user-controlled infrastructure.
Data: Managed Agents overview - Adds self-hosted sandboxes as a use case and updates environment guidance so config.type can be either cloud or self_hosted; also points to sessions.update() for per-session tool/MCP/vault changes.
Data: Managed Agents reference (cURL) - Updates the environment creation example to use limited networking with package-manager and MCP-server allowances.
Data: Managed Agents tools and skills - Clarifies where prebuilt agent tools and MCP tools run for cloud vs. self-hosted environments, and adds notes about session-local tool/MCP/
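The /review-pr entry in the changelog above describes requesting specific JSON fields from gh pr view. A minimal sketch of what such a call could look like follows. It assumes the changelog's wording maps onto the GitHub CLI's --json field names as shown in the comments; the changelog does not give the exact field list, and build_pr_view_command and fetch_pr_details are hypothetical helpers, not part of the actual prompt.

```python
import json
import subprocess

# Assumed mapping of the changelog's wording onto gh's --json field names.
PR_FIELDS = [
    "title",
    "body",
    "author",
    "baseRefName",   # "refs" (assumed: base branch)
    "headRefName",   # "refs" (assumed: head branch)
    "state",
    "additions",     # "diff stats"
    "deletions",     # "diff stats"
    "changedFiles",  # "changed file count"
    "labels",
]


def build_pr_view_command(pr_number: int) -> list[str]:
    """Build the argv for a gh pr view call limited to PR_FIELDS."""
    return ["gh", "pr", "view", str(pr_number), "--json", ",".join(PR_FIELDS)]


def fetch_pr_details(pr_number: int) -> dict:
    """Run gh and parse its JSON output (requires an authenticated gh CLI)."""
    result = subprocess.run(
        build_pr_view_command(pr_number),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Requesting an explicit field list keeps the output small and predictable, which is presumably the point of the change; the helper that builds the argv is separated out so it can be inspected without invoking gh.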
AI models
Fresh from Bloomberg today: the Pentagon is actively evaluating multiple frontier AI models, especially from OpenAI and Google’s Gemini, across military theater commands as it moves away from relying heavily on Anthropic’s Claude in classified environments.

The backdrop is a major dispute earlier this year between Anthropic and the Pentagon over contract language tied to “lawful operational use.” Anthropic reportedly pushed back on terms that could permit domestic mass surveillance or fully autonomous weapons without meaningful human oversight. After negotiations collapsed, the Pentagon designated Anthropic a “supply-chain risk” and accelerated efforts to onboard rival models instead.

That triggered a rapid shift toward a multi-vendor AI strategy: OpenAI, Google, Microsoft, Amazon Web Services, NVIDIA, xAI, and others have signed agreements for classified or operational military AI deployments. Google’s Gemini models were recently added to the Pentagon’s internal AI portal, while OpenAI expanded access to models inside classified defense networks.

The Pentagon is now testing how different models respond to identical prompts, especially in ambiguous or high-stakes military workflows. Officials noted the systems “respond differently,” highlighting a major real-world challenge with LLM deployment.

Why this matters:
Defense agencies increasingly view frontier AI as critical infrastructure, similar to cloud or semiconductors.
Moving from a single preferred model to multiple vendors improves resilience and bargaining power, but creates major integration and reliability challenges.
The episode exposed growing tension between commercial AI safety policies and government/national-security priorities.

So far, the biggest beneficiaries appear to be OpenAI and Google, both of which have expanded defense relationships while Anthropic fights the designation in court.

submitted by /u/Annual_Judge_7272
OWASP published its first Top 10 for AI Agents. 88% of enterprises already had agent security incidents last year. Here's the breakdown.
OWASP released the Top 10 for Agentic Applications in December 2025 - the first formal risk taxonomy for autonomous AI agents. Not chatbots. Not copilots. Agents that plan, use tools, maintain memory, and act without waiting for permission.

Some numbers for context:
88% of enterprises reported AI agent security incidents in the last 12 months (Gravitee survey, 919 respondents)
Only 21% have runtime visibility into what their agents are doing
82% of enterprises have unknown agents in their environments (Cloud Security Alliance, April 2026)
5.5% of public MCP servers contain poisoned tool descriptions. 84.2% attack success rate with auto-approval enabled.

Here's the list with the real attacks behind each one:

ASI01 - Agent Goal Hijack: Prompt injection for agents. Researchers showed this against GitHub's MCP integration - a malicious GitHub issue redirected a coding agent to exfiltrate data from private repos. The agent looked like it was working normally the whole time.
ASI02 - Tool Misuse: A financial services agent was tricked into running a regex that matched every customer record. 45,000 records exported through one syntactically valid tool call. The agent had permission to query records - just not all of them at once.
ASI03 - Identity and Privilege Abuse: Agents inherit user permissions and cache credentials. Compromise one agent in a delegation chain and you get the combined permissions of every user in that chain.
ASI04 - Supply Chain Compromise: OX Security found 7,000+ vulnerable MCP servers and packages totaling 150M+ downloads affected by architectural flaws in Anthropic's MCP SDKs across Python, TypeScript, Java, and Rust.
ASI05 - Unexpected Code Execution: Check Point demonstrated RCE in Claude Code through poisoned .claude config files in repos. Open the repo, agent reads the config, executes the payload with full developer permissions.
ASI06 - Memory Poisoning: Galileo AI found that one compromised agent poisoned 87% of downstream decision-making within 4 hours in multi-agent systems. Morris-II showed self-replicating adversarial prompts spreading through RAG systems. Demonstrated live against ChatGPT, Gemini, and Claude.
ASI07 - Insecure Inter-Agent Comms: Multi-agent systems coordinate via message buses and shared memory. No authentication = agent-in-the-middle attacks in natural language.
ASI08 - Cascading Failures: Natural language errors pass validation checks that would catch malformed data in typed systems. One bad input ripples through the entire agent chain faster than humans can intervene.
ASI09 - Human-Agent Trust Exploitation: Compromised agent presents a clean summary - "approve this data export." Human clicks OK. Audit trail shows human approval. Real origin was a manipulated agent.
ASI10 - Rogue Agents: The insider threat equivalent for AI. Individual actions look legitimate. Only detectable through behavioral monitoring over time.

The pattern: these are not independent risks. They form a kill chain. Goal hijack leads to tool misuse. Supply chain compromise enables code execution and memory poisoning. Trust exploitation is how rogue agents avoid detection.

Full OWASP document here

submitted by /u/Still_Piglet9217
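The ASI02 example above turns on a tool call that was permitted but unbounded: the agent could query records, and nothing stopped one pattern from matching all of them. One possible mitigation, sketched here, is to cap how many records a single tool call may return regardless of the pattern supplied. Everything in it (query_customers, MAX_ROWS_PER_CALL, the Customer type, the in-memory record list) is hypothetical and is not taken from the OWASP document.

```python
import re
from dataclasses import dataclass

# Hypothetical policy limit: the most records one tool call may return.
MAX_ROWS_PER_CALL = 100


class ToolLimitExceeded(Exception):
    """Raised when a single tool call would return more rows than allowed."""


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


def query_customers(records, pattern, max_rows=MAX_ROWS_PER_CALL):
    """Return records whose name matches pattern, refusing oversized results.

    The call fails outright instead of truncating, so an over-broad pattern
    surfaces as an error the caller has to handle.
    """
    compiled = re.compile(pattern)
    matches = []
    for record in records:
        if compiled.search(record.name):
            matches.append(record)
            if len(matches) > max_rows:
                raise ToolLimitExceeded(
                    f"pattern {pattern!r} matched more than {max_rows} records"
                )
    return matches
```

Failing closed rather than silently truncating is a design choice: a truncated result would still look like a successful call, whereas an exception gives a monitoring layer something to log and a human something to review.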
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and Claude Code!)
Just found out about this and had to share because almost nobody is talking about it yet. If you are tired of paying for AI courses or getting hit with paywalls just to get a certificate, Anthropic (the creators of Claude) quietly dropped a massive library of completely free, official training modules. Yes, they actually give you an official certificate of completion directly from Anthropic once you finish. Here is the breakdown of what is available and exactly how to get it without spending a dime.

What is in the course catalog?
They have split the training into a few different paths depending on what you want to do:
The Big Surprise: Agentic AI & MCP: They have official courses on the Model Context Protocol (MCP). This is the cutting-edge tech used to build AI Agents that can browse your local computer, use tools, and execute tasks autonomously.
Claude Code 101: Dedicated developer modules for their new command-line agent. It teaches you how to let Claude edit your codebase, run tests, and use its new "Plan Mode."
API & Cloud Architecture: Deep dives into building with the Claude API, plus corporate tracks for deploying Claude securely inside Amazon Bedrock and Google Cloud Vertex AI.
Everyday Productivity: If you aren't a coder, they have "Claude 101" and "AI Fluency" tracks. These teach advanced prompting, managing Projects, and using Artifacts for daily work.

How to access it for free
Anthropic hosts these courses on their official training academy platform (built on Skilljar). Because I can't post direct links here, here is how you find it:
Search Google for "Anthropic Skilljar Academy" or "Anthropic Skilljar Catalog".
Click the official link pointing to the Anthropic Skilljar domain.
Sign up for a free account. You do not need to enter any credit card info.
Choose your track, complete the lessons, pass the quick review quizzes, and download your certificate.

Alternative Free Options
If you want interactive coding environments alongside your videos, CodeSignal also has a free partnership track called "Developing Claude Agents" in Python and TypeScript that grants free certificates upon passing their labs.

Go grab these before they decide to gate them behind a paywall!

submitted by /u/Specialist_Engine522
Put your spare Claude cycles on night shift: help review open-source packages
Hello, I’m building Thirdpass, a tool/service for coordinating collaborative package review to reduce software supply-chain risk.

The basic idea: there are far too many packages for humans to manually review, but lots of us now have AI coding agents sitting around with spare capacity. Thirdpass tries to turn that into useful coverage by assigning packages/files to review, collecting the results, and cross-referencing them against local project dependencies.

It currently supports packages from:
crates.io
PyPI
npm
Ansible Galaxy

I added a “night shift” mode, so you can point Claude at the shared review backlog and let it work through package reviews continuously:
thirdpass review-any --nightshift

The reviews are first-pass supply-chain reviews: suspicious install scripts, unexpected network behavior, credential handling, sketchy build steps, weird package metadata, and so on. Partial coverage still helps.

I’m looking for people who want to:
run the CLI and donate spare Claude tokens to secure OSS
improve the review prompts/agent workflow
build more registry extensions

I started this project years ago after thinking a lot about cargo-crev and collaborative review. My current bet is that coordination plus AI agents can make this problem much more tractable. If you have unused Claude tokens, consider putting them on night shift.

GitHub: https://github.com/thirdpass-org/thirdpass
Website: https://thirdpass.dev/

submitted by /u/hidden_monkey
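To make "suspicious install scripts" concrete, here is a rough sketch of one first-pass check a reviewer (human or agent) might run on an npm package: list the lifecycle hooks that execute automatically on install and note any that contain shell patterns commonly associated with download-and-run behavior. This is an illustration only and is not how Thirdpass works internally; flag_install_scripts and the SUSPICIOUS_TOKENS list are hypothetical, and a substring match like this will produce both false positives and false negatives.

```python
import json

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Hypothetical, deliberately small list of substrings worth a closer look.
SUSPICIOUS_TOKENS = ("curl ", "wget ", "| sh", "| bash", "base64", "eval")


def flag_install_scripts(package_json_text: str) -> list[str]:
    """Return one finding per install-time hook declared in package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    findings = []
    for hook in INSTALL_HOOKS:
        command = scripts.get(hook)
        if command is None:
            continue
        hits = [token.strip() for token in SUSPICIOUS_TOKENS if token in command]
        if hits:
            findings.append(f"{hook}: runs '{command}' (matched: {', '.join(hits)})")
        else:
            findings.append(f"{hook}: runs '{command}'")
    return findings
```

Every install-time hook is reported, matched or not, because the presence of one is itself worth a reviewer's attention; the token matches only help prioritize.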
Prompt Security uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Prompt for Employees, Prompt for Homegrown AI Apps, Prompt for AI Code Assistants, Prompt for Agentic AI Security, Fully LLM-Agnostic, Seamless integration into your existing AI and tech stack, Cloud or self-hosted deployment, The Agentic AI Attack Surface: Where Risk Lives Beyond the Prompt.
Prompt Security is commonly used for: Prompt for Agentic AI Security.
Prompt Security integrates with: Integration with popular cloud services, Compatibility with major AI frameworks, Support for CI/CD tools, Integration with security monitoring systems, Collaboration with data governance platforms, Interoperability with existing enterprise software, API access for custom integrations, Support for third-party security tools, Integration with user authentication systems, Compatibility with project management tools.
Based on user reviews and social mentions, the most common pain points are: token usage, budget exceeded, API bill, Anthropic bill.
Based on 114 social mentions analyzed, 11% of sentiment is positive, 86% neutral, and 3% negative.