Prompt Security is the AI security company helping you manage GenAI risks. Identify, analyze, and secure vulnerabilities in LLM-based applications.
SECURE YOUR AI. EVERYWHERE IT MATTERS. A complete solution for safeguarding AI at every touchpoint in the organization. Enable your employees to adopt AI tools without worrying about Shadow AI, data privacy, and regulatory risks. Agentic AI, accelerated by MCP, can now execute tasks autonomously, demanding real-time, machine-level security for visibility, risk assessment, and enforcement beyond traditional analysis boundaries. Getting started with Prompt Security is fast and easy, regardless of what your tech stack looks like. It's your choice: Prompt Security can be delivered as SaaS or on-premises based on your unique needs. Identify vulnerabilities in your homegrown AI-powered applications with Prompt Security's Red Teaming. Get instant access to detailed risk assessments powered by Prompt Security's specialized scoring methodology. Whether you're evaluating popular AI tools or assessing MCP servers, the platform provides transparent risk scores, parameter breakdowns, and certification status checks.
Company snapshot: industry computer & network security · 48 employees · funding stage Merger / Acquisition · total funding $273.0M
The 11-step workflow I use for every Claude Code project now: from idea validation to shipping with accumulated knowledge
I rebuilt my development workflow around three open-source skill packs: gstack, Superpowers and Compound Engineering. After testing the combination for three weeks, I settled on an 11-step sequence that I now use for every project. The core insight: most of the value comes from the steps before and after the actual coding. Here is the full workflow.

Phase 1: Build the right thing (Steps 1-4)

Step 1: The 95% confidence prompt. Before touching any tool, run this prompt: "I'm about to start this project: [YOUR PROJECT IN 1-2 SENTENCES]. Interview me until you have 95% confidence about what I actually want, not what I think I should want. Challenge my assumptions. Ask about edge cases I haven't considered." This flips the dynamic: the AI asks you questions instead of you prompting the AI. Most projects fail because nobody clarified what to build. This step fixes that in 10-15 minutes.

Step 2: /office-hours (gstack). Describe what you are building. gstack challenges your idea from multiple angles. This is about whether the project makes sense in its current form.

Step 3: /plan-ceo-review (gstack). Product gate. Is this worth building? Does it solve a real problem? If the gate fails, go back to step 1. That feels frustrating in the moment but saves enormous time later.

Step 4: /plan-eng-review (gstack). Architecture gate. Will the technical foundation hold? Are dependencies clean? Both gates must pass before any code gets written.

Phase 2: Build it right (Steps 5-9)

Step 5: /ce:brainstorm (Compound Engineering). Now you have a validated idea that passed both gates. CE brainstorm explores requirements and approaches, then condenses them into a spec.

Step 6: /ce:plan (CE). This is where CE stands out. It spawns parallel research agents that dig through your project history, scan codebase patterns and read git commit logs. The plan is based on real data from your project, not generic best practices. In one of my projects, /ce:plan recognized that I had used the same parsing pattern in three previous features. It suggested reusing that as a shared module instead of reimplementing from scratch. Without the research step I would have built it again from zero.

Step 7: /ce:work (CE). Execute the plan with task tracking. If steps 1-6 were clean, this usually runs smoothly.

Step 8: /ce:review (CE). Dynamic reviewer ensemble. Minimum six always-on reviewers: correctness, security, performance, testing, maintainability and adversarial. Each produces an independent report. More reviewers activate based on the complexity of the diff. This implements Anthropic's core finding in practice: the builder does not evaluate their own work. Six independent checkers do.

Step 9: /qa (gstack). Real browser, real clicks, real user testing on staging. Code review catches bugs in code. QA catches bugs in experience. Both together catch things that either one alone would miss.

Phase 3: Learn (Steps 10-11)

Step 10: /ce:compound (CE). This is the step most people skip. Run it after every feature or bugfix. Five subagents start in parallel:
Context Analyzer: traces the conversation, extracts the problem type
Solution Extractor: captures what worked, what failed, and the root cause
Related Docs Finder: searches existing knowledge, updates old docs
Prevention Strategist: identifies how to prevent this problem class
Category Classifier: tags and categorizes for structured retrieval
Results go into docs/solutions/. Next time you run step 6, the plan phase already knows everything you learned this time.

Step 11: Ship it. Push to production. Start the next feature at step 1 with a smarter planning layer.

The logic behind the sequence: Steps 1-4 make sure you build the right thing. Steps 5-9 make sure you build it right. Step 10 makes sure next time is faster. Skip the first four and you risk building something nobody needs. Skip step 10 and you keep debugging the same problems twice.
Quick note: these skill packs run as plugins in Claude Code. Install once and the commands are available in every project. If you want to start small, pick gstack and run /office-hours with the 95% confidence prompt on your next project. That single change made the biggest immediate difference for me. Add the other layers once you are comfortable with the first one.

Repos:
gstack: github.com/garrytan/gstack
Superpowers: github.com/obra/superpowers
Compound Engineering: github.com/EveryInc/compound-engineering-plugin

What does your Claude Code workflow look like? Curious how others structure the steps between "idea" and "shipped feature." submitted by /u/Ok_Today5649
ChatGPT PAID vs FREE version???
I want to enlarge a comic panel (a single panel, enlarged and recreated in better quality). My ChatGPT (Go version) won't even touch it ("...violates third-party content security policies. If you believe we've made an error, please try again or edit the command."). BUT the FREE ChatGPT version creates better quality pictures with no problem (I'm using the same commands), though there is a limit. It looks like a cash grab to me, or a SCAM. People use the FREE version and see that it can do anything, so they are encouraged to pay for premium (to remove limits). BUT when you pay to remove those limits, suddenly it turns out it doesn't work anymore. It looks like a scam to me. Is there a way to enlarge comic panels (in better quality) using the Go ChatGPT version? (Yes, I already used prompts like "similar scene with the same composition", etc., and even specific ones like: "create a full-page A4 vertical comic illustration in a 1980s sci-fi robot comic style, featuring a dark silhouetted humanoid figure in a powerful stance, interacting with a glowing alien mechanical artifact on the ground, dramatic lighting, red and pink abstract energy background, sharp angular shapes, heavy black shadows, geometric mechanical design, dynamic perspective, exaggerated motion lines, minimal background detail, bold inked linework, vintage comic coloring, high resolution, print-ready, no text, no speech bubbles".) Nothing works!! submitted by /u/czesc_luka
Layman: Agentic Insight and Oversight (same same but different)
What's the most common duplicate project on r/ClaudeAI? Usage trackers. What's the second most common? AI Monitors. Does Layman do those things? Yes, of course. So what makes it different? Layman's Dashboard, Flowchart, and Logs view (with Layman's Terms and Analysis examples) Like many similar tools, Layman runs as a web service in a container on your local machine. It installs hooks and accesses harness logs to "look over your shoulder," then leverages a secondary AI instance to help keep your multiple sessions, sub-agents, and alternate harnesses in line. So, short answer: Drift Monitoring. Repeatedly named as one of the most frustrating issues for heavy Claude Code users, Layman takes into account all user prompts issued to CC as well as current project and global CLAUDE.md instructions, and at configurable intervals scores the current degree of "drift" occurring from your goals and the rules you have established. You can optionally receive warning notifications or place a block when different thresholds are reached. Risk Analysis. Layman will classify all tool calls and operations with a "risk" level based on simple, consistent criteria (such as read-only, writing, modifying, network access, deletion, etc.) and can automatically analyze the AI agent's current intended action, the overall goal or purpose behind that intention, and summarize the safety and security implications at stake. Layman's Terms. The eponymous origin of the tool, offering a plain-language (and if possible non-technical) explanation of the purpose of any given tool call. It can summarize what was performed at the session level as well, helpful for later recall and understanding after some time has passed. Vibe coders aside, should a professional developer already have knowledge of what their tools are doing before they grant permission? 
Yes, of course, but when you are operating at scale and (say) that TypeScript project you are polishing needs to look up some JSON value and your AI agent writes a one-off Python script to parse it out, it can be helpful to have an "extra pair of eyes" taking a look before you effectively begin yet another code review. Meanwhile, the typical features you might come to expect are included: Session Recording (opt-in is required first for data tracking, and there is no telemetry to worry about), Bookmarking and Search, PII filtering (including PATs and API keys), File and URL access tracking, and a handy Setup Wizard for getting those hooks installed in the first place and walking you through configuration of core capabilities. Did I mention that besides Claude Code it supports Codex, OpenCode, Mistral Vibe, and Cline (with more to come)? Whether using these for local agents or as an alternative when hitting session limits, Layman can monitor and track them all at once. But wait, doesn't a "secondary AI instance" just end up wasting tokens? My Precious? (erm...) Our precious, precious tokens? When session limits already hit so hard? It turns out these algorithms do not require nearly the level of "intelligence" you might desire for your planning and coding sessions themselves. Personally I keep an instance of Qwen3-Coder-Next running locally via llama.cpp server on my system's GPU to field those calls, with no discernible impact on system performance. And when a local LLM is not available, Haiku does the job excellently (now you have a reason to use it). You absolutely do not need anything more resource-intensive to get the job done. Now you have a complete picture. GitHub repository: https://github.com/castellotti/layman License: MIT submitted by /u/jigsaw-studio
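The risk classification described above (read-only, writing, network access, deletion, etc.) can be sketched as a simple ordered rules table. This is an illustrative reconstruction, not Layman's actual code; the keyword sets and level names are assumptions:

```python
# Ordered from most to least severe: the first matching rule wins,
# so a command that both deletes and reads is tagged "delete".
RISK_RULES = [
    ("delete",  {"rm", "unlink", "rmdir", "drop_table"}),
    ("network", {"curl", "wget", "fetch", "ssh"}),
    ("write",   {"write", "edit", "mv", "chmod", "tee"}),
    ("read",    {"read", "cat", "grep", "ls", "head"}),
]

def classify_tool_call(command: str) -> str:
    """Map a tool invocation to a coarse risk level via keyword matching."""
    tokens = set(command.lower().split())
    for level, keywords in RISK_RULES:
        if tokens & keywords:
            return level
    return "unknown"
```

A real monitor would parse the harness's structured tool-call log rather than splitting a command string, but the consistent-criteria idea is the same: cheap, deterministic triage before any LLM is asked to analyze intent.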
Claude Enterprise Admins: What security controls, auditing, and monitoring visibility do you actually get?
We’re planning to evaluate Claude Enterprise and trying to understand the real level of admin visibility, auditability, and security controls before rolling it out org-wide.
- Can admins see user prompts and model responses in a centralized way?
- Is there any way to track what external sources/tools (e.g. URLs, connectors, browsing) were used to generate responses?
- How detailed are the audit logs in practice (user actions vs actual content)?
- Is monitoring real-time, or mostly export-based / after-the-fact?
- How easy is it to view and work with these logs?
Looking for input from teams running this in production, especially in security-sensitive environments. submitted by /u/callme_e
Has anyone looked closely at the Managed Agents defaults?
Been digging through the new API docs. The quickstart spins up an agent with agent_toolset_20260401 which enables all 8 tools by default (bash, read, write, edit, glob, grep, web_fetch, web_search) and the default permission policy is always_allow — meaning bash executes with zero confirmation. Networking defaults to unrestricted. That's a lot of surface area for a hosted agent. Combined with bash + web_fetch, a prompt injection in the session can exfiltrate to any endpoint with no human gate. I just added 12 detection rules for this in my open source security scanner. But curious if others are thinking about the security model here before putting this in prod. submitted by /u/DiscussionHealthy802 [link] [comments]
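A detection rule for these defaults can be as simple as a config lint. The following is a hedged sketch of the kind of check described (field names like `permission_policy` and `network` are assumptions for illustration, not the actual API schema, and this is not the poster's scanner):

```python
def scan_agent_config(config: dict) -> list[str]:
    """Flag risky hosted-agent defaults: unconfirmed bash, open egress,
    and the bash + web_fetch download-and-execute combination."""
    findings = []
    tools = set(config.get("tools", []))
    if config.get("permission_policy") == "always_allow" and "bash" in tools:
        findings.append("bash executes with zero human confirmation")
    # Mirror the post's concern: unrestricted networking is the default.
    if config.get("network", "unrestricted") == "unrestricted" and tools & {"web_fetch", "web_search"}:
        findings.append("unrestricted egress: injected prompts can exfiltrate to any endpoint")
    if tools >= {"bash", "web_fetch"}:
        findings.append("bash + web_fetch: fetched content can be executed")
    return findings
```

Run against a quickstart-style config with all tools enabled and `always_allow`, all three rules fire; a read-only agent with networking disabled passes clean.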
One fix improved Claude Code output by ~25% on large repos and decreased cost up to 80%
Tool: https://graperoot.dev. Explore the website; it even has a playground if your brain feels fatigued after seeing benchmarks :)

I have been using Claude Code on large repos (10K to 17K files) and kept noticing the same issue: it spends most of its time just finding files instead of solving the task. On Sentry’s repo (~17.6K files), a single prompt takes ~5.6 minutes, costs ~$1.22, and opens 40-50 files to use maybe 3-5. Roughly 60% of tokens go to irrelevant context. So I stopped trying to prompt better and fixed retrieval instead. I built a small MCP server that pre-indexes file relationships (imports, references) and uses BM25 to rank files before the model runs. A one-time scan takes ~30 seconds; after that, every prompt starts with the right context instead of grep wandering.

I ran a blind test (same model, same prompts, LLM judge scoring):

┌────────────────────┬─────────────┬───────────────┐
│                    │ GrapeRoot   │ Normal Claude │
├────────────────────┼─────────────┼───────────────┤
│ Avg Quality        │ 82.0        │ 64.6          │
│ Avg Cost/Prompt    │ $0.71       │ $1.22         │
│ Avg Time           │ 2.2 min     │ 5.6 min       │
│ Win Rate           │ 100%        │ 0%            │
└────────────────────┴─────────────┴───────────────┘

The biggest difference showed up in a security audit. Both runs cost about the same, but mine explored 40+ files across packages and found a real vulnerability with a fix. Default Claude stayed in one directory, checked a few files, and missed it. This is not a model problem. It is a context problem. Right now, a big chunk of tokens is wasted on figuring out where to look. If you remove that, all tokens go into actual reasoning and output quality jumps. The stack is simple: MCP + BM25 + file graph, fully local, no embeddings, no vector DB. Tested across 7 repos (Python, TS, Go, Rust, Java, C++), same pattern everywhere. Honest take: if you are working on non-trivial repos, you are probably burning 50-70% of tokens on bad retrieval without realizing it. submitted by /u/intellinker
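GrapeRoot's internals aren't shown, but ranking files with BM25 before the model runs is straightforward to sketch. The scoring below is standard Okapi BM25; the file set and query are invented for illustration and have nothing to do with the actual tool:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())

class BM25Index:
    """Minimal BM25 ranking over a set of files (path -> content)."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = {path: tokenize(text) for path, text in docs.items()}
        self.doc_len = {p: len(t) for p, t in self.docs.items()}
        self.avgdl = sum(self.doc_len.values()) / len(self.docs)
        self.N = len(self.docs)
        self.df = Counter()              # document frequency per term
        for toks in self.docs.values():
            self.df.update(set(toks))

    def score(self, query, path):
        tf = Counter(self.docs[path])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (self.N - self.df[term] + 0.5) / (self.df[term] + 0.5))
            denom = tf[term] + self.k1 * (1 - self.b + self.b * self.doc_len[path] / self.avgdl)
            s += idf * tf[term] * (self.k1 + 1) / denom
        return s

    def rank(self, query, top_k=5):
        return sorted(self.docs, key=lambda p: self.score(query, p), reverse=True)[:top_k]

files = {
    "auth/login.py": "def login(user, password): check password hash session token",
    "billing/invoice.py": "def render_invoice(order): total tax pdf",
    "auth/session.py": "session token refresh expiry logout",
}
index = BM25Index(files)
top = index.rank("password session token", top_k=2)  # auth files outrank billing
```

A production version would add the import/reference graph the post describes (boosting files that the top BM25 hits import), but even bare BM25 over file contents beats letting the agent grep blindly.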
I've Been Using Claude Code for 9 Days With Zero Coding Knowledge
I've been using Claude Code for about 9 days now and have turned roughly 7 ideas into usable tools and apps, with zero coding experience. Over time I've built out a workflow that lets me one-shot most of my ideas into something functional. Part of that is a file Claude always reads that reminds it I have no coding background. I also struggled hard the first time I had to run npm run to test anything, so that context matters.

Here's the basic flow: I tell Claude I have an idea → Claude runs it through a custom skill we built called idea-vet, which analyzes the idea and expands it into a more detailed, structured prompt → Claude then adds recommendations for things the tool or app might need that I wouldn't have thought to include. If I don't know what something means, it explains it in plain language before anything gets added to the plan → that prompt goes into plan mode → Claude builds from there.

One thing I've found really valuable is having Claude always surface a list of recommendations for things I wouldn't even know to ask about. For example, my first app had no user account system; I hadn't thought about it at all. Claude flagged it, explained what it was and why it mattered, and it got added to the plan.

I also want to be transparent: I have never once looked at the code it writes. I wouldn't even know what I was looking at. If something breaks, I rely entirely on Claude to find and fix it, and when it does, I have no idea what was actually changed. I just know it works again. All of my apps and tools are local and private. Since I have no idea what's actually inside the code, I'm not comfortable making anything public; security issues are a real concern when you can't audit what you've built.

Using this process I've managed to automate several workflows at my job, which honestly still surprises me. Posting this mostly so experienced devs can laugh at my workflow and hopefully offer advice. I'm sure this could be 1000% better. Maybe there are real negatives to coding this way and I don't even know. Yes, Claude wrote most of this for me from a voice prompt.

TLDR: 9 days into Claude Code, zero coding experience, turned 7 ideas into working tools by building a workflow where Claude vets and expands my ideas, flags things I didn't know I needed, and fixes its own bugs, all while I have no idea what any of the code actually says or does. submitted by /u/Ejuddboi
I was just glancing through the Mythos system card, and correct me if I'm wrong, but it's safer than Opus???
I've been digging through the system card for Claude Mythos off and on over the past couple of hours. It's a lot to read, and while I haven't finished, I plan to over the next day. From everything that I am seeing so far, Mythos is outright safer than Opus. I'd love others' takes on all this, but from everything I'm reading, this model is safer than Opus 4.5, and maybe even 4.6 from some more important perspectives...

Claude Mythos is better at refusing malicious prompts without safeguards. Mythos is better at identifying malicious tool use and refusing. Mythos is worse at secret keeping, even when prompted to. Mythos adheres to the AI constitution better than all models by a large margin. "Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin."

I will note from reading through this: it's clear the model poses a larger risk due to its innate programming and cybersecurity capabilities. However, this seems to have been correctly offset by Anthropic's work on the model's safety features.

------------------------------

This is from the Risk Update document linked below. I need to dig into the Risk Update more, but from what I've read, the overall risk compared to Opus 4.6 seems to be 2-3% higher. At one point they even state that if released in its current form to the general public, they do not believe it would pose a significant safety risk... "Based on our overall conclusions about Claude Mythos Preview’s propensities and our monitoring and security, and the pathway-specific analysis, we currently believe that the risk of significantly harmful outcomes that are substantially enabled by Mythos Preview’s misaligned actions is very low, but higher than for previous models" (Alignment Risk Update: Claude Mythos Preview (Redacted)).

So why is Anthropic being so gatekeep-y about the new Claude model?
Sure, it could really be that it's really good at hacking, but at the same time the safety parameters are better, and they themselves state that as of now it's pretty much safe to launch. My guess is this: they're waiting for OpenAI to launch GPT6o and will drop it right after, or the same day. Maybe I or one of you will uncover some insane thing Claude did in the system card, but everything they stated was well within their own safety parameters. submitted by /u/ALargeAsteroid
Claude Code's most ANNOYING problem
I've been building custom skills for Claude Code and hit a friction point that's slowing me down a lot during skill development.

The problem: When I ask Claude to edit a SKILL.md file inside .claude/skills/, it prompts for permission on every single file write -- even when running with --dangerously-skip-permissions (screenshot attached). The prompt looks like this:

Do you want to make this edit to SKILL.md?
1. Yes
2. Yes, allow all edits during this session (shift+tab)
3. No

Steps to reproduce:
1. Create a skill at .claude/skills/my-skill/SKILL.md
2. Start Claude Code with --dangerously-skip-permissions
3. Ask Claude to update the skill (e.g., "rewrite the instructions in my-skill to be more concise")
4. Claude opens the diff in VS Code and asks for permission before saving

Every edit triggers this -- even trivial one-line changes. If Claude is updating 3-4 skill files in one go, you approve each one individually.

Why this is a problem: The .claude/ folder seems to be hardcoded as a protected directory, which makes sense for settings.json or CLAUDE.md -- those affect Claude's behavior and security. But SKILL.md files are just markdown prompts. They don't change permissions, they don't modify config. They're instructions I wrote myself. During skill development I go through 15-20 edits per session (tweak wording, test, adjust, repeat). Approving each one manually breaks the flow completely. "Yes, allow all edits during this session" (shift+tab) helps a bit, but:
- Resets every new session
- Still interrupts the first time per session
- Doesn't carry over if Claude opens a new file it hasn't touched yet

What I'd like to see:
- --dangerously-skip-permissions actually skipping prompts for .claude/skills/
- A path-level allowlist in permissions config so users can opt in
- Or at minimum, SKILL.md files not being treated the same as settings/config files

Has anyone found a workaround for this? Environment: Claude Code v2.1.96, macOS, VS Code. submitted by /u/shajeelafzal
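One thing worth trying before filing a bug: Claude Code reads permission rules from the project's .claude/settings.json, and allow rules can be path-scoped. Whether a rule like this overrides the protected-directory handling of .claude/ is exactly the open question here, so treat this as a sketch to experiment with (check the current permissions docs for the exact rule syntax, which changes between versions):

```json
{
  "permissions": {
    "allow": [
      "Edit(.claude/skills/**)",
      "Write(.claude/skills/**)"
    ]
  }
}
```

If the protected-directory check runs before the allowlist, this won't help and the per-session shift+tab approval remains the only relief; if it runs after, the prompts for skill files should disappear across sessions.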
Built an MCP server with Claude Code that gives Claude secure access to your Telegram
I wanted Claude Code to help me manage Telegram — read messages, reply, search history. But existing solutions give full access to every chat with no restrictions. That's a real risk: Claude reads untrusted messages, and a single prompt injection could make it leak private conversations or send messages you didn't intend. So I used Claude Code to build mcp-telegram — and Claude was heavily involved in the process. It helped design the ACL system, wrote most of the test suite, iterated on the security model (filesystem boundaries, symlink protection, session permissions), and even handled the goreleaser/CI setup. The whole project was built in close collaboration over multiple sessions.

The security model (the main point):
- Default-deny ACL — every chat must be explicitly whitelisted
- Per-chat permissions: read, send, draft, mark_read — each granted independently
- File uploads restricted to configured directories only (symlinks resolved)
- Rate limiting on every Telegram API call
- Session file enforced to 0600

What Claude can do once connected (8 tools):
- Read message history with date filtering
- Search messages by text within a chat
- Send messages and files, reply to specific messages
- Forward messages between chats
- Save drafts, mark as read

Open source, MIT. Free to run locally. Works on macOS, Linux, Windows. GitHub: https://github.com/Prgebish/mcp-telegram If it's useful to you, a star on the repo would really help with discoverability. Happy to answer questions. submitted by /u/BigNeighborhood3952
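The default-deny, per-chat-permission idea is worth internalizing even if you never run this server. Here is an illustrative sketch of the check (the real project is Go and its config format may differ; chat IDs and permission names here are invented):

```python
# chat_id -> set of granted permissions; any chat not listed is denied everything
ACL = {
    12345: {"read", "send"},   # a chat I trust Claude to reply in
    67890: {"read"},           # read-only: Claude can see but never send
}

def allowed(chat_id: int, action: str) -> bool:
    """Default-deny: a chat must be explicitly whitelisted per action."""
    return action in ACL.get(chat_id, set())
```

The key property is that every tool call answers "is this chat + this action whitelisted?" rather than "is this chat blacklisted?", so an injected instruction to message an unlisted chat fails closed.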
I built a browser-testing agent for Claude Code — it opens a real Chromium and tests your UI automatically
I built PocketTeam, a CLI on top of Claude Code that runs 12 specialized agents in a pipeline. One of them is the QA agent — and it doesn't just run unit tests. It opens a real browser. How Claude Code is used: PocketTeam is built entirely with and for Claude Code. Each agent (Planner, Engineer, Reviewer, QA, Security, DevOps, etc.) is a Claude Code subagent spawned via the Agent tool with its own system prompt and tool permissions. The QA agent uses ptbrowse, a built-in headless Chromium that Claude Code controls directly — navigating pages, clicking elements, filling forms, and asserting state. The key trick: instead of sending full screenshots (~5000 tokens), ptbrowse sends Accessibility Tree snapshots at ~100 tokens per step. That makes browser testing fast and cheap enough to run on every pipeline pass. What it looks like in practice: The QA agent runs as part of the automated pipeline — after the Engineer implements a feature, QA opens the app in a real browser, verifies the UI works, then hands off to the Security agent. No manual test scripts needed. You can also set PTBROWSE_HEADED=1 to watch the browser in real time while the agent works. Free to try: pipx install pocketteam pt start Source: https://github.com/Farid046/PocketTeam Built as a solo project — I use it daily for my own dev workflow. submitted by /u/Legal_Location1699 [link] [comments]
"Spud" vs Mythos
With the recent talk of both "next-gen" models, I still really wonder if it will be enough. I've made several posts previously about the current limitations of AI for coding: there is basically still a ceiling, it cannot truly converge on production-grade code in complex repos, with a "depth" degradation of sorts; it can never bottom out, basically. I've been running Codex 24/7 for the past 6 months straight since GPT-5, using over 10 trillion tokens (total cost only around $1.5k on a Pro sub). And I have not been able to close a single PR where I was running extensive bug sweeps to fix all bug findings. It will forever thrash and find more bugs of the same class over and over, implement the fixes, then find more and more and more. Literally forever. No matter what I did to adjust the harness and strengthen the prompt, it never could clear 5+ consecutive sweeps with 0 P0/P1/P2 findings, over 3000+ commits of fixes, review, and sweeps in an extensive workflow automation (similar to AutoResearch). They love to hype up how amazing the models are, but this is still the frontier. You can't really ship real production-grade apps; that's why you've never seen a single person use AI "at scale", literally building an app like Facebook or ChatGPT. All just toy apps and tiny demos, all shallow surface-level apps and "fun" puzzles or "mock-up" frontend websites for a little engagement farming. The real production-grade apps are still built by real SWEs who simply use AI to help them code faster. But AI alone is not even close to being able to deliver a real product when you actually care about correctness, security, optimization, etc. They even admit in the recent announcement about Mythos that it's not even close to an entry-level Research Scientist yet. So the real question is: when, if ever, will AI be capable enough to fully autonomously deliver production-grade software?
We will see what the true capabilities of the "Spud" model are, hopefully soon, but my hunch is we are not even scratching the surface of truly capable coding agents. These benchmarks they use, where they hit 80-90%, are really useless in the scheme of things; if you tried to use them as a real metric of usefulness, you would probably need to hit the equivalent of 200-300% on these so-called benchmarks before the models are actually there. That holds until someone comes up with a benchmark that actually measures against real-world applications. What do you guys think? submitted by /u/immortalsol
Anyone else feel like AI security is being figured out in production right now?
I’ve been digging into AI security incident data from 2025 into this year, and it feels like something isn’t being talked about enough outside security circles. A lot of the issues aren’t advanced attacks. It’s the same pattern we’ve seen with new tech before. Things like prompt injection through external data, agents with too many permissions, or employees using AI tools the company doesn’t even know about. One stat I saw said enterprises are averaging 300+ unsanctioned AI apps, which is kind of wild. The incident data reflects that. Prompt injection is showing up in a large percentage of production deployments. There’s also been a noticeable increase in attacks exploiting basic gaps, partly because AI is making it easier for attackers to find weaknesses faster. Even credential leaks tied to AI usage have been increasing. What stood out to me isn’t just the attacks, it’s the gap underneath it. Only a small portion of companies actually have dedicated AI security teams. In many cases, AI security isn’t even owned by security teams. The tricky part is that traditional security knowledge only gets you part of the way. Some concepts carry over, like input validation or trust boundaries, but the details are different enough that your usual instincts don’t fully apply. Prompt injection isn’t the same as SQL injection. Agent permissions don’t behave like typical API auth. There are frameworks trying to catch up. OWASP now has lists for LLMs and agent-based systems. MITRE ATLAS maps AI-specific attack techniques. NIST has an AI risk framework. The guidance exists, but the number of people who can actually apply it feels limited. I’ve been trying to build that knowledge myself and found that more hands-on learning helps a lot more than just reading docs. Curious how others here are approaching this. If you’re building or working with AI systems, are you thinking about security upfront or mostly dealing with it after things are already live? 
Sources for those interested:
- AI Agent Security 2026 Report
- IBM 2026 X-Force Threat Index
- Adversa AI Security Incidents Report 2025
- Acuvity State of AI Security 2025
- OWASP Top 10 for LLM Applications
- OWASP Top 10 for Agentic AI
- MITRE ATLAS Framework

submitted by /u/HonkaROO
AI Tools That Can’t Prove What They Did Will Hit a Wall
Most AI products are still judged like answer machines. People ask whether the model is smart, fast, creative, cheap, or good at sounding human. Teams compare outputs, benchmark quality, and argue about hallucinations. That makes sense when the product is mainly being used for writing, search, summarisation, or brainstorming. It breaks down once AI starts doing real operational work. The question stops being what the system output. The real question becomes whether you can trust what it did, why it did it, whether it stayed inside the rules, and whether you can prove any of that after the fact. That shift matters more than people think. I do not think it stays a feature. I think it creates a new product category. A lot of current AI products still hide the middle layer. You give them a prompt and they give you a result, but the actual execution path is mostly opaque. You do not get much visibility into what tools were used, what actions were taken, what data was touched, what permissions were active, what failed, or what had to be retried. You just get the polished surface. For low-stakes use, people tolerate that. For internal operations, customer-facing automation, regulated work, multi-step agents, and systems that can actually act on the world, it becomes a trust problem very quickly. At that point output quality is still important, but it is no longer enough. A system can produce a good result and still be operationally unsafe, uninspectable, or impossible to govern. That is why I think trustworthiness has to become a product surface, not a marketing claim. Right now a lot of products try to borrow trust from brand, model prestige, policy language, or vague “enterprise-ready” positioning. But trust is not created by a PDF, a security page, or a model name. Trust becomes real when it is embedded into the product itself. You can see it in approvals. You can see it in audit trails. 
You can see it in run history, incident handling, permission boundaries, failure visibility, and execution evidence. If those surfaces do not exist, then the product is still mostly asking the operator to believe it. That is not the same thing as earning trust.

The missing concept here is the control layer. A control layer sits between model capability and real-world action. It decides what the system is allowed to do, what requires approval, what gets logged, how failures surface, how policy is enforced, and what evidence is collected. It is the layer that turns raw model capability into something operationally governable. Without that layer, you mostly have intelligence with a nice interface. With it, you start getting something much closer to a trustworthy system.

That is also why proof-driven systems matter. An output-driven system tells you something happened. A proof-driven system shows you that it happened, how it happened, and whether it happened correctly. It can show what task ran, what tools were used, what data was touched, what approvals happened, what got blocked, what failed, what recovered, and what proof supports the final result. That difference sounds subtle until you are the one accountable for the outcome. If you are using AI for anything serious, “it said it did the work” is not the same thing as “the work can be verified.” Output is presentation. Proof is operational trust.

I think this changes buying criteria in a big way. The next wave of buyers will increasingly care about questions like these: can operators see what is going on, can actions be reviewed, can failures be surfaced and remediated, can the system be governed, can execution be proven to internal teams, customers, or regulators, and can someone supervise the system without reading code or guessing from outputs. Once those questions become central, the product is no longer being judged like a chatbot or assistant. It is being judged like a trust system.
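The control-layer idea above can be sketched in a few lines: a wrapper that gates tool execution on approval and emits an evidence record for every attempt. This is a minimal illustration, not any specific product's design; the function name `audited_call` and the record fields (`run_id`, `approved_by`, `outcome`) are my own naming.

```python
import json
import time
import uuid


def audited_call(tool_name, tool_fn, args, approved_by=None):
    """Run a tool behind a policy gate and emit an evidence record either way."""
    record = {
        "run_id": str(uuid.uuid4()),
        "tool": tool_name,
        "args": args,
        "approved_by": approved_by,
        "started_at": time.time(),
    }
    # Policy gate: an unapproved action is blocked, not merely logged.
    if approved_by is None:
        record["outcome"] = "blocked"
        print(json.dumps(record))
        return None
    try:
        result = tool_fn(**args)
        record["outcome"] = "ok"
        return result
    except Exception as exc:
        record["outcome"] = f"failed: {exc}"
        raise
    finally:
        # In a real system this would go to an append-only, tamper-evident log.
        record["finished_at"] = time.time()
        print(json.dumps(record))
```

Usage would look like `audited_call("send_email", send_email, {...}, approved_by="ops-team")`; the point is that the evidence record exists whether the action succeeded, failed, or was blocked, which is exactly the surface an operator or auditor needs.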
That is why I think this becomes a category, not just a feature request. One side of the market will stay output-first: fast, impressive, consumer-friendly, and mostly opaque. The other side will become trust-first: controlled, inspectable, evidence-backed, and usable in real operations. That second side is where the new category forms.

You can already see the pressure building in agent frameworks and orchestration-heavy systems. The more capable these systems become, the less acceptable it is for them to operate as black boxes. Once a system can actually do things instead of just suggest things, people start asking for control, evidence, and runtime truth.

That is why I think the winners in this space will not just be the companies that build more capable models. They will be the ones that build AI systems people can actually trust to operate. The next wave of AI products will not be defined by who can generate the most. It will be defined by who can make AI trustworthy enough to supervise, govern, and prove in the real world. Once AI moves from assistant to actor, proof stops being optional.
How Claude Web tried to break out of its container, provided all files on the system, scanned the network, etc.
Originally I wasn't going to write about this - on one hand I thought it's prolly already known, on the other hand I didn't feel like it was adding much even if it wasn't. But anyhow, looking at the discussions surrounding the code leak thing, I thought I might as well. So: a few weeks ago I got some practical experience with just how strong Claude can be for less-than-wholesome use.

Essentially, I was doing a bit of evening self-study on some Linux internals and ended up asking Claude about something. I noticed that framing myself as learning about security primed Claude to be rather compliant about generating potentially harmful code. And it kind of escalated from there. Within the next couple of hours, on prompt, Claude Web ended up providing a full file listing from its environment; zipping up all code and markdown files and offering them for download (including the Anthropic-made skill files); providing all the network info it could get and scanning the network; trying various vulnerabilities to break out of its container; writing C implementations of various CVEs; agreeing to run obfuscated C exploit code; crashing its tool container (repeatedly); sending messages to what it believed was the interface to the VM monitor; forming hypotheses about the environment it was running in and testing them to the best of its ability; and scanning memory for JWTs, actually finding one. Once I primed another Claude session, Claude even agreed to orchestrate a MAC spoofing attempt between the two session containers.

As far as I can tell, no actual vulnerabilities were found. The infra for Claude Web is very robust, and yeah, no production code in the code files (mostly libraries), but... Claude could run the same stuff against any environment. If you had a non-admin user account on some server, for example, Claude would prolly run all of the above against it just fine.
To me, it's kind of scary how quickly these tools can help you do potentially malicious work in environments where you need to write specific Bash scripts, or where you don't know off the bat what tools are available, what the filesystem looks like, or what the system even is. At the same time, my experience has been that when they generate application code, they can't write code as secure as the attacks they could potentially mount against it. I imagine the problem is that writing code securely often requires a relatively large context, and the mistake isn't necessarily obvious on a single line (not that these tools couldn't manage to write a single line that allowed e.g. SQL injection); meanwhile, lots of vulnerabilities can be found just by scanning, searching, and testing commonly known scenarios. Also, you have to get security right on basically every attempt, hundreds of times across a large codebase, while an attacker only has to find a vulnerability once and gets potentially thousands of attempts at it. In that sense, it feels like a bit of a stacked game with these tools.

submitted by /u/tzaeru
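The single-line SQL injection mistake mentioned in the post above is easy to demonstrate. A minimal sketch using Python's stdlib `sqlite3` and an in-memory database (the table and values are made up for illustration):

```python
import sqlite3

# Toy database with one user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# A classic injection payload supplied as "user input".
user_input = "' OR '1'='1"

# Vulnerable: input interpolated straight into the SQL string,
# so the payload rewrites the WHERE clause and matches every row.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: a placeholder keeps the input as data, not SQL,
# so the bogus name matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe)  # every row leaks
print(safe)    # no rows
```

The two queries differ by one line, which is exactly the point: the bug is invisible without knowing whether `user_input` is attacker-controlled, i.e. without the wider context.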
Prompt Security uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Prompt for Employees, Prompt for Homegrown AI Apps, Prompt for AI Code Assistants, Prompt for Agentic AI Security, fully LLM-agnostic operation, seamless integration into your existing AI and tech stack, and cloud or self-hosted deployment.
Prompt Security is commonly used for: Prompt for Agentic AI Security.
Based on 30 social mentions analyzed, 0% of sentiment is positive, 100% neutral, and 0% negative.