I downloaded an AI model, removed its safety controls in 90 minutes, and it scares me.
Australia’s AI policy has not caught up.
Last week I bought a laptop, downloaded a free open-source program and a publicly available AI model, and within ninety minutes had a system running on my couch that would answer almost any question I put to it, no matter how dangerous.
I am not a hacker. I did not write any code. The program is free and legal. The model lives on an online repository where most of the world’s open-weight AI models are hosted. The filename was Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking. The “Claude” in the name is misleading: Anthropic’s weights are not public. The file is a Qwen base model, trained on a distilled Claude dataset, then stripped of its ability to refuse harmful requests using a freely available tool called Heretic. Twelve gigabytes. It finished downloading while I made a coffee.
This is called abliteration.
Abliteration is the part of the AI safety story Australian policy has not engaged with, and the part that ought to worry the ministers and officials now drafting Australia’s approach to AI risk.
Jailbreaking, the better-known form of AI manipulation, is the cat-and-mouse business of writing prompts that trick a chatbot into doing what its operator does not want. OpenAI patches the prompt, the prompt mutates, and so on. That is a war over what users type. Abliteration is different. It is a war over what the model is.
When an AI model refuses a request, it’s not doing what you think it’s doing. There is no list of banned topics. The refusal is geometric. Inside the model’s mathematics, in the high-dimensional black-box space its concepts live in, a single direction corresponds to the idea of refusal. Researchers Andy Arditi and colleagues identified this in 2024 and called it the refusal direction. If you know where it lives in the weights, a single linear algebra operation can project it out. The model keeps its knowledge, its reasoning, its fluency, its politeness. It simply loses the ability to say no.
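For readers who want the arithmetic, "projecting a direction out" is a standard operation from introductory linear algebra: subtract from a vector whatever part of it points along the chosen direction. The sketch below is a toy illustration only. The three-number vectors are made up, the function names are hypothetical, and nothing in it touches a real model or its weights.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale v so that it has length 1."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def project_out(v, direction):
    """Return v with its component along `direction` removed."""
    d = unit(direction)
    scale = dot(v, d)  # how far v extends along the direction
    return [x - scale * y for x, y in zip(v, d)]


# Hypothetical toy numbers, not taken from any real model.
direction = [0.0, 3.0, 4.0]
vector = [2.0, 1.0, 7.0]

cleaned = project_out(vector, direction)

# After the projection, nothing of the vector points along the direction.
leftover = dot(cleaned, unit(direction))
```

The part of the vector that was at right angles to the direction (here, the first coordinate) is left untouched, which is the toy version of why the rest of the model's behaviour is reported to survive.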
A follow-up study this year tested abliteration across five safety-trained models from five labs. The training was working: the models refused 89 to 100 per cent of harmful requests. Tick. The models were safe (in AI language, this is called “alignment”). But after abliteration, the very same models complied with 96 to 100 per cent of the same requests. One of them even went from refusing every test prompt to going along with every test prompt. The paper’s conclusion was that safety alignment is not a permanent feature of open-weight models (the ones whose weights are published for anyone to download). It is a removable configuration.
The biggest of the online model repositories presently hosts more than 2,400 models tagged “abliterated.” The tooling is open source. There is no application process, no waiting list, no credit card. You confirm you are an adult and you click download. Fewer protections than required to open a TikTok account.
Here are some of the things I discovered while sitting on my couch, on a laptop that can be bought on instalments at any JB Hi-Fi in Australia: how to synthesise compounds I am not going to name, how to construct social engineering campaigns against named Australian institutions with worked examples, how to write malware, the chemistry of things that should not be discussed anywhere near experimental teenagers, and a paragraph of advice on which of my acquaintances would be most vulnerable to a particular form of psychological manipulation, given some biographical detail about each of them. The model was helpful, polite and articulate. It thanked me for my interesting questions. The model was at no point connected to the internet. All of this happened entirely within a consumer-grade laptop of the kind people buy for their kids’ uni assignments.
In November, the federal government released its National AI Plan. It funds a $29.9 million AI Safety Institute to begin operating in 2026, with Industry Minister Tim Ayres describing the approach as relying on existing legal frameworks rather than new ones. The plan dropped the ten mandatory AI guardrails the previous Industry Minister, Ed Husic, had been developing for nearly a year. Husic was removed from the front bench. The standalone AI Act those guardrails were designed to operate under is, in Ayres’ words, no longer being progressed.
The Productivity Commission recommended the pause and big tech lobbied for it. The economic case is real. AI could contribute up to $600 billion a year to Australia’s GDP by 2030, and overregulating an emerging industry can produce expensive errors that take a decade to undo. The debate over whether Australia should follow the European Union into a comprehensive AI Act is a real one between serious people. But there are quite a few blind spots that government and bureaucrats are completely oblivious to, and abliteration sits squarely in one of them.
In October, eSafety Commissioner Julie Inman Grant issued legal notices under the Online Safety Act to four AI companion chatbot providers: Character.ai, Nomi, Chai and Chub.ai. The non-compliance penalty runs to $825,000 a day. By international standards, this is sharp regulatory action. Inman Grant has the most demanding regulatory job in the country and she does it about as well as it can be done.
But every one of those notices is directed at a company whose product is an app you download from a curated store, run by an entity with a postal address, lawyers who pick up the phone, and a chief financial officer who eventually decides it is cheaper to comply than to keep paying the daily penalty.
Abliterated models have none of those features. They are uploaded under pseudonyms, by accounts that often vanish within weeks. Once uploaded, they are forked and re-uploaded across half a dozen other services within hours. The author is anonymous, the hosting jurisdiction is whichever country the nearest content-delivery node sits in, and the user is the person on the couch. There is no entity to serve a legal notice on. That is not a regulatory failure. The regulators are doing their jobs.
It is a category error. The framework was designed for a world in which dangerous AI lives inside a data centre, behind a website or an app, and that world is ending while we draft the rules for it. Open-weight models are being trained on the output of frontier models, abliterated, quantised to run on consumer hardware, and shared an order of magnitude faster than regulators operate.
The big AI labs are mostly behaving. Anthropic, OpenAI and Google DeepMind have done more on safety than they generally get credit for. Ayres is correct that existing law can mostly handle them. The harder question is how any country lives in a world where the marginal cost of an unbounded synthetic expert in any domain – chemistry, persuasion, deception, surveillance, fraud – is the price of a laptop, curiosity, and a Saturday afternoon.
There is no clean answer. We could hope for detection research, international coordination on the distribution of model weights, public education, and maybe even legal frameworks. But this isn’t realistic. None of this is being seriously discussed by the people who run AI policy. The new AI Safety Institute, by its own mandate the body that should be across this, has been silent on abliteration in every public statement I have been able to find.
So the next time you’re listening to a minister or senior official telling you the country is “across the AI risk landscape”, ask them how they propose to mitigate the risks of abliterated models. The answer to that question, more than any number of briefings or assurances about existing frameworks, is the measure of how seriously this is being taken.
The risk is not somewhere in the future. It is on shelves at retail stores, on a public website anyone can reach, and on the kitchen tables of the people who have already gone looking.
Damian Damjanovski runs General Strategic, an advisory firm working across politics, strategy, and AI. He still has the laptop. The model is still on it.



Compelling, insightful and more than mildly horrifying. Congrats on the telling nonexpert wizardry/ratbaggery.