blintz 7 days ago | next |

These watermarks are not robust to paraphrasing attacks: AUC ROC falls from 0.95 to 0.55 (barely better than guessing) for a 100 token passage.

The existing impossibility results imply that these attacks are essentially unavoidable (https://arxiv.org/abs/2311.04378) and not very costly, so this line of inquiry into LLM watermarking seems like a dead end.

jkhdigital 7 days ago | root | parent | next |

I spent the last five years doing PhD research into steganography, with a particular focus on how to embed messages into LLM outputs. Watermarking is basically one-bit steganography.

The first serious investigations into "secure" steganography began about 30 years ago, and it was clearly a dead end even back then. Sure, watermarking might be effective against lazy adversaries--college students, job applicants, etc.--but it can be trivially defeated otherwise.

All this time I'd been lamenting my research area as unpopular and boring when I should've been submitting to Nature!

impossiblefork 6 days ago | root | parent |

Though, surely secure steganography with LLMs should be quite easy?

Presumably there are things like key exchanges that look like randomness, and then you could choose LLM output using that randomness in such a way that you can send messages that look like an LLM conversation?

Someone starts the conversation with a real message 'Hello!' and then you do some kind of key exchange where what is exchanged is hard to distinguish from randomness, and use those keys to select the probabilities of the coming tokens from the LLM. Then once the keys are established you use some kind of cipher to generate random-looking ciphertext and use that as the randomness for selecting words in the final part?

Surely that would work? If there is guaranteed insecurity, it's for things like watermarking, not for steganography?
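Roughly what I have in mind, as a toy sketch. Everything here is a stand-in: next_token_candidates fakes a deterministic "model" both parties share, and the key stream stands in for a real cipher. The bits of the (encrypted) message pick which plausible token gets emitted, so the cover text reads like ordinary model output, and the receiver decodes by re-running the same model and key.

    import hashlib

    def key_stream_bits(shared_key: bytes, n: int):
        """Pseudorandom bits derived from the shared key (stand-in for a real cipher)."""
        bits, counter = [], 0
        while len(bits) < n:
            block = hashlib.sha256(shared_key + counter.to_bytes(4, "big")).digest()
            for byte in block:
                for i in range(8):
                    bits.append((byte >> i) & 1)
            counter += 1
        return bits[:n]

    def next_token_candidates(context):
        """Stand-in for an LLM: two plausible, distinct next words for the context."""
        vocab = ["the", "a", "weather", "today", "is", "was", "nice", "cold", "and", "sunny"]
        h = int(hashlib.sha256(" ".join(context).encode()).hexdigest(), 16)
        return [vocab[h % len(vocab)], vocab[(h + 1) % len(vocab)]]

    def embed(message_bits, shared_key):
        stream = key_stream_bits(shared_key, len(message_bits))
        hidden = [m ^ k for m, k in zip(message_bits, stream)]  # "encrypt" the message bits
        context, tokens = ["hello"], []
        for bit in hidden:
            candidates = next_token_candidates(context)
            tok = candidates[bit]        # the hidden bit picks which plausible token to emit
            tokens.append(tok)
            context.append(tok)
        return tokens

    def extract(tokens, shared_key):
        stream = key_stream_bits(shared_key, len(tokens))
        context, bits = ["hello"], []
        for tok, k in zip(tokens, stream):
            candidates = next_token_candidates(context)
            bits.append(candidates.index(tok) ^ k)  # recover the bit, undo the key stream
            context.append(tok)
        return bits

    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    cover = embed(msg, b"shared-secret")
    assert extract(cover, b"shared-secret") == msg

Nothing about this is secure as written; it just illustrates why the construction seems plausible when sender and receiver share the model, context and key.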

sbszllr 7 days ago | root | parent | prev | next |

I’ve been working in the space since 2018. Watermarking and fingerprinting (of models themselves and outputs) are useful tools but they have a weak adversary model.

Yet that doesn’t stop companies from making claims like these, or, what’s worse, people from buying into them.

sgt101 7 days ago | root | parent | prev | next |

I think this misses a key point.

If there were a law that AI generated text should be watermarked then major corporations would take pains to apply the watermark, because if they didn't then they would be exposed to regulatory and reputational problems.

Watermarking the text would enable people training models to avoid it, and it would allow search engines to determine not to rely on it (if that was the search engine preference).

It would not mean that all text not watermarked was human generated, but it would mean that all text not watermarked and provided by institutional actors could be trusted.

auggierose 7 days ago | root | parent |

> It would not mean that all text not watermarked was human generated, but it would mean that all text not watermarked and provided by institutional actors could be trusted.

What?

sgt101 7 days ago | root | parent |

well - trusted in the sense that the unwatermarked text was human generated ;o)

soulofmischief 7 days ago | root | parent |

You simply cannot trust that non-watermarked text was human generated. Laws can be broken. Companies are constantly being found in violation of the law.

You're trading the warm feeling of an illusion of trust for a total lack of awareness and protection against even the most mild attempt at obfuscation. This means that people who want to hurt or trick you will have free rein to do it, especially when the target, like your 90-year-old grandmother, lacks the skill to notice.

sgt101 6 days ago | root | parent |

Here's an example of why I think this would work.

GDPR.

How many breaches of privacy by large organizations occur in the EU? When they occur, what happens?

On the other hand - what's the story in the USA?

Alternatively what would have happened if we simply said "data privacy cannot be maintained, no laws will help"?

soulofmischief 6 days ago | root | parent | next |

Even if you achieved perfect compliance with law-abiding organizations, that does nothing to protect you against any individual organization which does not abide by local laws.

Consider any hacker from a non-extraditing rogue state.

Consider any nation state actor or well-equipped NGO. They are more motivated to manipulate you than Starbucks.

Consider the slavish, morbid conditions faced by foreign workers who manufacture your shoes and mine your lithium. All of your favorite large companies look the other way while continuing to employ such labor today, and have a long history of partnering with the US government to overthrow legitimate foreign democratic regimes in order to maintain economic control. Why would these companies have better ethics regarding AI-generated output?

And consider the US government, whose own intelligence agencies are no longer forbidden from employing domestic propaganda, and which will certainly get internal permission to circumvent any such laws while still exploiting them to its benefit.

sgt101 6 days ago | root | parent |

Ok, so what protects you from these folks? What positive measure can be suggested here - that is better than the measures I suggest and subsumes them?

auggierose 6 days ago | root | parent |

The solution is not to watermark anything, because it is futile. Teach your citizens that anything that can be machine generated, will be machine generated. Where exactly is the problem here?

rlpb 6 days ago | root | parent | prev |

> How many breaches of privacy by large orgnaizations occur in the EU? When they occur, what happens?

Malicious non-compliance is still common IME. Enforcement is happening, but so far it has focused only on the largest and most egregious abuses.

bko 8 days ago | prev | next |

This article goes into it a little bit, but an interview with Scott Aaronson goes into some detail about how watermarking works[0].

He's a theoretical computer scientist, but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide an undetectable digital watermark. It nudges certain words to be paired together slightly more often than chance, and you can mathematically derive, with some level of certainty, whether an output or even a section of an output was generated by the LLM. It's really clever, and apparently he has a working prototype in development.

One workaround he hasn't figured out yet is asking for an output in language X and then translating it into language Y. But that may still eventually be addressed.

I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.

That part starts around 1 hour 25 min in.

> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.

https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...
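As I understand the scheme Aaronson has described publicly, the sampling rule is to emit the token i that maximizes r_i ^ (1 / p_i), where r_i comes from the keyed pseudorandom function applied to the recent N-gram. A rough sketch, with a made-up vocabulary and made-up probabilities:

    import hashlib

    SECRET_KEY = b"watermark-key"

    def prf(ngram: tuple, token: str) -> float:
        """Keyed pseudorandom value in [0,1) for (context n-gram, candidate token)."""
        data = SECRET_KEY + "|".join(ngram).encode() + b"|" + token.encode()
        return int(hashlib.sha256(data).hexdigest(), 16) / 2**256

    def watermarked_pick(ngram: tuple, probs: dict) -> str:
        """probs maps each candidate token to the model probability p_i."""
        return max(probs, key=lambda tok: prf(ngram, tok) ** (1.0 / probs[tok]))

    ngram = ("my", "favorite", "fruit", "is")
    probs = {"mango": 0.4, "papaya": 0.3, "durian": 0.2, "lychee": 0.1}
    print(watermarked_pick(ngram, probs))

    # Detection idea: recompute r for each emitted token with the key; on
    # watermarked text, sum(-log(1 - r_chosen)) runs systematically higher
    # than the roughly 1 per token you'd expect from unwatermarked text.

The nice property (per Aaronson) is that over the randomness of the key, each token is still emitted with its original probability p_i, so the output distribution is unchanged for anyone without the key.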

nicce 7 days ago | root | parent | next |

I don't think that provable watermarking is possible in practice. The method you mention is clever, but before it can work you would need to know the probability that every other source, including humans, could have generated the same output for the same purpose. Only if the watermarked model is far more likely than any other source to have produced the text does the watermark give a strong indication.

You would also need to define the confidence as a function of output length: the longer the output, the more certain you can be. What is the smallest number of tokens below which nothing can be proved at all?

You would also need to include humans. Can you define that probability for a human? And all LLMs would have to use the same system uniformly.

Otherwise, watermarking is doomed to be misused without being reliable enough, and false accusations will follow.

A_D_E_P_T 7 days ago | root | parent |

I agree. I'd add that not only could human-written content fail the test -- it's also the case that humans will detect the word pairing, just as they detected "delve" and various other LLM tells.

In time most forms of watermarking along those lines will seem like elements of an LLM's writing style, and will quickly be edited out by savvy users.

123yawaworht456 7 days ago | root | parent | prev | next |

>So it nudges certain words to be paired together slightly more than random and you can mathematically derive with some level of certainty whether an output or even a section of an output was generated by the LLM.

hah, every single LLM already watermarks its output by starting the second paragraph with "It is important/essential to remember that..." followed by inane gibberish, no matter what question you ask.

AlienRobot 7 days ago | root | parent |

I've always felt you'd be able to tell someone uses Reddit because they'll reply to a comment starting the sentence with "The problem is that..."

Now LLMs are trained on Reddit users.

littlestymaar 7 days ago | root | parent | prev | next |

Sounds interesting, but it also sounds like something that could very well be circumvented with a technique similar to speculative decoding: you use the watermarked model the way you'd use the fast draft model in speculative decoding, and check whether another model agrees with it. But instead of correcting the token every time the two models disagree, as you would in speculative decoding, you only need to change it often enough to break the watermark detection function (maybe you'd change every other mismatched token, or maybe one in five would be enough to push the signal-to-noise ratio below the detection threshold).

You wouldn't even need access to an unwatermarked model; the "correcting model" could even be watermarked itself, as long as it doesn't use the same watermarking function.

Or am I misunderstanding something?
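For concreteness, a minimal sketch of the attack described above. The two "models" here are toy stand-ins (any two different LLMs would do); every k-th disagreement gets swapped, which dilutes the watermark without a full rewrite.

    import random

    def watermarked_next(context):
        """Stand-in for the watermarked model's next-token choice."""
        return random.Random(" ".join(context) + "|wm").choice(["alpha", "beta", "gamma"])

    def other_next(context):
        """Stand-in for a second, differently-watermarked (or unwatermarked) model."""
        return random.Random(" ".join(context) + "|other").choice(["alpha", "beta", "gamma"])

    def scrub(prompt, n_tokens, swap_every=3):
        context, out, disagreements = list(prompt), [], 0
        for _ in range(n_tokens):
            tok = watermarked_next(context)
            alt = other_next(context)
            if alt != tok:
                disagreements += 1
                if disagreements % swap_every == 0:  # swap only a fraction of mismatches
                    tok = alt
            out.append(tok)
            context.append(tok)
        return out

    print(" ".join(scrub(["hello"], 20)))

How many swaps are actually needed would depend on the watermark's detection threshold; the point is only that a partial, cheap token-substitution pass already attacks the statistical signal.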

jkhdigital 7 days ago | root | parent |

No you've got it right. Watermarks like this are trivial to defeat, which means they are only effective against lazy users like cheating college students and job applicants.

nprateem 7 days ago | root | parent | prev |

Or just check whether text contains the word delve and it's most likely AI generated. I fucking hate that word now.

namanyayg 8 days ago | prev | next |

"An LLM generates text one token at a time. These tokens can represent a single character, word or part of a phrase. To create a sequence of coherent text, the model predicts the next most likely token to generate. These predictions are based on the preceding words and the probability scores assigned to each potential token.

For example, with the phrase “My favorite tropical fruits are __.” The LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there’s a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where it won’t compromise the quality, accuracy and creativity of the output.

This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark. This technique can be used for as few as three sentences. And as the text increases in length, SynthID’s robustness and accuracy increases."

Better link: https://deepmind.google/technologies/synthid/
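For intuition, here is a generic illustration of the "adjust the probability scores" idea with a keyed bias over candidate tokens. This is not SynthID's actual algorithm (the Nature paper describes a keyed tournament over sampled candidates); the fruits and scores are made up.

    import hashlib, math, random

    KEY = b"demo-key"

    def preferred(context: str, token: str) -> bool:
        """Keyed hash marks roughly half the vocabulary as 'preferred', varying with context."""
        h = hashlib.sha256(KEY + context.encode() + token.encode()).digest()
        return h[0] % 2 == 0

    def nudged_sample(context: str, scores: dict, bias: float = 1.0) -> str:
        """Boost the scores of preferred tokens slightly, then sample from the softmax."""
        logits = {t: s + (bias if preferred(context, t) else 0.0) for t, s in scores.items()}
        total = sum(math.exp(v) for v in logits.values())
        r, acc = random.random(), 0.0
        for tok, v in logits.items():
            acc += math.exp(v) / total
            if r <= acc:
                return tok
        return tok

    scores = {"mango": 2.0, "lychee": 1.5, "papaya": 1.4, "durian": 0.5}
    print(nudged_sample("My favorite tropical fruits are", scores))

    # Detection: count how often emitted tokens are "preferred" for their context;
    # watermarked text shows significantly more than the ~50% baseline.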

baobabKoodaa 7 days ago | root | parent | next |

I'm fascinated that this approach works at all, but that said, I don't believe watermarking text will ever be practical. Yes, you can do an academic study where you have exactly 1 version of an LLM in exactly 1 parameter configuration, and you can have an algorithm that tweaks the logits of different tokens in a way that produces a recognizable pattern. But you should note that the pattern will be recognizable only when the LLM version is locked and the parameter configuration is locked. Which they won't be in the real world. You will have a bunch of different models, and people will use them with a bunch of different parameter combinations. If your "detector" has to be able to recognize AI-generated text from a variety of models and a variety of parameter combinations, it's no longer going to work. Even if you imagine someone bruteforcing all these different combos, the trouble is that some of the combos will produce false positives just because you tested so many of them. Want to get rid of those false positives? Go ahead, make the pattern stronger. And now you're visibly altering the generated text to an extent where it becomes a quality issue.

In summary, this will not work in practice. Ever.

TeMPOraL 7 days ago | root | parent | next |

Even with temperature = 0, LLMs are still non-deterministic, as their internal, massively parallelized calculations are done with floating point arithmetic, which is order-dependent. Running the same LLM with the exact same parameters multiple times might still yield slightly different probabilities in the output, making this watermarking scheme even less robust.

jkhdigital 7 days ago | root | parent | next |

This isn't necessarily true, it just depends on the implementation. I can say that because I've published research which embeds steganographic text into the output of GPT-2 and we had to deal with this. Running everything locally was usually fine--the model was deterministic as long as you had the same initial conditions. The problems occurred when trying to run the model on different hardware.

nprateem 7 days ago | root | parent | prev | next |

That's not my experience unless LLM providers are caching results. It's frustratingly difficult to get it to output substantially different text for a given prompt. It's like internally it always follows mostly the same reasoning for step 1, then step 2 applies light fudging of the output to give the appearance of randomness, but the underlying structure is generally the same. That's why there's so much blog spam that all pretty much read the same, but while one "delves" into a topic another "dives" into it.

How long until they can write genuinely unique output without piles of additional prompting?

SirMaster 5 days ago | root | parent |

Hmm, I ask LLMs to write me stories all the time, and I only give it a couple sentences as a prompt, loosely describing the setting of the story. And if I prompt it the exact same way, the events of the story are usually very different.

emporas 7 days ago | root | parent | prev |

In practice, every programmer or writer who gets LLM output does a lot of rewriting of already existing code or already existing text. Stitching together parts of many LLM outputs is the only way to use an LLM effectively, even stitching together parts from different LLMs, which I do all the time.

Recognizing only parts of a watermark, with many watermarked fragments scattered all around, doesn't seem possible at all, in my mind.

They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a very guilty person who uses it all the time, doesn't even try to improve the answer, and always hands over the LLM answer in one piece.

At the end of the day, it will lead to so many false accusations that people will stop trusting it. In chess, false accusations of cheating have been happening all the time, for 15 years or more. Right now former world chess champion Kramnik has, in the span of two months, accused over 50 top chess players of cheating, including the five-time US champion Nakamura.

If software like that gets applied to schools and universities, we are going to have the fun of our lives.

bgro 8 days ago | root | parent | prev |

Couldn’t this be easily disrupted as a watermark system by simply changing the words to interfere with the relative checksum?

I suspect sentence structure is also being used, or more likely is the primary "watermark". Similar to how you can easily identify that something is at least NOT a Yoda quote based on it having the wrong structure. Combine that with other negative patterns, like the quote containing Harry Potter references instead of Star Wars, and you can start to build up a profile of trends like this statement.

By rewriting the sentence structure and altering the usual wording instead of directly copying the raw output, it seems like you could defeat any current raw watermarking.

Though this hasn't stopped Google and others in the past from using bad science and stats to make unhinged, entitled claims, like when they added captcha problems everybody said would be "literally impossible" for bots to solve.

What a surprise that they turned out to be trivial to automate, and that the data they produce can be sold for profit at the expense of mass consumer time.

scarmig 7 days ago | root | parent |

In principle, it seems like you could have semantic watermarking. For instance, suppose I want a short story. There are lots of different narrative and semantic aspects of it that each carry some number of bits of information: setting, characters, events, and those lie on a probability distribution like anything else. You just subtly shift the probability distribution of those choices, and then it's resistant to word choice, reordering, and any transformation that maintains its semantic meaning.

akomtu 7 days ago | root | parent |

Much simpler: make every sentence contain an even number of words. Then the chance of 10 sentences in a row all being even is about 0.1%.
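Quick sanity check of the numbers, assuming sentence-length parity is essentially a coin flip in normal writing:

    def all_even(text: str) -> bool:
        """True if every sentence has an even word count."""
        sentences = [s for s in text.split(".") if s.strip()]
        return all(len(s.split()) % 2 == 0 for s in sentences)

    print((1 / 2) ** 10)   # ~0.00098, i.e. about 0.1%
    print(all_even("This sentence contains exactly six words. Short one here too."))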

ruuda 7 days ago | prev | next |

Some comments here point at impossibility results, but after screening hundreds of job applications at work, it's not hard to pick out the LLM writing, even without a watermark. My internal LLM detector is now so sensitive that I can tell when my confirmed-human colleagues used an LLM to rephrase something, when it's longer than just one sentence. The writing style is just so different.

Maybe if you prompt it right, it can do a better job of masking itself, but people don't seem to do that.

auggierose 7 days ago | root | parent |

So, how many times did you actually get the confirmation that an LLM has/has not been used?

My guess is zero times. So, you are not describing an experiment here, you are just describing how you built up your internal bias.

kuhewa 7 days ago | root | parent | next |

Probably not entirely fair. e.g. After enough sentences it is trivially easy to identify LLM output. So you repeatedly get the opportunity to test a sentence or two, guess the provenance and then realise it is the first sentence in several paragraphs of generated output.

exe34 6 days ago | root | parent |

without external validation, you still just have a gut feeling. it doesn't matter if it's the first or thousandth.

I can tell if somebody ate cornflakes or oatmeal for breakfast just by looking at how they walk. I'm always right. you better believe me. I've seen thousands of people walk.

kuhewa 6 days ago | root | parent |

If the cornflakes had been sitting in old milk for a week, then yes, I would believe you could tell by how they walk, either hunched over with a cramping stomach or with shit pants. LLM text is quite similar.

exe34 6 days ago | root | parent |

Yes because it would be completely different if it were oatmeal.

kuhewa 6 days ago | root | parent |

If the oatmeal hadn't sat in old milk for a week, then yes, it would not likely cause GI distress.

exe34 6 days ago | root | parent |

and yet if it had, it would.

kuhewa 6 days ago | root | parent |

Let's make one of your breakfast foods eggs. Sometimes you won't notice from the walk if they just had normal eggs. However, when the eggs are also rotten and were overindulged in, you can tell from the person's walk that they ate them for breakfast, and due to the noxious sulfur smell emanating from the diarrhoea in the person's business casual slacks, you can tell with a high degree of confidence it wasn't the rotten-milk cornflakes but one of a very small number of sulfur-rich foods, probably eggs.

Bad human resume text and overuse of unmodified LLM output are both detectable, but they are detectable because they are bad in quite different ways.

Regarding the original resume reader's notion that they can detect LLM text with a high degree of accuracy, it is not their LLM output detection specificity I would take issue with (similarly, despite stating validation is critical, I would bet you, too, are pretty confident when you see an entire page of blogspam or marketing copy that you regard as LLM generated, despite it rarely being marked as such). Rather, it is their sensitivity, as I am sure occasional use, and especially slightly modified output from LLMs, gets by them now and again without them knowing.

exe34 6 days ago | root | parent |

yes, it's like men who think they can always spot makeup on a woman. or the economists who predicted 19 of the last 5 market crashes. no need for external validation.

kuhewa 6 days ago | root | parent |

The makeup is a great analogy. I can bet with 99.9% confidence that when I say a woman is wearing makeup, I'm correct (or it is tattooed on or similar). When it's obvious, it's obvious. However, quite often I simply don't detect the makeup at all.

The economists are not as good of an analogy; it's almost the converse of the makeup example, since that is a high rate of false positives.

exe34 6 days ago | root | parent |

that's why I gave both examples - you have no evidence which camp you are in, without ground truth. if you add gut feeling to gut feeling, you don't get evidence.

kuhewa 6 days ago | root | parent |

One camp is a high specificity camp, the other is a high sensitivity camp. You definitely know which camp you are in if you are only making the argument that you can detect true cases of LLM use without many false positives -- I've already admitted less blatant LLM use goes undetected, so I am not arguing for high sensitivity.

And we have plenty of evidence of hallmarks of LLM use, we can even replicate the LLM resume generation process if we wanted. There is plenty of useful "training data" available even if you don't have a validated set of resumes submitted for this type of role at this type of company from this demographic of applicants.

Basically what you are trying to argue is that you can't have confidence that the animals you see people walking down the street on leashes are dogs unless you ask the owners whether they are dogs or not... AND that it doesn't matter that dogs are highly distinct from other domestic pets AND that we've seen many verified dogs before in other contexts, AND have even bred different varieties of dogs on our own.

I highly doubt that you maintain that standard for inductive inference across the board in your own practice. Life would be very difficult if you refused to make inferences about novel things (with any confidence) based on generalised patterns derived from other, similar cases.

exe34 6 days ago | root | parent | next |

> if you are only making the argument you can detect true cases of LLM use without many false positives

this seems at odds with the ongoing issues at universities where professors blindly trust AI detectors that label their own work as AI generated. you'd think if it were that obvious, they would have high specificity.

with your dog example, children absolutely have to learn by checking with their parents - you show them a dog and teach them the word, they will apply it to cats and goats and you have to correct them.

you are like a child pointing at every animal and calling it a dog, but refusing to shift your position when your elders tell you no, that's a goat.

auggierose 6 days ago | root | parent | prev |

If there are alien shape shifters around which pose as dogs and look and behave just like them, then yes, your dog example is the same.

ksaj 8 days ago | prev | next |

Some of the watermarking is really obvious. If you write song lyrics in ChatGPT, watch for phrases like "come what may" and "I stand tall."

It's not just that they are (somewhat) unusual phrases, it's that ChatGPT comes up with those phrases so very often.

It's quite like how earlier versions always had a "However" in between explanations.

ksaj 3 days ago | root | parent | next |

I had to follow up: I told my partner about watermarking.

We asked ChatGPT to explain the meaning of "come what may" - a phrase it generates very often in lyrics - and it responded by needing proof that we were human.

It's definitely a watermark.

GaggiX 8 days ago | root | parent | prev | next |

ChatGPT does not have a watermark.

espadrine 7 days ago | prev | next |

The academic paper: https://www.nature.com/articles/s41586-024-08025-4

They use the last N prefix tokens, hash them (with a keyed hash), and use the resulting random values to sample the next token via an 8-wise tournament: random bits are assigned to each of the top 8 preferred tokens, pairwise comparisons are made, and the token with the larger bit is kept. (Yes, it seems complicated, but apparently it increases the watermarking accuracy compared to straightforward nucleus sampling.)

The negative of this approach is that you need to rerun the LLM, so you must keep all versions of all LLMs that you trained, forever.
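A rough sketch of the tournament idea as described above: a keyed hash of the recent context assigns a pseudorandom bit to each candidate, candidates are knocked out pairwise, and the higher bit wins. Details differ from the paper (which effectively runs a much deeper tournament), and the probabilities below are made up.

    import hashlib, random

    KEY = b"synthid-demo"

    def g_bit(context: tuple, token: str, layer: int) -> int:
        """Keyed pseudorandom bit for (recent context, candidate token, tournament layer)."""
        data = KEY + "|".join(context[-4:]).encode() + token.encode() + bytes([layer])
        return hashlib.sha256(data).digest()[0] & 1

    def tournament_sample(context: tuple, probs: dict, rounds: int = 3) -> str:
        # Draw 2^rounds candidates from the model's distribution (with replacement)...
        tokens, weights = list(probs), list(probs.values())
        candidates = random.choices(tokens, weights=weights, k=2 ** rounds)
        # ...then knock them out pairwise, keeping the candidate with the larger bit.
        for layer in range(rounds):
            nxt = []
            for a, b in zip(candidates[0::2], candidates[1::2]):
                nxt.append(a if g_bit(context, a, layer) >= g_bit(context, b, layer) else b)
            candidates = nxt
        return candidates[0]

    context = ("the", "weather", "today", "is")
    probs = {"sunny": 0.5, "cold": 0.3, "wet": 0.15, "purple": 0.05}
    print(tournament_sample(context, probs))

Detection then only needs the key and the tokenizer (score how often emitted tokens win on their g-values), which is why, as noted below, you don't actually have to rerun the LLM.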

mmoskal 7 days ago | root | parent | next |

They actually run a 2^30-way tournament (they derive an equivalent form that doesn't require 2B operations). You do not need to run the LLM; it only depends on the tokenizer.

espadrine 7 days ago | root | parent |

You’re right. I understood it to require taking the top 2^30 tokens, but instead they sample 2^30 times with replacement.

Too bad they only formulate the detection positive rate empirically. I am curious what the exact probability would be mathematically.

samatman 7 days ago | prev | next |

This is information-theoretically guaranteed to make LLM output worse.

My reasoning is simple: the only way to watermark text is to inject some relatively low-entropy signal into it, which can be detected later. This has to a) work for "all" output for some values of all, and b) have a low false positive rate on the detection side. The amount of signal involved cannot be subtle, for this reason.

That signal has a subtractive effect on the predictive-output signal. The entropy of the output is fixed by the entropy of natural language, so this is a zero-sum game: the watermark signal will remove fidelity from the predictive output.

This is impossible to avoid or fix.

thornewolf 7 days ago | root | parent | next |

you are correct if we suppose we are at a global optimum. however, consider this example:

i have two hands

i have 2 hands

these sentences communicate the same thing, but one could be a watermarked result. we can apply this kind of meaning-preserving word/phrase change many times over and be confident something is watermarked, while having avoided any semantic shifts.

jkhdigital 7 days ago | root | parent | prev |

You're not wrong, but natural language has a lot of stylistic "noise" which can be utilized as a subliminal channel without noticeably degrading the semantic signal.

mateus1 8 days ago | prev | next |

Google is branding this in a positive light but this is just AI text DRM.

gwbas1c 7 days ago | root | parent | prev | next |

Like all things a computer can or can't do, DRM isn't inherently bad: it's how it's used that's a problem.

I.e., DRM can't change people's motivations. It's useful for things like national security secrets and trade secrets, where the people who have access to the information have very clear motivations to protect that information, and very clear consequences for violating the rules that DRM is in place to protect.

In this case, the big question of whether AI watermarking will work or fail has more to do with people's motivations: will the general public accept AI watermarking because it fits our motivations and the consequences we set up for AI masquerading as a real person, or AI being used for misinformation? That's a big question that I can't answer.

mateus1 7 days ago | root | parent |

This is not a “good deed for the public” done by Google, this is just a self-serving tool to enforce their algorithms and digital property. There is nothing “bad” here for the public, but it's certainly not good either.

fastball 8 days ago | root | parent | prev |

I for one am glad we might have a path forward to filtering out LLM-generated sludge.

pyrale 7 days ago | root | parent |

> we

If by "we" you mean anyone else than Google and the select few other LLM provider they choose to associate with, I'm afraid you're going to be disappointed.

fastball 7 days ago | root | parent |

If there is a detectable fingerprint, we can detect it too. Probably don't even need a Bletchley Park.

fny 7 days ago | prev | next |

I think we just need to give up on this. What’s the harm? It’s not like some ground truth is fabricated.

I’m far, far more concerned about photo, video, and audio verification. We need a camera that can guarantee a recording is real.

foxglacier 7 days ago | root | parent | next |

Why do we need that for photo, video and audio? If it's about the general public believing something false, they're not going to check the watermarks of random internet content or trust anyone who says they checked it. If they really want to know, they can go to the source, and if they trust that person or organization, they can also trust the content they published. If it's about use in court, we already have a system for that: the person who recorded it appears in court as a witness and promises that they didn't alter it; then, if it turns out they did, they can go to prison.

ziofill 7 days ago | root | parent | prev |

I've been thinking about this for a while. Digital signatures can guarantee that a piece of data is authentic, if the author wishes to sign it.

playingalong 8 days ago | prev | next |

> the team tested it on 20 million prompts given to Gemini. Half of those prompts were routed to the SynthID-Text system and got a watermarked response, while the other half got the standard Gemini response. Judging by the “thumbs up” and “thumbs down” feedback from users, the watermarked responses were just as satisfactory to users as the standard ones.

Three comments here:

1. I wonder how many of the 20M prompts got a thumbs up or down. I don't think people click that a lot. Unless the UI enforces it. I haven't used Gemini, so I might be unaware.

2. Judging a single response might be not enough to tell if watermarking is acceptable or not. For instance, imagine the watermarking is adding "However," to the start of each paragraph. In a single GPT interaction you might not notice it. Once you get 3 or 4 responses it might stand out.

3. Since when is Google happy with measuring by self-declared satisfaction? Aren't they the kings of A/B testing and high-volume analysis of usage behavior?

varispeed 8 days ago | root | parent |

> I don't think people click that a lot.

I sometimes do, but I almost always give the wrong answer, or the opposite answer where possible.

froh 8 days ago | root | parent | prev |

but why? what for?

thebruce87m 7 days ago | root | parent | next |

My timesheet SAAS constantly asks for feedback, which I give 0/10 as constantly asking for feedback really annoys me.

They then contact me and ask me why, so I tell them, and then they say there is nothing they can do. A week later I'll get a pop up asking for feedback and we go round the same loop again.

tokioyoyo 8 days ago | prev | next |

Correct me if I’m wrong, but wouldn’t it simply drive people to use LLMs that are not watermarking their content?

aleph_minus_one 8 days ago | root | parent | next |

I think your idea is basically right, but there are two points to consider:

- Your hypothesis only holds if the alternative LLM is also "sufficiently good". If Gemini does not stay competitive with other LLMs, Google's AI plans have a much more serious problem.

- Your hypothesis assumes that many people will be capable of detecting the watermarks (both of Gemini and other LLMs) so that they can make a conscious choice for another LLM. But the idea behind good watermarking is that it is not that easy to detect.

kranner 8 days ago | root | parent | prev | next |

According to the article, you can just have another LLM summarise Gemini's watermarked output and that will "likely" defeat the watermark detection.

scarmig 7 days ago | root | parent |

But, if all the good models can only be trained by large mega corps with close connections to the government, it's only a matter of time until that other LLM will just add its own watermark.

beepbooptheory 8 days ago | root | parent | prev | next |

Why does the user care if it's watermarked? Surely there are only some use cases for this stuff where it matters. Most of the time isn't it just people having ephemeral chats where this wouldn't matter?

ajdlinux 8 days ago | root | parent | next |

Using LLMs to write your essays and reports for school or uni, in a way that could get you punished if caught, is a reasonably big use case.

beepbooptheory 7 days ago | root | parent | next |

Agreed it's probably a big use case in general, but token for token I bet it's relatively small! How many big papers do you have to write a semester? Even if it's four, that's nothing compared to the everyday use you will make of it.

ajdlinux 5 days ago | root | parent |

Sure, but institutions and regulators care about this issue, and at least making some attempt to address it will make them slightly happier.

highcountess 7 days ago | root | parent | prev |

I see no scenario where there won’t be an LLM that is deliberately tailored for that purpose, possibly even built by an “intel” agency for the very purpose of having blackmail over someone that may become useful later in their career.

tokioyoyo 7 days ago | root | parent | prev |

AIs and LLMs have an extremely uphill PR battle to fight right now. Anything that is deemed AI generated is assumed to be borderline trash (lots of exceptions, but you get the point). So, I can see that if someone uses LLM to generate text, they don’t want it to be marked as “low effort content”.

beepbooptheory 7 days ago | root | parent |

There are definitely exceptions, and the fact that there are maybe proves that it is less anti-AI prejudice at play and more just a reaction to things that are indeed trashy. It just so happens a lot of it today is from AI, I think (for, I hope, obvious reasons).

Just to say, maybe give it a little time, but a watermark like this is not going to be the thing that decides someone's reaction in the near future, just what the text says. (I am just betting here.)

But it's going to be an uphill battle either way if you are really getting the model to write everything; I do not envy that kind of project.

onion2k 8 days ago | root | parent | prev | next |

People use Google Search despite it being littered with adverts and tracking. Maybe Google are counting on either being better than the competition despite watermarking, or simply accepting that people who don't care are enough of a market that it's still worth adding.

nicce 8 days ago | root | parent | prev | next |

Correct me if I'm wrong, but watermarking is only possible if the model has a limited set of inputs you can provide (which affect the output) and a limited set of outputs it produces, and if it is completely deterministic. And you would have to pre-calculate all possible combinations.

And this would also have to be the case for every possible LLM; then you could compare which LLMs could produce which outputs from which inputs. Then there is some certainty that this output was produced by this LLM, and that another LLM might produce it as well with these inputs.

So... impossible?

glenstein 7 days ago | root | parent | prev |

People made this same argument about DRM escalations, about increasing privacy violations in the browser, and about Google's donations to support climate change misinformation. Even about Facebook interface redesigns. Every variation of "people will be driven to do X" I've ever heard assumes some coherence and unity of collective purpose that rarely matches the reality of how people behave.

There are counter examples, e.g. Unity. But catching that lightning in a bottle is rare and merits special explanation rather than being assumed.

tokioyoyo 7 days ago | root | parent |

Using LLMs in exams and homework has a different driver. Getting caught results in punishment, so using an alternative would be better. None of the aforementioned examples have a "stick" aspect if you stick with Google.

harimau777 7 days ago | prev | next |

This strikes me as potentially a bad thing for regular people. For example, corporations can still use AI filtering to force job seekers to jump through hoops, but job seekers won't be able to use AI to generate the cover letters and resumes that those hoops demand.

sharpshadow 6 days ago | prev | next |

To achieve the watermark, they store every output they create and let partners check against it. That's how I understand the article.

Then they also store everything which the partners upload to check if it’s created by them.

If other AI players also would store everything they create and make it available in a similar way there could be indeed some working watermark.

If one used a privately run AI to alter the content generated by a publicly run AI, there would still be a recognisable percentage of similarity hinting that it might come from one of the public AIs.

Timestamps would become quite relevant since much content would start to repeat itself at some point and the answers generated might be similar.

matteoraso 6 days ago | prev | next |

By design, a watermark would make it easy to create a discriminator that distinguishes between LLM content and human content. In that case, just make a discriminator yourself and use regex to find and remove any of the watermarks.

tomxor 7 days ago | prev | next |

> Such modifications introduce a statistical signature into the generated text,

Great, so now people have to worry about being too statistically similar to an arbitrary "watermark".

js8 8 days ago | prev | next |

I think people are already doing that. I frequently hear people watermarking their speeches with phrases like "are we aligned on this?", or "let's circle back" and similar.

lcnPylGDnU4H9OF 7 days ago | root | parent |

I can’t tell if this is satire but that’s just corp-speak. I imagine those people also occasionally suggest “touching base” and “taking this offline”.

The phrases usually mean something useful, if one knows the meaning, but it is amusing how much people seem to stick with the same ones, even across companies.

js8 7 days ago | root | parent |

I am not sure whether it was satire. I personally don't like corp speak - it feels like people talking like that are not humans. I am not sure I would welcome our AI overlords speaking like this, either.

But I find the idea that people will subconsciously start copying AI speech patterns (perhaps as a signal of submission) amusing. I think it's gonna throw a wrench into the idea.

IMHO LLMs either should help us communicate more clearly and succinctly, or we can use them as tools for creativity ("rephrase this in 18th century English"). Watermarking speech sabotages both of these use cases.

rany_ 8 days ago | prev | next |

I really want to be able to try Gemini without the AI watermark. IIRC they've used SynthID from the start and it makes me wonder if it's the source of all of Gemini's issues.

Obviously Google claims that it doesn't cause any issues but I'd think that OpenAI and other competitors would have something similar to SynthID if it didn't impact performance.

throwaway314155 7 days ago | root | parent |

> IIRC they've used SynthID from the start

Is that not at odds with what's presented in the article here?

rany_ 6 days ago | root | parent |

It's not. They've mentioned that they had SynthID integrated before (I'm almost certain it was from the very start). What changed is that the tools to detect that something is from Google's LLM are public now.

> Google has already integrated this new watermarking system into its Gemini chatbot, the company announced today.

Key word: already

> It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots.

That's basically the change.

lowbloodsugar 7 days ago | prev | next |

I want AI to use just the right word when it’s writing for me. If it’s going to nerf itself to not choose the perfect word so it can be watermarked, then why would I use that product? I’ll go somewhere else. And if it does use just the right word, then how is that different from a great human writer?

Nasrudith 7 days ago | root | parent |

There is the 'loser's litigation' method of getting all of your non-watermarked competitors banned, usually involving some combination of magical rights-removing brain-hacks like national security or 'the children'.

nprateem 7 days ago | prev | next |

Google are obviously pushing this as a way to root out AI blog spam.

If they can just get other providers to use it, because of 'safety' or something, they won't have to change their indexer much. Otherwise PageRank is dead due to the ease of creating content farms.

riffraff 7 days ago | root | parent |

Not just them; OpenAI is doing the same for the same reason: they need to avoid a Habsburg AI issue when the next half of their training material will be generated by themselves.

ajwin 7 days ago | prev | next |

Do LLMs always pick the most probable next word? I would have thought this would lead to having the same output for every input. How does this deal with the randomness that you get from prompting the same thing over and over?

8note 7 days ago | root | parent | next |

There is at least a parameter called temperature, which decides how much randomness to include in the output.

Setting it to 0 doesn't get you perfectly deterministic output though, per https://medium.com/google-cloud/is-a-zero-temperature-determ... as you don't have perfect control over what approximations are being made in your floating point operations.
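A small illustration of what temperature does: dividing the logits by T before the softmax sharpens (T < 1) or flattens (T > 1) the distribution, and T = 0 is usually treated as plain argmax. The logits here are made up.

    import math, random

    def sample_with_temperature(logits: dict, temperature: float) -> str:
        if temperature == 0:
            return max(logits, key=logits.get)          # greedy / argmax
        scaled = {t: v / temperature for t, v in logits.items()}
        z = sum(math.exp(v) for v in scaled.values())
        r, acc = random.random(), 0.0
        for tok, v in scaled.items():
            acc += math.exp(v) / z
            if r <= acc:
                return tok
        return tok

    logits = {"mango": 2.0, "lychee": 1.0, "papaya": 0.5}
    print([sample_with_temperature(logits, 1.0) for _ in range(5)])  # varied
    print([sample_with_temperature(logits, 0.0) for _ in range(5)])  # all "mango"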

mmoskal 7 days ago | root | parent |

The most typical reason argmax (temp 0) is non-deterministic is that your request is batched with other people's requests. The number and size of those affect the matrix sizes and thus tiling decisions. Then you get a different floating point order and thus different results.

Nvidia gives some guarantees about deterministic results of their kernels, but that only applies when you have the exact same input data, which is not the case with in-flight batching.

janalsncm 7 days ago | root | parent | prev |

It depends. If we use beam search we pick the most likely sequence of tokens rather than the most likely token at each point in time. This process is deterministic though.

We can also sample from the distribution, which introduces randomness. Basically, if word1 should be chosen 75% of the time and word2 25% of the time, it will do that.

The randomness you’re seeing can also be due to implementation details.

https://community.openai.com/t/a-question-on-determinism/818...

playingalong 8 days ago | prev | next |

> It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots. However, only Google and those developers currently have access to the detector that checks for the watermark.

These two sentences next to each other don't make much sense. Or are misleading.

Yeah. I know. Only the client is open source and it calls home.

villmann 7 days ago | prev | next |

To what degree will AI-generated text and watermarking influence how human language evolves... I bet "delve" will become more frequent in the spoken language :)

tiffanyh 7 days ago | prev | next |

OT: The publication (Spectrum by IEEE) has some really good content.

It's starting to become a common destination for when I want to read about interesting things.

FilipSivak 8 days ago | prev | next |

How is this supposed to work? By inserting special unicode characters?

How can you watermark text?

a2128 8 days ago | root | parent | next |

I haven't read how Google is doing it, but one way it could be done is to nudge which tokens get sampled. For example, every other token could have an odd numbered id (where each token is assigned an id from 0 to 32000 or however many it has). Then in order to detect the watermark you just tokenize the text and see if the pattern is there. A problem with this approach is that it harms the accuracy and coherency, for example if you ask "What is 2+2", and the token "4" is token #102, and it has to pick an odd-numbered token, then it may respond with a wrong answer or yap on strangely due to its limited selection of tokens (like "The accurate answer to your mathematical query is the number Four")
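A toy sketch of that parity idea, with a tiny made-up vocabulary standing in for a real tokenizer: the generator would only emit token ids whose parity matches the position, and the detector just re-tokenizes and measures how often the pattern holds.

    # Hypothetical mini-vocabulary; a real tokenizer would have tens of thousands of ids.
    VOCAB = {w: i for i, w in enumerate(
        ["the", "answer", "is", "four", "certainly", "number", "to", "your", "query"])}

    def tokenize(text: str):
        return [VOCAB[w] for w in text.lower().split() if w in VOCAB]

    def parity_score(text: str) -> float:
        """Fraction of positions whose token-id parity matches the position parity."""
        ids = tokenize(text)
        if not ids:
            return 0.0
        matches = sum(1 for pos, tid in enumerate(ids) if tid % 2 == pos % 2)
        return matches / len(ids)

    # Unconstrained text should score around 0.5; text forced into the pattern nears 1.0.
    print(parity_score("the answer is four"))

It also makes the quality problem in the comment above concrete: at every step roughly half the vocabulary is off-limits, so the model sometimes cannot say the thing it most wants to say.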

sumtechguy 8 days ago | root | parent | prev | next |

You do not even need extra characters (although they help). You can use spaces, missing punctuation, upper/lower case in particular cases, conjunction usage and not using it, word substitution, common misspellings, transposed letters, etc. How many extra spaces/tabs can you add to the end of a paragraph? At the beginning? Between sentences? Inside them? Then you have an AI agent design it and then train another one to detect it.

das_keyboard 8 days ago | root | parent | prev | next |

> SynthID-Text works by discreetly interfering in the generation process: It alters some of the words that a chatbot outputs to the user in a way that’s invisible to humans but clear to a SynthID detector. “Such modifications introduce a statistical signature into the generated text,” [...] “During the watermark detection phase, the signature can be measured to determine whether the text was indeed generated by the watermarked LLM.”

voidUpdate 7 days ago | root | parent | prev | next |

As stated in the article, it alters the probabilities that the network produces in a predictable way so that a different (but still correct-sounding) word is picked. It subtly alters the wording from what it would have output normally, in such a way that you can detect it while it still sounds correct to the user.