Cryptography nerd

  • 0 Posts
  • 285 Comments
Joined 11 months ago
Cake day: August 16th, 2023


  • Humans learn a lot through repetition, so there’s no reason to believe that LLMs wouldn’t benefit from reinforcement of higher quality information, especially because seeing the same information in different contexts helps map the links between those contexts and dispel incorrect assumptions. But like I said, the only viable method they have for this kind of emphasis at scale is incidental replication of more popular works in the training samples. And when something is duplicated too often, the model overfits on it instead.

    They need to fundamentally change big parts of how learning happens to resolve this conflict. In particular it will need a lot more “introspective” training stages to refine what the model has already learned, and pretty much nobody does anything even slightly similar on large models, because they don’t know how and it would be insanely expensive anyway.
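
    To make the duplication problem concrete, here’s a toy sketch of the naive “tune down the duplicates” idea. It’s purely my own illustration, not anything a real training pipeline does as-is, and it only catches exact copies:

    ```python
    import hashlib
    from collections import Counter

    def duplicate_aware_weights(samples: list[str]) -> list[float]:
        """Down-weight exact duplicates so popular texts don't dominate training."""
        # Hash each sample so counting duplicates stays cheap at scale.
        digests = [hashlib.sha256(s.encode()).hexdigest() for s in samples]
        counts = Counter(digests)
        # A text seen k times contributes total weight 1 instead of k.
        return [1.0 / counts[d] for d in digests]

    samples = ["very popular passage"] * 3 + ["rare unique passage"]
    print(duplicate_aware_weights(samples))
    # [0.333..., 0.333..., 0.333..., 1.0]
    ```

    Near-duplicates and paraphrases slip straight past exact hashing, and anything smarter quickly starts penalizing language that legitimately should be common.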


  • Yes, but should big companies with business models designed to be exploitative be allowed to act hypocritically?

    My problem isn’t with ML as such, or with learning over such large sets of works, etc, but that these companies are designing their services specifically to push the people whose works they rely on out of work.

    The irony of overfitting is that having numerous copies of common works is a problem AND removing the duplicates would be a problem. The model needs an understanding of what’s representative of language, etc, but the training algorithms can’t learn that on their own, it’s not feasible to have humans teach it that, and the training algorithm can’t effectively detect duplicates and “tune down” their influence to stop replicating them exactly. Trying to do that last thing algorithmically will ALSO break things, as it would break the model’s understanding of stuff like standard legalese and boilerplate language.

    The current generation of generative ML doesn’t do what it says on the box, AND the companies running it deserve to get screwed over.

    And yes, I understand the risk of screwing up fair use, which is why my suggestion is not to hinder learning but to require the companies to track the copyright status of samples and inform end users of the licensing status when the system detects that a sample is substantially replicated in the output. This will not hurt anybody training on public domain or fairly licensed works, nor anybody who tracks authorship when crawling for samples, and it will also not hurt anybody who has designed their ML system to be sufficiently transformative that it never replicates copyrighted samples. It only hurts exploitative companies.
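
    For concreteness, here’s a rough sketch of what “detects that a sample is substantially replicated” could look like. This is entirely my own toy example; the n-gram approach, the names, and the threshold are arbitrary illustration, not a proposal for the actual mechanism:

    ```python
    def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
        """Word n-grams; long n-grams rarely repeat by coincidence."""
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def replication_score(output: str, sample: str, n: int = 8) -> float:
        """Fraction of the sample's n-grams reproduced verbatim in the output."""
        sample_grams = ngrams(sample, n)
        if not sample_grams:
            return 0.0
        return len(sample_grams & ngrams(output, n)) / len(sample_grams)

    # Hypothetical index mapping each training sample to its licensing status.
    corpus = [
        ("friendship is constant in all other things save in the office and affairs of love",
         "public domain"),
        ("some proprietary passage that really should trigger a licensing notice for the user",
         "all rights reserved"),
    ]

    output = ("the model then reproduced some proprietary passage that really "
              "should trigger a licensing notice for the user almost word for word")

    for sample, license_status in corpus:
        score = replication_score(output, sample)
        if score > 0.5:  # threshold is arbitrary for this sketch
            print(f"substantial replication ({score:.0%}), inform user: {license_status}")
    ```

    A real system would index the n-grams up front rather than scanning the whole corpus per output, but the principle is the same.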




  • Wine/Proton on Linux occasionally beats Windows on the same hardware in gaming, because there are inefficiencies in the original environment which don’t get replicated unnecessarily.

    It’s not quite the same with CPU instruction translation, but the main efficiency gain of ARM comes from being designed to idle everything it can, while that hasn’t been a design goal of x86 for ages. A substantial factor in efficiency is figuring out what you don’t have to do, and ARM is better suited for that.






  • “Neither of these mention networks, only protocols/schemes, which are concepts. Cryptography exists outside networks, and outside computer science (even if that is where it finds the most use).”

    This is ridiculous rules lawyering, and it isn’t even done well. Such schemes inherently assume multiple communicating parties. Sure, you might not need a network, but you still have to have distinct devices and a communication link of some sort (because if you have a direct trusted channel, you don’t need cryptography).

    You’re also wrong about your interpretation.

    Here’s how to read it (a code sketch of these steps follows the walkthrough):

    At point A both parties create their long term identity keys.

    At point B they initiate a connection and create session encryption keys with a key exchange algorithm (the first half of PFS).

    At point C they exchange information over the encrypted channel.

    At point D the session keys are automatically deleted (the second half of PFS).

    At point E the long term key of one party is leaked. The contents from B and C cannot be recovered, because the session key was independent of the long term key and is now deleted. This is forward secrecy. The adversary can’t compromise the traffic after the fact without breaking the whole algorithm; they have to attack the clients while the session is ongoing.
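
    In code, steps B through D look roughly like this. It’s a minimal sketch using the pyca/cryptography package, with the authentication of the exchange by the point A identity keys omitted for brevity:

    ```python
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    # Point A (not shown): both parties already hold long term identity keys,
    # used only to authenticate this exchange.

    # Point B: fresh ephemeral keys per session, plus a key exchange.
    alice_eph = X25519PrivateKey.generate()
    bob_eph = X25519PrivateKey.generate()
    shared = alice_eph.exchange(bob_eph.public_key())
    assert shared == bob_eph.exchange(alice_eph.public_key())

    # Derive the session key from the shared secret.
    session_key = HKDF(
        algorithm=hashes.SHA256(), length=32, salt=None, info=b"session",
    ).derive(shared)

    # Point C: encrypt the actual traffic under session_key (e.g. AES-GCM).

    # Point D: delete the ephemeral secrets and session key afterwards.
    # (Dropping references is as close as Python gets; real implementations
    # zero the memory.)
    del alice_eph, bob_eph, shared, session_key

    # Point E: leaking a long term identity key later reveals nothing here,
    # because session_key never depended on it and no copy remains.
    ```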

    This is motivated for example by how SSL 3.0 was usually deployed with a single fixed RSA keypair per server, letting clients generate and submit session encryption keys, which allows a total break of all communications with the server if that key is compromised. Long term DH secrets were also often reused when they should have been single use. Then we moved on to ECDH, where generating new session secrets is fast, and everybody adopted real PFS.

    Yes, compromising the key often means you get stuff like the database too. Not the point! If you keep deleting sensitive data locally when you should, then PFS guarantees it’s actually gone: the NSA can’t store your traffic in their big data warehouse and hope to steal the key later to decrypt what you thought you deleted. It’s actually gone.

    And both of the definitions you quoted mean the same as the above.

    “In any case, both of these scenarios create an attack vector through which an adversary can get all of your old messages, which, whether you believe violates PFS by your chosen definition or not, does defeat its purpose (perhaps you prefer this phrasing to ‘break’ or ‘breach’).”

    Playing loose with definitions is how half of all broken cryptographic schemes ended up that way. Being precise with attack definitions allows for better analysis and better defenses.

    Like how better analysis of common attacks on long running chats with PFS led to “self healing” properties being developed, countering point-in-time leaks of session keys by repeatedly performing key exchanges, and to long term keys being better protected, for example by making sure software like Signal uses the OS provided hardware backed keystore for them. All of this is modeled carefully and described with precise terms.

    Edit: given modern sandbox techniques in phones, most malware and exploits don’t survive a reboot. If malware can compromise your phone at a specific point in time but can’t break the TPM, then once you reboot and your app rekeys, the adversary no longer has access, and this can be demonstrated with mathematical proofs. That’s self healing PFS.
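
    Stripped to its bare bones (and simplified far below Signal’s actual double ratchet), the self healing idea is just a KDF chain that keeps mixing in fresh key exchange output. A sketch, with both parties simulated in one process:

    ```python
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def ratchet(root_key: bytes, fresh_dh_output: bytes) -> bytes:
        """One rekeying step: mix fresh key exchange output into the root key."""
        return HKDF(
            algorithm=hashes.SHA256(), length=32, salt=root_key, info=b"ratchet",
        ).derive(fresh_dh_output)

    # Initial shared secret from the first key exchange (placeholder value).
    root_key = b"\x00" * 32

    # Every round, both parties contribute fresh ephemeral keys. An attacker
    # who stole root_key at some point loses access after the first round
    # they can't observe: that's the self healing property.
    for _ in range(3):
        a = X25519PrivateKey.generate()  # one party's ephemeral key
        b = X25519PrivateKey.generate()  # the other party's ephemeral key
        root_key = ratchet(root_key, a.exchange(b.public_key()))
    ```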

    “Anyone can start a forum.”

    Fair point, but my cryptography forum (reddit.com/r/crypto) has regulars that include people writing the TLS specifications and other well known experts. They’re hanging around because the forum is high quality, and I’m able to keep quality high because I can tell who’s talking bullshit and who knows their stuff.






  • I run a cryptography forum; I know the exact definitions of these terms. Message logs in plaintext are very distinct from forward secrecy. What forward secrecy means in particular is that captured network traffic can’t be decrypted later, even if you can steal the user’s keys at a later point, because the session used session keys that were later deleted. Retrieving local logs with no means of verifying authenticity is nothing more than a classical security breach.

    You can transfer messages as part of an account transfer on Signal (at least on Android). This deactivates the app on the old device (so you can’t do it silently to somebody else’s device).