Or does a self-hosted Lemmy instance federate all comments from all threads once it knows about a server?
Yes, but only in communities that someone on the instance is subscribed to.
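To make that concrete, here is a rough toy model of the behavior, not Lemmy's actual code or API. The `Instance` and `Community` classes and the example names are made up for illustration; the only point is that a community's home server sends copies of comments to instances that follow it, and to nobody else.

```python
class Instance:
    """Toy stand-in for a Lemmy instance (hypothetical, not Lemmy's real code)."""

    def __init__(self, name):
        self.name = name
        self.received = []  # (community, comment) pairs delivered to this instance

    def deliver(self, community, comment):
        self.received.append((community, comment))


class Community:
    """Toy stand-in for a community hosted on some other server."""

    def __init__(self, name):
        self.name = name
        self.followers = []  # instances with at least one subscribed user

    def subscribe(self, instance):
        if instance not in self.followers:
            self.followers.append(instance)

    def post_comment(self, comment):
        # Only instances that follow the community get a copy.
        for instance in self.followers:
            instance.deliver(self.name, comment)


mine = Instance("my-instance.example")
selfhosted = Community("selfhosted")
linux = Community("linux")

# My instance subscribes to one community but not the other.
selfhosted.subscribe(mine)

selfhosted.post_comment("How do I back up my instance?")
linux.post_comment("Which distro for a server?")

print(mine.received)
```

In this sketch my instance only ever stores the comment from the community it subscribed to, even though it "knows about" both.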
On the other hand, I don’t trust Reddit with my upvotes/downvotes much more than I trust random users, so I already refrain from voting on content I wouldn’t want to be associated with…
Of course, Reddit can still see what posts I view, while that isn’t the case for Lemmy (at least since I self-host an instance).
Debian is the classic server choice. If you don’t have any server administration experience, I’d consider it just for that reason: there should be a ton of resources available. If you want something else, any RPM-based distro (like Fedora Server, CentOS Stream, Rocky Linux, or even RHEL) could be another option, with Rocky Linux probably being the best choice out of those.
Alternatively, I’d consider NixOS or Alpine. NixOS is what I use on most of my servers, but both have attributes that might make them worse for a beginner: NixOS uses a custom programming language to configure the operating system, while Alpine is much more minimal than most other server distributions. On the off chance that you have experience with a functional language like Haskell, though, NixOS might be the best choice, since its unified configuration for the whole system makes it very convenient for hosting use cases.
I’d also like to note that I run single-user Mastodon and Lemmy instances, and find both fairly easy to manage. There’s also GoToSocial, which is specifically designed to be easy to deploy.