Yep, Longest Common Subsequence diffing is usually greedy, so it picks the earliest set of matching lines that satisfies the search. That happens when you just treat a file as a list of lines and only match those.
You can get better results with more syntax or content awareness. Chunk into paragraphs or code blocks or functions, then sentences or statement lists, then lines, then words, etc. I think Beyond Compare can do this.
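Rough sketch of the "lines first, then words" idea in Python, just to show the shape of it. Assumptions: the function names `word_changes` and `hierarchical_diff` are made up for this example, and `difflib.SequenceMatcher` is not a strict LCS implementation (it uses its own longest-matching-block heuristic), so this is not what any particular diff tool actually does.

```python
import difflib


def word_changes(old_line, new_line):
    """Return (tag, old_words, new_words) for each non-equal word span."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def hierarchical_diff(old_text, new_text):
    """Diff by lines first, then refine replaced lines word by word."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Only refine when lines pair up one-to-one; otherwise report the block as-is.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for old_line, new_line in zip(old_lines[i1:i2], new_lines[j1:j2]):
                result.append(("line-changed", word_changes(old_line, new_line)))
        else:
            result.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return result


if __name__ == "__main__":
    old = "int a = 1;\nreturn a;\n"
    new = "int a = 2;\nreturn a;\n"
    print(hierarchical_diff(old, new))
```

For that tiny input the line pass flags only the first line, and the word pass narrows it down to `1;` becoming `2;`. A real tool would add more levels above lines (functions, blocks) using an actual parser.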
8,000+ lines in a single file??? I’m going to be sick
Oh that’s not uncommon in the industry. Especially when dealing with legacy code.
Personal best was 40k lines in a file called misc.c containing all the global functions that don’t fit anywhere else. Runner up was the one where each developer dumped their miscellaneous functions in their own files, so they don’t have to deal with merge conflicts. Which means we had x1.c, x2.c, x3.c … etc.
Oh trust me, I know. Personal best is 20k lines in a Java file that served as the main control flow of the entire software. Just because it’s common doesn’t make me any less disgusted 😂
Thankfully now I’m the asshole senior who gets to prevent this kind of stuff from happening in the first place. But like you said, that doesn’t help with legacy applications lol.
Best I can offer is a combined UI and logic class with 12,500 lines currently. It started out with less than 3,000 lines in the year 2000 (using the brand new Java 1.3), grew to 14,000 over time and survived our recent project-wide one-year cleanup project with only minor losses of code lines.
I was kinda hoping you were gonna finish that first sentence with “in a java applet”. Cause that would’ve been awesome.
I work for one of the mega corporations as a decently high level software engineer. My team’s job is to maintain legacy code. This is my life. 😞
Ah, a fellow member of the janitorial staff. Some of this shit has been there so long it’s seeped through the walls. There’s no way to get rid of it, short of demolishing the whole building.
You should see Firefox source code, there are many files like that. Honestly it’s better than having 100,000 files which is what would happen with the size of Firefox.
As someone who professionally works in a project with many, many thousands of files (I don’t know the exact number right now, but we’re coming close to 10 million lines of code), many of them having thousands of lines (see my other comment): No, longer files are not better than more files.
It depends. Obviously if stuff is unrelated then it should be in separate files, but having one folder with 1000 files, each containing a single function, would be very exhausting to search through to understand the code.
That’s not even that much. I’ve seen longer!
…that’s what she said
Do you like Phabricator?
It’s okay, I’ve only used it for contributing to Firefox so I’m not that familiar.