• 0 Posts
  • 14 Comments
Joined 5 months ago
cake
Cake day: January 25th, 2024

help-circle





  • For the OCR, have you tried tesseract? For printed documents it can take image input and generate a pdf with selectable text. I don’t OCR much but it has been useful when I tried a few times.

    You might be able to have a script that takes the scanner input into tesseract and output a pdf. It only works on a single image per run so I had to make script to run it on whole pdf by separating it and stitching it back together.



  • Someone already talked about the XY problem, so I’ll say this.

    Why sound notification instead of notification content? If your notification program (dunst in my case) have pattern matching or calling scripts based on patterns and the script has access to which app, notification title, contents etc. then it’s just about calling something in your bash script.

    And any time you wanna add that functionality to something else, add one more line with a different pattern or add a condition in your script. Comparing text is lot more reliable than audio.

    Of course your use case could be completely different, so maybe give some examples of use case so people can give you different ways to solve that instead of just the one you’re thinking of.


  • Yeah sure, I’ll compile it in my OS. For any other OS, either I’m not knowledgeable about the tools available, and many of them that I am not going to spend money to acquire. If providing the binary a developer compiles for themselves would solve it, we’d not have that problem at all.

    I specifically hate when program or libraries are only in compiled form, and then I get an error messages talking about an absolute path it has with some usernames I’ve never seen before, and no way to correct it as there’s no code. Turns out when people pass compiled versions to the OS they don’t use themselves they don’t encounter the errors and think it works fine.





  • Not for handwritten text, but for printed fonts, getting OCR is as easy as just making a box in screen with current technology. So I don’t think we need AI things for that.

    Personally I use tesseract. I have a simple bash script that when run let’s me select a rectangle in screen, save that image and run OCR in a temp folder and copy that text to clipboard. Done.

    Edit: for extra flavor you can also use notify-send to send that text over a notification so you know what the OCR produced without having to paste it.