Encoding issues with ANSI files. #243

Open
AlexFielder opened this issue Mar 1, 2017 · 1 comment
@AlexFielder

Hi all,

We're now using Hound to index a fairly large codebase (>1.5 GB), and I just noticed that thousands of files are being ignored because they have ANSI encoding instead of UTF-8. They appear in the excluded_files.json file.

Is this something that can be fixed within Hound or will I likely have to figure out how to change the encoding of all these files?

Thanks,

Alex.

@AlexFielder

For anyone else having the same issue, here is the (quickest) fix I could come up with:

  1. Install the 32-bit version of Notepad++ (NP++). This is important because there's currently no plugin manager for the default x64 installation from chocolatey.org!
  2. Install the Python Script plugin for NP++
  3. Create a new script.
  4. Inside of the new script add the following:
import os

# Root folder to scan; change this to suit.
filePathSrc = "C:\\CM"

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Only convert files with the .cm extension.
        if fn.endswith('.cm'):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8")
            notepad.save()
            notepad.close()

(You can set up the script to only work on specific file extensions like I have or change it to suit.)
5. Save and close the script file, then NP++ itself. (This is necessary to get Python Script to pick up the newly saved script file)
6. Reopen NP++ and select your script file (whatever it's called) to run it.
7. Go and grab a coffee, because if (as I did) you have 20,000+ files to check it will take some time. Whilst the script is running the system will fight with NP++ for focus as it cycles through the folder structure you pointed it at.

FWIW: I am using an NVMe Samsung SSD and this script barely taxes it; if you have a regular spinny HDD then you might need to be selective where you point it.
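
If you'd rather not drive NP++ at all, the same conversion can be sketched in plain Python. This is a different technique from the NP++ script above, not something Hound provides. It assumes that "ANSI" here means Windows-1252 (`cp1252`); adjust `source_encoding` if your files use another code page. The names `is_utf8` and `convert_tree` are made up for illustration. It rewrites files in place, so back up (or commit) first.

```python
import os


def is_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_tree(root_dir, extension=".cm", source_encoding="cp1252"):
    """Re-encode non-UTF-8 files under root_dir as UTF-8, in place.

    source_encoding is an assumption: "ANSI" is taken to mean cp1252.
    Returns (converted, failed) lists of file paths.
    """
    converted, failed = [], []
    for root, dirs, files in os.walk(root_dir):
        for fn in files:
            if not fn.endswith(extension):
                continue
            path = os.path.join(root, fn)
            with open(path, "rb") as f:
                data = f.read()
            if is_utf8(data):
                continue  # already valid UTF-8 (plain ASCII included)
            try:
                text = data.decode(source_encoding)
            except UnicodeDecodeError:
                failed.append(path)  # bytes not valid in the assumed code page
                continue
            with open(path, "wb") as f:
                f.write(text.encode("utf-8"))
            converted.append(path)
    return converted, failed


# Example call (path taken from the script above):
# converted, failed = convert_tree("C:\\CM")
```

Files that already decode as UTF-8 are left untouched, so it should be safe to re-run; anything in `failed` probably uses a different encoding and needs a closer look.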
