Cannot index / search one file #30

Open
GoogleCodeExporter opened this issue Mar 30, 2015 · 8 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits

What is the expected output? What do you see instead?

repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6        __libc_start_main
torch             main
torch              realmain(int, char**)
libglib-2.0....          g_main_context_iteration
libglib-2.0....           g_main_context_prepare
libglib-2.0....            g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main  <= no results here !!
repro$ 
repro$ grep threads badfile 
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$ 

With csearch, I cannot find text that is in a file I have indexed with cindex.

What version of the product are you using? On what operating system?

I'm using the Linux binaries that are available on the Download page.
I tried to compile Go and codesearch myself but couldn't make it work (my Go
install might be funky).

Please provide any additional information below.

It looks like the problem happens at indexing time.

Original issue reported on code.google.com by [email protected] on 14 Mar 2013 at 1:57

Attachments:

@GoogleCodeExporter

Also, I've been using codesearch as part of a webapp at work that does forensic
analysis of crashes (by letting us search through backtraces), and it's amazing :)

I'm kinda stuck right now because I cannot index some files, and I'm thinking
about switching to a different indexer / search system. But really codesearch is
all I need, so if someone can figure out what the problem is, that would be awesome.

Thanks!

Original comment by [email protected] on 14 Mar 2013 at 2:00


@GoogleCodeExporter

Also, I have one line that is crazy long: 2245 characters. Maybe the problem is
that the indexer reads line by line and has some hardcoded limit on the number
of characters in a single line?

Original comment by [email protected] on 14 Mar 2013 at 2:04


@GoogleCodeExporter

Try indexing with the -verbose and -logskip flags to see if the file is getting
skipped.

The arbitrary limits are in the source, so you can always hand-edit and tweak
them. I have a version at

http://github.com/junkblocker/codesearch

which I made specifically to add such options.

Original comment by [email protected] on 14 Mar 2013 at 4:07


@GoogleCodeExporter

Thanks for the tip. Indeed, I've removed those long lines and now everything
works fine. I see that your copy of the code has a -maxlinelen option, which
should be what I need. Now I have to figure out how to build a Go program ...

Original comment by [email protected] on 14 Mar 2013 at 5:57


@GoogleCodeExporter

Alright, I figured it out, thanks.

repro$ awk '{print length($0)}' badfile | sort -n | tail
972
1001
1043
1071
1456
1529
1724
1792
2259
2328

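and in index/write.go there's a
    maxLineLen      = 2000

The two longest lines (2259 and 2328 characters) are over that limit, which is
why the file was not indexed.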

Original comment by [email protected] on 14 Mar 2013 at 6:36


@GoogleCodeExporter

Whoever can, feel free to close the issue.

Original comment by [email protected] on 14 Mar 2013 at 6:39


@GoogleCodeExporter

I'm going to leave this open until I can get something like -logskip into
the mainline codesearch branch.

Original comment by [email protected] on 14 Mar 2013 at 2:08


@GoogleCodeExporter

I don't know how far you guys should go with that, but having two options to
set maxLineLen and maxFileSize on the command line would also help.

The default behavior could be to print a message like the following (probably
with better phrasing and different option names) when a file gets skipped:

=> /tmp/foo wasn't indexed (maxLine too long) / try to reindex with cindex -maxLineLen 3000

=> /tmp/foo wasn't indexed (file too big) / try to reindex with cindex -maxFileSize 1M

Original comment by [email protected] on 14 Mar 2013 at 4:26
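
To make the proposal above concrete, here is a rough Go sketch of how such options and messages could fit together. Everything in it is hypothetical: the flag names `-maxLineLen` and `-maxFileSize` are just the ones suggested in this comment, `skipReason` is an invented helper, and the file size default is a placeholder. It is not how cindex is actually implemented.

```go
package main

import (
	"flag"
	"fmt"
)

// skipReason returns a hint for a file that would be skipped, or "" if the
// file is within both limits. The flag names in the hint are the ones
// proposed in this thread, not options mainline cindex is known to accept.
func skipReason(name string, size int64, longestLine int, maxLineLen int, maxFileSize int64) string {
	switch {
	case size > maxFileSize:
		return fmt.Sprintf("%s wasn't indexed (file too big: %d bytes) / try to reindex with -maxFileSize %d",
			name, size, size)
	case longestLine > maxLineLen:
		return fmt.Sprintf("%s wasn't indexed (line too long: %d bytes) / try to reindex with -maxLineLen %d",
			name, longestLine, longestLine)
	}
	return ""
}

func main() {
	// Hypothetical flags. 2000 is the maxLineLen value quoted earlier in
	// the thread; the file size default is a placeholder assumption.
	maxLineLen := flag.Int("maxLineLen", 2000, "skip files containing a line longer than this many bytes")
	maxFileSize := flag.Int64("maxFileSize", 1<<30, "skip files larger than this many bytes")
	flag.Parse()

	// Example values: 2328 is badfile's longest line from the awk output
	// earlier in the thread; the file size of 150000 bytes is made up.
	if msg := skipReason("/tmp/repro/badfile", 150000, 2328, *maxLineLen, *maxFileSize); msg != "" {
		fmt.Println(msg)
	}
}
```

Suggesting the observed value as the new limit is one possible design choice: the user can copy the hint as is and the file will fit on the next run.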

