TigerLine 2011? #29
It seems this is a problem that's been active since at least the 2010 data came out: https://groups.google.com/d/msg/geocommons-geocode/PH6g20m7kaU/Z_W065lbyjkJ It looks like the data files in the 2011 distribution are kept in the same directories, instead of spread out over different states and counties. The principal directories are: EDGES These just contain the zip files directly. The script tiger2009_import seems to do this:
Would anyone be offended if I rewrote this using Ruby for Tiger2011? |
I get why it's written as a single long pipe command; it's an elegant solution to the problem of the size of the data. I have the following script written in Ruby: https://gist.github.com/1631758 It took roughly two hours on my quad-core Mac to create the loading.sql file, which came to roughly 99 GB. Unfortunately it seems to get stuck on the "cat loading.sql | sqlite3 #{database}" part. I gave it 16 hours, after which it was stuck using 1% of the CPU. Very strange. I probably need to rewrite it to use a single long pipe. |
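For reference, the "single long pipe" idea can be sketched in Ruby without ever writing an intermediate loading.sql. This is only a sketch under assumptions: `stream_sql` is a hypothetical helper, not part of geocoder, and the consumer command is whatever the caller passes in. It does not explain why the run described above stalled.

```ruby
require "rbconfig"

# Hypothetical helper (not part of geocoder): write each SQL statement to the
# stdin of a consumer process as it is produced, so the complete load script
# never has to be stored on disk. Returns true if the consumer exited cleanly.
def stream_sql(statements, command)
  IO.popen(command, "w") do |pipe|
    statements.each { |sql| pipe.puts(sql) }
  end
  $?.success?
end

# Demo with a stand-in consumer that only counts the lines it receives.
# For a real import the command would be something like ["sqlite3", database].
statements = (1..3).lazy.map { |i| "INSERT INTO demo VALUES (#{i});" }
counter = [RbConfig.ruby, "-e", "puts STDIN.each_line.count"]
stream_sql(statements, counter)
```

Because `statements` can be any enumerable, including a lazy one, the producer only has to generate one statement at a time, which is the property that makes the shell pipeline attractive for data of this size.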
I could be wrong but the state/county organization from TIGER/Line 2009 might be used in further steps after the import step. |
I just double-checked and I'm not seeing any place where it's used. It looks like it simply imports the shp and dbf files into the database without regard to the folder names / placement. Of course, this shell script is pretty dense stuff for me. Here's my attempt to rewrite the above script while maintaining the whole pipe mechanism. https://gist.github.com/1694885 I haven't run it, since I just decided to use a commercial product for geocoding. But I hope we can get to the bottom of this and update geocoder to the new database. I'm going to work on it this weekend. |
Good call. Just out of curiosity, what commercial geocoding software (or service) are you using? I am working on porting TIGER/Line 2011 onto HDFS instead of a database. I will post an update once there is progress. |
Well, the data I was working on was 90% just city/state/zip. So I used for those: Then I used the geocoder gem with Bing maps for the last 10%: This is not ideal but I think I ended up with pretty high quality results. I'm hoping to get this geocoder database fixed; it doesn't have any usage limits and it's not locked down by any corporation or government. |
I can't believe it didn't occur to me but all you need to do is use the tiger_import script. Import for 2011 goes like this: First follow the Prerequisites section of the Geocoder man page (https://github.com/geocommons/geocoder) but skip "Additionally, you will need a custom build of the ‘sqlite3-ruby’ gem". It's not needed anymore. Next build the geocoder gem:
On Mac OS X it will fail at "make install" with "ld: symbol(s) not found for architecture x86_64". Here's the fix:
After you have successfully built geocoder::us, run the following from the geocoder root:
Now open "build/tiger_import" in the text editor of your choice and change:
Now we can finally do the import:
It took my Amazon EC2 extra-large instance about 8 hours to do the import. I'm going to put up a torrent of the finished sqlite database, as well as upload it on rapidshare or something. I'll post the links here. Also, I'm going to fork the codebase and update the docs. This is one of the coolest libraries out there. I hope we can come together as a community and keep this thing working. |
I've uploaded a torrent of the full data here: Backup here: |
Can someone just upload their sqlite db file with 2011 loaded so that we can just use that? Are there problems with this approach? |
hekaldama: I did, it's in my last post. I uploaded it as a Torrent file. Let me know how that works out. |
Trying to download now. I am not sure if my firewall is blocking me or not, but it currently isn't downloading... |
I used this method on the TIGER2012 data. I was able to import and pass the tests. However, there are several lines like this in the log: |
Here you go guys: https://www.dropbox.com/s/7so3ivq2npxcndy/geocoder_us_tigerline_2011.7z |
Has anyone uploaded a built 2012 sqlite database? This 2011 7z file is throwing an error when I try to decompress it :/ |
Here is the 2014 raw sqlite db. |
Hi!
I was merrily going about generating the geocoder.db file and I happened to download Tiger/Line 2011:
ftp://ftp2.census.gov/geo/tiger/TIGER2011/
It built successfully and I was able to install the gem. I went on to try to generate the data file as the README says and:
development:jjeffus@~/dev/geocoder[master]: build/tiger_import ~/geocoder.db /Volumes/Blimpy/TigerLine/
ls: /Volumes/Blimpy/TigerLine////tl_*_edges.zip: No such file or directory
Seeing a tiger2009_import I figured maybe things had changed and the readme hadn't been updated. So:
development:jjeffus@~/dev/geocoder[master]: build/tiger2009_import ~/geocoder.db /Volumes/Blimpy/TigerLine/
ls: /Volumes/Blimpy/TigerLine//[0-9]*: No such file or directory
I'm guessing that the directory structure has changed again? The script definitely is expecting a very different structure. I'm poking around trying to figure out how to change the script. But obviously someone has had this problem before. So maybe it's a futile cause? Has it changed so significantly that it would require a complete rewrite?
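Both errors above appear to come from the scripts looking for files at a fixed directory depth. One way around that, sketched here in Ruby, is to search for the archives by file name at any depth, so the same lookup works whether the zip files are nested per state and county or sit together in one directory. `find_edges_archives` is a hypothetical helper, not part of geocoder, and the directory names in the demo are made up for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of geocoder): list every edges archive under
# +root+, no matter how many directory levels sit between root and the file.
def find_edges_archives(root)
  Dir.glob("**/tl_*_edges.zip", base: root).sort.map { |rel| File.join(root, rel) }
end

# Demo on a throwaway tree; the directory and file names are illustrative only.
Dir.mktmpdir do |root|
  flat   = File.join(root, "EDGES", "tl_2011_01001_edges.zip")
  nested = File.join(root, "01_STATE", "01001_County", "tl_2009_01001_edges.zip")
  [flat, nested].each do |path|
    FileUtils.mkdir_p(File.dirname(path))
    FileUtils.touch(path)
  end
  puts find_edges_archives(root)
end
```

Passing the search root through `base:` keeps any glob metacharacters in the root path itself from being interpreted as part of the pattern.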