
This tool is designed for students, researchers, data scientists or anyone who would like to have access to SICAR files.


genesis-dataculture/SICAR


SICAR



Features

  • Get city codes by state
  • Download a city (Shapefile or CSV) by code
  • Download a list of cities (Shapefile) by code
  • Download all cities (Shapefile) in a state by state code
  • Download the entire country (Shapefile)
  • Tesseract and PaddleOCR drivers to solve captchas automatically
  • Manual driver to automate the download process

Installation

Install SICAR with pip

pip install git+https://github.com/urbanogilson/SICAR

Prerequisites:

Install Google Tesseract OCR (additional info on how to install the engine on Linux, Mac OS X, and Windows).

Install PaddleOCR (additional info on how to install the engine on Linux, Mac OS X, and Windows).

If you don't want to install dependencies on your computer or don't know how to install them, we strongly recommend Google Colab.
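If you do install the engines locally, you can quickly confirm that both are reachable before running the examples. The sketch below is a hypothetical helper (`check_ocr_prerequisites` is not part of SICAR). It assumes Tesseract is installed as a `tesseract` executable on your `PATH` and PaddleOCR as the `paddleocr` Python package.

```python
import importlib.util
import shutil


def check_ocr_prerequisites() -> dict[str, bool]:
    """Report whether the OCR engines used by the drivers appear to be available.

    Hypothetical helper, not part of SICAR.
    """
    return {
        # Tesseract is a system binary, so look for the executable on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        # PaddleOCR is a Python package; find_spec locates it without importing it.
        "paddleocr": importlib.util.find_spec("paddleocr") is not None,
    }


for name, available in check_ocr_prerequisites().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing entry only means the engine was not found in the current environment. Google Colab remains the simplest option if either check fails.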

Usage/Examples

from SICAR import Sicar
import pprint

# Create Sicar instance
car = Sicar(email="[email protected]")

# Get cities codes in Roraima state
cities_codes = car.get_cities_codes(state='RR')

pprint.pprint(cities_codes)
# {'Alto Alegre': '1400050',
#  'Amajari': '1400027',
#  'Boa Vista': '1400100',
#  'Bonfim': '1400159',
#  'Cantá': '1400175',
#  'Caracaraí': '1400209',
#  'Caroebe': '1400233',
#  'Iracema': '1400282',
#  'Mucajaí': '1400308',
#  'Normandia': '1400407',
#  'Pacaraima': '1400456',
#  'Rorainópolis': '1400472',
#  'São João da Baliza': '1400506',
#  'São Luiz': '1400605',
#  'Uiramutã': '1400704'}

# Download 'Alto Alegre': '1400050'
car.download_city_code('1400050', folder='Roraima')

# Download in CSV format
from SICAR import OutputFormat
car.download_city_code('1400050', output_format=OutputFormat.CSV, folder='Roraima')

# Download specific cities
cities_codes = {
    'São Gabriel da Cachoeira': '1303809',
    'São Paulo de Olivença': '1303908'
}

car.download_cities(cities_codes=cities_codes, folder='cities')

# Download all cities in Roraima state
car.download_state(state='RR', folder='RR')
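Because `download_cities` takes a plain `{name: code}` dictionary, you can build that dictionary from the output of `get_cities_codes` rather than typing codes by hand. Below is a minimal sketch. It assumes the output is a `dict[str, str]` as printed above, and `select_cities` and `codes_match_state` are hypothetical helpers, not SICAR functions. The prefix check relies on IBGE municipality codes being 7 digits whose first two digits are the state code (14 for Roraima, as every code in the listing shows).

```python
# Sample of the car.get_cities_codes(state='RR') output shown above.
rr_cities = {
    "Alto Alegre": "1400050",
    "Amajari": "1400027",
    "Boa Vista": "1400100",
    "Bonfim": "1400159",
    "Cantá": "1400175",
}

RORAIMA_IBGE_PREFIX = "14"  # IBGE state code for Roraima


def select_cities(all_codes: dict[str, str], names: list[str]) -> dict[str, str]:
    """Return a {name: code} subset (hypothetical helper); raise KeyError for unknown names."""
    missing = [name for name in names if name not in all_codes]
    if missing:
        raise KeyError(f"Unknown cities: {missing}")
    return {name: all_codes[name] for name in names}


def codes_match_state(codes: dict[str, str], state_prefix: str) -> bool:
    """Check that every code is 7 digits long and starts with the state prefix."""
    return all(
        len(code) == 7 and code.isdigit() and code.startswith(state_prefix)
        for code in codes.values()
    )


subset = select_cities(rr_cities, ["Boa Vista", "Cantá"])
print(subset)  # {'Boa Vista': '1400100', 'Cantá': '1400175'}
print(codes_match_state(subset, RORAIMA_IBGE_PREFIX))  # True
```

You can then pass the resulting `subset` to `car.download_cities(cities_codes=subset, folder='cities')`.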

OCR drivers

Optical character recognition (OCR) drivers are used to read the characters in the captcha.

We currently have two options for automating the download process.

Tesseract OCR (Default)

from SICAR import Sicar
from SICAR.drivers import Tesseract

# Create Sicar instance using Tesseract OCR
car = Sicar(email="[email protected]", driver=Tesseract)

# Download a city
car.download_cities(cities_codes={'Belo Horizonte': '3106200'}, folder='SICAR/cities')

PaddleOCR

from SICAR import Sicar
from SICAR.drivers import Paddle

# Create Sicar instance using PaddleOCR
car = Sicar(email="[email protected]", driver=Paddle)

# Download a city
car.download_cities(cities_codes={'Balneário Camboriú': '4202008'}, folder='SICAR/cities')
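Both examples pass a driver class rather than an instance to `Sicar` through the `driver` argument, which makes the OCR engine swappable. The sketch below illustrates that pattern with hypothetical classes (`CaptchaSolver`, `FakeTesseract`, `FakePaddle`, `Downloader`). It is not SICAR's internal driver API, and the fake drivers return fixed strings instead of running OCR.

```python
from typing import Protocol


class CaptchaSolver(Protocol):
    """Hypothetical interface: turn a captcha image into its text."""

    def solve(self, image: bytes) -> str: ...


class FakeTesseract:
    def solve(self, image: bytes) -> str:
        return "AB12"  # stand-in for a real Tesseract call


class FakePaddle:
    def solve(self, image: bytes) -> str:
        return " AB12\n"  # stand-in for PaddleOCR output with stray whitespace


class Downloader:
    """Hypothetical client that, like Sicar(driver=...), receives a driver class."""

    def __init__(self, driver: type[CaptchaSolver] = FakeTesseract) -> None:
        # Instantiate whichever driver class was passed in (Tesseract-like by default).
        self._solver = driver()

    def read_captcha(self, image: bytes) -> str:
        # Strip whitespace so every driver yields text in the same shape.
        return self._solver.solve(image).strip()


print(Downloader().read_captcha(b""))  # AB12
print(Downloader(driver=FakePaddle).read_captcha(b""))  # AB12
```

Accepting the class lets the client control when and how the OCR engine is constructed, and keeps the calling code identical whichever engine is chosen.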

Run with Google Colab

Using Google Colab, you don't need to install the dependencies on your computer and you can save files directly to your Google Drive.


Run with Docker

Update the entry point file ./examples/docker.py to download data based on your needs.

Build the Docker image

# using the docker build script
./docker-build.sh

Run the image to download all data defined in the ./examples/docker.py entry point to an external directory.

Create a local directory to store the downloaded data (for example, /my/local/data/dir) and mount it with the volume (-v) parameter of the run command.

# run the docker image in detached mode
docker run -d --rm -v /my/local/data/dir:/data softwarevale/download-sicar:v0.1
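If you prefer to launch the container from Python, for example at the end of a data-preparation script, the sketch below builds the same `docker run` command and creates the host directory first. `docker_run_command` and `start_download` are hypothetical helpers. The image tag is copied from the command above, and the command only runs when `execute=True`.

```python
import shlex
import subprocess
from pathlib import Path

IMAGE = "softwarevale/download-sicar:v0.1"


def docker_run_command(data_dir: Path, image: str = IMAGE) -> list[str]:
    """Build the detached `docker run` command that mounts data_dir at /data."""
    return ["docker", "run", "-d", "--rm", "-v", f"{data_dir.resolve()}:/data", image]


def start_download(data_dir: Path, execute: bool = False) -> list[str]:
    """Create the host directory, print the command and optionally run it (hypothetical helper)."""
    # Create the host directory first so the bind mount points at a real folder.
    data_dir.mkdir(parents=True, exist_ok=True)
    cmd = docker_run_command(data_dir)
    print(shlex.join(cmd))
    if execute:
        # Requires Docker and the image built by ./docker-build.sh.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `start_download(Path("/my/local/data/dir"), execute=True)` creates the directory and starts the container in detached mode.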


Roadmap

  • Add support for downloading CSV files

Contributing

The development environment with all necessary packages is available using Visual Studio Code Dev Containers.


Contributions are always welcome!

Feedback

If you have any feedback, please reach me at [email protected]

License

MIT
