Commit
* Google Drive crawler
* updated gdrive crawler config
* refactored GdriveCrawler to use date filtering, removed local storage for processed files, and added handling of PDF files
* refactor: switch to using index_file() for direct uploads, sanitize filenames
* added numpy
* refactor gdrive_crawler.py: use slugify and logging.info, adjust date comparison, and rename byte_stream
* standardize date handling
* reverted the file to an earlier version
* changed crawing to crawling
* removed redundant date checks and combined download() and export() into one function
* resolved commit issues
* minor fixes
* small mypy fix
* updated Docker loading of credentials.json
* added type annotations
* added openpyxl
* restored run.sh to match the main branch

---------

Co-authored-by: Abhilasha Lodha <[email protected]>
Co-authored-by: Ofer Mendelevitch <[email protected]>