In development & trying to think of a better name.
- Add price calculator to cmdline interface.
- Add a flask frontend.
- Pandas DataFrames don't play nicely as shared objects. Replace them with SQLite or a SQLAlchemy connector.
- Dockerize for web deployment
bing_website_finder (aka bwf) finds websites for an arbitrarily long list of company names.
It does this using an asyncio event loop that runs several workers concurrently, which makes it fast; as a result it requires Python >= 3.7.
Update: bwf will now find email addresses & attempt to profile company employees.
It does this by default after finding a website, or you can feed in a list of websites like this:
~$ bwf -o emails -h /path/to/infile_with_websites.csv /desired/outfile/path.csv
- Python >= 3.7
- A Bing Search API subscription
- A CSV full of company names
$ pip install bwf
Or, to install from source, run the following from the command line:
$ git clone https://github.com/rtruxal/company-website-finder.git
$ cd company-website-finder/
$ pip install .
To run bwf from a clone of the repository without installing it:
$ git clone https://github.com/rtruxal/company-website-finder.git
$ cd company-website-finder/
$ python --version  # check your Python version to make sure it's >= 3.7
Python 3.7.2
$ python  # start Python
Python 3.7.2 (default, Jan 2 2119, 17:17:17) [MSC v.1111 2222 bit (DAM46)] :: Anaconda, Inc. on win3333
Type "help", "copyright", "credits" or "license" for more information.
And then:
>>> from bing_website_finder.get_websites import init
>>> from os import path
>>>
>>> infile = path.realpath('./bing_website_finder/data/example_input_website_finder.csv')
>>> outfile = path.realpath('./results.csv')
>>> bing_api_key = '987654321deadbeef123456789'
>>>
>>> init(infile, outfile, verbose=False, api_key=bing_api_key)
The easiest way to understand the input format is to look at bing_website_finder/data/example_input_website_finder.csv (gathered from a very old SEC website), which serves as a practical example.
Your input CSV must include the following 2 columns (case sensitive):
- Company Name
- Website
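As a minimal sketch, here is one way to build an input file with the two required columns and check its header before handing it to bwf. It assumes that rows with a blank Website cell are the ones bwf should fill in; that is an assumption, so treat the example CSV in the repository as the authoritative reference. The helpers check_input_columns and write_example_input are hypothetical and not part of bwf.

```python
import csv

# Column names are case sensitive, per the requirement above.
REQUIRED_COLUMNS = ("Company Name", "Website")


def check_input_columns(path):
    """Hypothetical helper: return the required columns missing from the CSV header."""
    with open(path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])  # an empty file yields an empty header
    return [col for col in REQUIRED_COLUMNS if col not in header]


def write_example_input(path, companies):
    """Hypothetical helper: one row per company, Website left blank (assumption)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(REQUIRED_COLUMNS))
        writer.writeheader()
        for name in companies:
            writer.writerow({"Company Name": name, "Website": ""})
```

For example, write_example_input("my_companies.csv", ["Acme Corp"]) followed by check_input_columns("my_companies.csv") returns an empty list; a header spelled "company name" would be reported as missing "Company Name".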
Locate the bing_website_finder/config.py file & modify this line:
DEFAULT_SEARCH_API_V7_KEY = 'CHANGE ME'
IMPORTANT NOTE: bwf will fail silently if you neither pass an api_key argument (it defaults to None) nor modify DEFAULT_SEARCH_API_V7_KEY inside config.py.
The primary interface is init(), found in bing_website_finder/get_websites.py. Here is its declaration as of v0.0.1:
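import asyncio
import os

import pandas as pd

# WebsiteWorker and _execute are part of bing_website_finder (not shown in this excerpt).

def init(infilepth, outfilepth, verbose=False, api_key=None, num_workers=5):
    assert os.path.exists(infilepth), "Please check the infile path you've specified."
    # The input CSV is loaded into one DataFrame that every worker receives.
    cache = pd.read_csv(infilepth)
    workers = (WebsiteWorker(cache, api_key) for i in range(num_workers))
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(_execute(workers, verbose, loop=loop))
    finally:
        # The output is written even if the run raises or is interrupted.
        cache.to_csv(outfilepth, index=False)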
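To make the structure of init() more concrete, here is a minimal, self-contained sketch of the same general pattern: several worker coroutines sharing one work queue on a single asyncio event loop, with the output written in a finally block so partial results survive an error. It is not bwf's implementation. The names fake_lookup, worker, run_all and find_websites are hypothetical, the Bing search call is replaced by a fake lookup with no network access, plain dicts and the csv module stand in for the pandas DataFrame cache, and asyncio.run (available from Python 3.7) is used instead of get_event_loop().

```python
import asyncio
import csv


async def fake_lookup(name):
    """Hypothetical stand-in for a Bing search request (no network access)."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return "https://www." + name.lower().replace(" ", "") + ".example"


async def worker(queue, results):
    # Each worker keeps pulling company names until the shared queue is empty.
    while True:
        try:
            name = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        results[name] = await fake_lookup(name)


async def run_all(names, results, num_workers=5):
    queue = asyncio.Queue()
    for name in names:
        queue.put_nowait(name)
    # All workers run concurrently on one event loop and share `results`.
    await asyncio.gather(*(worker(queue, results) for _ in range(num_workers)))


def find_websites(names, outfile, num_workers=5):
    results = {}
    try:
        asyncio.run(run_all(names, results, num_workers))
    finally:
        # Like init(), write whatever was collected even if something failed.
        with open(outfile, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["Company Name", "Website"])
            for name in names:
                writer.writerow([name, results.get(name, "")])
    return results
```

Because the lookups are I/O-bound, the workers spend most of their time waiting, and the event loop interleaves them; that is where the speedup over a sequential loop comes from.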
If you do not specify an api_key argument, the DEFAULT_SEARCH_API_V7_KEY variable in bing_website_finder/config.py will be used.
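Since bwf fails silently when no real key is configured, you may want to resolve the key yourself before calling init(). The following is a minimal sketch, assuming the config default is still the literal placeholder 'CHANGE ME' when unset; resolve_api_key is a hypothetical helper, not part of bwf, and the module-level constant below only mirrors the line in config.py.

```python
# Stand-in for the value in bing_website_finder/config.py (assumption).
DEFAULT_SEARCH_API_V7_KEY = "CHANGE ME"


def resolve_api_key(api_key=None, default=DEFAULT_SEARCH_API_V7_KEY):
    """Hypothetical helper: pick a key the way init() does, but fail loudly."""
    # An explicit argument wins, matching init()'s api_key parameter.
    if api_key:
        return api_key
    # Otherwise fall back to the config default, refusing the placeholder.
    if default and default != "CHANGE ME":
        return default
    raise ValueError(
        "No Bing API key: pass api_key or edit DEFAULT_SEARCH_API_V7_KEY in config.py"
    )
```

Calling init(infile, outfile, api_key=resolve_api_key(bing_api_key)) would then raise a ValueError up front instead of running without a usable key.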
Installation via pip automatically creates an executable and places it in your Python environment's bin directory.
If /YOUR/PYTHON/ENV/bin* is in your $PATH**, simply type:
$ bwf --help
or
$ bing_website_finder --help
If neither of these works after installation, you can always use:
$ python -m bwf --help
or
$ python -m bing_website_finder --help
* - (/YOUR/PYTHON/ENV/Scripts on Windows)
** - (%PATH% on Windows)
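If you are unsure whether the pip-installed executable actually landed on your PATH, a small sketch using the standard library's shutil.which can check for either command name. find_bwf_executable is a hypothetical helper, not part of bwf.

```python
import shutil
import sys


def find_bwf_executable(names=("bwf", "bing_website_finder")):
    """Return the full path of the first command found on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # also honours PATHEXT on Windows
        if path:
            return path
    return None


if __name__ == "__main__":
    exe = find_bwf_executable()
    if exe:
        print("found:", exe)
    else:
        print("not on PATH; try: " + sys.executable + " -m bing_website_finder --help")
```

Using sys.executable in the fallback message makes sure the suggested python -m command runs with the same interpreter (and environment) the check was run from.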