New production crawler #1164

Closed
Tracked by #1167
michielbdejong opened this issue Sep 11, 2024 · 10 comments

Comments

@michielbdejong
Member

We've set up the server at 206.81.0.208 with a user 'runner' that has both @madoleary's and my SSH keys authorized.
It can run `cd ~/server ; npx ota track` in a screen session.
I pull the data to my laptop and then relay it to GitHub:

git clone git@github.com:tosdr/tosdr-versions
cd tosdr-versions
git remote add server runner@ota-tosdr-8gb:engine/data/versions
git fetch server
git merge server/main
git push

The same steps apply to tosdr-snapshots.
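
The relay steps above could be wrapped in a small helper so that both repositories go through the same sequence. This is only a sketch: the `relay_repo` and `run` functions and the `DRY_RUN` switch are made up for illustration, and the server-side path for the snapshots repository (`engine/data/snapshots`) is an assumption, since only the versions path is given above.

```shell
#!/usr/bin/env bash
# Sketch: relay crawler data from the server to GitHub, one repo at a time.
# SERVER uses the host alias from the commands above.
SERVER="runner@ota-tosdr-8gb"

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# relay_repo <repo-name> <path-on-server>
relay_repo() {
  local name="$1" server_path="$2"
  # Clone from GitHub only if there is no local checkout yet.
  [ -d "$name" ] || run git clone "git@github.com:tosdr/${name}" || return 1
  # Add the 'server' remote only if it is not configured already.
  git -C "$name" remote get-url server >/dev/null 2>&1 \
    || run git -C "$name" remote add server "${SERVER}:${server_path}" || return 1
  # Fetch from the server, merge its main branch, push to GitHub.
  run git -C "$name" fetch server &&
  run git -C "$name" merge server/main &&
  run git -C "$name" push
}
```

A dry run such as `DRY_RUN=1 relay_repo tosdr-versions engine/data/versions` prints the commands without touching the network; the call for tosdr-snapshots would use whatever path that repository actually has on the server.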

@michielbdejong
Member Author

I'll create a 4 GB server as the production one, to save us money.

@michielbdejong
Member Author

Pointed ota.tosdr.org to 159.223.154.84 in DNS

@michielbdejong
Member Author

michielbdejong commented Sep 11, 2024

Copied the most up-to-date instructions from #1160:
So, I think the instructions that work are:

  • ssh into a fresh Ubuntu 20.04 box. Then:
adduser crawler

(pick a good hard-to-guess password that will be used for sudo later)
(leave room number etc empty)

usermod -aG sudo crawler
mkdir /home/crawler/.ssh
cp ~/.ssh/authorized_keys /home/crawler/.ssh
chown -R crawler /home/crawler/.ssh
su crawler
cd ~
sudo ls
wget https://raw.githubusercontent.com/tosdr/ota-engine/main/ota-tosdr-server-init.sh
time sh ./ota-tosdr-server-init.sh
cd engine
source ~/.bashrc
nvm install 20
nvm use 20
npm install
npx ota track --services "Musi"

@michielbdejong
Member Author

I accidentally picked Ubuntu 24.04 and confirmed that this way of installing Puppeteer on a server really doesn't work for that OS version. Rebuilding it as Ubuntu 20.04 now.
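
Since the install only works on Ubuntu 20.04, a pre-flight check before running the init script could catch a wrong image early. The sketch below is not part of the ota-engine repository; `check_ubuntu_release` is a hypothetical helper that reads the standard `ID` and `VERSION_ID` fields from `/etc/os-release`.

```shell
#!/usr/bin/env bash
# check_ubuntu_release [os-release-file] [expected-version]
# Succeeds only if the file describes Ubuntu at the expected version.
check_ubuntu_release() {
  local file="${1:-/etc/os-release}" expected="${2:-20.04}"
  local id version
  # Source the file in a subshell so its variables do not leak into ours.
  id="$( . "$file" && printf '%s' "${ID:-}" )" || return 2
  version="$( . "$file" && printf '%s' "${VERSION_ID:-}" )" || return 2
  if [ "$id" = "ubuntu" ] && [ "$version" = "$expected" ]; then
    echo "ok: Ubuntu $version"
  else
    echo "unexpected OS: ${id:-unknown} ${version:-unknown} (wanted Ubuntu $expected)" >&2
    return 1
  fi
}
```

Running `check_ubuntu_release && time sh ./ota-tosdr-server-init.sh` would then skip the install on a 24.04 box instead of failing partway through.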

michielbdejong added a commit to tosdr/ota-engine that referenced this issue Sep 11, 2024
@michielbdejong
Member Author

sudo apt install certbot
sudo certbot certonly --standalone

@michielbdejong
Member Author

michielbdejong commented Sep 11, 2024

Running

NODE_OPTIONS=--max_old_space_size=8000 npx ota track

in a screen session now. Let's see how it does on a 4 GB server.
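
The command above hard-codes a heap limit of 8000 MB, which is more than the RAM on a 4 GB box. As an alternative (not what was run here), the limit could be derived from the machine's memory. In this sketch `heap_mb_for` and `mem_total_kib` are hypothetical helpers, and the 75% default is an arbitrary assumption rather than a tested value.

```shell
#!/usr/bin/env bash
# heap_mb_for <total-memory-in-KiB> [percent]
# Prints a heap size in MB equal to the given percentage of total memory.
heap_mb_for() {
  local total_kib="$1" percent="${2:-75}"
  echo $(( total_kib * percent / 100 / 1024 ))
}

# Print MemTotal in KiB as reported by the kernel (Linux only).
mem_total_kib() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

It would be used as `NODE_OPTIONS="--max_old_space_size=$(heap_mb_for "$(mem_total_kib)")" npx ota track`. Whether the crawler still completes with a smaller heap would need to be checked on the actual server.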

@michielbdejong
Member Author

The run took 35 minutes, so that's great! Will add the GitHub user and schedule it hourly.

@michielbdejong
Member Author

Cron job is running now

crawler@ota:~$ tail -f Wed\ Sep\ 11\ 13\:55\:01\ UTC\ 2024.log 

I set up a git SSH key and will see whether the engine does a git push by itself. If not, we can add one to the /home/crawler/hourly.sh script.
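
The contents of hourly.sh are not shown in this thread, so the following is only a guess at what such a script might look like. The lock file path, the `PUSH_AFTER` switch, the `run` argument and the log naming are assumptions; `~/engine` and `data/versions` are taken from the earlier comments. Because a run takes about 35 minutes and the job is hourly, the sketch uses `flock` to skip a run when the previous one is still going, and it names logs without spaces so they don't need escaping.

```shell
#!/usr/bin/env bash
# Hypothetical hourly.sh. Assumed crontab entry:
#   55 * * * * /home/crawler/hourly.sh run
ENGINE_DIR="${ENGINE_DIR:-$HOME/engine}"
LOG_DIR="${LOG_DIR:-$HOME}"
LOCK_FILE="${LOCK_FILE:-/tmp/ota-track.lock}"

# Sortable UTC log file name without spaces.
log_name() {
  date -u +"%Y-%m-%dT%H-%M-%SZ.log"
}

hourly_run() {
  local log="${LOG_DIR}/$(log_name)"
  (
    # Skip this run if the previous one still holds the lock.
    flock -n 9 || { echo "previous run still active, skipping" >&2; exit 0; }
    cd "$ENGINE_DIR" || exit 1
    NODE_OPTIONS=--max_old_space_size=8000 npx ota track >"$log" 2>&1 || exit 1
    # Only needed if the engine does not push by itself (assumed switch).
    if [ "${PUSH_AFTER:-0}" = "1" ]; then
      git -C "$ENGINE_DIR/data/versions" push >>"$log" 2>&1
    fi
  ) 9>"$LOCK_FILE"
}

# Start the crawl only when called with the 'run' argument.
if [ "${1:-}" = "run" ]; then
  hourly_run
fi
```

Note that cron starts jobs with a minimal environment, so a Node version installed through nvm may not be on the PATH unless the script sets it up first.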

@michielbdejong
Member Author

The cron job is still not robust enough for production use, so this work is now blocked on #1174.

@michielbdejong
Member Author

Done
