Rocket.Chat isn't registered with PM2 #1

Closed
jeansnkicks opened this issue Dec 14, 2015 · 27 comments

Comments

@jeansnkicks

I'm running the playbook against a minimal, clean CentOS 7 VM (CentOS Linux release 7.1.1503 (Core)). Everything succeeds (or appears to) except this task:

TASK: [cmacrae.rocket_chat | Register Rocket.Chat service status] *************
failed: [rc-alpha.greensky.local] => {"changed": false, "cmd": "pm2 show rocket.chat", "delta": "0:00:00.205768", "end": "2015-12-14 10:45:43.195818", "rc": 1, "start": "2015-12-14 10:45:42.990050", "stdout_lines": [], "warnings": []}
stderr: [PM2][WARN] rocket.chat doesn't exist
...ignoring

playbook is simple:

- hosts: chat_servers
  vars:
    rocket_chat_automatic_upgrades: True
  roles:
    - cmacrae.rocket_chat

I'm new to rocket.chat, so I'm not sure where to start digging to provide additional troubleshooting details.

Rocket.Chat is definitely not running on the host (nothing is listening on port 3000), although NGINX is (80/443) and MongoDB is running on 27017.

@cmacrae

cmacrae commented Dec 14, 2015

Hey @jeansnkicks - thanks for raising this issue 👍
Yep, so, as you can see, there's the ...ignoring line just after the stderr output.
This is intentional: the Rocket.Chat application hasn't been registered with PM2 at that point.

This isn't a very good solution and I plan to handle the "error" properly.
I'll keep this issue open until I've addressed this in a better manner, but for the time being, you can rest assured that it's functioning as currently intended :)

On a side note: you should re-install this role from Galaxy using the name 'RocketChat.Server', as it has moved over to the official RocketChat GitHub namespace. The same goes for the roles section in your playbook: change cmacrae.rocket_chat to RocketChat.Server.

Thanks for using this and bringing it to my attention! You'll hear from me soon when I've decided how I'm going to handle this in a more graceful manner.

@cmacrae

cmacrae commented Dec 15, 2015

@jeansnkicks: Well, c549084 should fix this - I just opted to use ps to check the running process instead of PM2, which should solve this problem 👍

Make sure you grab the role from Galaxy as 'RocketChat.Server'.

I've tested and it works for me, could you let me know how it goes for you?
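A minimal sketch of the "check with ps instead of PM2" idea, for readers following along. It is not the code from c549084: the helper name has_process and the pattern bundle/main.js are assumptions chosen for illustration. The process list is captured before grep starts, so grep's own command line cannot produce a false match.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any command line read from stdin
# contains the given fixed string.
has_process() {
    grep -qF -- "$1"
}

# Capture the full command line of every process first; `|| true` keeps
# the script going on systems where ps is unavailable.
procs=$(ps -eo args= 2>/dev/null || true)

# "bundle/main.js" is an assumed pattern, not taken from the role.
if printf '%s\n' "$procs" | has_process "bundle/main.js"; then
    echo "Rocket.Chat appears to be running"
else
    echo "Rocket.Chat does not appear to be running"
fi
```

Unlike pm2 show, this reports on the process itself, so it does not fail just because the app was never registered with a process manager.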

@cmacrae

cmacrae commented Dec 15, 2015

To add to this: I've decided to support the native service management systems of the supported platforms for this role, rather than using PM2. Soon you'll be able to handle Rocket.Chat like any other service running on your system(s) :)

@jeansnkicks
Author

Makes sense about PM2.

I reset the VM and re-ran using the new role. Success, except for a checksum issue with the Rocket.Chat master package sitting on S3. And for some reason, Rocket.Chat had to be started manually (via PM2 restart): Meteor was complaining about the node version, even though it was the correct version. A manual update of the checksum fixed the role issue, and a restart fixed the Meteor issue.

Any reason why you're pulling the package from S3 there and not from the GitHub sources?

@cmacrae

cmacrae commented Dec 15, 2015

@jeansnkicks Yeah, I've literally just updated the hashsum for the master tarball.
The tarball is pulled from S3 because this is now the official deployment method. It means we don't have to build the NPM bundle using Meteor.

That's a bit concerning about having to start Rocket.Chat manually via PM2... I do continuous testing of my changes on completely fresh systems to ensure no regression has been introduced, and I don't seem to run into this issue. I don't suppose you have the output of the Ansible run? Or could you try a run-through of the playbook on a completely fresh system?

@jeansnkicks
Author

@cmacrae LOL, re: hashsum. I've got a clean snapshot of the CentOS 7 VM for just this reason. I'll reinstall the role from scratch, run the role again and report back.

@cmacrae

cmacrae commented Dec 15, 2015

The hash sum even changed again since my comment on it! Must be some anomalies going on with the stable binary deployment, haha. So, you'll need to make sure you have the latest code again - I wish the Galaxy CLI tool had an update/pull feature...

About the re-deployment; nice one. I appreciate it!

@jeansnkicks
Author

OK, nuts. Moving backwards somehow. Ansible rough output, yml file. No error from PM2, but on the server:

pm2 list

Is empty. And meteor/rocket chat isn't running.

I'm still learning Ansible. Is there a debug file I can pull from somewhere, or one I can force the creation of if I re-run it?

rocket_chat.txt
ansible.txt

@jeansnkicks
Author

OK, if I force the registration manually:

cd /var/www/rocket.chat
pm2 start pm2-rocket-chat.json --watch

All is good. I expect I could simply re-run the role and get the same result. Not sure why that step is balking on my image the first time around.
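The manual registration above could be wrapped so that it only runs when PM2 does not already know the app. This is a sketch under assumptions, not part of the role: the function name register_if_missing is hypothetical, and it relies only on the two PM2 invocations that appear in this thread (pm2 show exiting non-zero for an unknown app, and pm2 start with a JSON file).

```shell
#!/bin/sh
# Hypothetical helper: register an app with PM2 only if `pm2 show` says
# it is not registered yet.
# usage: register_if_missing APP_NAME APP_DIR CONFIG_FILE
register_if_missing() {
    app_name="$1"
    app_dir="$2"
    config="$3"

    if ! command -v pm2 >/dev/null 2>&1; then
        echo "pm2 not found on PATH" >&2
        return 127
    fi

    if pm2 show "$app_name" >/dev/null 2>&1; then
        echo "$app_name is already registered with PM2"
    else
        # Run in a subshell so the caller's working directory is unchanged.
        (cd "$app_dir" && pm2 start "$config" --watch)
    fi
}

# Example call, using the paths from the comment above:
# register_if_missing rocket.chat /var/www/rocket.chat pm2-rocket-chat.json
```

Because the check and the start live in one place, running it twice should leave a single registration rather than failing or starting a duplicate.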

@cmacrae

cmacrae commented Dec 15, 2015

Hmmm, strange, it seems to skip the startup - I'm just testing this on my CentOS system at the moment, to be sure I didn't introduce any regression with today's commits.

Also, I'm going to get started on systemd support tonight, shouldn't take me long at all, so, perhaps that'll be the saving factor :)

@jeansnkicks
Author

Sounds good. Happy to test. I'm going to work on a Vagrant setup to automate the test process.

@cmacrae

cmacrae commented Dec 15, 2015

Awesome 👍
Right, yep, looks like there's a problem... it skipped for me too!
I'm working on systemd support right now - stay tuned, I'll update when it's ready, which will be at some point in the next few hours :)

@cmacrae

cmacrae commented Dec 15, 2015

Well, I'm happy to say, I've written the systemd support, and it works great!
Just got to put together an upstart init script for Ubuntu 14.04, confirm that works, then I'll commit and push.

@jeansnkicks
Author

Cool. I'll be on a bit tonight (US Eastern) to test if you post it in the next few hours.

@cmacrae

cmacrae commented Dec 16, 2015

Phew! That was quite the session!
So, 83f52c2 brings native service management :)

I've tested on CentOS 7, Ubuntu 14.04 & 15.04 - all works great for me.
Take it for a spin, let me know how you get on.

@jeansnkicks
Author

You've done a ton of work today! Impressed for sure. So... first pass, something has failed. Meteor/RC isn't loaded. Systemctl reports:

rocketchat.service loaded failed failed Rocket.Chat Server

Not sure where it got off the rails - /var/www didn't get created. Unless that changed? Something seems pretty off. I'm going to reset and try again.

@jeansnkicks
Author

Full reset and re-attempt, same result. Somehow RC isn't getting installed at all. I'll start digging through the logs. If there is something specific I should pull first, let me know.

@cmacrae

cmacrae commented Dec 16, 2015

Yeah, quite a bit of stuff added/shifted around!
Hmm, that's strange, I've been doing fresh deploys on all the supported platforms each time.
Could you check the output of journalctl -xn and systemctl status rocketchat? It should give some indication as to what the problem was.

Yeah, I moved the rocket_chat_application_path to /var/lib/rocket.chat/.

With your Ansible runs, are there any failing tasks?

@jeansnkicks
Author

No failing tasks, which I thought was off. Systemctl:

rocketchat.service - Rocket.Chat Server
Loaded: loaded (/usr/lib/systemd/system/rocketchat.service; enabled)
Active: failed (Result: start-limit) since Tue 2015-12-15 20:06:41 EST; 28min ago
Main PID: 4774 (code=exited, status=1/FAILURE)
CGroup: /system.slice/rocketchat.service

journalctl -xn has nothing helpful (success or error).

node --version reports v0.10.36, which certainly won't make Meteor happy. I'll do some more digging.

System clock is also way off, I just noticed, but that's probably unrelated.

@cmacrae

cmacrae commented Dec 16, 2015

Weird, could you show me the contents of the /usr/lib/systemd/system/rocketchat.service file?

@jeansnkicks
Author

[Unit]
Description=Rocket.Chat Server
After=syslog.target
After=network.target

[Service]
Type=simple
Restart=always
StandardOutput=syslog
SyslogIdentifier=RocketChat
User=rocketchat
Group=rocketchat
Environment=MONGO_URL=mongodb://127.0.0.1:27017/rocketchat
Environment=MONGO_OPLOG_URL=mongodb://127.0.0.1:27017/local
Environment=ROOT_URL=http://localhost.localdomain
Environment=PORT=3000
WorkingDirectory=/var/lib/rocket.chat
ExecStart=/bin/node /var/lib/rocket.chat/bundle/main.js

[Install]
WantedBy=multi-user.target

Where do the Meteor logs dump out?

@jeansnkicks
Author

It's definitely a node setup issue. From /var/log/messages:

Dec 16 10:02:54 localhost RocketChat: Meteor requires Node v0.10.40 or later.

And:

# node --version
v0.10.36

If I try to force nave:

# nave usemain 0.10.40
curl: (22) The requested URL returned error: 404 Not Found
######################################################################## 100.0%
installed from binary

But the node version is stuck on 0.10.36. The step shows "changed" in the Ansible role logs, but something isn't quite right there.
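The mismatch above (Meteor wants v0.10.40 or later, node reports v0.10.36) is the kind of thing a small pre-flight check could catch before the service is started. A minimal sketch, assuming plain MAJOR.MINOR.PATCH version strings; the function name version_at_least is hypothetical and this is not something the role is known to do.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ACTUAL >= MINIMUM, for versions of the
# form [v]MAJOR.MINOR.PATCH (three numeric components are assumed).
version_at_least() {
    actual="${1#v}"      # strip the leading "v" printed by `node --version`
    minimum="${2#v}"
    old_ifs=$IFS
    IFS=.
    # Split both versions on dots: $1..$3 = actual, $4..$6 = minimum.
    set -- $actual $minimum
    IFS=$old_ifs
    [ "$1" -gt "$4" ] && return 0
    [ "$1" -lt "$4" ] && return 1
    [ "$2" -gt "$5" ] && return 0
    [ "$2" -lt "$5" ] && return 1
    [ "$3" -ge "$6" ]
}

# Check whichever node is first on PATH, if there is one.
if command -v node >/dev/null 2>&1; then
    if version_at_least "$(node --version)" 0.10.40; then
        echo "node is new enough for Meteor"
    else
        echo "node is older than 0.10.40" >&2
    fi
fi
```

Note that this checks the node found first on PATH, which is not necessarily the binary a service file points at.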

@cmacrae

cmacrae commented Dec 16, 2015

Alright, nice one for narrowing it down.
I'm getting a Vagrant environment set up for testing. I've been using LX branded zones under SmartOS so far for testing, so perhaps there are some inconsistencies.

@cmacrae

cmacrae commented Dec 16, 2015

Right, I think I've found the problem!
Nave is placing the desired version of the node binary in /usr/local/bin/node - this actually makes things a bit easier, because this seems to be commonplace across all distros :)

I'll implement a change, test it, then push.
Thanks for helping me track this down! 👍
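For anyone hitting the same symptom: the unit file quoted earlier starts /bin/node, while nave installs to /usr/local/bin/node, so two different node binaries can coexist. A rough sketch for listing every node on PATH together with the version each one reports; the helper name list_on_path is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every executable file named NAME found on
# PATH, in lookup order.
list_on_path() {
    name="$1"
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
    return 0
}

# Show each node binary and the version it reports.
list_on_path node | while IFS= read -r bin; do
    printf '%s -> %s\n' "$bin" "$("$bin" --version 2>/dev/null)"
done
```

In bash, type -a node gives a similar list without the version numbers. A path hard-coded in a service file is used regardless of PATH order, so it is worth comparing against this output.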

@cmacrae

cmacrae commented Dec 16, 2015

Right, 2006cfe should fix this! Try it out :)

@jeansnkicks
Author

Rock and roll! It's fixed! You're the man. I'm closing this issue.

@cmacrae

cmacrae commented Dec 16, 2015

Eyyyy, good to hear! Thanks for the help with the investigation :)
Don't hesitate to raise any other issues in the future, whether they're problems, thoughts or general opinions.


2 participants