unifi-os doesn't restart #29
Comments
I've seen this once or twice myself but never captured what is causing it to fail. I don't suppose you have any logs or anything that we might be able to use to figure out why it didn't restart? Without anything to go on the only thing I can think of would be some kind of Until loop (provided this shell supports it) that does a restart until `unifi-os | grep 'is already running' |
In my case, unfortunately, it happens every time the udm-pro reboots. I have ssh access to the udm-pro. I didn't paste any logs in my original message because I'm not very familiar with that system yet and don't know which log files are relevant to this problem. Is it just /mnt/data/log/messages or some other files too? In /mnt/data/log/messages there's this at the exact moment the udm-le script runs: |
This might do the trick, I'll see about working into the master branch:
|
Here's /mnt/data/log/messages from the moment udm-pro rebooted to the moment I ran |
Ok, I pushed up a possible fix. It'll try to restart the container, and if that fails retry 5 times before it bails out. This might catch whatever edge case we're seeing. Try that on for size and if it looks good I'll cut a new release at some point soon. |
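The pushed fix itself is not shown in this thread. As a minimal sketch of the "try, then retry up to 5 times" idea, something like the following could be used. `retry_command` is a hypothetical helper name, not part of udm-le or unifi-os, and the sketch assumes the wrapped command's exit status reflects whether it succeeded, which is not confirmed for `unifi-os restart`.

```shell
#!/bin/sh
# Sketch of a bounded retry loop. retry_command is a hypothetical helper,
# not something shipped with udm-le or unifi-os.
# Usage: retry_command MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_command() {
    max_attempts="$1"
    delay="$2"
    shift 2

    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command; stop as soon as it reports success.
        if "$@"; then
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
        attempt=$((attempt + 1))
        # Only pause when another attempt is still to come.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done

    # Every attempt failed: bail out with a non-zero status.
    return 1
}

# Example (on the UDM host): one initial try plus 5 retries, 10 seconds apart.
# The 10 second delay is an assumption chosen for illustration.
# retry_command 6 10 unifi-os restart
```

The loop gives up with a non-zero status rather than retrying forever, so a calling script can decide what to do when the restart never succeeds.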
I tried it, updated the file and rebooted udm-pro, no luck. It's exactly the same as before. |
Maybe I will just add another script to on-boot.d that restarts the container 6 or 7 minutes after reboot. Will give it a try tomorrow. |
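A rough sketch of what such a delayed-restart script might look like is below. `run_after_delay` is a hypothetical helper, and the 420 second delay simply stands in for the "6 or 7 minutes" mentioned above. Whether a restart issued from an on-boot script behaves any differently from the one udm-le issues is not something this sketch can establish.

```shell
#!/bin/sh
# Sketch only. run_after_delay is a hypothetical helper, not part of
# on-boot-script or udm-le.
# Usage: run_after_delay DELAY_SECONDS COMMAND [ARGS...]
run_after_delay() {
    delay="$1"
    shift
    # Run in a background subshell so the calling script returns at once
    # instead of blocking for the whole delay.
    (
        sleep "$delay"
        "$@"
    ) &
}

# Example (on the UDM host): restart unifi-os about 7 minutes after boot.
# run_after_delay 420 unifi-os restart
```

Backgrounding the wait is a design choice: it assumes other boot scripts should not be held up while this one sleeps.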
That's really odd, you can try increasing the sleep in |
Another thing to try, edit this line: https://github.com/kchristensen/udm-le/blob/master/on_boot.d/99-udm-le.sh#L10 Change it to something like: |
Another thing that stands out to me: On reboot, if you don't need to reissue a certificate because it is > 60 days old, udm-le doesn't try to restart the unifi-os container because there's no reason to (unless you're using a captive portal certificate). If you have a valid certificate and you reboot, it effectively does nothing but exit. So, if you know you have a valid certificate and on a reboot unifi-os isn't starting, I feel like there's something else going on we don't know about. |
The output of
Yes, I was trying to set that up, will try without the captive portal and see if it works. |
Ok that's a clue, see if it works w/o the captive portal and we can go from there. I don't personally use it, so there might be some bug there I haven't run into. |
I tried without the captive portal. Still same problem.
It's not that it isn't starting at all. When the UDM-PRO reboots, unifi-os starts properly but then stops after 5 minutes when
Not really. Lines 104 to 106 in a3f2b92
but bootrenew doesn't, it runs add_captive and restart_unifi_os regardless: Line 110 in a3f2b92
I looked it up and it seems it was discussed and agreed on in #8. So it seems it fails on the unifi-os restart bit.
|
Hm, I guess the route to take to get to the bottom of it is to add some more logging to |
I'm facing this problem as well. My UDMP also refuses to work after five minutes and has to be revived with a "unifi-os restart". UDM-PRO firmware 1.9.1. Is there anything I can provide beyond that to help solve this? |
Yes, I was going to try but haven't found the time recently. Will report here once I have tested more. |
I am also going to test a simple on boot script containing |
I'm on 1.9.1 / on-boot-script 1.0.4 and oddly I haven't run into this lately so figuring out how to get more verbose logs here is going to be key. |
Oh, I think unifi-os does something wonky with stdout, try |
Yeah, that was never going to work. From /usr/sbin/unifi-os:
case "$cmd" in
stop)
stop 2>&1 | logger -st unifi-os
;;
start)
start 2>&1 | logger -st unifi-os
;;
reset)
reset 2>&1 | logger -st unifi-os
;;
restart)
restart 2>&1 | logger -st unifi-os
;;
running)
container_is_running "${CONTAINER_NAME}"
;;
shell)
podman exec -ti "${CONTAINER_NAME}" bash
;;
update)
update "${2}" 2>&1 | logger -st unifi-os
;;
*)
echo "Usage: $0 [stop start restart shell 'update url']"
;;
esac |
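The `case` block above pipes each subcommand's output through `logger -st unifi-os`. The `-s` flag makes `logger` copy messages to standard error as well as the system log, so capturing only stdout sees nothing. The sketch below illustrates that with stand-ins: `log_to_stderr`, `wrapped_restart` and `failing_step` are hypothetical names, and `log_to_stderr` only mimics the stderr side of `logger -s` without writing to syslog. It also shows that, in a plain POSIX shell without `pipefail`, the status of such a pipeline is that of its last command, not of the step being logged.

```shell
#!/bin/sh
# log_to_stderr is a hypothetical stand-in for `logger -s`: it copies its
# input to stderr and does not touch syslog.
log_to_stderr() {
    cat >&2
}

# Hypothetical stand-in for a subcommand wrapped the way unifi-os wraps them.
wrapped_restart() {
    echo "Stopping container" 2>&1 | log_to_stderr
}

# Capturing stdout alone yields an empty string ...
stdout_only=$(wrapped_restart 2>/dev/null)
# ... while redirecting stderr into the capture yields the message.
with_stderr=$(wrapped_restart 2>&1)

echo "stdout only: '${stdout_only}'"
echo "with stderr: '${with_stderr}'"

# A step that fails still produces a zero pipeline status here, because the
# status comes from the last command in the pipeline.
failing_step() {
    echo "restart failed"
    return 1
}
failing_step 2>&1 | log_to_stderr 2>/dev/null
pipeline_status=$?
echo "pipeline status: ${pipeline_status}"
```

If the real script behaves the same way, a caller that wants the messages would need `2>&1`, and a caller that checks `$?` after `unifi-os restart` may not see a failure.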
Go ahead and modify /usr/sbin/unifi-os directly, just make a solid backup first. Or you can just have a look at |
Okay, after extensive testing, I came to the following conclusion: editing unifi-os directly works, as long as you do not reboot. After rebooting, the changes are reverted and nothing gets logged anymore. |
Not really, I was wrong there. The I did some further testing. I created an on boot script that contained only
wait 600
unifi-os restart
It failed the same way udm-le fails. So it's clearly something with on boot scripts execution on my udm-pro. I narrowed down the problem further by modifying my script to
stop() {
if ! container_is_running "${CONTAINER_NAME}"; then
echo "${CONTAINER_NAME} is not running"
else
echo "Stopping ${CONTAINER_NAME}"
podman stop "${CONTAINER_NAME}"
fi
stop_dropbear_daemon
}
According to the logs I saw,
You can see that just after |
I have been experiencing unifi-os dying since I installed and thought it was something flaky with the utils / OS. Now, I understand that it was this little restart fail causing the issue. My understanding is that the udm-le script will be called from the context of the container, so when you call restart from inside, then it shuts down and gets killed and doesn't restart, as shown in the logs above. I think there are some ways to solve this:
For now, I have disabled the restart in my scripts and will restart it when I need to manually - I'd rather have the certs fail than the whole unifi-os not working after a power outage. |
The on-boot-script actually executes the init script on boot on the host, not in the container. The The issue is that when whatever causes the restart on boot to fail, the controller and captive portal and what have you won't be running. The actual routing stuff happens on the host and should be unaffected by the failure to restart. As for #3, that happens already, provided |
I thought the same but then realised not everyone experiences this issue, which is strange. |
Indeed. I have no issues with the way this is setup and I've had this running on my UDMP since more or less the Early Access days of the udmp. That said I only use Cloudflare for DNS and none of the captive portal things that people have added over time, so it is hard to say if that is possibly causing issues. |
Routing won't be affected, but protect will not be up - which is a problem. My setup has udm-le, cloudflare-ddns, pihole, cloudflared and homebridge containers. I don't have captive portal set up. |
I have disabled the captive portal and have nothing else installed besides on-boot and udm-le. And as I have recently tested it's just the restart command that fails, it has nothing to do with udm-le itself. |
Does restart work from the shell? |
Yes, always. |
But I think you are on the right lines that when on-boot script runs |
So peacey over at the udm-utilities has a fix for this issue. I took his second suggestion and replaced The last case is now:
Seems to have resolved the issue |
Finally found the time to test 1.0.9. It fixed the issue, thanks! |
Expected
With the on-boot script, udm-le waits 5 minutes then installs certs and restarts unifi-os
Actual
The `unifi-os restart` command that the script runs doesn't work. Unifi-os is stopped but doesn't restart; everything is back to normal only after manually executing `unifi-os restart`.
UDM-PRO firmware 1.8.6
on-boot-script 1.0.4
udm-le 1.0.7