2023-05-14

VPS Troubleshooting: A Quick Guide

Many problems may occur on a VPS, and solving them is no easy task. It’s like being a doctor: you can only see the symptoms and have to make educated guesses about what the cause of the problem may be.

First you diagnose the problem with the tools you have. Then, depending on your guess about the root cause, you try different remedies until you find one that actually solves the problem.

This article will go over some general problems that VPSes may have. We will also learn how one may solve them. Let’s go!

Unable to connect to the server via SSH

Use the web console.

Most VPS providers let you manage your VPS even if it doesn’t have an Internet connection. Find that section (usually called web console) on your VPS provider’s site, log in to your server, and then follow the rest of the article.

Are you connecting to the correct port and IP?

Check the IP you’re trying to connect to; is it the same as the one on your VPS provider site? If so, the problem may be that you’re trying to connect to the wrong port.

First, find out which port your SSH server listens on. You can use this command on your server: sudo ss -tulpn | grep ssh
If it shows any port other than 22, try connecting to that port from your PC. For example, for port 2222 you would run: ssh -p 2222 user@IP
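
If you would rather get just the port numbers, the lookup above can be sketched as a small filter. This is a rough example, not the only way to do it: the function name sshd_ports is made up here, and it assumes the process column of ss names the listener "sshd" (with socket activation it may show up under another name).

```shell
# Print the TCP ports that sshd is listening on.
# Expects the output of `ss -H -tlnp` on stdin (-H drops the header row).
sshd_ports() {
    # Field 4 is "Local Address:Port"; the port is whatever follows the last colon.
    awk '/"sshd"/ { n = split($4, part, ":"); print part[n] }' | sort -un
}

# Usage on the server:
#   sudo ss -H -tlnp | sshd_ports
```

Note that this uses -tlnp rather than -tulpn: without the UDP sockets there is no Netid column, so the local address is always the fourth field.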

Is your server up and running?

If you haven’t blocked ICMP packets with your firewall, you can use ping (or if you only have IPv6, ping6) to see whether your server is accessible over the Internet. Try: ping x or ping6 y where x is your IPv4 and y your IPv6 address.

Can you ping your server? Or, equivalently, can you open a service hosted on it? If so, that’s good news! At least now you know that the OS on the server is up and responding to the packets it receives. If not, ping your server with online tools such as ping.pe to make sure it is really down and the problem isn’t specific to your PC. If your server is down, you must reach out to your VPS provider. Most likely they have turned off your VPS because you haven’t paid the bills. If it’s not that, open a ticket and ask them for help.

Use another server or computer for SSH connection

If your server’s up but you can’t SSH to it, it may be your PC’s Internet connection that has problems. Try SSHing from other servers or computers and see if the problem’s on your server’s side. Ideally, you can use tcp.ping.pe to verify that your server’s SSH port is open from multiple locations.
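
If you have shell access to a second machine, a quick TCP check can be sketched like this. It is only an illustration: port_open is a hypothetical helper name, and it assumes bash (for its /dev/tcp feature) and the coreutils timeout command are available.

```shell
# Succeed if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and on coreutils `timeout`.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from a second machine (203.0.113.10 and 2222 are placeholders):
#   port_open 203.0.113.10 2222 && echo "port reachable" || echo "port blocked or closed"
```

If the port is reachable from the second machine but not from your PC, the problem is most likely on your PC’s side of the network.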

Use the verbose mode of SSH

Your SSH client itself can provide more details if you ask it to. Use the -vvv option with the ssh command to make it more verbose; you may find the problem just by doing that.
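
The verbose output is long, so it can help to pick out the final error line. Below is a rough sketch that maps a few common OpenSSH client messages to a likely cause. The function name explain_ssh_error and the wording of the hints are my own, and the hints are guesses rather than certainties (a "Connection refused" can also come from a firewall that rejects instead of dropping).

```shell
# Map common OpenSSH client error lines (read from stdin) to a likely cause.
explain_ssh_error() {
    while IFS= read -r line; do
        case $line in
            *"Connection refused"*)           echo "probably nothing is listening on that port" ;;
            *"Connection timed out"*)         echo "packets are probably dropped: firewall or wrong IP" ;;
            *"Permission denied"*)            echo "reached sshd, but authentication failed" ;;
            *"Host key verification failed"*) echo "host key differs from the one in known_hosts" ;;
        esac
    done
}

# Usage:
#   ssh -vvv -p 2222 user@IP 2>&1 | explain_ssh_error
```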

Is your firewall allowing TCP packets to that port?

It may be that you have enabled a firewall and it drops any packets sent to the port your SSH server is currently listening on. It all depends on your distro and the firewall you have chosen to use, but on many distros iptables is the default firewall, so try: sudo iptables -L

If you have installed nftables, try this one: sudo nft list ruleset

There must be nothing that indicates dropping of packets on the port that your SSH server is listening to. If there is, just change the configuration so that it allows TCP packets to that port.
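
With long rule sets it is easy to miss a line, so here is a small sketch that filters for blocking rules on one port. It is an illustration with stated limits: blocking_rules is a made-up name, it expects rules in the format printed by iptables -S, and it only catches rules with a single --dport. A default policy of DROP or a multiport rule will not show up here.

```shell
# Print DROP/REJECT rules that mention a given destination port.
# Reads rules in `iptables -S` format on stdin; $1 is the port.
blocking_rules() {
    grep -E -- "--dport $1( |\$)" | grep -E -- '-j (DROP|REJECT)'
}

# Usage:
#   sudo iptables -S | blocking_rules 2222
```

If a rule is printed, one way to allow the traffic is to insert an accept rule ahead of it, for example sudo iptables -I INPUT -p tcp --dport 2222 -j ACCEPT (adjust the port to yours).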

Is your SSH server configured well? Is it running?

Sometimes it’s not about the firewall, it’s just that there’s no SSH server listening on any port, or if there is, it’s not allowing you to log in. Most GNU/Linux distros use OpenSSH as their SSH server and manage it with systemd. I’ll assume these tools for this article, but for example Alpine Linux and Gentoo use openrc and not systemd, or some may use dropbear as their SSH server. Keep these differences in mind if you’re not using mainstream distros such as Ubuntu and Debian.

First let’s see if your SSH server is running with no problems. As before, ss -tulpn | grep ssh will show whether a process with ssh in its name is running and listening on any port. OpenSSH’s server part is called “sshd”, so if it’s listening it will show up.

If it’s running but you can’t connect to it, or if it’s not running, you can look at its logs using this command: journalctl -u ssh.service (on some distros the unit is named sshd.service instead).

If you can’t diagnose and solve the problem just by reading those logs, try looking at the OpenSSH logs directly. You can do this by executing cat /var/log/auth.log and, if that file doesn’t exist, cat /var/log/secure

This book will help you understand what the log is saying.

After reading the logs and searching about them, you should have a general sense of what’s wrong with your sshd configuration. If you need to change anything, use sudo nano /etc/ssh/sshd_config. After saving your changes to this configuration file, run sudo systemctl restart sshd for the changes to take effect. Repeat reading logs and making changes until you have a working SSH server.

Some lines that you must pay attention to are:

PermitRootLogin If it’s “no”, then obviously you can’t log in as root and should try logging in as other users.
PasswordAuthentication If it’s set to no, you have to always log in using your private key (a good thing!). This is done by -i option in your ssh client: ssh -i /path/to/privkey
PubkeyAuthentication This must be set to yes if you’re trying to authenticate yourself by a private key as shown above.
ListenAddress This specifies which IP addresses the SSH server listens on. Any IP other than 0.0.0.0 (and its IPv6 counterpart ::) or the IP you’re currently SSHing to will make the connection fail, since the IP you request and the IP the SSH server is listening on don’t match.
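
To see the values that sshd actually uses for these settings (after defaults and included files), you can ask sshd to dump its effective configuration with sshd -T and filter it. The sketch below is one way to do that; the function name login_settings is made up, and I added Port to the list because it matters for the earlier steps.

```shell
# Keep only the login-related settings discussed above, plus Port.
# Reads `sshd -T` output (or sshd_config itself) on stdin; keywords are case-insensitive.
login_settings() {
    awk 'tolower($1) ~ /^(port|listenaddress|permitrootlogin|passwordauthentication|pubkeyauthentication)$/'
}

# Usage:
#   sudo sshd -T | login_settings                  # effective configuration
#   sudo sshd -t && sudo systemctl restart sshd    # restart only if the config parses
```

Running sshd -t before restarting is a cheap safeguard: it checks the configuration file for errors, so a typo is less likely to leave you without a running SSH server.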

To check if the SSH server is running you can try: sudo systemctl status sshd

High CPU usage

A one-shot or a chronic problem?

Look at the CPU usage graph of your VPS. You either have server monitoring tools such as Prometheus, or if you don’t, just use the one your VPS provider (probably) gives you.

In the graph, if you see a sudden bump, it may be that too many people are suddenly using your services, that your VPS is infected with malware, or that you’re under a DDoS attack. For the latter, using a CDN is a good idea.

If the graph is smooth and showing consistently high CPU usage, follow the steps below. If the processes that are using lots of CPU are legitimate, think about upgrading your VPS resources.

Find the CPU-consuming processes.

You can see which processes are using the most CPU by executing: sudo top -o %CPU. You can find the culprit here. Once found, you can kill it or renice it. If the process is critical, you shouldn’t do either, just upgrade your VPS resources.
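
If you want a non-interactive list instead (for example to save it to a file), a rough equivalent with ps can be sketched as follows. The name cpu_hogs and the 50% threshold are arbitrary choices for this example. Keep in mind that the %CPU reported by ps is averaged over the lifetime of the process, so it will not match top exactly.

```shell
# List processes at or above a CPU threshold.
# Reads `ps -eo pid,ni,pcpu,comm` output on stdin; $1 is the threshold in percent.
cpu_hogs() {
    awk -v limit="$1" 'NR > 1 && $3 + 0 >= limit + 0 { print $1, $4, $3 "%" }'
}

# Usage:
#   ps -eo pid,ni,pcpu,comm --sort=-pcpu | cpu_hogs 50
```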

Kill, restart, or renice the processes

If you don’t find the CPU-consuming processes critical, kill them. To kill a process, press k while top is open. It will ask you for a process ID (PID) and it will close that process for you. If a process doesn’t go away gracefully, press q to quit top, then execute kill -9 x where x is the PID. This will force kill the process. You can also try restarting the services that run the processes (e.g. if nginx uses too much CPU: systemctl restart nginx).

Instead of killing or restarting processes, you can renice them. Linux lets you change the priority of processes in CPU scheduling; this is called renicing. While top is running press r, then enter a PID and change its nice value. The nice value indicates the priority of the process in getting CPU time. Nice can be any value from +19 to -20; the higher it is, the less CPU time that process will get. When you’re done, press q to quit.
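
The "ask nicely first, then force" sequence can also be scripted. The sketch below is a minimal illustration, not a hardened tool: stop_process is a hypothetical name and the 5 second default grace period is an arbitrary choice.

```shell
# Ask a process to exit, then force-kill it if it is still there after a grace period.
# $1 = PID, $2 = grace period in seconds (default 5).
stop_process() {
    kill -TERM "$1" 2>/dev/null || return 1      # no such PID, or not permitted
    i=0
    while [ "$i" -lt "${2:-5}" ]; do
        kill -0 "$1" 2>/dev/null || return 0     # gone: it exited on its own
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$1" 2>/dev/null                  # same signal as kill -9
    return 0
}

# Usage (1234 is a placeholder PID):
#   stop_process 1234 10
```

Giving the process a few seconds matters because a forced kill gives it no chance to flush data or clean up temporary files.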

Case-by-case solutions

Sometimes the problem is more subtle. A misconfiguration in, for example, your MySQL server, or a badly written WordPress plugin might be the whole reason your VPS is consuming so much CPU. If you have a guess about the culprit (one that shows up in the output of the top command), try searching for it. Other users may have had the same problem with the service, and you can ask them how they solved the issue.

Updating everything

If the problem persists, try updating everything (from your entire OS to every Docker container) as a last resort. If you have recently installed or upgraded something and suspect it may be the cause, you can downgrade or reinstall that too.

Unexpected reboot

This one’s hard to catch! But let’s try.

How frequent are these reboots? And when do they happen?

First check your uptime. Use the uptime command; it will show you the current time and, following that, how long your system has been up.

Then try to see the frequency of the reboots. You can do this with: sudo last -x | tac
The | tac part is optional; it only puts the latest info at the bottom of the terminal instead of at the top.

At this point you may realize that your system isn’t rebooting at all. It may just be that the OOM killer is killing your processes because you’re running out of memory, and your process doesn’t restart automatically.
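
The kernel logs a line each time the OOM killer ends a process, so you can check for this directly. Here is a small sketch that extracts the victims from the kernel log. The function name oom_victims is made up, and the pattern assumes the usual "Out of memory: Killed process PID (name)" wording, which may differ between kernel versions.

```shell
# Print "PID name" for every process the kernel's OOM killer has killed.
# Reads kernel log lines on stdin.
oom_victims() {
    sed -n -E 's/.*Out of memory: Kill(ed)? process ([0-9]+) \(([^)]*)\).*/\2 \3/p'
}

# Usage:
#   sudo journalctl -k | oom_victims          # current boot
#   sudo journalctl -k -b -1 | oom_victims    # previous boot
```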

Look through the logs

After you get a rough idea of the times your server reboots, it’s time to dive into the logs. Try checking out either /var/log/messages or /var/log/syslog. Since you know exactly when the reboots happened and these files have timestamps, you know where to look. You can also use cat and grep to look for specific keywords.

Make sure you look into the journalctl logs from a little before and after each reboot using this command: journalctl -S "2023-01-01 00:00:00" -U "2023-01-01 01:00:00"
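
Working out the two timestamps by hand gets tedious when there are several reboots. As a convenience, the sketch below prints the journalctl command for a window around a given time. It is only an illustration: reboot_window is a made-up name, the 10 minute default is arbitrary, and it assumes a date command that understands -d (as GNU date does).

```shell
# Print a journalctl command covering N minutes before and after a timestamp.
# $1 = "YYYY-MM-DD HH:MM:SS", $2 = minutes (default 10).
reboot_window() {
    t=$(date -d "$1" +%s) || return 1
    span=$(( ${2:-10} * 60 ))
    from=$(date -d "@$((t - span))" '+%Y-%m-%d %H:%M:%S')
    to=$(date -d "@$((t + span))" '+%Y-%m-%d %H:%M:%S')
    printf 'journalctl -S "%s" -U "%s"\n' "$from" "$to"
}

# Usage:
#   reboot_window "2023-01-01 00:30:00" 10
```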

Things get a lot easier once you use the journalctl command. First list the boots you have had with journalctl --list-boots. It prints the latest boots with a number assigned to each of them (0 for the current boot, -1 for the one before it, and so on). To see the logs for your current boot, type journalctl -b 0, and for the one before that, journalctl -b -1. Please note that journalctl uses a pager. Use the arrow keys to navigate, but if you want to jump to the latest logs just press Shift+g.

Check for common culprits

Linux sometimes panics, and if it is configured to, it will reboot when that happens. You can see if this is the case by executing sysctl kernel.panic. If the result is a negative number, Linux will reboot immediately after a panic. If it’s a positive number, it will reboot after that many seconds, and if it’s 0, it won’t reboot.
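
The three cases can be written down as a tiny helper, which is handy if you check several servers. This is just a sketch of the rule described above; the function name panic_behavior is my own.

```shell
# Describe what a given kernel.panic value means.
panic_behavior() {
    if [ "$1" -lt 0 ]; then
        echo "reboots immediately after a panic"
    elif [ "$1" -eq 0 ]; then
        echo "does not reboot after a panic"
    else
        echo "reboots $1 seconds after a panic"
    fi
}

# Usage:
#   panic_behavior "$(sysctl -n kernel.panic)"
```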

If you’re using a Debian based distro, and have enabled Unattended-Upgrade::Automatic-Reboot in your apt configuration, then apt will reboot the system on its own after upgrades.

Are you running out of resources? Check for that too, in the following part of the article you’ll learn how.

Ask for your VPS provider’s uptime

There may be nothing wrong with your VPS after all. Maybe the dedicated server hosting your VPS is not configured well, or it is the one having reboots. Ask your VPS provider about this.

Slow response time / network connection instability

Do you have enough CPU/Disk/RAM resources?

Network packets need to be processed. If your system is overwhelmed and there’s not enough RAM and CPU, your VPS may have problems processing the incoming packets. So use the following commands to see how many resources you have available.

free -h use this to see how much RAM and swap you have left.
top see which processes are hogging resources
df -h see if you have enough space left. The usage percentage should be below 98%.
df -hi see how many inodes you have left on each partition. The usage percentage should be below 98%.
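
The 98% check on the two df commands can be automated with a short filter. The sketch below is an assumption-laden example: full_filesystems is a made-up name, it expects the fixed column layout printed by df -P, and it will misread mount points that contain spaces.

```shell
# Print mount points whose usage is at or above a threshold.
# Reads `df -P` (space) or `df -Pi` (inodes) output on stdin; $1 is the threshold in percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit + 0) print $6, $5 }'
}

# Usage:
#   df -P  | full_filesystems 98
#   df -Pi | full_filesystems 98
```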

You may wonder what disk has got to do with response time. Two things come to mind. First, your programs need disk space to function well (for example, a SQL server needs to write to disk). Second, when your RAM is full the OS starts writing to swap; if there’s too much I/O going on, reading and writing may be slow, and that hurts every running process that uses swap. You should not lean on swap: if it’s always full, just buy more RAM.

Is your hard disk’s I/O speed okay?

Sometimes the root of slow response time lies within slow disk read and write. Check for that as well.
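
One rough way to check is a sequential write test with dd. The sketch below is an assumption on my part rather than a proper benchmark: disk_write_test is a made-up name, 256 MiB is an arbitrary default size, and the number only tells you about sequential writes. Still, a result far below what your provider advertises is a useful hint.

```shell
# Write N MiB of zeros to a scratch file, flush it to disk, and print dd's throughput line.
# $1 = directory to test (default: current directory), $2 = size in MiB (default 256).
disk_write_test() {
    f="${1:-.}/ddtest.$$"
    dd if=/dev/zero of="$f" bs=1M count="${2:-256}" conv=fsync 2>&1 | tail -n 1
    rm -f "$f"
}

# Usage:
#   disk_write_test /var/tmp 256
```

The conv=fsync option makes dd flush the data to disk before it reports, so the result is not just the speed of writing into the page cache.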

Is your network okay?

Sometimes it’s not that your provider is having problems, but that your server is being flooded with network requests. You can check that with a tool called nethogs. If you’re using too much network bandwidth, ask your server provider for a dedicated Internet port. But if you’re not, try to see what’s happening under the hood.

Try pinging yourself and some other IP: ping -c 10 your_ip and ping -c 10 1.1.1.1. Ping shows you how long (in milliseconds) a server takes to respond to your packets; the lower it is, the better. Pinging yourself should take less than a millisecond, and pinging 1.1.1.1 less than 200ms at most, although it is usually a lot less than that. Other than the average round trip time displayed at the bottom, there’s another factor you should pay attention to: packet loss. Ideally you should have 0% packet loss.
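
If you run these pings regularly, it helps to reduce the output to the two numbers that matter. The sketch below does that; ping_summary is a made-up name, and it assumes the summary format printed by the common Linux ping (a "packet loss" line and a "min/avg/max" line).

```shell
# Reduce ping output (read from stdin) to packet loss and average round trip time.
ping_summary() {
    awk '
        /packet loss/   { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
        /min\/avg\/max/ { split($0, half, " = "); split(half[2], rtt, "/"); avg = rtt[2] }
        END             { printf "loss=%s avg=%sms\n", loss, avg }
    '
}

# Usage:
#   ping -c 10 1.1.1.1 | ping_summary
```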

Try testing your Internet speed, first install speedtest-cli and then run it. Depending on your VPS provider’s network and what plan you have bought, the speed you get will vary, but if it’s too slow (and other processes are not downloading anything either), open a ticket with your provider.

Check for software issues

If your network connection is okay but you’re still seeing slow response times, it’s time to look into the configuration of the specific service that is slow. Search for that specific piece of software, and then optimize it.

Summary

In this article we went through some of the common problems that you may face by owning a VPS, introduced some diagnostic tools for the problems and offered solutions. I hope you find them useful. If you have any questions or comments, don’t hesitate to send me an email.