Netplan Route-Policy Troubleshooting On Ubuntu 18.04

by Felix Dubois

Hey guys! Ever wrestled with getting multiple route-policy rules to play nice on your Ubuntu 18.04 server? It can be a real head-scratcher when you've got your netplan YAML all set, but only one rule seems to be doing the heavy lifting. Let's dive into this networking puzzle, focusing on Ubuntu Server 18.04 LTS with multiple network interface cards (NICs). We’ll break down the common pitfalls and how to ensure all your route-policy rules work together harmoniously.

When dealing with Ubuntu 18.04 server networking, specifically configuring multiple route-policy rules using Netplan, you might encounter a situation where only one rule seems to be functioning. This issue can arise due to a variety of reasons, often related to the order of rule application, conflicting configurations, or syntax errors in your Netplan YAML file. Understanding how Netplan processes these rules and how the Linux kernel's routing tables work is crucial for effective troubleshooting. This guide will walk you through the common causes and solutions to ensure all your route-policy rules are correctly applied and functioning as expected.

We’ll explore everything from the basic setup to the nitty-gritty details of debugging. We'll tackle common issues, share best practices, and get those routes behaving like they should. By the end of this guide, you'll be a route-policy rockstar, confidently managing your network traffic like a pro.

So, you've set up your Ubuntu 18.04 server with multiple NICs, maybe something like eno1 and eno2, each on different networks. You’ve crafted what you think are perfect netplan route-policy rules, but alas, only one seems to be active. What gives?

The core issue often lies in how Netplan and the underlying networking stack, like networkd, interpret and apply these rules. Two things are easy to mix up here. Routes (the routes key in Netplan) tell your server which gateway to use for a given destination. Route-policy rules (the routing-policy key) tell it which routing table to consult in the first place, based on criteria such as source address, destination, or firewall mark. Netplan passes both to networkd, which installs them in the Linux kernel as routing table entries and policy rules, and together these dictate the actual path network packets take. If either half is missing or wrong, packets might take unexpected routes, or worse, get dropped altogether.

To effectively troubleshoot this, it’s essential to understand the order of operations and the interactions between different rules. Netplan reads the YAML files in /etc/netplan and generates the backend configuration, but the position of an entry in the file is not what decides the outcome. The kernel evaluates policy rules by priority (lowest number first), and within a routing table it prefers the longest matching prefix and then the lowest metric. A broad rule with a low priority number can therefore shadow a more specific one, leading to the behavior where only one rule appears to work. Additionally, syntax errors in the Netplan configuration file, though sometimes subtle, can prevent rules from being applied correctly. Therefore, a meticulous review of the YAML file is always a good starting point.

For example, let’s say you want traffic from a specific subnet to go through eno2, and all other traffic through eno1. If the only default route lives in the main table and points at eno1, and no rule steers that subnet's traffic to a table containing a route via eno2, everything leaves through eno1 and the eno2 configuration looks like it is being ignored. This is where understanding rule precedence and specificity becomes crucial. We'll dive deeper into these concepts and how to diagnose them in the following sections.
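To make the two layers concrete, here is a minimal Python sketch of how a lookup proceeds when policy rules are in play: rules are tried in ascending priority, and inside the selected table the longest matching prefix wins. It is a simplified model, not the kernel's implementation. The rule priorities, the table numbers 101 and 102, and the gateway 172.22.0.1 are hypothetical values chosen for illustration.

```python
import ipaddress

# Hypothetical policy rules: (priority, source prefix, table id).
RULES = [
    (100, "172.22.0.0/20", 101),
    (200, "10.11.1.32/30", 102),
    (32766, "0.0.0.0/0", "main"),
]

# Hypothetical per-table routes: (destination prefix, gateway, device).
TABLES = {
    101: [("0.0.0.0/0", "172.22.0.1", "eno1")],
    102: [("0.0.0.0/0", "10.11.1.33", "eno2")],
    "main": [("0.0.0.0/0", "172.22.0.1", "eno1")],
}


def lookup(src, dst):
    """Return (table, gateway, device) for a packet, or None if unroutable."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    # Stage 1: walk the policy rules, lowest priority number first.
    for _priority, selector, table in sorted(RULES, key=lambda r: r[0]):
        if src_ip not in ipaddress.ip_network(selector):
            continue  # this rule does not select the packet
        # Stage 2: longest-prefix match inside the table the rule points to.
        matches = [r for r in TABLES[table] if dst_ip in ipaddress.ip_network(r[0])]
        if matches:
            best = max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
            return table, best[1], best[2]
        # No route in this table: fall through to the next rule.
    return None


print(lookup("10.11.1.34", "8.8.8.8"))
print(lookup("172.22.1.1", "8.8.8.8"))
```

In this model, a packet sourced from eno2's address is answered by table 102 and leaves via eno2, while a packet sourced from eno1's address uses table 101. Drop one of the two rules and that source falls through to the main table, which is exactly the "only one interface works" symptom.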

Let’s break down the common culprits behind this routing conundrum and how to fix them. Think of this as your troubleshooting toolbox – we're arming you with the knowledge to diagnose and resolve these issues effectively.

1. Rule Order Matters

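Route-policy rules are evaluated in priority order, lowest number first, and the first rule that matches and leads to a usable route wins. This is a crucial concept. If a general rule that matches all traffic has a lower priority number than a more specific rule, the specific rule will never be applied. Plain routes inside a single table behave differently: the kernel picks the longest matching prefix and then the lowest metric, so their order in the YAML does not matter.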

Solution: Set priority explicitly on each routing-policy entry instead of relying on file order, and give the most specific rules the lowest numbers. Think about the criteria each rule uses to match traffic. For routes, check prefix lengths and metrics rather than position: a route for a specific subnet always beats the default route (0.0.0.0/0) for destinations inside that subnet. Consider the following example:

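network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      addresses: [ 172.22.1.1/20 ]
      # Default route declared once, as an explicit route, so it can carry a
      # metric. Keeping gateway4 as well would add a second default route.
      # NOTE: "via" must be the router's address, not this host's own IP;
      # replace 172.22.1.1 with your real gateway.
      routes:
        - to: 0.0.0.0/0
          via: 172.22.1.1
          metric: 100
    eno2:
      addresses: [ 10.11.1.34/30 ]
      routes:
        # Most specific prefix: always wins for 192.168.1.0/24 destinations.
        - to: 192.168.1.0/24
          via: 10.11.1.33
          metric: 200
        # Fallback default route, used only if the eno1 default is gone.
        - to: 0.0.0.0/0
          via: 10.11.1.33
          metric: 300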

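In this example, traffic destined for the 192.168.1.0/24 network leaves via 10.11.1.33 on eno2 because /24 is a longer prefix than the default route's /0; the order of the entries plays no part. The metrics only settle the tie between the two default routes: the one via eno1 (metric 100) is preferred over the one via eno2 (metric 300).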
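If you want to convince yourself that position in the file is irrelevant for routes, here is a small Python sketch of the selection logic: longest prefix first, then lowest metric. The route list mirrors the example above; the function name select_route is made up for this illustration, and the model ignores everything else the kernel considers (link state, policy rules, and so on).

```python
import ipaddress

# Routes from the example above: (destination, gateway, device, metric).
ROUTES = [
    ("0.0.0.0/0", "172.22.1.1", "eno1", 100),
    ("192.168.1.0/24", "10.11.1.33", "eno2", 200),
    ("0.0.0.0/0", "10.11.1.33", "eno2", 300),
]


def select_route(routes, dst):
    """Pick a route for dst: longest matching prefix, then lowest metric."""
    dst_ip = ipaddress.ip_address(dst)
    candidates = [r for r in routes if dst_ip in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    # Negate the prefix length so that min() prefers the longest prefix.
    return min(candidates, key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[3]))


for order in (ROUTES, list(reversed(ROUTES))):
    print(select_route(order, "192.168.1.10")[2], select_route(order, "8.8.8.8")[2])
```

Both orderings give the same answer: eno2 for 192.168.1.10 and eno1 for 8.8.8.8.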

2. Conflicting Rules

Sometimes, rules can overlap or contradict each other. For example, you might have two default routes with the same metric pointing at different interfaces, or two rules selecting the same traffic with the same priority. This creates a conflict: one of the entries may fail to install, or the system may settle on the one you did not intend.

Solution: Carefully review your rules for overlaps. Use ip rule show to list the policy rules and ip route show table all to inspect the routing tables and identify conflicting entries. Ensure that your rules are mutually exclusive or that the more specific rule takes precedence. If two routes to the same destination conflict, adjust the metric values to prioritize the correct one. A lower metric value indicates a higher priority.
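Spotting overlaps by eye gets tedious once you have more than a handful of routes. As a rough aid, the following Python sketch groups routes by destination prefix and metric and reports any group that points at more than one gateway. The function name find_conflicts and the tuple layout are assumptions for this example; you would feed it the routes you intend to configure.

```python
import ipaddress
from collections import defaultdict


def find_conflicts(routes):
    """Report (prefix, metric) pairs that are claimed by more than one gateway.

    routes is an iterable of (destination, gateway, metric) tuples.
    """
    groups = defaultdict(set)
    for dest, gateway, metric in routes:
        # Normalize the prefix so 192.168.1.5/24 and 192.168.1.0/24 compare equal.
        prefix = ipaddress.ip_network(dest, strict=False)
        groups[(str(prefix), metric)].add(gateway)
    return {key: sorted(gws) for key, gws in groups.items() if len(gws) > 1}


routes = [
    ("0.0.0.0/0", "172.22.1.1", 0),
    ("0.0.0.0/0", "10.11.1.33", 0),
    ("192.168.1.0/24", "10.11.1.33", 0),
]
print(find_conflicts(routes))
```

Here the two default routes share metric 0, so they are reported as a conflict. Giving them different metrics makes the report come back empty.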

3. Syntax Errors in Netplan YAML

A single typo in your Netplan YAML can prevent a rule from being applied. YAML is very sensitive to indentation and spacing. A misplaced space or a missing colon can render an entire section of your configuration invalid.

Solution: Use a YAML validator (many online tools are available) to check your configuration for syntax errors. Pay close attention to indentation and spacing, and use spaces only, since YAML does not allow tabs for indentation. Netplan is strict about these things, so even a small mistake can cause problems. Run sudo netplan generate to parse the files without touching the running network, then sudo netplan apply, and check the output of both for errors. If you are working over SSH, the netplan try command can be invaluable, as it rolls the changes back unless you confirm them.
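One indentation mistake that is nearly invisible in an editor is a tab character. The Python sketch below is a quick pre-check that reports the line numbers where the leading whitespace contains a tab. It is not a YAML parser and catches only this one class of error; the function name find_tab_indentation is made up for the example.

```python
def find_tab_indentation(yaml_text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


sample = "network:\n  version: 2\n\tethernets:\n"
print(find_tab_indentation(sample))
```

For the sample text, the third line is flagged. A clean result here does not mean the file is valid, so still let Netplan parse it before applying.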

4. Missing Default Route

If you don’t have a default route configured, traffic that doesn’t match any specific rule won’t know where to go. This can lead to packets being dropped or going through the wrong interface.

Solution: Ensure you have a default route (usually 0.0.0.0/0) configured. This route acts as a catch-all for traffic that doesn’t match any other rule. Typically, you’ll configure the default route via your primary internet-facing interface. In the Netplan YAML, this is usually set within the primary interface's configuration, specifying the gateway4 or gateway6 directive.
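A default route only helps if its gateway is actually reachable from the interface it is configured on. Here is a hedged Python sketch of two sanity checks: the gateway should sit inside the interface's subnet, and it should not be the interface's own address. The function name check_gateway is hypothetical, and the checks assume an ordinary setup without on-link or point-to-point tricks.

```python
import ipaddress


def check_gateway(address_cidr, gateway):
    """Return a list of problems with a gateway for the given interface address."""
    iface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw == iface.ip:
        problems.append("gateway equals the interface's own address")
    if gw not in iface.network:
        problems.append("gateway is outside the interface's subnet")
    return problems


print(check_gateway("10.11.1.34/30", "10.11.1.33"))
print(check_gateway("172.22.1.1/20", "172.22.1.1"))
```

The first call returns an empty list. The second one flags a gateway that is the same as the host's own address, a slip that is easy to make when copying addresses between the addresses and gateway4 lines.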

5. Incorrect Interface Names

It sounds basic, but it’s easy to mistype an interface name. If you specify the wrong interface in your rule, traffic won’t be routed as expected.

Solution: Double-check your interface names. Use ip addr show to list all available interfaces and their names. Make sure the names in your Netplan YAML match exactly. This is especially important in environments where interface names might not be consistent across reboots or system updates.
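If you would rather script that comparison, this short Python sketch lists the configured names that do not exist on the host. It reads /sys/class/net, which has one entry per interface on Linux; the function name missing_interfaces and its optional present argument are assumptions made for the example.

```python
import os


def missing_interfaces(configured, present=None):
    """Return configured interface names that do not exist on this host, sorted."""
    if present is None:
        # /sys/class/net has one entry per network interface on Linux.
        present = os.listdir("/sys/class/net")
    return sorted(set(configured) - set(present))


# "en02" (digit zero) is a deliberate typo for "eno2".
print(missing_interfaces(["eno1", "en02"], present=["lo", "eno1", "eno2"]))
```

Leave out the present argument to check against the interfaces of the machine the script runs on.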

6. Firewall Rules

Firewall rules can interfere with routing. If a firewall rule blocks traffic on a particular interface, it doesn’t matter how your routing rules are configured; the traffic won’t pass.

Solution: Review your firewall rules (using iptables, nftables, or ufw) to ensure they aren’t blocking the traffic you’re trying to route. Make sure that your firewall rules allow traffic on the interfaces you expect them to. It’s often helpful to temporarily disable the firewall to test if it’s the cause of the issue. If disabling the firewall resolves the problem, you’ll need to adjust your firewall rules accordingly.

7. Metric Values

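The metric value on a route determines its priority among routes to the same destination. Lower metric values have higher priority. If two routes cover the same prefix, the one with the lower metric will be used. The metric does not override specificity, though: a /24 route beats a default route for matching destinations no matter what their metrics are. If the metrics are not set correctly, the wrong default route might win.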

Solution: Review the metric values of your routes. Where two interfaces both offer a default route, give the preferred one the lower metric. You do not need a lower metric on specific routes to make them beat the default route, since the longer prefix already does that. Use the metric option in your Netplan configuration to set the metric value for each route.

Okay, so you've checked the usual suspects, but your route-policy rules are still misbehaving. Time to put on your detective hat and dig deeper. Here are some techniques to help you uncover the mystery.

1. netplan try

netplan try is your best friend when making changes to your network configuration. This command applies your new configuration and then waits for you to confirm it within a timeout (120 seconds by default). If you don't confirm in time, for instance because the change cut off your SSH session, it automatically rolls back the changes. This prevents you from locking yourself out of your server with a bad configuration.

How to use it: Simply run sudo netplan try after making changes to your Netplan YAML. Watch the output for any error messages and press Enter to keep the new configuration if everything still works. If you do nothing, the configuration reverts to the previous state when the timeout expires.

2. netplan apply and Checking for Errors

The netplan apply command applies your network configuration. Unlike netplan try, it doesn’t roll back changes automatically. However, it does provide error messages if there are problems with your configuration.

How to use it: Run sudo netplan apply after making changes. Pay close attention to any error messages that are displayed. These messages can often pinpoint syntax errors or other issues in your configuration.

3. Inspecting Routing Tables with ip route

The ip route command is your window into the Linux kernel’s routing tables. It shows you how packets are being routed on your system. You can use this command to verify that your route-policy rules have been correctly applied.

How to use it:

  • ip route show: Shows the main routing table.
  • ip route show table all: Shows the routes in every routing table, including the tables your policy rules point to.
  • ip rule show: Lists the policy rules themselves, in priority order.
  • ip route get <destination_ip>: Shows the route that would be used for a specific destination IP address.

By examining the output of these commands, you can see which routes are active and whether they match your intended configuration.
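When you need to compare that output against your intended configuration more than once, it helps to turn it into data. The Python sketch below parses single lines in the usual ip route show layout into dictionaries. The sample lines are illustrative, written to match this guide's example addresses rather than captured from a real system, and the parser only picks out the via, dev, metric, and table fields.

```python
def parse_route_line(line):
    """Parse one line of `ip route show` output into a dict of selected fields."""
    tokens = line.split()
    route = {"dst": tokens[0]}
    # Slide over adjacent token pairs and keep the keyword/value pairs we want.
    for key, value in zip(tokens[1:], tokens[2:]):
        if key in ("via", "dev", "metric", "table"):
            route[key] = value
    return route


# Illustrative sample lines, not real captured output.
SAMPLE = """default via 172.22.1.1 dev eno1 proto static metric 100
192.168.1.0/24 via 10.11.1.33 dev eno2 proto static metric 200"""

for line in SAMPLE.splitlines():
    print(parse_route_line(line))
```

From there it is a short step to assert, for example, that the route for 192.168.1.0/24 really uses dev eno2 after each netplan apply.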

4. Using tcpdump or Wireshark to Capture Traffic

Sometimes, the best way to understand what’s happening with your network traffic is to capture it and examine it. tcpdump is a command-line packet analyzer, while Wireshark is a graphical tool for capturing and analyzing network traffic.

How to use it:

  • sudo tcpdump -i <interface> -n: Captures traffic on a specific interface without resolving hostnames.
  • Run Wireshark and select the interface you want to capture traffic on.

By capturing traffic, you can see exactly how packets are being routed, including source and destination IP addresses, ports, and protocols. This can help you identify if traffic is taking the wrong path or being dropped.

5. Logging and Monitoring

Setting up logging and monitoring can help you catch routing issues before they cause major problems. You can use tools like systemd-journald to log network-related events, and monitoring tools like Nagios or Zabbix to track network performance.

How to use it:

  • Check the journal for networkd errors: journalctl -u systemd-networkd. Netplan only generates the configuration, so routes and rules that fail to install show up in networkd's log.
  • Set up monitoring to track network latency, packet loss, and interface utilization.

By regularly reviewing logs and monitoring network performance, you can identify trends and anomalies that might indicate routing problems.

Let’s walk through a real-world scenario to solidify our understanding. Imagine you have an Ubuntu 18.04 server with two NICs:

  • eno1: 172.22.1.1/20 (Carries the default route for internet access)
  • eno2: 10.11.1.34/30 (Connected to a private network)

You want all traffic destined for the 192.168.1.0/24 network to go through eno2, and all other traffic to go through eno1.

Here’s a Netplan configuration that might seem correct at first glance:

network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      addresses: [ 172.22.1.1/20 ]
      gateway4: 172.22.1.1
    eno2:
      addresses: [ 10.11.1.34/30 ]
      routes:
        - to: 0.0.0.0/0
          via: 10.11.1.33

In this configuration, the default route via eno1 is set correctly. However, there’s a catch! A second default route via eno2 is also configured with no metric to rank it against the first, and the specific route for 192.168.1.0/24 is missing entirely. This means traffic destined for 192.168.1.0/24 simply follows whichever default route ends up active instead of being steered through eno2.

The Solution:

We need to add the specific route for 192.168.1.0/24 to the eno2 configuration and give the two default routes different metrics so that eno1's is clearly preferred. Here’s the corrected Netplan configuration:

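network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      addresses: [ 172.22.1.1/20 ]
      # Single default route with an explicit metric; gateway4 is dropped
      # because it would add a second default route on this interface.
      # NOTE: "via" must be the router's address, not this host's own IP;
      # replace 172.22.1.1 with your real gateway.
      routes:
        - to: 0.0.0.0/0
          via: 172.22.1.1
          metric: 100
    eno2:
      addresses: [ 10.11.1.34/30 ]
      routes:
        # Specific route: wins for 192.168.1.0/24 by longest-prefix match.
        - to: 192.168.1.0/24
          via: 10.11.1.33
          metric: 200
        # Fallback default route with a higher metric than eno1's.
        - to: 0.0.0.0/0
          via: 10.11.1.33
          metric: 300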

In this corrected configuration:

  • We’ve added a route for 192.168.1.0/24 via 10.11.1.33 on eno2.
  • We’ve set explicit metrics on every route. The 192.168.1.0/24 route wins for that network because it is the most specific match, and the default route via eno1 (metric 100) is preferred over the fallback default via eno2 (metric 300) for everything else.

By applying this configuration, we’ve successfully routed traffic for a specific subnet through a specific interface, while all other traffic goes through the default gateway.

To wrap things up, let’s talk about some best practices for managing your Netplan route-policy rules. These tips will help you avoid common pitfalls and keep your network running smoothly.

  1. Keep it organized: Use meaningful names for your interfaces and routes. This makes your configuration easier to understand and maintain.
  2. Comment your YAML: Add comments to your Netplan YAML to explain the purpose of each rule. This is especially helpful when you have complex configurations.
  3. Use a YAML validator: Always validate your YAML configuration before applying it. This can catch syntax errors and prevent unexpected behavior.
  4. Test your changes: Use netplan try to test your changes before applying them permanently. This can save you from locking yourself out of your server.
  5. Monitor your network: Set up monitoring to track network performance and catch routing issues early.
  6. Document your configuration: Keep a record of your Netplan configuration and any changes you make. This will help you troubleshoot issues and revert to previous configurations if necessary.
  7. Start simple: When configuring complex routing policies, start with a simple configuration and gradually add more rules. This makes it easier to identify and fix problems.
  8. Review regularly: Periodically review your Netplan configuration to ensure it’s still correct and optimal for your needs. Network requirements can change over time, so it’s important to keep your configuration up-to-date.

So, there you have it! Navigating multiple Netplan route-policy rules on Ubuntu 18.04 Server might seem daunting at first, but with a solid understanding of the underlying principles and debugging techniques, you can conquer any routing challenge. Remember, the key is to understand rule priority and route specificity, avoid conflicts, and use the right tools to diagnose issues. By following the best practices outlined in this guide, you’ll be well-equipped to manage your network traffic effectively. Happy routing, folks!

Remember, networking can be tricky, but with a bit of patience and the right approach, you can get those packets flowing exactly where you want them. Keep experimenting, keep learning, and don't be afraid to dive into the details. You've got this!