Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for nftables + refactoring of firewall code #3221

Merged
merged 44 commits into from
Oct 8, 2024

Conversation

narrieta
Copy link
Member

Adding support for nftables.

The code for setting up the firewall was scattered across multiple places and was not straightforward to follow up. Adding support for nftables on top of the existing code would have made things worse, so this PR also includes significant refactoring of that code and associated tests.

I also moved the code from the 'common' subdirectory to 'ga', since it is not needed by the provisioning agent.

@@ -35,6 +35,7 @@
from azurelinuxagent.ga.cgroupcontroller import AGENT_LOG_COLLECTOR
from azurelinuxagent.ga.cpucontroller import _CpuController
from azurelinuxagent.ga.cgroupapi import get_cgroup_api, log_cgroup_warning, InvalidCgroupMountpointException
from azurelinuxagent.ga.firewall_manager import FirewallManager
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FirewallManager is the new class that includes all the firewall functionality previously on osutil, AddFirewallRules, etc

print("Successfully set the firewall rules")
firewall_manager = FirewallManager.create(endpoint)
firewall_manager.setup()
event.info(event.WALAEventOperation.Firewall, "Successfully set the firewall rules")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I create some convenience functions to log and send telemetry (see event.py)

else:
cmd = AgentCommands.Help
break
regex_cmd = regex_cmd_format.format("{0}=(?P<endpoint>[\\d.]{{7,}})".format(AgentCommands.SetupFirewall))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now the command line takes only "-setup-firewall="

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I think the SetupFirewall regex check should be in a separate elif condition for readability.
That way the only thing in the else condition is Help command, and it will make it easier to add more commands later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, though I need the result of the match, so I do not think I can use an elif (unless I evaluate twice). Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, maybe do the match before the first if condition. Although that may defeat the intent of making this more readable. If moving it creates confusion, i'm ok with it as is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, moving it to the top of the if does not help in terms of readability. On the other hand, evaluating twice is not such a big deal in this case, so if it helps for readability in your opinion, I can add it to the next iteration of this code (I still need to add unit test for the env thread)

@@ -331,10 +329,10 @@ def report_dropped_events_error(count, errors, operation_name):
report_dropped_events_error(self.__unicode_error_count, self.__unicode_errors, self.__unicode_error_event)

@staticmethod
def _update_errors_and_get_count(error_count, errors, error):
def _update_errors_and_get_count(error_count, errors, error_msg):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename needed because now there is a global "error" function

@@ -750,9 +772,9 @@ def dump_unhandled_err(name):
last_type = getattr(sys, 'last_type')
last_value = getattr(sys, 'last_value')
last_traceback = getattr(sys, 'last_traceback')
error = traceback.format_exception(last_type, last_value,
trace = traceback.format_exception(last_type, last_value,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename needed because now there is a global "error" function

self._heartbeat_id, dropped_packets,
self._heartbeat_update_goal_state_error_count,
auto_update_enabled, update_mode)
heartbeat_msg = "HeartbeatCounter: {0};HeartbeatId: {1};UpdateGSErrors: {2};AutoUpdate: {3};UpdateMode: {4};".format(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use the DroppedPackets telemetry. It could be added to nftables too, but since we don't use it, I am deleting it instead.

@@ -643,327 +639,6 @@ def test_conf_sudoer(self, mock_dir):
print("WRITING TO {0}".format(waagent_sudoers))
self.assertEqual(1, count)

@staticmethod
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to test_firewall_manager, or deleted, since some of those test don't make sense for the new code

@@ -64,34 +60,3 @@ def test_nic_ordinary(self):
self.assertEqual(str(nic), '{ "name": "test0", "link": "link INFO", "ipv4": ["ipv4-1"], "ipv6": ["ipv6-1"] }')


class TestAddFirewallRules(AgentTestCase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to test_firewall_manager

if command -v wget >/dev/null 2>&1; then
wget --tries=3 "http://$WIRE_IP/?comp=versions" --timeout=5 -O "/tmp/wire-versions-$USER.xml"
else
if command -v curl >/dev/null 2>&1; then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, on Azure Linux 3, wget returns success (0) on timeout. I am giving preference to curl instead.

elif command -v wget >/dev/null 2>&1; then
wget --tries=3 "http://$WIRE_IP/?comp=versions" --timeout=5 -O "/tmp/wire-versions-$USER.xml"else
else
http_get.py "http://168.63.129.16/?comp=versions" --timeout 5 --delay 5 --tries 3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created this simple tool by doing surgery on some of the networking code in the agent. I used it to debug the above issue with wget. Since I already created it, I'm leaving it here as a fallback to curl/wget (initially it was meant as a replacement for them, but just changing the order of curl/wget fixed the original issue)

narrieta@microsoft added 2 commits September 20, 2024 19:21

def warn(op, fmt, *args):
"""
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.WARNING...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: WARN

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks; fixed


def error(op, fmt, *args):
"""
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.ERROR...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:ERROR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks; fixed

"""
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.ERROR...)
"""
logger.warn(fmt, *args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks; fixed

azurelinuxagent/ga/env.py Show resolved Hide resolved
azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved
azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved
azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved
azurelinuxagent/ga/persist_firewall_rules.py Show resolved Hide resolved
@nagworld9
Copy link
Contributor

I didn't complete my review, will give another round trmw

else:
cmd = AgentCommands.Help
break
regex_cmd = regex_cmd_format.format("{0}=(?P<endpoint>[\\d.]{{7,}})".format(AgentCommands.SetupFirewall))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I think the SetupFirewall regex check should be in a separate elif condition for readability.
That way the only thing in the else condition is Help command, and it will make it easier to add more commands later

azurelinuxagent/common/event.py Show resolved Hide resolved
@@ -55,20 +59,17 @@ def _cleanup_metadata_protocol_certificates():
_ensure_file_removed(lib_directory, _LEGACY_METADATA_SERVER_TRANSPORT_CERT_FILE_NAME)
_ensure_file_removed(lib_directory, _LEGACY_METADATA_SERVER_P7B_FILE_NAME)

def _reset_firewall_rules(osutil):

def _reset_firewall_rules():
"""
Removes MetadataServer firewall rule so IMDS can be used. Enables
WireServer firewall rule based on if firewall is configured to be on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can be updated since we are no longer enabling wiereserver firewall rule


def _report(self, report_function, message, *args):
# Report the first 6 messages, then stop reporting for 12 hours
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says 6 messages, but limit is 3 (line 133)

azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved
azurelinuxagent/ga/env.py Show resolved Hide resolved
azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved

self._setup_network_setup_service()
firewall_manager.remove_legacy_rule()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need try/catch error, so that in all cases agent continue setup the persistent firewall rules with firewalld

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point, will add it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

azurelinuxagent/ga/firewall_manager.py Show resolved Hide resolved
@narrieta narrieta merged commit 2f6ab89 into Azure:develop Oct 8, 2024
11 checks passed
@narrieta narrieta deleted the firewall branch October 8, 2024 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants