-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for nftables + refactoring of firewall code #3221
Conversation
@@ -35,6 +35,7 @@ | |||
from azurelinuxagent.ga.cgroupcontroller import AGENT_LOG_COLLECTOR | |||
from azurelinuxagent.ga.cpucontroller import _CpuController | |||
from azurelinuxagent.ga.cgroupapi import get_cgroup_api, log_cgroup_warning, InvalidCgroupMountpointException | |||
from azurelinuxagent.ga.firewall_manager import FirewallManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FirewallManager is the new class that includes all the firewall functionality previously on osutil, AddFirewallRules, etc
print("Successfully set the firewall rules") | ||
firewall_manager = FirewallManager.create(endpoint) | ||
firewall_manager.setup() | ||
event.info(event.WALAEventOperation.Firewall, "Successfully set the firewall rules") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I create some convenience functions to log and send telemetry (see event.py)
else: | ||
cmd = AgentCommands.Help | ||
break | ||
regex_cmd = regex_cmd_format.format("{0}=(?P<endpoint>[\\d.]{{7,}})".format(AgentCommands.SetupFirewall)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now the command line takes only "-setup-firewall="
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, I think the SetupFirewall regex check should be in a separate elif condition for readability.
That way the only thing in the else condition is Help command, and it will make it easier to add more commands later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, though I need the result of the match, so I do not think I can use an elif (unless I evaluate twice). Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, maybe do the match before the first if condition. Although that may defeat the intent of making this more readable. If moving it creates confusion, i'm ok with it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, moving it to the top of the if does not help in terms of readability. On the other hand, evaluating twice is not such a big deal in this case, so if it helps for readability in your opinion, I can add it to the next iteration of this code (I still need to add unit test for the env thread)
@@ -331,10 +329,10 @@ def report_dropped_events_error(count, errors, operation_name): | |||
report_dropped_events_error(self.__unicode_error_count, self.__unicode_errors, self.__unicode_error_event) | |||
|
|||
@staticmethod | |||
def _update_errors_and_get_count(error_count, errors, error): | |||
def _update_errors_and_get_count(error_count, errors, error_msg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename needed because now there is a global "error" function
@@ -750,9 +772,9 @@ def dump_unhandled_err(name): | |||
last_type = getattr(sys, 'last_type') | |||
last_value = getattr(sys, 'last_value') | |||
last_traceback = getattr(sys, 'last_traceback') | |||
error = traceback.format_exception(last_type, last_value, | |||
trace = traceback.format_exception(last_type, last_value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename needed because now there is a global "error" function
self._heartbeat_id, dropped_packets, | ||
self._heartbeat_update_goal_state_error_count, | ||
auto_update_enabled, update_mode) | ||
heartbeat_msg = "HeartbeatCounter: {0};HeartbeatId: {1};UpdateGSErrors: {2};AutoUpdate: {3};UpdateMode: {4};".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use the DroppedPackets telemetry. It could be added to nftables too, but since we don't use it, I am deleting it instead.
@@ -643,327 +639,6 @@ def test_conf_sudoer(self, mock_dir): | |||
print("WRITING TO {0}".format(waagent_sudoers)) | |||
self.assertEqual(1, count) | |||
|
|||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to test_firewall_manager, or deleted, since some of those test don't make sense for the new code
@@ -64,34 +60,3 @@ def test_nic_ordinary(self): | |||
self.assertEqual(str(nic), '{ "name": "test0", "link": "link INFO", "ipv4": ["ipv4-1"], "ipv6": ["ipv6-1"] }') | |||
|
|||
|
|||
class TestAddFirewallRules(AgentTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to test_firewall_manager
if command -v wget >/dev/null 2>&1; then | ||
wget --tries=3 "http://$WIRE_IP/?comp=versions" --timeout=5 -O "/tmp/wire-versions-$USER.xml" | ||
else | ||
if command -v curl >/dev/null 2>&1; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, on Azure Linux 3, wget returns success (0) on timeout. I am giving preference to curl instead.
elif command -v wget >/dev/null 2>&1; then | ||
wget --tries=3 "http://$WIRE_IP/?comp=versions" --timeout=5 -O "/tmp/wire-versions-$USER.xml"else | ||
else | ||
http_get.py "http://168.63.129.16/?comp=versions" --timeout 5 --delay 5 --tries 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this simple tool by doing surgery on some of the networking code in the agent. I used it to debug the above issue with wget. Since I already created it, I'm leaving it here as a fallback to curl/wget (initially it was meant as a replacement for them, but just changing the order of curl/wget fixed the original issue)
azurelinuxagent/common/event.py
Outdated
|
||
def warn(op, fmt, *args): | ||
""" | ||
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.WARNING...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: WARN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks; fixed
azurelinuxagent/common/event.py
Outdated
|
||
def error(op, fmt, *args): | ||
""" | ||
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.ERROR...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:ERROR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks; fixed
azurelinuxagent/common/event.py
Outdated
""" | ||
Creates a telemetry event and logs the message as INFO; convenience wrapper over add_event(log_level=logger.LogLevel.ERROR...) | ||
""" | ||
logger.warn(fmt, *args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks; fixed
I didn't complete my review, will give another round trmw |
else: | ||
cmd = AgentCommands.Help | ||
break | ||
regex_cmd = regex_cmd_format.format("{0}=(?P<endpoint>[\\d.]{{7,}})".format(AgentCommands.SetupFirewall)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, I think the SetupFirewall regex check should be in a separate elif condition for readability.
That way the only thing in the else condition is Help command, and it will make it easier to add more commands later
@@ -55,20 +59,17 @@ def _cleanup_metadata_protocol_certificates(): | |||
_ensure_file_removed(lib_directory, _LEGACY_METADATA_SERVER_TRANSPORT_CERT_FILE_NAME) | |||
_ensure_file_removed(lib_directory, _LEGACY_METADATA_SERVER_P7B_FILE_NAME) | |||
|
|||
def _reset_firewall_rules(osutil): | |||
|
|||
def _reset_firewall_rules(): | |||
""" | |||
Removes MetadataServer firewall rule so IMDS can be used. Enables | |||
WireServer firewall rule based on if firewall is configured to be on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment can be updated since we are no longer enabling wiereserver firewall rule
azurelinuxagent/ga/env.py
Outdated
|
||
def _report(self, report_function, message, *args): | ||
# Report the first 6 messages, then stop reporting for 12 hours |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says 6 messages, but limit is 3 (line 133)
|
||
self._setup_network_setup_service() | ||
firewall_manager.remove_legacy_rule() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need try/catch error, so that in all cases agent continue setup the persistent firewall rules with firewalld
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good point, will add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
Adding support for nftables.
The code for setting up the firewall was scattered across multiple places and was not straightforward to follow up. Adding support for nftables on top of the existing code would have made things worse, so this PR also includes significant refactoring of that code and associated tests.
I also moved the code from the 'common' subdirectory to 'ga', since it is not needed by the provisioning agent.