635 beaconfqdn slow #638

Merged
Zalgo2462 merged 11 commits into master from 635-beaconfqdn-slow
Jun 2, 2021

Conversation

fullmetalcache
Contributor

Credits to @lisaSW as well

We changed how queries are performed during beacon FQDN analysis.

Previously, we would:

  1. Get a list of resolved IPs for an FQDN
  2. Get a list of Src IPs that connected to any of the resolved IPs
  3. For each Src IP, we queried the UCONN table for entries where the Src IP was a Src and any of the resolved IPs were a Dst
  4. We then performed the beacon FQDN analysis for the set of timestamps

Now, we do the following:

  1. Get a list of resolved IPs for an FQDN
  2. Query the UCONN table for all entries where any of the resolved IPs appears as a Dst, and group the results by Src IP/UUID
  3. Iterate over the array of results. Each entry contains, among other information, the timestamps of connections between one Src IP/UUID and any of the resolved IPs for the FQDN. Beacon FQDN analysis is performed on each of these sets of timestamps

This subtle difference resulted in a speedup of more than 8x over the initial implementation.
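The revised steps can be sketched as a single aggregation pipeline. This is a minimal illustration, not the code from this PR: `M` stands in for `bson.M`, `buildFQDNPipeline` is a hypothetical helper, and the `ts` accumulator with its `dat.ts` path is an assumption. Only `dst`, `src`, `src_network_uuid`, and `src_network_name` appear in this thread.

```go
package main

import "fmt"

// M is a stand-in for bson.M so the sketch compiles without a MongoDB driver.
type M map[string]interface{}

// buildFQDNPipeline (hypothetical name) matches every uconn entry whose dst is
// one of the resolved IPs, then groups the matches by source host.
// It returns nil for an empty IP list because MongoDB rejects an empty $or array.
func buildFQDNPipeline(resolvedIPs []string) []M {
	if len(resolvedIPs) == 0 {
		return nil
	}
	dstClauses := make([]M, 0, len(resolvedIPs))
	for _, ip := range resolvedIPs {
		dstClauses = append(dstClauses, M{"dst": ip})
	}
	return []M{
		{"$match": M{"$or": dstClauses}},
		{"$group": M{
			// Group on source IP and network UUID only.
			"_id": M{"src": "$src", "uuid": "$src_network_uuid"},
			// Carry the network name as an accumulated value, not a key.
			"network_name": M{"$last": "$src_network_name"},
			// Assumption: timestamps live under dat.ts in each entry.
			"ts": M{"$push": "$dat.ts"},
		}},
	}
}

func main() {
	pipeline := buildFQDNPipeline([]string{"172.217.4.226"})
	fmt.Printf("pipeline has %d stages\n", len(pipeline))
}
```

One query per FQDN replaces one query per Src IP, which is where the reduction in round trips to the database comes from.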

@fullmetalcache
Contributor Author

Closes #635

@william-stearns
Contributor

In pkg/beaconfqdn/dissector.go, there appears to be a hardcoded IP:
"$or": [{"dst":"172.217.4.226"}]
Any chance this is leftover from testing?

@fullmetalcache
Contributor Author

fullmetalcache commented Apr 27, 2021

It's part of the example Mongo query for that piece of code. The IP is from the dnscat2-ja3-strobe-agent set. It's not actually used in the code; just there so that the query is valid without modification. Good check though!

Contributor

@Zalgo2462 Zalgo2462 left a comment

Approach looks good. We need to tweak one thing regarding the network_names.

Great work y'all!

@@ -95,7 +191,7 @@ func (d *dissector) start() {
 "icerts": bson.M{"$anyElementTrue": []interface{}{"$dat.icerts"}},
 }},
 {"$group": bson.M{
-"_id": "$src",
+"_id": bson.M{"src": "$src", "uuid": "$src_network_uuid", "network": "$src_network_name"},
Contributor

@Zalgo2462 Zalgo2462 Apr 29, 2021

We cannot group on network name as multiple network names may be associated with the same network uuid.
This is a downstream result of how winlogbeat identifies Sysmon agents.

Instead, please use "network_name": bson.M{"$first": "$src_network_name"}. Unfortunately, this will need to be added to most of the clauses below.

Contributor

In the past, queries used $last to group on network uuids so that we'd be referencing the last known netbios name in case it ever changed. Is there a reason why we should use $first here instead?

Contributor

Nope. Great catch. I'm just a big dummy. 🤡 Please use $last.
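
The grouping concern in this thread can be illustrated in memory. This is only a sketch under assumptions: `conn`, `hostKey`, `groupByHost`, and `groupByHostAndName` are hypothetical names, and records are assumed to arrive in the order the database would return them. It shows that keying on the network name splits one host into several groups, while keeping the name as a `$last`-style value keeps one group with the most recent name.

```go
package main

import "fmt"

// conn is a hypothetical, pared-down connection record.
type conn struct {
	Src, UUID, NetworkName string
}

// hostKey identifies a host by source IP and network UUID only.
type hostKey struct {
	Src, UUID string
}

// groupByHost mimics grouping on {src, uuid} with the network name taken via
// $last: records are visited in order and later names overwrite earlier ones.
func groupByHost(conns []conn) map[hostKey]string {
	names := make(map[hostKey]string)
	for _, c := range conns {
		names[hostKey{c.Src, c.UUID}] = c.NetworkName
	}
	return names
}

// groupByHostAndName mimics putting the network name into the group key and
// counts how many records fall into each resulting group.
func groupByHostAndName(conns []conn) map[conn]int {
	counts := make(map[conn]int)
	for _, c := range conns {
		counts[c]++
	}
	return counts
}

func main() {
	conns := []conn{
		{"10.0.0.5", "uuid-1", "OLDNAME"},
		{"10.0.0.5", "uuid-1", "NEWNAME"},
	}
	fmt.Println("groups keyed on src+uuid:", len(groupByHost(conns)))
	fmt.Println("groups keyed on src+uuid+name:", len(groupByHostAndName(conns)))
}
```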

@Zalgo2462
Contributor

Merged master back into this branch. Resolved merge conflicts by removing the icerts calculations in fqdn beacons, since they were removed in master.

Also, the fqdn analysis is like wicked fast yo 🙌

@fullmetalcache
Contributor Author

Believe I have the comments addressed and this is ready for the final approval.

Thanks!

"tbytes": {"$sum": "$dat.tbytes"},
}},
{"$group": {
"_id": {"src": "$src", "uuid": "$src_network_uuid", "network": "$src_network_name"},
Contributor

Looks like this comment needs to be changed to match the recent changes.

Contributor

I believe this PR is ready to pull in otherwise.

Contributor Author

I think I've got it fixed, thanks!

Contributor

@Zalgo2462 Zalgo2462 left a comment

LGTM. Tested and it seemed to work well.

@Zalgo2462 Zalgo2462 merged commit 5895fc4 into master Jun 2, 2021
@Zalgo2462 Zalgo2462 deleted the 635-beaconfqdn-slow branch June 2, 2021 17:24
@bekirk

bekirk commented Jun 29, 2021

I have a question on this. I just tried this on a large dataset, and it is really slow: 2 hours of Bro data took over 30 minutes, so the full run might finish in 6-8 hours. I'm not sure yet.

Are there any configuration changes I can make? I tried upping the Threshold to 80 and even 200, and got a little improvement.

Is it possible to run all the analysis besides the FQDN Beacon, then come back and run just the FQDN Beacon, so that I can start looking at the other data while the FQDN Beacon is still processing?

Thank you,
Brian Kirk

@Zalgo2462
Contributor

Hello, thank you for letting us know about the issue. The code patch here is set to be released as part of RITA v4.3.0 https://github.com/activecm/rita/releases/download/v4.3.0/install.sh. We are currently running quality control testing on the new release, but it is available as a pre-release. We expect that v4.3.0 will have a formal release sometime in the coming week.

Would you be interested in giving version 4.3.0 a try and letting us know if the amount of time required to process the FQDN Beacons goes down?

Note: RITA v4.3.0 requires upgrading MongoDB to version 4.2. Unfortunately, this means that previous versions of RITA will no longer continue to work as the maximum version of MongoDB supported by previous versions is 3.6.

@bekirk

bekirk commented Jun 29, 2021

Sure, I can test. I might need to figure out a backout plan in case I need to go back to MongoDB 3.6.

I have all my current Rita data indexed into splunk so I typically don't need to go back to previous data.

@bekirk

bekirk commented Jun 30, 2021

I installed 4.3.0 and tested. I had some problems with the zeek and mongodb installs but since I already installed mongodb and I don't run bro nor zeek on this system, I was able to run the install with no zeek and no mongo and it worked.

The beacons-FQDN analysis is really fast now! I will see what the runtime is in the morning for a full day's worth of logs.

One difference between beacons-fqdn and beacons: beacons-fqdn doesn't have the total bytes that beacons has now. I also notice that I have almost 10 times as many rows in beacons-fqdn as in beacons. Is there some way to put all the rows for the same beacon activity into one row with a multivalue field, like "Port:Protocol:Service" in long-conns? Maybe you could even add the dest-ip. Unless I am missing something about what beacons-fqdn is doing, I think it just helps show what the possible DNS request behind the traffic was, using a reverse lookup; for example, if it was login.microsoftonline.com you might be able to filter out the result. However, for one connection to Google I see all of these FQDNs, all from one source IP with the same values in every other field:
clients1.google.com
sb-ssl.l.google.com
music.youtube.com
clients5.google.com
play.google.com
photos.google.com
sb-ssl.google.com
clients2.google.com
www.youtube.com
admin.google.com
maps.google.com
ytimg.l.google.com
clients6.google.com
clients.l.google.com
img.youtube.com
clients4.google.com
clients3.google.com
youtube-ui.l.google.com
calendar.google.com
maps-api-ssl.google.com
android.clients.google.com

Maybe give an option to include the Dest-IP and another option if you want to condense the data with a multi-value field.
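
The condensing idea above could be prototyped as a post-processing step over exported rows. The following is a rough sketch and not a RITA feature: `fqdnRow`, `collapsedRow`, and `collapse` are hypothetical names, and the `Score` and `Connections` columns are assumed stand-ins for "every other field".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fqdnRow is a hypothetical single output row: one FQDN per source.
type fqdnRow struct {
	FQDN        string
	Src         string
	Score       float64
	Connections int
}

// collapsedRow holds every FQDN that shares the same source and values.
type collapsedRow struct {
	FQDNs       []string
	Src         string
	Score       float64
	Connections int
}

// collapse merges rows that are identical in every field except FQDN,
// preserving the order in which each distinct group first appears.
func collapse(rows []fqdnRow) []collapsedRow {
	type key struct {
		Src         string
		Score       float64
		Connections int
	}
	index := make(map[key]int)
	var out []collapsedRow
	for _, r := range rows {
		k := key{r.Src, r.Score, r.Connections}
		i, seen := index[k]
		if !seen {
			i = len(out)
			index[k] = i
			out = append(out, collapsedRow{Src: r.Src, Score: r.Score, Connections: r.Connections})
		}
		out[i].FQDNs = append(out[i].FQDNs, r.FQDN)
	}
	for i := range out {
		sort.Strings(out[i].FQDNs)
	}
	return out
}

func main() {
	rows := []fqdnRow{
		{"play.google.com", "10.0.0.5", 0.9, 120},
		{"clients1.google.com", "10.0.0.5", 0.9, 120},
		{"login.microsoftonline.com", "10.0.0.5", 0.7, 40},
	}
	for _, c := range collapse(rows) {
		fmt.Printf("%s %.2f %d %s\n", c.Src, c.Score, c.Connections, strings.Join(c.FQDNs, ";"))
	}
}
```

Matching on identical values is a heuristic: two unrelated FQDNs with coincidentally equal statistics would also be merged, so including the Dest-IP in the key, as suggested above, would make the grouping stricter.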

Thank you,
Brian

@bekirk

bekirk commented Jun 30, 2021

So the import command ran in a reasonable time; however, rita show-beacons-fqdn took almost 6 hours to run and its output is 5.1G in size:

/opt/rita/DataPreSplunk/2021-06-29$ ls -lh
total 5.2G
-rw-rw-r-- 1 rita rita 5.1G Jun 30 12:13 beacons-fqdn.log
-rw-rw-r-- 1 rita rita  19M Jun 30 12:13 beacons.log

Does this seem right?

Thank you,
Brian

6 participants