Generating a report by parsing a big file (Snort) is taking too long

Hi!
So I’m trying to build a report view from data in multiple Snort log files, which can contain 100K+ records in total. The problem is that this task takes too long. For each file, I read information from each line, assign it to a dictionary, and then to an OrderedDict. In my view I can then access the data in a DataTable and ECharts. What would be the best approach to optimize this process? I heard about asynchronous support in Django 3, but it’s a mystery for me :frowning:
Here is my view function:

    @login_required
    def generate_report_fast(request):
        files = UploadFile.objects.filter(user=request.user)
        if files:
            dict_alert = OrderedDict()
            count = 1
            for file in files:
                path = file.upload_file.path
                date = str(file.get_year())
                with open(path) as alertfile:
                    for line in alertfile:
                        dict_alert[count] = read_data(line, date)
                        count += 1

            proto_count_data = protocount(dict_alert)
            ip_count_data = ipcount(dict_alert)
            classi_count_data = classicount(dict_alert)
            priority_count_data = prioritycount(dict_alert)
            time_count_data = timecount(dict_alert)

            return render(request, 'report.html', {'pkts': dict_alert, 'proto': proto_count_data, 'ip_data': ip_count_data,
                                                    'classi_data': classi_count_data, 'priority_data': priority_count_data,
                                                    'time_data': time_count_data})
        else:
            return render(request, 'details_error.html')

There might be a couple different ways to improve this, but I’m not sure that any of them are going to make this “fast” if you’re processing hundreds of thousands of lines.

First, I’d take a look at how you’re loading the data for internal use. You don’t show the details of your read_data method, so I can’t offer any suggestions there.

But, in your processing block

    proto_count_data = protocount(dict_alert)
    ip_count_data = ipcount(dict_alert)
    classi_count_data = classicount(dict_alert)
    priority_count_data = prioritycount(dict_alert)
    time_count_data = timecount(dict_alert)

I’m guessing that you’re iterating over your dict_alert one time for each of these statistics you’re gathering. You’re going through all your collected data five times. You can reduce the time spent iterating over that data by collecting your stats in one loop. (You didn’t include the definitions for protocount, ipcount, classicount, prioritycount, or timecount, so I can’t be more specific than that.) In theory, that will reduce the amount of time spent in that section of the code by 80%.
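
As a rough illustration only, here is a minimal single-pass sketch. Since read_data and the count helpers aren’t shown, the field names ('proto', 'src_ip', 'classification', 'priority', 'timestamp') are guesses — substitute whatever keys your parsed records actually use. It builds all five tallies with collections.Counter in one loop:

    from collections import Counter

    def count_all_stats(dict_alert):
        """Build all five tallies in a single pass over the parsed alerts.

        Assumes each alert is a dict with 'proto', 'src_ip',
        'classification', 'priority' and 'timestamp' keys -- adjust
        the key names to whatever read_data() actually produces.
        """
        proto_count = Counter()
        ip_count = Counter()
        classi_count = Counter()
        priority_count = Counter()
        time_count = Counter()

        for alert in dict_alert.values():
            proto_count[alert['proto']] += 1
            ip_count[alert['src_ip']] += 1
            classi_count[alert['classification']] += 1
            priority_count[alert['priority']] += 1
            time_count[alert['timestamp']] += 1

        return proto_count, ip_count, classi_count, priority_count, time_count

The view would then call this once and pass the five results to the template, instead of making five separate passes.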

Also, if you’re going to be running multiple sets of reports over a period of time with effectively the same data, there may be value in storing this in a database so that you’re not reading and parsing the data every time you need it. However, the value of this is extremely sensitive to your eventual usage of the data being handled.
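
If you do go that route, a sketch might look something like this. The Alert model and its fields are purely hypothetical — match them to whatever read_data() returns — and bulk_create is used so the rows aren’t inserted one at a time:

    from django.db import models

    class Alert(models.Model):
        # Illustrative fields only -- adjust to your parsed data.
        upload_file = models.ForeignKey('UploadFile', on_delete=models.CASCADE)
        proto = models.CharField(max_length=16)
        src_ip = models.GenericIPAddressField()
        classification = models.CharField(max_length=128)
        priority = models.IntegerField()
        timestamp = models.DateTimeField()

    def import_alerts(upload_file, parsed_alerts):
        """Persist parsed alerts once so later reports can query them."""
        Alert.objects.bulk_create(
            [Alert(upload_file=upload_file, **alert) for alert in parsed_alerts],
            batch_size=1000,
        )

Subsequent report requests could then aggregate in the database instead of re-reading and re-parsing the log files, e.g. Alert.objects.values('proto').annotate(total=Count('proto')).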

Ken
