Parse logs and report alert when 'n' errors within 'm' minutes
-
Hello, I have a log file with the following format:
    2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
    2019-09-02 10:11:12 INFO message ...
    2019-09-02 10:11:12 ERROR unexpected ...
    2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
    2019-09-02 10:11:13 INFO message ...
    2019-09-02 10:11:13 ERROR unexpected ...
    2019-09-02 10:11:13 ERROR 10.12.46.3 user1 ...
    2019-09-02 10:11:14 INFO message ...
    2019-09-02 10:11:14 ERROR unexpected ...
    2019-09-02 10:11:14 ERROR 10.12.46.4 user1 ...
    2019-09-02 10:11:15 INFO message ...
    2019-09-02 10:11:15 ERROR unexpected ...
    2019-09-02 10:11:15 ERROR 10.12.46.4 user1 ...
    2019-09-02 10:11:16 INFO message ...
    2019-09-02 10:11:16 ERROR unexpected ...
    2019-09-02 10:11:16 ERROR 10.12.46.5 user1 ...
    2019-09-02 10:11:17 INFO message ...
    2019-09-02 10:11:17 ERROR unexpected ...
    2019-09-02 10:11:17 ERROR 10.12.46.5 user1 ...
    2019-09-02 10:11:18 INFO message ...
    2019-09-02 10:11:18 ERROR unexpected ...
I am able to read from the file and count how many errors were logged by the same IP: this is standard counting.
In addition, I am asked to "log a message when an IP has too many errors (let's say 50) in a period of time (let's say 5 minutes)".
I am stuck figuring out how to examine every possible 5-minute window of the log: is brute force the only option? Should I convert the dates to absolute time (epoch seconds) so that it is easier to compute the 5-minute sliding window?
Is there a strategy to follow for this computation?
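On the epoch question, here is a minimal sketch of what the conversion could look like. It assumes the timestamps are in UTC (the log does not say), and the names `to_epoch` and `WINDOW_SECONDS` are made up for illustration. The second sample line is hypothetical, invented only to have something a few minutes later to compare against.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # the "5 minutes" from the task


def to_epoch(line: str) -> float:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line into epoch seconds."""
    date_part, time_part = line.split()[:2]
    parsed = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    # Assumption: the log is written in UTC; adjust tzinfo if it is not.
    return parsed.replace(tzinfo=timezone.utc).timestamp()


first = to_epoch("2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...")
later = to_epoch("2019-09-02 10:15:18 ERROR 10.12.46.2 user1 ...")  # hypothetical line

# With plain numbers, "inside the same window" is a single subtraction.
within_window = (later - first) <= WINDOW_SECONDS
```

Subtracting two `datetime` objects and comparing against a `timedelta` would work just as well; epoch seconds are mainly convenient when the timestamps need to be stored or sorted as numbers.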
Lastly: while the first part of the question can be coded in 10 minutes, how long could it reasonably take to whiteboard a decent answer to such a question during an onsite coding interview?
Thanks in advance for any suggestion!
dom
-
Can you share how you are counting log outputs by ip addresses?
-
Hello Avan, thanks for your reply.
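Here below is my tested solution for the first part:

    import re
    from collections import Counter

    # filename is the path to the log file, set elsewhere
    count_counter = []
    with open(filename) as f:
        for line in f:
            fields = line.split()
            # Skip blank or short lines, and anything that is not an ERROR entry
            if len(fields) < 4 or "ERROR" not in fields[2]:
                continue
            # The 4th field is an IP only for some ERROR lines ("unexpected" is not)
            ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', fields[3])
            if ip:
                count_counter.append(ip.group())

    # Print the three IPs with the most errors
    count = Counter(count_counter)
    top_count = count.most_common(3)
    for k, v in top_count:
        print(f'{k} ----> {v}')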
Any suggestion on how to tackle the second part?
Thanks!
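For the second part, one common approach (a sketch, not the only way) is a sliding window per IP: keep a queue of that IP's recent error timestamps, append each new one, and drop from the front whatever is older than the window. It assumes the log lines are in chronological order. The names `find_alerts`, `threshold`, and `window` are invented for this example, and the sample lines are taken from the log excerpt above.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def find_alerts(lines, threshold=50, window=timedelta(minutes=5)):
    """Yield (timestamp, ip, count) each time an IP has at least `threshold`
    errors inside `window`. Assumes lines are in chronological order."""
    recent = defaultdict(deque)  # ip -> timestamps of its errors still in the window
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ERROR":
            continue
        match = IP_RE.fullmatch(fields[3])
        if not match:
            continue  # e.g. "ERROR unexpected ..." carries no IP
        ts = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
        times = recent[match.group()]
        times.append(ts)
        # Drop errors that have fallen out of the window ending at ts.
        while times and ts - times[0] >= window:
            times.popleft()
        if len(times) >= threshold:
            yield ts, match.group(), len(times)


sample = [
    "2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...",
    "2019-09-02 10:11:12 INFO message ...",
    "2019-09-02 10:11:12 ERROR unexpected ...",
    "2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...",
    "2019-09-02 10:11:13 ERROR 10.12.46.3 user1 ...",
]
# threshold lowered to 2 so the short sample triggers an alert
alerts = list(find_alerts(sample, threshold=2))
```

Each timestamp is appended once and removed at most once, so the whole pass is linear in the number of log lines and no brute-force scan over every possible 5-minute interval is needed. As written, the sketch keeps yielding on every further error while the IP stays at or above the threshold; suppressing repeats would need an extra per-IP flag.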
-
Hmm... I wish I could be of help, but this does not look like something I can help with. Python is not my strong suit. Sorry.
-
It's fine, An: glad to share...