Parse logs and report alert when 'n' errors within 'm' minutes


  • Hello, I have a log file with the following format:

    2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
    2019-09-02 10:11:12 INFO message ...
    2019-09-02 10:11:12 ERROR unexpected ...
    2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
    2019-09-02 10:11:13 INFO message ...
    2019-09-02 10:11:13 ERROR unexpected ...
    2019-09-02 10:11:13 ERROR 10.12.46.3 user1 ...
    2019-09-02 10:11:14 INFO message ...
    2019-09-02 10:11:14 ERROR unexpected ...
    2019-09-02 10:11:14 ERROR 10.12.46.4 user1 ...
    2019-09-02 10:11:15 INFO message ...
    2019-09-02 10:11:15 ERROR unexpected ...
    2019-09-02 10:11:15 ERROR 10.12.46.4 user1 ...
    2019-09-02 10:11:16 INFO message ...
    2019-09-02 10:11:16 ERROR unexpected ...
    2019-09-02 10:11:16 ERROR 10.12.46.5 user1 ...
    2019-09-02 10:11:17 INFO message ...
    2019-09-02 10:11:17 ERROR unexpected ...
    2019-09-02 10:11:17 ERROR 10.12.46.5 user1 ...
    2019-09-02 10:11:18 INFO message ...
    2019-09-02 10:11:18 ERROR unexpected ...
    

    I am able to read the file and count how many errors are logged by each ip: that part is standard counting.

    In addition, I am asked to "log a message when an ip have too many errors (let's say 50) in a period of time (let's say 5 minutes)"

    I am stuck on figuring out how to examine every possible 5-minute span of the log: is brute force the only option? Should I convert the dates to absolute time (epoch seconds) so that the 5-minute sliding window is easier to compute?

    Is there a strategy to follow for this computation?

    Lastly: while the first part of the question can be coded in 10 minutes, how long could it reasonably take to whiteboard a decent answer to such a question during an onsite coding interview?

    Thanks in advance for any suggestion!

    dom
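

    For the epoch idea raised above, the conversion itself is a one-liner with
    the standard library. A minimal sketch, assuming the timestamp format shown
    in the sample log (the helper name `to_epoch` is illustrative):

    ```python
    from datetime import datetime

    def to_epoch(line):
        """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line into epoch seconds."""
        date, time_of_day = line.split()[:2]
        return datetime.strptime(f'{date} {time_of_day}',
                                 '%Y-%m-%d %H:%M:%S').timestamp()

    t1 = to_epoch('2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...')
    t2 = to_epoch('2019-09-02 10:11:13 INFO message ...')
    # t2 - t1 == 1.0
    ```

    With timestamps as plain numbers, "within 5 minutes" becomes a simple
    subtraction (`t2 - t1 <= 300`), which is what makes a sliding window cheap.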



  • @neuromancer

    Can you share how you are counting log outputs by ip addresses?



  • Hello Avan, thanks for your reply.
    Below is my tested solution to the first part:

    import re
    from collections import Counter

    error_ips = []
    with open(filename) as f:
        for line in f:
            fields = line.split()
            # guard against short lines before indexing into the fields
            if len(fields) > 3 and fields[2] == "ERROR":
                ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', fields[3])
                if ip:
                    error_ips.append(ip.group())

    count = Counter(error_ips)
    for k, v in count.most_common(3):
        print(f'{k} ----> {v}')


    Any suggestion on how to tackle the second part?

    Thanks!
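

    One way to tackle the second part, sketched under the assumption that the
    log is processed in timestamp order: keep a per-ip deque of recent error
    timestamps; on each new error, append the timestamp, evict anything older
    than the window from the left, and alert once the deque reaches the
    threshold. The names (`find_alerts`, `THRESHOLD`, `WINDOW`) are
    illustrative, not from the thread:

    ```python
    import re
    from collections import defaultdict, deque
    from datetime import datetime

    THRESHOLD = 50      # errors
    WINDOW = 5 * 60     # seconds

    IP_RE = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

    def find_alerts(lines, threshold=THRESHOLD, window=WINDOW):
        """Yield (ip, timestamp-string) whenever an ip accumulates
        `threshold` errors within `window` seconds."""
        recent = defaultdict(deque)   # ip -> deque of epoch timestamps
        for line in lines:
            parts = line.split()
            if len(parts) < 4 or parts[2] != 'ERROR':
                continue
            m = IP_RE.search(parts[3])
            if not m:
                continue
            ts = datetime.strptime(f'{parts[0]} {parts[1]}',
                                   '%Y-%m-%d %H:%M:%S').timestamp()
            q = recent[m.group()]
            q.append(ts)
            # evict timestamps that fell out of the sliding window
            while q and ts - q[0] > window:
                q.popleft()
            if len(q) >= threshold:
                yield m.group(), f'{parts[0]} {parts[1]}'
                q.clear()   # reset so one burst produces one alert
    ```

    Because each timestamp is appended once and popped at most once, this is
    linear in the number of log lines — no need to brute-force every possible
    5-minute interval. The `q.clear()` after an alert is a design choice to
    suppress duplicate alerts for the same burst; drop it if you want an alert
    on every line past the threshold.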



  • Hmm... I wish I could be of help, but this doesn't look like something I can help with. Python is not my strong suit. Sorry.



  • It's fine, An: glad to share...