    Parse logs and report alert when 'n' errors within 'm' minutes

    Python
      neuromancer last edited by

      Hello, I have a log file with the following format:

      2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
      2019-09-02 10:11:12 INFO message ...
      2019-09-02 10:11:12 ERROR unexpected ...
      2019-09-02 10:11:12 ERROR 10.12.46.2 user1 ...
      2019-09-02 10:11:13 INFO message ...
      2019-09-02 10:11:13 ERROR unexpected ...
      2019-09-02 10:11:13 ERROR 10.12.46.3 user1 ...
      2019-09-02 10:11:14 INFO message ...
      2019-09-02 10:11:14 ERROR unexpected ...
      2019-09-02 10:11:14 ERROR 10.12.46.4 user1 ...
      2019-09-02 10:11:15 INFO message ...
      2019-09-02 10:11:15 ERROR unexpected ...
      2019-09-02 10:11:15 ERROR 10.12.46.4 user1 ...
      2019-09-02 10:11:16 INFO message ...
      2019-09-02 10:11:16 ERROR unexpected ...
      2019-09-02 10:11:16 ERROR 10.12.46.5 user1 ...
      2019-09-02 10:11:17 INFO message ...
      2019-09-02 10:11:17 ERROR unexpected ...
      2019-09-02 10:11:17 ERROR 10.12.46.5 user1 ...
      2019-09-02 10:11:18 INFO message ...
      2019-09-02 10:11:18 ERROR unexpected ...
      

      I am able to read the file and count how many errors are logged by each IP: that part is standard counting.

      In addition, I am asked to "log a message when an IP has too many errors (let's say 50) within a period of time (let's say 5 minutes)".

      I am stuck on how to examine every possible 5-minute span of the log: is brute force the only option? Should I convert the timestamps to absolute time (epoch seconds) so that it is easier to compute the 5-minute sliding window?

      Is there a strategy to follow for this computation?
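      For the epoch idea, this is roughly what I had in mind (an untested sketch; to_epoch is just a name I picked for illustration):

      from datetime import datetime

      def to_epoch(line):
          # The first two whitespace-separated fields form "YYYY-MM-DD HH:MM:SS".
          ts = " ".join(line.split()[:2])
          return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").timestamp()

      With timestamps as plain seconds, checking whether two errors fall within 5 minutes becomes a simple subtraction.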

      Lastly: while the first part of the question can be coded in 10 minutes, how long could it reasonably take to whiteboard a decent answer to such a question during an onsite coding interview?

      Thanks in advance for any suggestion!

      dom


        avan last edited by

        @neuromancer

        Can you share how you are counting log outputs by ip addresses?


          neuromancer last edited by

          Hello Avan, thanks for your reply.
          Below is my tested solution for the first part:

          import re
          from collections import Counter

          # filename is assumed to hold the path to the log file shown above
          count_counter = []
          with open(filename) as f:
              for line in f:
                  fields = line.split()
                  # fields[2] is the log level; fields[3] holds the IP on ERROR lines
                  if len(fields) > 3 and fields[2] == "ERROR":
                      ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', fields[3])
                      if ip:
                          count_counter.append(ip.group())

          count = Counter(count_counter)
          top_count = count.most_common(3)
          for k, v in top_count:
              print(f'{k} ----> {v}')
          

          Any suggestion on how to tackle the second part?
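          In case it helps to see the direction I was considering: keep a per-IP deque of epoch timestamps and drop anything older than the window (untested sketch; check_error, WINDOW and THRESHOLD are just placeholder names, and 50 errors / 5 minutes are the example values from the question):

          from collections import defaultdict, deque
          from datetime import datetime

          WINDOW = 5 * 60      # 5 minutes, expressed in seconds
          THRESHOLD = 50       # errors per IP allowed inside the window
          recent = defaultdict(deque)   # ip -> epoch timestamps of its recent errors

          def check_error(line, ip):
              # Convert the leading "YYYY-MM-DD HH:MM:SS" timestamp to epoch seconds.
              ts = datetime.strptime(" ".join(line.split()[:2]),
                                     "%Y-%m-%d %H:%M:%S").timestamp()
              window = recent[ip]
              window.append(ts)
              # Drop timestamps that have fallen out of the 5-minute window.
              while window and ts - window[0] > WINDOW:
                  window.popleft()
              if len(window) >= THRESHOLD:
                  print(f'ALERT: {ip} logged {len(window)} errors within 5 minutes')

          The idea would be to call check_error(line, ip.group()) in the same spot where count_counter.append(...) runs above, so the alert fires while the file is being scanned.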

          Thanks!


            avan last edited by avan

            Hmm... I wish I could be of help, but this does not look like something I can help with. Python is not my strong suit. Sorry.


              neuromancer last edited by

              It's fine, Avan: glad to share...

