Is Hadoop appropriate for real-time log processing?
I’m still working through the training videos and docs, so I haven’t quite gotten into the Hadoop paradigm of thinking yet. Hopefully the community can help me determine whether Hadoop is the right solution for what I’m looking to do.

I’m considering Hadoop for processing various access logs for use by technical support at an independent ISP. The idea is to provide something like an index by username, so that a support technician can look up all activity across the various logs associated with a particular customer.

One concern is that we don’t have the scale that would benefit from the large-scale data processing Hadoop is targeted at. A month’s worth of logs weighs in at around 100 GB, and we’d have at most 10 or so machines to throw at Hadoop. This seems like small potatoes compared to what Hadoop is meant to do.

Another concern is HDFS’s inability to update files. Ideally, support techs would have access to real-time log indexes, since oftentimes they’ll
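To make the username-index idea concrete, here's a rough sketch of the kind of job I have in mind, written as a Hadoop Streaming mapper so it can be tested outside the cluster. The log format (a `user=NAME` field) and field names are just placeholders for illustration, not our actual log layout:

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper that keys each log line by username.

Assumes a hypothetical log format along the lines of:
    2013-04-02T10:15:00 user=jsmith action=login src=10.0.0.5
Real logs would need a parser per log type.
"""
import re
import sys

# Hypothetical username field; adjust the pattern per log source.
USER_RE = re.compile(r"\buser=(\w+)")

def map_line(line):
    """Emit 'username<TAB>logline' for lines containing a user field, else None."""
    m = USER_RE.search(line)
    if m:
        return f"{m.group(1)}\t{line.rstrip()}"
    return None  # lines without a recognizable user are dropped

if __name__ == "__main__":
    for line in sys.stdin:
        out = map_line(line)
        if out is not None:
            print(out)
```

With an identity reducer, the shuffle phase would group every line for a given username together, which is roughly the per-customer index I'm after. My understanding is this would run via `hadoop-streaming` with this script as the `-mapper`, though I may be missing something about how the output should then be served to support techs.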