Is Hadoop appropriate for real-time log processing?
I’m still working through the training videos and docs, so I haven’t quite gotten into the Hadoop paradigm of thinking yet. Hopefully the community can help me determine whether Hadoop is the right solution for what I’m looking to do.

I’m considering Hadoop for processing various access logs for use by technical support at an independent ISP. The idea is to provide something like an index by username, so that a support technician can look up all activity across the various logs associated with a particular customer.

One concern is that we don’t have the scale that would benefit from the large-scale data processing Hadoop is targeted at. A month’s worth of logs weighs in at around 100 GB, and we’d have at most 10 or so machines to throw at Hadoop. This seems like small potatoes compared to what Hadoop is meant to do.

Another concern is HDFS’s inability to update files. Ideally, support techs would have access to real-time log indexes, since oftentimes they’ll
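To make the username-index idea concrete, here's a rough sketch of the kind of job I have in mind, written as a Hadoop Streaming mapper so it can be tested outside the cluster. The log format (a `user=NAME` field) and field names are just placeholders for illustration, not our actual log layout:

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper that keys each log line by username.

Assumes a hypothetical log format along the lines of:
    2013-04-02T10:15:00 user=jsmith action=login src=10.0.0.5
Real logs would need a parser per log type.
"""
import re
import sys

# Hypothetical username field; adjust the pattern per log source.
USER_RE = re.compile(r"\buser=(\w+)")

def map_line(line):
    """Emit 'username<TAB>logline' for lines containing a user field, else None."""
    m = USER_RE.search(line)
    if m:
        return f"{m.group(1)}\t{line.rstrip()}"
    return None  # lines without a recognizable user are dropped

if __name__ == "__main__":
    for line in sys.stdin:
        out = map_line(line)
        if out is not None:
            print(out)
```

With an identity reducer, the shuffle phase would group every line for a given username together, which is roughly the per-customer index I'm after. My understanding is this would run via `hadoop-streaming` with this script as the `-mapper`, though I may be missing something about how the output should then be served to support techs.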