Are there tools for aggregating logs or for streaming log output?
The archive-mapred jar includes classes that help aggregate the contents of the userlogs directory across the cluster. To stream the contents of one remote task's userlog, run:

    % ${HADOOP_HOME}/bin/hadoop jar archive-mapred-0.2.0-SNAPSHOT.jar org.archive.mapred.ArchiveTaskLog http://192.168.1.107:50060/logs/userlogs/task_0019_m_000000_0/syslog/

archive-mapred also includes a primitive MapReduce job, based on the HADOOP-1199 work, for streaming all logging from a particular job.
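If the archive-mapred jar is not at hand, you can get a rough `tail -f` over HTTP with the sketch below. It is not part of archive-mapred. It assumes that curl is installed and that the TaskTracker serves the log as plain text at the URL you pass in. The function names `follow_userlog` and `print_new_bytes` are hypothetical. The function polls the URL and prints only the bytes that were appended since the previous fetch.

```shell
#!/bin/sh
# Hypothetical helper, not part of archive-mapred: a polling "tail -f" over HTTP.
# Assumes curl is installed and the TaskTracker serves the log as plain text.

# print_new_bytes FILE SEEN
# Print the bytes of FILE that follow the first SEEN bytes.
print_new_bytes() {
  tail -c +"$(( $2 + 1 ))" "$1"
}

# follow_userlog URL [INTERVAL_SECONDS]
# Re-fetch URL every INTERVAL seconds (default 5) and print only newly appended bytes.
follow_userlog() {
  url=$1
  interval=${2:-5}
  tmp=$(mktemp) || return 1
  seen=0
  while :; do
    # -s: silent, -f: treat HTTP errors as failures, -o: write body to $tmp
    if curl -sf -o "$tmp" "$url"; then
      # Arithmetic expansion strips the padding some wc implementations add.
      size=$(( $(wc -c < "$tmp") ))
      # If the log shrank (e.g. it was replaced), start over from the beginning.
      if [ "$size" -lt "$seen" ]; then
        seen=0
      fi
      if [ "$size" -gt "$seen" ]; then
        print_new_bytes "$tmp" "$seen"
        seen=$size
      fi
    fi
    sleep "$interval"
  done
}
```

Usage, after sourcing the functions: `follow_userlog http://192.168.1.107:50060/logs/userlogs/task_0019_m_000000_0/syslog/ 10`. Every poll downloads the whole file, and the sketch deliberately does not use HTTP Range requests. Because of that it does not depend on the server supporting them, but on large logs it uses more bandwidth than a true stream would.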