Flume collector example from Cloudera's UserGuide does not work as expected

The section of the UserGuide that shows how to set up a collector and write to it (http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors) has this configuration:

host : console | agentSink("localhost",35853) ;
collector : collectorSource(35853) | console ;

I changed this to:

dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;

I spawned the nodes as:

flume node_nowatch -n dataSource
flume node_nowatch -n dataCollector
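
For reference, the same pair can also be started one-shot with the configurations passed inline (a sketch, reusing the -1, -s, and -c flags that appear later in this post; I have not verified these exact command lines):

flume node_nowatch -1 -s -n dataSource -c 'dataSource : console | agentSink("localhost") ;'
flume node_nowatch -1 -s -n dataCollector -c 'dataCollector : collectorSource() | console ;'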

I have tried this on two systems:

  1. Cloudera's own demo VM running inside VirtualBox with 2GB RAM. It ships with Flume 0.9.4-cdh3u2.
  2. Ubuntu LTS (Lucid) with the Debian package and OpenJDK (and no Hadoop packages installed), running as a VirtualBox VM with 2GB RAM. I followed the steps at https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages

Here is what I did:

Running flume dump 'collectorSource()' leads to:

$ sudo netstat -anp | grep 35853
tcp6       0      0 :::35853        :::*            LISTEN      3520/java
$ ps aux | grep java | grep 3520
1000      3520  0.8  2.3 1050508 44676 pts/0   Sl+  15:38   0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;

My assumption is that:

flume dump 'collectorSource()'

is the same as running the config:

dump : collectorSource() | console ;

and starting the node with

flume node -1 -n dump -c "dump: collectorSource() | console;" -s

Running dataSource : console | agentSink("localhost") leads to:

$ sudo netstat -anp | grep 35853
tcp6       0      0 :::35853        :::*            LISTEN      3520/java
tcp6       0      0 127.0.0.1:44878 127.0.0.1:35853 ESTABLISHED 3593/java
tcp6       0      0 127.0.0.1:35853 127.0.0.1:44878 ESTABLISHED 3520/java
$ ps aux | grep java | grep 3593
1000      3593  1.2  3.0 1130956 57644 pts/1   Sl+  15:41   0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource

The observed behaviour is exactly the same in both the VirtualBox VMs:

An unending flow of this at dataSource:

2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO durability.NaiveFileWALManager: File lives in /tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152748172-0500.1116926245855.00000034 is queued to be checked
2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager: Retransmitting 20111215-152657736-0500.1066489868855.00000034 after being stale for 60048ms
2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152657736-0500.1066489868855.00000034
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152657736-0500.1066489868855.00000034 is queued to be checked
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: closed /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener ended 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO durability.NaiveFileWALManager: File lives in /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO durability.NaiveFileWALManager: opening log file 20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener began 20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO agent.WALAckManager: Ack for 20111215-152758253-0500.1127006668855.00000034 is queued to be checked
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO durability.WALSource: end of file NaiveFileWALManager (dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO hdfs.SeqfileEventSink: closed /tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO endtoend.AckListener$Empty: Empty Ack Listener ended 20111215-152808335-0500.1137089135855.00000034
..
2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager: Retransmitting 20111215-152707823-0500.1076576334855.00000034 after being stale for 60277ms
2011-12-15 15:35:24,763 [Heartbeat] INFO durability.NaiveFileWALManager: Attempt to retry chunk '20111215-152707823-0500.1076576334855.00000034' in LOGGED state. There is no need for state transition.

An unending flow of this at dataCollector:

localhost [INFO Thu Dec 15 15:31:09 EST 2011] { AckChecksum : (long)1323981059821 (string) ' 4Ck��' (double)6.54133557402E-312 } { AckTag : 20111215-153059819-0500.1308572847855.00000034 } { AckType : end }

How do I get the console <-> console communication via collectors working correctly?

-------------Reply-------------

I'm not exactly sure what your expected behavior is.

But it looks like you may only be binding to the IPv6 interface. I know in the Hadoop config you have to work around this:

# Ubuntu wants us to use IPv6. Hadoop doesn't support that, but nevertheless binds to :::50010. Let's tell it we don't agree.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

You may need a similar option. To start with, why not set the hostname and port number explicitly, and then back off each in turn?
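
For example, pinning both ends to an explicit IPv4 address and the UserGuide's default port (a sketch; 127.0.0.1 and 35853 are my assumptions, chosen to rule out "localhost" resolving to an IPv6 address):

dataSource : console | agentSink("127.0.0.1",35853) ;
dataCollector : collectorSource(35853) | console ;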

Alternatively, force IPv4 at the JVM level:

  • Go to /usr/lib/flume/bin.
  • Rename flume-env.sh.template to flume-env.sh.
  • Add this line at the end of the file: export UOPTS=-Djava.net.preferIPv4Stack=true
  • Restart your Flume instances.

=> Flume will then listen only on IPv4 addresses.
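
To verify, re-check the listener; it should now show up as tcp (IPv4) rather than tcp6 (hypothetical output, the PID will differ on your machine):

$ sudo netstat -anp | grep 35853
tcp        0      0 0.0.0.0:35853   0.0.0.0:*       LISTEN      3520/java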
