【Posted】: 2018-03-03 18:58:29
【Question】:
I am running Kafka Connect in distributed mode. The command is: bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties
The worker configuration is:
bootstrap.servers=kafka1:9092,kafka2:9092,kafka3:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false

Kafka Connect restarted with no errors!
The topics connect-configs, connect-offsets, and connect-statuses were created. The topic mysiteview was created as well.
Then I created the Kafka connector through the REST API like this:
curl -X POST -H "Content-Type: application/json" --data '{"name":"hdfs-sink-mysiteview","config":{"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector","tasks.max":"3","topics":"mysiteview","hdfs.url":"hdfs://master1:8020","topics.dir":"/kafka/topics","logs.dir":"/kafka/logs","format.class":"io.confluent.connect.hdfs.avro.AvroFormat","flush.size":"1000","rotate.interval.ms":"1000","partitioner.class":"io.confluent.connect.hdfs.partitioner.DailyPartitioner","path.format":"YYYY-MM-dd","schema.compatibility":"BACKWARD","locale":"zh_CN","timezone":"Asia/Shanghai"}}' http://kafka1:8083/connectors

I then produce data to the "mysiteview" topic in this form:
{"f1":"192.168.1.1","f2":"aa.example.com"}

The Java code is as follows:
// Note: `events`, `User`, and `JSON` are defined elsewhere and are not shown in this snippet.
Properties props = new Properties();
props.put("bootstrap.servers", "kafka1:9092");
props.put("acks", "all");
props.put("retries", 3);
props.put("batch.size", 16384);
props.put("linger.ms", 30);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<String, String>(props);
Random rnd = new Random();
for (long nEvents = 0; nEvents < events; nEvents++) {
    long runtime = new Date().getTime();
    String site = "www.example.com";
    String ipString = "192.168.2." + rnd.nextInt(255);
    // A key is generated here, but the record below is sent without a key.
    String key = "" + rnd.nextInt(255);
    User u = new User();
    u.setF1(ipString);
    u.setF2(site + " " + rnd.nextInt(255));
    System.out.println(JSON.toJSONString(u));
    // The value is the JSON string form of the User object.
    producer.send(new ProducerRecord<String, String>("mysiteview", JSON.toJSONString(u)));
    Thread.sleep(50);
}
producer.flush();
producer.close();
Then something strange happened. I can see the data in kafka-logs, but there is no data in HDFS (the topic directory does not exist). I checked the connector status with:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/status

The output is:
{"name":"hdfs-sink-mysiteview","connector":{"state":"RUNNING","worker_id":"10.255.223.178:8083"},"tasks":[{"state":"RUNNING","id":0,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":1,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":2,"worker_id":"10.255.223.178:8083"}]}

But when I check the task status with the following command:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/hdfs-sink-siteview-1

I get the result "Error 404". All three tasks return the same error!
What is wrong?
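
For reference, the Kafka Connect REST API exposes per-task status at /connectors/{name}/tasks/{taskId}/status, where the task id is the numeric "id" field from the connector status output rather than a name-based path segment like the one used above. The sketch below only builds those URLs. The class and method names are hypothetical helpers, and the host and connector name are the ones from this question.

```java
// Sketch only: builds Kafka Connect REST URLs for status checks.
// ConnectStatusUrls and its methods are hypothetical helpers, not part of any library.
final class ConnectStatusUrls {
    private ConnectStatusUrls() {}

    // GET /connectors/{name}/status : state of the connector and all of its tasks
    static String connectorStatus(String baseUrl, String connector) {
        return baseUrl + "/connectors/" + connector + "/status";
    }

    // GET /connectors/{name}/tasks/{taskId}/status : state of one task, addressed by numeric id
    static String taskStatus(String baseUrl, String connector, int taskId) {
        return baseUrl + "/connectors/" + connector + "/tasks/" + taskId + "/status";
    }

    public static void main(String[] args) {
        String base = "http://kafka1:8083";          // worker REST address from the question
        String connector = "hdfs-sink-mysiteview";   // connector name from the question
        // The status output lists task ids 0, 1 and 2.
        for (int id = 0; id < 3; id++) {
            System.out.println(taskStatus(base, connector, id));
        }
    }
}
```

Each printed URL can be passed to curl -X GET in the same way as the connector status command.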
【Comments】:
Tags: apache-kafka connect confluent-platform