【Question】: How to test KeyedBroadcastProcessFunction in Flink?
【Posted】: 2020-11-05 06:00:00
【Description】:

I am new to Flink and I am trying to write a JUnit test case to test a KeyedBroadcastProcessFunction. Below is my code. I currently call the getDataStreamOutput method in a TestUtils class and pass the input data and pattern rules to it. The input data is evaluated against the list of pattern rules; once the input data satisfies a condition I get a signal, the sink function is invoked, and the getDataStreamOutput method returns the output data as a String.

 @Test
    public void testCompareInputAndOutputDataForInputSignal() throws Exception {
        Assertions.assertEquals(sampleInputSignal,
                TestUtils.getDataStreamOutput(
                        inputSignal,
                        patternRules));
    }



public static String getDataStreamOutput(JSONObject input, Map<String, String> patternRules) throws Exception {

            env.setParallelism(1);

            DataStream<JSONObject> inputSignal = env.fromElements(input);

            DataStream<Map<String, String>> rawPatternStream =
                    env.fromElements(patternRules);

            //Generate a key,value pair of set of patterns where key is pattern name and value is pattern condition
            DataStream<Tuple2<String, Map<String, String>>> patternRuleStream =
                    rawPatternStream.flatMap(new FlatMapFunction<Map<String, String>,
                            Tuple2<String, Map<String, String>>>() {
                        @Override
                        public void flatMap(Map<String, String> patternRules,
                                            Collector<Tuple2<String, Map<String, String>>> out) throws Exception {
                            for (Map.Entry<String, String> stringEntry : patternRules.entrySet()) {
                                JSONObject jsonObject = new JSONObject(stringEntry.getValue());
                                Map<String, String> map = new HashMap<>();
                                for (String key : jsonObject.keySet()) {
                                    String value = jsonObject.get(key).toString();
                                    map.put(key, value);
                                }
                                out.collect(new Tuple2<>(stringEntry.getKey(), map));
                            }
                        }
                    });

            BroadcastStream<Tuple2<String, Map<String, String>>> patternRuleBroadcast =
                    patternRuleStream.broadcast(patternRuleDescriptor);


            DataStream<Tuple2<String, JSONObject>> validSignal = inputSignal.map(new MapFunction<JSONObject,
                    Tuple2<String, JSONObject>>() {
                @Override
                public Tuple2<String, JSONObject> map(JSONObject inputSignal) throws Exception {
                    String source = inputSignal.getString("source");
                    return new Tuple2<>(source, inputSignal);
                }
            }).keyBy(0).connect(patternRuleBroadcast).process(new MyKeyedBroadCastProcessFunction());
            
            
             validSignal.map(new MapFunction<Tuple2<String, JSONObject>,
                    JSONObject>() {
                @Override
                public JSONObject map(Tuple2<String, JSONObject> inputSignal) throws Exception {
                    return inputSignal.f1;
                }
            }).addSink(new getDataStreamOutput());

            env.execute("TestFlink");

            return getDataStreamOutput.dataStreamOutput;
    }


    @SuppressWarnings("serial")
    public static final class getDataStreamOutput implements SinkFunction<JSONObject> {
        public static String dataStreamOutput;

        public void invoke(JSONObject inputSignal) throws Exception {
            dataStreamOutput = inputSignal.toString();
        }
    }
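A single static String field like the one above only captures the last record and is not thread-safe. The Flink testing documentation suggests a "CollectSink" pattern instead: a sink that appends every record to a static, synchronized list. A sketch of that pattern adapted to this code (the class and field names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.json.JSONObject;

// "CollectSink" pattern from the Flink testing docs: collect every output
// record into a static, synchronized list so that tests which produce more
// than one record can also be asserted on.
public class CollectSink implements SinkFunction<JSONObject> {

    public static final List<String> values =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(JSONObject value, Context context) {
        values.add(value.toString());
    }
}
```

In the test you would then assert on CollectSink.values after env.execute() returns, and clear the list between tests.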

I need to test different inputs against the same broadcast rules, but every time I call this function it broadcasts the rule data again from the beginning before processing the input signal. Is there a way to broadcast once and then keep sending inputs to the method? One approach I explored is to combine the two streams with a CoFlatMapFunction, something like the code below, and keep sending input rules while the method is running. But for that the data stream would have to keep consuming from a Kafka topic, which would overload the method with Kafka utilities and a server.

 DataStream<JSONObject> inputSignalFromKafka = env.addSource(inputSignalKafka);

    DataStream<org.json.JSONObject> inputSignalFromMethod = env.fromElements(inputSignal);
    
    DataStream<JSONObject> inputSignal = inputSignalFromMethod.connect(inputSignalFromKafka)
                .flatMap(new SignalCoFlatMapper());


   public static class SignalCoFlatMapper
            implements CoFlatMapFunction<JSONObject, JSONObject, JSONObject> {

        @Override
        public void flatMap1(JSONObject inputValue, Collector<JSONObject> out) throws Exception {
            out.collect(inputValue);

        }

        @Override
        public void flatMap2(JSONObject kafkaValue, Collector<JSONObject> out) throws Exception {
            out.collect(kafkaValue);

        }
    }

I found a link on Stack Overflow, How to unit test BroadcastProcessFunction in flink when processElement depends on broadcasted data, but it left me confused.

In the test case I want to broadcast only once, in the @Before method, and then keep sending different kinds of data to my broadcast function.

【Comments】:

Tags: junit apache-flink flink-streaming flink-cep


【Answer 1】:

You can use a KeyedTwoInputStreamOperatorTestHarness to achieve this. For example, suppose you have the following KeyedBroadcastProcessFunction, where you define some business logic for the DataStream channel:

public class SimpleKeyedBroadcastProcessFunction extends KeyedBroadcastProcessFunction<String, String, String, String> {
    @Override
    public void processElement(String inputEntry,
                               ReadOnlyContext readOnlyContext, Collector<String> collector) throws Exception {
    //business logic for how you want to process your data stream records
    }

    @Override
    public void processBroadcastElement(String broadcastInput, Context context,
                                        Collector<String> collector) throws Exception {
        //process input from your broadcast channel
    }
}

Now, assuming your process function is stateful and modifies Flink internal state, you have to create a TestHarness in your test class to make sure you are able to keep track of the state during testing.
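The harness construction in the test class passes a BROADCAST_MAP_STATE_DESCRIPTOR that is not defined in this answer. A minimal sketch of how such a descriptor could be declared, assuming the broadcast state maps a pattern name to its definition (the state name "patternRules" and the String/String types are assumptions and must match whatever your process function uses in processBroadcastElement):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;

public class BroadcastDescriptors {

    // Hypothetical descriptor: key = pattern name, value = pattern definition.
    public static final MapStateDescriptor<String, String> BROADCAST_MAP_STATE_DESCRIPTOR =
            new MapStateDescriptor<>(
                    "patternRules",
                    BasicTypeInfo.STRING_TYPE_INFO,
                    BasicTypeInfo.STRING_TYPE_INFO);
}
```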

I would then create some unit tests using the following approach:

public class SimpleKeyedBroadcastProcessFunctionTest {
    private SimpleKeyedBroadcastProcessFunction processFunction;
    private KeyedTwoInputStreamOperatorTestHarness<String, String, String, String> testHarness;

  @Before
  public void setup() throws Exception {
    processFunction =  new SimpleKeyedBroadcastProcessFunction();
    testHarness = new KeyedTwoInputStreamOperatorTestHarness<>(
                new CoBroadcastWithKeyedOperator<>(processFunction, ImmutableList.of(BROADCAST_MAP_STATE_DESCRIPTOR)),
                (KeySelector<String, String>) string -> string ,
                (KeySelector<String, String>) string -> string,
                TypeInformation.of(String.class));
   testHarness.setup();
   testHarness.open();
  }

  @After
    public void cleanup() throws Exception {
        testHarness.close();
    }

  @Test
  public void testProcessRegularInput() throws Exception {
      //processElement1 send elements into your regular stream, second param will be the event time of the record
      testHarness.processElement1(new StreamRecord<>("Hello", 0));
      //Access records collected during processElement  
      List<StreamRecord<? extends String>> records = testHarness.extractOutputStreamRecords();
      assertEquals("Hello", records.get(0).getValue());
  }

    @Test
  public void testProcessBroadcastInput() throws Exception {
      //processElement2 send elements into your broadcast stream, second param will be the event time of the record
      testHarness.processElement2(new StreamRecord<>("Hello from Broadcast", 0));
      //Access records collected during processElement  
      List<StreamRecord<? extends String>> records = testHarness.extractOutputStreamRecords();
      assertEquals("Hello from Broadcast", records.get(0).getValue());
  }
}
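To get back to the original question (broadcast once, then test several inputs): the broadcast element can be sent in the @Before method right after the harness is opened, so each @Test only feeds the keyed side via processElement1. A sketch of methods that could replace the setup() above; the rule and signal strings here are made up, and note that JUnit re-runs @Before for every test, so each test still starts from a freshly populated broadcast state:

```java
  // Hypothetical variant of setup(): broadcast the rules a single time after
  // opening the harness, so the broadcast state is already populated when
  // the individual tests push keyed records.
  @Before
  public void setupWithBroadcastRules() throws Exception {
      setup(); // create and open the harness as shown above (remove @Before from setup())
      testHarness.processElement2(new StreamRecord<>("made-up rule definition", 0));
  }

  @Test
  public void testFirstInputAgainstRules() throws Exception {
      testHarness.processElement1(new StreamRecord<>("signal-1", 1));
      // assert on testHarness.extractOutputStreamRecords() ...
  }

  @Test
  public void testSecondInputAgainstRules() throws Exception {
      testHarness.processElement1(new StreamRecord<>("signal-2", 1));
      // the rules broadcast in @Before are visible here as well
  }
```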

【Comments】:

  • If you have a stateless function you don't need to use the test harness; you can refer to the official documentation to get a better understanding of how testing works in Apache Flink: flink.apache.org/news/2020/02/07/…