【Question title】: GridGain network connection: Is it possible to forward a node via SSH?
【Posted】: 2014-09-26 08:07:49
【Question】:

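I want to connect over SSH to a remote machine that runs a GridGain instance and join it from a local GridGain instance. Is this possible?

How does GridGain establish its network connections? As far as I know, a node starts and listens on the first free port in the range 47100-47200. But it opens further ports as well.

Simply forwarding, for example, port 47100 on the remote machine (the remote machine's GridGain port) to local port 47100 does not seem to be enough. Perhaps the communication is not purely client-server but symmetric, with the remote node also trying to connect to my master node?

Is there any documentation of the network protocol?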


I tried forwarding the following ports symmetrically:

  • GridTcpCommunicationSpi.DFLT_PORTs (47100+) and
  • GridTcpDiscoverySpi.DFLT_PORTs (47500+)

The nodes are able to connect. On the local node I first get these warnings:

WARN  GridTcpCommunicationSpi - Connect timed out (consider increasing 'connTimeout' configuration property) [addr=/10.240.136.167:47100]
WARN  GridTcpDiscoverySpi - Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 5000.
WARN  GridDhtPreloader - <gg-utility-sys-cache> Failed to wait for initial partition map exchange. Possible reasons are: 
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
WARN  GridTcpDiscoverySpi - Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node. Current timeout: 5000.

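So something tries to connect to 10.240.136.167:47100 and times out. That is the remote machine's private IP, which obviously cannot be reached from my local machine.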
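A quick way to see which of these endpoints are actually reachable from the local machine is a plain TCP connect with a timeout. The following is a minimal sketch using only the JDK; the class name `PortProbe` and the 2000 ms timeout are illustrative choices, and the addresses in `main` are the ones from the log output above, with 127.0.0.1 standing in for the local end of the tunnel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP endpoint accepts connections within a timeout. */
class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, unreachable network, and connect timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the question; adjust to your own setup.
        String[] hosts = {"127.0.0.1", "10.240.136.167"};
        int[] ports = {47100, 47500};
        for (String h : hosts)
            for (int p : ports)
                System.out.println(h + ":" + p + " reachable=" + isReachable(h, p, 2000));
    }
}
```

If the forwarded local ports answer but the private address does not, that would be consistent with the node advertising an address that only exists on the remote network.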

But things look fine when I then get the following:

INFO  GridDiscoveryManager - Topology snapshot [ver=2, nodes=2, CPUs=6, heap=2.7GB]

When I run the following broadcast test:

        grid.compute().broadcast(new GridRunnable() {
            @Override
            public void run() {
                System.out.println("hello!");
            }
        });

I get this severe error on the remote machine, whatever it means:

[SEVERE][gridgain-#9%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=at$
        at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
        at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
[19:58:02,237][SEVERE][gridgain-#11%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.a$
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation

class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.ac.ait.is.infrase$
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation

        at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
        at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
        at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)

On the client side I see no error:

INFO  GridDeploymentLocalStore - Class locally deployed: class nix.GoogleGridRun$1
hello!

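When I push the broadcast through again from the debugger, I get the following on the local machine, and the same error message as before on the remote machine: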

ERROR GridTaskWorker - Failed to obtain remote job result policy for result from GridComputeTask.result(..) method (will fail the whole task): GridJobResultImpl [job=o.g.g.kernal.processors.closure.GridClosureProcessor$10@7e89183d, sib=GridJobSiblingImpl [sesId=4c17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, nodeId=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, isJobDone=false], jobCtx=GridJobContextImpl [jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, attrs={}], node=GridTcpDiscoveryNode [id=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, addrs=[10.240.136.167, 127.0.0.1], sockAddrs=[/10.240.136.167:47500, /10.240.136.167:47500, /127.0.0.1:47500], discPort=47500, order=1, loc=false, ver=6.5.0#20140925-sha1:6dc3d773], ex=class o.g.g.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation
, hasRes=true, isCancelled=false, isOccupied=true]
class org.gridgain.grid.GridException: Remote job threw user exception (override or implement GridComputeTask.result(..) method if you would like to have automatic failover for this exception).
    at org.gridgain.grid.compute.GridComputeTaskAdapter.result(GridComputeTaskAdapter.java:109)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:819)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:812)
    at org.gridgain.grid.util.GridUtils.wrapThreadLoader(GridUtils.java:6093)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.result(GridTaskWorker.java:812)
    at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:708)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:906)
    at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1138)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
    at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
    Troubleshooting:      http://bit.ly/GridGain-Troubleshooting
    Documentation Center: http://bit.ly/GridGain-Documentation

    at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
    at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
    at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
    at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more

On the local host I see connections established between the virtual (forwarded) port and the real port:

tcp6       0      0 127.0.0.1:47100         127.0.0.1:38272         VERBUNDEN   12280/java      
tcp6       0      0 127.0.0.1:38272         127.0.0.1:47100         VERBUNDEN   12280/java 

and some to and from the SSH client (which is also Java):

tcp6   45832      0 78.101.12.107:47101    146.148.119.62:51867    VERBUNDEN   12280/java      
tcp6     231      0 78.101.12.107:47501    146.148.119.62:46219    CLOSE_WAIT  12280/java      
tcp6      48      0 78.101.12.107:37129    146.148.119.62:22       VERBUNDEN   12280/java   
tcp6       1      0 78.101.12.107:47501    146.148.119.62:44391    CLOSE_WAIT  12280/java  

78.101.12.107 = local IP, 146.148.119.62 = remote IP

I looked at netstat on a working local 2-node grid and see the following connections being established:

tcp6       0      0 ::1:47501               ::1:43143               VERBUNDEN   10218/java      
tcp6       0      0 ::1:47500               ::1:34708               VERBUNDEN   9496/java       
tcp6       0      0 ::1:34708               ::1:47500               VERBUNDEN   10218/java      
tcp6       0      0 ::1:43143               ::1:47501               VERBUNDEN   9496/java 

These lie within the GridTcpCommunicationSpi.DFLT_PORTs and GridTcpDiscoverySpi.DFLT_PORTs ranges, so forwarding those is probably sufficient.

Any ideas what might be wrong?

【Comments】:

    Tags: ssh ssh-tunnel gridgain


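    【Solution 1】:

    The master node must also be reachable from the rest of the cluster. You have two options:

    1. Set up a VPN.
    2. Implement and configure a GridAddressResolver for all nodes that translates their local addresses to external ones. This requires setting up port forwarding in your home network.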
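The address translation in option 2 can be illustrated with a small stand-alone class. This is only a sketch of the mapping logic under stated assumptions: `StaticAddressMapper` is a hypothetical name, it does not implement the GridGain `GridAddressResolver` interface (whose exact signature is not shown in this thread), and the method name `getExternalAddresses` is chosen for illustration. The addresses in the usage note are the private and public IPs quoted in the question.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps a node's internal socket address to the address peers should use. */
class StaticAddressMapper {
    private final Map<InetSocketAddress, InetSocketAddress> internalToExternal = new HashMap<>();

    /** Registers one internal -> external mapping. Pass IP literals so no DNS lookup is needed. */
    StaticAddressMapper map(String internalIp, int internalPort, String externalIp, int externalPort) {
        internalToExternal.put(new InetSocketAddress(internalIp, internalPort),
                               new InetSocketAddress(externalIp, externalPort));
        return this;
    }

    /** Returns the external address registered for addr, or an empty collection if there is none. */
    Collection<InetSocketAddress> getExternalAddresses(InetSocketAddress addr) {
        InetSocketAddress ext = internalToExternal.get(addr);
        return ext == null
            ? Collections.<InetSocketAddress>emptyList()
            : Collections.singletonList(ext);
    }
}
```

For example, `new StaticAddressMapper().map("10.240.136.167", 47100, "146.148.119.62", 47100)` would translate the remote node's private communication address to its public one. In an SSH tunnel setup the external side would presumably be 127.0.0.1 plus the forwarded port instead; whether GridGain accepts that is exactly the open question in the discussion below.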

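    【Discussion】:

    • Thanks for the reply! Option 1 is clear: with a VPN, GridGain gets its private network. What do you mean by 2? Can I make GridGain believe that a node lives at a remote address even though it is "virtually local"? With a tunnel, forwarding works in one direction only. I could also set up a reverse port forward. I then assume that for a two-way connection the port numbers must be the same on both machines, since neither machine may know which machine it is on. Is there documentation on the ports GridGain needs? Is it only one?
    • I now more or less understand what you mean by 2. I can change the address that a node connects to. But what about a node on "localhost", which is the SSH tunnel scenario? Unless someone knows more, I guess I will have to do some debugging.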