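【Posted】: 2017-11-01 02:01:55
【Question】:
To create an Azure Service Fabric (SF) test environment, I created three Azure VMs in DevTest Labs. The cluster will be secured with X.509 certificates.
The machines are:
- Windows Server 2016 Datacenter
- On the same virtual network
- All firewalls disabled (each machine can be pinged from the others)
- All using the same administrator account
I created the self-signed certificates with the certsetup.ps1 file supplied with the documentation. The server and cluster certificates are combined into one, as recommended.
If I run TestConfiguration.ps1, I get the following output.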
LocalAdminPrivilege : True
IsJsonValid : True
IsCabValid :
RequiredPortsOpen : True
RemoteRegistryAvailable : True
FirewallAvailable : True
RpcCheckPassed : True
NoConflictingInstallations : True
FabricInstallable : True
DataDrivesAvailable : True
Passed : True
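The IsCabValid field is clearly blank, but the "Passed" field still indicates that the installation can proceed. I went on to run the next PowerShell command to start the installation.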
.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.X509.MultiMachine.json
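After I run the command above, the process starts and the console window fills with the following text, which suggests that communication between the nodes is working: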
Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Running Best Practices Analyzer...
Best Practices Analyzer completed successfully.
Creating Service Fabric Cluster...
Processing and validating cluster config.
Configuring nodes.
Default installation directory chosen based on system drive of machine '10.0.0.4'.
Copying installer to all machines.
Configuring machine '10.0.0.4'.
Configuring machine '10.0.0.5'.
Configuring machine '10.0.0.6'.
Machine 10.0.0.6 configured.
Machine 10.0.0.5 configured.
Machine 10.0.0.4 configured.
Running Fabric service installation.
Successfully started FabricInstallerSvc on machine 10.0.0.4
Successfully started FabricInstallerSvc on machine 10.0.0.6
Successfully started FabricInstallerSvc on machine 10.0.0.5
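A long pause of several minutes follows, after which a timeout error is displayed with no real indication of the cause. I searched the Windows logs on the nodes but could not find any further information. The error shown in the PowerShell console is as follows: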
Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
CreateCluster Error: System.AggregateException: One or more errors occurred. ---> System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.RunFabricServices(List`1 machines, FabricPackageType fabricPackageType)
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
---> (Inner Exception #0) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
---> (Inner Exception #1) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
---> (Inner Exception #2) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Cleaning up faulted installation.
Removing configuration from machine 10.0.0.5
Removing configuration from machine 10.0.0.4
Removing configuration from machine 10.0.0.6
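Can any Azure SF enthusiasts explain this problem, or offer any suggestions as to where I went wrong?
【Comments】:
-
Have you tried uninstalling the SDK as described here: stackoverflow.com/questions/38106961/…
-
@Oliver The SDK was not present on the machines when I attempted the installation; otherwise TestConfiguration.ps1 would have failed.
-
How large are your VMs? You may need a faster setup, or to change the installer's timeout (I believe there is a switch for that).
-
Run the deployment with the -NoCleanupOnFailure flag and check the event log under "Applications and Services Logs > Microsoft-Service Fabric > Admin". The error/warning entries should indicate whether there is a problem reading the certificate or some other blocking issue. Check that the certificate is ACLed to NETWORK SERVICE on every machine, since that is one of the requirements listed in the documentation.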
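The checks suggested in the last comment might look roughly like the PowerShell sketch below. It is an illustration under several assumptions, not a verified procedure: the event log channel name `Microsoft-ServiceFabric/Admin` is assumed to match the Event Viewer path quoted above, `$thumbprint` is a placeholder, the certificate is assumed to live in `Cert:\LocalMachine\My`, and the key-file lookup only applies to CSP-backed (non-CNG) private keys on Windows PowerShell 5.1.

```powershell
# Re-run the deployment, leaving the failed installation in place for inspection.
.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.X509.MultiMachine.json -NoCleanupOnFailure

# On each node: is Fabric.exe running at all (as the console output suggests checking)?
Get-Process -Name Fabric -ErrorAction SilentlyContinue

# On each node: recent errors (Level 2) and warnings (Level 3) from the admin log.
# Assumption: the channel is named 'Microsoft-ServiceFabric/Admin'.
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-ServiceFabric/Admin'; Level = 2, 3 } -MaxEvents 50 |
    Format-List TimeCreated, LevelDisplayName, Message

# On each node: list who has access to the certificate's private key file.
$thumbprint = '<certificate thumbprint>'   # placeholder, replace with the real value
$cert = Get-Item "Cert:\LocalMachine\My\$thumbprint"
# Assumption: CSP-backed key; CNG keys are stored elsewhere and need a different lookup.
$keyName = $cert.PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName
$keyPath = Join-Path $env:ProgramData "Microsoft\Crypto\RSA\MachineKeys\$keyName"
(Get-Acl $keyPath).Access |
    Format-Table IdentityReference, FileSystemRights, AccessControlType
```

If NETWORK SERVICE does not appear in the access list with at least read rights, that would be consistent with the certificate ACL problem the comment describes.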
Tags: windows azure x509certificate azure-service-fabric azure-virtual-machine