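【Posted】: 2018-10-10 19:01:09
【Question】:
To test and evaluate Service Fabric (SF) for production use, I created a (single-machine) test cluster with three nodes on a production machine, which works fine. However, I have been unable to create a multi-machine cluster with three nodes.
All machines:
- are on the same (secure) network, with the following IPs: 10.0.10.12, 10.0.11.12, 10.0.12.12.
- are virtual and were freshly created from the same image.
- are not part of a domain. Setup was done with an administrator account that has the same password on all machines.
- run Windows Server 2012 R2 with PowerShell 4.0.
- have the firewall disabled (public and private profiles).
This is the clusterConfig.json: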
{
"name":"SampleCluster",
"clusterManifestVersion":"1.0.0",
"apiVersion":"2015-01-01-alpha",
"nodes":[
{
"nodeName":"vm1",
"iPAddress":"10.0.10.12",
"nodeTypeRef":"NodeType0",
"faultDomain":"fd:/dc1/fd1",
"upgradeDomain":"UD0"
},
{
"nodeName":"vm2",
"iPAddress":"10.0.11.12",
"nodeTypeRef":"NodeType0",
"faultDomain":"fd:/dc1/fd2",
"upgradeDomain":"UD1"
},
{
"nodeName":"vm3",
"iPAddress":"10.0.12.12",
"nodeTypeRef":"NodeType0",
"faultDomain":"fd:/dc1/fd3",
"upgradeDomain":"UD2"
}
],
"diagnosticsFileShare": {
"etlReadIntervalInMinutes": "5",
"uploadIntervalInMinutes": "10",
"dataDeletionAgeInDays": "7",
"etwStoreConnectionString": "file:c:\\ProgramData\\SF\\FileshareETW",
"crashDumpConnectionString": "file:c:\\ProgramData\\SF\\FileshareCrashDump",
"perfCtrConnectionString": "file:c:\\ProgramData\\SF\\FilesharePerfCtr"
},
"properties":{
"reliabilityLevel": "Bronze",
"nodeTypes": [
{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpoint": "19001",
"httpGatewayEndpointPort": "19080",
"applicationPorts": {
"startPort": "20001",
"endPort": "20031"
},
"ephemeralPorts": {
"startPort": "20032",
"endPort": "20062"
},
"isPrimary": true
}
],
"fabricSettings": [
{
"name": "Setup",
"parameters": [
{
"name": "FabricDataRoot",
"value": "C:\\ProgramData\\SF"
},
{
"name": "FabricLogRoot",
"value": "C:\\ProgramData\\SF\\Log"
}
]
}
]
}
}
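A config like the one above can be sanity-checked offline before running the setup. The following is a minimal Python sketch, not part of the Service Fabric tooling: the function names `load_cluster_config` and `check_cluster_config` and the specific checks (unique node names and IP addresses, a resolvable `nodeTypeRef`, non-overlapping application and ephemeral port ranges) are assumptions chosen for illustration and do not reproduce the validation the installer itself performs.

```python
import json


def load_cluster_config(path):
    """Read a clusterConfig.json file into a dict (tolerates a UTF-8 BOM)."""
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def _port_range(section):
    """Convert a {"startPort": "...", "endPort": "..."} section to two ints."""
    return int(section["startPort"]), int(section["endPort"])


def check_cluster_config(config):
    """Return a list of problems found; an empty list means none of the
    illustrative checks below fired. These checks are assumptions, not the
    installer's own validation rules."""
    problems = []
    nodes = config.get("nodes", [])

    # Every node should have its own name and its own IP address.
    for key in ("nodeName", "iPAddress"):
        values = [node.get(key) for node in nodes]
        if len(set(values)) != len(values):
            problems.append(f"duplicate {key} values: {values}")

    # Every nodeTypeRef should point at a node type declared under properties.
    node_types = config.get("properties", {}).get("nodeTypes", [])
    declared = {node_type.get("name") for node_type in node_types}
    for node in nodes:
        ref = node.get("nodeTypeRef")
        if ref not in declared:
            problems.append(f"{node.get('nodeName')}: unknown nodeTypeRef {ref!r}")

    # Application and ephemeral port ranges should not overlap.
    for node_type in node_types:
        app = node_type.get("applicationPorts")
        eph = node_type.get("ephemeralPorts")
        if app and eph:
            app_start, app_end = _port_range(app)
            eph_start, eph_end = _port_range(eph)
            if app_start <= eph_end and eph_start <= app_end:
                problems.append(
                    f"{node_type.get('name')}: application and ephemeral ports overlap"
                )

    return problems
```

Calling `check_cluster_config(load_cluster_config("clusterConfig.json"))` on the config shown above returns an empty list, so none of these particular checks explains the failure.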
When I start the cluster setup from one of the machines (10.0.10.12), this is written to the PowerShell console:
Cab extracted.
Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder doesn't exist. Creating trace folder: C:\copy\DeploymentTraces
Verifying remote procedure call access against cluster machines.
Processing and validating cluster config.
Creating FabricSettingsMetadata from C:\copy\ServiceFabricPackage\bin\Fabric\Fabric.Code\Configurations.csv
Configuring nodes.
Copying installer & package to all machines.
Configuring machine 10.0.10.12
Configuring machine 10.0.11.12
The setup stalls here for a few minutes. Then a timeout occurs:
Timed out waiting for Installer Service to start for machine 10.0.11.12.
CreateCluster Error: System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'.
---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
Errors occurred during cluster creation.
CreateCluster Exception 0: System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()
--- End of inner exception stack trace ---
---> (Inner Exception #0) System.InvalidOperationException: Cannot start service FabricInstallerSvc on computer '10.0.11.12'. ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
--- End of inner exception stack trace ---
at System.ServiceProcess.ServiceController.Start(String[] args)
at System.Fabric.DeploymentManager.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Fabric.DeploymentManager.<CreateClusterAsyncInternal>d__a.MoveNext()<---
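When I check the services on that machine (10.0.11.12), I find the Service Fabric Installer Service in the list, but it is not running. I can also see an error in the Windows event log, which matches the error message above: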
The Service Fabric Installer Service service failed to start due to the following error:
The system cannot find the file specified.
On that machine, I found the following log file: C:\ProgramData\SF\Log\traces\FabricInstallerService_5.1.150.9590_131111077992093094.trace. It contains the following:
2016-06-22 22:23:19.224,Info ,708,General.FabricInstallerServiceImpl,FabricInstallerService starting ...
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bcf0,Attempting to attach child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bdf0,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bdf0,Attempting to attach child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480c9b0,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480c9b0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b44811270,Attempting to attach child AsyncOperation 3b44811630.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b44811630,Calling OnStart
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bdf0,FinishComplete called with S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.FabricInstallerServiceImpl,FabricUpgradeManager open returned S_OK
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bcf0,Detaching child AsyncOperation 3b4480bdf0.
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b4480bdf0,Detaching child AsyncOperation 3b4480c9b0.
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade started with FabricDataRoot:C:\ProgramData\SF, FabricLogRoot:C:\ProgramData\SF\Log, FabricCodePath:C:\Program Files\Microsoft Service Fabric\bin\fabric\fabric.code, FabricRoot:C:\Program Files\Microsoft Service Fabric, TargetInformationFilePath:C:\ProgramData\SF\TargetInformation.xml, TargetInformationDescription:TargetInformationFileDescription { CurrentInstallation = WindowsFabricDeploymentDescription { IsValid = true, InstanceId = 0, MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = FabricSetup.exe, UndoUpgradeEntryPointExeParameters = /operation:Uninstall , }TargetInstallation = WindowsFabricDeploymentDescription { IsValid = false, InstanceId = , MSILocation = , ClusterManifestLocation = , InfrastructureManifestLocation = , NodeName = , UpgradeEntryPointExe = , UpgradeEntryPointExeParameters = , UndoUpgradeEntryPointExe = , UndoUpgradeEntryPointExeParameters = , }}
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Stopping fabric host
2016-06-22 22:23:19.224,Info ,1652,FabricInstallerService.FabricUpgradeManager,Error 0x80070424 while waiting for fabric host service to stop.
2016-06-22 22:23:19.224,Error ,1652,FabricInstallerService.FabricUpgradeManager,Unable to stop fabric host service; error
2016-06-22 22:23:19.224,Error ,1652,FabricInstallerService.FabricUpgradeManager,Error E_FAIL while trying to stop fabric host service
2016-06-22 22:23:19.224,Noise ,1652,General.AsyncOperation@3b44811630,FinishComplete called with E_FAIL
2016-06-22 22:23:19.224,Warning ,1652,FabricInstallerService.FabricUpgradeManager,Upgrade finished with error E_FAIL
2016-06-22 22:23:19.224,Info ,1636,General.FabricInstallerServiceImpl,service stopping (shutdown = false) ...
2016-06-22 22:23:19.224,Info ,1636,General.FabricInstallerServiceImpl,Stop FabricUpgradeManager called
2016-06-22 22:23:19.240,Info ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager, with timeout 5:00.000
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480be00,Attempting to attach child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c4d0,Calling OnStart
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c4d0,Attempting to attach child AsyncOperation 3b4480c5d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c5d0,Calling OnStart
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c5d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c4d0,FinishComplete called with S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.FabricInstallerServiceImpl,Close FabricUpgradeManager returned S_OK
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480be00,Detaching child AsyncOperation 3b4480c4d0.
2016-06-22 22:23:19.240,Noise ,2472,General.AsyncOperation@3b4480c4d0,Detaching child AsyncOperation 3b4480c5d0.
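Most of the trace above is `Noise`-level output, so it can help to filter it down and decode the one error code it contains. The sketch below is a minimal Python illustration under stated assumptions: the trace line layout (timestamp, level, thread ID, source, message, separated by commas) is inferred from the excerpt above rather than from any documented format, the helper names are hypothetical, and the `WIN32_ERRORS` table lists only a few well-known Win32 codes.

```python
import re

FACILITY_WIN32 = 7  # facility value used for HRESULTs that wrap Win32 errors

# A small, deliberately incomplete table of well-known Win32 error codes.
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    1060: "ERROR_SERVICE_DOES_NOT_EXIST",
}


def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    facility = (value >> 16) & 0x7FF  # bits 16-26
    code = value & 0xFFFF             # bits 0-15
    name = WIN32_ERRORS.get(code) if facility == FACILITY_WIN32 else None
    return {
        "failure": bool((value >> 31) & 1),  # bit 31 set means failure
        "facility": facility,
        "code": code,
        "win32_name": name,
    }


def find_hresults(trace_text):
    """Return every 0xXXXXXXXX value mentioned in the trace, as integers."""
    return [int(match, 16) for match in re.findall(r"0x[0-9A-Fa-f]{8}", trace_text)]


def problem_lines(trace_text, levels=("Error", "Warning")):
    """Return (timestamp, level, source, message) for lines at the given levels.

    Assumes the comma-separated layout seen in the excerpt above; lines that
    do not have at least five fields are skipped.
    """
    hits = []
    for line in trace_text.splitlines():
        parts = line.split(",", 4)  # the message itself may contain commas
        if len(parts) == 5 and parts[1].strip() in levels:
            hits.append((parts[0], parts[1].strip(), parts[3], parts[4]))
    return hits
```

Applied to the trace above, `decode_hresult(0x80070424)` yields facility 7 and code 1060, which corresponds to the Win32 error `ERROR_SERVICE_DOES_NOT_EXIST`. One possible reading is that the installer service tried to stop a fabric host service that was not registered on that machine at the time, but that is an interpretation of the code, not a confirmed diagnosis.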
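This is where I am stuck. My thoughts:
- Communication and reachability between the machines seem to be fine, because the setup files were copied and the setup process started.
- The Service Fabric Installer Service seems to play a central role here.
- The Service Fabric Installer Service seems to work correctly on the machine where I started the setup, but fails on the remote machine.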
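The first point rests on the file copy having worked. A more direct way to test reachability is to probe individual TCP ports from one machine to the others. The following is a minimal Python sketch using only the standard library; the helper names `port_open` and `reachability_report` are hypothetical, and the ports to probe (for example 19000, 19001, and 19080 from the config above) are a choice made for illustration. Those ports are presumably only listening once Fabric is running on a node, so a closed port before the cluster exists is not by itself evidence of a network problem.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def reachability_report(hosts, ports, timeout=2.0):
    """Probe every host/port combination and map (host, port) to True/False."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, `reachability_report(["10.0.11.12", "10.0.12.12"], [19000, 19001, 19080])` run from 10.0.10.12 would show which of the configured endpoints accept connections at that moment.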
Any ideas? Thanks.
【Comments】:
- Did you find a solution? We have run into a similar problem at work.