Questions on recursive coroutines in Kotlin: answers

【Question title】: Questions on recursive coroutines in Kotlin
【Posted】: 2022-01-02 19:11:08
【Question】:

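Recently I've been trying to get more familiar with Kotlin, so I decided to write a web scraper that uses coroutines. What I want to accomplish is to pull each page, extract its links and contents (posts), and then feed the links back into the process until there is nowhere left to go. So far it has some obvious shortcomings: there is no delay between calls, and addresses are not saved, so it cannot restrict itself to visiting only new ones. My question, however, is about coroutines.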

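Consider the class below. I've added some toy classes to simulate how it works; I won't go into detail about them, but you can imagine how they behave.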

import kotlinx.coroutines.*
import mu.KotlinLogging

class Scraper(
    private val client: Client = ToyClient(delayMillis = 1000, alwaysFindBody = "Test body"),
    private val extraction: Extraction = ToyExtraction(
        alwaysFindLinks = listOf("https://google.com"),
        alwaysFindPosts = listOf("Test post")
    ),
    private val repository: Repository = ToyRepository()
) {

    //  I could manage my own coroutine scope's lifecycle, but how would I go about this?
    //  private val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())
    private val seed = "https://google.com"
    private val log = KotlinLogging.logger {}

    fun start() = runBlocking {
        log.info { "Scraping started!" }
        scrape(seed).join()
        log.info { "Scraping finished!" }
    }

    private fun CoroutineScope.scrape(address: String): Job = launch(Dispatchers.Default) {
        log.info { "A scraping coroutine has started" }
        val page = request(address)
        val contents = extract(page)
        save(contents)
        contents.links.forEach { scrape(it) }
        //  Job would not progress here after submitting new jobs, only after all children have completed
        //  log.info { "A scraping coroutine has finished" }
    }

    private suspend fun request(address: String): Page {
        log.info { "Getting page: $address" }
        return client.get(address)
    }

    private suspend fun extract(page: Page): PageContents {
        log.info { "Extracting page: ${page.address}" }
        return extraction.extract(page)
    }

    private suspend fun save(contents: PageContents) {
        log.info { "Processing contents of: $contents" }
        repository.save(contents.posts)
    }
}

The main recursive operation is CoroutineScope.scrape(), which launches a job that can itself launch child jobs, and so on.
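To see why joining the first job waits for the whole crawl, here is a minimal, self-contained sketch of the same recursive launch pattern. It assumes kotlinx.coroutines is on the classpath; fakeLinks(), crawl() and crawlAll() are hypothetical stand-ins for the toy Client/Extraction classes, not part of the question's code. Each coroutine launches its children in its own scope, so they become its children, and a parent Job only completes once all of its children have completed.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for extraction: a page at depth < maxDepth "links" to fanOut child pages.
fun fakeLinks(address: String, depth: Int, maxDepth: Int, fanOut: Int): List<String> =
    if (depth >= maxDepth) emptyList() else List(fanOut) { i -> "$address/$i" }

// Same shape as Scraper.scrape(): launch a job, then recursively launch one child per link.
// The recursive calls use the receiver scope of the enclosing launch, so every new job
// is a child of the job that discovered the link.
fun CoroutineScope.crawl(
    address: String,
    depth: Int,
    maxDepth: Int,
    fanOut: Int,
    visited: AtomicInteger
): Job = launch(Dispatchers.Default) {
    delay(10)                     // simulate a network request
    visited.incrementAndGet()     // "save" the page
    fakeLinks(address, depth, maxDepth, fanOut).forEach { link ->
        crawl(link, depth + 1, maxDepth, fanOut, visited)
    }
}

// Joins only the root job and returns how many pages had been visited when join() returned.
fun crawlAll(maxDepth: Int, fanOut: Int): Int = runBlocking {
    val visited = AtomicInteger(0)
    crawl("https://example.com", depth = 0, maxDepth = maxDepth, fanOut = fanOut, visited = visited).join()
    visited.get()
}

fun main() {
    println(crawlAll(maxDepth = 3, fanOut = 2))
}
```

With maxDepth = 3 and fanOut = 2 the tree has 1 + 2 + 4 + 8 = 15 pages, and all 15 have been visited by the time the root's join() returns.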

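My main questions are:

1. If I manage the scope myself as a property, how could I do that and still get the same behavior? That is, I would wait for all the dynamically spawned jobs to complete as well, and return only once everything has finished.
2. I wrote my web client function using a third-party library, like this: suspend fun get(address: String): Page { ... }. Is marking this method as suspend enough to get all the benefits of coroutines from it?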

Thanks in advance!

【Question comments】:

Looks somewhat similar to stackoverflow.com/q/70557950/2071828. Still asks several questions at once and lacks focus.

Tags: kotlin kotlin-coroutines

【Solution 1】:

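You don't even need a scope of your own: launch a top-level job and use job.join() to wait for it and all of its children to complete. This works because every coroutine launched inside scrape() becomes a child of the job that launched it, and a parent job only completes once all of its children have completed. If you want to block while waiting for that to happen, you have already done it correctly with runBlocking.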
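If you do want the scope as a property (as in the commented-out line in the question), a minimal sketch could look like the following. ScopedScraper and its fakeGraph parameter are hypothetical names, and the map simply stands in for the Client and Extraction toys; kotlinx.coroutines is assumed to be on the classpath. The key point is that only the root job is launched into the owned scope, and everything below it is launched as a child of that root.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentHashMap

// Hypothetical scraper that owns its scope; fakeGraph maps an address to the links found on it.
class ScopedScraper(private val fakeGraph: Map<String, List<String>>) {
    // SupervisorJob: a failing crawl cancels its own root job's tree, but not the scope itself.
    private val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())

    // Thread-safe set, because children run concurrently on Dispatchers.Default.
    val visited: MutableSet<String> = ConcurrentHashMap.newKeySet<String>()

    // Launch only the root into the owned scope; root.join() then waits for the whole tree.
    fun start(seed: String) = runBlocking {
        val root = scope.launch { scrape(seed) }
        root.join()
    }

    // Launches a child of the receiver scope for each new address.
    private fun CoroutineScope.scrape(address: String) {
        if (!visited.add(address)) return      // skip addresses already seen (avoids cycles)
        launch {
            delay(10)                          // simulate the request
            fakeGraph[address].orEmpty().forEach { scrape(it) }
        }
    }

    // Cancels anything still running, e.g. on application shutdown.
    fun close() = scope.cancel()
}

fun main() {
    val graph = mapOf(
        "root" to listOf("a", "b"),
        "a" to listOf("c", "root"),
        "b" to listOf("c"),
    )
    val scraper = ScopedScraper(graph)
    scraper.start("root")
    println(scraper.visited.sorted())
    scraper.close()
}
```

The design choice that matters is inside scrape(): it calls launch on its receiver, not scope.launch. If every link were launched with scope.launch, the jobs would be siblings under the SupervisorJob rather than descendants of the root, and root.join() would return before they finished.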
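As for the second question: no, marking a function as suspend does not change its blocking behavior. It only allows the function to suspend itself, and that suspension has to happen explicitly, either in your own code or in the code you are calling.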
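A sketch of what this means in practice, assuming the third-party client exposes a blocking call. blockingFetch, getStillBlocking, getOffloaded and elapsedMillis are hypothetical names, and String stands in for the question's Page type. Adding suspend alone changes nothing; wrapping the blocking call in withContext(Dispatchers.IO) is one common way to make the caller actually suspend while the blocking work runs on another thread.

```kotlin
import kotlinx.coroutines.*

// Hypothetical blocking third-party call, standing in for the real web client library.
fun blockingFetch(address: String): String {
    Thread.sleep(100)              // blocks whichever thread calls it
    return "body of $address"
}

// Marked suspend, but it never suspends: it still blocks the thread it runs on.
suspend fun getStillBlocking(address: String): String = blockingFetch(address)

// Suspends at withContext and runs the blocking call on the IO dispatcher,
// so the caller's thread is free to run other coroutines meanwhile.
suspend fun getOffloaded(address: String): String = withContext(Dispatchers.IO) {
    blockingFetch(address)
}

// Starts five concurrent fetches on runBlocking's single thread and measures the total time.
fun elapsedMillis(fetch: suspend (String) -> String): Long = runBlocking {
    val start = System.nanoTime()
    (1..5).map { i -> async { fetch("https://example.com/$i") } }.awaitAll()
    (System.nanoTime() - start) / 1_000_000
}

fun main() {
    println("blocking:  ${elapsedMillis { getStillBlocking(it) }} ms")
    println("offloaded: ${elapsedMillis { getOffloaded(it) }} ms")
}
```

The five getStillBlocking calls run one after another on the single runBlocking thread, so they take at least 500 ms in total, while the five getOffloaded calls can overlap on the IO dispatcher and should finish in roughly the time of one. If the library provides its own asynchronous or suspending API, calling that is usually preferable to wrapping a blocking call.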

【Comments】:

Thanks, that's exactly the answer I was hoping for!