Making a script for sorting photos based on tags faster

【Question title】: Making a script for sorting photos based on tags faster
【Posted】: 2021-11-13 14:05:12
【Question description】:

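Hello, I don't know whether this is a legitimate question for SO, but I have a script that sorts photos into folders based on their tags. It uses the exifr package to do this.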

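However, it runs very slowly. I tried to improve it by following guides, but what I came up with did not work. Could someone who understands vectorization and/or optimization offer some suggestions? Thanks!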

#----- Imports ----
library(exifr)

# ---------- Functions ----------
'%!in%' <- function(x,y)!('%in%'(x,y))
tagcatcher <- function(dat){
  tags <- c()
  for (tagNameTry in keywords_names )  {
    if (tagNameTry %in% names(dat)) {
      xs <- dat[tagNameTry]
      if (typeof(xs) == "list") {
        xs <- xs[[1]]
        l <- length(xs[[1]])
        x <- c()
        for (i in 1:l) {
          x <- c(x,xs[[1]][i])
        }
      } else {
        x <- xs
      }
      tags <- c(tags,x)
    }
  }
  tags <- unique(tags)
  return(tags)
}

# ----------- Settings ----------
ss <- "/"
haystacks <- c("H:MyPhotos")
organizedMediaPhotos <- "V:/Photos"
all_files <- list.files(haystacks,recursive = TRUE, full.names = TRUE)
keywords_names <- c("Category","XPKeywords","Keywords")
ctags <- list.dirs(organizedMediaPhotos)[list.dirs(organizedMediaPhotos) %!in% organizedMediaPhotos]
current_tags <- c()

for (ctag in ctags) {
  x <- strsplit(ctag,"/")
  x <- x[[1]]
  x <- x[length(x)]
  current_tags <- c(current_tags,x)
}

# Main Loop - That Needs to be faster
for (cur_file in all_files) {
  print(cur_file)
  cur_dat <- read_exif(cur_file,tags=keywords_names)
  tags <- tagcatcher(cur_dat)
  for (tag in tags) {
    tag_folder <- paste(organizedMediaPhotos,ss,tag,sep="")
    if (tag %!in% current_tags) {
      dir.create(tag_folder)
      print(paste("creating tag folder: ",tag_folder))
    }
    pic_path <- paste(tag_folder,ss,basename(cur_file),sep="")
    if (!file.exists(pic_path)) {
      file.copy(cur_file,pic_path)
      print(paste("moved file from ",cur_file, " to ", pic_path))
    }
  }
}

【Comments】:

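A general comment: you are growing a (potentially very large?) vector inside a loop, which is (unnecessarily) slow. Another, unrelated comment: please don't write non-syntactic variable names as strings. The fact that R allows this is a mistake and only leads to confusion. Instead, use backticks, as recommended in the documentation (under "Names and identifiers"). You can also shorten the definition of `%!in%` to `%!in%` = Negate(`%in%`).
Have you done any benchmarking to see which steps in the loop are the slowest?
It has to be "read_exif(cur_file,tags=keywords_names)"
read_exif can accept a vector of file names. So if calling the function carries significant overhead, moving the call out of the per-file loop may help.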
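Taken together, the comments above point to three changes: define `%!in%` with backticks and Negate(), stop growing vectors inside loops, and call read_exif once for all files. Below is a minimal sketch of that idea, not a tested drop-in replacement. The helper names collect_tags, plan_copies and run_sort are hypothetical, and the sketch assumes that read_exif returns one row per file with a SourceFile column and tag columns that are either character or list columns.

```r
# Negated %in%, defined with backticks as suggested in the comments.
`%!in%` <- Negate(`%in%`)

# Hypothetical helper: unique tags for each row of a metadata table.
# Assumes tag columns are character vectors or list columns.
collect_tags <- function(meta, keyword_cols) {
  present <- intersect(keyword_cols, names(meta))
  lapply(seq_len(nrow(meta)), function(i) {
    tags <- unlist(lapply(present, function(col) meta[[col]][i]),
                   use.names = FALSE)
    unique(tags[!is.na(tags)])
  })
}

# Hypothetical helper: one row per (file, tag) pair with its copy target.
# Builds the whole plan with vectorised calls instead of c() in a loop.
plan_copies <- function(meta, dest_root, keyword_cols) {
  tags <- collect_tags(meta, keyword_cols)
  src <- rep(meta$SourceFile, times = lengths(tags))
  tag <- as.character(unlist(tags, use.names = FALSE))
  data.frame(
    source = src,
    tag = tag,
    target = file.path(dest_root, tag, basename(src)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical driver: needs the exifr package and a working ExifTool.
run_sort <- function(all_files, dest_root, keyword_cols) {
  # One read_exif call for all files instead of one call per file.
  meta <- exifr::read_exif(all_files, tags = keyword_cols)
  plan <- plan_copies(meta, dest_root, keyword_cols)
  # Create each tag folder once; existing folders are left alone.
  for (d in unique(dirname(plan$target))) {
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
  }
  # Copy only files that are not already present at the target.
  todo <- !file.exists(plan$target)
  file.copy(plan$source[todo], plan$target[todo])
  invisible(plan)
}
```

With the settings from the question this would be called as run_sort(all_files, organizedMediaPhotos, keywords_names). Wrapping that call in system.time() and comparing it with the original loop on a small folder is one way to check whether read_exif really is the bottleneck.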

Tags: r optimization tags vectorization photo

【Solution 1】:

You can try this; note that it sorts the files by modification date rather than by tag:

for x in *.jpg; do
  d=$(date -r "$x" +%Y-%m-%d)
  mkdir -p "$d"
  mv -- "$x" "$d/"
done

For PowerShell:

Param(
    [string]$source, 
    [string]$dest, 
    [string]$format = "yyyy/yyyy_MM/yyyy_MM_dd"
)

$shell = New-Object -ComObject Shell.Application

function Get-File-Date {
    [CmdletBinding()]
    Param (
        $object
    )

    $dir = $shell.NameSpace( $object.Directory.FullName )
    $file = $dir.ParseName( $object.Name )

    # First see if we have Date Taken, which is at index 12
    $date = Get-Date-Property-Value $dir $file 12

    if ($null -eq $date) {
        # If we don't have Date Taken, then find the oldest date from all date properties
        0..287 | ForEach-Object {
            $name = $dir.GetDetailsof($dir.items, $_)

            if ( $name -match '(date)|(created)') {
            
                # Only get value if date field because the GetDetailsOf call is expensive
                $tmp = Get-Date-Property-Value $dir $file $_
                if ( ($null -ne $tmp) -and (($null -eq $date) -or ($tmp -lt $date))) {
                    $date = $tmp
                }
            }
        }
    }
    return $date
}

function Get-Date-Property-Value {
    [CmdletBinding()]

    Param (
        $dir,
        $file,
        $index
    )

    $value = ($dir.GetDetailsof($file, $index) -replace "`u{200e}") -replace "`u{200f}"
    if ($value -and $value -ne '') {
        return [DateTime]::ParseExact($value, "g", $null)
    }
    return $null
}

Get-ChildItem -Attributes !Directory $source -Recurse | 
Foreach-Object {
    Write-Host "Processing $_"

    $date = Get-File-Date $_

    if ($date) {
    
        $destinationFolder = Get-Date -Date $date -Format $format
        $destinationPath = Join-Path -Path $dest -ChildPath $destinationFolder   

        # See if the destination file exists and rename until we get a unique name
        $newFullName = Join-Path -Path $destinationPath -ChildPath $_.Name
        if ($_.FullName -eq $newFullName) {
            Write-Host "Skipping: Source file and destination files are at the same location. $_"    
            return
        }

        $newNameIndex = 1
        $newName = $_.Name

        while (Test-Path -Path $newFullName) {
            $newName = ($_.BaseName + "_$newNameIndex" + $_.Extension) 
            $newFullName = Join-Path -Path $destinationPath -ChildPath $newName  
            $newNameIndex += 1   
        }

        # If we have a new name, then we need to rename in current location before moving it.
        if ($newNameIndex -gt 1) {
            Rename-Item -Path $_.FullName -NewName $newName
        }

        Write-Host "Moving $_ to $newFullName"

        # Create the destination directory if it doesn't exist
        if (!(Test-Path $destinationPath)) {
            New-Item -ItemType Directory -Force -Path $destinationPath
        }

        robocopy $_.DirectoryName $destinationPath $newName /mo
    }
}

PS: I tried this a few years ago and it worked well.

【Comments】:

【Solution 2】:

You can change your if command to something like this:

if [[ "$t" =~ IMG_+[0-9]{8}[a-zA-Z]*$ ]]

=~ is the regular-expression matching operator, available in bash version 3 and later.

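With this if statement you can match names like IMG_11111111alphabets.ext, provided $t holds the name without its extension: the trailing $ anchors the match to the end of the string, and [a-zA-Z]* cannot match the dot. You can use it as a starting point and customize it to your needs. For more information, see: Regular expressions in Bash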
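Since the question is about an R script, the same pattern can be tried with grepl, which also shows the effect of the trailing $. This is only a sketch: matches_img is a hypothetical helper, and stripping the extension with tools::file_path_sans_ext is an assumption about how the names would be prepared.

```r
# Same pattern as in the bash test above.
pattern <- "IMG_+[0-9]{8}[a-zA-Z]*$"

# Hypothetical helper: TRUE for file names whose stem matches the pattern.
matches_img <- function(file_names) {
  # Drop the directory and the extension so that $ can match the end of the stem.
  stems <- tools::file_path_sans_ext(basename(file_names))
  grepl(pattern, stems)
}

# With the extension still attached, the $ anchor prevents a match.
with_ext <- grepl(pattern, "IMG_11111111alphabets.ext")

# After stripping the extension, the first name matches; the second has too few digits.
stripped <- matches_img(c("photos/IMG_11111111alphabets.ext", "IMG_123.jpg"))
```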

【Comments】: