How to read input from stdin in assembly, character by character

【Question title】: How to read input from stdin in assembly, character by character
【Posted】: 2021-06-18 14:48:18
【Question】:

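I want the program below to read some characters (up to 9) from stdin and place them at a specified location in memory.

What actually happens: when I press Enter after typing fewer than 9 characters, the cursor just moves to the next line; this keeps happening until I have entered 9 characters. If I enter more than 9, the extra characters are interpreted as a shell command. Why doesn't it terminate when I press Enter?

Using nasm 2.14.02 on Ubuntu.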

  global _start
  section .bss
    buf resb 10
  section .text
    ; Read a word from stdin, terminate it with a 0 and place it at the given address.
    ; - $1, rdi: *buf - where to place read bytes
    ; - $2, rsi: max_count, including the NULL terminator
    ; Returns in rax:
    ; - *buf - address of the first byte where the NULL-terminated string was placed
    ; - 0, if input too big
    read_word: ; (rdi: *buf, rsi: max_count) -> *buf, or 0 if input too big
      mov r8, 0      ; current count
      mov r9, rsi    ; max count
      dec r9         ; one char will be occupied by the terminating 0

      ; read a char into the top of the stack, then pop it into rax
      .read_char:
        push rdi       ; save; will be clobbered by syscall
        mov rax, 0     ; syscall id = 0 (read)
        mov rdi, 0     ; syscall $1, fd = 0 (stdin)
        push 0         ; top of the stack will be used to place read byte
        mov rsi, rsp   ; syscall $2, *buf = rsp (addr where to put read byte)
        mov rdx, 1     ; syscall $3, count (how many bytes to read)
        syscall
        pop rax
        pop rdi

      ; if read character is Enter (aka carriage-return, CR) - null-terminate the string and exit
      cmp rax, 0x0d ; Enter
      je .exit_ok

      ; not enter ⇒ place it in the buffer, and read another one
      mov byte [rdi+r8], al ; copy character into output buffer
      inc r8                ; inc number of collected characters
      cmp r8, r9            ; make sure number doesn't exceed maximum
      je .exit_ok           ; if we have the required number of chars, exit
      jb .read_char         ; if it's not greater, read another char

      .exit_ok: ; add a null to the end of the string and return address of buffer (same as input)
        add r8, 1
        mov byte [rdi+r8], 0
        mov rax, rdi
        ret

      .exit_err: ; return 0 (error)
        mov rax, 0
        ret

  _start:
    mov rdi, buf     ; $1 - *buf
    mov rsi, 10      ; $2 - uint count
    call read_word

    mov rax, 60  ; exit syscall
    mov rdi, 0   ; exit code
    syscall

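【Comments】:

syscall may clobber any and all of rax, rcx, rdx, rsi, rdi, r8, r9, r10, and r11. You are using some of those to hold data, so that data may be lost.
@ChrisDodd cheers; in this particular program I think only r8 and r9 are a problem
@ChrisDodd: stackoverflow.com/a/2538212/634919 says only rax, rcx, and r11 are clobbered. Perhaps you are thinking of the function calling convention?
@ChrisDodd: The system-call calling convention is different from the function calling convention. On any ISA, you can assume that Linux system calls leave the argument-passing registers unmodified, apart from the return value. (On x86-64, RCX and R11 are additionally clobbered by the syscall instruction itself.)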

Tags: linux assembly x86-64 nasm system-calls

【Solution 1】:

First off, when the user presses Enter, you will see LF (\n, 0xa), not CR (\r, 0xd). That would explain why your program does not exit when you think it should.

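As for why the extra characters go to the shell, that comes down to how the OS handles terminal input. It accumulates keystrokes from the terminal in a kernel buffer until Enter is pressed, and only then makes the whole buffer available to read(). This lets things like backspace work transparently, without the application having to code for them explicitly, but it does mean you cannot literally read one keystroke at a time, as you have noticed.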

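If your program exits while the buffer still contains characters, those characters will be read by the next program that tries to read from the device, which in your case is the shell. Most programs that read stdin avoid this by continuing to read and process data until they see end-of-file (read() returns 0), which for a terminal happens when the user presses Ctrl-D.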

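If you really do need to process input character by character, you need to put the terminal into non-canonical mode, but a lot of things behave differently in that case.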

【Comments】:

Yes, checking for LF instead of CR did the trick: the program terminates, with no side effects on the shell