Import.io - REGEX get table td value from th (answers)

【Question Title】: Import.io - REGEX get table td value from th
【Posted】: 2017-10-09 14:07:48
【Question】:

I need help. I'm using import.io and want to scrape table data with a regular expression, taking the values from one specific column. Here is the HTML:

<table>
   <tr>
      <td class='label'>No. Urut</td>
      <td class='titikdua'>:</td>
      <td>201</td>
   </tr>
   <tr>
      <td class='label'>Kode</td>
      <td class='titikdua'>:</td>
      <td>DF 045</td>
   </tr>
   <tr>
      <td class='label'>Warna</td>
      <td class='titikdua'>:</td>
      <td>HITAM</td>
   </tr>
   <tr>
      <td class='label'>Bahan</td>
      <td class='titikdua'>:</td>
      <td>KULIT</td>
   </tr>
   <tr>
      <td class='label'>Berat</td>
      <td class='titikdua'>:</td>
      <td>0 gr</td>
   </tr>
   <tr>
      <td class='label'>Info</td>
      <td class='titikdua'>:</td>
      <td>SOL : FIBER</td>
   </tr>
</table>
</div>
<div id='fr-stok'>
<table id='t01'>
   <tr>
      <th>Size</th>
      <th>Stok</th>
      <th>Pesanan</th>
      <th>Last Update</th>
   </tr>
   <tr>
      <td>38</td>
      <td>4</td>
      <td>0</td>
      <td></td>
   </tr>
   <tr>
      <td>39</td>
      <td>5</td>
      <td>0</td>
      <td>05 Oct 17, 15:39:53</td>
   </tr>
   <tr>
      <td>40</td>
      <td>11</td>
      <td>0</td>
      <td></td>
   </tr>
   <tr>
      <td>41</td>
      <td>4</td>
      <td>0</td>
      <td>08 Oct 17, 12:24:28</td>
   </tr>
   <tr>
      <td>42</td>
      <td>0</td>
      <td>0</td>
      <td>07 Oct 17, 14:22:07</td>
   </tr>
   <tr>
      <td>43</td>
      <td>6</td>
      <td>0</td>
      <td>04 Oct 17, 15:52:41</td>
   </tr>
</table>

I want to get the Size values and turn them into 38,39,40,41,42,43. How can I do that with a regular expression?

【Comments on the question】:

Sorry, I meant from 38 to 43. I have edited the question.

Tags: php html regex web-scraping

【Answer 1】:

<tr>\s*<td>\s*\K\d+

<tr> matches the characters <tr> literally (case-sensitive).
\s* matches any whitespace character (equivalent to [\r\n\t\f\v ]) between zero and unlimited times, as many times as possible, giving back as needed (greedy).
<td> matches the characters <td> literally (case-sensitive).
\s* matches any whitespace character, with the same greedy zero-or-more quantifier.
\K resets the starting point of the reported match: everything consumed so far (<tr>, <td> and the whitespace) is excluded from the final match.
\d+ matches a digit (equivalent to [0-9]) between one and unlimited times, as many times as possible, giving back as needed (greedy).

Because <td> is matched literally with no attributes, the label rows of the first table (<td class='label'>) and the header row of t01 (<th>) do not match, so only the first cell of each data row (38 to 43) is returned.
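A minimal sketch of applying the answer's pattern outside import.io, assuming Python's standard `re` module. Python's `re` does not support `\K`, so the sketch replaces it with a capture group around `\d+`; `re.findall` then returns only the captured digits. The markup is an abbreviated copy of the question's HTML with each row on one line (since `\s*` also matches newlines, the original multi-line layout behaves the same), and the names `PAGE_HTML`, `SIZE_RE` and `extract_sizes` are hypothetical.

```python
import re

# Abbreviated copy of the question's markup: one label/value row from the
# first table, followed by the stock table (id='t01').
PAGE_HTML = """
<table>
   <tr><td class='label'>No. Urut</td><td class='titikdua'>:</td><td>201</td></tr>
</table>
</div>
<div id='fr-stok'>
<table id='t01'>
   <tr><th>Size</th><th>Stok</th><th>Pesanan</th><th>Last Update</th></tr>
   <tr><td>38</td><td>4</td><td>0</td><td></td></tr>
   <tr><td>39</td><td>5</td><td>0</td><td>05 Oct 17, 15:39:53</td></tr>
   <tr><td>40</td><td>11</td><td>0</td><td></td></tr>
   <tr><td>41</td><td>4</td><td>0</td><td>08 Oct 17, 12:24:28</td></tr>
   <tr><td>42</td><td>0</td><td>0</td><td>07 Oct 17, 14:22:07</td></tr>
   <tr><td>43</td><td>6</td><td>0</td><td>04 Oct 17, 15:52:41</td></tr>
</table>
"""

# Same pattern as the answer, with \K replaced by a capture group,
# because Python's re module does not support \K.
SIZE_RE = re.compile(r"<tr>\s*<td>\s*(\d+)")


def extract_sizes(html: str) -> list[str]:
    # With one capture group, findall returns just the captured digits.
    return SIZE_RE.findall(html)


if __name__ == "__main__":
    # Join the matches into the comma-separated form the question asks for.
    print(",".join(extract_sizes(PAGE_HTML)))
```

Running the script prints `38,39,40,41,42,43`; the `No. Urut` row is skipped because its cell is `<td class='label'>`, not a bare `<td>`.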

【Comments】:

Thanks for your answer. What about using XPath, for this HTML:

Size	Stok	Pesanan	Last Update
36	4	0
37	0	09 Oct 17, 16:20:34
38	5	0	08 Oct 17, 15:41:15
39	6	0
40	7	0	04 Oct 17, 17:27:02
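For the XPath follow-up, here is a minimal sketch assuming Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath and needs well-formed markup. It uses the question's `t01` table, because the commenter's pasted markup did not survive; `STOCK_HTML`, `SIZE_PATH` and `sizes_via_xpath` are hypothetical names. In full XPath 1.0 the equivalent expression is `//table[@id='t01']/tr/td[1]`: it selects the first cell of every row, and since the header row uses `<th>`, only the Size values come back. If the commenter's table has the same structure (an assumption), the same path would return 36 to 40.

```python
import xml.etree.ElementTree as ET

# The question's stock table, wrapped in its <div id='fr-stok'> so the
# fragment has a single root element, as XML parsing requires.
STOCK_HTML = """
<div id='fr-stok'>
<table id='t01'>
   <tr><th>Size</th><th>Stok</th><th>Pesanan</th><th>Last Update</th></tr>
   <tr><td>38</td><td>4</td><td>0</td><td></td></tr>
   <tr><td>39</td><td>5</td><td>0</td><td>05 Oct 17, 15:39:53</td></tr>
   <tr><td>40</td><td>11</td><td>0</td><td></td></tr>
   <tr><td>41</td><td>4</td><td>0</td><td>08 Oct 17, 12:24:28</td></tr>
   <tr><td>42</td><td>0</td><td>0</td><td>07 Oct 17, 14:22:07</td></tr>
   <tr><td>43</td><td>6</td><td>0</td><td>04 Oct 17, 15:52:41</td></tr>
</table>
</div>
"""

# ElementTree's form of //table[@id='t01']/tr/td[1].
# The header row has only <th> cells, so td[1] matches data rows only.
SIZE_PATH = ".//table[@id='t01']/tr/td[1]"


def sizes_via_xpath(html: str) -> list[str]:
    root = ET.fromstring(html.strip())
    # Empty cells have text None, so fall back to "" before stripping.
    return [(td.text or "").strip() for td in root.findall(SIZE_PATH)]


if __name__ == "__main__":
    print(",".join(sizes_via_xpath(STOCK_HTML)))
```

If the XPath is evaluated against a browser-parsed DOM (possibly the case in import.io, which is an assumption), the HTML parser inserts an implicit `<tbody>` between `<table>` and `<tr>`, so `//table[@id='t01']//tr/td[1]` is the safer form there. For a full, not necessarily well-formed page, an HTML parser such as `lxml.html` is a better fit than ElementTree.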