JS: Extracting Parts from a String (Paragraph) of Variable Length

【Question title】: JS: Extracting Parts from a String (Paragraph) on variable length
【Posted】: 2015-07-21 03:55:20
【Question】:

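My question is similar to this one, but it is a bit more complicated, and I am too much of a newbie to adapt the method provided there.

I tried the substring method, but it does not work because the length of the string is variable.

I have a string like this: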

Booking:
2 people

User Details:
Firstname Lastname
123456789 
email@domain.com
facebook.com/username

Extras:
Service1
Service2

Pricing:
$1500/-

Comments:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus elementum ultricies pellentesque. Sed ullamcorper orci urna, et sagittis orci rhoncus quis.

Donec laoreet neque lectus, nec congue felis cursus non. Sed ac pulvinar nunc, vel cursus nulla. Curabitur at nisl ipsum. Etiam efficitur quam tortor, id malesuada lacus laoreet ac. Cras varius felis sem, id interdum enim accumsan et.

I need to store the following values as variables:

var people = 2
var name   = firstname + lastname
var phone  = 123456789
var email  = email@domain.com
var fbook  = facebook.com/username
var extras = Service1, Service2
var price  = $1500
var comments = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus elementum ultricies pellentesque. Sed ullamcorper orci urna, et sagittis orci rhoncus quis.
    Donec laoreet neque lectus, nec congue felis cursus non. Sed ac pulvinar nunc, vel cursus nulla. Curabitur at nisl ipsum. Etiam efficitur quam tortor, id malesuada lacus laoreet ac. Cras varius felis sem, id interdum enim accumsan et."

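Keep in mind that some of the values may be missing in some cases, i.e. the user did not enter an email and/or a Facebook URL, so those lines may be absent entirely, without even an empty line in their place.

【Comments】:

Google "regular expressions".

Tags: javascript split substring

【Solution 1】:

If you find regular expressions too complicated (they are hard to get right even if you are not a beginner), you can use simpler JavaScript, like this: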

var input = "Booking:\n2 people\n\nUser Details:\nFirstname Lastname\n123456789\n\nfacebook.com/username\n\nExtras:\nService1\nService2\n\nPricing:\n$1500/-\n\nComments:\nLorem ipsum\n\ndolor sit amet";

// the input string is split into separate lines and stored in array "lines":

var lines = input.split("\n");

// lines[0]="Booking:", lines[1]="2 people", lines[2]="", lines[3]="User Details:" ...

// The lines are split per section, and stored in 2D-array "result":
// With expect=0 we look for sections[0], which is "Booking".
// If the line "Booking:" is found, "expect" is incremented to 1, so that
// we're now looking for sections[1], which is "User Details", and so on...
// If a line is found that is not the expected section title, and it's not empty,
// we add the line to the current section with push().

var sections = ["Booking", "User Details", "Extras", "Pricing", "Comments"];
var expect = 0, result = [];

for (var i = 0; i < lines.length; i++) {
    if (lines[i] == sections[expect] + ":") result[expect++] = []
    else if (result.length && lines[i] != "") result[result.length - 1].push(lines[i]);
}

// result[0][0]="2 people" (first line under "Booking")
// result[1][0]="Firstname Lastname" (first line under "User Details")
// result[1][1]="123456789" (second line under "User Details")
// result[1][2]="facebook.com/username" (third line under "User Details")
// ...
// result[4][0]="Lorem ipsum" (first line under "Comments")
// result[4][1]="dolor sit amet" (third line under "Comments", empty line is skipped)

// If all 5 sections have been found, we extract the variables:

var people, name, phone = "", email = "", fbook = "", extras = "", price, comments = "";

if (result.length == 5)
{

// people = the integer number at the beginning of the 1st line of the 1st section:

    people = parseInt(result[0].shift(), 10);

// name = the 1st line of the 2nd section:

    name = result[1].shift();

// The rest of the 2nd section is searched for the phone number, email and facebook.
// Because some of these lines may be missing, we cannot simply use the 
// 1st line for phone, the 2nd line for email and the 3rd for facebook.

    while (result[1].length) {
        var temp = result[1].shift();
        if (temp.indexOf("facebook.com/") == 0) fbook = temp
        else if (temp.search("@") > -1) email = temp
        else phone = temp;
    }

// All the lines in the 3rd section are added to string "extras".
// If the string is not empty, we put a comma between the parts:

    while (result[2].length) {
        if (extras.length) extras += ", ";
        extras += result[2].shift();
    }

// price = the floating-point number after the leading "$" on the 1st line of the 4th section:

    price = parseFloat(result[3][0].substring(1));

// All the lines in the 5th section are added to string "comments".
// If the string is not empty, we put a newline between the parts:

    while (result[4].length) {
        if (comments.length) comments += "\n";
        comments += result[4].shift();
    }
}

alert("people: " + people + "\nname: " + name + "\nphone: " + phone + "\nemail: " + email + "\nfbook: " + fbook + "\nextras: " + extras + "\nprice: " + price + "\ncomments: " + comments);
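
The user-details step above is the part that copes with missing lines, so it may help to see it in isolation. The following is only a sketch under assumptions: `classifyDetails` is a hypothetical helper name that does not appear in the answer, and it uses `indexOf()` rather than `search()` so that the "." in "facebook.com/" is matched literally instead of being treated as a regex wildcard.

```javascript
// Hypothetical helper (not part of the original answer): sorts the lines that
// follow the name in the "User Details" section into phone, email and
// Facebook URL. Any of the three may be missing.
function classifyDetails(detailLines) {
    var details = { phone: "", email: "", fbook: "" };
    for (var i = 0; i < detailLines.length; i++) {
        var line = detailLines[i];
        // indexOf() searches for the literal text, no regex involved
        if (line.indexOf("facebook.com/") === 0) details.fbook = line;
        else if (line.indexOf("@") > -1) details.email = line;
        else details.phone = line; // anything else is assumed to be the phone number
    }
    return details;
}

var full = classifyDetails(["123456789", "email@domain.com", "facebook.com/username"]);
var partial = classifyDetails(["123456789"]);
console.log(full);
console.log(partial);
```

With only the phone line present, `partial.email` and `partial.fbook` simply stay empty strings, which is the same behaviour as the defaults in the script above.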

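【Comments】:

I followed the example very strictly, but the real input may have extra whitespace or \r\n-style line breaks... The script can be changed to be more lenient about formatting. Can you tell how far the script gets with your input before things go wrong? (You can simply use e.g. alert(lines.length) to check whether the split worked, ...)
I dug deeper into it and compared the line output. It turned out my input differed in a few ways, especially because I had added a price breakdown. Once I made it 100% like your input string, it worked! Now I am trying to study it further to make it work with my updated string. Can you stick around? I need you to explain how some parts of your script work.
I'll add more comments to the script.
I'll write an answer using regular expressions; it is really more practical for this kind of thing.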
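
The more lenient formatting mentioned in the comments above is not shown in the thread. One possible way to get there, assuming the only differences are \r\n line endings and stray spaces or tabs around each line, is to normalise the lines before running the section loop. `splitLines` is a hypothetical helper name, not something from the answer.

```javascript
// Hypothetical helper: splits text into lines, accepting both \n and \r\n
// line endings, and trims whitespace around each line.
function splitLines(text) {
    var rawLines = text.split(/\r?\n/);
    var lines = [];
    for (var i = 0; i < rawLines.length; i++) {
        lines.push(rawLines[i].trim());
    }
    return lines;
}

var lenientLines = splitLines("Booking: \r\n  2 people\r\n\r\nUser Details:\r\nFirstname Lastname");
console.log(lenientLines.length); // 5
console.log(lenientLines);
```

Using the result in place of `input.split("\n")` would let the comparison against `sections[expect] + ":"` succeed even when a title has trailing spaces.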

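【Solution 2】:

This method uses regular expressions. It is very flexible, especially if you are not sure how the input will be formatted, but it can get quite complicated. This version should cope with extra whitespace, missing data, phone numbers in different formats, prices with commas and decimal points, empty lines...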

var input = "Booking:\n2 people\n\nUser Details:\nFirstname Lastname\n+32 (0)9 123.456.789\nme@example.com\nfacebook.com/username\n\nExtras:\nService1\nService2\n\nPricing:\n$1500/-\n\nComments:\nLorem ipsum\n\ndolor sit amet";

var people, name, phone, email, fbook, extras, price, comments, temp;

// split input into 2 parts: data and comments (because the comments could contain any 
// text, including names of sections and other things which may complicate the regex).
var parts = input.match(/^((?:.|\n)*?)\n\s*\n\s*Comments\s*:\s*\n((?:.|\n)*)/i);

if (parts && parts.length > 1)
{
    temp = parts[1].match(/\s*Booking\s*:\s*\n\s*(\d+)\s*(?:person|people)/i);
    if (temp && temp.length == 2) people = temp[1];

    temp = parts[1].match(/\s*User\s*Details\s*:\s*\n\s*(.*?)\n/i);
    if (temp && temp.length == 2) name = temp[1];

    temp = parts[1].match(/\s*User\s*Details\s*:\s*\n(?:.*\n){0,1}\s*([\s\d./()+-]+?)\s*\n/i);
    if (temp && temp.length == 2) phone = temp[1];

    temp = parts[1].match(/\s*User\s*Details\s*:\s*\n(?:.*\n){0,2}\s*(.+?@.+?)\s*\n/i);
    if (temp && temp.length == 2) email = temp[1];

    temp = parts[1].match(/\s*User\s*Details\s*:\s*\n(?:.*\n){0,3}\s*(facebook.com\/.+?)\s*\n/i);
    if (temp && temp.length == 2) fbook = temp[1];

    temp = parts[1].match(/\s*Extras\s*:\s*\n((?:.*\n?)*?)\n\s*Pricing:\s*\n/i);
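    // strip trailing newlines first, then turn every remaining line break into ", "
    // (the g flag is needed, otherwise only the first line break is replaced)
    if (temp && temp.length == 2) extras = temp[1].replace(/\n+$/, "").replace(/\n+/g, ", ");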

    temp = parts[1].match(/\s*Pricing\s*:\s*\n\s*([$\d,.]+)/i);
    if (temp && temp.length == 2) price = temp[1];

    if (parts.length > 2) comments = parts[2];
}

alert("people: " + people + "\nname: " + name + "\nphone: " + phone + "\nemail: " + email + "\nfbook: " + fbook + "\nextras: " + extras + "\nprice: " + price + "\ncomments: " + comments);
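
To see what the flexibility of the patterns above means in practice, the "people" pattern can be tried on a few variations of the booking line. This is a small sketch, not part of the answer: the pattern is copied from the code above, while `extractPeople` and the sample strings are made up for illustration.

```javascript
// Same pattern as the "people" step in the answer above.
var bookingPattern = /\s*Booking\s*:\s*\n\s*(\d+)\s*(?:person|people)/i;

// Hypothetical helper: returns the captured count as a string,
// or undefined when the pattern does not match.
function extractPeople(text) {
    var m = text.match(bookingPattern);
    return m ? m[1] : undefined;
}

var samples = [
    "Booking:\n2 people",                 // the format from the question
    "booking :\n   1 person",             // lower case, extra spaces, singular
    "BOOKING:\r\n12 People",              // upper case, \r\n line ending
    "User Details:\nFirstname Lastname"   // no booking section at all
];
var counts = samples.map(extractPeople);
console.log(counts);
```

The first three samples yield "2", "1" and "12", and the last one yields undefined. Note that the captured values are strings; pass them through `parseInt(value, 10)` if a number is needed.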

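【Comments】:

The \s* is there everywhere you see it in case there is extra whitespace in the input.
Here is a decent overview of regular expressions in JavaScript: developer.mozilla.org/en/docs/Web/JavaScript/Guide/…
Hi! Well, thanks to your extensive explanation of the (previous) code, I have changed it to work with my updated string. I now understand 90% of it, except the initial for loop { for (var i in lines) } that breaks the sections up into the 2D array, and the other while loop where we put commas into extras { if (extras.length) extras += ","; }, because extras.length is basically the total number of letters in extras, so how does that work? Thanks again for your time and effort! Please also advise whether I should go with the regex solution or stick with the first one and see which I am comfortable with.
I would use the regex version because it is more flexible; see how the booking line can be "2 people" or "1 person", "User Details" can be "user details" or "UserDetails", and the phone, email and Facebook lines can be missing without causing problems. In the long run, you will be glad if you learn regular expressions. But if you are in a hurry to finish a project, stick with the first version for now.
Regular expressions have one drawback: they can be slower than other JavaScript functions, so if you are running thousands of regexes, it may take a few seconds.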
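
On the question about `if (extras.length) extras += ", "` raised in the comments above: `extras.length` is indeed the number of characters collected so far, and the `if` only checks whether that number is non-zero. It is zero before the first item is added and non-zero afterwards, so a separator is inserted before every item except the first. A minimal sketch, where `joinWithLoop` is a hypothetical name used only for illustration:

```javascript
// Approach from the first solution: prepend ", " whenever the string built so
// far is non-empty, i.e. for every item except the first one.
function joinWithLoop(items) {
    var queue = items.slice(); // copy, because shift() empties the array
    var out = "";
    while (queue.length) {
        if (out.length) out += ", ";
        out += queue.shift();
    }
    return out;
}

var services = ["Service1", "Service2", "Service3"];
var looped = joinWithLoop(services);
var joined = services.join(", ");
console.log(looped); // Service1, Service2, Service3
console.log(joined); // Service1, Service2, Service3
```

The built-in `join(", ")` gives the same result for this input, so it could replace the while loop if the array does not need to be emptied.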