Using string

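The C++ standard library provides string, a class for creating strings that supports high-level operations such as assignment and concatenation. C++ recommends string for representing strings. To use it, include the string header first: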

#include <string>

A string instance can be created in any of the following ways:

string str1;                 // empty string
string str2("caterpillar");  // initialized from a string literal
string str3(str2);           // created from the str2 instance
string str4 = "Justin";      // initialized from a string literal

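The first form creates an empty string of length 0. The second creates a string instance from the content of a string literal. The third copies str2 into a new string instance. The fourth also creates a string instance from a string literal.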

Use size() or length() to get the length of a string, empty() to test whether it is empty, and == to compare whether two strings have the same content, for example:

#include <iostream> 
#include <string> 
using namespace std; 

int main() { 
    string str1; 
    string str2 = "caterpillar"; 
    string str3(str2); 

    cout << "Is str1 empty: " << str1.empty() << endl;
    cout << "str1 length: " << str1.size() << endl;
    cout << "str2 length: " << str2.size() << endl;
    cout << "str3 length: " << str3.size() << endl;

    cout << "Do str1 and str2 have the same content: " << (str1 == str2) << endl;
    cout << "Do str2 and str3 have the same content: " << (str2 == str3) << endl;

    return 0; 
}

Output:

Is str1 empty: 1
str1 length: 0
str2 length: 11
str3 length: 11
Do str1 and str2 have the same content: 0
Do str2 and str3 have the same content: 1

A string can be assigned to another string, for example:

string str1 = "text1";
string str2 = "text2";
....
str1 = str2;

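After this assignment, str1 holds its own copy of str2's content: the characters of str2 are copied into str1, and str1's storage is reallocated only if its current capacity cannot hold them. A C-style string can also be assigned to a string, for example: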

string name = "caterpillar";
char str[] = "Justin";
name = str;

However, a string instance cannot be assigned to a character array, for example:

char str[] = "Justin";
string name = "caterpillar";
str = name; // error

The + operator concatenates strings, for example:

str1 = str1 + str2;
str1 = str1 + "\n";

A string instance supports [] with an index to access the char at the corresponding position, just like a character array, for example:

#include <iostream> 
#include <string> 
using namespace std; 

int main() { 
    string name = "caterpillar";

    for(size_t i = 0; i < name.length(); i++) {
        cout << name[i] << endl;
    }

    return 0; 
}

For sequential traversal, the range-based for syntax can be used:

#include <iostream> 
#include <string> 
using namespace std; 

int main() { 
    string name = "caterpillar";

    for(auto ch : name) {
        cout << ch << endl;
    }

    return 0; 
}

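Now it is time to face a problem. For "caterpillar", the examples above do display each letter one by one, but what about Chinese? This calls for recalling the content of "Character Arrays and Strings" (〈字元陣列與字串〉). For example, what does the following display?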

string name = "良葛格";
cout << name.length() << endl;

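Similarly, the answer depends on the encoding of your source file and the options passed to the compiler. If the code is written like this: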

string name = u8"良葛格";
cout << name.length() << endl;

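it displays 9, because each of these Chinese characters takes three bytes in UTF-8. In other words, to C++ this is a string made up of multibyte characters, and each Chinese character here uses three char. (From C++20 on, a u8 literal has type const char8_t[] and can no longer initialize a string directly, so this form assumes an earlier standard.) To display the Chinese correctly, your text mode must be UTF-8, and the program can be written as follows: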

#include <iostream> 
#include <string> 
using namespace std; 

int main() { 
    string name = u8"良葛格";
    for(size_t i = 0; i < name.length(); i += 3) {
        cout << name.substr(i, 3) << endl;
    }

    return 0; 
}

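The substr method of string takes an index and a length in char and returns the substring. Because the encoding is UTF-8, three char are taken each time and the index advances by 3.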

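This is certainly cumbersome, and it becomes even more so when a UTF-8 string mixes Chinese with English letters and digits. "Character Arrays and Strings" discussed wchar_t, char16_t, char32_t and char8_t; the corresponding types in the string header are wstring, u16string, u32string and u8string (the last one since C++20).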

With the toUTF8 function from "Character Arrays and Strings", cout can display the Chinese directly. Taking wstring as an example:

#include <iostream>
#include <string>

using namespace std;
string toUTF8(int cp);

int main(int argc, char *argv[]) {
    // Displays the Chinese correctly in a UTF-8 terminal
    wstring name = L"良葛格（Justin）";
    for(size_t i = 0; i < name.length(); i++) {
        cout <<  toUTF8(name[i]) << endl;
    }

    return 0;
}

string toUTF8(int cp) {
    char ch[5] = {0x00};
    if(cp <= 0x7F) { 
        ch[0] = cp; 
    }
    else if(cp <= 0x7FF) { 
        ch[0] = (cp >> 6) + 192; 
        ch[1] = (cp & 63) + 128; 
    }
    else if(0xd800 <= cp && cp <= 0xdfff) {} // surrogate range: not valid code points
    else if(cp <= 0xFFFF) { 
        ch[0] = (cp >> 12) + 224; 
        ch[1]= ((cp >> 6) & 63) + 128; 
        ch[2]= (cp & 63) + 128; 
    }
    else if(cp <= 0x10FFFF) { 
        ch[0] = (cp >> 18) + 240; 
        ch[1] = ((cp >> 12) & 63) + 128; 
        ch[2] = ((cp >> 6) & 63) + 128; 
        ch[3]= (cp & 63) + 128; 
    }
    return string(ch);
}

C++11 provides wstring_convert for converting between string and wstring (note that it has been deprecated since C++17). For example, to convert a wstring into a UTF-8 encoded string:

#include <locale>
#include <codecvt>
#include <string>
#include <iostream>
using namespace std;

int main() {
    wstring_convert<codecvt_utf8<wchar_t>> utf8;
    wstring ws = L"良葛格";
    string s = utf8.to_bytes(ws);
    cout << s << endl; // displays the Chinese in a UTF-8 terminal
}

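If the goal is only to display a wstring on standard output, wcout can be used, but the correct locale must be set for the characters to display correctly. Many documents say it can be used like this: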

#include <iostream>
#include <locale>
using namespace std;

int main( void ) {
    locale loc("cht");
    wcout.imbue(loc);

    wcout << L"良葛格" << endl;

    return 0;
}

However, after compiling this with MinGW-w64 (GNU compiler version 8.1.0), I got the following problem at run time:

terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid

This seems to be a MinGW-w64 issue on Windows. I found the following way to work around it:

#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    locale utf8(std::locale(), new codecvt_utf8<wchar_t>);
    wcout.imbue(utf8);

    wcout << L"良葛格" << endl; // displays the Chinese in a UTF-8 terminal

    return 0;
}