FeedC# TutorialArchiveAbout

C# String

C# String

A string in C# is an object of type String whose value is text. The string object contains an array of Char objects internally. A string has a Length property that shows the number of Char objects it contains.

String is a reference type that looks like value type (for example, the equality operators == and != are overloaded to compare on value, not on reference).

In C#, you can refer to a string both as String and string. You can use whichever naming convention suits you. The string keyword is just an alias for the .NET Framework’s String.

Contents

String Interning

In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable.

Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned.

The distinct values are stored in a string intern pool.

String interning is supported by some modern programming languages, including .NET languages.

var s1 = "hello";
var s2 = "hello";

When creating two identical string literals in one compilation unit, the compiler ensures that only one string object is created by the CLR. This is string interning, which is done only at compile time. Like in the above example.

Doing it at runtime would incur too much of a performance penalty. Searching through all strings every time you create a new one is too costly. Below are some example where string interning does not work and creates new object in the heap.

var s3 = new String(new char []{ 'h', 'e', 'l', 'l', 'o'}); 
//string interning will not happen here

Note: The string object contains an array of Char objects internally. A string has a Length property that shows the number of Char objects it contains. You can also use a foreach loop or indexer with a string and access its individual Char objects.

var s4 = String.Copy(s1); //string interning will not happen here

The static String.Copy(string ...) method creates a new instance of a string by copying an existing instance.

Let’s run the program and see the result,

var s1 = "hello";
var s2 = "hello";
var s3 = new String(new char []{ 'h', 'e', 'l', 'l', 'o'});
var s4 = String.Copy(s1);

Console.WriteLine("s1 = {0}", s1);
Console.WriteLine("s2 = {0}", s2);
Console.WriteLine("s3 = {0}", s3);
Console.WriteLine("s4 = {0}", s4);

Console.WriteLine(ReferenceEquals(s1,s2));  //True
Console.WriteLine(ReferenceEquals(s1, s3)); //False     
Console.WriteLine(ReferenceEquals(s1, s4)); //False

Here, ReferenceEquals method determines whether the specified System.Object instances (String in our case) are the same instance.

Thus, ReferenceEquals(s1, s3) and ReferenceEquals(s1, s4) will be false in our case because, s1, s3, and s4 are three different objects in the heap.

However, ReferenceEquals(s1,s2) will be true because, s1 and s2 have reference to same object.

Now, the below condition will be true in our case because, String is a reference type that looks like value type. Here, the equality operators == and !=, for String type, are overloaded to compare on value and not on reference.

Console.WriteLine(s1 == s2); //True
Console.WriteLine(s1 == s3); //True
Console.WriteLine(s1 == s4); //True

String Immutability

One of the special characteristics of a string is that it is immutable, so it cannot be changed after it has been created. Every change to a string will create a new string. This is why all of the String manipulation methods return a string.

Immutability is useful in many scenarios. Reasoning about a structure if you know it will never change is easier. It cannot be modified so it is inherently thread-safe.

It is more secure because no one can mess with it. Suddenly something like creating undo-redo is much easier, your data structure is immutable and you maintain only snapshots of your state.

But immutable data structures also have a negative side. Let’s see how,

string s = string.Empty;
for (int i = 0; i < 10000; i++)
{
    s += "x";
}
Console.WriteLine(s);

The above looks innocent, but it will create a new string for each iteration in your loop. It uses a lot of unnecessary memory and shows why you have to use caution when working with strings.

This code will run 10,000 times, and each time it will create a new string. The reference s will point only to the last item, so all other strings are immediately ready for garbage collection.

When working with such a large number of string operations, you have to keep in mind that string is immutable and that the .NET Framework offers some special helper classes when dealing with strings. Let’s see one such class

var sb = new StringBuilder();

StringBuilder

The StringBuilder class can be used when you are working with strings in a tight loop. Instead of creating a new string over and over again, you can use the StringBuilder, which uses a string buffer internally to improve performance.

The StringBuilder class even enables you to change the value of individual characters inside a string.

Our previous example of concatenating a string 10,000 times can be rewritten with a StringBuilder,

var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
{
    sb.Append("x");
}
Console.WriteLine(sb.ToString());

One thing to keep in mind is that the StringBuilder does not always give better performance.

Note: When concatenating a fixed series of strings, the compiler can optimize this and combine individual concatenation operations into a single operation. When you are working with an arbitrary number of strings, such as in the loop example, a StringBuilder is a better choice (in this example, you could have also used new String('x', 10000) to create the string; when dealing with more varied data, this won’t be possible).

Console.WriteLine(new String('x', 10000));

Format Strings

The String.Format method allows a wide range of formatting options for string data. The first parameter of this method can be passed a string that may look similar to the following:

var i = 5;
var s = "The value of i is {0,5:G}";
Console.WriteLine(String.Format(s, i)); //Displays The value of i is     5
Console.WriteLine(s, i); //Equivalent to above code

Here, ‘The value of i is ‘ will be displayed as is, with no changes. The interesting part of this string is the section enclosed in braces. This section has the following form:

{index, alignment:formatString}

The section can contain the following three parts:

  • index: A number identifying the zero-based position of the section’s data in the args parameter array. The data is to be formatted accordingly and substituted for this section. This number is required.

  • alignment: The number of spaces to insert before or after this data. A negative number indicates left justification (spaces are added to the right of the data), and a positive number indicates right justification (spaces are added to the left of the data). This number is optional.

  • formatString: A string indicating the type of formatting to perform on this data. This section is where most of the formatting information usually resides. Tables Table 2-2 and Table 2-3 contain valid formatting codes that can be used here. This part is optional.

In our case, {0,5:G}, 0 is our index, 5 is the number of spaces to be added before value and G represents the standard numeric format which displays the value in its shortest form.

You can get a complete list of the standard format specifiers here.

String Interpolation

String interpolation available in C# 6.0 and later, interpolated strings are identified by the $ special character and include interpolated expressions in braces.

String interpolation achieves the same results as the String.Format method, but improves ease of use and inline clarity.

Use string interpolation to improve the readability and maintainability of your code.

Console.WriteLine($"The value of i is {i,5:G}"); //Equivalent to above code

Instead of passing i as second parameter we can directly use variable i within the string (in place of index)

ToString

In addition to the String.Format and the Console.WriteLine methods, the overloaded ToString instance method of a value type may also use the previous formatting characters.

Using ToString, the code would look like this:

var f = 3.1417f;
string value = f.ToString("Value of pi is 00.####");
Console.WriteLine(value); //Displays Value of pi is 03.1417

Here the overloaded ToString method accepts a single parameter of type IFormatProvider. The IFormatProvider provided for the f.ToString method is a string containing the formatting for the value type plus any extra text that needs to be supplied.

Overriding ToString

For user defined types we can override the ToString method. System.Object has a virtual ToString method. Object class supports all classes in the .NET class hierarchy and provides low-level services to derived classes. System.Object is the ultimate base class of all .NET classes; it is the root of the type hierarchy.

Overriding Object’s ToString method is a good practice. If you don’t do this, ToString will return by default the name of your type. When you override ToString, you can give it a more meaningful value, as below code shows.

public class Person
{
    public Person(string firstName, string lastName)
    {
        this.FirstName = firstName;
        this.LastName = lastName;
    }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public override string ToString()
    {
        return FirstName + " " + LastName;
    }
}
var p = new Person("John", "Doe");
Console.WriteLine(p); //Displays John Doe

You can also implement this custom formatting on your own types. You do this by creating a ToString(string) method on your type. When doing this, make sure that you are compliant with the standard format strings in the .NET Framework.

For example, a format string of G should represent a common format for your object (the same as calling ToString()) and a null value for the format string should also display the common format.

Here is an example,

public class Person
{
    ...
    ...

    public string ToString(string format)
        {
            if (string.IsNullOrWhiteSpace(format) || format == "G")
            {
                format = "FL";
            }

            format = format.Trim().ToUpperInvariant();
            switch (format)
            {
                case "FL":
                    return FirstName + " " + LastName;
                case "LF":
                    return LastName + " " + FirstName;
                case "FSL":
                    return FirstName + ", " + LastName;
                case "LSF":
                    return LastName + ", " + FirstName;
                default:
                    throw new FormatException(String.Format(
                    "The ‘{0}' format string is not supported.", format));
            }
        }
}
Console.WriteLine(p.ToString("G")); //Displays John Doe
Console.WriteLine(p.ToString("LSF")); //Displays Doe, John

Escape Sequences and Verbatim Strings

When declaring a string variable, certain characters can’t, for various reasons, be included in the usual way. C# supports two different solutions to this problem.

The first approach is to use ‘escape sequences’. For example, suppose that we want to set variable a to the value:

"Hello World
How are you"

We could declare this using the following command, which contains escape sequences for the quotation marks and the line break.

string a = "\"Hello World\nHow are you\"";

The following table gives a list of the escape sequences for the characters that can be escaped in this way:

Character Escape Sequence
' \'
" \"
\ \
Alert \a
Backspace \b
Form feed \f
New Line \n
Carriage Return \r
Horizontal Tab \t
Vertical Tab \v
A unicode character specified by its number e.g. \u200 \u
A unicode character specified by its hexidecimal code e.g. \xc8 \x
null \0 (zero)

The second approach is to use ‘verbatim string’ literals. These are defined by enclosing the required string in the characters @” and “. To illustrate this, to set the variable ‘path’ to the following value:

C:\My Documents\

we could either escape the back-slash characters

string path = "C:\\My Documents\\"

or use a verbatim string thus:

string path = @"C:\MyDocuments\"

Usefully, strings written using the verbatim string syntax can span multiple lines, and whitespace is preserved. The only character that needs escaping is the double-quote character, the escape sequence for which is two double-quotes together. For instance, suppose that you want to set the variable ‘text’ to the following value:

the word "big" contains three letters.

Using the verbatim string syntax, the command would look like this:

string text = @"the word ""big"" contains three letters."

Searching For Strings

When working with strings, you often look for a substring inside another string (to parse some content or to check for valid user input or some other scenario).

The String class offers a couple of methods that can help you perform all kinds of search actions. The most common are IndexOf, LastIndexOf, StartsWith, EndsWith, and SubString.

IndexOf returns the index of the first occurrence of a character or substring within a string. If the value cannot be found, it returns -1. The same is true with LastIndexOf, except this method begins searching at the end of a string and moves to the beginning.

string value = "My Sample Value";
int indexOfp = value.IndexOf('p'); // returns 6
int lastIndexOfm = value.LastIndexOf('m'); // returns 5

StartsWith and EndsWith see whether a string starts or ends with a certain value, respectively. It returns true or false depending on the result.

string value = "< mycustominput >";
Console.WriteLine(value.StartsWith("<"));
Console.WriteLine(value.EndsWith(">"));

Substring can be used to retrieve a partial string from another string. You can pass a start and a length to Substring. If necessary, you can calculate these indexes by using IndexOf or LastIndexOf.

string value = "My Sample Value";
string subString = value.Substring(3, 6); // Returns 'Sample'

Use the Replace method to replace all occurrences of a specified substring with a new string. Like the Substring method, Replace actually returns a new string and does not modify the original string.

string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels",
                    "Abraham Adams", "Ms. Nicole Norris" };
foreach (string name in names)
{
    Console.WriteLine(name.Replace("Mr. ", String.Empty)
        .Replace("Ms. ", String.Empty));
}

/*
Output:
Henry Hunt
Sara Samuels
Abraham Adams
Nicole Norris
*/

That’s all for now. Thanks for reading the post. Please let me know if there is any mistake or any modification needed in the comment below. Thanks in advance!


comments powered by Disqus