Title: Use regular expressions and LINQ to list the unique words contained in a text file in Visual Basic .NET
Description: This example shows how to use regular expressions and LINQ to list the unique words contained in a text file in Visual Basic .NET.
Keywords: files, regular expressions, LINQ, strings, replace, unique words, words, count words, Visual Basic .NET, VB.NET, example, example program, Windows Forms programming
Categories: Syntax, Files and Directories, Strings
 

When you enter the name of a file and click List Words, the following code executes.

 
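' These Imports statements belong at the top of the form's code file.
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

' List the words in the file.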
Private Sub btnGo_Click(ByVal sender As System.Object, ByVal _
    e As System.EventArgs) Handles btnGo.Click
    ' Get the file's text.
    Dim txt As String = File.ReadAllText(txtFile.Text)

    ' Use regular expressions to replace characters
    ' that are not letters or numbers with spaces.
    Dim reg_exp As New Regex("[^a-zA-Z0-9]")
    txt = reg_exp.Replace(txt, " ")

    ' Split the text into words.
    Dim words() As String = txt.Split( _
        New Char() {" "c}, _
        StringSplitOptions.RemoveEmptyEntries)

    ' Use LINQ to get the unique words in sorted order.
    ' Distinct runs first so the final sequence is the sorted one.
    Dim word_query = _
        From word As String In words.Distinct() _
        Order By word _
        Select word

    ' Display the result.
    lstWords.DataSource = word_query.ToArray()
    lblSummary.Text = lstWords.Items.Count & " words"
End Sub
 
The code first uses File.ReadAllText to copy the file's text into a string.

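Next the code uses a regular expression to replace every character that is not a letter or digit with a space. It uses the pattern [^a-zA-Z0-9]. The brackets define a character class, and a ^ at the start of the class negates it, so the pattern matches any single character that is not in the class. The a-zA-Z0-9 part lists the lowercase letters, uppercase letters, and digits. The code uses the Regex object's Replace method to replace each matching character with a space. Because the class covers only ASCII letters and digits, punctuation inside a word (such as the apostrophe in "It's") and accented letters are also replaced, which splits those words into pieces.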
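The replacement step can be tried on its own with a short console program. This is a minimal sketch, not part of the original example: the module name RegexReplaceSketch, the function name ReplaceNonAlphanumerics, and the sample sentence are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexReplaceSketch
    ' Hypothetical helper: replace every character that is not an
    ' ASCII letter or digit with a single space.
    Function ReplaceNonAlphanumerics(ByVal txt As String) As String
        Dim reg_exp As New Regex("[^a-zA-Z0-9]")
        Return reg_exp.Replace(txt, " ")
    End Function

    Sub Main()
        Dim result As String = _
            ReplaceNonAlphanumerics("Hello, world! It's 2010.")

        ' The brackets make the trailing space visible.
        ' Prints: [Hello  world  It s 2010 ]
        Console.WriteLine("[" & result & "]")
    End Sub
End Module
```

Each punctuation mark becomes its own space, so ", " turns into two spaces and the apostrophe splits "It's" into "It" and "s".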

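The code then uses Split to break the text into an array of words. The StringSplitOptions.RemoveEmptyEntries option discards the empty strings that Split would otherwise return wherever two or more spaces appear in a row.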
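The effect of RemoveEmptyEntries is easiest to see by splitting the same string both ways. The following is a small sketch under assumed names (SplitSketch, SplitWords) with a made-up sample string that contains runs of spaces.

```vbnet
Imports System

Module SplitSketch
    ' Hypothetical helper: split on spaces and drop empty entries.
    Function SplitWords(ByVal txt As String) As String()
        Return txt.Split( _
            New Char() {" "c}, _
            StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        Dim sample As String = "Hello  world  It s 2010 "

        ' Without the option, each extra space yields an empty string.
        Dim with_empties() As String = sample.Split( _
            New Char() {" "c}, StringSplitOptions.None)
        Console.WriteLine(with_empties.Length)   ' Prints: 8

        ' With the option, only the five words remain.
        Dim words() As String = SplitWords(sample)
        Console.WriteLine(words.Length)          ' Prints: 5
    End Sub
End Module
```

Note that the option removes empty strings only. Repeated words are still present in the array at this point.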

The code uses LINQ to sort the words and the Distinct method to remove duplicates. Distinct compares strings with the default equality comparer, which is case-sensitive, so "The" and "the" are listed as two different words.
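The sketch below shows the duplicate-removal step in isolation, along with one possible case-insensitive variant. It is an illustration only: the names DistinctSketch, UniqueWords, and UniqueWordsIgnoreCase are hypothetical, and the case-insensitive version is an assumption about what a reader might want, not something the original example does.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DistinctSketch
    ' Hypothetical helper: unique words, sorted. Matching is
    ' case-sensitive, as in the example's use of Distinct.
    Function UniqueWords(ByVal words() As String) As String()
        Dim word_query As IEnumerable(Of String) = _
            From word As String In words.Distinct() _
            Order By word _
            Select word
        Return word_query.ToArray()
    End Function

    ' Hypothetical variant: treats "The" and "the" as the same word
    ' by passing a case-insensitive comparer to Distinct.
    Function UniqueWordsIgnoreCase(ByVal words() As String) As String()
        Return words.Distinct(StringComparer.OrdinalIgnoreCase) _
                    .OrderBy(Function(w As String) w, _
                             StringComparer.OrdinalIgnoreCase) _
                    .ToArray()
    End Function

    Sub Main()
        Dim sample() As String = {"the", "cat", "The", "hat", "the"}

        ' "the" and "The" are counted separately.
        Console.WriteLine(UniqueWords(sample).Length)            ' Prints: 4

        ' "the" and "The" collapse into one entry.
        Console.WriteLine(UniqueWordsIgnoreCase(sample).Length)  ' Prints: 3
    End Sub
End Module
```

An alternative with a similar effect would be to convert the text to lowercase before splitting it, at the cost of losing the original capitalization in the displayed list.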

Finally, the code displays the unique words in a ListBox and shows how many there are in a Label.

 
 
 
 
Copyright © 1997-2010 Rocky Mountain Computer Consulting, Inc.   All rights reserved.