Title | Grab images from a Web page in Visual Basic .NET |
Description | This example shows how to grab images from a Web page in Visual Basic .NET. It uses a WebBrowser control to go to a Web page. It reads that control's Document property to get an HtmlDocument. That object's Images property returns information about the page's images. Finally the program uses a WebClient to download the images. The program also provides some other handy features, such as the ability to view the images and select those that should be saved into files. |
Keywords | grab images, Web, HTML, Visual Basic .NET, VB.NET, WebBrowser, WebClient, download, download images, screen scraping, HtmlDocument |
Categories | VB.NET, Utilities, Internet |
|
|
This description only touches on the most interesting parts of the program. Download it to see the details.
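You can click links in the WebBrowser to navigate to a Web page, or enter a URL and click the Go button to navigate there. The following code shows how the program navigates. Note that Navigate returns before the page finishes loading, so the Try Catch block catches only exceptions thrown when the navigation starts, not failures that occur while the page loads.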
|
|
' Navigate to the entered URL.
Private Sub btnGo_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnGo.Click
    Try
        wbrWebSite.Navigate(txtUrl.Text)
    Catch ex As Exception
        MessageBox.Show("Error navigating to web site " & _
            txtUrl.Text & vbCrLf & ex.Message, _
            "Navigation Error", _
            MessageBoxButtons.OK, _
            MessageBoxIcon.Error)
    End Try
End Sub
|
|
After you have navigated to the desired Web page, click the List Images button to execute the following code. The program first removes all controls from the flpPictures FlowLayoutPanel by setting their Parent properties to Nothing. That removes the panel's references to those controls so they can be reclaimed when garbage collection runs.
Next the code gets the WebBrowser's Document property, which returns an HtmlDocument object representing the Web page, and loops through the HtmlDocument's Images collection. For each image, it casts the element's DomElement property to an mshtml.HTMLImg object (which requires a reference to the Microsoft.mshtml assembly) and reads that object's src property, which contains the image's URL.
The code makes a new PictureBox, calls subroutine GetPicture to download the image into the PictureBox, and places the PictureBox in the flpPictures FlowLayoutPanel. That control automatically arranges its children in rows, wrapping when necessary, and displaying scroll bars if the pictures don't all fit. Notice that the code saves the image's URL in the PictureBox's Tag property.
Finally the code registers the pic_Click event handler to catch the PictureBox's Click events.
This routine also contains code to let you see new PictureBoxes as they are created and to stop the loop before it finishes. See the code for details.
|
|
' Show the images from the URL.
Private m_Running As Boolean = False
Private Sub btnListImages_Click(ByVal sender As _
    System.Object, ByVal e As System.EventArgs) Handles _
    btnListImages.Click
    If btnListImages.Text = "List Images" Then
        Me.Cursor = Cursors.WaitCursor
        btnListImages.Text = "Stop"
        btnGo.Enabled = False
        btnSaveImages.Enabled = False
        Application.DoEvents()

        ' Remove old images.
        For i As Integer = flpPictures.Controls.Count - 1 _
            To 0 Step -1
            flpPictures.Controls(i).Parent = Nothing
        Next i

        ' List the images on this page.
        Dim doc As System.Windows.Forms.HtmlDocument = _
            wbrWebSite.Document
        m_Running = True
        For Each element As HtmlElement In doc.Images
            Dim dom_element As mshtml.HTMLImg = _
                element.DomElement
            Dim src As String = dom_element.src

            Dim pic As New PictureBox()
            pic.BorderStyle = BorderStyle.Fixed3D
            pic.SizeMode = PictureBoxSizeMode.AutoSize
            pic.Image = GetPicture(src)
            pic.Parent = flpPictures
            pic.Tag = src
            tipFileName.SetToolTip(pic, src)
            AddHandler pic.Click, AddressOf pic_Click

            Application.DoEvents()
            If Not m_Running Then Exit For
        Next element
        m_Running = False

        btnListImages.Text = "List Images"
        btnGo.Enabled = True
        btnSaveImages.Enabled = True
        Me.Cursor = Cursors.Default
    Else
        m_Running = False
    End If
End Sub
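The listing above reads each image's src through the mshtml interop types. As a rough sketch of an alternative that avoids that reference, the loop could use the HtmlElement class's own GetAttribute method instead. The module name ImageListSketch and the helper GetImageUrls are hypothetical names used only for illustration, and the value returned for "src" may not be formatted exactly as the mshtml property returns it, so treat this as something to try rather than a drop-in replacement.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module ImageListSketch
    ' Hypothetical helper: collect the src attribute of every image
    ' in a loaded document without referencing Microsoft.mshtml.
    Function GetImageUrls(ByVal doc As HtmlDocument) As List(Of String)
        Dim urls As New List(Of String)()

        ' The WebBrowser's Document can be Nothing if no page is loaded.
        If doc Is Nothing Then Return urls

        For Each element As HtmlElement In doc.Images
            Dim src As String = element.GetAttribute("src")
            If Not String.IsNullOrEmpty(src) Then urls.Add(src)
        Next element

        Return urls
    End Function
End Module
```

The form could then call GetImageUrls(wbrWebSite.Document) and create one PictureBox per returned URL.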
|
|
The GetPicture function uses a WebClient to download a picture. It calls the WebClient's DownloadData method to fetch the image as a byte array and wraps that array in a MemoryStream. It then uses the Image class's FromStream method to convert the stream into an image. If the download fails, the function displays an error message and returns Nothing.
|
|
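' Get the picture at a given URL.
' Requires Imports System.IO and Imports System.Net.
Private Function GetPicture(ByVal url As String) As Image
    Try
        ' Add a scheme only if the URL doesn't already have one.
        url = Trim(url)
        Dim lower_url As String = url.ToLower()
        If Not (lower_url.StartsWith("http://") OrElse _
            lower_url.StartsWith("https://")) Then
            url = "http://" & url
        End If

        ' Download the image's bytes and wrap them in a stream.
        ' The stream is left open because the Image needs it
        ' for as long as the Image is in use.
        Using web_client As New WebClient()
            Dim image_stream As New _
                MemoryStream(web_client.DownloadData(url))
            Return Image.FromStream(image_stream)
        End Using
    Catch ex As Exception
        MessageBox.Show("Error downloading picture " & _
            url & vbCrLf & ex.Message, _
            "Download Error", _
            MessageBoxButtons.OK, _
            MessageBoxIcon.Error)
    End Try
    Return Nothing
End Function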
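GetPicture treats any URL without a scheme as if it only needed "http://" in front. If an image's src ever came back as a path relative to the page, a more general approach would be to resolve it against the page's own URL with the Uri class. The following is a minimal sketch of that idea; the module name UrlSketch, the helper ResolveImageUrl, and the example.com addresses are made up for illustration and are not part of the original program.

```vbnet
Imports System

Module UrlSketch
    ' Hypothetical helper: combine the page's URL with an image's src.
    ' If src is already absolute, the Uri constructor returns it unchanged.
    Function ResolveImageUrl(ByVal page_url As String, _
        ByVal src As String) As String
        Dim base_uri As New Uri(page_url)
        Dim full_uri As New Uri(base_uri, src.Trim())
        Return full_uri.AbsoluteUri
    End Function

    Sub Main()
        ' A relative src is resolved against the page's folder.
        Console.WriteLine(ResolveImageUrl( _
            "http://www.example.com/gallery/index.html", _
            "images/cat.png"))

        ' An absolute src keeps its own scheme and host.
        Console.WriteLine(ResolveImageUrl( _
            "http://www.example.com/gallery/index.html", _
            "https://cdn.example.com/dog.jpg"))
    End Sub
End Module
```

In the program, the page URL could come from the HtmlDocument's Url property.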
|
|
After you display the images, click on any that you don't want to download. When you click on a PictureBox, the following code sets that control's Parent property to Nothing. That removes it from the FlowLayoutPanel, which automatically rearranges its remaining children.
|
|
' Remove the clicked PictureBox.
Private Sub pic_Click(ByVal sender As System.Object, ByVal _
    e As System.EventArgs)
    Dim pic As PictureBox = DirectCast(sender, PictureBox)
    pic.Parent = Nothing
End Sub
|
|
When you click the Save Images button, the following code loops through the PictureBoxes that remain in the FlowLayoutPanel. It gets each image's URL from the PictureBox's Tag property, takes the text after the last slash as the file name, and saves the control's image in that file using the format that matches the file's extension.
|
|
' Save the images that have not been removed.
Private Sub btnSaveImages_Click(ByVal sender As _
    System.Object, ByVal e As System.EventArgs) Handles _
    btnSaveImages.Click
    Dim dir_name As String = txtDirectory.Text
    If Not dir_name.EndsWith("\") Then dir_name &= "\"

    For Each pic As PictureBox In flpPictures.Controls
        ' Skip pictures that failed to download.
        If pic.Image Is Nothing Then Continue For

        ' The Tag property holds the image's URL.
        ' The file name is the part after the last slash.
        Dim bm As Image = pic.Image
        Dim filename As String = CStr(pic.Tag)
        filename = _
            filename.Substring(filename.LastIndexOf("/") + 1)

        ' Get the extension in lowercase so .JPG matches .jpg.
        ' If there is no period, ext stays blank and falls
        ' through to Case Else.
        Dim ext As String = ""
        Dim dot_pos As Integer = filename.LastIndexOf(".")
        If dot_pos >= 0 Then
            ext = filename.Substring(dot_pos).ToLower()
        End If

        Dim full_name As String = dir_name & filename
        Select Case ext
            Case ".bmp"
                bm.Save(full_name, Imaging.ImageFormat.Bmp)
            Case ".gif"
                bm.Save(full_name, Imaging.ImageFormat.Gif)
            Case ".jpg", ".jpeg"
                bm.Save(full_name, Imaging.ImageFormat.Jpeg)
            Case ".png"
                bm.Save(full_name, Imaging.ImageFormat.Png)
            Case ".tif", ".tiff"
                bm.Save(full_name, Imaging.ImageFormat.Tiff)
            Case Else
                MessageBox.Show( _
                    "Unknown file type " & ext & _
                    " in file " & filename, _
                    "Unknown File Type", _
                    MessageBoxButtons.OK, _
                    MessageBoxIcon.Error)
        End Select
    Next pic
    Beep()
End Sub
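Taking the text after the last slash works for simple URLs, but it would include any query string that follows the file name. One possible way around that, sketched here under the assumption that the Tag always holds an absolute URL, is to let the Uri and Path classes pick the name apart. The module name FileNameSketch, the helper FileNameFromUrl, and the example URL are hypothetical.

```vbnet
Imports System
Imports System.IO

Module FileNameSketch
    ' Hypothetical helper: return the file name part of an image URL,
    ' ignoring any query string or fragment that follows it.
    Function FileNameFromUrl(ByVal url As String) As String
        Dim image_uri As New Uri(url)
        Return Path.GetFileName(image_uri.LocalPath)
    End Function

    Sub Main()
        Dim file_name As String = FileNameFromUrl( _
            "http://www.example.com/pics/Photo.JPG?size=large")

        ' Prints Photo.JPG
        Console.WriteLine(file_name)

        ' Prints .jpg, which is ready to compare in a Select Case.
        Console.WriteLine(Path.GetExtension(file_name).ToLowerInvariant())
    End Sub
End Module
```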
|
|
This program still has a few weak spots. The error handling isn't perfect. For example, you can click the Save Images button even if you haven't listed any images. The program simply saves zero files, so it doesn't hurt anything, but it would make sense to disable that button unless some images are displayed.
The program also downloads each image again when it needs it rather than pulling it from the browser's cache, so it isn't as fast as it might be. It also probably cannot save images that are generated on the fly by the Web server.
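One way to experiment with the caching issue is to set the WebClient's CachePolicy property before downloading. The sketch below assumes the .NET Framework, where a policy of CacheIfAvailable asks the request to be satisfied from a cached copy when one exists. Whether that actually reuses the copy the WebBrowser already fetched depends on the system's cache, so this is an idea to test, not a guaranteed fix. The module name CachedDownloadSketch and the function GetPictureCached are hypothetical.

```vbnet
Imports System.Drawing
Imports System.IO
Imports System.Net
Imports System.Net.Cache

Module CachedDownloadSketch
    ' Hypothetical variant of GetPicture that prefers a cached copy.
    Function GetPictureCached(ByVal url As String) As Image
        Using web_client As New WebClient()
            ' Use a cached response if one is available;
            ' otherwise fetch the image from the server.
            web_client.CachePolicy = New RequestCachePolicy( _
                RequestCacheLevel.CacheIfAvailable)

            ' The stream stays open because the Image needs it
            ' for as long as the Image is in use.
            Dim image_stream As New _
                MemoryStream(web_client.DownloadData(url))
            Return Image.FromStream(image_stream)
        End Using
    End Function
End Module
```

Error handling is left out here for brevity; a caller would wrap the call in a Try Catch block as GetPicture does.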