This article focuses on how to fetch a web page's title, meta description and logo path from a website URL using C#.
The code first downloads the HTML source from the provided URL. It then searches that string for specific HTML tags to extract the title and the meta description.
The code below reads the HTML source from a URL:
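// Requires: using System.Net;
string source;
using (WebClient client = new WebClient())
{
    // Pass a full absolute URL including the scheme (http:// or https://).
    source = client.DownloadString("http://www.nexuslinkservices.com");
}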
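WebClient still works on .NET Framework, but on .NET 6 and later it is marked obsolete in favour of HttpClient. The sketch below shows the same download step with HttpClient. It also adds a helper, NormalizeUrl, which is a hypothetical addition and not part of the original code: it prepends http:// when the address is typed without a scheme (for example www.nexuslinkservices.com), since a bare host name is not a valid absolute URI.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageDownloader
{
    // One shared HttpClient instance; creating a new client per request can exhaust sockets.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: adds "http://" when the address has no scheme,
    // so "www.nexuslinkservices.com" becomes a valid absolute URI.
    public static Uri NormalizeUrl(string address)
    {
        string trimmed = address.Trim();
        if (!trimmed.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !trimmed.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            trimmed = "http://" + trimmed;
        }
        return new Uri(trimmed);
    }

    // Downloads the raw HTML of the page as a string.
    public static Task<string> FetchHtmlAsync(string address)
    {
        return Client.GetStringAsync(NormalizeUrl(address));
    }
}
```

Inside an async method this would be called as string source = await PageDownloader.FetchHtmlAsync("www.nexuslinkservices.com");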
The following code reads the title tag:
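// Requires: using System.Text.RegularExpressions;
// Trim() removes any whitespace left before the closing </title> tag.
string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value.Trim();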
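The regular expression returns the title exactly as it appears in the markup, so entities such as &amp; and line breaks inside the tag are kept. A small helper, TitleReader.GetTitle (a name chosen for this sketch, not from the original article), wraps the same pattern and additionally decodes HTML entities with WebUtility.HtmlDecode and collapses runs of whitespace into single spaces.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class TitleReader
{
    public static string GetTitle(string source)
    {
        // Same idea as above: capture everything between <title ...> and </title>.
        Match match = Regex.Match(source, @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>", RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return "";
        }
        // Decode entities such as &amp;, then collapse newlines and tabs into single spaces.
        string title = WebUtility.HtmlDecode(match.Groups["Title"].Value);
        return Regex.Replace(title, @"\s+", " ").Trim();
    }
}
```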
The following code finds the meta description:
// Requires: using System;
static string GetMetaDescription(string source)
{
    const StringComparison ic = StringComparison.OrdinalIgnoreCase;
    // IndexOf returns -1 when nothing is found, so test for < 0 rather than <= 0.
    int startpoint = source.IndexOf("<meta name=\"description\"", ic);
    if (startpoint < 0) { startpoint = source.IndexOf("<meta name='description'", ic); }
    if (startpoint < 0) { startpoint = source.IndexOf("<meta name=description", ic); }
    if (startpoint < 0) { return ""; }

    int contentIndex = source.IndexOf("content=", startpoint, ic);
    int startvalue = contentIndex + "content=".Length;
    if (contentIndex < 0 || startvalue >= source.Length) { return ""; }

    // The value ends at the same quote character that opens it (" or ').
    char quote = source[startvalue];
    if (quote != '"' && quote != '\'') { return ""; }
    int endvalue = source.IndexOf(quote, startvalue + 1);
    return endvalue < 0 ? "" : source.Substring(startvalue + 1, endvalue - startvalue - 1);
}
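The IndexOf approach only recognises tags that begin with <meta name=..., so a tag written as <meta content="..." name="description"> is missed. A regex-based alternative is sketched below. It finds each <meta> tag, reads its name and content attributes regardless of their order, and decodes entities in the result. MetaReader.GetMetaDescription is a name made up for this example, and the sketch assumes attribute values are quoted and contain no raw > character, so unlike the code above it does not cover name=description without quotes.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class MetaReader
{
    // Matches a quoted attribute: attr="value" or attr='value'.
    private static readonly Regex Attribute = new Regex(
        @"(?<name>[\w-]+)\s*=\s*(?<q>[""'])(?<value>.*?)\k<q>",
        RegexOptions.Singleline);

    public static string GetMetaDescription(string source)
    {
        foreach (Match tag in Regex.Matches(source, @"<meta\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string name = null;
            string content = null;
            foreach (Match attr in Attribute.Matches(tag.Value))
            {
                string attrName = attr.Groups["name"].Value;
                if (attrName.Equals("name", StringComparison.OrdinalIgnoreCase))
                    name = attr.Groups["value"].Value;
                else if (attrName.Equals("content", StringComparison.OrdinalIgnoreCase))
                    content = attr.Groups["value"].Value;
            }
            // Both attributes are collected before checking, so their order does not matter.
            if (content != null && "description".Equals(name, StringComparison.OrdinalIgnoreCase))
                return WebUtility.HtmlDecode(content);
        }
        return "";
    }
}
```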
The logo code relies on a heuristic: it takes the first image in the HTML and assumes it is the site logo. This is not guaranteed, because the first image on a page is not always the logo.
// Requires: using System; using System.Text.RegularExpressions;
// txtURL (TextBox holding the page URL) and idlog (Image) are assumed to be controls on the ASP.NET page.
Match imgMatch = Regex.Match(source, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
if (imgMatch.Success)
{
    string logotag = imgMatch.Value;
    // Read the src value whether it is wrapped in double or single quotes.
    Match srcMatch = Regex.Match(logotag, @"\ssrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);
    if (srcMatch.Success)
    {
        string logopath = srcMatch.Groups[1].Value;
        // Resolve relative paths (e.g. "images/logo.png") against the page URL;
        // absolute URLs are kept unchanged.
        idlog.ImageUrl = new Uri(new Uri(txtURL.Text.Trim()), logopath).ToString();
    }
}
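Because the first image is not always the logo, one possible refinement, which is an assumption of this sketch and not the article's method, is to prefer the first <img> tag whose markup contains the word "logo" (in its src, alt, class or id, for example) and fall back to the first image otherwise. Relative paths are resolved against the page URL with the Uri(Uri, string) constructor, and the function returns a string instead of writing to the txtURL and idlog page controls. LogoFinder.FindLogoUrl is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogoFinder
{
    // Returns the absolute logo URL, or null when no <img> tag with a src is found.
    public static string FindLogoUrl(string source, string pageUrl)
    {
        MatchCollection images = Regex.Matches(source, @"<img\b[^>]*>", RegexOptions.IgnoreCase);
        if (images.Count == 0)
        {
            return null;
        }

        // Heuristic: prefer an image whose tag mentions "logo"; otherwise keep the first one.
        string chosen = images[0].Value;
        foreach (Match img in images)
        {
            if (img.Value.IndexOf("logo", StringComparison.OrdinalIgnoreCase) >= 0)
            {
                chosen = img.Value;
                break;
            }
        }

        // Read the src attribute whether it uses double or single quotes.
        Match src = Regex.Match(chosen, @"\ssrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);
        if (!src.Success)
        {
            return null;
        }

        // Resolve relative paths such as "/images/logo.png" against the page URL.
        Uri resolved = new Uri(new Uri(pageUrl), src.Groups[1].Value);
        return resolved.ToString();
    }
}
```

This is still a guess: a page may name its logo file differently, or mention "logo" on an unrelated image, so the result should be treated as a best effort.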
Thanks,
Amit Patel
“Enjoy Programming”
