Maximum PC

All times are UTC - 8 hours




 Post subject: Regular Expressions
PostPosted: Tue Jul 13, 2010 9:12 am 
SON OF A GUN

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Yes, yes I know... Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Need a little help getting this to work (.NET 4.0 / C# btw):

http://www.site.com/attachment.php?attachmentid=[series of numbers]
http://www.site.com/attachments/[some characters]/[some file name]

I am looking to download attachments. I already have code working to suck the text of the page and find <a> tags. Now I am looking for image tags and/or link tags that match the patterns above.

From my class called "LinkScrapper":
Code:
// \b keeps "<a" from also matching tags such as <abbr> or <area>
private string anchorRegEx = @"(<a\b.*?>.*?</a>)";
private string hrefRegEx = @"href=\""(.*?)\""";
private string anchorTextRegEx = @"\s*<.*?>\s*";


ImageScrapper:
Code:
// \b keeps "<img" from matching a longer tag name that starts with "img"
private string anchorRegEx = @"(<img\b.*?>)";
private string hrefRegEx = @"src=\""(.*?)\""";
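
One way to get from "all links and images" to "only the attachments" is a second pass: pull out every href/src value, then keep the ones that match either URL shape listed above. The sketch below is in Java rather than C#, and the class and method names (AttachmentFinder, findAttachments) are made up for illustration; www.site.com is the placeholder host from the question. It assumes attribute values are double-quoted and that the attachment URL carries no extra query parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the LinkScrapper/ImageScrapper classes.
final class AttachmentFinder {

    // Captures the double-quoted value of any href="..." or src="..." attribute.
    private static final Pattern URL_ATTR =
            Pattern.compile("(?:href|src)\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // The two URL shapes from the question: attachment.php?attachmentid=<digits>
    // or attachments/<some characters>/<some file name>.
    private static final Pattern ATTACHMENT = Pattern.compile(
            "http://www\\.site\\.com/(?:attachment\\.php\\?attachmentid=\\d+"
            + "|attachments/[^/\"]+/[^/\"]+)");

    // Returns every href/src value that fully matches one of the attachment shapes.
    static List<String> findAttachments(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = URL_ATTR.matcher(html);
        while (m.find()) {
            String url = m.group(1);
            if (ATTACHMENT.matcher(url).matches()) {
                found.add(url);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.site.com/attachment.php?attachmentid=12345\">file</a>"
                + "<img src=\"http://www.site.com/attachments/abc123/photo.jpg\">"
                + "<a href=\"http://www.site.com/index.php\">home</a>";
        // Prints the two attachment URLs and skips index.php.
        System.out.println(findAttachments(html));
    }
}
```

Filtering the captured URL separately keeps the tag-matching patterns simple; the attachment rules can then change without touching the scraping code.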


Considering a tool like: http://www.regexbuddy.com/

Thoughts?

I think I have just been staring at this too long and stuff is just blending together.....


 Post subject:
PostPosted: Tue Jul 13, 2010 9:24 am 
Java Junkie

Joined: Mon Jun 14, 2004 10:23 am
Posts: 24224
Location: Granite Heaven
/me punches you in the face.


 Post subject:
PostPosted: Tue Jul 13, 2010 10:51 am 
SON OF A GUN

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Jipstyle wrote:
/me punches you in the face.


OUCH!

SRSLY? WTH!?!?!


 Post subject:
PostPosted: Tue Jul 13, 2010 12:55 pm 
Java Junkie

Joined: Mon Jun 14, 2004 10:23 am
Posts: 24224
Location: Granite Heaven
I worked in Perl for 2 years. Regex now drives me into a violent rage.

Sorry.


 Post subject:
PostPosted: Tue Jul 13, 2010 6:39 pm 
SON OF A GUN

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
How else would you scan a website (html text) and suck images and links down?

I know I know, I have two problems now... but RegEx seemed to be the way to go.


 Post subject:
PostPosted: Tue Jul 13, 2010 8:15 pm 
Java Junkie

Joined: Mon Jun 14, 2004 10:23 am
Posts: 24224
Location: Granite Heaven
Oh, it is the way I'd do it .. but I'd still punch the person responsible in the face.


 Post subject:
PostPosted: Fri Jul 16, 2010 4:48 am 
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
Here is a Java version. I know there are shorthand ways to specify "[a-zA-Z0-9]" and other character sets (i.e. the set of all digits, the set of all lowercase characters ... L* from formal language theory), but I was being lazy, and they'll probably be a bit different in C#.

Code:
public class RegexTest {

    // Sample URLs for the two attachment shapes from the original question.
    static String test1 = "http://www.site.com/attachment.php?attachmentid=12345";
    static String test2 = "http://www.site.com/attachments/abc123foo/filename.jpg";

    // String.matches() succeeds only if the regex matches the entire string.
    public static void test(String str, String rx) {
        if (str.matches(rx))
            System.out.println("str and rx match");
        else
            System.out.println("fail");
    }

    public static void main(String[] args) {

        // Literal '.' and '?' are escaped; the id is one or more digits.
        String r1 = "http://www\\.site\\.com/attachment\\.php\\?attachmentid=[0-9]+";
        test(test1, r1);

        // Alphanumeric directory, then an alphanumeric file name with an image extension.
        String r2 = "http://www\\.site\\.com/attachments/[a-zA-Z0-9]+/[a-zA-Z0-9]+\\.(jpg|jpeg|gif|png)";
        test(test2, r2);
    }
}
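
For reference, the shorthand character classes look roughly like this in Java: \d stands for [0-9] and \w for [a-zA-Z0-9_] (note the extra underscore). The sketch below is an illustrative variant of RegexTest, with hypothetical names (ShorthandRegexTest, isAttachment). It also allows hyphens and underscores in the file name, which the [a-zA-Z0-9]+ version rejects. In .NET, \d and \w match Unicode digits and word characters by default, so the explicit ranges may be the safer choice there if only ASCII should match.

```java
import java.util.regex.Pattern;

// Hypothetical variant of RegexTest using shorthand classes; names are illustrative.
final class ShorthandRegexTest {

    // \d is [0-9] in Java's default (non-Unicode) mode.
    static final Pattern BY_ID = Pattern.compile(
            "http://www\\.site\\.com/attachment\\.php\\?attachmentid=\\d+");

    // \w is [a-zA-Z0-9_]; [\w-] additionally allows hyphens in the file name.
    static final Pattern BY_PATH = Pattern.compile(
            "http://www\\.site\\.com/attachments/\\w+/[\\w-]+\\.(?:jpg|jpeg|gif|png)");

    // True if the whole URL matches either attachment shape.
    static boolean isAttachment(String url) {
        return BY_ID.matcher(url).matches() || BY_PATH.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAttachment("http://www.site.com/attachment.php?attachmentid=12345"));
        System.out.println(isAttachment("http://www.site.com/attachments/abc123foo/file_name-2.jpg"));
        System.out.println(isAttachment("http://www.site.com/index.php"));
    }
}
```

Compiling the patterns once as static fields avoids recompiling the regex on every call, which String.matches() does each time.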

