First, let me say that I think TopCoder is great and their problems are some of the best I have seen. BUT, I think they got this one wrong. The problem is 'Unlinker' - SRM 203 DIV2 Hard. I won't post the entire problem statement here - just the definitions (you can sign up for free btw - TC is a great site, even if just to do practice problems). You can find a general description and a much more intelligent solution than my own
here.
Unlinker wrote:
For the purposes of this problem, a weblink is a string consisting of three parts. From left to right, these are the prefix, domain, and suffix.
The prefix consists of one of the three following strings.
http://
http://www.
www.
The domain is a sequence of one or more characters, each of which is a letter (a character from 'a' to 'z' or from 'A' to 'Z'), a numeral ('0' to '9'), or a period (the character '.').
The suffix is one of the five following strings.
.com
.org
.edu
.info
.tv
There must be no space character within the weblink. The weblink may have any kind of character to the left and right of it. It may also occur at the beginning of the text, at the end of the text, or it may itself be the entire text.
My problem with this problem statement is that they say:
1) a weblink is composed of 3 parts.
2) the domain is 1 or more characters long, with no spaces.
3) notice that the prefix http://www. and the suffix .tv both have a . in them.
4) you might not have caught it from the TC solution, but you're supposed to replace weblinks with OMIT followed by a number. The first replacement is OMIT1, the second OMIT2, etc.
I did this problem in a practice room and, by completely disregarding the obvious use of regex (idiot - that's what pressure will do to you), came up with a protracted solution that passed the six example test cases. So I submitted, ran the system tests, and boom.... it failed. WTF?! Here is the system test it failed on....
Code:
argument: luCJ7www. xgz.tvAd.tvJCyAwww..http://www.tvDgHvH
expected: luCJ7www. xgz.tvAd.tvJCyAwww..OMIT1DgHvH
returned: luCJ7www. xgz.tvAd.tvJCyAwww..http://www.tvDgHvH
So what do you guys think? Stop here - think about it for a minute. Grab a scratch paper and go use the can.

My reading of the problem statement led me to believe that http://www.tv is not a weblink. If you take http://www. as the prefix, all that is left is tv, so it contains a prefix and (sort of) a suffix, but no domain. The regex in the example solution is interesting too....
Code:
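public String clean(String text) {
    // Only the pattern is quoted here; the rest of the method body is left out.
    // Alternatives: "http://www." or "www.", or a bare "http://" as the prefix.
    Pattern p = Pattern.compile(
        "((http://)?www[.]|http://)([a-zA-Z0-9.]+)[.](com|org|edu|info|tv)");
}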
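For anyone who wants to run it, here is a rough sketch of how that pattern could be turned into a complete method. Only the regex comes from the example solution; the class name UnlinkerSketch, the find/appendReplacement loop, and the counter are my own assumptions about how the rest might look, not the actual TC code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnlinkerSketch {
    // Pattern quoted from the example solution: prefix, domain,
    // then a period followed by one of the five suffixes.
    private static final Pattern WEBLINK = Pattern.compile(
        "((http://)?www[.]|http://)([a-zA-Z0-9.]+)[.](com|org|edu|info|tv)");

    // Assumed completion: replace each match, left to right,
    // with OMIT1, OMIT2, and so on.
    static String clean(String text) {
        Matcher m = WEBLINK.matcher(text);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            m.appendReplacement(out, "OMIT" + count);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The system test input from above.
        System.out.println(clean("luCJ7www. xgz.tvAd.tvJCyAwww..http://www.tvDgHvH"));
    }
}
```

Run on the system test argument, this sketch should print the expected string with OMIT1 in place of http://www.tv, since the bare http:// alternative lets www be picked up as the domain.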
Think my interpretation is right? Well, technically, it is wrong. That test is sort of sneaky, and http://www.tv is a weblink according to their definition:
prefix: http://
domain: www
suffix: .tv
Yeah, they made www the domain. But they're still wrong - IMO, it is wrong to do something that damn sneaky with the pressure on! Bastards!
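To double-check that breakdown, here is a small sketch that compares the two readings. LENIENT is the pattern from the example solution. STRICT is purely my own assumption about how to model "grab the longest prefix first and never give it back", using an atomic group (?>...). The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WeblinkParts {
    // Pattern quoted from the example solution.
    static final Pattern LENIENT = Pattern.compile(
        "((http://)?www[.]|http://)([a-zA-Z0-9.]+)[.](com|org|edu|info|tv)");

    // Hypothetical model of my reading: the atomic group commits to the
    // first prefix alternative that matches (longest listed first) and
    // will not backtrack into a shorter one.
    static final Pattern STRICT = Pattern.compile(
        "(?>http://www[.]|http://|www[.])([a-zA-Z0-9.]+)[.](com|org|edu|info|tv)");

    // Returns "prefix|domain|suffix" if the whole string is a weblink
    // under the lenient pattern, or null otherwise.
    static String split(String s) {
        Matcher m = LENIENT.matcher(s);
        if (!m.matches()) return null;
        return m.group(1) + "|" + m.group(3) + "|." + m.group(4);
    }

    // True if the strict pattern finds a weblink anywhere in s.
    static boolean strictFinds(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(split("http://www.tv"));        // prefix|domain|suffix
        System.out.println(strictFinds("http://www.tv"));  // my reading of the same string
    }
}
```

Under the lenient pattern the split comes out as http://, www, .tv, which is the breakdown above. Under the strict one, http://www.tv is not found at all, which is why my solution left that test input unchanged.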