Quantcast

Maximum PC

It is currently Thu Sep 18, 2014 7:02 pm

All times are UTC - 8 hours




Post new topic Reply to topic  [ 8 posts ] 
Author Message
 Post subject: Clusters
PostPosted: Tue Feb 12, 2008 1:58 am 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
Since it is late (and I have a paper due), I'm going to make this quick and come back and fill in any missing details later.

As many of you are probably aware, identifying clusters is a difficult task. If you're not familiar with the problem, simply close your eyes and randomly place dots on a white board. Now open your eyes and think of an algorithm that identifies the dots that are clustered near each other. It is a very difficult and elusive problem in 2 or more dimensions. However, I was reading a paper the other day that said a trivial solution exists for identifying clusters in a single dimension (ie the points are on a line).

Does anyone see a way to identify these 1d clusters!


Top
  Profile  
 
 Post subject: Re: Clusters
PostPosted: Tue Feb 12, 2008 5:58 am 
Bitchin' Fast 3D Z8000
Bitchin' Fast 3D Z8000
User avatar

Joined: Mon Jun 14, 2004 4:04 pm
Posts: 985
Location: Earth
Gadget wrote:
Since it is late (and I have a paper due), I'm going to make this quick and come back and fill in any missing details later.

As many of you are probably aware, identifying clusters is a difficult task. If you're not familiar with the problem, simply close your eyes and randomly place dots on a white board. Now open your eyes and think of an algorithm that identifies the dots that are clustered near each other. It is a very difficult and elusive problem in 2 or more dimensions. However, I was reading a paper the other day that said a trivial solution exists for identifying clusters in a single dimension (ie the points are on a line).

Does anyone see a way to identify these 1d clusters!


Stop asking us to do your homework for you. ;P


Top
  Profile  
 
 Post subject: Re: Clusters
PostPosted: Tue Feb 12, 2008 6:51 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
Gadget wrote:
Does anyone see a way to identify these 1d clusters!


Why can't you just use a distance formula or some matrix algebra to determine what is close/clustered based on a set distance, say 2 unit away?


Top
  Profile  
 
 Post subject: Re: Clusters
PostPosted: Tue Feb 12, 2008 7:32 am 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
Gadget wrote:
Does anyone see a way to identify these 1d clusters!


Why can't you just use a distance formula or some matrix algebra to determine what is close/clustered based on a set distance, say 2 unit away?

How do you know a priori that 2 is the magic number? What if most points are 20 units apart, but there are two nice dense clusters of points that are 2.1 units apart? Alternatively, what if the graph is only 2 units wide with points distributed evenly apart from each other.

What is this matrix algebra technique that you speak of...


Top
  Profile  
 
 Post subject:
PostPosted: Tue Feb 12, 2008 9:51 am 
SON OF A GUN
SON OF A GUN
User avatar

Joined: Mon Nov 01, 2004 5:41 am
Posts: 11605
I don't know, I would assume that would be something to be specified. I would guess that you could look for patterns and it would depend on scale what the magic number was.

Linear Algebra + distance


Top
  Profile  
 
 Post subject:
PostPosted: Tue Feb 12, 2008 4:41 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
CrashTECH wrote:
I don't know, I would assume that would be something to be specified.

No, no, no. The point is to write a simple program to find the clusters in one dimension. I know that you can spot a cluster. =)

CrashTECH wrote:
I would guess that you could look for patterns and it would depend on scale what the magic number was.

Looking for patterns is another complex solution. This is supposed to be easy.


Top
  Profile  
 
 Post subject:
PostPosted: Tue Feb 12, 2008 7:11 pm 
Smithfield*
Smithfield*
User avatar

Joined: Fri Jul 09, 2004 9:17 am
Posts: 7159
Location: In HyperTransport
Data mining class?

I've done a fair amount of data mining research and implementation. Theres a multitude of ways to approach this as your probably know. For most of these methods you'll have to have some kind of extra input parameter corresponding to quality, closeness, distance, etc. some boundary depending on approach.

The suggest approach thus far of a definite distance is one way, if a naive one. This would be easy to code but not very adaptable.

What I would consider is another partitional clustering algorithm. A less distance restricted way is the "fuzzy" way which is fuzzy because of fuzzy logic. Roughly speaking, you either chose a number of clusters or percent or you can even make it dynamic, then you build up cluster sets based on some metric(an easy one is 1/d where d is the distance) and once you're happy with the set you've built, remove the points and try again and eventually end up with the best sets being the most clustered.

Another way is similar to an MST approach with the Prim algorithm. You build up cluster sets by combining the two closest points. Again, you have to have some kind of extra parameter determining amount per set, number of sets, distance or something else.

If you're more mathematically minded than I, there are also some other ways which I can't explain so well.

This is all really general since I know this is your homework. If you want me to get more specific I can.


Top
  Profile  
 
 Post subject:
PostPosted: Mon Feb 18, 2008 6:59 pm 
Bitchin' Fast 3D Z8000*
Bitchin' Fast 3D Z8000*
User avatar

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
gramaton cleric wrote:
Data mining class?

Close... Dynamics and structure of Networked Data. I came across a statement that there was an easy solution while reviewing some papers related to my own research. Trust me... I wish that I had simple homework problems at this point. =)

gramaton cleric wrote:
I've done a fair amount of data mining research and implementation. Theres a multitude of ways to approach this as your probably know. For most of these methods you'll have to have some kind of extra input parameter corresponding to quality, closeness, distance, etc. some boundary depending on approach.

That is one of the points that I never got around to mentioning. In higher dimensional spaces, there never seems to be a single distance measure that works for all types of clusters. You can almost always come up with a trivial counter-example. I've been wondering if this isn't the case with a single dimension though.

gramaton cleric wrote:
What I would consider is another partitional clustering algorithm. A less distance restricted way is the "fuzzy" way which is fuzzy because of fuzzy logic. Roughly speaking, you either chose a number of clusters or percent or you can even make it dynamic, then you build up cluster sets based on some metric(an easy one is 1/d where d is the distance) and once you're happy with the set you've built, remove the points and try again and eventually end up with the best sets being the most clustered.

Again, the 1d case is supposed to have a simple solution. I don't think that it will require a priori knowledge of the number of clusters. Currently, my personal preference is the modularity based methods (they are almost certainly much faster and scalable).

gramaton cleric wrote:
Another way is similar to an MST approach with the Prim algorithm. You build up cluster sets by combining the two closest points. Again, you have to have some kind of extra parameter determining amount per set, number of sets, distance or something else.

I've heard people mention using an MST for this type of thing in the past. I just have this feeling that it fails with certain configurations.


Top
  Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 8 posts ] 

All times are UTC - 8 hours


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group