
Theoretical Computer Science 361 (2006) 172–187

Correlation clustering in general weighted graphs☆

Erik D. Demaine a,∗, Dotan Emanuel b, Amos Fiat b, Nicole Immorlica a,1

a Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA

b Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel

Abstract

We consider the following general correlation-clustering problem [N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238–250]: given a graph with real nonnegative edge weights and a +/− edge labelling, partition the vertices into clusters to minimize the total weight of cut + edges and uncut − edges. Thus, + edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster, while − edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition. Correlation clustering was introduced by Bansal et al. [Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238–250], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an O(log n)-approximation algorithm for the general case based on a linear-programming rounding and the "region-growing" technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r³)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Θ(log n).

© 2006 Elsevier B.V. All rights reserved.

Keywords: Clustering; Partitioning; Approximation algorithms; Minimum multicut

1. Introduction

Clustering objects into groups is a common task that arises in many applications such as data mining, web analysis, computational biology, facility location, data compression, marketing, machine learning, pattern recognition, and computer vision. Clustering algorithms for these and other objectives have been heavily investigated in the literature. For partial surveys, see e.g. [8,14,19–21,25].

In a theoretical setting, the objects are usually viewed as points in either a metric space (typically finite) or a general distance matrix, or as vertices in a graph. Typical objectives include minimizing the maximum diameter of a cluster

☆ Preliminary versions of this paper appeared in APPROX 2003 [6] and ESA 2003 [7].

∗ Corresponding author. Tel.: +1 617 253 6871; fax: +1 617 258 5429.

E-mail addresses: edemaine@… (E.D. Demaine), dotan@… (D. Emanuel), fiat@tau.ac.il (A. Fiat), nickle@… (N. Immorlica).

1 Research was supported in part by an NSF GRF.

0304-3975/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.tcs.2006.05.008

(k-clustering) [11], minimizing the average distance between pairs of clustered points (k-clustering sum) [24], minimizing the maximum distance to a "centroid object" chosen for each cluster (k-center) [11], minimizing the average distance to such a centroid object (k-median) [13], minimizing the average squared distance to an arbitrary centroid point (k-means) [14], and maximizing the sum of distances between pairs of objects in different clusters (maximum k-cut) [17]. These objectives interpret the distance between points as a measure of their dissimilarity: the larger the distance, the more dissimilar the objects. Another line of clustering algorithms interprets the distance or weights between pairs of points as a measure of their similarity: the larger the weight, the more similar the objects. In this case, the typical objective is to find a k-clustering that minimizes the sum of weights between pairs of objects in different clusters (minimum k-cut) [17]. All of these clustering problems are parameterized by the desired number k of clusters. Without such a restriction, these clustering objective functions would be optimized when k = n (every object is in a separate cluster) or when k = 1 (all objects belong to a single cluster).

In the correlation-clustering problem introduced by Bansal et al. [1], the underlying model is that objects can be truly categorized, and we are given probabilities about pairs of objects belonging to common categories. For example, the multiset of objects might consist of all authors of English literature, and two authors belong to the same category if they correspond to the same real person. This task would be easy if authors published papers consistently under the same name. However, some authors might publish under several different names such as William Shakespeare, W. Shakespeare, Bill Shakespeare, Sir Francis Bacon, Edward de Vere, and Queen Elizabeth I.2 Given some confidence about the similarity and dissimilarity of the names, our goal is to cluster the objects to maximize the probability of correctness. There are two main objective functions for this problem: minimizing disagreements and maximizing agreements between the input estimates and the output clustering. The decision versions of these two optimization problems are identical and known to be NP-complete [1]. However, they differ in their approximability.

As we consider both similarity and dissimilarity measures in our problem, it is in a sense a generalization of the typical clustering objectives mentioned above. In fact, an appropriate interpretation of our problem instance suggests that our objective is a combination of the minimum k-clustering sum and minimum k-cut clustering objectives. An interesting property of our problem is that the number k of clusters is no longer a parameter of the input; there is some "ideal" k which the algorithm must find. Another clustering problem with this property is location area design, a problem arising in cell phone network design. As formulated by Bejerano et al. [2], this problem attempts to minimize the sum of the squared sizes of the clusters plus the weight of the cut induced by the clustering. The similarities between these two problems allow us to apply many of the same techniques.

1.1. Our contributions

In this paper, we focus on the goal of minimizing disagreements in (weighted) general graphs. We give an O(log n)-approximation algorithm for weighted general graphs and an O(r³)-approximation algorithm for weighted graphs excluding the complete bipartite graph K_{r,r} as a minor (e.g., graphs embeddable in surfaces of genus Θ(r²)). Our O(log n) approximation is based on rounding a linear program using the region-growing technique introduced in the seminal paper of Leighton and Rao [17] on multicommodity max-flow min-cut theorems. Using ideas developed in Bejerano et al. [2], we are able to prove that this rounding technique yields a good approximation. Their paper also inspires our O(r³) approximation, which uses a rounding technique introduced by Tardos and Vazirani [27] in a paper on the max-flow min-multicut ratio, based on a lemma of Klein et al. [16]. We further prove that the gap in the linear program can be Ω(log n), and therefore our bounds are tight for any algorithm based on rounding this linear program. We also prove that our problem is APX-hard because it is equivalent to the APX-hard problem of finding a minimum multicut [5], for which the current best approximation is O(log k) for a certain parameter k [9]. Any o(log n)-approximation algorithm for our problem would require improving the state-of-the-art for approximating minimum multicut. This equivalence also implies that any O(log k) algorithm for minimum multicut yields an O(log n)-approximation algorithm for correlation clustering. Our results are summarized in Table 1.

We remark that even for unweighted complete graphs, our (more general) algorithm has a better approximation ratio than the constant-factor approximation algorithm of [1] for n ≥ 10^81 ≈ 2^270, a rough upper bound on the number of particles in the known universe. This is because their constant factor is approximately 20,000, whereas our approximation factor is smaller than 4 log₂ n.

2 Some scholars believe William Shakespeare was actually a pen name for one of the wealthier and more renowned contemporaries of the Elizabethan era. Common candidates for the true authorship of Shakespeare's works include Sir Francis Bacon, Edward de Vere, and even, on occasion, Queen Elizabeth I [23].

Table 1
Our contributions

Problem class                 Approximation   Hardness    Equivalence
Unweighted complete graphs    4 log₂ n
Unweighted general graphs     O(log n)        APX-hard    Unweighted multicut
Weighted general graphs       O(log n)        APX-hard    Weighted multicut

Table 2
Previous results on minimizing disagreements from [1]

Problem class                 Approximation   Hardness
Unweighted complete graphs    c ≈ 20,000      NP-hard
Unweighted general graphs     Open            NP-hard
Weighted general graphs       Open            APX-hard

Most of our results were obtained simultaneously and independently by Charikar et al. [3]. The results of their paper are detailed in the next section.

1.2. Related work

The only previous work on the correlation-clustering problem is that of Bansal et al. [1]. Their paper considers correlation clustering in an unweighted complete graph, i.e., every pair of objects has an estimate of either "similar" or "dissimilar". For minimizing disagreements, they give a constant-factor approximation via a combinatorial algorithm. For maximizing agreements, they give a polynomial-time approximation scheme (PTAS). The limitation of these two algorithms is that they assume the input graph is complete, while in many applications, estimate information is incomplete. Table 2 summarizes the results of their paper.

The constant-factor algorithm of Bansal et al. [1] for unweighted complete graphs is combinatorial. It iteratively "cleans" clusters until every cluster C is δ-clean (i.e., for every vertex v ∈ C, v has at least (1−δ)|C| positive neighbors in C and at most δ|C| positive neighbors outside C). They bound the approximation factor of their algorithm by counting the number of "bad" triangles (triangles with two + edges and one − edge) in a δ-clean clustering and use the existence of these bad triangles to lower bound OPT. Complete graphs have many triangles, and the counting arguments for counting bad triangles rely heavily on this fact. When we generalize the problem to graphs that are not necessarily complete, bad triangles no longer form a good lower bound on OPT. It may be possible to find a combinatorial algorithm for this problem that bounds the approximation factor by counting bad cycles (cycles with exactly one − edge). However, in this paper, we formulate the problem as a linear program and use the region-growing technique to round it.

Simultaneously and independently of our work, Charikar et al. [3] gave several algorithmic and hardness results for the correlation-clustering problem. They studied both the minimization and maximization variants of the problem. For minimizing disagreements, they also formulated the problem as a linear program and used the region-growing technique to round it. For complete graphs, they prove that this approach gives a factor-4 approximation. For general graphs, their O(log n) approximation is identical to ours. For both results, they investigate the integrality gap of the linear-programming formulation, finding examples with gaps of 2 and Ω(log n), respectively. For maximizing agreements, the authors give a factor-0.7664 approximation in general graphs. As for hardness results, they prove APX-hardness of minimizing disagreements in complete graphs and maximizing agreements in general graphs. They also note that minimizing disagreements in general graphs is as hard as the minimum multicut problem.

Independent of Charikar et al. [3], the paper of Swamy [26] gives a 0.7666-approximation algorithm for the problem of maximizing agreements in general graphs. Recently, Charikar and Wirth [4] studied the problem of maximizing the correlation, or weight of agreements minus disagreements, of a clustering. Their paper details an Ω(1/log n) approximation for this problem.


1.3. Paper structure

The rest of the paper is organized as follows. Section 2 formalizes the correlation-clustering problem and some necessary notation. Section 3 presents explicit approximation algorithms for the problem in general and K_{r,r}-minor-free graphs. Section 4 proves that the problem is equivalent to the minimum multicut problem, thus yielding an APX-hardness result and another implicit O(log n) algorithm. Section 5 discusses some applications of the correlation-clustering problem formulation. We conclude with open problems in Section 6.

2. Preliminaries

2.1. Problem definition

Bansal et al. [1] present the following clustering problem. We are given a graph on n vertices, where every edge (u,v) is labelled either + or − according to whether u and v have been deemed to be similar or dissimilar. In addition, each edge has a weight c_e ∈ [0,∞) which can be interpreted as a confidence measure of the similarity or dissimilarity of the edge's endpoints (higher weight denotes higher confidence).3 The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels (see, for example, Fig. 1). More precisely, we want a clustering that maximizes the weight of agreements: the weight of + edges within clusters plus the weight of − edges between clusters. Alternatively, the clustering should minimize the weight of disagreements: the weight of − edges inside clusters plus the weight of + edges between clusters. Although the optimal solutions to these two problems are the same, they differ in their approximability.

In this paper, we focus on the problem of minimizing disagreements. In the rest of this paper, when we refer to the "correlation-clustering problem" or "the clustering problem" we mean the problem of minimizing disagreements. We also say "positive edge" when referring to an edge labelled + and "negative edge" when referring to an edge labelled −. Note that the terminology "positive" and "negative" here refers to the edge label and not the weight; edge weights are always nonnegative regardless of the label.

2.2. Notation

Let G = (V, E) be a graph on n vertices with edge weights c_e ≥ 0. Let e(u,v) denote the label (+ or −) of the edge (u,v). Let E^+ be the set of positive edges and let G^+ be the graph induced by E^+: E^+ = {(u,v) | e(u,v) = +}, G^+ = (V, E^+). Likewise we define E^− and G^− for negative edges.

In general, for a clustering S = {S_1, ..., S_k}, let C(v) be the set of vertices in the same cluster as v. We call an edge (u,v) a positive mistake if e(u,v) = + and yet u ∉ C(v). We call an edge (u,v) a negative mistake if e(u,v) = − and u ∈ C(v). The number of mistakes of a clustering S is the sum of positive and negative mistakes. The weight of the clustering is the sum of the weights of mistaken edges in S. We introduce the following notation for the weight of a clustering S:

$$w(S) = w_p(S) + w_m(S),$$
$$w_p(S) = \sum \{c_e : e = (u,v) \in E^+,\ u \notin C(v)\},$$
$$w_m(S) = \sum \{c_e : e = (u,v) \in E^-,\ u \in C(v)\}.$$

We refer to the optimal clustering as OPT and its weight or cost as w(OPT). For a general set of edges T ⊆ E, we define the weight of T to be the sum of the weights in T: w(T) = Σ_{e∈T} c_e. For a graph G = (V,E) and a set of edges T ⊆ E, we define the graph G\T to be the graph (V, E\T).
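To make the notation concrete, here is a small Python sketch (our illustration, not part of the paper; the edge-list representation and the function name clustering_weight are assumptions) that computes w(S) = w_p(S) + w_m(S) for a clustering given as a vertex-to-cluster map:

    def clustering_weight(edges, cluster):
        """edges: list of (u, v, weight, label) with label '+' or '-';
        cluster: dict mapping each vertex to its cluster id."""
        w_p = sum(wt for u, v, wt, lab in edges
                  if lab == '+' and cluster[u] != cluster[v])  # cut positive edges
        w_m = sum(wt for u, v, wt, lab in edges
                  if lab == '-' and cluster[u] == cluster[v])  # uncut negative edges
        return w_p + w_m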

3 For example, if there is a function f(u,v) that outputs the probability of u and v being similar, then a natural assignment of weight to edge e = (u,v) is c_e = |log(f(u,v)/(1 − f(u,v)))| [1].


[Fig. 1. Two clustering examples for unweighted and weighted graphs. For the unweighted case we give an optimal clustering with two errors: one error on an edge labelled + and one error on an edge labelled −. For the weighted case, we get a different optimal clustering with three errors on + edges and total weight 5.]

3. O(log n) approximation

In this section, we use a linear-programming formulation of this problem to design an approximation algorithm. The algorithm first solves the linear program. The resulting fractional values are interpreted as distances between vertices; close vertices are most likely similar, far vertices are most likely dissimilar. The algorithm then uses region-growing techniques to group close vertices and thus round the fractional values. Using ideas from Bejerano et al. [2], we are able to show that this approach yields an O(log n) approximation. We also present a modification to this approach that yields an O(r³) approximation for K_{r,r}-minor-free graphs.

3.1. Linear-programming formulation

Consider assigning a zero-one variable x_uv to each pair of vertices (hence x_uv = x_vu). When (u,v) ∈ E, we sometimes write x_uv as x_e where it is understood that e = (u,v). Given a clustering, set x_uv = 0 if u and v are in a common cluster, and x_uv = 1 otherwise. To express w(S) in this notation, note that 1 − x_e is 1 if edge e is within a cluster and 0 if edge e is between clusters. Thus

$$w(S) = \sum_{e \in E^-} c_e (1 - x_e) + \sum_{e \in E^+} c_e x_e.$$

Our goal is to find a valid assignment of the x_uv to minimize this cost. An assignment of the x_uv is valid (corresponds to a clustering) if x_uv ∈ {0,1} and the x_uv satisfy the triangle inequality.


We relax this integer program to the following linear program:

$$\begin{array}{ll}
\text{minimize} & \displaystyle\sum_{e \in E^-} c_e (1 - x_e) + \sum_{e \in E^+} c_e x_e \\
\text{subject to} & x_{uv} \in [0,1], \\
& x_{uw} \le x_{uv} + x_{vw}, \\
& x_{uv} = x_{vu}.
\end{array}$$
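For illustration, the relaxation can be solved directly with an off-the-shelf LP solver. The sketch below is our own (the function name solve_lp and the edge-list format are assumptions, and the roughly n³ triangle-inequality constraints limit it to small instances, n ≥ 3); it builds and solves the linear program above with scipy:

    from itertools import combinations, permutations
    import numpy as np
    from scipy.optimize import linprog

    def solve_lp(n, edges):
        """edges: list of (u, v, weight, label); vertices are 0..n-1."""
        pairs = list(combinations(range(n), 2))    # one variable x_uv per unordered pair
        idx = {p: i for i, p in enumerate(pairs)}
        var = lambda a, b: idx[(min(a, b), max(a, b))]

        obj = np.zeros(len(pairs))                 # objective coefficients
        const = 0.0                                # constant from the (1 - x_e) terms
        for u, v, wt, label in edges:
            if label == '+':
                obj[var(u, v)] += wt               # pay wt * x_e for a cut + edge
            else:
                obj[var(u, v)] -= wt               # pay wt * (1 - x_e) for an uncut - edge
                const += wt

        rows, rhs = [], []
        for a, b, c in permutations(range(n), 3):  # triangle inequality x_ac <= x_ab + x_bc
            row = np.zeros(len(pairs))
            row[var(a, c)] += 1.0
            row[var(a, b)] -= 1.0
            row[var(b, c)] -= 1.0
            rows.append(row)
            rhs.append(0.0)

        res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=[(0.0, 1.0)] * len(pairs), method='highs')
        return res.x, res.fun + const              # fractional distances, LP value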

We will round this linear program to get an O(log n) approximation for our problem. This is in fact the best approximation factor that we can hope for with this linear program because it has an Ω(log n) integrality gap.

Theorem 3.1. The integrality gap of the above linear program is Ω(log n) in the worst case.

Proof. Recall that a d-regular graph G = (V,E) is a (d,c)-expander if, for every subset of vertices U of size at most n/2, |N(U)| ≥ c|U|, where N(U) is the set of vertices in V\U adjacent to vertices in U. Consider a (d,c)-expander G for some constants c and d, whose existence is shown in e.g. [22]. To construct our example G′, set the weight of every edge in G to 1 and its label to +. For each vertex v in G, add an edge of weight nd/2 to all vertices u whose distance from v is at least (log_d(n/2) − 1)/2. Set the label of these edges to −. Note that since there are at most nd/2 edges in G, the weight of a single − edge is at least the total weight of all + edges. Therefore, the optimal clustering S must cut all − edges, and so the diameters of the resulting clusters must be at most log_d(n/2) − 1. Because the vertices of G have bounded degree d, the size of a cluster of diameter r is bounded by d^{r+1}. Therefore, the clusters of the optimal clustering have size at most n/2. By the expansion property of G, the number of + edges we must cut is at least (c Σ_{S_i∈S} |S_i|)/2 = cn/2, and so the weight of the optimal clustering is Ω(n). On the other hand, assigning x_e = 2/(log_d(n/2) − 1) for + edges and x_e = 1 for − edges is a feasible fractional solution of value at most (nd/2) × (2/(log_d(n/2) − 1)), and so the weight of the optimal fractional solution is O(n/log n). The theorem follows. □

3.2. Rounding in general graphs

3.2.1. Region growing

We iteratively grow balls of at most some fixed radius (computed according to the fractional x_uv values) around nodes of the graph until all nodes are included in some ball. These balls define the clusters in the final approximate solution. As high x_uv values hint that u and v should be in separate clusters, this approach seems plausible. The fixed radius guarantees an approximation ratio on disagreements within clusters, while the region-growing technique itself guarantees an approximation ratio on disagreements between clusters.

First we present some notation that we need to define the algorithm. A ball B(u,r) of radius r around node u consists of all nodes v such that x_uv ≤ r, the subgraph induced by these nodes, and the fraction (r − x_uv)/x_vw of edges (v,w) with only endpoint v ∈ B(u,r). The cut of a set S of nodes, denoted cut(S), is the weight of the positive edges with exactly one endpoint in S, i.e.,

$$cut(S) = \sum_{|\{v,w\} \cap S| = 1,\ (v,w) \in E^+} c_{vw}.$$

The cut of a ball is the cut induced by the set of vertices included in the ball. The volume of a set S of nodes, denoted vol(S), is the weighted distance of the edges with both endpoints in S, i.e.,

$$vol(S) = \sum_{\{v,w\} \subseteq S,\ (v,w) \in E^+} c_{vw} x_{vw}.$$

Finally, the volume of a ball B(u,r) is the volume of its node set, including the fractional weighted distance of positive edges leaving B(u,r). In other words, if (v,w) ∈ E^+ is a cut positive edge of ball B(u,r) with v ∈ B(u,r) and w ∉ B(u,r), then (v,w) contributes c_vw · (r − x_uv) weight to the volume of ball B(u,r). For technical reasons, we also add an initial volume I to the volume of every ball (i.e., ball B(u,0) has volume I).


3.2.2. Algorithm

We can now present the algorithm for rounding a fractional solution FRAC to an integral solution SOL. Suppose the volume of the entire graph is F, and thus w_p(FRAC) = F. Let the initial volume I of the balls defined in the algorithm be F/n, and let c be some constant which we determine later.

Algorithm Round
(1) Pick any node u in G.
(2) Initialize r to 0.
(3) Grow r by min{(x_uv − r) > 0 : v ∉ B(u,r)} so that B(u,r) includes another entire edge, and repeat until cut(B(u,r)) ≤ c ln(n+1) × vol(B(u,r)).
(4) Output the vertices in B(u,r) as one of the clusters in S.
(5) Remove the vertices in B(u,r) (and incident edges) from G.
(6) Repeat Steps 1–5 until G is empty.
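For concreteness, here is a Python rendering of Algorithm Round (ours, not the authors' code). It assumes the LP distances have been expanded into a dictionary x keyed by ordered vertex pairs with x[u, u] = 0, and it checks the termination condition only at the discrete radii at which a new vertex enters the ball, which suffices for illustration:

    import math

    def round_lp(n, pos_edges, x, c=2.5):
        """pos_edges: list of (u, v, weight); x: symmetric LP distances,
        keyed by pairs, with x[u, u] = 0. Returns a list of clusters."""
        F = sum(wt * x[u, v] for u, v, wt in pos_edges)   # total volume, w_p(FRAC)
        I = F / n                                          # initial volume of each ball

        def cut(ball, alive):
            # weight of surviving + edges with exactly one endpoint in the ball
            return sum(wt for u, v, wt in pos_edges
                       if u in alive and v in alive and (u in ball) != (v in ball))

        def vol(center, r, ball, alive):
            # initial volume, internal edges, and fractional parts of cut edges
            total = I
            for u, v, wt in pos_edges:
                if u not in alive or v not in alive:
                    continue
                if u in ball and v in ball:
                    total += wt * x[u, v]
                elif u in ball:
                    total += wt * (r - x[center, u])
                elif v in ball:
                    total += wt * (r - x[center, v])
            return total

        alive = set(range(n))
        clusters = []
        while alive:
            u = next(iter(alive))
            ball = {u}
            for r in sorted(x[u, v] for v in alive):       # grow one vertex at a time
                ball = {v for v in alive if x[u, v] <= r}
                if cut(ball, alive) <= c * math.log(n + 1) * vol(u, r, ball, alive):
                    break
            clusters.append(ball)
            alive -= ball
        return clusters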

This algorithm clearly runs in polynomial time and terminates with a solution that satisfies the constraints. We must show that the resulting cost is not much more than the original fractional cost. Throughout the analysis section, we refer to the optimal integral solution as OPT. We also use FRAC(x_uv) and SOL(x_uv) to denote the fractional and rounded solutions of the variable x_uv in the linear program.

3.2.3. Positive edges

The termination condition on the region-growing procedure guarantees an O(log n) approximation to the cost of positive edges (between clusters). Let B be the set of balls found by our algorithm in step 3. Then

$$\begin{aligned}
w_p(SOL) &= \sum_{(u,v)\in E^+} c_{uv}\,SOL(x_{uv}) \\
&= \frac{1}{2}\sum_{B\in\mathcal{B}} cut(B) \\
&\le \frac{c}{2}\ln(n+1) \sum_{B\in\mathcal{B}} vol(B) \\
&\le \frac{c}{2}\ln(n+1)\left(\sum_{(u,v)\in E^+} c_{uv}\,FRAC(x_{uv}) + \sum_{B\in\mathcal{B}} \frac{F}{n}\right) \\
&\le \frac{c}{2}\ln(n+1)\,(w_p(FRAC) + F) \\
&\le c\ln(n+1) \times w_p(FRAC),
\end{aligned}$$

where the fourth line follows from the fact that the balls found by the algorithm are disjoint.

The rest of our analysis hinges on the fact that the balls returned by this algorithm have radius at most 1/c. This fact follows from the following known lemma [28]:

Lemma 3.2. For any vertex u and the family of balls B(u,r), the condition cut(B(u,r)) ≤ c ln(n+1) × vol(B(u,r)) is achieved for some r ≤ 1/c.

3.2.4. Negative edges

As in Bejerano et al. [2], we can use this radius guarantee to bound the remaining component of our objective function. We see that our solution gives a c/(c−2)-approximation to the cost of negative edges (within clusters). Again, let B be the set of balls found by our algorithm in step 3. Then,

$$\begin{aligned}
w_m(FRAC) &= \sum_{(u,v)\in E^-} c_{uv}\,(1 - FRAC(x_{uv})) \\
&\ge \sum_{B\in\mathcal{B}}\ \sum_{(u,v)\in B\cap E^-} c_{uv}\,(1 - FRAC(x_{uv})) \\
&\ge \sum_{B\in\mathcal{B}}\ \sum_{(u,v)\in B\cap E^-} c_{uv}\,(1 - 2/c) \\
&\ge (1 - 2/c) \sum_{B\in\mathcal{B}}\ \sum_{(u,v)\in B\cap E^-} c_{uv} = \frac{c-2}{c}\,w_m(SOL),
\end{aligned}$$

where the third line follows from the triangle inequality and the 1/c bound on the radius of the balls. The approximation ratio c/(c−2) is O(1) provided c > 2.

3.2.5. Overall approximation

Combining these results, we pay a total of

$$w(SOL) = w_p(SOL) + w_m(SOL) \le c\ln(n+1)\times w_p(OPT) + \frac{c}{c-2}\times w_m(OPT) \le \max\left(c\ln(n+1),\ \frac{c}{c-2}\right) w(OPT),$$

and thus we have an O(ln n) approximation, where the lead constant c is just slightly larger than 2.

3.3. Rounding in K_{r,r}-minor-free graphs

In K_{r,r}-minor-free graphs, we can use a theorem of Klein et al. [16] to round our linear program in a way that guarantees an O(r³) approximation to the cost of disagreements between clusters. The clusters produced by this rounding have radius at most 1/c, and thus the rest of the results from the previous section follow trivially. The theorem states that, in graphs with unit-length edges, there is an algorithm to find a "small" cut such that the remaining graph has "small" connected components:

Theorem 3.3 (Klein et al. [16]). In a graph G with weights on the edges and distances which satisfy the triangle inequality, one can find in polynomial time either a K_{r,r} minor or an edge cut of weight O(rU/δ) whose removal yields connected components of weak diameter4 O(r²δ), where U is the total weight of all edges in G.

As in the case of the region-growing technique, this theorem allows us to cluster the graph cheaply, subject to some radius guarantee. As this clustering cost is independent of n, this technique is typically applied in place of the region-growing technique to get better approximations for K_{r,r}-minor-free graphs (see, for example, Tardos and Vazirani [27] or Bejerano et al. [2]). In particular, this implies constant-factor approximations for various clustering problems on planar graphs.

The idea is as follows. Given a K_{r,r}-minor-free graph G with weights c_e and edge lengths x_e as defined by the linear program, we subdivide each edge e ∈ E^+(G) into a chain of ⌈k x_e⌉ edges of the same weight c_e for some appropriate k, yielding a new graph G′. (Note that we do not include the negative edges of G in G′.) We apply Theorem 3.3 to G′, getting an edge cut F′ which maps to an edge cut F in G of at most the same weight. This creates the natural correspondence between the resulting components of G and G′. Note that two nodes at distance d in G are at distance at least kd in G′. Hence, if we take δ = O(k/(cr²)), for a sufficient setting of the constants the components in G will have diameter at most 2/c.
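The subdivision step is mechanical; the following toy sketch (ours, with hypothetical tuple-named auxiliary vertices) builds the subdivided positive graph G′:

    import math

    def subdivide(pos_edges, x, k):
        """Replace each + edge (u, v) of LP length x[u, v] by a chain of
        ceil(k * x[u, v]) unit-length edges, each of the original weight."""
        new_edges, fresh = [], 0
        for u, v, wt in pos_edges:
            m = max(1, math.ceil(k * x[u, v]))     # number of unit pieces
            chain = [u] + [('aux', fresh + i) for i in range(m - 1)] + [v]
            fresh += m - 1
            new_edges += [(a, b, wt) for a, b in zip(chain, chain[1:])]
        return new_edges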

For any constant c > 2, we can bound the weight of the negative mistakes as in Section 3.2 by (c/(c−2)) w_m(FRAC). The weight of the positive mistakes is simply the weight of the cut F, and so, by Theorem 3.3, we just need to bound the total weight U′ of the graph G′. Let U = Σ_{e∈E^+(G)} c_e be the total weight of positive edges in G, and recall that vol(G) = Σ_{e∈E^+(G)} c_e x_e. Then

$$U' = \sum_{e\in G'} c_e = \sum_{e\in E^+(G)} \lceil k x_e \rceil\, c_e \le \sum_{e\in E^+(G)} (k x_e + 1)\, c_e = k\,vol(G) + U.$$

By Theorem 3.3, the weight of F, which equals the weight of F′, is O(rU′/δ) = O(r³(k vol(G) + U)/k). Taking k = U/vol(G), this becomes O(r³ vol(G)) and is thus an O(r³) approximation of the cost of disagreements between clusters, as desired. The size of G′ may be pseudo-polynomial in the size of G. However, the algorithm of Klein et al. [16] consists of r breadth-first searches of G′, and these can be implemented without explicitly subdividing G. Thus, the algorithm runs in polynomial time.

4 The weak diameter of a connected component in a modified graph is the maximum distance between two vertices in that connected component as measured in the original graph. For our purposes, distances are computed according to the x_uv, which satisfy the triangle inequality and are defined on all pairs of vertices, so the weak diameter equals the diameter.

4. Relation to multicut

In this section, we prove that the correlation-clustering problem is equivalent to the multicut problem, thus yielding another O(log n) approximation for our problem as well as some hardness results.

Definition 4.1. The weighted multicut problem is the following: given an undirected graph G, a weight function w on the edges of G, and a collection of k pairs of distinct vertices ⟨s_i, t_i⟩ of G, find a minimum-weight set of edges of G whose removal disconnects every s_i from the corresponding t_i.

The multicut problem was first stated by Hu in 1963 [12]. For k = 1, the problem coincides, of course, with the ordinary min-cut problem. For k = 2, it can also be solved in polynomial time by two applications of a max-flow algorithm [31]. The problem was proven NP-hard and APX-hard for any k ≥ 3 by Dahlhaus et al. [5]. The best known approximation ratio for weighted multicut in general graphs is O(log k) [9]. For planar graphs, Tardos and Vazirani [27] give an approximate max-flow min-cut theorem and an algorithm with a constant approximation ratio. For trees, Garg et al. [10] give an algorithm with an approximation ratio of 2.

4.1. Reduction from correlation clustering to weighted multicut

We now show that finding an optimal clustering is equivalent to finding a minimum-weight edge set that covers all erroneous cycles. An erroneous cycle is a simple cycle containing exactly one negative edge. An edge covers a cycle if the edge is contained in the cycle, i.e., removing the edge breaks the cycle. Fig. 2 illustrates the equivalence. Guided by this observation, we define a multicut problem derived from our original graph by replacing the negative edges with source-sink pairs (and some other required changes). We show that a solution to the newly formed multicut problem induces a solution to the clustering problem with the same weight, and that an optimal solution to the multicut problem induces an optimal solution to the clustering problem.

These reductions imply that the O(log k)-approximation algorithm for the multicut problem [9] induces an O(log n)-approximation algorithm for the correlation-clustering problem. We prove this for weighted general graphs, which implies the same result for unweighted general graphs. We start by proving two simple lemmata. We call a clustering consistent if it contains no mistakes, i.e., if there are no cut positive edges or uncut negative edges.

Lemma 4.2. A graph contains no erroneous cycles if and only if it has a consistent clustering.

[Fig. 2. Two optimal clusterings for G. For both of these clusterings we have removed two edges (different edges) so as to eliminate all the erroneous cycles in G. After the edges were removed, every connected component of G^+ is a cluster. Note that the two clusterings are consistent; no positive edges connect two clusters and no negative edges connect vertices within the same cluster.]

Proof. Let G be a graph with no erroneous cycles, and let S be the set of connected components of G^+. We argue that S is a consistent clustering. Assume that (u,v) is a negative mistake in S, i.e., e(u,v) = − and u ∈ C(v). Because u and v belong to the same cluster, they are in the same connected component of G^+. It follows that there is a simple path of positive edges from u to v. This path, combined with the negative edge (u,v), creates an erroneous cycle, contradicting our assumption on G. From the construction of S (the connected components of G^+) it is obvious that S contains no positive mistakes either, and so S is a consistent clustering.

Let S be a consistent clustering on a graph G. We claim that G contains no erroneous cycles. Assume (v_1, v_2, ..., v_k) is an erroneous cycle of length k. Without loss of generality, let (v_1, v_2) be the negative edge in the cycle. If v_1 ∈ C(v_2), then we have a negative mistake, which means S is not consistent. If v_1 ∉ C(v_2), then there must exist a point 2 ≤ i ≤ k along the cycle (indices taken modulo k) such that v_i ∉ C(v_{i+1}), which in turn means that the edge (v_i, v_{i+1}) is a positive mistake. □

Lemma 4.3. Let F be a set of edges of minimum weight such that G\F contains no erroneous cycles, and let T be a set of edges on which an optimal clustering makes mistakes. Then the weight of F equals the weight of T.

Proof. Let S be a clustering on G, and let T be the set of mistakes made by this clustering. Then S is a consistent clustering on G\T, and so the result follows from Lemma 4.2. □
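Lemma 4.2 is effectively algorithmic; the following small sketch (ours; the function name is an assumption) tests for a consistent clustering, and hence for the absence of erroneous cycles, with a union-find over G^+:

    def has_consistent_clustering(n, pos_edges, neg_edges):
        """True iff taking the connected components of G+ as clusters
        leaves no negative edge inside a component (Lemma 4.2)."""
        parent = list(range(n))                    # union-find over G+
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]      # path halving
                a = parent[a]
            return a
        for u, v in pos_edges:
            parent[find(u)] = find(v)
        # an uncut negative edge would close an erroneous cycle
        return all(find(u) != find(v) for u, v in neg_edges)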

We give a reduction from the problem of correlation clustering to the weighted multicut problem. The reduction translates an instance of unweighted correlation clustering into an instance of unweighted graph multicut, and an instance of weighted correlation clustering into an instance of weighted graph multicut. Refer to Fig. 3 for an example, and to the code sketch after the list below. Given a weighted graph G whose edges are labelled {+, −}, we construct a new graph H_G and a collection of source-sink pairs S_G = {⟨s_i, t_i⟩} as follows:
• For every negative edge (u,v) ∈ E^− we introduce a new vertex v_{u,v}, a new edge (v_{u,v}, u) with weight equal to that of (u,v), and a source-sink pair ⟨v_{u,v}, v⟩.
• Let V_new denote the set of new vertices, E_new the set of new edges, and S_G the set of source-sink pairs. Let V′ = V ∪ V_new, E′ = E^+ ∪ E_new, and H_G = (V′, E′). The weight of the edges in E^+ remains unchanged. We now have a multicut problem on (H_G, S_G).
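A minimal sketch of this construction (ours; the tuple naming of the new vertices v_{u,v} is an assumption):

    def clustering_to_multicut(pos_edges, neg_edges):
        """Build (H_G, S_G): E+ is kept as-is; each negative edge (u, v, wt)
        becomes a new vertex v_uv, an edge (v_uv, u) of weight wt, and a
        source-sink pair (v_uv, v)."""
        new_edges = list(pos_edges)
        pairs = []
        for u, v, wt in neg_edges:
            v_uv = ('aux', u, v)                  # the new vertex v_{u,v}
            new_edges.append((v_uv, u, wt))       # weight equals that of (u, v)
            pairs.append((v_uv, v))               # source-sink pair
        return new_edges, pairs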

We claim that any solution to the multicut problem implies a solution to the correlation-clustering problem with the exact same value, and that an approximate solution to the former gives an approximate solution to the latter.

[Fig. 3. The original graph from Fig. 2 after the transformation.]

Theorem 4.4. (H_G, S_G) has a cut of weight W if and only if G has a clustering of weight W, and we can easily construct one from the other. In particular, an optimal clustering in G of weight W implies an optimal multicut in (H_G, S_G) of weight W, and vice versa.

The proof follows from the following two lemmas:

Lemma 4.5. Let S be a clustering on G with weight W. Then there exists a multicut T′ in (H_G, S_G) with weight W.

Proof. Let S be a clustering of G with weight W, where T is the set of mistakes made by S (w(T) = W). Let T′ = (T ∩ G^+) ∪ {(v_{u,v}, u) | (u,v) ∈ T ∩ G^−}. Note that w(T′) = w(T). We now argue that T′ is a multicut. Assume not; then there exists a pair ⟨v_{u,v}, v⟩ ∈ S_G and a path from v_{u,v} to v that contains no edge from T′. By construction, this implies that there exists a path from u to v in G^+\T which, together with the negative edge (u,v), forms an erroneous cycle in G\T. This is a contradiction since, by definition of T, G\T has a clustering with no mistakes. Note that the proof is constructive. □

Lemma 4.6. If T′ is a multicut in (H_G, S_G) of weight W, then there exists a clustering S in G of weight W.

Proof. We construct a set T from the cut T′ by replacing all edges in T′ that are in E_new with the corresponding negative edges in G, and define a clustering S by taking every connected component of G^+\T as a cluster. Note that T has the same total weight as T′. We argue that T is precisely the set of mistakes of S, and thus the weight of S is W. In other words, it suffices to prove that S is a consistent clustering on G\T. Assume not; then there exists an erroneous cycle in G\T (Lemma 4.2). Let (u,v) be the negative edge along this cycle. Then it follows that there is a path from v_{u,v} to v in H_G which does not traverse any edge in T′. But ⟨v_{u,v}, v⟩ is a source-sink pair, which contradicts T′ being a multicut. □

We have presented a mapping f from an instance x of the correlation-clustering problem to an instance f(x) of the multicut problem. We have also presented a mapping g from a feasible solution of f(x) (the multicut problem) to a feasible solution of x (the clustering problem) such that the value of the solution does not change. This means that if s is a solution to the multicut problem f(x) such that w(s) = O(α · w(OPT(Multicut))), then g(s) is a solution to the clustering problem such that w(g(s)) = O(α · w(OPT(Clustering))). We can now use the approximation algorithm of [9] to get an O(log k)-approximate solution to the multicut problem (k is the number of source-sink pairs), which translates into an O(log |E^−|) ≤ O(log n²) = O(log n) solution to the clustering problem. Note that this result holds for both weighted and unweighted graphs, and that the reduction of the unweighted correlation-clustering problem results in a multicut problem with unit capacities and demands.


4.2. Reduction from multicut to correlation clustering

In the previous section, we showed that every correlation-clustering problem can be presented (and approximately solved) as a multicut problem. We now show that the opposite is true as well: every instance of the multicut problem can be transformed into an instance of the correlation-clustering problem, and the transformation has the following properties: any solution to the correlation-clustering problem induces a solution to the multicut problem with lower or equal weight, and an optimal solution to the correlation-clustering problem induces an optimal solution to the multicut problem.

In the previous section, we could use one reduction for both the weighted version and the unweighted version. Here, we present two slightly different reductions: from unweighted multicut to unweighted correlation clustering, and from weighted multicut to weighted correlation clustering.

4.2.1. Reduction from weighted multicut to weighted correlation clustering

Given a multicut problem instance (an undirected graph H, a weight function w : E → R⁺ on the edges of H, and a collection of k pairs of distinct vertices S = {⟨s_1, t_1⟩, ..., ⟨s_k, t_k⟩} of H), we construct a correlation-clustering problem as follows (a code sketch follows the list):
• We start with G_H = H; all edge weights are preserved and all edges are labelled +.
• In addition, for every source-sink pair ⟨s_i, t_i⟩ we add to G_H a negative edge e_i = (s_i, t_i) with weight w(e_i) = Σ_{e∈H} w(e) + 1.
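A minimal sketch of this construction (ours; the function name and edge representation are assumptions):

    def multicut_to_clustering(edges, pairs):
        """Label every edge of H with '+', then add one heavy '-' edge
        per source-sink pair, heavier than all + edges combined."""
        labelled = [(u, v, wt, '+') for u, v, wt in edges]
        big = sum(wt for _, _, wt in edges) + 1
        labelled += [(s, t, big, '-') for s, t in pairs]
        return labelled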

Our transformation is polynomial, adds at most O(n²) edges, and increases the largest weight in the graph by a multiplicative factor of at most O(n²).

Theorem 4.7. A clustering on G_H with weight W induces a multicut on (H,S) with weight at most W. Similarly, a multicut in (H,S) with weight W induces a clustering on G_H with weight W.

Proof. Let S be a clustering on G_H of weight W. If S contains no negative mistakes, then the set of positive mistakes T is a multicut on H of weight W. If S contains a negative mistake, say (u,v), take one of the endpoints (u or v) and place it in a cluster of its own, thus eliminating this mistake. Because every negative edge has weight at least the sum of all positive edges, the gain from splitting the cluster exceeds the loss introduced by new positive mistakes. Repeating this step eliminates all negative mistakes, and the resulting clustering S′ on G_H has weight at most W.

Now, let T denote a multicut in (H,S) of weight W, and define the clustering S on G_H as the connected components of G_H^+\T. Note that S contains no negative mistakes and so has weight w(T) = W. □

Corollary 4.8. An optimal clustering in G_H induces an optimal multicut in (H,S).

Proof. This follows directly from the statements of the above theorem. □

4.2.2. Reduction from unweighted multicut to unweighted correlation clustering

Given an unweighted multicut problem instance (an undirected graph H and a collection of k pairs of distinct vertices S = {⟨s_1, t_1⟩, ..., ⟨s_k, t_k⟩} of H), we construct an unweighted correlation-clustering problem as follows; refer to Fig. 4 and to the code sketch after the list.
• For every v such that ⟨v,u⟩ ∈ S or ⟨u,v⟩ ∈ S (i.e., v is either a source or a sink), we add n−1 new vertices and connect those vertices and v in a clique with positive edges (weight 1). We denote this clique by Q_v.
• For every pair ⟨s_i, t_i⟩ ∈ S, we connect all vertices of Q_{s_i} to t_i and all vertices of Q_{t_i} to s_i using edges labelled −.
• Other vertices of H are added to the vertex set of G_H, and edges of H are added to the edge set of G_H and labelled +.
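A rough sketch of this construction (ours; the clique copies get hypothetical tuple names):

    def unweighted_multicut_to_clustering(n, edges, pairs):
        """All weights are 1; returns (positive_edges, negative_edges)."""
        pos = list(edges)
        neg = []
        terminals = {x for pair in pairs for x in pair}
        clique = {t: [t] + [('copy', t, i) for i in range(n - 1)]
                  for t in terminals}
        for t in terminals:                        # Q_t: positive clique on n vertices
            Q = clique[t]
            pos += [(a, b) for i, a in enumerate(Q) for b in Q[i + 1:]]
        for s, t in pairs:                         # Q_s -- t and Q_t -- s, all negative
            neg += [(a, t) for a in clique[s]]
            neg += [(b, s) for b in clique[t]]
        return pos, neg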

Our goal is to emulate the previous argument for weighted general graphs in the context of unweighted graphs. We do so by replacing the single edge of high weight with many unweighted negative edges. Our transformation needs polynomial time, and it adds at most n² vertices and at most O(n³) edges. We call a clustering pure if all vertices that belong to the same Q_v are in the same cluster and, for every pair ⟨v,w⟩ ∈ S, the cliques Q_v and Q_w are in different clusters.

Lemma 4.9. Given a clustering C on G_H, we can construct another clustering C′ on G_H such that C′ is pure and w(C′) ≤ w(C).

[Fig. 4. Transformation from the unit-capacity multicut problem (on the left) to the unweighted correlation-clustering problem (on the right).]

Proof. For every Q_v that is split amongst two or more clusters, we take all vertices of Q_v to form a new cluster. By doing so, we may be adding up to n−1 new mistakes (positive mistakes on the positive edges adjacent to v in the original graph), but merging these vertices into one cluster reduces the number of errors by at least n−1, since a clique on n vertices split between clusters has at least n−1 positive mistakes among its edges.

If two cliques Q_v and Q_w with ⟨v,w⟩ ∈ S lie in the same cluster, we can move one of them into a cluster of its own. As before, we may be introducing as many as n−1 new positive mistakes but simultaneously eliminating 2n negative mistakes. □

Theorem 4.10. A clustering on G_H with weight W induces a multicut on (H,S) with weight at most W. An optimal clustering in G_H of weight W induces an optimal multicut for (H,S) of weight W.

Proof. Given a clustering C on G_H, we first "fix" it using the technique of Lemma 4.9 to obtain a pure clustering C′ of no greater weight. Any mistake of a pure clustering must be a positive mistake, since its only negative edges run between clusters.

Let T be the set of positive mistakes of C′; we now show that T is a multicut on (H,S). No source-sink pair is in the same cluster because the clustering is pure, so removing the edges of T disconnects every source-sink pair. Thus, T is a multicut for (H,S).

Let OPT be the optimal clustering on G_H. Without loss of generality, we can assume that OPT is pure (otherwise, by Lemma 4.9, we can construct another clustering with no higher cost), and therefore it induces a multicut on (H,S). Conversely, let T denote the minimum multicut in (H,S). T induces a pure clustering on G_H as follows: take the connected components of G_H^+\T as clusters, and for every terminal v ∈ S add every node in Q_v to the cluster containing vertex v. It can easily be seen that this gives a pure clustering, and that the only mistakes of the clustering are the edges in T. The result follows. □

4.3. Remarks

The two-way reduction we just presented proves that the correlation-clustering problem and the multicut problem are essentially identical. This reduction also allows us to transfer hardness-of-approximation results from one problem to the other. Because the multicut problem is APX-hard and remains APX-hard even in the unweighted case, the unweighted correlation-clustering problem is also APX-hard.

An interesting observation comes from the constant-factor approximation for the correlation-clustering problem on the unweighted complete graph from [1]. This result implies that the corresponding unweighted multicut problem, in which every pair u,v of nodes is either an edge or a source-sink pair, has a constant-factor approximation.

On the other hand, correlation-clustering problems where G^+ is a planar graph or has a tree structure have a constant-factor approximation (as follows from [27,10]).

5. Use of correlation clustering

In this section, we motivate correlation clustering by describing several clustering situations where it is useful.

5.1. When we have attraction/rejection data

The correlation-clustering approach gives us a natural and unique way to solve problems that have more than one distance measure, or problems that need to balance two possibly contradicting measures. For example, suppose we have a set of guests, each of whom has preferences for people they would like to sit with and for people they would prefer to avoid. We must group the guests into tables in a way that enhances the amicability of the event. Using the correlation-clustering setting we can solve this problem and get a provable approximation bound.

5.2. As a way of solving constraint-clustering problems

There has been a great deal of work on solving clustering problems subject to constraints [18,29,30,15]. In a constraint-clustering problem we have the following inputs:

(1) A set of objects, a distance measure between them, and an objective function to be minimized (the traditional clustering setting).
(2) A set of pairwise constraints among the objects.

One seeks a clustering that minimizes the objective function and satisfies the set of constraints simultaneously. One common approach to constraint-clustering problems is to use a clustering/search algorithm (e.g., k-means [18], k-median, hierarchical clustering) in the space of all feasible clusterings (those respecting the constraints). This approach treats the constraints as hard and consistent (they must and can be simultaneously satisfied), and assumes that one can search through the space of feasible clusterings [29,30,15].

We suggest a different approach: instead of treating a constraint-clustering problem as an unconstrained clustering problem in a trimmed search space, we look at the problem as a soft constraint-satisfaction problem. We convert the distance between any two items into a constraint, combine this set of constraints with the original set of constraints, and solve the resulting constraint-satisfaction problem. Restructuring the problem as a constraint-satisfaction problem allows us to use the correlation-clustering algorithm and get a provable approximation result, as opposed to the search heuristics used in previous constrained-clustering algorithms.

5.3. As a way to determine the number of clusters

Traditional clustering methods use positive distance or similarity measures and seek to minimize the distance or maximize the similarity of elements within the same cluster. To avoid converging to the trivial solutions of putting all elements in the same cluster (maximize similarity) or putting each element in a cluster of its own (minimize distances within a cluster), one usually limits the number of clusters (e.g., k-means, k-median, etc.). Correlation clustering suggests a different approach. Given a distance measure, we set a threshold, where all distances below the threshold represent attraction (the elements are close; they should be in the same cluster) and distances above the threshold represent rejection (the elements are far apart; they should not be in the same cluster). We create a graph where each item is a node and between any two elements we have an edge. We label these edges + if the distance between the elements is below the threshold and − if it is above the threshold. We set the weight of the edge according to the original distance between the elements: small distances become positive labelled edges with large weight, whereas large distances become negative labelled edges with large weight. We now solve the correlation-clustering problem on the resulting graph.
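As a concrete illustration, the recipe above takes only a few lines (our sketch; the paper asks only that the sign come from the threshold and that the weight grow with the distance from it, so the particular choice |threshold − d| is an assumption):

    def distances_to_signed_graph(dist, threshold):
        """dist: dict mapping pairs (u, v) to distances. Distances below the
        threshold become + edges, those above become - edges; either way the
        weight grows with the distance from the threshold."""
        edges = []
        for (u, v), d in dist.items():
            label = '+' if d < threshold else '-'
            edges.append((u, v, abs(threshold - d), label))
        return edges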


One advantage of this approach is that choosing a threshold can be much more meaningful than choosing the number of clusters. The threshold is a property of the distance measure; it is just our notion of "near" and "far". It is independent of the actual data and therefore, unlike an arbitrary number of clusters k, may have a "real" meaning. Another possible advantage of this approach is that the correlation-clustering algorithm can also handle outliers easily. Outliers will usually get a cluster of their own, and when the algorithm ends we can simply discard small isolated clusters.

5.4. When we have only partial pairwise measures

Most k-based algorithms assume points are positioned in a metric space, or at least that we have a known weight for every pair of items. This is usually not the case in real-world problems. When using the correlation-clustering framework, we can work with any subset of pairwise distance measures and simply omit the unknown relations from the graph.

6. Conclusion

In this paper, we have investigated the problem of minimizing disagreements in the correlation-clustering problem. We gave an O(log n) approximation for general graphs and an O(r³) approximation for K_{r,r}-minor-free graphs, and showed that the natural linear-programming formulation upon which these algorithms are based has a gap of Ω(log n) in general graphs. We also showed that this problem is equivalent to minimum multicut, yielding another implicit O(log n) approximation as well as an APX-hardness result.

A natural extension of this work would be to improve the approximation factor for minimizing disagreements. Given our hardness result and the history of the minimum-multicut problem, this goal is probably very difficult. Another option is to improve the lower bound, but for the same reason, this goal is probably very difficult. On the other hand, one might try to design an alternate O(log n)-approximation algorithm that is combinatorial, perhaps by counting erroneous cycles in a cycle cover of the graph.

Another interesting direction is to explore other objective functions of the correlation-clustering problem. Bansal et al. [1] give a polynomial-time approximation scheme (PTAS) for maximizing agreements in unweighted complete graphs. For weighted general graphs, one of the two trivial clusterings (every vertex in a distinct cluster, or all vertices in a single cluster) is a 1/2-approximation. Recently, Swamy [26] gave a 0.7666-approximation algorithm for unweighted general graphs based on semidefinite programming. It would be interesting to obtain better approximations, in particular a PTAS, for general graphs. Bansal et al. [1] also mention the objective of maximizing agreements minus disagreements. This objective is of particular practical interest. However, there are no known approximation algorithms for this objective, even for complete graphs.

Finally, it would be interesting to apply the techniques presented here to other problems. The region-growing technique and the Klein et al. [16] rounding technique both provide a radius guarantee on the output clusters. Many papers have used this radius guarantee to demonstrate that the solution is feasible, i.e., satisfies the constraints. Inspired by Bejerano et al. [2], we also use the radius guarantee to bound the approximation factor. This idea might be applicable to other problems.

Acknowledgments

Many thanks go to Avrim Blum, Shuchi Chawla, Mohammad Mahdian, David Liben-Nowell, and Grant Wang for helpful discussions. Many results in this paper were inspired by conversations with Seffi Naor.

References

[1] N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238–250.
[2] Y. Bejerano, N. Immorlica, J. Naor, M. Smith, Efficient location area planning for personal communication systems, in: Proc. Ninth Annu. Internat. Conf. on Mobile Computing and Networking, San Diego, CA, September 2003, pp. 109–121.
[3] M. Charikar, V. Guruswami, A. Wirth, Clustering with qualitative information, in: Proc. 44th Annu. IEEE Symp. on Foundations of Computer Science, 2003, pp. 524–533.
[4] M. Charikar, A. Wirth, Maximizing quadratic programs: extending Grothendieck's inequality, in: Proc. 45th Annu. IEEE Symp. on Foundations of Computer Science, 2004, pp. 54–60.
[5] E. Dahlhaus, D.S. Johnson, C.H. Papadimitriou, P.D. Seymour, M. Yannakakis, The complexity of multiterminal cuts, SIAM J. Comput. 23 (4) (1994) 864–894.
[6] E.D. Demaine, N. Immorlica, Correlation clustering with partial information, in: Proc. Sixth Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Princeton, NJ, August 2003, pp. 1–13.
[7] D. Emanuel, A. Fiat, Correlation clustering—minimizing disagreements on arbitrary weighted graphs, in: Proc. 11th Annu. European Symp. on Algorithms, 2003, pp. 208–220.
[8] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, Clustering for mining in large spatial databases, KI-Journal 1 (1998) 18–24 (Special Issue on Data Mining, ScienTec Publishing).
[9] N. Garg, V.V. Vazirani, M. Yannakakis, Approximate max-flow min-(multi)cut theorems and their applications, SIAM J. Comput. 25 (2) (1996) 235–251.
[10] N. Garg, V.V. Vazirani, M. Yannakakis, Primal-dual approximation algorithms for integral flow and multicut in trees, Algorithmica 18 (1) (1997) 3–20.
[11] D.S. Hochbaum, D.B. Shmoys, A unified approach to approximation algorithms for bottleneck problems, J. ACM 33 (1986) 533–550.
[12] T.C. Hu, Multicommodity network flows, Oper. Res. 11 (1963) 344–360.
[13] K. Jain, V.V. Vazirani, Primal-dual approximation algorithms for metric facility location and k-median problems, in: Proc. 40th Annu. Symp. on Foundations of Computer Science, 1999, pp. 2–13.
[14] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu, An efficient k-means clustering algorithm: analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 881–892.
[15] D. Klein, S.D. Kamvar, C.D. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: Proc. 19th Internat. Conf. on Machine Learning, 2002, pp. 307–314.
[16] P.N. Klein, S.A. Plotkin, S. Rao, Excluded minors, network decomposition, and multicommodity flow, in: Proc. 25th Annu. ACM Symp. on Theory of Computing, 1993, pp. 682–690.
[17] T. Leighton, S. Rao, Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms, J. ACM 46 (6) (1999) 787–832.
[18] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Berkeley Symp. on Mathematical Statistics and Probability, 1967, pp. 281–297.
[19] M. Meila, D. Heckerman, An experimental comparison of several clustering and initialization methods, in: Conf. on Uncertainty in Artificial Intelligence, 1998, pp. 386–395.
[20] F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, Comput. J. 26 (4) (1983) 354–359.
[21] C.M. Procopiuc, Clustering problems and their applications, manuscript, Department of Computer Science, Duke University.
[22] O. Reingold, S.P. Vadhan, A. Wigderson, Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors, Electronic Colloq. on Computational Complexity (ECCC), Vol. 8 (18), 2001.
[23] M. Satchell, Hunting for good Will: will the real Shakespeare please stand up? U.S. News and World Report, July 24, 2000, pp. 71–72.
[24] L.J. Schulman, Clustering for edge-cost minimization, Electronic Colloq. on Computational Complexity (ECCC), Vol. 6 (035), 1999.
[25] M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, in: KDD-2000 Workshop on Text Mining, 2000.
[26] C. Swamy, Correlation clustering: maximizing agreements via semidefinite programming, in: Proc. 15th Annu. ACM-SIAM Symp. on Discrete Algorithms, New Orleans, LA, 2004, Society for Industrial and Applied Mathematics, pp. 526–527.
[27] E. Tardos, V.V. Vazirani, Improved bounds for the max-flow min-multicut ratio for planar and K_{r,r}-free graphs, Inform. Process. Lett. 47 (2) (1993) 77–80.
[28] V.V. Vazirani, Approximation Algorithms, Springer, Berlin, 2001.
[29] K. Wagstaff, C. Cardie, Clustering with instance-level constraints, in: Proc. 17th Internat. Conf. on Machine Learning, 2000, pp. 1103–1110.
[30] K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in: Proc. 18th Internat. Conf. on Machine Learning, 2001, pp. 577–584.
[31] M. Yannakakis, P.C. Kanellakis, S.C. Cosmadakis, C.H. Papadimitriou, Cutting and partitioning a graph after a fixed pattern, in: Proc. 10th Internat. Colloq. on Automata, Languages and Programming, 1983, pp. 712–722.
