ajinn

Practice: Data Science - Pandas Pandas Pandas

 

import numpy as np

n = int(input())

# Initial centroids given by the task
c1 = np.array([0, 0])
c2 = np.array([2, 2])

X1 = []  # points assigned to centroid (0, 0)
X2 = []  # points assigned to centroid (2, 2)

for _ in range(n):
    point = [float(x) for x in input().split()]
    p = np.array(point)
    # Ties go to (0, 0), hence <= rather than <
    if np.linalg.norm(p - c1) <= np.linalg.norm(p - c2):
        X1.append(point)
    else:
        X2.append(point)

# Print the new centroid of each cluster, or None if the cluster is empty.
# Checking the length (not .any()) keeps a cluster of all-zero points valid.
for cluster in (X1, X2):
    if len(cluster) == 0:
        print('None')
    else:
        print(np.around(np.mean(np.array(cluster), axis=0), 2))

===============

Data Science - Pandas Pandas Pandas

 

Finding the next centroid

 

Clustering, an unsupervised learning technique, involves repeatedly updating the centroid of each cluster. Here we find the next centroids for given data points and initial centroids.

 

Task

Assume that there are two clusters among the given two-dimensional data points, and that two points, (0, 0) and (2, 2), are the initial cluster centroids. Calculate the Euclidean distance between each data point and each centroid, assign each data point to its nearest centroid, then calculate the new centroid of each cluster. If there's a tie, assign the data point to the cluster with centroid (0, 0). If no data points were assigned to a given centroid, return None for that centroid.
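The assignment and update steps described above can be sketched as a small function. This is a minimal sketch using only the Python standard library; the name `next_centroids` and its signature are illustrative assumptions, not part of the original exercise.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def next_centroids(points, centroids=((0.0, 0.0), (2.0, 2.0))):
    """Assign each point to its nearest centroid and return the new centroids.

    Hypothetical helper: returns one entry per centroid, either the rounded
    mean of the assigned points or None when no point was assigned.
    """
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [dist(p, c) for c in centroids]
        # index() returns the first minimum, so ties go to the earlier centroid
        nearest = distances.index(min(distances))
        clusters[nearest].append(p)

    result = []
    for members in clusters:
        if not members:
            result.append(None)
        else:
            k = len(members)
            # zip(*members) groups the x values together and the y values together
            result.append([round(sum(coord) / k, 2) for coord in zip(*members)])
    return result


print(next_centroids([(1, 0), (0, 0.5), (4, 0)]))
# prints [[0.5, 0.25], [4.0, 0.0]]
```

Because (0, 0) is listed first, the tie rule falls out of `list.index` returning the first minimum; reordering the centroids would change which cluster wins a tie.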

 

Input Format

First line: an integer to indicate the number of data points (n)

Next n lines: two numeric values per line, representing a data point in two-dimensional space.

 

Output Format

Two lists for two centroids. Numbers are rounded to the second decimal place.

 

Sample Input

3

1 0

0 .5

4 0

 

Sample Output

[0.5 0.25]

[4. 0.]

Explanation

There are 3 data points, and we want to identify two clusters among them. The initial centroids are (0, 0) and (2, 2). The distances between the first data point (1, 0) and the two centroids are 1.0 and 2.24, rounded to the second decimal place. The first data point is closer to (0, 0), so it is assigned to the 0th cluster. Similarly, data point (0, .5) is closer to (0, 0) than to (2, 2), so it is also assigned to the 0th cluster, while (4, 0) is closer to (2, 2) and is assigned to the 1st cluster. To calculate the new centroids, take the mean of all data points in the 0th and 1st clusters, respectively. Hence the results are [0.5 0.25] and [4. 0.].
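The distances in this explanation can be checked with a few lines of Python. This is a sketch for verification only; the variable names are illustrative and the rounding is applied just for display, not for the comparison.

```python
from math import dist

centroids = [(0, 0), (2, 2)]
points = [(1, 0), (0, 0.5), (4, 0)]

rows = []
for p in points:
    d0, d1 = (dist(p, c) for c in centroids)
    cluster = 0 if d0 <= d1 else 1  # a tie goes to centroid (0, 0)
    rows.append((p, round(d0, 2), round(d1, 2), cluster))

for p, d0, d1, cluster in rows:
    print(p, d0, d1, "-> cluster", cluster)
# prints:
# (1, 0) 1.0 2.24 -> cluster 0
# (0, 0.5) 0.5 2.5 -> cluster 0
# (4, 0) 4.0 2.83 -> cluster 1
```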
