Basics of the K-means Algorithm in Python

Updated: 2021-01-25 15:01:39  Author: 十一

This article covers the basics of the K-means algorithm in Python. Interested readers may find it a useful reference.

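When learning to program, the best approach is to master basic usage first: however many concepts you study, they ultimately serve practical use. The K-means algorithm (K-均值算法 in Chinese) is a clustering algorithm from unsupervised learning. It is described in terms of three quantities: N, the number of samples; x, a sample; and c(j), the centroid of the j-th cluster. The example below briefly shows how to use it.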
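The paragraph above names N, x and c(j) but not the formula they belong to. Presumably it is the usual K-means objective: the sum of squared Euclidean distances from every sample to the centroid of its cluster, which the algorithm tries to minimize. Written with those symbols, plus the added notation k for the number of clusters and C_j for the set of samples in cluster j:

```latex
J = \sum_{j=1}^{k} \sum_{i=1}^{N} \mathbf{1}\left[x_i \in C_j\right] \left\lVert x_i - c(j) \right\rVert^2,
\qquad
c(j) = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```

Each iteration alternates between reassigning every x_i to its nearest c(j) and recomputing each c(j) as the mean of its cluster; neither step can increase J.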

Cluster analysis with the K-Means algorithm

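from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
iris = load_iris()  # assumed data source: the iris dataset
X, y = iris.data[:, :2], iris.target  # sepal length, sepal width; species labels
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
centers = km.cluster_centers_  # one row per cluster center
print(centers)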

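The coordinates of the three cluster centers are (the row order may differ between runs, because cluster indices are arbitrary):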

[[5.006      3.428     ]
 [6.81276596 3.07446809]
 [5.77358491 2.69245283]]
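Given the centers, each sample's cluster is simply its nearest center, and the quality of a clustering is measured by the within-cluster sum of squared distances. The sketch below computes both with NumPy broadcasting; `assign_and_score` is a hypothetical helper name and the toy data is made up. For a fitted scikit-learn model, `km.predict(X)` and `km.inertia_` should give the corresponding labels and sum of squared distances.

```python
import numpy as np


def assign_and_score(X, centers):
    """Assign each row of X to its nearest center and return (labels, J).

    J is the within-cluster sum of squared Euclidean distances,
    the quantity K-means tries to minimize.
    """
    # (N, 1, d) - (1, k, d) -> (N, k, d); summing squares over d gives (N, k)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)  # index j of the nearest centroid c(j)
    J = sq_dist[np.arange(len(X)), labels].sum()
    return labels, J


# Hypothetical toy data: two points near each of two centers
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
centers = np.array([[0.0, 0.5], [5.0, 5.5]])
labels, J = assign_and_score(X, centers)
print(labels, J)  # [0 0 1 1] 1.0
```

Every toy point lies 0.5 from its center, so J = 4 × 0.25 = 1.0.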

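Compare the K-Means clustering result with the actual species labels of the samples (left: true labels; right: predicted clusters):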

import matplotlib.pyplot as plt

predicted_labels = km.labels_  # cluster index assigned to each sample
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
# Left panel: true species labels; right panel: K-means cluster labels
axes[0].scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
                edgecolor='k', s=150)
axes[1].scatter(X[:, 0], X[:, 1], c=predicted_labels, cmap=plt.cm.Set1,
                edgecolor='k', s=150)
axes[0].set_xlabel('Sepal length', fontsize=16)
axes[0].set_ylabel('Sepal width', fontsize=16)
axes[1].set_xlabel('Sepal length', fontsize=16)
axes[1].set_ylabel('Sepal width', fontsize=16)
axes[0].tick_params(direction='in', length=10, width=5, colors='k', labelsize=20)
axes[1].tick_params(direction='in', length=10, width=5, colors='k', labelsize=20)
axes[0].set_title('Actual', fontsize=18)
axes[1].set_title('Predicted', fontsize=18)
plt.show()
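In the two panels the colors need not line up, because K-means numbers its clusters arbitrarily: cluster 0 is not necessarily species 0. One simple way to quantify the agreement is to try every relabeling of the cluster indices and keep the best accuracy. This is a minimal sketch (`best_match_accuracy` is a hypothetical name) that brute-forces all k! permutations, so it only suits small k; scikit-learn's `sklearn.metrics.adjusted_rand_score` is a permutation-invariant alternative.

```python
from itertools import permutations

import numpy as np


def best_match_accuracy(y_true, y_pred, k):
    """Fraction of samples whose cluster matches the true class after the
    best relabeling of cluster indices (brute force over all k! permutations)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    best = 0.0
    for perm in permutations(range(k)):
        # perm[c] is the class index assigned to cluster c
        remapped = np.array(perm)[y_pred]
        best = max(best, float(np.mean(remapped == y_true)))
    return best


# Hypothetical labels: a perfect clustering whose cluster ids 0 and 1 are swapped
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
print(best_match_accuracy(y_true, y_pred, 3))  # 1.0
```

With the variables used above it would be called as `best_match_accuracy(y, predicted_labels, 3)`.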

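Extended content: a k-means implementation from scratch (run it with the number of clusters k as the command-line argument):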

# -*- coding: utf-8 -*-
"""Exercise 9.4: k-means on watermelon dataset 4.0 (Python 3)."""
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Columns 密度 (density) and 含糖率 (sugar content) of watermelon dataset 4.0
data = pd.read_csv(filepath_or_buffer='../dataset/watermelon4.0.csv', sep=',')[["密度", "含糖率"]].values

########################################## K-means #######################################
k = int(sys.argv[1])  # number of clusters, passed on the command line
# Randomly choose k samples from data as the initial mean vectors
# (random.sample needs a sequence, so convert the NumPy array to a list of rows)
mean_vectors = random.sample(list(data), k)


def dist(p1, p2):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((p1 - p2) ** 2))


while True:
    print(np.array(mean_vectors))  # current mean vectors, one row per cluster
    # Start each iteration with k empty clusters
    clusters = [[] for _ in range(k)]
    # Assignment step: put each sample into the cluster of its nearest mean vector
    for sample in data:
        distances = [dist(sample, m) for m in mean_vectors]
        min_index = distances.index(min(distances))
        clusters[min_index].append(sample)
    # Update step: recompute each mean vector from its cluster
    new_mean_vectors = []
    for c, v in zip(clusters, mean_vectors):
        if not c:
            # Empty cluster: keep the old mean vector
            new_mean_vectors.append(v)
            continue
        new_mean_vector = np.mean(c, axis=0)
        # If the relative difference between the new and the old mean vector
        # is below 0.0001 in every dimension, do not update the mean vector
        if np.all(np.abs(new_mean_vector - v) / np.abs(v) < 0.0001):
            new_mean_vectors.append(v)
        else:
            new_mean_vectors.append(new_mean_vector)
    if np.array_equal(mean_vectors, new_mean_vectors):
        break
    mean_vectors = new_mean_vectors

# Show the clustering result, one color per cluster (supports k <= 7)
total_colors = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
colors = random.sample(total_colors, k)
for cluster, color in zip(clusters, colors):
    if not cluster:
        continue
    cluster = np.array(cluster)
    density, sugar_content = cluster[:, 0], cluster[:, 1]
    plt.scatter(density, sugar_content, c=color)
plt.show()
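For comparison, the same algorithm (Lloyd's iteration: assign, then average) can be written with NumPy broadcasting instead of per-sample Python loops. This is a minimal sketch: `kmeans`, `max_iter`, `tol` and `seed` are hypothetical names, and its stopping rule (no center moves by more than `tol` in absolute terms) differs slightly from the relative test in the exercise code above.

```python
import numpy as np


def kmeans(data, k, max_iter=100, tol=1e-4, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels).

    Initial centers are k distinct samples drawn at random, as in the
    exercise code; the loop stops when no center moves by more than tol.
    """
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every sample, shape (N,)
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Update step: mean of each cluster; keep the old center if a cluster is empty
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment so that the labels agree with the returned centers
    sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, sq_dist.argmin(axis=1)


# Hypothetical, well-separated toy data (not the watermelon dataset)
data = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                 [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
centers, labels = kmeans(data, k=2)
print(centers)
print(labels)
```

On the watermelon data it would be called as `centers, labels = kmeans(data, k)`. Broadcasting builds an (N, k, d) array of differences, which is fast for small datasets but uses memory proportional to N × k × d.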

