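Preface

I am not yet fluent in Tensorflow, so this lesson took me quite a while (I kept going back to the official API docs...). I suspect many people find it frustrating that a simple loop would solve the problem right away, yet loops in Python are not fast. Tensorflow also represents computation as a graph, so loops should be avoided wherever possible (running a loop generates a pile of graph nodes). Doing the work with tensor operations is what keeps the run time down. A side benefit of dropping the loops is that the code ends up looking very concise.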

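Training steps

1. Create the center data and the training data. (They are preset here so the example is easier to follow; normally they would be random values.)
2. Loop for the number of training iterations.
3. Predict which cluster each training point currently belongs to.
4. Update the centers.
5. Repeat 2~4 until the number of training iterations is reached.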
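The steps above can be sketched with ordinary Python loops, which is the style the rest of this post tries to avoid in Tensorflow. This is only a minimal illustration using the standard library: the names `kmeans` and `squared_distance` and the sample points are assumptions made up for this sketch, not part of the original code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, times):
    """Loop-based K-Means: predict a cluster per point, then update centers."""
    centers = [list(c) for c in centers]          # step 1: starting centers
    labels = []
    for _ in range(times):                        # step 2: training iterations
        # step 3: index of the nearest center for every point
        labels = [
            min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # step 4: move each center to the mean of its members
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:                           # an empty cluster keeps its center
                centers[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers, labels


# Hypothetical sample data, chosen only for this illustration.
points = [(0, 0), (2, 2), (10, 10), (12, 12), (20, 20), (22, 22)]
start = [(1, 1), (11, 11), (17, 17)]
centers, labels = kmeans(points, start, times=5)
print(centers)
print(labels)
```

The nested loops are easy to read, but every distance is computed one pair at a time, which is exactly the cost the tensor version is meant to remove.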

K-Means

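Code

Initialization

1. datas1, datas2 and datas3 are the initial points.
2. datas joins them together with concat.
3. center_datas holds the random center points (fixed here).
4. input_center is the input placeholder.
5. train_op is the training function (introduced later).
6. Plot the current center points (purple).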

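# init
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf  # TF 1.x API (tf.placeholder, tf.random_uniform)

# three groups of 10 random 2-D points in separate ranges
# (random ops are re-sampled on every session.run)
datas1 = tf.random_uniform([10, 2], minval=0, maxval=5, name="datas1")
datas2 = tf.random_uniform([10, 2], minval=10, maxval=15, name="datas2")
datas3 = tf.random_uniform([10, 2], minval=20, maxval=25, name="datas3")
datas = tf.concat([datas1, datas2, datas3], axis=0, name="datas")
# starting centers, fixed here instead of random
center_datas = np.array([[1, 1], [11., 11.], [17., 17.]])
input_center = tf.placeholder(tf.float32, shape=[center_datas.shape[0], 2], name="input_center")
# train() is defined in the "Training function" section
train_op = train(input_center, datas)
plt.plot(center_datas[:,0], center_datas[:,1], 'mo')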

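Distance function

Here I use broadcast_to to copy the training data out to the same size as center_datas, which makes the matrix subtraction straightforward. I then use reshape to turn the result into a three-dimensional array grouped by data point, and finally sum over the third dimension. The result is the (squared) distance from each training point to each center.

1. col_size is the total size of center_datas.
2. Flatten center_datas to one dimension of size col_size (it is expanded automatically in the later computation).
3. Use broadcast_to to expand each data coordinate (vector) to the right until it reaches col_size.
4. Compute the squared difference of the two.
5. Reshape to [number of data points, -1 (here col_size / 2), 2].
6. Sum over the third dimension to add up the differences of each coordinate (vector).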

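def distance(center_datas, data):
    # total number of center coordinates (K centers * 2)
    col_size = center_datas.shape[1] * center_datas.shape[0]
    # NOTE: under standard broadcasting rules [N, 2] does not broadcast to [N, 2K];
    # if your Tensorflow version rejects this, tf.tile(data, [1, K]) gives the same layout.
    diff = (tf.reshape(center_datas, [-1]) - tf.broadcast_to(data, [data.shape[0], col_size])) ** 2
    # regroup as [N, K, 2] and sum the coordinate axis -> squared distances [N, K]
    return tf.reduce_sum(tf.reshape(diff, [data.shape[0], -1, 2]), axis=2)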
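To check the shapes involved, the same distance matrix can be reproduced in NumPy. This is a sketch rather than the original implementation: `squared_distances` uses ordinary broadcasting on an added axis, `squared_distances_tiled` mirrors the flatten / expand / reshape layout described above with `np.tile` swapped in for the expansion step, and the sample arrays are made up for the illustration.

```python
import numpy as np


def squared_distances(centers, data):
    """Squared distance from every data point to every center, shape [N, K]."""
    # data[:, None, :] is [N, 1, 2]; centers[None, :, :] is [1, K, 2]
    diff = data[:, None, :] - centers[None, :, :]
    return np.sum(diff ** 2, axis=2)


def squared_distances_tiled(centers, data):
    """Same result via the flatten / tile / reshape layout."""
    k = centers.shape[0]
    # centers flattened to [2K]; data repeated to [N, 2K] as x, y, x, y, ...
    diff = (centers.reshape(-1) - np.tile(data, (1, k))) ** 2
    # regroup as [N, K, 2] and sum over the coordinate axis
    return diff.reshape(data.shape[0], -1, 2).sum(axis=2)


# Hypothetical sample values for the illustration.
centers = np.array([[1., 1.], [11., 11.], [17., 17.]])
data = np.array([[0., 0.], [10., 12.]])

d = squared_distances(centers, data)
print(d)                     # one row per data point, one column per center
print(np.argmin(d, axis=1))  # nearest center for each data point
```

Both functions return squared distances. That is enough for picking the nearest center, since taking the square root would not change which column is smallest.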

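Prediction function

1. Get the distance between each point and each center.
2. Take the index of the minimum along the second dimension, which is the currently nearest center.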

def predict(center_datas, data):
    distances = distance(center_datas, data)
    # index of the nearest center for each data point
    return tf.argmin(distances, axis=1)

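Update function

This uses the segment_mean function, which groups rows by the segment_ids argument and averages each group. It has one restriction: segment_ids must be sorted in ascending order, otherwise it raises an error.
Note: there is a bug here. If no point is predicted into a given cluster, segment_mean has nothing to average for it, so that cluster does not get an updated center.

1. Use negative to negate the values, then use top_k (which sorts from largest to smallest) to get the original indices in ascending order of cluster id.
2. Use gather to pick out the corresponding datas and sort_predicts.
3. Use segment_mean to compute the mean of each group (segment_ids=sort_predicts).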

def update(datas, predicts):
    # top_k of the negated ids sorts the cluster ids in ascending order
    indices = tf.nn.top_k(tf.negative(predicts), k=datas.shape[0]).indices
    sort_datas = tf.gather(datas, indices)
    sort_predicts = tf.gather(predicts, indices)
    return tf.math.segment_mean(sort_datas, sort_predicts, name="update_center")
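The sort-then-average idea, and the empty-cluster problem noted above, can be illustrated in NumPy. This is a rough analogue and not Tensorflow's implementation: `sorted_segment_mean` is a name invented for this sketch, it sorts with `np.argsort` instead of the negative / top_k trick, and it returns only the clusters that actually received points together with their ids.

```python
import numpy as np


def sorted_segment_mean(data, labels):
    """Sort rows by cluster id, then average each run of equal ids."""
    order = np.argsort(labels, kind="stable")   # ascending cluster ids
    sorted_data = data[order]
    sorted_labels = labels[order]
    # start offset and length of each run of equal ids in the sorted array
    ids, starts, counts = np.unique(
        sorted_labels, return_index=True, return_counts=True)
    sums = np.add.reduceat(sorted_data, starts, axis=0)
    return ids, sums / counts[:, None]


# Hypothetical data: three clusters are expected, but nothing lands in cluster 1.
data = np.array([[0., 0.], [20., 20.], [2., 2.], [22., 22.]])
labels = np.array([0, 2, 0, 2])

ids, means = sorted_segment_mean(data, labels)
print(ids)    # clusters that received at least one point
print(means)  # one mean per cluster listed in ids
```

Because cluster 1 never appears in `labels`, only two means come back for three expected clusters, so the caller has to decide what to do with the missing center.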

Training function

1. Predict.
2. Update the weights (the centers).

def train(center_datas, datas):
    predicts = predict(center_datas, datas)
    return update(datas, predicts)

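Training

1. Train until the configured number of iterations (times) is reached. numpy's repeat can be used here to restore the centers to their original size.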

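# train
# assumes `session` is an open tf.Session and `times` is the number of iterations
for time in range(times):
    center_datas = session.run(train_op, feed_dict={input_center: center_datas})
    # use numpy "repeat" or another method here to restore the original number of centers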
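One possible way to restore the centers to their original size between iterations is sketched below. The helper name `restore_center_count` is an assumption for this illustration, and it uses `np.vstack` to reuse the previous centers rather than `repeat`. It only covers the case where the highest-numbered clusters received no points, so the returned array has fewer rows than the placeholder expects.

```python
import numpy as np


def restore_center_count(new_centers, old_centers):
    """Pad new_centers with trailing rows of old_centers to match its shape."""
    missing = old_centers.shape[0] - new_centers.shape[0]
    if missing <= 0:
        return new_centers                      # nothing to restore
    # reuse the previous position of every center that was dropped
    return np.vstack([new_centers, old_centers[-missing:]])


# Hypothetical values: the third cluster received no points this round.
old = np.array([[1., 1.], [11., 11.], [17., 17.]])
shrunk = np.array([[2., 2.], [12., 12.]])

restored = restore_center_count(shrunk, old)
print(restored.shape)
print(restored)
```

In the loop above, a call like this would sit right after `session.run`, so that the array fed to `input_center` on the next iteration keeps its [3, 2] shape.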

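Results

In the figure above, purple marks the original center coordinates, green marks the updated cluster centers, and the remaining points are the prediction results.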

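Closing remarks

Changing direction and handling everything with tensors turned out to be quite rewarding. I originally implemented this in another language, where most steps could be finished quickly with a for loop, whereas tensors force you to think in terms of matrices. For me that was good practice. I will add a download of the complete code when I have more time. If you find mistakes or have questions, feel free to leave a comment or send me a private message.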

References and books

[1] https://www.tensorflow.org/api_docs/python/tf

[Notes] Tensorflow-Lesson5_K-Means Clustering Algorithm (K-Means)


About the author: site editor
