Python Algorithms Day 1 - Programming Basics & Introduction

Chap.0 Programming Basics & Introduction:

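Part 1. Languages commonly used to develop algorithms:

1-1. Python (free, many packages, good system integration)

1-2. R (free, many packages, poor system integration)

1-3. Matlab (expensive, fewer packages but full-featured, good system integration)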

Part 2. What can Python do?

- Program development
- Website development, crawlers
- Statistics, mathematics
- A first programming language for beginners
- System management scripts
- Data Science (focused on analyzing data)
- Data Mining Algorithms (focused on analyzing data)
- Deep Learning: Neural Network, CNN/RNN (focused on predicting from data)

Part 3. So where is AI applied?

- Natural Language Understanding
- Computer Vision
- Speech Understanding
- Robotic Application
- Intelligent Agent: chatbots, AlphaGo, etc.
- Self driving Car
- Medicine: MRI image processing, diagnosis, drug discovery, etc.
- Smart manufacturing, smart agriculture, smart financial management, etc.

Now that we know what these tools can do, let's get to the main topic.

Chap.I Theoretical Foundations:

With these capabilities and applications in mind, we start from the underlying mathematics, which includes:

Part 1: Linear algebra

Part 2: Differential & Integral calculus

Part 3: Vector

Part 4: Statistics & Probability

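Chap.II Deep Learning and Model Optimization:

Every predictive model goes through the 10 major steps shown in the figure below. This chapter explains how each step is applied, in order.

sklearn overview: how to choose a suitable algorithm

Depending on the scenario, machine learning (deep learning included) falls roughly into three types: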

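Part 1. Supervised learning:

The data is labeled, i.e. the correct answers are known.
Depending on the type of target, supervised learning is further divided into two kinds:

Classification:

The target takes one of a finite number of classes, and the task is to assign each sample to a class. Examples: Titanic survival, wine classification, etc.
Two examples follow: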

A. Iris classification:

```python
import pandas as pd
import numpy as np
from sklearn import datasets      # the datasets module of scikit-learn

# 1. Data Set
ds = datasets.load_iris()         # load_iris() loads the Iris dataset
print(ds.DESCR)                   # DESCR: a description of the loaded data
X = pd.DataFrame(ds.data, columns=ds.feature_names)
y = ds.target

# 2. Data clean (missing value check)
print(X.isna().sum())
# >> sepal length (cm)    0
#    sepal width (cm)     0
#    petal length (cm)    0
#    petal width (cm)     0
#    dtype: int64

# 3. Feature Engineering
# No need

# 4. Data Split (Training data & Test data)
from sklearn.model_selection import train_test_split
# test_size=0.2: 20% of the data is held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train.shape, y_train.shape)
# >> (120, 4) (120,)

# 5. Define and train the KNN model
from sklearn.neighbors import KNeighborsClassifier
# n_neighbors is a hyperparameter
clf = KNeighborsClassifier(n_neighbors=3)
# fit() trains the model; regression, classification, dimensionality reduction... all use fit()
clf.fit(X_train, y_train)
# score(): predicts on the test inputs and returns the mean accuracy
print(f'score={clf.score(X_test, y_test)}')
# >> score=0.9   (the split is random, so this varies between runs)

# Check the answers: true labels, then predictions
print(' '.join(y_test.astype(str)))
print(' '.join(clf.predict(X_test).astype(str)))
# >> 1 2 0 0 0 2 1 1 1 0 1 2 2 2 0 2 1 1 1 0 1 1 2 2 1 1 0 2 2 2
#    1 2 0 0 0 2 1 1 1 0 1 1 2 2 0 2 1 1 1 0 1 1 2 2 1 2 0 2 1 2

# Predicted class probabilities for the first 5 test samples
print(clf.predict_proba(X_test.head()))
# >> [[0. 1. 0.]
#     [0. 0. 1.]
#     [1. 0. 0.]
#     [1. 0. 0.]
#     [1. 0. 0.]]
```
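For a classifier, `score()` returns the mean accuracy: the fraction of test samples predicted correctly. As a quick sanity check, the sketch below recomputes it by hand from the two label rows printed above (copied as strings; with a different random split your rows will differ).

```python
# True labels and KNN predictions, copied from the printed output above
y_true = "1 2 0 0 0 2 1 1 1 0 1 2 2 2 0 2 1 1 1 0 1 1 2 2 1 1 0 2 2 2".split()
y_pred = "1 2 0 0 0 2 1 1 1 0 1 1 2 2 0 2 1 1 1 0 1 1 2 2 1 2 0 2 1 2".split()

def accuracy(true_labels, pred_labels):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)

# 0-based positions where the model was wrong
mistakes = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]

print(len(y_true), mistakes)      # 30 test samples and the misclassified positions
print(accuracy(y_true, y_pred))   # 27 correct out of 30
```

Three of the 30 test flowers are misclassified, and 27 / 30 gives exactly the `score=0.9` reported by sklearn.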

B. Breast cancer classification:

```python
import pandas as pd
import numpy as np
from sklearn import datasets

# 1. Dataset
ds = datasets.load_breast_cancer()
X = pd.DataFrame(ds.data, columns=ds.feature_names)
y = ds.target

# 2. Data clean
# no need

# 3. Feature Engineering
# no need

# 4. Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 5. Define and train the KNN model
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
# fit(X_train, y_train) trains the model; regression, classification, dimensionality reduction... all use fit()
clf.fit(X_train, y_train)
# score(): predicts on the test inputs and returns the mean accuracy
print(f'score={clf.score(X_test, y_test)}')
# >> score=0.9210526315789473

# Check the answers: true labels, then predictions
print(' '.join(y_test.astype(str)))
print(' '.join(clf.predict(X_test).astype(str)))
# >> 1 1 0 0 0 ... 0
#    1 1 0 0 0 ... 0

# Predicted class probabilities for the first 5 test samples
print(clf.predict_proba(X_test.head()))
# >> [[0. 1.]
#     [0. 1.]
#     [1. 0.]
#     [1. 0.]
#     [1. 0.]]
```
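With its default settings, `KNeighborsClassifier` labels a point by a majority vote among the `n_neighbors` closest training points (Euclidean distance), and `predict_proba` reports the share of those neighbours in each class. That is why, with `n_neighbors = 3`, every probability can only be 0, 1/3, 2/3 or 1; the all-0/1 rows above mean the 3 neighbours agreed. Below is a minimal from-scratch sketch of that idea on a made-up 2-D toy dataset; it is not sklearn's implementation and ignores its refinements (distance weighting, tree-based neighbour search, tie handling).

```python
import math
from collections import Counter

def knn_predict_proba(X_train, y_train, x, k=3):
    """Return {class: share of the k nearest neighbours with that class}."""
    # Euclidean distance from x to every training point, paired with its label
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(X_train, y_train)]
    # Labels of the k closest points
    nearest = [label for _, label in sorted(dists)[:k]]
    counts = Counter(nearest)
    return {c: counts[c] / k for c in sorted(set(y_train))}

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote: the class with the largest neighbour share."""
    proba = knn_predict_proba(X_train, y_train, x, k)
    return max(proba, key=proba.get)

# Hypothetical toy data: two well-separated groups
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_predict_proba(X_train, y_train, (1.2, 1.4)))  # all 3 neighbours are class 0
print(knn_predict(X_train, y_train, (6.8, 6.1)))
print(knn_predict_proba(X_train, y_train, (4.0, 4.0)))  # neighbours disagree: 2 vs 1
```

A point halfway between the groups gets a split vote, which is exactly the kind of row that would show 0.33/0.67 in sklearn's `predict_proba` output.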

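Regression:

When the target takes continuous values, predicting it by fitting a function of the inputs (in the simplest case, a straight line) is regression. Examples: house price prediction, tip prediction, etc.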

Figure: the principle of linear regression.
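Linear regression picks the line y = a·x + b that minimises the mean squared error between the line and the data. With a single input, the least-squares solution has a closed form: a = cov(x, y) / var(x) and b = mean(y) - a·mean(x). A minimal pure-Python sketch on a few made-up points:

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical data lying roughly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)
preds = [a * x + b for x in xs]
print(f"y = {a:.3f}x + {b:.3f}, MSE = {mse(ys, preds):.4f}")
```

`np.polyfit(xs, ys, 1)` solves the same least-squares problem and returns the coefficients as `[a, b]`, which is what the world-population example below relies on.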

Two examples follow:

A. World population regression:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 1. DataSet: world population (billions), 1950~2100
year = list(range(1950, 2101))   # 1950, 1951, ..., 2100 (151 years)
pop = [2.53, 2.57, 2.62, 2.67, 2.71, 2.76, 2.81, 2.86, 2.92, 2.97,
       3.03, 3.08, 3.14, 3.2, 3.26, 3.33, 3.4, 3.47, 3.54, 3.62,
       3.69, 3.77, 3.84, 3.92, 4.0, 4.07, 4.15, 4.22, 4.3, 4.37,
       4.45, 4.53, 4.61, 4.69, 4.78, 4.86, 4.95, 5.05, 5.14, 5.23,
       5.32, 5.41, 5.49, 5.58, 5.66, 5.74, 5.82, 5.9, 5.98, 6.05,
       6.13, 6.2, 6.28, 6.36, 6.44, 6.51, 6.59, 6.67, 6.75, 6.83,
       6.92, 7.0, 7.08, 7.16, 7.24, 7.32, 7.4, 7.48, 7.56, 7.64,
       7.72, 7.79, 7.87, 7.94, 8.01, 8.08, 8.15, 8.22, 8.29, 8.36,
       8.42, 8.49, 8.56, 8.62, 8.68, 8.74, 8.8, 8.86, 8.92, 8.98,
       9.04, 9.09, 9.15, 9.2, 9.26, 9.31, 9.36, 9.41, 9.46, 9.5,
       9.55, 9.6, 9.64, 9.68, 9.73, 9.77, 9.81, 9.85, 9.88, 9.92,
       9.96, 9.99, 10.03, 10.06, 10.09, 10.13, 10.16, 10.19, 10.22, 10.25,
       10.28, 10.31, 10.33, 10.36, 10.38, 10.41, 10.43, 10.46, 10.48, 10.5,
       10.52, 10.55, 10.57, 10.59, 10.61, 10.63, 10.65, 10.66, 10.68, 10.7,
       10.72, 10.73, 10.75, 10.77, 10.78, 10.79, 10.81, 10.82, 10.83, 10.84,
       10.85]
df = pd.DataFrame({'year': year, 'pop': pop})
x, y = np.array(year), np.array(pop)

# 2. Fit a degree-1 polynomial and compute its MSE (Mean-Square Error)
in_year = int(input('Please input 1950~2100 to calculate: '))
fit1 = np.polyfit(x, y, 1)            # [slope, intercept]
if 2100 >= in_year >= 1950:
    print('The actual pop is:', y[in_year - 1950])
    print('Predicted pop is:', f'{np.poly1d(fit1)(in_year):.2}')
    y1 = fit1[0] * x + fit1[1]        # fitted value for every year
    print('MSE is:', f'{((y - y1) ** 2).mean():.2}')
else:
    print('Wrong year!')

# 3. Plot fits of degree 1, 3 and 50
def ppf(x, y, order):
    fit = np.polyfit(x, y, order)     # least-squares polynomial fit, coefficients highest power first
    p = np.poly1d(fit)                # turn the coefficients into a callable polynomial
    t = np.linspace(1950, 2100, 2000)
    plt.plot(x, y, 'ro', t, p(t), 'b--')

plt.figure(figsize=(18, 4))
titles = ['fitting with 1', 'fitting with 3', 'fitting with 50']
for i, o in enumerate([1, 3, 50]):    # degree 50 deliberately overfits; NumPy may warn it is poorly conditioned
    plt.subplot(1, 3, i + 1)
    ppf(year, pop, o)
    plt.title(titles[i], fontsize=20)
plt.show()
```
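For reference, the polynomial that `np.polyfit(x, y, n)` fits by least squares and the MSE printed by the code can be written as follows, with $x_i$ the years, $y_i$ the population and $N = 151$ data points. For $n = 1$, `fit1[0]` is $p_0$ (the slope) and `fit1[1]` is $p_1$ (the intercept).

```latex
\hat{y}(x) = p_0 x^{n} + p_1 x^{n-1} + \cdots + p_{n-1} x + p_n,
\qquad
(p_0, \dots, p_n) = \arg\min_{p} \sum_{i=1}^{N} \bigl(y_i - \hat{y}(x_i)\bigr)^2

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}(x_i)\bigr)^2
```

In exact arithmetic, raising the degree can only lower this training MSE, which is why a low training error alone does not make the degree-50 curve a better model.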

B. Boston house price regression:

```python
import pandas as pd
import numpy as np
from sklearn import datasets

# 1. Dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this line only runs on scikit-learn < 1.2.
ds = datasets.load_boston()
X = pd.DataFrame(ds.data, columns=ds.feature_names)
y = ds.target

# 2. Data clean (missing value check)
print(X.isna().sum())

# 3. Feature Engineering
# no need

# 4. Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train.shape, y_train.shape)
# >> (404, 13) (404,)

# 5. Define and train the LinearRegression model
from sklearn.linear_model import LinearRegression
clf = LinearRegression()
# fit(X_train, y_train) trains the model
clf.fit(X_train, y_train)
# score(): for regressors this is R^2 on the test data
print(f'score={clf.score(X_test, y_test)}')
# >> score=0.6008214413101689   (varies with the random split)

# Check the answers: true prices, then predictions rounded to 2 significant digits
print(list(y_test))
b = [float(f'{i:.2}') for i in clf.predict(X_test)]
print(b)
# >> [30.3, 8.4, 17.4, 10.2, 12.8, ... 22.5]
#    [32.0, 4.6, 22.0, 6.2, 13.0, ... 29.0]
```
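Unlike the classifiers above, a regressor's `score()` is not an accuracy but the coefficient of determination R² = 1 - SS_res / SS_tot: 1 means perfect predictions, 0 means no better than always predicting the mean of y, and it can even be negative. A minimal sketch with made-up numbers:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y_true, [3.0, 5.0, 7.0, 9.0]))   # perfect predictions
print(r2_score(y_true, [6.0, 6.0, 6.0, 6.0]))   # always predicting the mean
print(r2_score(y_true, [2.5, 5.5, 6.5, 9.5]))   # small errors
```

Read this way, the `score=0.60` above means the linear model explains roughly 60% of the variance in the test prices.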

Part 2. Unsupervised learning:

Some or all of the data is unlabeled, i.e. there are no correct answers.

2-1. Clustering

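Grouping points whose features are similar is called clustering. It is conceptually close to classification, except that there are no labels: the groups are discovered from the data itself, as illustrated in the figure.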
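`KMeans` alternates two steps until the centres stop moving: assign every point to its nearest centre, then move each centre to the mean of its points (Lloyd's algorithm). A minimal pure-Python sketch, assuming plain random initial centres instead of sklearn's default `k-means++` and a made-up 2-D dataset; the returned `inertia` is the within-cluster sum of squares that sklearn exposes as `inertia_`.

```python
import math
import random

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j])) for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centres, labels, inertia)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)            # random initial centres (not k-means++)
    for _ in range(n_iter):
        labels = assign(points, centres)       # assignment step
        new_centres = []
        for j in range(k):                     # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # empty cluster: keep its old centre
        if new_centres == centres:             # centres stopped moving: converged
            break
        centres = new_centres
    labels = assign(points, centres)
    # Inertia (WCSS): sum of squared distances from each point to its centre
    inertia = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return centres, labels, inertia

# Hypothetical data: two well-separated groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centres, labels, inertia = kmeans(points, k=2)
print(labels, round(inertia, 3))
```

Because the result depends on the starting centres, sklearn reruns the algorithm `n_init` times and keeps the run with the lowest inertia.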

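Below is a CLV (customer lifetime value) clustering example:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# CLV.csv is assumed to be in the working directory
ds = pd.read_csv('CLV.csv')
print(ds.describe().T)   # summary statistics, one row per column
```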

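A. Manual choice of the number of clusters

Fit 1 to 10 clusters and compute the within-cluster sum of squares (WCSS) for each. WCSS generally keeps shrinking as clusters are added, so instead of taking the minimum, the elbow method picks the k at which the curve bends and additional clusters stop bringing much improvement.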

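```python
from sklearn.cluster import KMeans

# No y: clustering is unsupervised, so the first two columns are the only input
X = ds.iloc[:, [0, 1]].values

wcss = []   # within-cluster sum of squares for k = 1..10
for i in range(1, 11):
    km = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)   # inertia_ is the WCSS of the fitted model

plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('wcss')
plt.show()
```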

From the plot, 2, 4, or 10 clusters are all possible choices.
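Reading the elbow off a plot is a judgement call. One simple heuristic (an assumption here, not something sklearn provides) is to pick the k where the WCSS curve bends most sharply, i.e. where the second difference WCSS(k-1) - 2·WCSS(k) + WCSS(k+1) is largest. The WCSS values below are made up for illustration:

```python
def elbow_k(wcss, k_start=1):
    """Return the k with the largest second difference of the WCSS curve.

    wcss[i] is the WCSS for k = k_start + i. The first and last k cannot be
    elbows because they lack a neighbour on one side.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = k_start + i, bend
    return best_k

# Hypothetical WCSS values for k = 1..6
print(elbow_k([1000, 400, 300, 250, 220, 200]))
# Hypothetical WCSS values for k = 1..5 with a later bend
print(elbow_k([500, 450, 100, 90, 85]))
```

With sklearn, the `wcss` list built in the loop above can be passed in directly, since it also starts at k = 1.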

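B. Automatic choice of the number of clusters

Use sklearn's built-in silhouette_score to compute the Silhouette Coefficient for each k (higher is better, the maximum is 1).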

```python
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# The silhouette needs at least 2 clusters, so k starts at 2.
# No random_state is set, so the values may vary slightly between runs.
for n_cluster in range(2, 11):
    kmeans = KMeans(n_clusters=n_cluster).fit(X)
    label = kmeans.labels_
    sil_coeff = silhouette_score(X, label, metric='euclidean')
    print(f"n_clusters={n_cluster}, Silhouette Coefficient is {sil_coeff:.4}")
# >> n_clusters=2, Silhouette Coefficient is 0.4401
#    n_clusters=3, Silhouette Coefficient is 0.3596
#    n_clusters=4, Silhouette Coefficient is 0.3721
#    n_clusters=5, Silhouette Coefficient is 0.3617
#    n_clusters=6, Silhouette Coefficient is 0.3632
#    n_clusters=7, Silhouette Coefficient is 0.3629
#    n_clusters=8, Silhouette Coefficient is 0.3538
#    n_clusters=9, Silhouette Coefficient is 0.3441
#    n_clusters=10, Silhouette Coefficient is 0.3477
```

2 clusters has the highest silhouette coefficient (0.4401), so it gives the most distinct separation.
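The silhouette coefficient of one point is s = (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster; `silhouette_score` averages s over all points. A minimal pure-Python sketch (Euclidean distance, made-up 1-D data wrapped in tuples; like sklearn, it sets s = 0 for a point that is alone in its cluster):

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = set(labels)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Mean distance from p to each cluster's members, excluding p itself
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                 if l == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        if lab not in mean_dist:        # p is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[lab]
        b = min(v for c, v in mean_dist.items() if c != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two tight groups far apart
points = [(0,), (1,), (10,), (11,)]
print(silhouette(points, [0, 0, 1, 1]))   # sensible grouping: close to 1
print(silhouette(points, [0, 1, 0, 1]))   # mixed-up grouping: negative
```

A negative value means points sit closer to another cluster than to their own, which is why the largest score is taken as the best k.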

To visualize the clusters:

```python
# Fit k-means with k=8 to the dataset
km = KMeans(n_clusters=8, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_means = km.fit_predict(X)

# Visualise the clusters for k=8, one colour per cluster
colors = ['purple', 'blue', 'green', 'cyan', 'yellow', 'black', 'brown', 'red']
for k, c in enumerate(colors):
    plt.scatter(X[y_means == k, 0], X[y_means == k, 1], s=50, c=c, label=f'Cluster{k + 1}')
# Cluster centres drawn as large squares
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
            s=200, marker='s', c='red', alpha=0.7, label='Centroids')
plt.title('Customer segments')
plt.xlabel('Annual income of customer')
plt.ylabel('Annual spend from customer on site')
plt.legend()
plt.show()
```

Note: customer analysis usually relies on RFM (Recency-Frequency-Monetary) analysis.
Building such features belongs to step 3 of machine learning: Feature Engineering.
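RFM turns raw transactions into three features per customer: Recency (days since the last purchase), Frequency (number of purchases) and Monetary (total spend). A minimal sketch of that feature-engineering step, assuming a hypothetical list of (customer_id, date, amount) records and a fixed reference date; the resulting rows could then be clustered with `KMeans` instead of the raw columns.

```python
from datetime import date

def rfm(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_purchase, count, total = {}, {}, {}
    for cust, day, amount in transactions:
        last_purchase[cust] = max(day, last_purchase.get(cust, day))
        count[cust] = count.get(cust, 0) + 1
        total[cust] = total.get(cust, 0) + amount
    return {c: ((today - last_purchase[c]).days, count[c], total[c]) for c in count}

# Hypothetical transactions: (customer_id, purchase date, amount)
transactions = [
    ("A", date(2021, 9, 1), 120.0),
    ("A", date(2021, 9, 20), 80.0),
    ("B", date(2021, 6, 15), 300.0),
    ("A", date(2021, 9, 28), 50.0),
]
print(rfm(transactions, today=date(2021, 10, 1)))
```

Customer A buys often and recently, B once and long ago; in practice the three columns are usually scaled or binned into scores before clustering.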

Part 3. Reinforcement learning:

The algorithm learns on its own how to respond to its environment, by trial and error guided by rewards.
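There are no labels in reinforcement learning: the agent tries actions, receives rewards, and updates its estimates of how good each action is. A minimal tabular Q-learning sketch on a hypothetical 5-cell corridor (start in cell 0, reward 1 for reaching cell 4), using the standard update Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s', ·) - Q(s, a)). The environment and all parameter values are assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                        # move left / move right

def step(state, action):
    """Hypothetical corridor: walls at both ends, reward 1 on reaching GOAL."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore at random, otherwise take the best action (ties broken randomly)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update; the terminal state has no future value
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, steps = nxt, steps + 1
    return q

q = q_learning()
policy = ["R" if q[(s, +1)] > q[(s, -1)] else "L" for s in range(GOAL)]
print(policy)   # learned greedy action in cells 0..3
```

Nobody tells the agent that "right" is correct; it discovers this because the reward propagates backwards through the Q-values, discounted by γ at each step.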

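Conclusion:

Since this series is for beginners, we will first focus on **supervised learning** and **unsupervised learning**.
That wraps up the programming basics and introduction; the next post starts with the theoretical foundations.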

Homework: Tip regression:

Using the datasets built into sklearn, follow the steps above to complete a regression or classification on each of the following:

1. Wine classification

2. Diabetes regression

3. Tip regression

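Supplement: recommended introductory books

1. Introducing Python (Bill Lubanovic) + GitHub

2. Python Data Science Handbook (Jake VanderPlas) + GitHub

3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

4. Deep Learning (Ian Goodfellow): unless you are doing academic work, it is very hard going; better skip it...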

About the author: site editor
