Paper Guide: Replanting Your Forest: NVM-friendly Bagging Strategy for Random Forest

Introduction

Non-volatile memory (NVM)

Advantages:

higher cell density -> more data can be stored in the same space
lower power consumption -> data is retained without periodic refreshing
read performance comparable to RAM

Disadvantages:

limited write endurance -> short write lifetime
The asymmetric properties of NVM read/write operations may also largely limit the feasibility of performing machine learning algorithms directly on NVM

NVM technologies that may replace DRAM

spin-transfer torque magnetic random access memory (STT-RAM or STT-MRAM)
resistive random access memory (ReRAM)
phase-change memory (PCM)

Random forest (or random decision forest)

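Application
Random forest is a supervised ensemble learning algorithm for both classification and regression problems, which constitute the majority of machine learning applications and systems nowadays.

Bagging process
A certain number of data samples are randomly selected from the training dataset to form and train decision trees. The bagging process usually runs for several rounds to form a forest of decision trees. Notably, the data samples selected in each round of bagging are put back and may be selected again in the next round. Bagging guarantees the randomness of the multiple decision trees.

Vote
After multiple independent decision trees have been built, a random forest takes a majority vote for classification tasks, or averages the estimates of all individual decision trees for regression problems, to obtain a more accurate and stable prediction.

Advantages
When there are enough subtrees, a random forest usually does not over-fit the model or training dataset.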
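The bagging and voting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `train_stump` is a hypothetical single-threshold classifier standing in for a real decision tree, and the toy dataset, function names, and parameters are all invented for the example.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, n_samples, rng):
    """One bagging round: draw n_samples items WITH replacement."""
    return rng.choices(dataset, k=n_samples)


def train_stump(samples):
    """Hypothetical stand-in for a decision tree: pick the threshold and
    left-side label that misclassify the fewest (x, label) samples."""
    best = None
    for threshold, _ in samples:
        for left_label in (0, 1):
            errors = sum(
                1
                for x, y in samples
                if (left_label if x <= threshold else 1 - left_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def build_forest(dataset, n_trees, n_samples, seed=0):
    """Run n_trees bagging rounds; each round trains one tree."""
    rng = random.Random(seed)
    return [
        train_stump(bootstrap_sample(dataset, n_samples, rng))
        for _ in range(n_trees)
    ]


def majority_vote(forest, x):
    """Classification: the label predicted by most trees wins."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for illustration): label is 1 when x >= 5.
toy_dataset = [(x, 0 if x < 5 else 1) for x in range(10)]
forest = build_forest(toy_dataset, n_trees=15, n_samples=10)
```

Because every round samples from the full dataset again, the same sample can end up in several trees. That overlap is what the reuse strategy discussed later takes advantage of.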

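Motivation

As data is generated at an explosive rate, the datasets that the random forest algorithm must process are growing rapidly, and keeping the soaring number of data samples in main memory may become difficult.
In practice, a huge dataset may first be stored in secondary storage (SSD) and then swapped in and out between the faster-but-smaller main memory and the slower-but-larger secondary storage.
However, this solution inevitably leads to frequent data swapping, which may severely degrade runtime performance during the construction/training phase of a random forest.
Frequent swapping not only shortens the lifetime of NVM but also lowers runtime performance and raises power consumption. In addition, the authors observe that a random forest is likely to select the same data for different subtrees, which causes unnecessary data swapping. They therefore propose an NVM-friendly bagging strategy.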

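Objective

Reconcile the special data access pattern of machine learning algorithms with the characteristics of NVM, so as to minimize NVM writes, eliminate unnecessary data swapping between secondary storage and main memory, and improve endurance.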

NVM-friendly bagging strategy

Core concept

Data can be aggressively reused during the bagging process without affecting the prediction accuracy of a random forest.

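Method

Design Concept: Sampled Data Reusing
Randomly reusing data whenever a new round is written reduces the number of writes, but it causes uneven wear: writes are not spread evenly across blocks, so the blocks that are rewritten most often wear out sooner. The design concept therefore needs to be refined.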

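Marching Based Reusing Policy (MRP)
MRP addresses the problem described above. A pivot is chosen that splits the data into a reused part and a randomly accessed part, and in the next round the previous round's randomly accessed data becomes the reused data. The pivot stops advancing when the reuse frame equals the random frame, which is the 3rd Round in the figure above. The results show that this method not only reduces the number of writes but also achieves wear leveling, increasing the average block lifetime.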
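To see why moving the pivot evens out wear, the following sketch counts per-frame writes under two policies. It is one simplified reading of the description above, not the paper's algorithm: the frame layout, the function `simulate_writes`, and the rule that the pivot advances by the number of newly written frames each round are all assumptions, and the termination condition is not modelled.

```python
def simulate_writes(n_frames, reuse_ratio, rounds, marching):
    """Count how often each memory frame is written.

    Assumed model: round 0 fills every frame once. In each later round,
    a fraction `reuse_ratio` of the frames is reused and the rest are
    overwritten with newly sampled data, starting at `pivot`.
    - marching=False: the pivot never moves, so the same frames are
      overwritten every round.
    - marching=True: the pivot advances past the frames just written,
      so last round's new data is reused in the next round.
    """
    n_new = n_frames - int(n_frames * reuse_ratio)  # frames overwritten per round
    writes = [1] * n_frames  # initial fill
    pivot = 0
    for _ in range(rounds):
        for offset in range(n_new):
            writes[(pivot + offset) % n_frames] += 1
        if marching:
            pivot = (pivot + n_new) % n_frames
    return writes


fixed = simulate_writes(n_frames=8, reuse_ratio=0.5, rounds=4, marching=False)
marched = simulate_writes(n_frames=8, reuse_ratio=0.5, rounds=4, marching=True)
print(fixed)    # [5, 5, 5, 5, 1, 1, 1, 1]
print(marched)  # [3, 3, 3, 3, 3, 3, 3, 3]
```

Both policies perform the same total number of writes in this model (24), but the marching variant spreads them evenly, so the most-written frame sees 3 writes instead of 5.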

Results

Datasets

Adult dataset is used to predict whether a person earns over 50,000 USD a year by considering attributes such as age, education, occupation, sex and race.
Dota2 dataset is used to predict the winner among two competitive teams (5 members for each team) by considering attributes such as game type, game mode and the hero identification.
Poker dataset is used to predict a hand consisting of five playing cards (such as full house and royal flush) drawn from a standard deck of 52 poker cards by considering attributes such as ordinal and numerical.

Each dataset is split into 70% training data and 30% testing data, and the accuracy (ACC) on the testing data is used to verify that reusing data does not affect ACC.

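Figure analysis

The data reused ratio is the ratio between the size of the reused data and the size of the selected data. The ratios evaluated are 0%, 25%, 50%, 75%, and 100%.

Relationship between writes and reuse ratio

Fig. 3 shows that the higher the reuse ratio, the fewer the writes, because less data has to move between main memory and secondary storage.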
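A back-of-the-envelope model makes this relationship concrete. The sketch below is an assumption for illustration, not the paper's measurement setup: it supposes that the first bagging round loads every selected sample into main memory and that each later round writes only the samples that are not reused. The function names and the sample numbers are hypothetical.

```python
def writes_needed(n_samples, rounds, reuse_ratio):
    """Writes to main memory under the assumed model: a full load in the
    first round, then only the non-reused samples in each later round."""
    n_new = n_samples - int(n_samples * reuse_ratio)
    return n_samples + (rounds - 1) * n_new


def write_saving(n_samples, rounds, reuse_ratio):
    """Fraction of writes saved relative to a reuse ratio of 0."""
    baseline = writes_needed(n_samples, rounds, 0.0)
    return (baseline - writes_needed(n_samples, rounds, reuse_ratio)) / baseline


# Hypothetical numbers: 1000 samples per round, 10 bagging rounds.
for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(ratio, writes_needed(1000, 10, ratio))
```

In this model the write count falls linearly as the reuse ratio rises. Real savings also depend on factors the sketch ignores, such as block granularity and how the swapping is implemented.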

Relationship between ACC and reuse ratio

Fig. 4 shows that as long as the reuse ratio does not exceed 75%, ACC is almost the same as at a reuse ratio of 0%.

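Relationship between ACC, tree size, and reuse ratio

Studies such as [1] indicate that a larger forest size (e.g., the number of decision trees) may not always be beneficial to a random forest. Fig. 5 shows that as long as the reuse ratio does not exceed 75%, ACC is almost the same as at a reuse ratio of 0%, even when the tree size changes.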

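Conclusion

The paper uses an NVM-friendly bagging strategy to reduce the number of writes and to solve the uneven-wear problem. The design saves up to 72% of write accesses while leaving ACC almost unaffected.
$$ 0.72 = \frac{\text{writes}_{\text{reuse ratio}=0} - \text{writes}_{\text{reuse ratio}=0.75}}{\text{writes}_{\text{reuse ratio}=0}} $$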

References

[1] T. M. Oshiro, P. S. Perez, and J. A. Baranauskas, "How Many Trees in a Random Forest?," International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, 2012, pp. 154–168.
[2] Y. T. Ho, C.-F. Wu, M.-C. Yang, T.-Y. Chen, and Y.-H. Chang, "Replanting Your Forest: NVM-friendly Bagging Strategy for Random Forest," 2019 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA), Hangzhou, China, 2019, pp. 1-6, doi: 10.1109/NVMSA.2019.8863525.
