ML

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 6

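1 GNN
2 Deep Reinforcement Learning (RL)
RL is useful when even humans do not know what a good output looks like.
2.1 Relationship between RL and Machine Learning
Step 1: Function with unknown parameters. The actor samples its action from the output distribution, so its output is random.
Step 2: Define "Loss"
Step 3: Optimization
2.2 Policy Gradient
Introduce $A_n$, which indicates how much we want the actor to take action $a_n$ given observation $s_n$.
$\gamma$: the discount factor ($\gamma < 1$), which down-weights rewards that arrive further in the future.
Steps of Policy Gradient:
On-policy & Off-policy
On-policy: the actor being trained is the same actor that interacts with the environment.
Off-policy: the actor being trained and the actor interacting with the environment are different, e.g. Proximal Policy Optimization (PPO).
2.3 Actor-Critic
Monte-Carlo (MC) based approach
Temporal-difference (TD) approach
Version 3.5
Version 4: Advantage Actor-Critic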
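The critic can estimate $V(s)$ from complete episodes (MC) or from single steps (TD). Below is a minimal tabular sketch of both, written under our own assumptions: $V$ is a plain dict, unseen or terminal states count as 0, `gamma` is the discount factor, `alpha` is a hypothetical step size, and all function names are illustrative. `advantage` uses $r_t + \gamma V(s_{t+1}) - V(s_t)$. With `gamma=1`, this becomes $r_t + V(s_{t+1}) - V(s_t)$.

```python
def discounted_returns(rewards, gamma):
    """Return G'_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G'_t reuses G'_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mc_update(V, states, rewards, gamma, alpha):
    """MC: after a finished episode, move V(s_t) toward the observed return G'_t."""
    for s, g in zip(states, discounted_returns(rewards, gamma)):
        v = V.get(s, 0.0)
        V[s] = v + alpha * (g - v)
    return V


def td_update(V, s, r, s_next, gamma, alpha):
    """TD(0): after a single step, move V(s) toward r + gamma * V(s_next)."""
    v = V.get(s, 0.0)
    target = r + gamma * V.get(s_next, 0.0)  # unseen/terminal states default to 0
    V[s] = v + alpha * (target - v)
    return V


def advantage(V, s, r, s_next, gamma):
    """Advantage A = r + gamma * V(s_next) - V(s), the signal used by Advantage Actor-Critic."""
    return r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
```

MC has to wait until the episode ends. TD can update after every step, because it replaces the rest of the return with the critic's own estimate.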

October 27, 2021

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 2

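1 A Strategy for Machine Learning Tasks
Model bias: the model is too simple and therefore limited.
Optimization issue: the model is complex enough, but the optimization does not do a good enough job.
Overfitting: the model does well on the training set but poorly on the test set. Remedies:
More training data; data augmentation (e.g. flipping images left to right)
Constrain the model to reduce its flexibility: fewer parameters, shared parameters (CNN), fewer features
Early stopping
Regularization
Dropout
Model complexity is a trade-off between overfitting and model bias.
Mismatch: the training and test data come from different distributions.
N-fold Cross Validation: split the training set into N folds. In turn, train on N-1 folds and validate on the remaining one, then average the resulting MSE.
2 What to Do When a Neural Network Fails to Train
2.1 Local Minima and Saddle Points
Critical points include local minima and saddle points. When there are many parameters, true local minima are rare.
2.2 Batch and Momentum
Batch
Shuffle: re-split the data into batches after each epoch.
Large batch vs. small batch: with parallel computation (GPU), a large batch takes less time per epoch. A small batch gives noisier gradients, but this noise actually helps training.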
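Below is a minimal plain-Python sketch of N-fold cross validation as described above. It assumes the data is already shuffled and that `fit` is any function that takes training data and returns a predictor. `fit_mean` is a hypothetical baseline model, included only to make the example runnable.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validate(xs, ys, fit, n_folds):
    """Train on N-1 folds, validate on the held-out fold, and average the N validation MSEs."""
    scores = []
    for val_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [model(xs[i]) for i in val_idx]
        scores.append(mse([ys[i] for i in val_idx], preds))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean
```

To choose between candidate models, run `cross_validate` once per model and keep the one with the lowest average MSE.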

October 17, 2021

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 3

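1 Classification
Represent each class as a one-hot vector.
Difference between regression and classification:
softmax:
Computing the loss:
2 Convolutional Neural Network (CNN)
A network designed for image classification. Assume the input image is 100×100.
Receptive field: a local region of the image that a neuron looks at.
Parameter sharing: the weights w1 and w2 are identical.
receptive field + parameter sharing = convolutional layer
Convolutional layer: consists of a set of filters; each filter detects a pattern in the image.
Feature map: the output obtained by passing an image through a convolutional layer.
Pooling (Max Pooling): has no parameters to learn.
Spatial Transformer Layer: addresses the CNN's inability to handle scaled and rotated images.
3 Self-attention
3.1 Background
When the input is a set of vectors, the output can take one of these forms:
each vector has its own label (sequence labeling), which is where self-attention can be used;
the whole sequence has a single label;
the model decides the output length itself (seq2seq).
3.2 How It Works
The self-attention output $b^1$ represents $a^1$ itself and also its relationships with $a^2$ and $a^3$.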
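Below is a small sketch of the classification pieces named above: a one-hot target, softmax, and a cross-entropy loss. We assume the loss is the usual cross-entropy $e = -\sum_i \hat{y}_i \ln y'_i$. The excerpt names the loss computation but does not show the formula.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, n_classes):
    """Class `index` as a one-hot vector, e.g. class 1 of 3 -> [0, 1, 0]."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]


def cross_entropy(y_hat, y_prime):
    """e = -sum_i y_hat_i * ln(y'_i); with a one-hot y_hat only the true class contributes."""
    return -sum(t * math.log(p) for t, p in zip(y_hat, y_prime) if t > 0)
```

Example: `cross_entropy(one_hot(2, 3), softmax([1.0, 2.0, 3.0]))` is the loss for a 3-class output whose true class is index 2. Because only the true class term survives, the loss shrinks as the probability of that class approaches 1.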

October 17, 2021

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 4

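1 Transformer
Sequence-to-sequence (Seq2seq): the model determines the output length.
Encoder + Decoder
1.1 Encoder
The encoder consists of many blocks. Each block is structured as follows:
1.2 Decoder: Autoregressive (AT)
Compared with the encoder, the decoder adds "Masked" in front of Multi-Head Attention (Masked Multi-Head Attention).
Masked Self-attention: when producing $b^1$, only the information in $a^1$ can be considered. When producing $b^2$, only $a^1$ and $a^2$ can be considered. When producing $b^3$, only $a^1$, $a^2$ and $a^3$ can be considered.
1.3 Encoder-Decoder
Cross attention:
Teacher Forcing: feed the ground truth (the correct answer) as the decoder's input during training.
1.4 Training Tips
Copy Mechanism
Guided Attention
Beam Search: a search algorithm that is not purely greedy
2 BERT
Self-supervised Learning: the system uses one part of the input to make a prediction, and the other part of the input serves as the target to compare against.
Masking Input
Next Sentence Prediction: not very useful for BERT
After fine-tuning, BERT can be used for downstream tasks.
BERT as a whole is semi-supervised: the fill-in-the-blank (pre-training) stage is self-supervised, and the fine-tuning stage is supervised.
2.1 BERT Cases
Sentiment analysis: outputs a class. (BERT itself was pre-trained with fill-in-the-blank.)
Part-of-speech tagging: the output has the same length as the input.
Natural language inference: input two sentences, output a class.
Extraction-based question answering: input a question and a document, output the start and end positions of the answer.
2.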
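Below is a minimal single-head sketch of (masked) self-attention in plain Python. It shows the rule above: with masking, $b^i$ depends only on $a^1, \dots, a^i$. The projection matrices `Wq`, `Wk` and `Wv` are plain nested lists supplied by the caller. Dividing scores by $\sqrt{d}$ follows the Transformer's scaled dot-product attention; that choice is ours, not something taken from the notes.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(a, Wq, Wk, Wv, masked=False):
    """Return [b^1, ..., b^N] for input vectors a = [a^1, ..., a^N]."""
    q = [matvec(Wq, x) for x in a]
    k = [matvec(Wk, x) for x in a]
    v = [matvec(Wv, x) for x in a]
    d = len(k[0])
    outputs = []
    for i in range(len(a)):
        # Masked: position i may only attend to positions 0..i (no peeking ahead).
        visible = range(i + 1) if masked else range(len(a))
        scores = [sum(qc * kc for qc, kc in zip(q[i], k[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # b^i is the attention-weighted sum of the visible value vectors.
        b_i = [sum(w * v[j][c] for w, j in zip(weights, visible)) for c in range(len(v[0]))]
        outputs.append(b_i)
    return outputs
```

A quick check: with masking, changing $a^3$ leaves $b^1$ and $b^2$ unchanged. Without masking, every $b^i$ can change.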

October 17, 2021

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 5

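1 Word Embedding
Project every word into a high-dimensional space. Similar words lie close together, and different dimensions carry different meanings.
Word embedding is learned without supervision: the machine reads a large number of documents and learns from the context of each word.
Counting based
Prediction based: take the first hidden layer of a prediction model to obtain the word embedding.
2 Recurrent Neural Network
A neural network with memory.
Long Short-term Memory (LSTM)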
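To illustrate the counting-based idea that "similar words are close", here is a minimal sketch. It represents each word by its co-occurrence counts with neighbouring words, then compares words with cosine similarity. This is a simple co-occurrence-count model, not GloVe or word2vec. The window size and the toy corpus in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Map each word to its counts of neighbours within `window` positions (vocab order)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][words[j]] += 1
    vocab = sorted({w for s in sentences for w in s.split()})
    return {w: [counts[w][c] for c in vocab] for w in vocab}


def cosine(u, v):
    """Cosine similarity; 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Usage on a toy corpus: `vecs = cooccurrence_vectors(["the cat sat", "the dog sat", "the car drove"])`. Here "cat" and "dog" appear in the same contexts, so they end up closer to each other than "cat" is to "car".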

October 17, 2021

Hung-yi Lee (李宏毅) - Machine Learning, Spring 2021 - 1

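1 Basic Concepts of Machine Learning
1.1 The Basic Task of Machine Learning
The basic task of machine learning is to find a function. Different kinds of functions:
Regression: the function outputs a scalar, e.g. predicting PM2.5.
Classification: given a set of options (classes), the function outputs one of them, e.g. AlphaGo playing Go.
Structured Learning: the function creates something with structure (an image, a document).
1.2 Defining the Loss from Training Data
The loss is also a function; its input is the model's parameters:
$$ L(b,w) $$
Loss function: $L=\frac{1}{N}\sum_n e_n$
Mean Absolute Error (MAE): $e=|y-\hat{y}|$
Mean Square Error (MSE): $e=(y-\hat{y})^2$
1.3 Optimization
Goal: find the best parameters.
$$ w^{*}, b^{*}=\arg \min _{w, b} L $$
Method: Gradient Descent
Case of a single parameter $w$:
Randomly pick an initial value $w^0$.
Compute $\left.\frac{\partial L}{\partial w}\right|_{w=w^{0}}$.
Learning rate $\eta$: controls how large each gradient-descent step is.
Update $w$ repeatedly: $w^{1} \leftarrow w^{0}-\left.\eta \frac{\partial L}{\partial w}\right|_{w=w^{0}}$
Case of two parameters:
2 Basic Concepts of Deep Learning
2.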
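The three steps above can be sketched for the linear model $y = b + wx$ with the MSE loss $L(b,w)$. Both parameters are updated with $w \leftarrow w - \eta \frac{\partial L}{\partial w}$ and $b \leftarrow b - \eta \frac{\partial L}{\partial b}$. Two assumptions are ours, for reproducibility: the initial values default to 0 instead of being picked randomly, and the data in the usage note is made up.

```python
def predict(w, b, x):
    """Linear model y = b + w * x."""
    return b + w * x


def mse_loss(w, b, xs, ys):
    """L(b, w) = (1/N) * sum_n e_n with e_n = (y_n - y_hat_n)^2."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b, xs, ys):
    """Partial derivatives dL/dw and dL/db of the MSE loss."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, eta, steps, w0=0.0, b0=0.0):
    """Repeat w <- w - eta * dL/dw and b <- b - eta * dL/db starting from (w0, b0)."""
    w, b = w0, b0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        w, b = w - eta * dw, b - eta * db  # both updates use the gradient at the old point
    return w, b
```

For points lying exactly on $y = 1 + 2x$, `gradient_descent([0, 1, 2, 3], [1, 3, 5, 7], eta=0.05, steps=5000)` approaches $w = 2$, $b = 1$. If $\eta$ is too large (here, above roughly 0.24), the updates diverge instead.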

October 15, 2021

AndrewNG-CV Basics

1 The Basics of Convolutional Neural Networks
1.1 Edge detection
Use a filter to perform the convolution operation.
One example
Convolution function in TensorFlow: tf.nn.conv2d
Other examples
1 → -1: light → dark
-1 → 1: dark → light
Furthermore, treat the 9 numbers of the filter as parameters and learn them with backpropagation.
1.2 Padding
Padding preserves the information at the edges and corners.
Valid convolutions: no padding. $n\times n$ * $f\times f$ → $(n-f+1)\times(n-f+1)$
Same convolutions: pad so that the output size is the same as the input size.
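Below is a plain-Python sketch of stride-1 "valid" and "same" convolution. Like `tf.nn.conv2d`, it computes cross-correlation: the filter is not flipped. The function `conv2d` here is our own illustration, not TensorFlow's API. It assumes square inputs and, for "same", an odd filter size $f$ with padding $p = (f-1)/2$.

```python
def pad(image, p):
    """Surround a square image with p rows/columns of zeros on every side."""
    n = len(image)
    zero_row = [0] * (n + 2 * p)
    return ([list(zero_row) for _ in range(p)]
            + [[0] * p + list(row) + [0] * p for row in image]
            + [list(zero_row) for _ in range(p)])


def conv2d(image, kernel, padding="valid"):
    """Stride-1 2D 'convolution' as used in deep learning (cross-correlation).

    valid: no padding, n x n * f x f -> (n-f+1) x (n-f+1)
    same:  pad p = (f-1)/2 so the output stays n x n (assumes odd f)
    """
    f = len(kernel)
    if padding == "same":
        image = pad(image, (f - 1) // 2)
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    out = len(image) - f + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(f) for v in range(f))
         for j in range(out)]
        for i in range(out)
    ]
```

Usage: for a 6×6 image whose left half is 10 and right half is 0, the vertical-edge filter `[[1, 0, -1]] * 3` gives a 4×4 "valid" output. Every row is `[0, 30, 30, 0]`, so the light-to-dark edge shows up as the bright middle columns.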

August 5, 2021

AndrewNG-DL Basics

1 Logistic Regression Model
1.1 Binary Classification
Goal: learn a classifier that takes an image represented by a feature vector x and predicts the corresponding label y.
Notation: m training examples ($n_x$ is the dimension of the feature vector; $X$ is an $n_x\times m$ matrix):
$$ (x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}), \dots, (x^{(m)},y^{(m)}),\quad x\in \mathbb{R}^{n_x},\ y\in \{0,1\} $$
$$ X=[x^{(1)},x^{(2)},\dots,x^{(m)}],\quad X\in \mathbb{R}^{n_x\times m} $$
$$ Y=[y^{(1)},y^{(2)},\dots,y^{(m)}],\quad Y\in \mathbb{R}^{1\times m} $$
1.2 Logistic Regression
An algorithm for binary classification problems.
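Below is a minimal sketch of logistic regression in the notation above. $X$ is $n_x \times m$ with one example per column, and the prediction is $\hat{y} = \sigma(w^T x + b)$. The cost is the standard binary cross-entropy averaged over the $m$ examples, and the update uses the usual gradients $dw = \frac{1}{m} X (A - Y)^T$, $db = \frac{1}{m}\sum (A - Y)$. The function names and the clipping constant are our own choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w, b, X):
    """X is n_x x m (one example per column); returns y_hat = sigmoid(w^T x + b) per column."""
    n_x, m = len(X), len(X[0])
    return [sigmoid(sum(w[i] * X[i][j] for i in range(n_x)) + b) for j in range(m)]


def cost(A, Y, eps=1e-12):
    """J = -(1/m) * sum(y*log(a) + (1-y)*log(1-a)); a is clipped to avoid log(0)."""
    total = 0.0
    for a, y in zip(A, Y):
        a = min(max(a, eps), 1 - eps)
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / len(Y)


def gradient_step(w, b, X, Y, lr):
    """One gradient-descent update with dw = (1/m) X (A - Y)^T and db = mean(A - Y)."""
    A = forward(w, b, X)
    m = len(Y)
    dz = [a - y for a, y in zip(A, Y)]
    dw = [sum(X[i][j] * dz[j] for j in range(m)) / m for i in range(len(X))]
    db = sum(dz) / m
    return [wi - lr * g for wi, g in zip(w, dw)], b - lr * db
```

With $n_x = 1$ and $X = [[-2, -1, 1, 2]]$, $Y = [0, 0, 1, 1]$, starting from $w = [0]$, $b = 0$ gives $\hat{y} = 0.5$ everywhere and a cost of $\ln 2$. Repeated `gradient_step` calls then push the predictions toward the labels.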

August 5, 2021

AndrewNG-DL Applications

1 Setting up an ML Application
1.1 Train/dev/test sets
Training set
Validation/Development set: used for selecting the model.
Test set: used to assess the generalization error of the final chosen model.
In the previous era, with limited data, we used a 60/20/20 split for train/dev/test. In the big-data era, we use 99% of the data as the training set.
Make sure the dev and test sets come from the same distribution.
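Below is a minimal sketch of a train/dev/test split by ratio. It assumes all the data comes from one source. Shuffling once before cutting then keeps the dev and test sets drawn from the same distribution. The function name and the fixed seed are our own choices.

```python
import random


def split_dataset(data, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then cut into (train, dev, test) lists according to `ratios`."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers summing to 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n = len(items)
    n_train = round(n * ratios[0])
    n_dev = round(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_dev], items[n_train + n_dev:]
```

The default reproduces the small-data 60/20/20 split. For a large dataset, pass something like `ratios=(0.99, 0.005, 0.005)` so almost everything goes to training.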

August 5, 2021

AndrewNG-GAN Basics

Course 1: Build Basic GANs
1.1 Introduction
Generative Models:
Variational Autoencoders (VAE):
GANs:
GANs in Real Life
Inventor of GANs: Ian Goodfellow
Application areas of GANs:
Image Generation, Deepfakes
Text Generation
Data Augmentation
Image Filters
1.2 Basic Components
Discriminator
Uses a neural network. Input: features (image); output: a probability. This probability (e.g. 0.85) is also passed to the Generator.
Input features, e.g. RGB pixel values for images.
Generator
Uses a neural network. Input: class + noise vector; output: features (image).
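Below is a toy sketch of how the two components connect in one forward pass. The generator maps a class vector plus noise to features. The discriminator maps features to a probability. That probability feeds both losses: the discriminator wants real → 1 and fake → 0, while the generator wants its fake → 1. Everything here is hypothetical: the one-layer "networks" with random weights, the sizes, the "real" feature vector and the use of binary cross-entropy. No training loop is shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_generator(rng, n_classes, noise_dim, n_features):
    """Hypothetical one-layer generator: [class one-hot + noise] -> features in (-1, 1)."""
    W = [[rng.gauss(0.0, 0.5) for _ in range(n_classes + noise_dim)] for _ in range(n_features)]

    def generate(class_one_hot, noise):
        x = class_one_hot + noise  # concatenate the class vector and the noise vector
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

    return generate


def make_discriminator(rng, n_features):
    """Hypothetical one-layer discriminator: features -> probability that they are real."""
    w = [rng.gauss(0.0, 0.5) for _ in range(n_features)]

    def discriminate(features):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, features)))

    return discriminate


def bce(p, label, eps=1e-12):
    """Binary cross-entropy between a predicted probability p and a 0/1 label."""
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def one_round(seed=0):
    """One forward pass of G and D, returning the fake sample, D's outputs and both losses."""
    rng = random.Random(seed)
    G = make_generator(rng, n_classes=2, noise_dim=3, n_features=4)
    D = make_discriminator(rng, n_features=4)
    real = [0.9, -0.2, 0.4, 0.1]  # hypothetical "real" feature vector
    fake = G([1.0, 0.0], [rng.gauss(0.0, 1.0) for _ in range(3)])
    p_real, p_fake = D(real), D(fake)
    d_loss = bce(p_real, 1) + bce(p_fake, 0)  # D wants real -> 1, fake -> 0
    g_loss = bce(p_fake, 1)                    # G learns from D's probability: it wants fake -> 1
    return fake, p_real, p_fake, d_loss, g_loss
```

In real training, each loss would be minimized with respect to its own network's weights, alternating between the discriminator and the generator.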

August 5, 2021