CS224n HW2

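This time I start with the coding part and skip sections 1 and 2. I may fill those in later if I feel like it.

The coding part of this assignment is to implement a neural transition-based dependency parser.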

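Basic concept: a dependency parser maintains the state of a partial parse with a stack, a buffer and a dependency list, and supports three transitions: SHIFT (push the first word of the buffer onto the stack), LEFT-ARC (the second item on the stack becomes a dependent of the top item) and RIGHT-ARC (the top item becomes a dependent of the second item).
In detail: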

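Stack: the words currently being processed.

Buffer: the words that are still waiting to be processed.

Dependency list: the list of dependencies predicted by the parser.

Initially, the stack contains only ROOT, the dependency list is empty, and the buffer holds all words of the sentence in order. At each step the parser applies one transition to the partial parse, until the buffer is empty and the stack has size 1.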

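• SHIFT: removes the first word from the buffer and pushes it onto the stack.

• LEFT-ARC: marks the second (second most recently added) item on the stack as a dependent of the first item, removes the second item from the stack, and adds the dependency "first word → second word" to the dependency list.

• RIGHT-ARC: marks the first (most recently added) item on the stack as a dependent of the second item, removes the first item from the stack, and adds the dependency "second word → first word" to the dependency list.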

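For example, a sentence with n words is parsed in exactly 2n steps: every word is pushed onto the stack once (n SHIFT steps) and removed from the stack once by a LEFT-ARC or RIGHT-ARC (n arc steps). The two kinds of steps are interleaved; the parser does not do all the shifts first.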
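To make the three transitions concrete, here is a minimal sketch that traces a toy three-word sentence by hand. The helper name `apply_transition`, the sentence and the transition sequence are all made up for illustration; they are not part of the assignment code.

```python
def apply_transition(stack, buffer, deps, transition):
    """Apply one transition ("S", "LA" or "RA") to the parser state in place."""
    if transition == "S":
        stack.append(buffer.pop(0))          # first buffer word -> stack
    elif transition == "LA":
        deps.append((stack[-1], stack[-2]))  # (head, dependent)
        stack.pop(-2)                        # drop the dependent (second item)
    elif transition == "RA":
        deps.append((stack[-2], stack[-1]))  # (head, dependent)
        stack.pop()                          # drop the dependent (top item)
    else:
        raise ValueError(f"unknown transition: {transition}")


sentence = ["I", "parsed", "this"]
# Hand-written transition sequence (an assumption for this toy sentence).
transitions = ["S", "S", "LA", "S", "RA", "RA"]

stack, buffer, deps = ["ROOT"], list(sentence), []
for t in transitions:
    apply_transition(stack, buffer, deps, t)
    print(f"{t:>2}  stack={stack}  buffer={buffer}")

print(deps)  # [('parsed', 'I'), ('parsed', 'this'), ('ROOT', 'parsed')]
```

The trace uses 6 = 2 × 3 transitions: three SHIFTs and three arc steps, interleaved.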

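code1 initialize PartialParse

Create one PartialParse object per sentence.

The parse_step function receives the transition predicted by the model as a string ("S", "LA" or "RA"); we have to update the stack, the buffer and the dependency list accordingly.

The minibatch_parse function:

The most annoying thing here is that the model's prediction interface is not provided. The only hint about its output is the docstring: model.predict(partial_parses) returns a list with one transition per partial parse.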

class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        @param sentence (list of str): The sentence to be parsed as a list of words.
                                        Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do NOT alter it in your code.
        self.sentence: list[str] = sentence
        self.stack: list[str] = ["ROOT"]
        self.buffer: list[str] = list(sentence)
        self.dependencies: list[tuple[str, str]] = []

    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        @param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
                                left-arc, and right-arc transitions. You can assume the provided
                                transition is a legal transition.
        """
        ### YOUR CODE HERE (~7-12 Lines)
        ### TODO:
        ###     Implement a single parsing step, i.e. the logic for the following as
        ###     described in the pdf handout:
        ###         1. Shift
        ###         2. Left Arc
        ###         3. Right Arc
        if transition == "S":
            # SHIFT: move the first word of the buffer onto the stack
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":
            if len(self.stack) < 2:
                raise ValueError("Left arc transition requires at least 2 items on the stack")
            # LEFT-ARC: the top of the stack is the head, the second item is the dependent
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)  # remove the dependent (second item) from the stack
        elif transition == "RA":
            if len(self.stack) < 2:
                raise ValueError("Right arc transition requires at least 2 items on the stack")
            # RIGHT-ARC: the second item is the head, the top of the stack is the dependent
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop()  # remove the dependent (top item) from the stack

        ### END YOUR CODE

    def parse(self, transitions):
        """Applies the provided transitions to this PartialParse

        @param transitions (list of str): The list of transitions in the order they should be applied

        @return dependencies (list of string tuples): The list of dependencies produced when
                                                        parsing the sentence. Represented as a list of
                                                        tuples where each tuple is of the form (head, dependent).
        """
        for transition in transitions:
            self.parse_step(transition)
        return self.dependencies


def minibatch_parse(sentences, model, batch_size):
    """Parses a list of sentences in minibatches using a model.

    @param sentences (list of list of str): A list of sentences to be parsed
                                            (each sentence is a list of words and each word is of type string)
    @param model (ParserModel): The model that makes parsing decisions. It is assumed to have a function
                                model.predict(partial_parses) that takes in a list of PartialParses as input and
                                returns a list of transitions predicted for each parse. That is, after calling
                                    transitions = model.predict(partial_parses)
                                transitions[i] will be the next transition to apply to partial_parses[i].
    @param batch_size (int): The number of PartialParses to include in each minibatch


    @return dependencies (list of dependency lists): A list where each element is the dependencies
                                                    list for a parsed sentence. Ordering should be the
                                                    same as in sentences (i.e., dependencies[i] should
                                                    contain the parse for sentences[i]).
    """
    dependencies = []

    ### YOUR CODE HERE (~8-10 Lines)
    ### TODO:
    ###     Implement the minibatch parse algorithm.  Note that the pseudocode for this algorithm is given in the pdf handout.
    ###
    ###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
    ###                 unfinished_parses = partial_parses[:].
    ###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
    ###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
    ###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
    ###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
    ###             contains references to the same objects. Thus, you should NOT use the `del` operator
    ###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
    ###             is being accessed by `partial_parses` and may cause your code to crash.

    # Initialize partial parses as a list of PartialParses, one for each sentence in sentences
    partial_parses = [PartialParse(sentence) for sentence in sentences]
    # Initialize unfinished parses as a shallow copy of partial parses:
    # slicing creates a new list whose elements are the same PartialParse objects
    unfinished_parses = partial_parses[:]

    # while unfinished parses is not empty do
    while unfinished_parses:
        # Take the first batch_size parses in unfinished parses as a minibatch
        mini_batch_partial_parses = unfinished_parses[:batch_size]

        # Use the model to predict the next transition for each partial parse in the minibatch
        transitions = model.predict(mini_batch_partial_parses)

        # Perform a parse step on each partial parse in the minibatch with its predicted transition
        for partial_parse, transition in zip(mini_batch_partial_parses, transitions):
            partial_parse.parse_step(transition)

        # Remove the completed (empty buffer and stack of size 1) parses from unfinished parses
        # by rebuilding the list with only the parses that still have work left
        new_unfinished_parses = []
        for pp in unfinished_parses:
            if len(pp.buffer) > 0 or len(pp.stack) > 1:
                new_unfinished_parses.append(pp)
        unfinished_parses = new_unfinished_parses

    # Return dependencies for each sentence, in the same order as sentences
    for pp in partial_parses:
        dependencies.append(pp.dependencies)

    ### END YOUR CODE

    return dependencies
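Since no real model is available at this point, one way to try out the minibatch loop is a stub that follows the `predict` contract from the docstring. The sketch below is self-contained, so it repeats a compact version of the parser; `TinyParse`, `ShiftThenRightArcModel` and `tiny_minibatch_parse` are hypothetical names, and the stub's "shift everything, then right-arc" policy is just an assumption that happens to produce legal transitions for non-empty sentences.

```python
class TinyParse:
    """Compact stand-in for PartialParse, repeated so this snippet runs alone."""

    def __init__(self, sentence):
        self.sentence = sentence
        self.stack = ["ROOT"]
        self.buffer = list(sentence)
        self.dependencies = []

    def parse_step(self, transition):
        if transition == "S":
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)
        else:  # "RA"
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop()


class ShiftThenRightArcModel:
    """Hypothetical stub model: shift while the buffer has words, then right-arc."""

    def predict(self, partial_parses):
        # One transition string per partial parse, in the same order.
        return ["S" if pp.buffer else "RA" for pp in partial_parses]


def tiny_minibatch_parse(sentences, model, batch_size):
    partial_parses = [TinyParse(s) for s in sentences]
    unfinished = partial_parses[:]  # shallow copy: same objects, new list
    while unfinished:
        batch = unfinished[:batch_size]
        for pp, transition in zip(batch, model.predict(batch)):
            pp.parse_step(transition)
        # Keep only the parses that still have words to shift or arcs to add.
        unfinished = [pp for pp in unfinished if pp.buffer or len(pp.stack) > 1]
    return [pp.dependencies for pp in partial_parses]


sentences = [["a", "b"], ["c"], ["d", "e", "f"]]
result = tiny_minibatch_parse(sentences, ShiftThenRightArcModel(), batch_size=2)
for sentence, deps in zip(sentences, result):
    print(sentence, "->", deps)
```

Because the result is built from `partial_parses` rather than from the shrinking `unfinished` list, the output order matches the input order even though sentences finish at different times.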

code2 model

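We now want to train a neural network that, given the state of the stack, the buffer and the dependencies, predicts which transition should be applied next.

First, the model extracts a feature vector representing the current state. We use the feature set from the original neural dependency parsing paper, "A Fast and Accurate Dependency Parser using Neural Networks". The function that extracts these features is already implemented for you in utils/parser_utils.py. The feature vector consists of a list of tokens (e.g., the last word on the stack, the first word in the buffer, the dependent of the second-to-last word on the stack if there is one, etc.). They can be represented as a list of integers w = [w_1, w_2, ..., w_m], where m is the number of features and each 0 ≤ w_i < |V| is the index of a token in the vocabulary (|V| is the vocabulary size). The network then looks up an embedding for each word and concatenates them into a single input vector: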

x = [E_{w_1}, ..., E_{w_m}] ∈ R^{dm}

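Here E_{w_i} is the d-dimensional embedding of token w_i. In other words, each token index is replaced by its embedding vector and the m vectors are concatenated.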
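The lookup-and-concatenate step can be sketched in plain Python with a toy embedding matrix. The values, the sizes (|V| = 4, d = 2, m = 3) and the helper name `embed_and_concat` are made up for illustration.

```python
# Toy embedding matrix E with |V| = 4 tokens and d = 2 dimensions (made-up values).
E = [
    [0.0, 0.1],  # token 0
    [1.0, 1.1],  # token 1
    [2.0, 2.1],  # token 2
    [3.0, 3.1],  # token 3
]


def embed_and_concat(w, E):
    """Look up the row of E for every index in w and concatenate the rows."""
    x = []
    for index in w:
        x.extend(E[index])
    return x


w = [2, 0, 3]  # m = 3 feature indices
x = embed_and_concat(w, E)
print(x)       # [2.0, 2.1, 0.0, 0.1, 3.0, 3.1]
print(len(x))  # d * m = 2 * 3 = 6
```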
The model structure is:

h = ReLU(xW + b_1)
l = hU + b_2
\hat{y} = softmax(l)
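As a sanity check on the three equations, here is a small worked example in plain Python. All numbers and sizes (input size 2, hidden size 3, 3 classes) are made up, and the helpers `affine`, `relu` and `softmax` are illustrative names rather than anything from the assignment.

```python
import math


def affine(x, W, b):
    """Compute xW + b for a row vector x and a matrix W stored as a list of rows."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    shifted = [a - max(v) for a in v]  # subtract the max for numerical stability
    exps = [math.exp(a) for a in shifted]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, -2.0]                  # concatenated embeddings (toy size dm = 2)
W = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 1.0]]            # shape (2, 3)
b1 = [0.0, 0.0, 0.5]
U = [[2.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]]            # shape (3, 3)
b2 = [0.0, 0.0, 0.0]

h = relu(affine(x, W, b1))       # xW + b1 = [1.0, -2.0, -2.5] -> h = [1.0, 0.0, 0.0]
l = affine(h, U, b2)             # l = [2.0, 0.0, 1.0]
y_hat = softmax(l)

print(h, l, y_hat)
```

The predicted class is the index of the largest entry of `y_hat`, which is also the index of the largest logit, so the softmax is not needed to pick the transition at prediction time.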

class ParserModel(nn.Module):
    """ Feedforward neural network with an embedding layer and two hidden layers.
    The ParserModel will predict which transition should be applied to a
    given partial parse configuration.

    PyTorch Notes:
        - Note that "ParserModel" is a subclass of the "nn.Module" class. In PyTorch all neural networks
            are a subclass of this "nn.Module".
        - The "__init__" method is where you define all the layers and parameters
            (embedding layers, linear layers, dropout layers, etc.).
        - "__init__" gets automatically called when you create a new instance of your class, e.g.
            when you write "m = ParserModel()".
        - Other methods of ParserModel can access variables that have "self." prefix. Thus,
            you should add the "self." prefix layers, values, etc. that you want to utilize
            in other ParserModel methods.
        - For further documentation on "nn.Module" please see https://pytorch.org/docs/stable/nn.html.
    """
    def __init__(self, embeddings, n_features=36,
        hidden_size=200, n_classes=3, dropout_prob=0.5):
        """ Initialize the parser model.

        @param embeddings (ndarray): word embeddings (num_words, embedding_size)
        @param n_features (int): number of input features
        @param hidden_size (int): number of hidden units
        @param n_classes (int): number of output classes
        @param dropout_prob (float): dropout probability
        """
        super(ParserModel, self).__init__()
        self.n_features = n_features
        self.n_classes = n_classes
        self.dropout_prob = dropout_prob
        self.embed_size = embeddings.shape[1]
        self.hidden_size = hidden_size
        self.embeddings = nn.Parameter(torch.tensor(embeddings))

        ### YOUR CODE HERE (~9-10 Lines)
        ### TODO:
        ###     1) Declare `self.embed_to_hidden_weight` and `self.embed_to_hidden_bias` as `nn.Parameter`.
        ###        Initialize weight with the `nn.init.xavier_uniform_` function and bias with `nn.init.uniform_`
        ###        with default parameters.
        ###     2) Construct `self.dropout` layer.
        ###     3) Declare `self.hidden_to_logits_weight` and `self.hidden_to_logits_bias` as `nn.Parameter`.
        ###        Initialize weight with the `nn.init.xavier_uniform_` function and bias with `nn.init.uniform_`
        ###        with default parameters.
        ###
        ### Note: Trainable variables are declared as `nn.Parameter` which is a commonly used API
        ###       to include a tensor into a computational graph to support updating w.r.t its gradient.
        ###       Here, we use Xavier Uniform Initialization for our Weight initialization.
        ###       It has been shown empirically, that this provides better initial weights
        ###       for training networks than random uniform initialization.
        ###       For more details checkout this great blogpost:
        ###             http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
        ###
        ### Please see the following docs for support:
        ###     nn.Parameter: https://pytorch.org/docs/stable/nn.html#parameters
        ###     Initialization: https://pytorch.org/docs/stable/nn.init.html
        ###     Dropout: https://pytorch.org/docs/stable/nn.html#dropout-layers
        ### 
        ### See the PDF for hints.

        #! Step1
        #! Declare the first layer's weight matrix and bias vector as nn.Parameter,
        #! then initialize the weight with nn.init.xavier_uniform_ and the bias with nn.init.uniform_
        #? How to pick the matrix shape: 1. it is used as xW + b  2. shape is (input dim, output dim): the input is n_features * self.embed_size, the output is hidden_size
        self.embed_to_hidden_weight = nn.Parameter(torch.empty(n_features * self.embed_size, hidden_size))
        self.embed_to_hidden_bias = nn.Parameter(torch.empty(hidden_size))

        #! Initialize the weight matrix and bias vector
        nn.init.xavier_uniform_(self.embed_to_hidden_weight)
        nn.init.uniform_(self.embed_to_hidden_bias)

        #! Step2
        self.dropout = nn.Dropout(p=dropout_prob)

        #! Step3
        #? Why the second layer outputs n_classes: there are 3 transitions to predict, so the output size is 3
        self.hidden_to_logits_weight = nn.Parameter(torch.empty(hidden_size, n_classes))
        self.hidden_to_logits_bias = nn.Parameter(torch.empty(n_classes))

        #! Initialize the weight matrix and bias vector
        nn.init.xavier_uniform_(self.hidden_to_logits_weight)
        nn.init.uniform_(self.hidden_to_logits_bias)


        ### END YOUR CODE

    def embedding_lookup(self, w):
        """ Utilize `w` to select embeddings from embedding matrix `self.embeddings`
            @param w (Tensor): input tensor of word indices (batch_size, n_features)

            @return x (Tensor): tensor of embeddings for words represented in w
                                (batch_size, n_features * embed_size)
        """

        ### YOUR CODE HERE (~1-4 Lines)
        ### TODO:
        ###     1) For each index `i` in `w`, select `i`th vector from self.embeddings
        ###     2) Reshape the tensor using `view` function if necessary
        ###
        ### Note: All embedding vectors are stacked and stored as a matrix. The model receives
        ###       a list of indices representing a sequence of words, then it calls this lookup
        ###       function to map indices to sequence of embeddings.
        ###
        ###       This problem aims to test your understanding of embedding lookup,
        ###       so DO NOT use any high level API like nn.Embedding
        ###       (we are asking you to implement that!). Pay attention to tensor shapes
        ###       and reshape if necessary. Make sure you know each tensor's shape before you run the code!
        ###
        ### Pytorch has some useful APIs for you, and you can use either one
        ### in this problem (except nn.Embedding). These docs might be helpful:
        ###     Index select: https://pytorch.org/docs/stable/torch.html#torch.index_select
        ###     Gather: https://pytorch.org/docs/stable/torch.html#torch.gather
        ###     View: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view
        ###     Flatten: https://pytorch.org/docs/stable/generated/torch.flatten.html
        x = None
        #! w.view(-1): flatten the indices w into one dimension
        #! Select the rows of self.embeddings (dim 0) that correspond to the flattened indices.
        #! (batch_size, n_features) --> (batch_size * n_features, embed_size)
        x = torch.index_select(self.embeddings, 0, w.view(-1))

        #! (batch_size * n_features, embed_size) --> (batch_size, n_features * embed_size)
        #! view changes the tensor's shape; dim 0 stays batch_size (the batch dimension is kept)
        #! -1 is inferred as (total elements) / w.shape[0] = n_features * embed_size
        x = x.view(w.shape[0], -1)

        ### END YOUR CODE
        return x


    def forward(self, w):
        """ Run the model forward.

            Note that we will not apply the softmax function here because it is included in the loss function nn.CrossEntropyLoss

            PyTorch Notes:
                - Every nn.Module object (PyTorch model) has a `forward` function.
                - When you apply your nn.Module to an input tensor `w` this function is applied to the tensor.
                    For example, if you created an instance of your ParserModel and applied it to some `w` as follows,
                    the `forward` function would called on `w` and the result would be stored in the `output` variable:
                        model = ParserModel()
                        output = model(w) # this calls the forward function
                - For more details checkout: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward

        @param w (Tensor): input tensor of tokens (batch_size, n_features)

        @return logits (Tensor): tensor of predictions (output after applying the layers of the network)
                                 without applying softmax (batch_size, n_classes)
        """
        ### YOUR CODE HERE (~3-5 lines)
        ### TODO:
        ###     Complete the forward computation as described in write-up. In addition, include a dropout layer
        ###     as decleared in `__init__` after ReLU function.
        ###
        ### Note: We do not apply the softmax to the logits here, because
        ### the loss function (torch.nn.CrossEntropyLoss) applies it more efficiently.
        ###
        ### Please see the following docs for support:
        ###     Matrix product: https://pytorch.org/docs/stable/torch.html#torch.matmul
        ###     ReLU: https://pytorch.org/docs/stable/nn.html?highlight=relu#torch.nn.functional.relu
        logits = None

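        #! Step1: look up and concatenate the embeddings -> (batch_size, n_features * embed_size)
        x = self.embedding_lookup(w)
        #! Step2: affine map to the hidden layer, xW + b_1 -> (batch_size, hidden_size)
        x = torch.matmul(x, self.embed_to_hidden_weight) + self.embed_to_hidden_bias
        #! Step3: ReLU non-linearity
        x = F.relu(x)
        #! Step4: dropout (only active in training mode)
        x = self.dropout(x)
        #! Step5: affine map to the logits, hU + b_2 -> (batch_size, n_classes)
        x = torch.matmul(x, self.hidden_to_logits_weight) + self.hidden_to_logits_bias
        #! Step6: return the raw logits; softmax is applied inside the loss
        logits = x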
        ### END YOUR CODE
        return logits
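For a feel of what the two initializers in `__init__` do: Xavier (Glorot) uniform initialization samples weights from U(-a, a) with a = gain · sqrt(6 / (fan_in + fan_out)), while `nn.init.uniform_` with default arguments samples from U(0, 1). The sketch below only computes the bound a for the two weight matrices. The helper name `xavier_uniform_bound` is made up, and `embed_size = 50` is an assumption, since the post does not state the embedding size.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of the U(-a, a) distribution used by Xavier/Glorot uniform init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# Defaults from ParserModel.__init__; embed_size = 50 is an assumption.
n_features, embed_size, hidden_size, n_classes = 36, 50, 200, 3

bound_hidden = xavier_uniform_bound(n_features * embed_size, hidden_size)
bound_logits = xavier_uniform_bound(hidden_size, n_classes)

print(f"embed_to_hidden_weight:  U(-{bound_hidden:.4f}, {bound_hidden:.4f})")
print(f"hidden_to_logits_weight: U(-{bound_logits:.4f}, {bound_logits:.4f})")
```

The larger the layer, the narrower the range, which keeps the scale of the activations roughly constant from layer to layer at the start of training.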
