Hypothesis

127 Matching Annotations

Oct 2025
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

2
1. WangXiaobu 18 Oct 2025
  
  in Public
  
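  In this tutorial, the start and end positions predicted by the BERT model both refer to the text part (the passage the answer is drawn from) of the concatenated question-text input sequence. The goal is to locate the start and end boundaries of the span in that text that answers the question. A detailed breakdown follows:
  
  1. Input structure: how the question and the text are concatenated
  
  The core of the SQuAD task is "given a question and a text that contains the answer, extract the answer span from the text", so the model input first concatenates the question and the text in a fixed format. The rule is defined in the tutorial's construct_input_ref_pair function:
  - Concatenation order: [CLS] + question tokens + [SEP] + text tokens + [SEP]
  - [CLS]: BERT's special start token, used to represent the whole sequence;
  - [SEP]: special separator token; the first [SEP] separates the question from the text, and the second [SEP] marks the end of the whole sequence;
  - Example (the tutorial's input):
    - Question: What is important to us?
    - Text: It is important to us to include, empower and support humans of all kinds.
    - Full concatenated sequence (with token indices):
      [CLS](0) what(1) is(2) important(3) to(4) us(5) ?(6) [SEP](7) # question part (0-7)
      it(8) is(9) important(10) to(11) us(12) to(13) include(14) ,(15) em(16) ##power(17) and(18) support(19) humans(20) of(21) all(22) kinds(23) .(24) [SEP](25) # text part (8-25)
  
  2. Prediction target: locating the answer span inside the text part
  
  The predicted start and end positions are token indices of the answer span in the full concatenated sequence above, but these indices should fall inside the text part (the region after the first [SEP] and before the second [SEP], indices 8-24 in the tutorial example), for the following reasons:
  - The definition of the SQuAD task: the answer can only be extracted from the text, not from the question;
  - Verification in the tutorial:
    - Ground truth: to include, empower and support humans of all kinds, which corresponds to token indices 13 (to) through 23 (kinds) in the text part;
    - Model prediction: to include , em ##power and support humans of all kinds, which corresponds to indices 13-23 and matches the ground-truth position exactly (see the output of print('Predicted Answer: ...') in the tutorial);
  - Supporting evidence from attribution: in the tutorial's attribution results for the end-position prediction, kinds (index 23, a key token in the text part) has the highest attribution score, which further indicates that the prediction target is the answer boundary inside the text part.
  
  3. Key supporting mechanism: token_type_ids separates the question from the text
  
  To keep the model from confusing the question with the text, the tutorial generates token_type_ids (segment ids) with the construct_input_ref_token_type_pair function, which explicitly marks the two parts:
  - token_type_ids=0: the question part (from [CLS] through the first [SEP], indices 0-7 in the example);
  - token_type_ids=1: the text part (after the first [SEP] through the second [SEP], indices 8-25 in the example);
  - During training the model learns that answers come only from the token_type_ids=1 region, so the predicted start/end positions normally fall inside that region. This is learned behavior rather than a hard constraint.
  
  Summary
  
  The predicted start and end positions are the token indices of the SQuAD answer span in the concatenated question-text sequence, and these indices belong to the text part (the region after the first [SEP] and before the second [SEP]). In essence, the model locates the first and last token of the span in the text that answers the question.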
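The index layout described in this note can be reproduced with a few lines of plain Python. This is only a sketch: the token lists are copied from the example above rather than produced by a real tokenizer, and the variable names are illustrative.

```python
# Tokens from the example above (copied by hand, not produced by a tokenizer).
question = ["what", "is", "important", "to", "us", "?"]
text = ["it", "is", "important", "to", "us", "to", "include", ",", "em",
        "##power", "and", "support", "humans", "of", "all", "kinds", "."]

# [CLS] + question tokens + [SEP] + text tokens + [SEP]
tokens = ["[CLS]"] + question + ["[SEP]"] + text + ["[SEP]"]

# Segment ids: 0 up to and including the first [SEP], 1 afterwards.
first_sep = tokens.index("[SEP]")
token_type_ids = [0 if i <= first_sep else 1 for i in range(len(tokens))]

# Answer span reported in the note: indices 13 through 23 (inclusive).
start, end = 13, 23
answer = tokens[start:end + 1]
print(" ".join(answer))
```

Every index in the answer span carries segment id 1, which is what "the answer lies in the text part" means in terms of token_type_ids.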
2. WangXiaobu 18 Oct 2025
  
  in Public
  
  Defining a custom forward function that will allow us to access the start and end positions of our prediction using the position
  
  Task in this project: predict the start and end positions
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Bert_SQUAD_Interpret
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 18 Oct 2025
  
  in Public
  
  Note that here we do not have information about different heads. Heads related information will be examined separately when we visualize the attribution scores of the attention matrices with respect to the start or end position predictions.
  
  The figure below contains no information about the different heads.
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Bert_SQUAD_Interpret2
Sep 2025
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 28 Sep 2025
  
  in Public
  
  shape -> layer x batch x head x seq_len x seq_len
  
  Note the shape.
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Bert_SQUAD_Interpret2
Jul 2025
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 10 Jul 2025
 
 in Public
 
 attr = LayerIntegratedGradients(vqa_resnet, [vqa_resnet.module.input_maps["v"], vqa_resnet.module.module.text.embedding])
 
 The use of .module.module in your code suggests that vqa_resnet is wrapped inside a module container (likely using torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel), which is a common practice when working with multi-GPU setups in PyTorch. Let me break this down more clearly:
 
 .module in PyTorch:
 
 When you use torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel, PyTorch wraps the original model (vqa_resnet in this case) inside a container. The container has a .module attribute that points to the actual model.
 
 For example:
 
 python model = torch.nn.DataParallel(vqa_resnet) # or torch.nn.parallel.DistributedDataParallel(vqa_resnet)
 
 This means that:
 
 vqa_resnet is now inside a DataParallel (or DistributedDataParallel) container.
 
 To access the original vqa_resnet model, you need to use .module.
 
 .module.module:
 
 Now, based on the code you provided:
 
 python vqa_resnet.module.module.text.embedding
 
 It suggests that the vqa_resnet model has been wrapped twice in a container (perhaps a custom wrapper inside your codebase). This would mean:
 
 The first .module accesses the model wrapped by DataParallel or DistributedDataParallel. (Here the outer wrapper is actually Captum's ModelInputWrapper.)
 
 The second .module accesses another level of encapsulation or custom module (like another wrapper or submodule) around vqa_resnet.
 
 There are indeed two wrapper layers here: the first is ModelInputWrapper(vqa_resnet) and the second is torch.nn.DataParallel(vqa_resnet).
 
 According to the pytorch-vqa source code, in text.embedding the attribute self.text is an instance of the TextProcessor class, and embedding is a PyTorch nn.Embedding layer that maps the input sequence of word indices (the question's token ids) to word vectors (embeddings).
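The double .module chain can be illustrated with a toy model. This is a sketch under stated assumptions: ToyVQA, ToyTextProcessor and OuterWrapper are hypothetical stand-ins, not the tutorial's classes; the only thing it relies on is that both wrappers keep the wrapped model in an attribute called module.

```python
from torch import nn


class ToyTextProcessor(nn.Module):
    """Hypothetical stand-in for the text encoder; only the embedding matters here."""

    def __init__(self, num_tokens, dim):
        super().__init__()
        self.embedding = nn.Embedding(num_tokens, dim)


class ToyVQA(nn.Module):
    """Hypothetical stand-in for the VQA model."""

    def __init__(self):
        super().__init__()
        self.text = ToyTextProcessor(num_tokens=10, dim=4)


class OuterWrapper(nn.Module):
    """Hypothetical wrapper that stores the wrapped model as .module."""

    def __init__(self, module):
        super().__init__()
        self.module = module


base = ToyVQA()
wrapped = OuterWrapper(nn.DataParallel(base))  # two wrapper levels

# First .module -> the DataParallel container, second .module -> the base model.
embedding = wrapped.module.module.text.embedding
print(type(wrapped.module).__name__, type(wrapped.module.module).__name__)
```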
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Multimodal_VQA_Interpret
Jun 2025
www.sciencedirect.com www.sciencedirect.com

Guide for authors - Journal of Business Research - ISSN 0148-2963 | ScienceDirect.com by Elsevier

2
1. WangXiaobu 10 Jun 2025
  
  in Public
  
  footnote
  
  The style of footnotes is not specified.
2. WangXiaobu 03 Jun 2025
  
  in Public
  
  Author contributions: CRediT. Corresponding authors are required to acknowledge co-author contributions using CRediT (Contributor Roles Taxonomy) roles:
  
  Where does this go in the manuscript?
  
  It states what each author did.
Visit annotations in context

Annotators

WangXiaobu

URL

sciencedirect.com/journal/journal-of-business-research/publish/guide-for-authors
May 2025
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

6
1. WangXiaobu 31 May 2025
 
 in Public
 
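 Defining a custom forward function that will allow us to access the start and end positions of our prediction using the position input argument.
 
 This redefines the forward function.
 
 What it returns is the object we attribute to: the final prediction!
 
 def squad_pos_forward_func(input_ids, attention_mask, position=0):
     pred = model(input_ids, attention_mask)  # get the predictions
     pred = pred[position]  # position 0 selects the start-position distribution, 1 the end-position distribution
     return pred.max(1).values  # take the maximum of the distribution, i.e. the prediction
 
 ⚠️ Note this function: it is the custom forward_func!
 
 The predict function returns a tuple holding the predicted logits for the start and end positions: pred[0] and pred[1] are the start- and end-position distributions, each of them a matrix.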
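To see what such a forward function returns, here is a minimal sketch with a toy model. ToyQAModel is a hypothetical stand-in that only mimics the "tuple of (start_logits, end_logits)" output described above; it is not the tutorial's BERT or DistilBERT model.

```python
import torch
from torch import nn


class ToyQAModel(nn.Module):
    """Hypothetical QA model returning (start_logits, end_logits)."""

    def __init__(self, vocab_size=50, dim=8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.qa_outputs = nn.Linear(dim, 2)

    def forward(self, input_ids, attention_mask):
        hidden = self.emb(input_ids) * attention_mask.unsqueeze(-1)
        logits = self.qa_outputs(hidden)              # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        return start_logits, end_logits               # each (batch, seq_len)


model = ToyQAModel()


def squad_pos_forward_func(input_ids, attention_mask, position=0):
    pred = model(input_ids, attention_mask)
    pred = pred[position]          # 0 -> start logits, 1 -> end logits
    return pred.max(1).values      # one score per example in the batch


input_ids = torch.randint(0, 50, (2, 6))
attention_mask = torch.ones(2, 6)
start_score = squad_pos_forward_func(input_ids, attention_mask, position=0)
end_score = squad_pos_forward_func(input_ids, attention_mask, position=1)
print(start_score.shape, end_score.shape)
```

Returning one scalar per example is what lets an attribution method treat "the score of the predicted position" as the quantity being explained.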
2. WangXiaobu 31 May 2025
 
 in Public
 
 attributions_start_sum
 
 tensor([[[-0.0000e+00, -0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00, -0.0000e+00], [ 7.9740e-03, 1.1938e-03, 1.4542e-03, ..., -1.6658e-03, -2.3311e-04, -1.6646e-03], [-6.5184e-04, 8.4541e-04, 9.1006e-03, ..., 4.9416e-03, 4.0402e-04, -7.4628e-05], ..., [ 1.9171e-02, 8.3873e-03, 2.4527e-02, ..., -1.3171e-03, 3.0566e-02, 1.0291e-02], [ 6.1036e-04, -3.1783e-04, 1.2646e-03, ..., 5.8634e-04, 2.5525e-03, -1.6722e-04], [-0.0000e+00, 0.0000e+00, 0.0000e+00, ..., -0.0000e+00, 0.0000e+00, -0.0000e+00]]], dtype=torch.float64)
 The above is each token's contribution to the answer start position.
 
 Normalize with the summarize_attributions function, i.e. attributions_start.sum(dim=-1)/attributions_start.sum()
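A minimal sketch of the summarizing step described in this note, assuming attributions of shape (1, seq_len, emb_dim). The function body follows the formula written above (sum over the embedding dimension, then divide by the total); the squeeze and the toy tensor are illustrative assumptions.

```python
import torch


def summarize_attributions(attributions):
    # Sum over the embedding dimension: (1, seq_len, emb_dim) -> (seq_len,)
    per_token = attributions.sum(dim=-1).squeeze(0)
    # Normalize by the total attribution, as in the formula in the note.
    return per_token / per_token.sum()


# Toy attributions: 3 tokens, embedding dimension 2.
attributions_start = torch.tensor([[[0.0, 0.0], [0.2, 0.2], [0.5, 0.1]]])
attributions_start_sum = summarize_attributions(attributions_start)
print(attributions_start_sum)
```

Dividing by the total is fragile when positive and negative attributions nearly cancel; dividing by an L2 norm instead is another common choice.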
3. WangXiaobu 31 May 2025
 
 in Public
 
 lig = LayerIntegratedGradients(squad_pos_forward_func, model.bert.embeddings)
 
```python
from captum.attr import LayerIntegratedGradients

# Pass in the forward function and one layer of the model
lig = LayerIntegratedGradients(squad_pos_forward_func, model.distilbert.embeddings)

# Contribution of the embedding layer to the predicted answer start position
attributions_start, delta_start = lig.attribute(
    inputs=input_ids,
    baselines=input_base,
    additional_forward_args=(attention_mask, 0),
    return_convergence_delta=True)

# Contribution of the embedding layer to the predicted answer end position
attributions_end, delta_end = lig.attribute(
```

The goal is to obtain the contribution of every token to the result; the summarize_attributions function below is then applied to normalize the values.
4. WangXiaobu 31 May 2025
 
 in Public
 
 # storing couple samples in an array for visualization purposes
 
```python
import torch
from captum.attr import visualization

print("Influence of each token on the answer start position:")
start_position_vis = visualization.VisualizationDataRecord(
    attributions_start_sum,  # attributions
    torch.max(torch.softmax(outputs["start_logits"][0], dim=0)),  # probability of the prediction
    start_pred.item(),  # predicted start position
    ground_truth_start_ind,  # true start position
    str(ground_truth_start_ind),  # true start position as a string
    attributions_start_sum.sum(),  # sum of all attributions
    all_tokens,  # input tokens
    delta_start)  # convergence delta (approximation error)
visualization.visualize_text([start_position_vis])
```
5. WangXiaobu 31 May 2025
 
 in Public
 
 Also, let's define the ground truth for prediction's start and end positions.
 
 ground_truth = "New York's Roseland Ballroom"
 ground_truth_tokens = tokenizer.encode(ground_truth, add_special_tokens=False)  # encode the true answer
 # the position of the last answer token within the full input gives the end index
 ground_truth_end_ind = input_ids[0].detach().tolist().index(ground_truth_tokens[-1])
 # the start index then follows from the answer length
 ground_truth_start_ind = ground_truth_end_ind - len(ground_truth_tokens) + 1
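The index arithmetic can be checked without a tokenizer. In this sketch the integer ids are made up for illustration; only the two index computations mirror the snippet above.

```python
# Hypothetical token ids standing in for tokenizer output.
input_ids = [101, 7, 8, 9, 102, 20, 21, 30, 31, 32, 22, 102]
ground_truth_tokens = [30, 31, 32]  # ids of the answer tokens

# Find the last answer token in the input, then step back by the answer length.
ground_truth_end_ind = input_ids.index(ground_truth_tokens[-1])
ground_truth_start_ind = ground_truth_end_ind - len(ground_truth_tokens) + 1

span = input_ids[ground_truth_start_ind:ground_truth_end_ind + 1]
print(ground_truth_start_ind, ground_truth_end_ind, span)
```

One caveat of this approach: list.index returns the first occurrence, so if the last answer token also appears earlier in the input, the computed span points at the wrong place.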
6. WangXiaobu 28 May 2025
 
 in Public
 
 model.zero_grad()
 
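 In the context of this code, model.zero_grad() resets the gradients of all model parameters.
 
 In PyTorch, gradients accumulate during backpropagation: each backward pass adds to the values already stored in .grad. If the gradients are not cleared before the next backward pass, the newly computed gradients are added to the old ones and the result is biased.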
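The accumulation behavior is easy to observe on a tiny layer. This is a small sketch with an arbitrary nn.Linear, not the tutorial's model.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
x = torch.ones(1, 3)

model(x).sum().backward()
first = model.weight.grad.clone()

model(x).sum().backward()          # no zero_grad(): the new gradient is added
accumulated = model.weight.grad.clone()

model.zero_grad()                  # clear stale gradients
model(x).sum().backward()
fresh = model.weight.grad.clone()

print(first, accumulated, fresh)
```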
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Bert_SQUAD_Interpret
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 31 May 2025
 
 in Public
 
 NOTE: The method implemented here is very computationally intensive, and should only be used with a very small number of features (e.g. < 7). This implementation simply extends ShapleyValueSampling and evaluates all permutations, leading to a total of n * n! evaluations for n features. Shapley values can alternatively be computed with only 2^n evaluations, and we plan to add this approach in the future.
 
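 So this method is actually an extension of ShapleyValueSampling.
 
 Better not to use this method for now: with more than about 7 features, ShapleyValues is not practical.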
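The evaluation counts quoted in the note (n * n! for the permutation-based implementation versus 2^n for a subset-based one) can be tabulated directly. The function names below are illustrative.

```python
from math import factorial


def permutation_evaluations(n):
    # n feature additions along each of the n! orderings
    return n * factorial(n)


def subset_evaluations(n):
    # one evaluation per subset of the n features
    return 2 ** n


for n in (3, 5, 7, 10):
    print(n, permutation_evaluations(n), subset_evaluations(n))
```

The factorial term is what makes the exhaustive variant impractical beyond a handful of features.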
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/api/shapley_value_sampling.html
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 31 May 2025
  
  in Public
  
  model
  
  Passing model here effectively passes the model's callable forward function.
  
  The forward function can be customized or modified.
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/IMDB_TorchText_Interpret
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

15
1. WangXiaobu 28 May 2025
 
 in Public
 
 import resnet # from pytorch-resnet
 
 ⚠️ This is not the official resnet; it is resnet.py from https://github.com/Cyanogenoid/pytorch-resnet
 
 Similarly, Net, apply_attention and tile_2d_over_nd below are not official code either; they come from https://github.com/Cyanogenoid/pytorch-vqa
2. WangXiaobu 28 May 2025
 
 in Public
 
 You can access the associated layer for input named "foo" via the ModuleDict: wrapped_model.input_maps["foo"].
 
 In this example it is vqa_resnet.module.input_maps["v"]
3. WangXiaobu 28 May 2025
 
 in Public
 
 # saved vqa model's parameters
 
 From here on, the weights of the original vqa_model are loaded into vqa_resnet, which reuses part of its structure.
4. WangXiaobu 28 May 2025
 
 in Public
 
 vqa_net.module.text.embedding.num_embeddings
 
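 vqa_net.module.text.embedding.num_embeddings is the num_embeddings attribute of the text embedding layer (text.embedding) in the vqa_net model. For an embedding layer this is the number of rows in the embedding table, which is normally the vocabulary size: the layer holds one embedding vector for each distinct index.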
5. WangXiaobu 28 May 2025
 
 in Public
 
 And wrap the model with ModelInputWrapper.
 
 This is what option 2 above requires.
6. WangXiaobu 28 May 2025
 
 in Public
 
 ModelInputWrapper
 
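 configure_interpretable_embedding_layer: a Captum utility function that patches the model by replacing the original embedding layer (nn.Embedding) with an interpretable wrapper (InterpretableEmbeddingBase). You first convert the indices to embedding vectors with the wrapper's indices_to_embeddings and then pass those vectors to the model in place of the indices; the wrapper forwards them unchanged, so attributions can be computed with respect to the embeddings. When you are done, call remove_interpretable_embedding_layer to restore the original embedding layer.
 
 ModelInputWrapper: when the model has several inputs (for example the image tensor v, the question index tensor q and the length q_len), or when there is preprocessing logic in front of the embedding layer, running layer attribution (Layer-IG) directly on the input indices is awkward. ModelInputWrapper wraps the model and routes each original input through a corresponding submodule (available through wrapped_model.input_maps), so you can apply LayerIntegratedGradients to these "input layers" as if they were ordinary layers, without taking the inputs and the preprocessing apart by hand.
 
 LayerIntegratedGradients: Captum's variant of integrated gradients for intermediate layers. Plain IntegratedGradients attributes the model output only to the original inputs (such as pixels or embedding vectors), whereas LayerIntegratedGradients lets you pick any layer of the model (including the embedding layers exposed through configure_interpretable_embedding_layer or ModelInputWrapper) and computes attribution scores of that layer's inputs or outputs for the final prediction. The principle is the same as for IG, except that the integration path runs between the layer's activations for the baseline and its activations for the actual input.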
7. WangXiaobu 28 May 2025
 
 in Public
 
 In order to explain text features, we must let integrated gradients attribute on the embeddings, not the indices. The reason for this is simply due to Integrated Gradients being a gradient-based attribution method, as we are unable to compute gradients with respect to integers.
 
 This applies to this VQA model, which converts token indices into embedding vectors.
8. WangXiaobu 28 May 2025
 
 in Public
 
 (v.norm(p=2, dim=1, keepdim=True).expand_as(v) + 1e-8)
 
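 This code applies L₂ normalization along the channel dimension to the spatial feature map extracted from ResNet, so that the channel vector at each spatial position has length 1, which stabilizes the attention computation that follows. Step by step:
 
 python v.norm(p=2, dim=1, keepdim=True)
 
 v has shape (batch_size, C, H, W), where C=2048.
 
 v.norm(p=2, dim=1) computes the L₂ norm along the channel dimension: for each sample and each spatial position $(h,w)$, it takes the square root of the sum of squares of the 2048-dimensional vector.
 
 keepdim=True keeps a channel dimension of size 1 in the output, giving shape (batch_size, 1, H, W), which makes the following tensor operations easier.
 
 python .expand_as(v)
 
 Expands the norm tensor of shape (batch_size, 1, H, W) along the channel dimension so that it matches the shape of v, (batch_size, C, H, W).
 
 This is equivalent to copying the L₂ norm of each spatial position to all channels at that position, so that the division can be done element-wise.
 
 python + 1e-8
 
 Adds a tiny constant to the denominator to avoid division by zero when the feature vector at some position is all zeros.
 
 Taken together, the full line
 
 python v = v / (v.norm(p=2, dim=1, keepdim=True).expand_as(v) + 1e-8)
 
 takes the channel vector at every sample and every spatial position of v and applies
 
 $$ \mathbf{v}_{h,w} \;\longmapsto\; \frac{\mathbf{v}_{h,w}}{\|\mathbf{v}_{h,w}\|_2 + 10^{-8}} $$
 
 which turns it into a unit vector (length approximately 1). Features normalized this way are more comparable in the later attention and fusion stages, and easier to optimize.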
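A quick check of the normalization on random data. The channel count is reduced to 16 here only to keep the example light; the shapes otherwise follow the description above.

```python
import torch

v = torch.randn(2, 16, 7, 7)                       # (batch, C, H, W)
norm = v.norm(p=2, dim=1, keepdim=True)            # (batch, 1, H, W)
v_normalized = v / (norm.expand_as(v) + 1e-8)

# The channel vector at every spatial position now has length ~1.
lengths = v_normalized.norm(p=2, dim=1)            # (batch, H, W)
print(norm.shape, lengths.min().item(), lengths.max().item())
```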
9. WangXiaobu 28 May 2025
 
 in Public
 
 class ResNetLayer4(torch.nn.Module):
 
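 Here the ResNetLayer4 module registers a forward hook on the fourth stage (layer4) of ResNet-152 to implement "truncated" feature extraction:
 
 Initializing ResNet
 
 self.r_model = resnet.resnet152(pretrained=True): loads the pretrained ResNet-152
 
 self.r_model.eval(): switches to inference mode, disabling the training behavior of dropout/BatchNorm
 
 self.r_model.to(device): moves the model to the given device (CPU/GPU)
 
 Buffering the output
 
 self.buffer = {} stores the intermediate features captured by the hook, keyed by device (output.device).
 
 threading.Lock() keeps writes to the buffer thread-safe in multi-threaded/multi-GPU settings.
 
 Registering the hook
 
 python def save_output(module, input, output): with lock: self.buffer[output.device] = output self.r_model.layer4.register_forward_hook(save_output)
 
 Every time forward(x) is called on r_model, save_output fires as soon as ResNet has computed the forward output of layer4, and stores that layer's output tensor in self.buffer.
 
 Truncated execution
 
 Note that this code does not raise an exception to stop the forward pass early; it simply runs all layers of ResNet to the end.
 
 To stop earlier, you could raise a custom exception inside the hook and catch and ignore it in the forward method, which saves the extra computation.
 
 forward(x)
 
 python self.r_model(x) return self.buffer[x.device]
 
 Calling self.r_model(x) makes the hook save the output of layer4 in the buffer
 
 The result is then read straight from the buffer, ignoring the layers after layer4 (average pooling and the classification head), which gives a feature map of shape (batch_size, 2048, h, w).
 
 self.buffer[x.device] is the intermediate feature tensor that the fourth stage (layer4) of ResNet-152 produces in the forward pass on the input x. Specifically:
 
 When self.r_model(x) runs, ResNet computes everything from the first layer to the last. As soon as layer4 (the fourth residual stage) has produced its output, the registered hook save_output fires and stores that output in self.buffer[output.device].
 
 The shape of output is usually (batch_size, 2048, H', W'), where
 
 batch_size equals dimension 0 of the input x
 
 the 2048 channels are the number of output channels of layer4 in ResNet-152
 
 H', W' are the spatial dimensions after the preceding downsampling steps (typically 7×7 for a 224×224 input image)
 
 In forward(x), after self.r_model(x) has completed a full forward pass, return self.buffer[x.device] fetches the intermediate features that were just stored. x.device only selects the result for the right GPU (or CPU) among the possibly several per-device entries.
 
 The value returned is therefore a Tensor holding the activation feature map of the input x at layer4 of ResNet-152, ready for the normalization, attention, fusion and classification steps that follow in the VQA model.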
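The hook-and-buffer pattern can be tried on a small network. This is a sketch under stated assumptions: FeatureTap and the tiny backbone are hypothetical stand-ins for ResNetLayer4 and ResNet-152, kept small so that no pretrained weights are needed.

```python
import threading

import torch
from torch import nn


class FeatureTap(nn.Module):
    """Runs the whole backbone but returns the output of one inner layer."""

    def __init__(self, backbone, layer):
        super().__init__()
        self.backbone = backbone
        self.buffer = {}
        lock = threading.Lock()

        def save_output(module, input, output):
            # Store the tapped layer's output, keyed by device.
            with lock:
                self.buffer[output.device] = output

        layer.register_forward_hook(save_output)

    def forward(self, x):
        self.backbone(x)                  # the hook fires when `layer` finishes
        return self.buffer[x.device]


# Hypothetical tiny backbone standing in for ResNet-152.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1),    # plays the role of the early stages
    nn.Conv2d(8, 16, 3, stride=2, padding=1),   # plays the role of layer4
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),                          # classification head
)
tap = FeatureTap(backbone, backbone[1])
features = tap(torch.randn(2, 3, 32, 32))
print(features.shape)
```

The layers after the tapped one still run; the hook only changes what is returned, not how much is computed.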
10. WangXiaobu 28 May 2025
 
 in Public
 
 class VQA_Resnet_Model
 
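 In VQA_Resnet_Model:
 - Inherit from and initialize the parent class

`super().__init__(embedding_tokens)`: this line passes `embedding_tokens` (the vocabulary size) to the parent class `Net`, which creates the text embedding, the attention module and the classification head, so the subclass does not have to repeat that logic.

ResNetLayer4 uses a forward hook internally to capture the output of the fourth block of the pretrained ResNet-152.

forward

Text processing

python q = self.text(q, list(q_len.data))

Feeds the question's word indices q and their lengths q_len into the text encoder defined in the parent class, which yields a fixed-size semantic vector q.

Image feature extraction: v = self.resnet_layer4(v) passes each image through the first four stages of ResNet-152 and returns a spatial feature map of shape (batch_size, 2048, H′, W′).

L2 normalization

python v = v / (v.norm(p=2, dim=1, keepdim=True).expand_as(v) + 1e-8)

Normalizes the channel feature vector at each position to unit length, which stabilizes the attention computation that follows.

Attention & feature fusion

python a = self.attention(v, q) v = apply_attention(v, a)

The text vector q guides the spatial attention a over the image features v; the attention weights are then applied to v, which yields an attended image feature vector.

Concatenation & classification

python combined = torch.cat([v, q], dim=1) answer = self.classifier(combined)

Concatenates the image and text features along the feature dimension and feeds them into the parent class's classification head self.classifier (a stack of fully connected layers and activations) to obtain the final answer logits.

The advantage of this design is that the pretrained ResNet is reused without any changes: the hook gives quick access to its intermediate features without copying or rewriting the ResNet model definition.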
11. WangXiaobu 28 May 2025
 
 in Public
 
 super().__init__(embedding_tokens)
 
 python super().__init__(embedding_tokens)
 
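 When this line appears in the __init__ method of a custom model class (such as VQA_Resnet_Model), it calls the parent class's constructor and passes the embedding_tokens argument on to it.
 
 super(): gives access to the parent class of the current class
 
 __init__(...): runs the parent class's initialization logic (for example creating the word embedding layer and setting default attributes)
 
 Writing it this way avoids repeating in the subclass the initialization details the parent class already implements; the subclass constructor only has to deal with the newly added modules (for example attaching the ResNet hook module).
 
 In short, super().__init__(...) hands the arguments and the responsibility for initialization to the parent class, so the subclass inherits and reuses the parent's existing construction logic.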
12. WangXiaobu 28 May 2025
 
 in Public
 
 self.text
 
 This is the attribute initialized in the parent class Net.
13. WangXiaobu 28 May 2025
 
 in Public
 
 Net
 
 Note where this Net comes from.
14. WangXiaobu 28 May 2025
 
 in Public
 
 saved_state
 
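 This code restores the vocabulary from a previously saved model checkpoint and inverts the index mapping into a list that is convenient for lookups:
 
 saved_state = torch.load(..., map_location=device) loads a dictionary-style checkpoint that contains at least the model weights and a sub-dictionary named vocab. map_location=device makes sure the tensors are mapped to the current device (CPU or GPU) regardless of the device they were saved on.
 
 vocab = saved_state['vocab'] extracts the vocabulary from the checkpoint. This vocab is usually a dictionary containing several sub-mappings (for example for question words, answer words, or even image region labels).
 
 token_to_index = vocab['question'] reads the word-to-index mapping for questions from vocab. token_to_index is a {word: idx} dictionary used to convert each word of an input question into an integer index the model can process.
 
 answer_to_index = vocab['answer'] likewise reads the word-to-index mapping for answers. This is common in VQA (Visual Question Answering) and similar tasks, where the model finally predicts one or several answer classes from this mapping.
 
 num_tokens = len(token_to_index) + 1 computes the size of the question vocabulary plus 1; the extra index is usually reserved for padding or unknown tokens.
 
 Building the answer_words list:
 
 python answer_words = ['unk'] * len(answer_to_index) for w, idx in answer_to_index.items(): answer_words[idx] = w
 
 First a list as long as the answer vocabulary is created, filled with 'unk';
 
 then answer_to_index is iterated and every answer word w is placed at its position idx. After that, answer_words[idx] quickly maps a class index output by the model back to the corresponding natural-language answer.
 
 In the end you have:
 
 a dictionary token_to_index that converts input question words to indices;
 
 a list answer_words that converts predicted output indices back to answer strings;
 
 and the question vocabulary size num_tokens, which can be used later to build the word embedding layer or to define the dimensions of the embedding matrix.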
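The inversion step can be tried on a miniature vocabulary. The dictionary below is a made-up stand-in with the same layout as the checkpoint's vocab entry; the last four statements mirror the code discussed above.

```python
# Hypothetical miniature vocabulary with the same layout as saved_state['vocab'].
vocab = {
    "question": {"what": 1, "color": 2, "is": 3},
    "answer": {"yes": 0, "no": 1, "red": 2},
}

token_to_index = vocab["question"]
answer_to_index = vocab["answer"]
num_tokens = len(token_to_index) + 1   # one extra slot, e.g. for padding/unknown

# Invert the answer mapping into a list indexed by class id.
answer_words = ["unk"] * len(answer_to_index)
for w, idx in answer_to_index.items():
    answer_words[idx] = w

print(num_tokens, answer_words)
```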
15. WangXiaobu 28 May 2025
 
 in Public
 
 VQA models
 
 VQA stands for Visual Question Answering; these are visual question answering models.
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Multimodal_VQA_Interpret
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

1
1. WangXiaobu 28 May 2025
  
  in Public
  
  from dlrm_s_pytorch import DLRM_Net
  
  The model comes from https://github.com/facebookresearch/dlrm and is used here with pretrained weights.
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/DLRM_Tutorial
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

3
1. WangXiaobu 28 May 2025
 
 in Public
 
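 The output is desktop_computer, another class in ImageNet whose images usually include a monitor and a keyboard. Not bad.
 
 Very interesting: the classification is now more precise. desktop_computer has the highest probability, while monitor has dropped to the fifth-highest probability and is no longer the top class.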
2. WangXiaobu 28 May 2025
 
 in Public
 
 [ 0 5 20 255]
 
 feature_mask contains only four distinct values: 0, 5, 20 and 255.
3. WangXiaobu 28 May 2025
 
 in Public
 
 # uses torchvision transforms to convert a PIL image to a tensor and normalize it
 
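 This classify function, together with the transform pipeline defined before it, covers the whole path from a PIL image to a ResNet predicted label:
 
 Transform pipeline img_to_resnet_input
 
 T.ToTensor(): converts the PIL image into a float tensor of shape (C, H, W) with values in [0, 1]
 
 T.Normalize(mean, std): normalizes each channel with the mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225] commonly used for ImageNet
 
 classify(img, print_result=True)
 
 Preprocess and add a batch dimension
 
 python input_tensor = img_to_resnet_input(img).unsqueeze(0) # becomes (1, C, H, W)
 
 Forward inference
 
 python output = resnet(input_tensor) # unnormalized logits output = F.softmax(output, dim=1) # becomes a probability distribution
 
 Take the top-1
 
 python prediction_score, pred_label_idx = torch.topk(output, 1) # both have shape (1,1); squeeze_() later removes the extra dimensions to give scalars
 
 Optionally print the result
 
 Reads the mapping from idx to label from ~/.torch/models/imagenet_class_index.json
 
 Prints "Predicted: <label name> id=<index> with a score of: <probability>"
 
 Return
 
 python return pred_label_idx.item(), prediction_score.item()
 
 That is: the class index (0 to 999) and the predicted probability of that class.
 
 So you only need to pass a PIL Image to classify(img) to get the top-1 prediction of the ResNet model pretrained on ImageNet.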
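The softmax and top-1 steps can be tried in isolation. In this sketch the logits are a made-up four-class output instead of a real ResNet prediction; the tensor operations are the ones named above.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for one image and four classes.
logits = torch.tensor([[1.0, 3.0, 0.5, 2.0]])

probs = F.softmax(logits, dim=1)                          # probabilities, rows sum to 1
prediction_score, pred_label_idx = torch.topk(probs, 1)   # both of shape (1, 1)
pred_label_idx.squeeze_()
prediction_score.squeeze_()

result = (pred_label_idx.item(), prediction_score.item())
print(result)
```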
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/Resnet_TorchVision_Ablation
captum.ai captum.ai

Captum · Model Interpretability for PyTorch

7
1. WangXiaobu 28 May 2025
  
  in Public
  
  LayerActivation, LayerIntegratedGradients
  
  These two are not actually used in this example.
2. WangXiaobu 28 May 2025
  
  in Public
  
  It is interesting to observe that the attribution scores for the 10 neurons in the last layer are spread between half of the weights and all have positive attribution scores.
  
  the attribution scores (1) are spread between half of the ~~weights~~ neurons; (2) all (weights / neurons) have positive attribution scores
3. WangXiaobu 28 May 2025
  
  in Public
  
  10 neurons
  
  size_hidden3 = 10
4. WangXiaobu 28 May 2025
  
  in Public
  
  It is interesting to observe that the feature Population has high positive attribution score based on some of the attribution algorithms. This can be related, for example, to the choice of the baseline. In this tutorial we use zero-valued baselines for all features, however if we were to choose those values more carefully for each feature the picture will change.
  
  The choice of baselines matters a lot.
5. WangXiaobu 28 May 2025
  
  in Public
  
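  It is important to note that we aggregate the attributions across the entire test dataset in order to retain a global view of feature importance. This, however, is not ideal since the attributions can cancel out each other when we aggregate them across multiple samples.
  
  "Summing the attributions of all samples" is meant to give a global overview of each feature's importance over the whole test set: the per-sample contributions are collapsed into one curve or one group of bars so that different algorithms or the model weights can be compared. There is a catch, however: if a feature contributes positively to the model output on some samples (positive attribution) and negatively on others (negative attribution), the positive and negative values can cancel each other when added, and the global sum becomes small or close to zero. The feature then looks unimportant, although in reality its effect merely points in different directions on different samples.
  
  Put simply, it is like adding the alertness boost from coffee (+1) to the drowsiness from alcohol (-1): the result is 0, which gives the false impression of "no effect", although the two are completely different in their own contexts. To avoid this cancellation, use the mean of the absolute per-sample attributions (mean absolute attribution) instead, or plot the distribution of the attribution values (for example box plots or histograms). Together these reflect the global importance while keeping the information about positive and negative directions.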
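A small numeric illustration of the cancellation effect, using made-up attribution values for four samples and two features.

```python
import numpy as np

# Hypothetical attributions: rows are samples, columns are features.
attr = np.array([[ 0.8, 0.1],
                 [-0.8, 0.1],
                 [ 0.6, 0.1],
                 [-0.6, 0.1]])

signed_sum = attr.sum(axis=0)            # feature 0 cancels to 0
mean_abs = np.abs(attr).mean(axis=0)     # feature 0 is clearly the larger one

print(signed_sum, mean_abs)
```

The signed sum ranks feature 1 above feature 0, while the mean absolute attribution ranks them the other way round.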
6. WangXiaobu 28 May 2025
  
  in Public
  
  # prepare attributions for visualization
  
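  The core purpose of this code is to take the "global" importance that several attribution methods (Integrated Gradients, IG with SmoothGrad, DeepLift, GradientSHAP, Feature Ablation) assign to each input feature on the test set, together with the magnitude of the model's first-layer weights, normalize all of them, and compare them in one bar chart.
  
  Data preparation
  
  X_test.shape[1]: the total number of features, used as the indices on the horizontal axis.
  
  feature_names: the name of each feature, used as x-axis tick labels.
  
  For each attribution method, detach().numpy().sum(0) first sums the attribution tensor over the sample dimension of the whole test set, which gives an accumulated vector of shape (n_features,). It is then divided by its L₁ norm (np.linalg.norm(..., ord=1)) so that the results of the different methods are on the same scale.
  
  lin_weight = model.lin1.weight[0]: takes the weight vector of the first output neuron of the first layer and L₁-normalizes it in the same way, for comparison with the attribution methods.
  
  Bar chart layout
  
  width = 0.14: the width of each bar.
  
  The x coordinates x_axis_data are [0, 1, 2, …, n_features-1]; the bars of each subsequent method are shifted to the right by one width along the x axis.
  
  IG at x
  
  IG+SmoothGrad at x + width
  
  DeepLift at x + 2·width
  
  …
  
  Weights at x + 5·width
  
  Cosmetic details
  
  figsize=(20,10) makes the figure large enough, and all fonts are set to size 16.
  
  Each method gets its own color and transparency (alpha) so that they are easy to tell apart.
  
  ax.autoscale_view() and plt.tight_layout() make the axes adjust automatically and keep the legend and labels from overlapping.
  
  ax.set_xticks(x_axis_data + 0.5) and ax.set_xticklabels(feature_names) center the label under each group of bars.
  
  The legend is placed in the lower left corner (loc=3) and lists what each bar means.
  
  After running the code you get a 20×10 inch comparison bar chart with one group of bars per feature, showing the importance score that the different attribution algorithms and the model weights give to that feature. This makes it easy to see for which features the algorithms agree or disagree at the global level.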
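The sum-then-L₁-normalize step and the bar offsets can be sketched with a small made-up attribution matrix. The array values and variable names are illustrative; no plotting is done here.

```python
import numpy as np

# Hypothetical attributions for 4 test samples and 3 features.
attr = np.array([[0.2, -0.1, 0.4],
                 [0.1,  0.3, 0.2],
                 [0.3, -0.2, 0.1],
                 [0.2,  0.2, 0.1]])

attr_sum = attr.sum(0)                                       # (n_features,)
attr_norm_sum = attr_sum / np.linalg.norm(attr_sum, ord=1)   # absolute values sum to 1

width = 0.14
x_axis_data = np.arange(attr.shape[1])     # bar positions of the first method
second_method_x = x_axis_data + width      # positions of the next method's bars

print(attr_norm_sum, second_method_x)
```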
7. WangXiaobu 28 May 2025
  
  in Public
  
  pf = np.polyfit(x, y, 1)
  
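  pf = np.polyfit(x, y, 1) and p = np.poly1d(pf) fit a degree-1 polynomial to (x, y), where x holds all sample values of the current feature and y is the array of median house values. The fit yields the coefficients pf, which np.poly1d wraps into a function p(x) so that the y values on the fitted line can be computed directly later on.
  
  The 1 is the highest degree of the fitted polynomial, i.e. a first-degree polynomial (a straight line).
  
  poly1d is meant specifically for wrapping and operating on one-dimensional polynomials ($p(x) = a_n x^n + \dots + a_1 x + a_0$).
  
  For fitting or evaluating polynomials in two or more variables, there is no poly2d to look for; use the corresponding functions in numpy.polynomial (such as polyval2d for evaluation) or build the computation yourself from a coefficient matrix.
  
  ax.plot(x, y, 'o') draws a scatter plot on the subplot; 'o' means circular markers.
  
  ax.plot(x, p(x), "r--") draws the fitted line on the same subplot as a red dashed line ("r--").
  
  ax.set_title(col + ' vs Median House Value'), ax.set_xlabel(col) and ax.set_ylabel('Median House Value') set the title and axis labels of each subplot: the horizontal axis is the current feature name and the vertical axis is always "Median House Value".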
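A minimal check of polyfit and poly1d on points that lie exactly on a known line. The data is made up so that the expected coefficients are obvious.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                  # points exactly on y = 2x + 1

pf = np.polyfit(x, y, 1)           # coefficients, highest degree first: [slope, intercept]
p = np.poly1d(pf)                  # callable polynomial
fitted = p(x)

print(pf, fitted)
```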
Visit annotations in context

Annotators

WangXiaobu

URL

captum.ai/tutorials/House_Prices_Regression_Interpret
www.sciencedirect.com www.sciencedirect.com

Guide for authors - Journal of Business Research - ISSN 0148-2963 | ScienceDirect.com by Elsevier

8
1. WangXiaobu 26 May 2025
  
  in Public
  
  Submit each image as a separate file using a logical naming convention for your files (for example, Figure_1, Figure_2 etc).
  
  Images that are not text graphics must be supplied as separate files.
  
  There is also a Figure captions requirement further down:
  
  All images must have a caption. A caption should consist of a brief title (not displayed on the figure itself) and a description of the image. We advise you to keep the amount of text in any image to a minimum, though any symbols and abbreviations used should be explained.
  
  Provide captions in a separate file.
2. WangXiaobu 19 May 2025
  
  in Public
  
  The submitted articles in JBR must not exceed 45 double-spaced pages, with 1 inch margins, and 12 pt fonts, not counting title and abstract pages. Tables and references should be typed on separate pages at the end. The title page should contain title, authors, and affiliations. An Abstract of 150 words or less and a list of four-six keywords should follow the title page. On page 3 of the manuscript repeat the title, but not the author's names, to permit anonymity during the reviewing process. Final accepted manuscripts typically should have less than 8000 words (all inclusive).
  
  In short: at most 45 double-spaced pages (1 inch margins, 12 pt font), not counting the title and abstract pages; tables and references on separate pages at the end; title page with title, authors and affiliations; an abstract of at most 150 words plus four to six keywords after the title page; the title repeated without author names on page 3 for anonymous review; accepted manuscripts typically under 8000 words, all inclusive.
3. WangXiaobu 19 May 2025
  
  in Public
  
  Reference style
  
  We use the style of the American Psychological Association, 7th edition.
4. WangXiaobu 19 May 2025
  
  in Public
  
  Highlights
  
  A separate file whose submission is encouraged.
5. WangXiaobu 19 May 2025
  
  in Public
  
  Title page
  
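  The title page contains personal information.
  
  Article title: should be concise and informative; avoid uncommon abbreviations and formulae;
  
  Author names: give each author's given name and family name, make sure the spelling is accurate, and add the name in the native script if needed;
  
  Affiliations: list each author's affiliation address in detail, including the country name and the author's email address;
  
  Corresponding author address: clearly indicate the corresponding author, whose details must be kept up to date during submission and publication; if an author's address has changed, a footnote can give the present or permanent address.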
6. WangXiaobu 19 May 2025
  
  in Public
  
  Vitae
  
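  Submit a short biography of no more than 100 words, with a photograph.
  
  We wondered where this should go; following the AMJ format used by our teacher Feifei, it is placed after the Appendixes.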
7. WangXiaobu 19 May 2025
  
  in Public
  
  Acknowledgements
  
  Acknowledgements go on the title page.
8. WangXiaobu 19 May 2025
  
  in Public
  
  Article sections
  
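  Section headings
  
  Cross-references must use the section numbering; do not quote the full text
  
  Each heading goes on its own line
  
  The abstract is not numbered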
Visit annotations in context

Annotators

WangXiaobu

URL

sciencedirect.com/journal/journal-of-business-research/publish/guide-for-authors
Aug 2024
platform.openai.com platform.openai.com

OpenAI Platform

1
1. WangXiaobu 04 Aug 2024
  
  in Public
  
  Both input and output tokens count toward these quantities. For example, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens. Note however that for some models the price per token is different for tokens in the input vs. the output (see the pricing page for more information).
  
  Both input and output tokens are billed; the price for input tokens differs from the price for output tokens.
Visit annotations in context

Annotators

WangXiaobu

URL

platform.openai.com/docs/advanced-usage/managing-tokens
zh.d2l.ai zh.d2l.ai

5.2. 参数管理 — 动手学深度学习 2.0.0 documentation

4
1. WangXiaobu 01 Aug 2024
  
  in Public
  
  Init weight torch.Size([8, 4]) Init weight torch.Size([1, 8])
  
  There are two results because net has two linear layers.
2. WangXiaobu 01 Aug 2024
  
  in Public
  
  *[(name, param.shape) for name, param in m.named_parameters()][0]
  
  [(name, param.shape) for name, param in m.named_parameters()][0] is first evaluated as a whole and takes the first element of the list, i.e. the first (name, param.shape) tuple.
  
  The leading * then unpacks this tuple!
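The unpacking can be reproduced on the first linear layer of the net used in this chapter. The print call here is a simplified stand-in for the one in the book's init function.

```python
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
m = net[0]

# Take the first (name, shape) tuple, then unpack it into print's arguments.
first = [(name, param.shape) for name, param in m.named_parameters()][0]
print("Init", *first)   # same as print("Init", first[0], first[1])
```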
3. WangXiaobu 01 Aug 2024
  
  in Public
  
  net
  
  net is defined as net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
4. WangXiaobu 01 Aug 2024
  
  in Public
  
  if type(m) == nn.Linear
  
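  nn.Linear is a fully connected layer (also called a linear or dense layer). A statement such as if type(m) == nn.Linear: in the code checks whether the variable m is a fully connected layer.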
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/parameters.html
zh.d2l.ai zh.d2l.ai

5.1. 层和块 — 动手学深度学习 2.0.0 documentation

2
1. WangXiaobu 01 Aug 2024
  
  in Public
  
  net(X)
  
  This is Python syntactic sugar: net(X) actually calls net.__call__(X), and __call__() in turn calls forward.
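A short check that calling the module and calling forward give the same result. The MLP below follows the layer sizes used in this chapter but is otherwise a minimal sketch.

```python
import torch
from torch import nn


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(torch.relu(self.hidden(X)))


net = MLP()
X = torch.rand(2, 20)

via_call = net(X)              # goes through nn.Module.__call__
via_forward = net.forward(X)   # direct call, skips any registered hooks
same = torch.equal(via_call, via_forward)
print(same, via_call.shape)
```

Calling net(X) is still the recommended form, because __call__ also runs any registered hooks before and after forward.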
2. WangXiaobu 01 Aug 2024
  
  in Public
  
  F.relu
  
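  nn.ReLU() constructs a ReLU module object and does not apply the function by itself, whereas F.relu() is a function call.
  
  This could also be written as return self.out(nn.ReLU()(self.hidden(X))), but there is no need to.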
  
  In the provided code, the ReLU layer is applied as a function within the forward method, using F.relu(self.hidden(X)). This means that the ReLU activation is not explicitly recorded as a separate layer in the model's structure. Instead, it is applied directly to the output of the hidden layer during the forward pass.
  
  If you want to explicitly include the ReLU layer in the model's structure, you can define it as a separate layer in the __init__ method and then use it in the forward method. Here's an example:
  
```python
from torch import nn


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # hidden layer
        self.relu = nn.ReLU()  # ReLU layer
        self.out = nn.Linear(256, 10)  # output layer

    def forward(self, X):
        return self.out(self.relu(self.hidden(X)))
```
  
  In this version, the ReLU layer is explicitly defined and included in the model's structure.
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/model-construction.html
Jul 2024
zh.d2l.ai zh.d2l.ai

5.5. 读写文件 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 08 Jul 2024
  
  in Public
  
  clone.eval()
  
  Puts the model into evaluation mode; here this line could actually be removed.
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/read-write.html
zh.d2l.ai zh.d2l.ai

5.4. 自定义层 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 08 Jul 2024
  
  in Public
  
  torch.matmul
  
  torch.matmul also works for matrix operations on tensors with more dimensions, while mm only works in two dimensions.
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/custom-layer.html
zh.d2l.ai zh.d2l.ai

5.3. 延后初始化 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 08 Jul 2024
  
  in Public
  
  For the PyTorch code, see https://d2l.ai/chapter_builders-guide/lazy-init.html
  
  The following method passes in dummy inputs through the network for a dry run to infer all parameter shapes and subsequently initializes the parameters. It will be used later when default random initializations are not desired.
  
  python @d2l.add_to_class(d2l.Module) #@save def apply_init(self, inputs, init=None): self.forward(*inputs) if init is not None: self.net.apply(init)
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/deferred-init.html
d2l.ai d2l.ai

6.4. Lazy Initialization — Dive into Deep Learning 1.0.0-alpha0 documentation

1
1. WangXiaobu 08 Jul 2024
  
  in Public
  
  torch.rand(2, 20)
  
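  torch.rand(2, 20) is a PyTorch function call that creates a tensor filled with random numbers drawn from the uniform distribution on the interval [0, 1). Here (2, 20) is the shape of the tensor: it has 2 rows and 20 columns.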
Visit annotations in context

Annotators

WangXiaobu

URL

d2l.ai/chapter_builders-guide/lazy-init.html
zh.d2l.ai zh.d2l.ai

5.2. 参数管理 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 08 Jul 2024
  
  in Public
  
  m.weight.data.abs() >= 5
  
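  Returns a boolean tensor with the same shape as the weight, indicating for each position whether the absolute value of the element is at least 5.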
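A small sketch of the mask and of how multiplying by it zeroes the small entries. The weight values are made up for illustration.

```python
import torch

weight = torch.tensor([[6.0, -2.0], [-7.5, 5.0]])

mask = weight.abs() >= 5      # boolean tensor, same shape as weight
weight *= mask                # entries with |w| < 5 become 0

print(mask, weight)
```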
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/parameters.html
zh.d2l.ai zh.d2l.ai

5.1. 层和块 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 07 Jul 2024
  
  in Public
  
  torch.mm
  
  matrix multiplication
  
  The torch.mm function in PyTorch performs a matrix multiplication of two matrices. Here's a more detailed explanation:
  
  torch.mm(input, mat2, *, out=None) → Tensor
  
  It performs a matrix multiplication of the matrices input and mat2.
  
  If input is a (n × m) tensor and mat2 is a (m × p) tensor, out will be a (n × p) tensor.
  
  This function does not support broadcasting. For broadcasting matrix products, you should use torch.matmul().
  
  It supports strided and sparse 2-D tensors as inputs, and autograd with respect to strided inputs.
  
  Here's an example of how to use it: `mat1 = torch.randn(2, 3); mat2 = torch.randn(3, 3); result = torch.mm(mat1, mat2); print(result)`. This will output a 2x3 tensor which is the result of the matrix multiplication of mat1 and mat2. Please note that the number of columns in the first matrix must be equal to the number of rows in the second matrix for the multiplication to be valid.
  
  It's important to note that torch.mm differs from other similar functions like torch.matmul and torch.mul. While torch.mm performs a matrix multiplication without broadcasting and expects two 2D tensors, torch.matmul performs a matrix product with broadcasting and can handle tensors with different shapes. On the other hand, torch.mul performs element-wise multiplication.
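The three functions can be compared side by side. This is a small sketch with random tensors; the shapes are chosen only for illustration.

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

mm_result = torch.mm(a, b)                  # 2-D inputs only -> (2, 4)

batch = torch.randn(5, 2, 3)
matmul_result = torch.matmul(batch, b)      # b is broadcast over the batch -> (5, 2, 4)

elementwise = torch.mul(a, a)               # element-wise product -> (2, 3)

try:
    torch.mm(batch, b)                      # 3-D input is rejected
    mm_accepts_3d = True
except RuntimeError:
    mm_accepts_3d = False

print(mm_result.shape, matmul_result.shape, elementwise.shape, mm_accepts_3d)
```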
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_deep-learning-computation/model-construction.html
zh.d2l.ai zh.d2l.ai

4.10. 实战Kaggle比赛：预测房价 — 动手学深度学习 2.0.0 documentation

5
1. WangXiaobu 07 Jul 2024
  
  in Public
  
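  relative error
  
  We want the relative error $\left|1-\frac{\hat y}{y}\right|$ to be small, which is equivalent to the ratio $\frac{\hat y}{y}$ being close to 1. Taking logarithms turns the division into a subtraction, so $\log \hat y - \log y$ should be close to 0.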
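A small numeric sketch of why the log difference tracks relative rather than absolute error. The prices are invented, and `log_error` is a hypothetical helper, not a function from the book.

```python
import math

def log_error(y_hat, y):
    """Absolute difference of logarithms, i.e. |log(y_hat / y)|."""
    return abs(math.log(y_hat) - math.log(y))

# Both predictions are 25% too high, but the absolute errors differ tenfold
cheap = log_error(125_000, 100_000)
expensive = log_error(1_250_000, 1_000_000)

abs_cheap = abs(125_000 - 100_000)
abs_expensive = abs(1_250_000 - 1_000_000)
print(cheap, expensive, abs_cheap, abs_expensive)
```

The two log errors coincide, whereas a squared error on raw prices would be dominated by the expensive house.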
2. WangXiaobu 07 Jul 2024
  
  in Public
  
  loss
  
  With its default reduction='mean', nn.MSELoss() returns $(\text{label} - \text{output})^2$ averaged over the batch.
3. WangXiaobu 07 Jul 2024
  
  in Public
  
  dummy_na=True
  
  Missing values are also treated as a category of their own.
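A minimal sketch of the effect, assuming a single categorical column. The column name and values are invented for illustration.

```python
import pandas as pd

# One categorical feature with a missing entry
features = pd.DataFrame({"MSZoning": ["RL", "RM", float("nan")]})

# dummy_na=True adds an indicator column for the missing value
dummies = pd.get_dummies(features, dummy_na=True)
print(dummies)
```

Without `dummy_na=True`, the third row would be all zeros and indistinguishable from "none of the known categories".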
4. WangXiaobu 07 Jul 2024
  
  in Public
  
  all_features[numeric_features] = all_features[numeric_features].apply( lambda x: (x - x.mean()) / (x.std()))
  
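  For convenience, the code standardizes the training and test sets together. In practice we should standardize on the training set, record its mean and variance, and then use those statistics to standardize the test set.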
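A sketch of the train-only variant described in the note. The toy frames and the `LotArea` column are assumptions for illustration, not the Kaggle data.

```python
import pandas as pd

train = pd.DataFrame({"LotArea": [8000.0, 9000.0, 10000.0]})
test = pd.DataFrame({"LotArea": [9500.0, 12000.0]})
numeric_features = ["LotArea"]

# Statistics come from the training set only
mean = train[numeric_features].mean()
std = train[numeric_features].std()

# The same statistics are applied to both sets
train_scaled = (train[numeric_features] - mean) / std
test_scaled = (test[numeric_features] - mean) / std
```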
5. WangXiaobu 07 Jul 2024
  
  in Public
  
  exist_ok=True
  
  exist_ok=True is an optional argument. When set to True, the function does not raise an exception if the directory already exists. If exist_ok is False (the default), an exception is raised if the target directory already exists.
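The behaviour can be checked with `os.makedirs` inside a temporary directory. This is a small sketch, and the directory name `data` is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "data")

    os.makedirs(target, exist_ok=True)  # creates the directory
    os.makedirs(target, exist_ok=True)  # already exists: silently accepted

    try:
        os.makedirs(target)             # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True
```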
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html
Jun 2024
zh.d2l.ai zh.d2l.ai

4.7. 前向传播、反向传播和计算图 — 动手学深度学习 2.0.0 documentation

2
1. WangXiaobu 22 Jun 2024
  
  in Public
  
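  The gradient with respect to the hidden layer output, $\partial J/\partial \mathbf{h} \in \mathbb{R}^h$, is given by
  
  Here the result of $\partial \mathbf{o} / \partial \mathbf{h}$ agrees with matrix calculus (in denominator layout) and is likewise ${\mathbf{W}^{(2)}}^\top$.
  
  Why is it ${\mathbf{W}^{(2)}}^\top \frac{\partial J}{\partial \mathbf{o}}$ rather than $\frac{\partial J}{\partial \mathbf{o}}{\mathbf{W}^{(2)}}^\top$? Mainly because, as mentioned earlier, the prod operator performs whatever operations are necessary: it transposes and swaps the positions of its inputs as needed before multiplying. Here $\frac{\partial J}{\partial \mathbf{o}}$ is a $q$-dimensional vector and $\mathbf{W}^{(2)}$ is $q \times h$, so ${\mathbf{W}^{(2)}}^\top$ ($h \times q$) must be written in front to produce an $h$-dimensional result.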
2. WangXiaobu 22 Jun 2024
  
  in Public
  
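  the chain rule gives
  
  The differentiation rule below is somewhat problematic mathematically.
  
  $$ W^{(2)}h = \begin{pmatrix} w^{(2)\prime}_{1} \\ w^{(2)\prime}_{2} \\ \vdots \\ w^{(2)\prime}_{q} \end{pmatrix} \begin{pmatrix} h_{1} \\ h_{2} \\ \vdots \\ h_{h} \end{pmatrix} = \begin{pmatrix} w^{(2)\prime}_{1} h \\ w^{(2)\prime}_{2} h \\ \vdots \\ w^{(2)\prime}_{q} h \end{pmatrix} $$
  
  $w^{(2)}_{i}$ is the $h$-dimensional column vector of affine-transformation weights for the $i$-th output unit of $W^{(2)}$; it is transposed above (written with a prime) to fit the matrix product $o=W^{(2)}h$.
  
  A matrix derivative of the form $\frac{\partial (W^{(2)}h)}{\partial W^{(2)}}$ does not really exist. What is actually being done? Each component is differentiated with respect to the matching component, i.e. $\frac{\partial (w^{(2)\prime}_{1}h)}{\partial w^{(2)}_{1}}=h$, and the other components give $h$ as well.
  
  So that the result has the same shape as $W^{(2)}$ and can be added to or subtracted from it when updating, the derivative can be written as (noting again that such a derivative does not strictly exist)
  
  $$ \frac{\partial (W^{(2)}h)}{\partial W^{(2)}} = \begin{pmatrix} h^T \\ h^T \\ \vdots \\ h^T \end{pmatrix} $$
  
  The update is W2 -= eta * G, where eta is the learning rate. Taking G to be $h^{T}$ alone does run, thanks to broadcasting, but it only covers the $\partial o / \partial W^{(2)}$ factor: the gradient of the loss term scales row $i$ by $\partial J/\partial o_i$, giving $G = \frac{\partial J}{\partial o} h^{T}$ (plus the regularization term, if any).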
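The shapes discussed in these two notes can be checked numerically with autograd. The sketch below makes several simplifying assumptions: no activation function, no regularization term, and a toy objective $J = \frac{1}{2}\lVert o \rVert^2$ chosen so that $\partial J/\partial o = o$. All names are illustrative.

```python
import torch

torch.manual_seed(0)
n_hidden, n_out = 3, 2

h_vec = torch.randn(n_hidden, requires_grad=True)     # hidden output h
W2 = torch.randn(n_out, n_hidden, requires_grad=True)  # q x h weight matrix

o = W2 @ h_vec
J = 0.5 * (o ** 2).sum()  # toy objective, so dJ/do = o
J.backward()

dJ_do = o.detach()

# dJ/dh = W2^T (dJ/do): an h-dimensional vector
grad_h_manual = W2.detach().T @ dJ_do

# dJ/dW2 = (dJ/do) h^T: row i is h^T scaled by dJ/do_i
grad_W2_manual = torch.outer(dJ_do, h_vec.detach())
```

Comparing `grad_h_manual` with `h_vec.grad` and `grad_W2_manual` with `W2.grad` confirms both the ordering of the product and the outer-product form of the weight gradient.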
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/backprop.html
zh.d2l.ai zh.d2l.ai

4.5. 权重衰减 — 动手学深度学习 2.0.0-beta0 documentation

6
1. WangXiaobu 21 Jun 2024
  
  in Public
  
  l = loss(net(X), y) + lambd * l2_penalty(w)
  
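  Each component of loss (one per example) is increased by lambd * l2_penalty(w), via broadcasting.
  
  The summed loss therefore increases by batch_size * lambd * l2_penalty(w); once d2l.sgd() divides by batch_size, this corresponds to the penalty $\frac{\lambda}{2} \lVert\mathbf{w}\rVert^2$.
  
  Most interestingly, this matches how weight_decay behaves in PyTorch's torch.optim.SGD(): weight_decay acts on every component of the parameters, and its effect is equivalent to adding the penalty to the overall objective for the batch.
  
  In the from-scratch implementation broadcasting really is needed so that every component receives the penalty, rather than writing l = loss(net(X), y) followed by (l.sum() + lambd * l2_penalty(w)).backward().
  
  d2l.sgd() divides by batch_size when it updates the parameters, so the penalty has to be added to every component.
  
  So lambd has exactly the same effect in Mu Li's from-scratch code and in the concise code; the differences between the training plots come from something else.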
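The equivalence claimed in this note can be checked on one SGD step. The sketch below is written under assumptions: a linear model without bias, a halved squared loss in both branches, and made-up data. `squared_loss` and `l2_penalty` are local stand-ins rather than imports from d2l.

```python
import torch

torch.manual_seed(0)
batch_size, num_inputs, lambd, lr = 5, 3, 3.0, 0.1
X = torch.randn(batch_size, num_inputs)
y = torch.randn(batch_size, 1)
w0 = torch.randn(num_inputs, 1)

def squared_loss(y_hat, y):
    return (y_hat - y) ** 2 / 2  # per-example loss, shape (batch_size, 1)

def l2_penalty(w):
    return (w ** 2).sum() / 2

# From-scratch style: penalty broadcast onto every example, sum, divide by batch size
w_manual = w0.clone().requires_grad_(True)
l = squared_loss(X @ w_manual, y) + lambd * l2_penalty(w_manual)
l.sum().backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad / batch_size

# Optimizer style: mean loss plus weight_decay handled by torch.optim.SGD
w_opt = w0.clone().requires_grad_(True)
trainer = torch.optim.SGD([w_opt], lr=lr, weight_decay=lambd)
squared_loss(X @ w_opt, y).mean().backward()
trainer.step()
```

After one step the two weight vectors agree up to floating-point error, which is consistent with the note's conclusion about lambd.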
2. WangXiaobu 21 Jun 2024
  
  in Public
  
  trainer = torch.optim.SGD([ {"params":net[0].weight,'weight_decay': wd}, {"params":net[0].bias}], lr=lr)
  
  For the version without weight decay, see https://zh.d2l.ai/chapter_linear-networks/linear-regression-concise.html
  
  trainer = torch.optim.SGD(net.parameters(), lr=0.03)
3. WangXiaobu 21 Jun 2024
  
  in Public
  
  net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss
  
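  This lambda looks interesting: at first glance it seems to return two functions, but that reading is wrong ❌
  
  It is a tuple assignment, equivalent to net = lambda X: d2l.linreg(X, w, b) and loss = d2l.squared_loss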
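The parsing can be confirmed in plain Python: the comma separates two tuple elements, and the lambda body ends before it. In this sketch `linreg` and `squared_loss` are simplified scalar stand-ins for the d2l functions.

```python
def linreg(X, w, b):
    return X * w + b

def squared_loss(y_hat, y):
    return (y_hat - y) ** 2 / 2

w, b = 2.0, 1.0

# Right-hand side is the tuple (lambda, squared_loss), unpacked into two names
net, loss = lambda X: linreg(X, w, b), squared_loss

prediction = net(3.0)  # 3.0 * 2.0 + 1.0
```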
4. WangXiaobu 21 Jun 2024
  
  in Public
  
  load_array
  
```python
# Defined in file: ./chapter_linear-networks/linear-regression-concise.md
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)
```
5. WangXiaobu 21 Jun 2024
  
  in Public
  
  synthetic_data
  
```python
# Defined in file: ./chapter_linear-networks/linear-regression-scratch.md
from d2l import torch as d2l

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = d2l.normal(0, 1, (num_examples, len(w)))
    y = d2l.matmul(X, w) + b
    y += d2l.normal(0, 0.01, y.shape)
    return X, d2l.reshape(y, (-1, 1))
```

d2l.reshape simply calls the tensor's own reshape method, which returns a reshaped tensor (a view where possible) rather than modifying the tensor in place:

reshape = lambda x, *args, **kwargs: x.reshape(*args, **kwargs)
6. WangXiaobu 21 Jun 2024
  
  in Public
  
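  Given $k$ variables, the number of terms of degree $d$ is $\binom{k-1+d}{k-1}$, i.e. $C_{k-1+d}^{k-1} = \frac{(k-1+d)!}{d!\,(k-1)!}$.
  
  I did not follow the math behind this count.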
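One way to see the count is the stars-and-bars argument: a monomial of degree $d$ in $k$ variables is a multiset of $d$ variable indices, which corresponds to arranging $d$ stars and $k-1$ bars in a row, so there are $\binom{k-1+d}{k-1}$ of them. The sketch below checks the formula against brute-force enumeration; the function names are illustrative.

```python
import math
from itertools import combinations_with_replacement

def count_monomials(k, d):
    """Closed form: choose positions for the k-1 bars among k-1+d slots."""
    return math.comb(k - 1 + d, k - 1)

def enumerate_monomials(k, d):
    """Each monomial is a multiset of d variable indices drawn from k variables."""
    return list(combinations_with_replacement(range(k), d))

# k = 3 variables (x0, x1, x2), degree d = 2:
# x0^2, x0*x1, x0*x2, x1^2, x1*x2, x2^2
monomials = enumerate_monomials(3, 2)
print(len(monomials), count_monomials(3, 2))
```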
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/weight-decay.html
zh.d2l.ai zh.d2l.ai

4.6. 暂退法（Dropout） — 动手学深度学习 2.0.0-beta0 documentation

3
1. WangXiaobu 21 Jun 2024
  
  in Public
  
  torch.rand
  
  Returns a tensor filled with random numbers from a uniform distribution on the interval $[0, 1)$.
2. WangXiaobu 21 Jun 2024
  
  in Public
  
  num_epochs, lr, batch_size = 10, 0.5, 256 loss = nn.CrossEntropyLoss(reduction='none') train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size) trainer = torch.optim.SGD(net.parameters(), lr=lr) d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
  
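  This code is identical to that of the concise implementation, so
  
  an instance of a custom class Net(nn.Module) and the object returned by nn.Sequential() can be used interchangeably here: both are nn.Module instances.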
3. WangXiaobu 21 Jun 2024
  
  in Public
  
  mask * X
  
  Element-wise (Hadamard) product.
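A sketch of a from-scratch dropout function in which this element-wise product appears. It follows the usual inverted-dropout recipe (zero each entry with probability `dropout`, rescale the survivors by `1 / (1 - dropout)`); the input `X` is made up.

```python
import torch

def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)  # drop everything
    if dropout == 0:
        return X                    # keep everything
    # mask has the same shape as X; entries are 1.0 (keep) or 0.0 (drop)
    mask = (torch.rand(X.shape) > dropout).float()
    # element-wise product, then rescale so the expected value is unchanged
    return mask * X / (1.0 - dropout)

X = torch.arange(1.0, 9.0).reshape(2, 4)
out = dropout_layer(X, 0.5)
```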
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/dropout.html
zh.d2l.ai zh.d2l.ai

4.2. 多层感知机的从零开始实现 — 动手学深度学习 2.0.0 documentation

2
1. WangXiaobu 20 Jun 2024
  
  in Public
  
  return (H@W2 + b2)
  
  net returns the raw outputs (logits) $o_k$; the softmax and the cross-entropy loss are handled by nn.CrossEntropyLoss(reduction='none').
2. WangXiaobu 20 Jun 2024
  
  in Public
  
  X@W1
  
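  In PyTorch, torch.mm(A, B) and A @ B both perform matrix multiplication.
  
  torch.mm(A, B) is a function that takes two 2-D tensors (matrices) and returns their matrix product. It follows the usual rule of matrix multiplication: the number of columns of A must equal the number of rows of B.
  
  A @ B is an operator that also performs matrix multiplication. For 2-D tensors it behaves like torch.mm(A, B), but it additionally supports higher-dimensional tensors: it multiplies over the last two dimensions, and the remaining (batch) dimensions must match or be 1 so that they can be broadcast.
  
  For example:
  
```python
import torch

A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5, 6], [7, 8]])

C = A @ B
print(C)  # Output: tensor([[19, 22], [43, 50]])
```

In this example A @ B and torch.mm(A, B) give the same result. If A and B are higher-dimensional tensors, however, A @ B performs a batched matrix multiplication, whereas torch.mm(A, B) raises an error.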
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/mlp-scratch.html
zh.d2l.ai zh.d2l.ai

4.4. 模型选择、欠拟合和过拟合 — 动手学深度学习 2.0.0 documentation

2
1. WangXiaobu 20 Jun 2024
  
  in Public
  
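  We try to train a face recognition system using only face data from college students, and then want to use it to monitor elderly people in a nursing home.
  
  The training samples resemble one another, but they bear little resemblance to the test data: the two sets are not drawn from the same distribution, which violates the i.i.d. assumption.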
2. WangXiaobu 20 Jun 2024
  
  in Public
  
  The goal is not for the model to memorize the mapping between ID and outcome, but for it to have predictive power.
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/underfit-overfit.html
zh.d2l.ai zh.d2l.ai

4.3. 多层感知机的简洁实现 — 动手学深度学习 2.0.0 documentation

3
1. WangXiaobu 20 Jun 2024
  
  in Public
  
  net.apply(init_weights);
  
  Similar to DataFrame.apply in pandas: net.apply calls init_weights on every submodule of net, and on net itself.
2. WangXiaobu 20 Jun 2024
  
  in Public
  
  nn.init.normal_
  
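  nn.init.normal_(m.weight, std=0.01) is a PyTorch function that initializes weights from a normal distribution. It modifies m.weight in place.
  
  Here m.weight is the weight tensor of an nn.Linear layer and std=0.01 is the standard deviation of the normal distribution. Every element of m.weight is set to a random number drawn from a normal distribution with mean 0 and standard deviation 0.01.
  
  This kind of initialization is common for neural network weights because it breaks the symmetry between weights at the start of training, which helps the model avoid getting stuck in poor local optima.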
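The interplay of `net.apply` and `nn.init.normal_` can be sketched as follows. The layer sizes are assumptions, chosen to resemble a small Fashion-MNIST MLP.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    # Called once for every submodule; only Linear layers are touched
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

# Empirical statistics of the first Linear layer's weights
weight_mean = net[1].weight.mean().item()
weight_std = net[1].weight.std().item()
```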
3. WangXiaobu 20 Jun 2024
  
  in Public
  
  nn.Flatten()
  
  ⇔ X = X.reshape((-1, num_inputs))
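A quick check of this equivalence on a made-up batch of two 28×28 single-channel images:

```python
import torch
from torch import nn

X = torch.randn(2, 1, 28, 28)
num_inputs = 28 * 28

flat_module = nn.Flatten()(X)                # keeps dim 0, flattens the rest
flat_reshape = X.reshape((-1, num_inputs))
```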
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/mlp-concise.html
zh.d2l.ai zh.d2l.ai

3.6. softmax回归的从零开始实现 — 动手学深度学习 2.0.0 documentation

5
1. WangXiaobu 20 Jun 2024
 
 in Public
 
 assert train_loss < 0.5, train_loss assert train_acc <= 1 and train_acc > 0.7, train_acc assert test_acc <= 1 and test_acc > 0.7, test_acc
 
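 This code performs a few assertion checks to make sure the training and test results meet some expected conditions.
 
 assert train_loss < 0.5, train_loss checks that the training loss (train_loss) is below 0.5. If train_loss is greater than or equal to 0.5, the assertion fails and the program raises an AssertionError whose message is the value of train_loss.
 
 assert train_acc <= 1 and train_acc > 0.7, train_acc checks that the training accuracy (train_acc) lies in the interval (0.7, 1]. If train_acc is at most 0.7 or greater than 1, the assertion fails and the program raises an AssertionError whose message is the value of train_acc.
 
 assert test_acc <= 1 and test_acc > 0.7, test_acc checks that the test accuracy (test_acc) lies in the interval (0.7, 1]. If test_acc is at most 0.7 or greater than 1, the assertion fails and the program raises an AssertionError whose message is the value of test_acc.
 
 Assertions like these are commonly used to make sure a program's results match expectations. A failed assertion signals a likely problem: for example, the model may not have trained properly, or there may be something wrong with the data.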
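The second operand of `assert` becomes the message of the exception, which is how the offending value ends up in the traceback. A minimal sketch with an invented loss value:

```python
train_loss = 0.62  # hypothetical value that violates the check

try:
    assert train_loss < 0.5, train_loss
    message = None
except AssertionError as err:
    # The value after the comma is carried by the exception
    message = str(err)

print(message)
```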
2. WangXiaobu 20 Jun 2024
 
 in Public
 
 class Animator
 
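 This code defines a class named Animator that plots data incrementally, as an animation. The main methods and attributes are:
 
 __init__: the constructor, which initializes the object. It accepts several parameters, including axis labels, axis limits, axis scales, line formats, the number of subplots and the figure size, which configure the figure.
 
 add: adds data points to the chart. It takes two arguments, x and y, the x- and y-coordinates of the data points. If x or y is not a list, it is converted to one. The points are then appended to self.X and self.Y, which store the x- and y-coordinates of all data points. Finally, the method clears the current axes and redraws the figure from the updated data.
 
 self.fig and self.axes: the figure and subplot objects. self.fig is a Figure object representing the whole figure; self.axes is a list of Axes objects, one per subplot.
 
 self.config_axes: a function that configures subplot properties such as labels, axis limits and axis scales.
 
 self.X and self.Y: store the x- and y-coordinates of all data points.
 
 self.fmts: stores the list of line formats used when drawing the data.
 
 The class mainly provides a convenient interface for plotting data points as an animation. You create an Animator object and call add to append points; each call clears and redraws the chart, so you can watch the data change over time.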
3. WangXiaobu 20 Jun 2024
 
 in Public
 
 X.shape[0]
 
 batch_size
4. WangXiaobu 20 Jun 2024
 
 in Public
 
 l.mean().backward()
 
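 l.mean().backward(): when a built-in PyTorch optimizer and loss function are used, the per-example losses in l are averaged over the batch and backpropagation starts from that mean. The benefit is that the mean loss is insensitive to batch size: the scale of the gradient stays the same whatever the batch size is.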
5. WangXiaobu 20 Jun 2024
 
 in Public
 
 zip(self.data, args)
 
```
list(zip([0, 0], [1, 2]))

[(0, 1), (0, 2)]
```
So the stored values are updated to [0+1, 0+2].
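In context, this `zip` pairs each running total with the new value to be added. A minimal accumulator sketch along those lines (a simplified stand-in, not necessarily identical to the d2l class):

```python
class Accumulator:
    """Keep running sums over n quantities."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Pair each running total with its increment and sum them
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def __getitem__(self, idx):
        return self.data[idx]

metric = Accumulator(2)
metric.add(1, 2)  # totals become [1.0, 2.0]
metric.add(3, 4)  # totals become [4.0, 6.0]
```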
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_linear-networks/softmax-regression-scratch.html
zh.d2l.ai zh.d2l.ai

4.1. 多层感知机 — 动手学深度学习 2.0.0 documentation

2
1. WangXiaobu 20 Jun 2024
  
  in Public
  
  retain_graph=True
  
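  The derivative of the sigmoid function is given by the following formula:
  
  $$\frac{d}{dx} \operatorname{sigmoid}(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2} = \operatorname{sigmoid}(x)\left(1-\operatorname{sigmoid}(x)\right).$$
  
  The derivative of the sigmoid function is plotted below. Note that when the input is 0 the derivative reaches its maximum of 0.25; as the input moves away from 0 in either direction, the derivative approaches 0.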
  
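  A common situation in which the computational graph has to be retained is running backward more than once through the same graph, for example when several outputs share intermediate results. (An actual second-order derivative, as used in Newton-type methods, additionally requires create_graph=True; calling backward twice does not compute it.) Here is a simple example:
  
```python
import torch

# Create a tensor and set requires_grad=True so that gradients can be computed
x = torch.tensor([1.0], requires_grad=True)

# Define a function
y = x ** 3

# First backward pass: computes the first derivative 3 * x**2
y.backward(retain_graph=True)

# Print the first derivative
print(x.grad)  # Output: tensor([3.])

# Gradients accumulate, so zero them before the second backward pass
x.grad.zero_()

# Second backward pass through the same graph: the first derivative again
y.backward()

# Print the result: still the first derivative, not the second
print(x.grad)  # Output: tensor([3.])
```

In this example we define y = x ** 3 and call .backward() twice. Both calls compute the first derivative. Between the two passes we call x.grad.zero_() to clear the gradient, because PyTorch accumulates gradients by default instead of replacing them. The first call must set retain_graph=True to keep the computational graph; otherwise the second backward pass raises an error, because the buffers of the graph have already been freed.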
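A second-order derivative is obtained by differentiating the first derivative itself, which requires the backward pass to be recorded. A minimal sketch using `torch.autograd.grad` with `create_graph=True`, with a scalar input for simplicity:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = x ** 3

# First derivative dy/dx = 3 * x**2; create_graph=True records this
# computation so that it can be differentiated again
(first,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative d2y/dx2 = 6 * x
(second,) = torch.autograd.grad(first, x)

print(first.item(), second.item())
```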
2. WangXiaobu 20 Jun 2024
  
  in Public
  
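  The input layer does not involve any computation, so producing an output with this network only requires implementing the computations of the hidden layer and the output layer. The number of layers in this multilayer perceptron is therefore 2.
  
  Mu Li counts layers by the number of layers that perform computation.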
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_multilayer-perceptrons/mlp.html
zh.d2l.ai zh.d2l.ai

3.7. softmax回归的简洁实现 — 动手学深度学习 2.0.0 documentation

1
1. WangXiaobu 20 Jun 2024
  
  in Public
  
  reduction='none'
  
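  nn.CrossEntropyLoss is a PyTorch class that implements the cross-entropy loss. Cross-entropy is commonly used for multi-class classification; it measures the difference between the model's predicted probability distribution and the true distribution.
  
  reduction='none' is a parameter that specifies how the per-example losses are aggregated. 'none' means no aggregation: a vector of losses is returned, one element per example. The other possible values are 'mean' (the average of the per-example losses) and 'sum' (their total).
  
  In train_ch3 → train_epoch_ch3, the branch for a built-in optimizer calls l.mean().backward().
  
  In this example 'none' is chosen because we want to process the per-example losses ourselves later on, for example to average them or to look only at the examples with the largest loss.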
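The three reduction modes can be compared on a tiny batch. The logits and labels below are made up for illustration.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
labels = torch.tensor([0, 2])

per_example = nn.CrossEntropyLoss(reduction='none')(logits, labels)  # shape (2,)
mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, labels)    # scalar
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, labels)      # scalar

# Averaging the 'none' output by hand reproduces reduction='mean'
manual_mean = per_example.mean()
```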
Visit annotations in context

Annotators

WangXiaobu

URL

zh.d2l.ai/chapter_linear-networks/softmax-regression-concise.html
github.com github.com

The value of stateProbs() and getTrProbs() functions can't match with each other? · bmcclintock/momentuHMM · Discussion #183

1
1. WangXiaobu 02 Jun 2024
  
  in Public
  
  tutorial
  
  The appendix PDF of the paper "Uncovering ecological state dynamics with hidden Markov models"; archived in Zotero.
Visit annotations in context

Annotators

WangXiaobu

URL

github.com/bmcclintock/momentuHMM/discussions/183
Mar 2024
zhuanlan.zhihu.com zhuanlan.zhihu.com

stata构建ARIMA模型并作预测，命令及过程

1
1. WangXiaobu 29 Mar 2024
  
  in Public
  
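  Analysis: from the autocorrelation and partial autocorrelation plots of the first-differenced BLR series, in the short run (the first 5 lags) the autocorrelation coefficients decay quickly, showing short-term autocorrelation, i.e. tailing off. Except at lags 4 and 14, where they leave the shaded band, the coefficients lie within two standard errors and fluctuate around zero by no more than 0.2. In the partial autocorrelation plot, in the short run (the first 5 lags) the coefficients also decay quickly, showing short-term autocorrelation, i.e. tailing off. Except at lags 4, 14 and 18, where they leave the shaded band, they lie within two standard errors and fluctuate around zero by no more than 0.1.
  
  What kind of model needs its (partial) autocorrelation coefficients checked out to lag 20 and beyond? Those lags need not be looked at at all. So both plots tail off.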
Visit annotations in context

Annotators

WangXiaobu

URL

zhuanlan.zhihu.com/p/517868996
github.com github.com

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision

1
1. WangXiaobu 09 Mar 2024
  
  in Public
  
  Fleurs
  
  On this dataset, Spanish has the lowest error rate at 2.8%, English is at 4.1%, and Mandarin Chinese at 7.7%.
Visit annotations in context

Annotators

WangXiaobu

URL

github.com/openai/whisper
Jan 2024
github.com github.com

Initial distribution of HMM in simData · Issue #21 · bmcclintock/momentuHMM

2
1. WangXiaobu 31 Jan 2024
  
  in Public
  
  You're getting an error when delta=c(0,1) because parameters cannot be set on a boundary
  
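  The parameters cannot be set to the boundary values 1 or 0; they can only be set to values very close to the boundary, for example 1.e-100.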
2. WangXiaobu 31 Jan 2024
  
  in Public
  
  So the first state seems not to honor the given initial distribution, because the first sequence starts from state 2 instead of state 1.
  
  ID step angle x y states 1 1 833.60385 NA 0.0000 0.0000 2
Visit annotations in context

Annotators

WangXiaobu

URL

github.com/bmcclintock/momentuHMM/issues/21
Sep 2023
wrds-www.wharton.upenn.edu wrds-www.wharton.upenn.edu

WRDS Overview of Compustat Execucomp

1
1. WangXiaobu 14 Sep 2023
  
  in Public
  
  served as director during the indicated fiscal year
  
  Only a member of the board of directors, not necessarily the chairman.
Visit annotations in context

Annotators

WangXiaobu

URL

wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/compustat/execucomp/overview-executive-compensation/
isso.utdallas.edu isso.utdallas.edu

Financial Requirements - International Students and Scholars Office | The University of Texas at Dallas

5
1. WangXiaobu 12 Sep 2023
  
  in Public
  
  Only salary statements from a sponsor will be accepted.
  
  A salary statement from the sponsor: that would mean Professor Gu's salary certificate, which is not acceptable.
2. WangXiaobu 07 Sep 2023
  
  in Public
  
  Incoming J-1 Exchange Students
  
  Click this to see the tuition estimate.
3. WangXiaobu 07 Sep 2023
  
  in Public
  
  bank statements
  
  Bank proof of assets.
4. WangXiaobu 07 Sep 2023
  
  in Public
  
  Tuition Insurance
  
  But do we need to buy this tuition insurance?
5. WangXiaobu 07 Sep 2023
  
  in Public
  
  Once you enroll in classes
  
  We do not enroll in classes.
Visit annotations in context

Annotators

WangXiaobu

URL

isso.utdallas.edu/joining-ut-dallas/expenses/
github.com github.com

Extracting transition probabilities and confidence intervals · bmcclintock/momentuHMM · Discussion #78

1
1. WangXiaobu 04 Sep 2023
  
  in Public
  
  just the means that are shown when you print the model
  
  Refers to plot.momentuHMM ↔ plot(momentuHMM)
Visit annotations in context

Annotators

WangXiaobu

URL

github.com/bmcclintock/momentuHMM/discussions/78
Aug 2023
stats.oarc.ucla.edu stats.oarc.ucla.edu

Survey Data Analysis with R

9
1. WangXiaobu 13 Aug 2023
  
  in Public
  
  Note that if we added a random slope, the number of rows in Z would remain the same, but the number of columns would double.
  
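  Only a random intercept is considered here, with no random slope. Some variables can have random effects as well, though; if the number of columns of $Z$ doubles, the number of rows of $u$ must double too.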
2. WangXiaobu 12 Aug 2023
  
  in Public
  
  .878
  
  $\exp(-0.13)=0.878$
3. WangXiaobu 12 Aug 2023
  
  in Public
  
  the 20th, 40th, 60th, and 80th percentiles
  
  The values u = -.158, -0.47, 0.54, 1.82 below correspond to these four percentiles of the random effect u.
4. WangXiaobu 12 Aug 2023
  
  in Public
  
  remission
  
  Easing of the illness.
5. WangXiaobu 12 Aug 2023
  
  in Public
  
  GLMM
  
  generalized linear mixed model: compared with a generalized linear model there is the extra word "mixed".
6. WangXiaobu 12 Aug 2023
  
  in Public
  
  Note that we call this a probability mass function rather than probability density function because the support is discrete (i.e., for positive integers).
  
  This is the first time I have understood the difference between a probability mass function and a probability density function.
7. WangXiaobu 12 Aug 2023
  
  in Public
  
  We could also frame our model in a two level-style equation
  
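  The matrix notation from the beginning is no longer used; the model is rewritten as a two level-style equation.
  
  Note: there is still only a random intercept and no random slope.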
8. WangXiaobu 12 Aug 2023
  
  in Public
  
  The final element in our model is the variance-covariance matrix of the residuals
  
  The discussion now turns to the last element of the model, the variance-covariance matrix of the residuals.
9. WangXiaobu 12 Aug 2023
  
  in Public
  
  Because we directly estimated the fixed effects, including the fixed effect intercept, random effect complements are modeled as deviations from the fixed effect, so they have mean zero
  
  This is the reason why the random effects are assumed to follow a normal distribution with mean zero.
Visit annotations in context

Annotators

WangXiaobu

URL

stats.oarc.ucla.edu/r/dae/robust-regression/
github.com github.com

fitHMM in parallel · Issue #32 · bmcclintock/momentuHMM

1
1. WangXiaobu 11 Aug 2023
  
  in Public
  
  To my knowledge there is no way to "fit the HMM in parallel".
  
  There is no way to speed up the fitting of a single HMM through parallelization.
Visit annotations in context

Annotators

WangXiaobu

URL

github.com/bmcclintock/momentuHMM/issues/32
Jul 2023
statisticsglobe.com statisticsglobe.com

Negative Binomial Distribution in R | dnbinom, pnbinom, qnbinom, rnbinom

1
1. WangXiaobu 26 Jul 2023
  
  in Public
  
  size (i.e. number of trials)
  
  Unlike the dnbinom_rcpp function in the momentuHMM package, size here refers to the total number of trials.
Visit annotations in context

Annotators

WangXiaobu

URL

statisticsglobe.com/negative-binomial-distribution-in-r-dnbinom-pnbinom-qnbinom-rnbinom/
Jun 2023
zhuanlan.zhihu.com zhuanlan.zhihu.com

数模系列(6)：方差分析（ANOVA）

1
1. WangXiaobu 12 Jun 2023
  
  in Public
  
  389.25
  
  This mean was calculated incorrectly; it should be 388.5.
Visit annotations in context

Annotators

WangXiaobu

URL

zhuanlan.zhihu.com/p/33357167
grf-labs.github.io grf-labs.github.io

Evaluating a causal forest fit

1
1. WangXiaobu 01 Jun 2023
  
  in Public
  
  The plot shows three colors because the red and green regions overlap and produce a third color.
Visit annotations in context

Annotators

WangXiaobu

URL

grf-labs.github.io/grf/articles/diagnostics.html
May 2023
grf-labs.github.io grf-labs.github.io

Estimating ATEs on a new target population

3
1. WangXiaobu 31 May 2023
  
  in Public
  
  The superscript (-1)
  
  This is a typo; the superscript should be $(-i)$.
2. WangXiaobu 31 May 2023
  
  in Public
  
  We say the first subset participated in a trial, and that $S_i = 1$, and the second did not: $S_i = 0$.
  
  Trial population ($S_i = 1$): covariates + binary treatment + response
  
  Target population ($S_i = 0$): covariates only
3. WangXiaobu 31 May 2023
  
  in Public
  
  RCT
  
  In the context of this article, RCT stands for randomized controlled trial. It refers to a subset of a population for which we have access to results from a randomized/observational trial/study.
Visit annotations in context

Annotators

WangXiaobu

URL

grf-labs.github.io/grf/articles/ate_transport.html
Apr 2023
zhuanlan.zhihu.com zhuanlan.zhihu.com

PSM-倾向得分匹配分析的误区

1
1. WangXiaobu 30 Apr 2023
  
  in Public
  
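  Once the _weight variable has been obtained, the matching status of the sample has effectively been marked, and we can run the post-matching regression directly by appending fweight = _weight to the regress command. Here fweight is short for "frequency weights", i.e. weights giving the number of times an observation is repeated. With 1:2 matching with replacement, a successfully matched treated unit has _weight = 2 / 2 and a successfully matched control unit has _weight = (number of times it was matched) / 2; that is, both are divided by 2 for normalization. Therefore, to keep using the fweight option, _weight * 2 is needed to convert the weights into frequencies.
  
  This overlooks the fact that _weight for treated units needs to be 1.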
  
  econometrics
Visit annotations in context

Tags

econometrics

Annotators

WangXiaobu

URL

zhuanlan.zhihu.com/p/148549112
pandas.pydata.org pandas.pydata.org

pandas.Series.fillna — pandas 2.0.1 documentation

1
1. WangXiaobu 25 Apr 2023
  
  in Public
  
  Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.
  
  The operation can be applied to a subset of columns; there is no need to fill missing values in every column.
  
  code
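A small sketch of filling only some columns by passing a dict to `fillna`. The frame is made up for illustration; column `C` is deliberately left out of the dict.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"A": [nan, 3.0],
                   "B": [2.0, nan],
                   "C": [nan, nan]})

# Only columns named in the dict are filled; "C" keeps its NaN values
filled = df.fillna(value={"A": 0, "B": 1})
print(filled)
```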
Visit annotations in context

Tags

code

Annotators

WangXiaobu

URL

pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html
www.statalist.org www.statalist.org

formula stata uses to calculate stdp and stdf - Statalist

1
1. WangXiaobu 12 Apr 2023
  
  in Public
  
  Correction:
  
  This $(1+x_i)'$ is problematic; it should be $(1\ x_i)'$, that is, the row vector $(1\ x_i)$ transposed.
Visit annotations in context

Annotators

WangXiaobu

URL

statalist.org/forums/forum/general-stata-discussion/general/1292903-formula-stata-uses-to-calculate-stdp-and-stdf
bbs.pinggu.org bbs.pinggu.org

dropped singleton observations有什么影响？ - 第2页 - Stata专版 - 经管之家(原人大经济论坛)

2
1. WangXiaobu 03 Apr 2023
  
  in Public
  
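  Usually, when the xtreg, fe command is used (i.e. firm-level individual fixed effects are controlled for), additionally controlling for industry or region leads to multicollinearity, and those terms are automatically dropped by Stata, because the firm-level individual fixed effects already absorb the industry or region fixed effects.
  
  Of course; you already know yourself that they are omitted because of multicollinearity.
  
  A typical case of nested fixed effects.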
2. WangXiaobu 03 Apr 2023
  
  in Public
  
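  Then why can reghdfe control for all three (firm, region and industry) at the same time without warning about collinearity? (The results are nevertheless identical to those from xtreg, fe.)
  
  Because -reghdfe- does not report the fixed effects at all; it simply reports them in the redundant column.
  
  -xtreg, fe- reports everything except the individual fixed effects.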
Visit annotations in context

Annotators

WangXiaobu

URL

bbs.pinggu.org/thread-7139297-1-1.html
Mar 2023
zhuanlan.zhihu.com zhuanlan.zhihu.com

这20个高质量超高清的壁纸网站，强烈推荐！

1
1. WangXiaobu 08 Mar 2023
  
  in Public
  
  4、Wallhaven
  
  This one is quite handy.
Visit annotations in context

Annotators

WangXiaobu

URL

zhuanlan.zhihu.com/p/217806183
