Posts by Collection

cv

life

MIDL in Zurich

less than 1 minute read

Published:


The water in Zurich is very clean.

2022 summer vacation 1

less than 1 minute read

Published:


The beautiful summer vacation started in Berlin.

Daily record

less than 1 minute read

Published:

A very happy day

Night of Leiden

less than 1 minute read

Published:


The beautiful and peaceful night of Leiden.

publications

Stability of Nonfullerene Organic Solar Cells: from Built‐in Potential and Interfacial Passivation Perspectives

less than 1 minute read

Published in Journal, 2019

This paper presents work done at Hong Kong Baptist University.

Recommended citation: Wang, Y., Lan, W., Li, N., Lan, Z., Li, Z., Jia, J., & Zhu, F. (2019). Stability of Nonfullerene Organic Solar Cells: from Built‐in Potential and Interfacial Passivation Perspectives. Advanced Energy Materials, 9(19), 1900157. https://onlinelibrary.wiley.com/doi/full/10.1002/aenm.201900157

Wideband circularly polarized dielectric resonator antenna loaded with partially reflective surface

1 minute read

Published in Journal, 2019

A wideband circularly polarized (CP) dielectric resonator antenna (DRA) loaded with the partially reflective surface for gain enhancement is presented in this article. First, the DRA is excited by a microstrip line through modified stepped ring cross‐slot to generate the circular polarization. Four modified parasitic metallic plates are sequentially placed around the DRA for greatly widening the axial‐ratio bandwidth. Then, a partially reflective surface is introduced for enhancing the gain performance and further improving the CP bandwidth as well. Finally, an optimized prototype is fabricated to verify the design concept. The measured results show that the proposed DRA achieves 54.3% impedance bandwidth (VSWR<2) and 54.9% 3‐dB AR bandwidth. Besides, its average and peak gains are 10.7 dBic and 14.2 dBic, respectively. Wide CP band and high gains make the proposed DRA especially attractive for some broadband wireless applications such as satellite communication and remote sensing.

Recommended citation: Wen, J., Jiao, Y. C., Zhang, Y. X., & Jia, J. (2019). Wideband circularly polarized dielectric resonator antenna loaded with partially reflective surface. International Journal of RF and Microwave Computer‐Aided Engineering, 29(12), e21962. https://onlinelibrary.wiley.com/doi/full/10.1002/mmce.21962

talks

Introduction to Mandarin

less than 1 minute read

Published:

This is a casual and brief introduction to Mandarin Chinese. The material used is here.

technical_blog

MeVisLab tips

less than 1 minute read

Published:

Record some knowledge I need to know for MeVisLab.

Slurm tips

1 minute read

Published:

Frequently used commands

MATLAB tips

less than 1 minute read

Published:

As a teaching assistant, I need to become more familiar with MATLAB. The following commands will be used in courses.

Jupyter tips

less than 1 minute read

Published:

Some Jupyter notebook usage tips.

PyCharm tips

less than 1 minute read

Published:

Add Google Analytics feature

less than 1 minute read

Published:

This post records how to add Google Analytics. I can now check the visitor count from Google Analytics. The next step is to show it on the website.

Full checklist for a complete python project

5 minute read

Published:

  • PyCharm clone project (optional)
  • PyCharm remote interpreter
  • PyCharm remote deployment
  • PyCharm line separator change from CRLF to LF: File -> File Properties -> Line Separators
  • Structure overview
    • LICENSE
    • README.md
    • requirements.txt
    • setup.py (see the sketch after this list)
    • tests (directory)
    • src/project_name (directory)
    • scripts (directory)
    • docs (directory)
    • .github (directory)
    • .coveragerc
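For the setup.py item, here is a minimal, hedged sketch; the project name, version, and metadata are placeholders, not taken from the original post:

```
# setup.py -- a minimal sketch; all metadata values are placeholders.
from setuptools import setup, find_packages

setup(
    name="project_name",
    version="0.1.0",
    packages=find_packages(where="src"),   # packages live under src/
    package_dir={"": "src"},               # matching the layout above
    install_requires=open("requirements.txt").read().splitlines(),
)
```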

Checklist for a cluster

less than 1 minute read

Published:

Checklist for working on a remote cluster

  1. Reset the password
  2. Use a key instead of a password to save time
  3. Set up MobaXterm
  4. Set up the .ssh/config file
    1. Set a login alias
    2. Set it to skip the gateway
    3. Set up port forwarding
    4. Set up a port tunnel
    5. Set up X11 forwarding

Advanced usage of MLFlow

less than 1 minute read

Published:

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Add search to your website

2 minute read

Published:

As the content of a technical blog grows, it becomes difficult to remember everything you have written down. That's why we need a search function for our website.

Automatically Generate Documentation for your Project/Package

1 minute read

Published:

Once we have built a project/package, we need to tell users how to use it. Normally we could write a manual, but every code update means the manual needs to be updated as well. How do we keep the manual in sync? We can use tools that automate the pipeline. This blog will tell you how to build your documentation automatically.
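For concreteness, here is a minimal sketch of a Sphinx conf.py, assuming Sphinx is the documentation tool (the post does not name one; the project name and theme below are placeholders):

```
# docs/conf.py -- minimal Sphinx configuration (all values are placeholders)
project = "my_project"
author = "me"

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # parse Google/NumPy style docstrings
]
html_theme = "alabaster"
```

Running sphinx-build docs docs/_build then regenerates the HTML manual from the current docstrings, so the documentation follows the code.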

Linux commands

less than 1 minute read

Published:

Record some Linux commands I used in case I need them in the future.

Ruby

less than 1 minute read

Published:

Comparison between packaging commands in Ruby and Python.

Build your personal website

1 minute read

Published:

This blog shows my step-by-step experience of building my own academic website.

Why do you need a website?

Build a dual-language website

less than 1 minute read

Published:

I checked a great number of blogs about how to build a dual-language website. Most of them require me to write separate content for the two languages, which is not what I expected. So I finally worked out my own method to achieve it.

Relative Path

less than 1 minute read

Published:

How to set paths is a hotly discussed topic.

  1. How to import our own modules? With relative paths?
  2. How to design the structure of our project?
  3. How to set the results path of a script? (see the sketch after this list)
  4. How to set the results path of visualization scripts?
  5. How to set the PyCharm/VS Code debug working directory?
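For questions 3 and 4, one common pattern is to anchor output paths to the script file instead of the current working directory; a minimal sketch (the directory and file names are placeholders):

```
from pathlib import Path

# Anchor the results directory to this script's location, so the output
# lands in the same place no matter where the script is launched from.
SCRIPT_DIR = Path(__file__).resolve().parent
RESULTS_DIR = SCRIPT_DIR / "results"  # placeholder name
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

with open(RESULTS_DIR / "metrics.txt", "w") as f:
    f.write("placeholder output\n")
```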

Remote debugging

1 minute read

Published:

PyCharm and VS Code both provide remote debugging features. Let's see how to set up each of them.

Before continuing, you need some knowledge of SSH from my previous blog.

No gateway/proxy between local machine and remote machine

PyCharm

Placeholder

VS Code

Before remote development, we need to configure SSH for VS Code.

  1. Install the Remote Development extension.
  2. In Remote Explorer, click SSH Targets.
  3. Click config, and add the following code to the SSH configuration file XXX\.ssh\config
     Host remote1
       HostName remote-host-name
       User username
       IdentityFile XXX\.ssh\id_rsa
    
  4. The IdentityFile path should be the path of the private key generated by following this blog (if you have already generated the key in the VS Code terminal, you can remove this line).
  5. The first connection may require a password; after that you can connect to the remote machine without one.

The above content can also be found as the first chapter of this blog. But the second part of that blog is too complex; do not use it.

After we successfully connect to the remote machine, we can open the remote directory/file using VS Code and run the code directly with the remote Python interpreter (Ctrl+Shift+P -> Python: Select Interpreter to choose your preferred interpreter).

Gateway/proxy between local machine and remote machine

PyCharm

VS Code

The process is the same as the aforementioned no-gateway method; the only thing we need to pay attention to is building a password-less SSH connection via the gateway, following this blog:

Host remote2
  HostName remote-host-name
  User username
  ProxyJump username@gateway-host-name:22

SSH Knowledge in Detail

less than 1 minute read

Published:

Today I wanted to use VS Code's remote debugging feature, so I needed to refresh my knowledge of SSH.

What is SSH?

Simply put, SSH is a network protocol for encrypted logins between computers. If a user logs in to a remote computer from a local computer using the SSH protocol, we can consider this login secure: even if it is intercepted along the way, the password will not leak. In the earliest days, Internet communication was all in plain text; once intercepted, the content was completely exposed. In 1995, the Finnish researcher Tatu Ylonen designed the SSH protocol, which encrypts all login information. It became a basic solution for Internet security, was quickly adopted worldwide, and is now standard equipment on Linux systems. SSH is only a protocol with multiple implementations, both commercial and open source. This post targets OpenSSH, which is free software and very widely used, and only discusses the usage of SSH in a Linux shell. To use SSH on Windows you would use other software such as PuTTY, which deserves a separate post.

How SSH works, and man-in-the-middle attacks

SSH can guarantee security because it uses public-key cryptography. The whole process works like this:

  1. The remote host receives the user's login request and sends its public key to the user.
  2. The user encrypts the login password with this public key and sends it back.
  3. The remote host decrypts the login password with its own private key; if the password is correct, it lets the user log in.

This process is secure in itself, but carrying it out involves one risk: if someone intercepts the login request and impersonates the remote host by sending a forged public key to the user, the user can hardly tell the difference, because unlike the HTTPS protocol, SSH public keys are not certified by a certificate authority (CA); that is, they are all self-issued. Imagine an attacker sitting between the user and the remote host (for example in a public Wi-Fi area), using a forged public key to obtain the user's login password and then logging in to the remote host with that password: SSH's security mechanism would be completely gone. This risk is the famous "man-in-the-middle attack".

What SSH is used for

SSH is mainly used to log in to remote servers (mostly Linux servers) for code deployment, running, debugging, and so on.

Installing SSH

Installing SSH on Linux

On Linux, the open-source software OpenSSH implements the SSH protocol. OpenSSH is split into the client, openssh-client, and the server, openssh-server.

If you only want to log in to other machines over SSH, you only need to install openssh-client (Ubuntu installs it by default; if it is missing, run sudo apt-get install openssh-client). If you want the local machine to provide SSH service, you need to install openssh-server.

Server Linux systems normally have the SSH server installed by default, and user-facing Linux systems such as Ubuntu normally ship with the SSH client. Nevertheless, if you want to install the SSH server and/or client on a Linux system, see this article.

Installing SSH on Windows

Windows is rarely used as a server, but if you do come across a Windows server, it most likely has SSH server functionality. User-facing Windows systems do not necessarily have an SSH client, so SSH client software may need to be installed; see below for how.

Which SSH clients are commonly used?

Both Windows and Linux have many dedicated SSH client programs (or programs with SSH client functionality). Windows clients usually have a UI, while Linux clients usually have no UI and are run from the command line.

  • MobaXterm (Windows only)
  • PowerShell (Windows only)
  • WSL
  • Xshell
  • PuTTY
  • XManager
  • OpenSSH

See this blog post for more.

SSH key login

As the explanation of how SSH works above shows, SSH uses password login by default. This approach has many drawbacks: simple passwords are insecure, complex passwords are hard to remember, and typing them manually every time is tedious. Key-based login is therefore the better solution; it is also called password-less login.

When logging in with a password, the client does not need to generate or use its own private or public key; the whole process only uses the server's public key. But to log in with a key, the client needs to generate its own private/public key pair and hand the public key to the server. The principle is as follows:

Use a key generator to create a key pair: one public key and one private key. Add the public key to an account on the server; the client can then complete authentication and log in using the private key. This way, without the private key, nobody can log in to the system remotely by brute-forcing your password over SSH. Moreover, if you copy the public key to other accounts or even other hosts, you can log in with the same private key.

The concrete login process consists of the following steps.

Preparation: the client generates its own public and private keys with ssh-keygen.

  • Step 1: manually place the client's public key in the designated location on the remote server.
  • Step 2: the client sends an SSH login request to the server.
  • Step 3: the server receives the SSH login request and sends some random data to the user, asking the user to prove their identity.
  • Step 4: the client receives the data, signs it with its private key, and sends it back to the server.
  • Step 5: the server receives the signed data, decodes it with the corresponding public key, and compares the result with the original data. If they match, the user is allowed to log in.

For the concrete commands, see other articles online.

If the same SSH client program on one client machine connects to multiple servers, do we need multiple key pairs?

For example, to connect to multiple servers password-free with MobaXterm, do we need to generate multiple key pairs in MobaXterm? The answer is no. Usually we generate only one SSH key, named id_rsa, and submit it to multiple different websites (e.g. GitHub, DevCloud, or Gitee). Likewise, after generating one key pair in PowerShell and handing the public key to multiple servers, we can log in to all of those servers from PowerShell without a password.

同一个用户端机器上的多个SSH客户端软件连接同一个服务器的话,需要生成多个秘钥对吗?

一个客户端机器(Windows系统或者Linux系统)可以有很多SSH客户端软件。比如Windows上有PowerShell和MobaXterm。每个软件都可以生成SSH秘钥对,存放在各自软件对应的文件夹下(也可以自定义文件夹)。所以一个客户端机器是可能存在多个私钥公钥对的。

比如MobaXterm软件生成的秘钥对存放在C:\Users\jjia\DOCUME~1\MOBAXT~1\home\.ssh(可以通过在MobaXterm界面运行open /home/mobaxterm来弹出ssh所在文件夹)。通过MobaXterm运行ssh命令默认都会调用这里的秘钥对。

所以在MobaXterm里配置过后,可以通过MobaXterm免密登录。但是不代表这个配置可以同步应用到其他软件(比如PowerShell或者VS Code)。因为其他软件可能应用的是不同文件夹下的不同的私钥,因此对于其他软件又需要按照类似的方法单独配置免密登录。比如,我在MobaXterm里生成了秘钥对,把MobaXterm默认的.ssh文件夹下的公钥复制到remote server上。但是VS Code是没法使用的。我需要在VS Code的终端重新生成秘钥对,然后重新把新的秘钥对添加到远程的服务器上。注意不要覆盖掉远程服务器之前的公钥。具体可以采用ssh-copy-id -i ~/.ssh/id_rsa.pub <user-name>@server.xxx.xxx的方式。对于有gateway的情况,需要先复制公钥到gateway: ssh-copy-id -i ~/.ssh/id_rsa.pub <user-name>@gateway.xxx.xxx,再复制公钥到remote server: ssh-copy-id -i ~/.ssh/id_rsa.pub <user-name>@server.xxx.xxx

或者,在其他软件中专门设置去调用MobaXterm里生成的秘钥对 (但是我对如何设置并没有太多经验,这里留待其他人补充)。

How to configure multiple SSH keys on one computer?

There is another scenario: we register two usernames on the same website, and the website usually does not allow the same SSH key to be configured for both usernames, which is a bit troublesome. We then need to know how to configure multiple SSH keys on one computer.

See this article for details.

Advanced SSH configuration (needs to be reviewed and corrected)

Take my ssh config as an example.

Host login2tunnel
    HostName xxx.xxx.xxx
    User myname
    ProxyJump [email protected]:22
    LocalForward 5000 localhost:5000
    LocalForward 22 xxx.xxx.xxx:22
    ServerAliveInterval 60

Jump host / proxy configuration

Port forwarding

References:

  • https://blog.csdn.net/u013452337/article/details/80847113
  • https://www.runoob.com/w3cnote/set-ssh-login-key.html

Set DNS of a custom domain for your website

less than 1 minute read

Published:

What is DNS domain-name resolution?

We first need to understand the difference between a domain name and an IP address. An IP address is the unique logical address of a computer on the Internet; computers communicate with each other via IP addresses, and every networked computer needs one to be reached and distinguished.

But because an IP address is a string of easily confused digits, it is hard for people to remember the IP addresses of all computers, which would make visiting different websites in daily life very difficult. Against this background, a more recognizable symbolic identifier was developed on top of IP addresses, composed of letters and digits chosen by people themselves. Being easier to recognize and remember than IP addresses, it gradually replaced them as the main entry point for Internet users. This symbolic identifier is the domain name.

Although domain names are easier for users to accept and use, computers can only recognize purely numeric IP addresses and cannot read domain names directly. For access to work, the domain name must be translated into an IP address, and DNS domain-name resolution performs exactly this translation.

For our personal websites hosted on GitHub, if we want to give them a new domain name, we need to map the new domain name to GitHub's servers.

For the detailed resolution process, see the original article.

Personally, I mainly learned the following from the original article:

DNS record types

They mainly comprise A records, MX records, CNAME records, NS records, and TXT records:

1. A record

A stands for Address; it maps a domain name to an IP address, e.g. pointing item.taobao.com to 115.238.23.xxx and switch.taobao.com to 121.14.24.xxx. An A record can resolve multiple domain names to one IP address, but it cannot resolve one domain name to multiple IP addresses.

2. MX record

Mail Exchange: it points the mail server of a domain to your own mail server. For example, if the A record of taobao.com is 115.238.25.xxx and the MX record is set to 115.238.25.xxx, then for the mail routing of [email protected], DNS will deliver mail to the server at 115.238.25.xxx, while normal web requests still resolve to the A record's IP address.

3. CNAME record

Canonical Name, i.e. alias resolution. Alias resolution means setting one or more aliases for a domain name, e.g. resolving aaa.com to bbb.net and ccc.com to bbb.net as well, where bbb.net serves as the canonical target of aaa.com and ccc.com.

4. NS record

Designates the DNS server for a domain name, i.e. the domain is resolved by the DNS server at the specified IP address.

5. TXT record

Sets a description for a host name or domain name; e.g. a TXT record for ddd.net could read "This is XXX's blog".

So after I bought my domain, I set it up as follows: (screenshot: set_dns)

Buying a domain and how to set up the new domain are described in another post of mine.

Reference

  1. https://blog.csdn.net/bangshao1989/article/details/121913780

Custom domain for your website

1 minute read

Published:

You may think that yourname.github.io is not cool enough compared with yourname.com. No worries, we can change the domain.

Note: the following steps can also be found in the official documentation.

  1. Buy your preferred domain. I bought mine from Namecheap. I strongly recommend also buying the SSL certificate, which makes your website safer by serving it over https instead of http. Some browsers will alert "unsafe" if your website starts with http.
  2. After you have bought your domain, go to the settings of your repository username.github.io, click Code and automation -> Pages -> Custom domain -> enter www.yourname.com -> Save.
  3. Set the DNS. Go to your domain provider's website (Namecheap, for example). Go to Dashboard -> Manage -> Advanced DNS, and set the DNS as in the following figure. The IP addresses in the screenshot are those of GitHub's servers; you can also copy them from the official documentation. (screenshot: set_dns)
  4. Then you need to wait from several hours up to 24 hours. You can check the DNS status in the settings of your repository; it looks like the following figure. If it shows unsuccessful, you can try again.

    (screenshot: dns_checking)

  5. Set up SSL.

    SSL switches your domain from http to https, so that browsers regard your website as safe. Otherwise, your website would look like this: (screenshot: ssl_unsafe)

    So how to set up SSL? `At first you need to buy it from your domain provider. After that you need to install and activate it.` DO NOT BUY SSL! GitHub itself provides a very simple way to achieve this: just go to the settings of your repository username.github.io, click Code and automation -> Pages -> check Enforce HTTPS. You may need to wait several minutes to observe the effect, and the layout of the whole website may look broken during those minutes. You can try another browser, or clear the browser cache (e.g. use Chrome incognito) and visit the website again. Be patient, and in the end everything will render perfectly.

What is face/edge/corner connectivity?

less than 1 minute read

Published:

3D image

(figure: a 3D box of voxels)

From the above image (several voxels in a 3D image) we can observe that in a 3D image, two neighboring voxels can have 3 different relationships: face-connected (red voxel vs. gray voxel, distance = 1), edge-connected (red voxel vs. blue voxel, distance = sqrt(2)), and corner-connected (red voxel vs. green voxel, distance = sqrt(3)).

Therefore, given that the red voxel is the object, drawing its border can have 3 different results: face-connected (only the gray voxel is the border, distance = 1), edge-connected (the gray and blue voxels are the border, where the distance is less than or equal to sqrt(2)), and corner-connected (the gray, blue, and green voxels are all the border, where the distance is less than or equal to sqrt(3)).
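To make the three relationships concrete, here is a small sketch (not from the original post) that classifies the 26 neighbors of a voxel by their Euclidean distance to the center:

```
import numpy as np

# The 26 neighboring offsets of a voxel, classified by distance:
# 1 -> face-connected, sqrt(2) -> edge-connected, sqrt(3) -> corner-connected.
offsets = [(i, j, k) for i in (-1, 0, 1) for j in (-1, 0, 1)
           for k in (-1, 0, 1) if (i, j, k) != (0, 0, 0)]
for name, dist in [("face", 1.0), ("edge", np.sqrt(2)), ("corner", np.sqrt(3))]:
    group = [o for o in offsets if np.isclose(np.linalg.norm(o), dist)]
    print(f"{name}-connected neighbors: {len(group)}")  # 6, 12, 8
```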

2D image

A 2D image only has face-connected neighbors (sharing a side, distance = 1) and edge-connected neighbors (sharing a corner, distance = sqrt(2)).

Note: this post illustrates the fully_connected argument in seg-metrics.

AI Explainability

1 minute read

Published:

Introduction

A CNN usually consists of several convolution blocks (convolution layer + activation layer + pooling layer) followed by fully connected layers.

The explainability of a CNN can be approached in the following ways.

Feature map visualization

Directly display the intermediate feature maps (usually normalized to 0-256) and observe how the image changes after each convolution layer, as in the figure below. The figure shows visualized feature maps of a cat from layers 5-8 of some CNN (one small image per kernel). The lower the layer, the more low-level pixel information it captures, and the clearer the cat's outline in the feature maps. The higher the layer, the more abstract and the sparser the image. This matches the feature-extraction concept we keep emphasizing.

Visualizing convolution kernels

To observe the filters a CNN has learned, a simple method is to obtain the visual pattern each filter responds to. We can treat this as an optimization problem: starting from a blank input image, apply gradient ascent to the CNN's input image so as to maximize the response of a chosen filter; the resulting image is one the chosen filter responds to strongly. More images and explanations can be found in this article.

Mapping target feature maps back to the input resolution via DeConv, unpooling, un-ReLU, and other inverse operations

This yields the two figures below. The first few layers look like textures, while the later layers carry higher-level, more holistic information.


Question 1: why do the later feature maps in the two figures above look higher-resolution? Shouldn't later feature maps be smaller and lower-resolution? Answer 1: because this is not direct feature map visualization but DeConv: the intermediate features are propagated back, layer by layer, to the input end, so the feature maps are gradually enlarged to the same size as the input image (from the paper: "map these activities back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps").

Question 2: what is the input of the images generated by this DeConvNet? Is it the output of the last convolution block (before the fully connected layers)? Answer 2: quoting the paper: "To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer".

Question 3: what is the difference between backpropagation, deconvnet, and guided backpropagation? Answer 3: guided backpropagation and the deconvnet differ in how they handle ReLU. The deconvnet applies ReLU to the gradients, passing back only the positions where the gradient is greater than 0; plain backpropagation passes back only the positions where the feature map is greater than 0; guided backpropagation combines the two, passing back only the positions where both the input and the gradient are greater than 0. This adds an extra guidance signal from the higher layers on top of plain backpropagation, blocking the backward flow of negative gradients, which correspond to neurons that decrease the activation of the higher-layer unit we want to visualize (https://blog.csdn.net/KANG157/article/details/113154590).

CAM series: class activation maps

CAM denotes a family of similar activation maps. The rough principle is to combine the feature maps output by the last convolution layer into one map via a weighted sum; different designs of the weights gave rise to a series of papers, introduced one by one below.

(GAP-)CAM (no gradients needed)

The drawback of this method is that it only applies when a GAP operation sits between the last feature maps and the fully connected layer. Otherwise the user must modify the network and retrain (or fine-tune) it. In short: rewrite the fully connected part of the network into the GAP form, use the weights between the GAP layer and the fully connected layer as fusion weights, and linearly fuse the feature maps to obtain the CAM.

Grad-CAM

What are the differences between Grad-CAM and CAM?

  1. CAM can only be used when the connection between the last feature maps and the output is a GAP; Grad-CAM also works for network structures without a GAP connection.
  2. CAM can only produce the heat map of the last feature maps, while Grad-CAM can produce one for any layer.
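To make the weighting idea concrete, here is a minimal, hedged Grad-CAM sketch using forward/backward hooks; the untrained torchvision resnet18 and the random image are stand-ins, not part of the original post:

```
import torch
import torch.nn.functional as F
import torchvision.models as models

feats, grads = {}, {}  # filled by the hooks below

model = models.resnet18(weights=None).eval()  # stand-in; use a trained model in practice
layer = model.layer4                          # last convolutional block
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

img = torch.rand(1, 3, 224, 224)              # stand-in preprocessed image
model(img)[0].max().backward()                # backprop the top-class score

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # GAP over the gradients
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted sum of feature maps
cam = F.interpolate(cam[None], size=img.shape[2:], mode="bilinear")[0]
```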

Score-CAM (no gradients needed)

Traverse the obtained feature maps channel by channel: upsample and normalize each channel, multiply it pixel-wise with the original image, and feed the result into the network to get the target-class score (after softmax); subtract the target-class score of a baseline (an all-black image) to obtain the CIC. Then apply a softmax so that all CICs sum to 1. Finally, use the CICs as fusion weights for the feature layer to be visualized.

RAM: regression activation maps

Saliency map

Saliency map

If we change one pixel in our input image, how much will it affect the final probability score?

Well, first of all, one way to calculate this is to perform backpropagation and compute the gradient of the score with respect to this pixel value. This is easily doable in PyTorch. Then we can repeat this process for all pixels and record the gradient values. As a result, we will get high values at the location of a dog. Note that in this example we get a "ghostly image" of a dog, signifying that the network is looking in the right direction!
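A minimal sketch of this procedure (the untrained resnet18 and the random image are stand-ins for a trained classifier and a preprocessed input):

```
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()           # stand-in classifier
img = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in input image

score = model(img)[0].max()  # score of the most likely class
score.backward()             # gradient of the score w.r.t. every pixel

saliency = img.grad.abs().max(dim=1).values  # (1, H, W), max over color channels
```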

Why does the saliency map seem less popular than CAM?

Occlusion sensitivity

Visualization tools and projects

A Summary of CNN Visualization Techniques (IV): Visualization Tools and Projects

References:

  1. https://rpmarchildon.com/wp-content/uploads/2018/06/RM-CNN-Schematic-1.jpg
  2. https://miro.medium.com/max/638/0*qgBQt9dMbUtntbpn.jpg
  3. https://blog.csdn.net/fu6543210/article/details/80407911
  4. https://www.tinymind.net.cn/articles/3dedb66b996232
  5. https://hackmd.io/@machine-learning/ByaTE80BI#ZFNetDeconvNet-Summary-and-Implementation
  6. https://datahacker.rs/028-visualization-and-understanding-of-convolutional-neural-networks-in-pytorch/
  7. https://zhuanlan.zhihu.com/p/269702192
  8. https://blog.51cto.com/u_14411234/3115810
  9. https://github.com/utkuozbulak/pytorch-cnn-visualizations#inverted-image-representations

Classical papers on CNN

less than 1 minute read

Published:

  • AlexNet (2012)
    1. Five convolution blocks + three fully connected layers; established the standard CNN pipeline.
    2. Acceleration strategy: model parallelism.
  • ZF Net (2013)
    1. Visualizing and understanding neural networks: the pioneering work on visualization-based understanding in the CNN field. This paper shows what features each layer of a CNN actually learns, and the authors improved accuracy by adjusting the network according to the visualizations.
    2. DeConvNet
  • VGG (2014)
    1. A true classic!

Reference:

  1. https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
  2. https://blog.csdn.net/fu6543210/article/details/80407911
  3. https://www.tinymind.net.cn/articles/3dedb66b996232
  4. https://hackmd.io/@machine-learning/ByaTE80BI#ZFNetDeconvNet-Summary-and-Implementation

Why Pooling?

less than 1 minute read

Published:

I noticed that pooling lowers the resolution of CAMs. So why use pooling layers, or convolution layers with stride > 1, at all?

Answer:

  1. Push the model to be invariant to small shifts/rotations/scales/perturbations of the data.
  2. Reduce network complexity and make training easier.
  3. Save time/computation.
  4. Increase the receptive field.

I do not believe it. I think a PFT regression net should be sensitive to small (organ) perturbations, and pooling loses such information. I want to test whether no pooling can reach the same performance.

P value

less than 1 minute read

Published:

Hypothesis testing, also called statistical hypothesis testing, is a statistical inference method for deciding whether differences between samples, or between a sample and the population, are caused by sampling error or by an essential difference. The significance test is the most commonly used method within hypothesis testing, and also the most basic form of statistical inference. Its basic principle is to first make an assumption about some characteristic of the population, then use sampling and statistical reasoning to infer whether this assumption should be rejected or accepted. Commonly used hypothesis tests include the Z-test, t-test, chi-squared test, and F-test.

The basic idea of hypothesis testing is the "small probability event" principle; its inference method is a probabilistic proof by contradiction. The small-probability idea is that a small-probability event will essentially not occur in a single trial. The proof-by-contradiction idea is to first propose the test hypothesis, then use an appropriate statistical method together with the small-probability principle to determine whether the hypothesis holds. That is, to test whether a hypothesis H0 is correct, first assume H0 is correct, then decide to accept or reject H0 based on the sample. If the observed sample leads to the occurrence of a "small probability event", H0 should be rejected; otherwise H0 should be accepted.

Personal note: for example, I hypothesize that method A and method B perform equally well (a hypothesis about the population). To test whether this hypothesis is correct, first assume H0 is correct. Then decide to accept or reject H0 based on the samples (method A produces group a with 100 values, mean 3.4; method B produces group b with 100 values, mean 3.6). If the observed samples lead to a "small probability event" (computed by combining the SD of group a and the SD of group b), H0 should be rejected; otherwise it should be accepted.
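As a concrete (hypothetical) version of the note above, a two-sample t-test in scipy; the numbers are made up to match the example:

```
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(3.4, 0.5, 100)  # group a from method A, mean about 3.4
b = rng.normal(3.6, 0.5, 100)  # group b from method B, mean about 3.6

t, p = stats.ttest_ind(a, b)    # H0: the two methods have equal means
print(f"t={t:.3f}, p={p:.4f}")  # a small p (e.g. < 0.05) rejects H0
```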

Standard deviation vs. standard error

Interesting tools/packages/blogs

less than 1 minute read

Published:

  1. CAM: https://github.com/jacobgil/pytorch-grad-cam
  2. Python scripts: https://github.com/metafy-social/python-scripts
  3. Several open-source alternatives to SPSS:
    1. https://www.jamovi.org/
    2. https://jasp-stats.org/
    3. https://spssau.com/

N ways to install a Python package

less than 1 minute read

Published:

N ways to install a Python package

Installing published packages via pip or conda

The most classic and most common way: pip install [package_name]

The vast majority of Python packages can be installed with pip install [package_name], because most Python packages are uploaded to PyPI.

conda install [package_name]

For conda users, another common way is conda install [package_name]. But in fact not every package is uploaded to the servers maintained by conda. For a Python package developer, the first choice is certainly to upload to PyPI, the official Python package hosting platform; those with extra time, or who want to promote their package further, will additionally upload it to conda's hosting platform. Note that conda has two package sources: the channel maintained by the Anaconda company itself (stable package quality, slower updates) and the community-maintained conda-forge (faster updates).

Installing a local package

Generally, after writing our own Python package, if it is only for our own use, there are two approaches:

Relative import, e.g.: from .. import [module_name]

Absolute import by adding the package root to the environment variables.

  • Add it in Python:
      import sys
      sys.path.append(r'E:\src\ttlayer')     # append to the end of the search path
      sys.path.insert(0, r'E:\src\ttlayer')  # or insert at the front, so it wins
    
  • Add it on Linux

    Method 1:

    export PATH=/home/uusama/mysql/bin:$PATH
    # or with the original $PATH first
    export PATH=$PATH:/home/uusama/mysql/bin
    

    Notes: takes effect immediately; valid only in the current terminal and invalid once the window is closed; applies only to the current user. Do not forget to include the original configuration, i.e. the $PATH part, in the new value, to avoid overwriting the existing setting.

    Method 2: configure it via the ~/.bashrc file in the user's home directory by adding, on the last line: export PATH=$PATH:/home/uusama/mysql/bin. Notes: takes effect when the same user opens a new terminal, or after manually running source ~/.bashrc; valid permanently; applies only to the current user. It may not take effect if a later environment-variable file overrides the PATH definition.

Absolute import by installing the package

  1. python setup.py install
  2. python setup.py develop
  3. pip install .
  4. pip install -e .

What is the difference between the above 4 commands? Answer: this link.

  5. conda develop .

安装open3d

less than 1 minute read

Published:

It took me a long time to install open3d.

At first, I tried to install it on the cluster, but even though it installed successfully, it raised an error when imported in Python. This is because it requires glibc higher than 2.18 (or 2.27 for the newest version), but my cluster has glibc 2.17.

Then I tried to install it on my own workstation, but that failed too, I think because my workstation does not have a good GPU.

Finally, I installed it on my own PC using "pip install open3d" in a brand-new conda environment.

How to index torch.Tensor using Tensor

1 minute read

Published:

How to index torch.Tensor using Tensor?

For a numpy.ndarray or a torch.Tensor, we have the following methods to index an array or tensor:

np_a = np.random.randn(30, 4, 5)  # note: np.random.randn takes separate ints, not a tuple
ts_a = torch.randn(30, 4, 5)

1. Basic indexing: an integer to index one position

tmp = np_a[1]
tmp = ts_a[1]

2. Less basic indexing: start:end to index a range of contiguous elements

tmp = np_a[10:13]
tmp = ts_a[:13]
tmp = np_a[:-2]

3. Advanced indexing: index multiple elements at different positions (indexing with arrays/tensors of indices)

e.g. select the 1st, 3rd, 8th, and 29th elements from np_a.

e.g. select the three elements at [0,0,0], [1,1,1], and [20,1,2].

Array with shape [N,]

a = np.arange(12)**2  # the first 12 square numbers
i = np.array([1,1,3,8,5])
a[i]  # array([ 1,  1,  9, 64, 25])

j = np.array([[3, 4], [9, 7]])  # a bidimensional array of indices
a[j]  # the same shape as `j`: array([[ 9, 16], [81, 49]])

Array with shape [N,M,…]

https://numpy.org/doc/stable/user/quickstart.html#advanced-indexing-and-index-tricks

Using one array Array_A to index another array Array_B, the new array has the same shape as Array_A.

What will happen if we use multiple arrays to index another array Array_T?

Answer: the multiple arrays must all have the same shape, and the number of arrays must equal the number of dimensions of the target Array_T. The output array then has the same shape as the indexing arrays.
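A small sketch of this rule (the arrays here are illustrative):

```
import numpy as np

# A 2-D target indexed with two index arrays of identical shape (2, 2):
# the output takes its shape from the index arrays.
arr_t = np.arange(30).reshape(5, 6)
rows = np.array([[0, 1], [2, 3]])
cols = np.array([[0, 1], [2, 3]])
print(arr_t[rows, cols])  # [[ 0  7] [14 21]] -- elements (0,0), (1,1), (2,2), (3,3)
```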

What will happen if the target array has fewer dimensions than the indexing array?

e.g.

Arr_T = np.ones((5,6))
Arr_idx = np.ones((2,3,4), dtype=int)  # indices must be integer-typed
Arr_T[Arr_idx, Arr_idx]

It does not matter: the output array has the same shape as Arr_idx.

Note: the first and second methods also apply to Python lists.

MLflow FAQ

1 minute read

Published:

MLflow with Databricks

Today I learnt how to use MLflow in Databricks [1].

By the way, today I just found out that MLflow is released and maintained by Databricks! Databricks is so great!

This tutorial covers the following steps:

  1. Import data from your local machine into the Databricks File System (DBFS)
  2. Visualize the data using Seaborn and matplotlib
  3. Run a parallel hyperparameter sweep to train machine learning models on the dataset
  4. Explore the results of the hyperparameter sweep with MLflow
  5. Register the best performing model in MLflow
  6. Apply the registered model to another dataset using a Spark UDF
  7. Set up model serving for low-latency requests (This is new to me!)

References:

  1. https://docs.databricks.com/mlflow/end-to-end-example.html

MLflow, restore runs/experiments

restore experiments

In python [2]:

# restore experiments
from mlflow import MlflowClient

def print_experiment_info(experiment):
    print("Name: {}".format(experiment.name))
    print("Experiment Id: {}".format(experiment.experiment_id))
    print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

# Create and delete an experiment
client = MlflowClient()
experiment_id = client.create_experiment("New Experiment")
client.delete_experiment(experiment_id)

# Examine the deleted experiment details.
experiment = client.get_experiment(experiment_id)
print_experiment_info(experiment)
print("--")

# Restore the experiment and fetch its info
client.restore_experiment(experiment_id)
experiment = client.get_experiment(experiment_id)
print_experiment_info(experiment)

restore runs

In python [2]:


from mlflow import MlflowClient

# Create a run under the default experiment (whose id is '0').
client = MlflowClient()
experiment_id = "0"
run = client.create_run(experiment_id)
run_id = run.info.run_id
print("run_id: {}; lifecycle_stage: {}".format(run_id, run.info.lifecycle_stage))
client.delete_run(run_id)
del_run = client.get_run(run_id)
print("run_id: {}; lifecycle_stage: {}".format(run_id, del_run.info.lifecycle_stage))
client.restore_run(run_id)
rest_run = client.get_run(run_id)
print("run_id: {}; lifecycle_stage: {}".format(run_id, res_run.info.lifecycle_stage))

MLflow: how to delete runs/experiments permanently?

In CLI [1]:

mlflow experiments delete [OPTIONS]
mlflow experiments restore [OPTIONS]
mlflow experiments search [OPTIONS]

mlflow gc [OPTIONS]  # delete the experiments permanently 

In CLI [1]:

mlflow runs delete [OPTIONS]
mlflow runs restore [OPTIONS]  # --run-id <run_id> 
mlflow runs list [OPTIONS]

mlflow gc [OPTIONS]  # delete the experiments permanently 

References:

  1. https://mlflow.org/docs/latest/cli.html#mlflow-gc
  2. https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.restore_run

Distributed file system

less than 1 minute read

Published:

Direct-attached storage (DAS):

The storage is directly attached to the host, so extensibility and flexibility are poor. To scale, files and services are separated and connected over the network, which leads to:

Centralized storage (NAS, SAN):

Rich device types interconnected over the network, with a certain degree of extensibility; but limited by the controller's capability, the scalability is bounded. Also, devices must be replaced at the end of their life cycle, and the data migration costs a lot of time and effort.

Distributed storage:

Uses the disk space of every machine in the enterprise over the network and combines these scattered storage resources into one virtual storage device; the data is stored dispersed across every corner of the enterprise.

Mainstream distributed file systems

The current mainstream distributed file systems include GFS, HDFS, Ceph, Lustre, MogileFS, MooseFS, FastDFS, TFS, and GridFS.

Reference:

  1. https://zhuanlan.zhihu.com/p/350096155

FLOPs and parameters in Neural Networks

less than 1 minute read

Published:

How to compare two neural networks?

There are several metrics:

  1. Number of parameters
  2. FLOPs
  3. GPU occupancy (nvidia-smi)
  4. Training/inference time

Here we only talk about FLOPs and parameters. Note: floating point operations (FLOPs) are different from floating point operations per second (FLOPS). FLOPS is a property of the hardware device, whereas FLOPs differ from network to network.

FLOPs are determined by the network design: number of layers, choice of activation layers, parameters, etc.

The difference between FLOPs and parameters is shown in the figure at the top.

Because a convolution layer shares its kernel, its parameter count is far lower than its FLOPs.

PS: throughput refers to the number of examples (or tokens) processed within a specific period of time, e.g. "examples (or tokens) per second".

PS: normally the number of MACs (multiply-accumulate operations) is half the number of FLOPs.
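For the parameter-count metric listed above, a tiny sketch (the model is an arbitrary stand-in):

```
import torch.nn as nn

# Count the trainable parameters of any nn.Module.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params} trainable parameters")
```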

YAML for data split

less than 1 minute read

Published:

Why do we need YAML for data split?

Normally I used a fixed random seed to split the dataset. However, I found that my dataset was slightly and gradually updated as the project developed.

For instance, in later experiments I found that some patients should be excluded. Should I then re-train all the previous experiments? If not, how can I ensure that the following experiments use the same training/validation/test data as the previous ones? (The same seed applied to patient lists of different lengths leads to very different splits.)

So let's use a YAML file to define the data split, so that we always get (almost) the same training/validation/test split.

A complete YAML tutorial could be found at Real Python

Difference between YAML, JSON and XML is here
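A minimal sketch of the idea, assuming a hypothetical split.yaml that stores three lists of patient IDs:

```
import yaml  # pip install pyyaml

# split.yaml (hypothetical layout):
#   train: [pat001, pat002]
#   valid: [pat003]
#   test:  [pat004]
with open("split.yaml") as f:
    split = yaml.safe_load(f)

train_ids, valid_ids, test_ids = split["train"], split["valid"], split["test"]
```

Excluding a patient then means editing the YAML file, while every other ID stays in its original subset.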

FAQ for Git

less than 1 minute read

Published:

Commonly used commands

git push origin master

ref

How do I change the URI (URL) for a remote Git repository?

  1. For adding or changing the remote origin:
    git remote set-url origin new.git.url/here
    
  2. To see which remote URL you have currently in this local repository:
    git remote show origin
    

Git push requires username and password?

How to let git remember my username and password?

$ git config credential.helper store
$ git push https://github.com/owner/repo.git

Username for 'https://github.com': <USERNAME>
Password for 'https://[email protected]': <PASSWORD>

Permission denied

This may be because your username and email are wrong. The error usually looks like this:

$ git push origin master
remote: Permission to Jingnan-Jia/segmentation_metrics.git denied to jingnan222.
fatal: unable to access 'https://github.com/Jingnan-Jia/segmentation_metrics.git/': The requested URL returned error: 403

This problem is usually caused by the following: because you are using Windows!!! Put the repository on Linux and all these problems disappear!!!

Why, after creating a tag locally and pushing the code to the remote GitHub repository, can I see the latest code changes on GitHub but not my newest tag?

If you created a tag locally but it does not show up in the remote repository on GitHub, it is most likely because the tag was never pushed. Git tags are not pushed automatically by git push unless you explicitly say so.

To push tags to the remote repository, you need to run one of the following commands:

Push a single tag:

git push origin <tagname>

Replace <tagname> with your actual tag name.

Push all tags:

git push origin --tags

This pushes all local tags to the remote repository.

Make sure you run one of the commands above to push the tags. Afterwards, you should be able to see your new tag in the "Tags" section of the GitHub repository.

Use Cloudflare to accelerate your website hosted on GitHub

1 minute read

Published:

Some of my friends in mainland China complained that they could not visit my website smoothly. I needed to do something about it!

Actually, I had tried using Gitee or Coding (Chinese GitHub equivalents) to mirror my current website and some technique to forward visits to Gitee/Coding or GitHub according to the visitor's location. However, Gitee and Coding are not free, and they are not stable. Worse, I would have to push my code to both Gitee/Coding and GitHub for every update of my website.

So I decided to accelerate my website using Cloudflare. It is free and stable.

Now let's see the steps.

  1. Register your account at https://dash.cloudflare.com/
  2. Select the "free plan" to accelerate your website.
  3. Enter your website domain (e.g. domain-name.com) without any "https" or "www".
  4. Click "continue"; you will then see the screenshot below:
  5. Go to your domain provider's website (mine is Namecheap) and add the nameservers provided by Cloudflare to it.

  6. I found that after the above 5 steps, my website could not be visited at all; the error was "too many redirects ...". Let me research the reason.
  7. The reason is that my GitHub settings had the "SSL" option checked, which means that all http://user-domain.com requests will be redirected to https://user-domain.com, while Cloudflare only receives HTTP requests. So I unchecked the "SSL" option in GitHub. But the website was still unavailable; "too many redirects" persisted.
  8. No worries. Go to Cloudflare and finish its 4-step beginners' guide to make your website safer and faster. Sorry, I forgot to take a screenshot while doing that. Afterwards, you can invite your Chinese friends to visit your website to see whether it works.

References:

  1. https://www.sdwebseo.com/cloudflare/
  2. https://community.cloudflare.com/t/community-tip-fixing-err-too-many-redirects/42335
  3. https://developers.cloudflare.com/ssl/troubleshooting/too-many-redirects/#:~:text=Redirect%20loops%20will%20occur%20if,all%20HTTP%20requests%20to%20HTTPS.&text=To%20solve%20this%20issue%2C%20either,configured%20at%20your%20origin%20server).

FAQ for PyTorch

1 minute read

Published:

Error: RuntimeError: Trying to resize storage that is not resizable

The shapes of the different samples are not the same, so they cannot be aligned or collated correctly.

What is the shape for different loss functions?

The input and target shapes required by the different loss functions are very annoying, so I summarize them here.

CrossEntropyLoss

  1. shape
    loss_fn = nn.CrossEntropyLoss()
    predictions = torch.rand(2, 3, 4)        # (N, d1, C)
    target = torch.randint(0, 4, (2, 3))     # (N, d1), long class indices
    print(predictions.shape)
    print(target.shape)
    loss_fn(predictions.transpose(1, 2), target)  # input must be (N, C, d1): transpose!
    
  2. type

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'

The input can be any float type; just the targets should be of type long.

# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)  # target should be long
output = loss(input, target)
output.backward()

# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)  # probability targets must lie in [0, 1] and sum to 1
output = loss(input, target)
output.backward()

NLLLoss

Dice loss

Because I used monai.losses.DiceLoss, the input and target should both have the shape (batch, channel, spatial dims...); if the target stores class indices instead of one-hot channels, set to_onehot_y=True so it is converted to match the input.

Error: TypeError: only integer tensors of a single element can be converted to an index

Answer: change x to torch.tensor(x)

Point cloud for classification

1 minute read

State-of-the-art networks for different tasks

  1. paperswithcode

Ranking of point cloud classification networks

  1. 3D Point Cloud Classification

From the above link, we can see that the most popular datasets for point cloud classification are ModelNet40 and ScanObjectNN.

For ModelNet40 (released in 2015), among the top-10 networks, after first excluding the networks that use extra training data, I obtain:

  • PointView-GCN
  • RepSurf-U
  • PointMLP+HyCoRe
  • PointMLP
  • PointNet2+PointCMT
  • CurveNet
  • RPNet

They are all published in 2021 or 2022.

For ScanObjectNN (released in 2019), after first excluding the networks that use extra training data, I obtain:

  • PointNeXt+Local
  • PointNeXt+GAM
  • PointNeXt+HyCoRe
  • PointNeXt
  • PointStack

They are all published in 2022.

Why are the top networks on the two datasets so different? Which network should I choose for my dataset (PFT regression from a binary vessel tree)?

ModelNet40 is synthetic, while ScanObjectNN is a real-world dataset.

So I prefer to try the top networks on ScanObjectNN, which include:

  • PointNet
  • PointNet++
  • PointCNN
  • PointMLP
  • PointNeXt
  • PointNeXt+Local (idea is clear, but code seems not complete)

Different networks for point cloud classification

  • PointNet. Processes raw point sets through multi-layer perceptrons (MLPs). While aggregating features at the global level using a max-pooling operation, it loses valuable local geometric information.

  • PointNet++. Employs ball querying and k-nearest-neighbor (k-NN) querying of local neighborhoods to extract local semantic information, but it still loses contextual information due to the max-pooling operation.

  • PointNeXt.
    • Data augmentation
    • we find that neither appending more SA blocks nor using more channels leads to a noticeable improvement in accuracy, while causing a significant drop in throughput
  • PointNeXt+Local. In previous paper, once the local features are obtained, the original neighborhood points, the directional vectors, and the distance computed (in the case of ball querying) are discarded. In this paper, we use the radius-normalized distance and directional vectors as additional local neighborhood features with minimal additional memory or computational costs.

How to get the file creation or modification time with seconds?

less than 1 minute read

Published:

I want to get the file creation or modification time with seconds!!!

On Windows, it seems impossible to see the exact time in Windows Explorer, because it only shows the date, hours, and minutes, missing the seconds!

Luckily, we have Python!

import os
import time
import datetime

modified = os.path.getmtime(file)  # `file` is the path of the file of interest
print("Date modified: " + time.ctime(modified))
print("Date modified:", datetime.datetime.fromtimestamp(modified))

# out:
# Date modified: Tue Apr 21 11:50:46 2015
# Date modified: 2015-04-21 11:50:46


Sort files from glob by time

files.sort(key=os.path.getctime), from here.

Python debugger

less than 1 minute read

Published:

pdb

https://www.tjelvarolsson.com/blog/five-exercises-to-master-the-python-debugger/

FAQ for VS Code

less than 1 minute read

Published:

Timed out waiting for debuggee to spawn

This issue is pretty annoying! I could not find a solution.

FAQ for Overleaf and LaTex

1 minute read

Published:

  1. How to balance LaTeX and Word?

    Answer: send Word documents to supervisors if they prefer that (convert LaTeX to Word with pandoc if we already have a LaTeX version). Go back to LaTeX for the final version.

    From: https://www.zhihu.com/question/22316670

  2. How to convert LaTeX to Word? Answer 1 (the easiest and most faithful way is to convert the PDF to Word): use Adobe online: https://www.adobe.com/acrobat/online/pdf-to-word.html Answer 2: pandoc

    Example:

     # in windows terminal
     pandoc main.tex -o main.docx
    

We first need to download pandoc.exe from https://pandoc.org/installing.html and save pandoc.exe to the same directory as the LaTeX files.

    Advanced Example

     pandoc input.tex  --filter pandoc-crossref --citeproc --csl springer-basic-note.csl  --bibliography=reference.bib -M reference-section-title=Reference  -M autoEqnLabels -M tableEqns  -t docx+native_numbering --number-sections -o output.docx
    

    In the above example, springer-basic-note.csl should be downloaded from https://www.zotero.org/styles?format=numeric and saved to the same directory as pandoc.exe. We can also download other styles, such as IEEE, from there.

    The above advanced example requires pandoc-crossref, which can be installed from: https://github.com/lierdakil/pandoc-crossref/releases

    From:

    1. https://blog.csdn.net/weixin_39504048/article/details/80999030
    2. [Markdown 与 Pandoc](https://sspai.com/post/64842)
    3. [使用 Pandoc 将 Latex 转化为 Word (进阶版本,包括引用图标和references) ](https://xhan97.github.io/latex/PandocLatex2Word.html)
    
  3. How to clean up a .bib file?

    Answer: manually clean it and check it.

  4. Workflow: write papers on Overleaf, download the source files, convert them to docx, send the docx to supervisors for review, get the feedback, and update the Overleaf project (or just update Overleaf in the final step before submission).

FAQ

1 minute read

Published:

  1. After I successfully installed WandB, I could not use it. The error is:

```
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: jiajingnan (lkeb). Use `wandb login --relogin` to force relogin
Thread HandlerThread:
Traceback (most recent call last):
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 49, in run
    self._run()
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 100, in _run
    self._process(record)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 280, in _process
    self._hm.handle(record)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 136, in handle
    handler(record)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 146, in handle_request
    handler(record)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/handler.py", line 697, in handle_request_run_start
    self._tb_watcher = tb_watcher.TBWatcher(
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/internal/tb_watcher.py", line 120, in __init__
    wandb.tensorboard.reset_state()
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 60, in __getattr__
    module = self._load()
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/sdk/lib/lazyloader.py", line 35, in _load
    module = importlib.import_module(self.__name__)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/integration/tensorboard/__init__.py", line 5, in <module>
    from .log import _log, log, reset_state, tf_summary_to_dict  # noqa: F401
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/wandb/integration/tensorboard/log.py", line 35, in <module>
    Summary = pb.Summary if pb else None
  File "/home/jjia/.conda/envs/py38/lib/python3.8/importlib/util.py", line 245, in __getattribute__
    self.__spec__.loader.exec_module(self)
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import histogram_pb2 as tensorboard_dot_compat_dot_proto_dot_histogram__pb2
  File "/home/jjia/.conda/envs/py38/lib/python3.8/site-packages/tensorboard/compat/proto/histogram_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/jjia/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
wandb: ERROR Internal wandb error: file data was not synced
```

I have not found the reason yet.

Python environment variables

less than 1 minute read

Published:

Environment variables determine which folders are searched for a command when you run it in the terminal. For example, there may be many pythons; which folder's python gets used? The folders listed in the environment variable are searched one by one, and the search stops at the first hit.

  1. Check the PATH environment variable:
    1. echo $PATH # all on one line
    2. echo -e ${PATH//:/\\n} # line by line
  2. In what order does Python search during an import? (see the sketch after this list)
    1. The built-in list is searched first.
    2. Then sys.path is searched; it is a list made up of the following parts (in order!):
      1. The program's root directory (i.e. the directory of the Python file currently running), plus the directories set in the PYTHONPATH environment variable
      2. The standard library directories
      3. The contents of any .pth files that can be found # so you can add your custom package paths to such a file
      4. The site-packages directory of third-party extensions
  3. How to add paths manually?
    1. Add them in a site-packages/*.pth file
    2. Add them at the very top of the Python code, as follows:
      import sys
      sys.path.append("/home/my/path")
      
  4. Why is my python not the python I want? For example, I am clearly in a Python 3.8 conda environment, but which python does not point to Python 3.8. Answer: the default python is the first one found by searching the folders in $PATH in order, which you can verify with echo $PATH. Then figure out how $PATH was changed: look in .bashrc, where you may find export PATH="your/new/path:$PATH"; that is most likely the code that modified $PATH (the new path was inserted at the front). Comment it out, or adjust the order in which the paths are added, and the problem is solved.
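To see the search order concretely, a tiny sketch that prints sys.path in the order Python consults it:

```
import sys

# The module search path, in order; the first matching entry wins,
# exactly like $PATH does for shell commands.
for i, p in enumerate(sys.path):
    print(i, p)
```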

Reference:

  1. https://blog.csdn.net/qq_27825451/article/details/100552739

point cloud

less than 1 minute read

Published:

Datasets

  1. ModelNet. ModelNet comes with colors! (reference: https://blog.csdn.net/weixin_47142735/article/details/120223827)

    The .off files are the CAD models; modelnet40_ply_hdf5_2048 is sampled from them and contains only point coordinates; there is also a version with normal vectors.

    Reference: https://blog.csdn.net/qq_41895003/article/details/105431335

    1. Introduction

    The Object File Format (.off) is used to represent the geometry of a model by the polygons of its surface. The polygons can have any number of vertices.

    The .off files in the Princeton Shape Benchmark follow this standard:

    1. An .off file is an ASCII file beginning with the keyword OFF.

    2. The next line gives the model's number of vertices, number of faces, and number of edges. The number of edges can be ignored and has no effect on the model (it can be 0).

    3. The vertices are listed as x, y, z coordinates, one vertex per line.

    4. After the vertex list comes the face list, one face per line. For each face, the number of vertices it contains comes first, followed by the indices of those vertices in the preceding vertex list.

    That is, the format is:

    OFF
    <number of vertices> <number of faces> <number of edges>
    x y z
    x y z
    …
    n index_of_vertex_1 index_of_vertex_2 … index_of_vertex_n
    …

    A simple example of a cube:

    COFF
    8 6 0
    -0.500000 -0.500000 0.500000
    0.500000 -0.500000 0.500000
    -0.500000 0.500000 0.500000
    0.500000 0.500000 0.500000
    -0.500000 0.500000 -0.500000
    0.500000 0.500000 -0.500000
    -0.500000 -0.500000 -0.500000
    0.500000 -0.500000 -0.500000
    4 0 1 3 2
    4 2 3 5 4
    4 4 5 7 6
    4 6 7 1 0
    4 1 7 5 3
    4 6 0 2 4

    Reference: https://blog.csdn.net/A_L_A_N/article/details/84874463?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task

How to absolutely remove big or sensitive files from git history?

less than 1 minute read

Published:

git-filter-repo

Install it

python3 -m pip install --user git-filter-repo

Source: https://superuser.com/questions/1563034/how-do-you-install-git-filter-repo

How to use it?

  1. cd YOUR_REPOSITORY
  2. git filter-repo --analyze # analyze big files; the report appears in .git/filter-repo/analyze/
  3. git filter-repo --invert-paths --path FILE_NAME1 --path FILE_NAME2

Reference:

  • https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository
  • https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html

Automatic hyperparameter tuning algorithms

less than 1 minute read

Published:

Automatic tuning methods

Commonly used automatic hyperparameter tuning methods fall into two classes: simple search methods and sequential model-based optimization methods.

Simple search methods

Simple search methods are generic, naive search strategies. The common ones are random search and grid search.

Random search generates sample points randomly. Being completely random, its results are not stable, but it also does not get trapped in a local optimum; when the number of random trials is not limited, it can produce surprisingly good results.

Grid search traverses all the points configured by the user. Because there are gaps between the configured grid points, grid search may well miss some excellent sample points; its effectiveness depends entirely on the user's configuration.

(figure: grid search vs. random search)

As the figure shows, when the optimal hyperparameter configuration lies in a gap of the grid, grid search cannot find it, while random search has some chance of doing so.
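As a concrete (hypothetical) illustration of the two strategies, scikit-learn offers both out of the box; the model and parameter grid below are placeholders:

```
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)
params = {"n_estimators": [10, 50, 100], "max_depth": [2, 4, 8]}

grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3)
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                          n_iter=5, cv=3, random_state=0)  # samples 5 of the 9 points
print(grid.fit(X, y).best_params_)
print(rand.fit(X, y).best_params_)
```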

Sequential model-based optimization

Sequential Model-Based Optimization (SMBO) is a paradigm of Bayesian optimization. This class of methods is suited to obtaining good hyperparameters when resources are limited.