<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://cendok.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://cendok.github.io/" rel="alternate" type="text/html" /><updated>2026-04-04T12:02:19+00:00</updated><id>https://cendok.github.io/feed.xml</id><title type="html">Cendok</title><subtitle>💪Exploration yields rewards.</subtitle><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><entry><title type="html">Handwritten-Digit-Recognition-System</title><link href="https://cendok.github.io/2026/03/22/Handwritten-Digit-Recognition-System" rel="alternate" type="text/html" title="Handwritten-Digit-Recognition-System" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/22/Handwritten-Digit-Recognition-System</id><content type="html" xml:base="https://cendok.github.io/2026/03/22/Handwritten-Digit-Recognition-System"><![CDATA[<blockquote>
  <p>Handwritten-Digit-Recognition-System</p>
</blockquote>

<p>Source code:</p>

<p><a href="https://github.com/Cendok/Handwritten-Digit-Recognition-System">Cendok/Handwritten-Digit-Recognition-System</a></p>

<h2 id="pycharm中新建虚拟环境">Creating a new virtual environment in PyCharm</h2>

<p>Python 3.9 (Handwritten-Digit-Recognition-System)</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pip</span> <span class="n">install</span> <span class="n">torch</span>
<span class="n">pip</span> <span class="n">install</span> <span class="n">torchvision</span>
<span class="n">pip</span> <span class="n">install</span> <span class="n">flask</span>
</code></pre></div></div>

<h2 id="pthpy">pth.py</h2>

<p>The model is trained in advance, and its parameters have been uploaded to GitHub.</p>

<p>Add a function that saves the model:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">save_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">filename</span><span class="p">):</span>
    <span class="n">torch</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">state_dict</span><span class="p">(),</span> <span class="n">filename</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Model saved to </span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="c1"># Call it together with the function above
</span><span class="n">save_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s">'./mnist_cnn_model.pth'</span><span class="p">)</span>
</code></pre></div></div>

<p>Convolutional neural networks are the first choice for image processing. The MNIST dataset archive has been uploaded to GitHub.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Add this guard and indent the code that follows, so that starting Flask does not also start training; the web page can then launch without waiting for 10 epochs of training
</span><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="n">loss_fn</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span><span class="c1"># Cross-entropy loss; like marking an exam paper, it measures how wrong the predictions are
</span>    <span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">optim</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span><span class="n">lr</span> <span class="o">=</span> <span class="mf">0.001</span><span class="p">)</span>
    <span class="n">epochs</span> <span class="o">=</span> <span class="mi">10</span> <span class="c1"># How many epochs should be chosen?
</span>    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">):</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Epoch </span><span class="si">{</span><span class="n">t</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="se">\n</span><span class="s">-------------------------------"</span><span class="p">)</span>
        <span class="n">train</span><span class="p">(</span><span class="n">train_dataloader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">loss_fn</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">)</span>
        <span class="n">test</span><span class="p">(</span><span class="n">test_dataloader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">loss_fn</span><span class="p">)</span>
    <span class="n">save_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="s">'./mnist_cnn_model.pth'</span><span class="p">)</span>

    <span class="k">print</span><span class="p">(</span><span class="s">"Done!"</span><span class="p">)</span>
    <span class="n">test</span><span class="p">(</span><span class="n">test_dataloader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">loss_fn</span><span class="p">)</span>
</code></pre></div></div>
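<p>The effect of the guard can be sketched without PyTorch. In this minimal example, <code>run_training</code> is a hypothetical stand-in for the real training loop and is not part of the repository. Importing the module only defines the function, while executing the file directly also runs the guarded block.</p>

```python
def run_training(epochs):
    """Hypothetical stand-in for the train/test loop; returns the epochs it ran."""
    completed = []
    for t in range(epochs):
        # The real script would call train(...) and test(...) here.
        completed.append(t + 1)
    return completed


if __name__ == "__main__":
    # Runs only when this file is executed directly (for example: python pth.py),
    # not when another module such as app.py imports it.
    print(run_training(10))
```

<p>With this layout, another module can import the model definition and helper functions from the training script without triggering a training run.</p>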

<h2 id="apppy">app.py</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Set the URL path under which static files (CSS, JavaScript, images, etc.) are served: static_url_path="/static"
</span><span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="n">__name__</span><span class="p">,</span> <span class="n">static_url_path</span><span class="o">=</span><span class="s">"/static"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The predict() function must directly follow these decorators
# The route path "/predict" follows the function name def predict() by convention; Flask does not require them to match, but the front end must request this path
</span><span class="o">@</span><span class="n">app</span><span class="p">.</span><span class="n">route</span><span class="p">(</span><span class="s">"/predict"</span><span class="p">,</span> <span class="n">methods</span><span class="o">=</span><span class="p">[</span><span class="s">"GET"</span><span class="p">,</span> <span class="s">"POST"</span><span class="p">])</span>
<span class="o">@</span><span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">predict</span><span class="p">():</span>
    <span class="n">info</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">image_file</span> <span class="o">=</span> <span class="n">request</span><span class="p">.</span><span class="n">files</span><span class="p">[</span><span class="s">"file0"</span><span class="p">]</span>  <span class="c1"># Get the file from the front end
</span>        <span class="n">img_bytes</span> <span class="o">=</span> <span class="n">image_file</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>  <span class="c1"># Read the file contents
</span>        <span class="n">image_path</span> <span class="o">=</span> <span class="s">'./number/digit1.png'</span>  <span class="c1"># Save path
</span>        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">image_path</span><span class="p">,</span> <span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">img_bytes</span><span class="p">)</span>  <span class="c1"># Save the image file
</span>
        <span class="c1"># Load and preprocess the image
</span>        <span class="n">digit_image</span> <span class="o">=</span> <span class="n">load_digit_image</span><span class="p">(</span><span class="n">image_path</span><span class="p">)</span>
        <span class="n">predicted_digit</span> <span class="o">=</span> <span class="n">predict_digit</span><span class="p">(</span><span class="n">digit_image</span><span class="p">)</span>  <span class="c1"># Predict the digit
</span>
        <span class="n">info</span><span class="p">[</span><span class="s">"result"</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"The predicted digit is: </span><span class="si">{</span><span class="n">predicted_digit</span><span class="si">}</span><span class="s">"</span>  <span class="c1"># Result sent back to the front end
</span>    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">info</span><span class="p">[</span><span class="s">"err"</span><span class="p">]</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">jsonify</span><span class="p">(</span><span class="n">info</span><span class="p">)</span>  <span class="c1"># Return the result as JSON
</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span>
    <span class="n">app</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">1235</span><span class="p">)</span>
<span class="c1">#app.run(debug=True, host="0.0.0.0", port=1235)
# Keep debug mode off; otherwise training repeats in an endless loop and the web page cannot be opened
</span></code></pre></div></div>

<h2 id="indexhtml">index.html</h2>

<p>HTML template download link:</p>

<p><a href="https://sc.chinaz.com/tag_moban/html.html">HTML templates: HTML web page template downloads</a></p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;head&gt;</span>
Change the name here
</code></pre></div></div>

<p>Place the functional module directly inside &lt;body&gt;; do not wrap it in extra containers.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;section</span> <span class="na">class=</span><span class="s">"bg-upcoming-events"</span><span class="nt">&gt;</span>
            <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"container"</span><span class="nt">&gt;</span>
                <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"row"</span><span class="nt">&gt;</span>
                    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"upcoming-events"</span><span class="nt">&gt;</span>
                        <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"section-header"</span><span class="nt">&gt;</span>
                            <span class="nt">&lt;h1&gt;</span><span class="ni">&amp;#128519;</span><span class="nt">&lt;/h1&gt;</span>
                            <span class="nt">&lt;p&gt;</span>Upload the digit image to be recognized, then click the Predict button<span class="nt">&lt;/p&gt;</span>
                        <span class="nt">&lt;/div&gt;</span>
                        <span class="nt">&lt;style&gt;</span>
                            <span class="nc">.section-header</span> <span class="p">{</span>
                                <span class="nl">text-align</span><span class="p">:</span> <span class="nb">center</span><span class="p">;</span> <span class="c">/* Center the text */</span>
                                <span class="nl">margin</span><span class="p">:</span> <span class="m">20px</span><span class="p">;</span> <span class="c">/* Add some margin for visual spacing */</span>
                            <span class="p">}</span>
                        <span class="nt">&lt;/style&gt;</span>
                        <span class="c">&lt;!-- .section-header --&gt;</span>
                        <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"row"</span><span class="nt">&gt;</span>
                            <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"col-lg-6"</span><span class="nt">&gt;</span>
                                <span class="nt">&lt;h3</span> <span class="na">style=</span><span class="s">"color: black;"</span><span class="nt">&gt;</span>Image to recognize<span class="nt">&lt;/h3&gt;</span>
                                <span class="nt">&lt;div&gt;</span>
                                    <span class="c">&lt;!--                 href="javascript:;"--&gt;</span>
                                    <span class="nt">&lt;input</span> <span class="na">href=</span><span class="s">"javascript:;"</span> <span class="na">class=</span><span class="s">"btn btn-default"</span> <span class="na">tabindex=</span><span class="s">"0"</span> <span class="na">type=</span><span class="s">"file"</span> <span class="na">name=</span><span class="s">"file"</span>
                                           <span class="na">id=</span><span class="s">"file0"</span><span class="nt">&gt;</span>
                                    <span class="nt">&lt;p&gt;&lt;/p&gt;</span>
                                    <span class="nt">&lt;img</span> <span class="na">src=</span><span class="s">""</span> <span class="na">id=</span><span class="s">"img0"</span><span class="nt">&gt;</span>
                                <span class="nt">&lt;/div&gt;</span>
                            <span class="nt">&lt;/div&gt;</span>
                        <span class="nt">&lt;/div&gt;</span>
                        <span class="c">&lt;!-- .col-lg-6 --&gt;</span>
                        <span class="nt">&lt;div&gt;</span>
                            <span class="c">&lt;!--                style="margin-top:20px;width: 35rem;height: 30rem; padding-left: 20px"--&gt;</span>
                            <span class="nt">&lt;input</span> <span class="na">class=</span><span class="s">"btn btn-default"</span> <span class="na">type=</span><span class="s">"button"</span> <span class="na">id=</span><span class="s">"b0"</span>
                                   <span class="na">onclick=</span><span class="s">"test0()"</span> <span class="na">style=</span><span class="s">"color: #000000"</span>
                                   <span class="na">value=</span><span class="s">"Predict"</span><span class="nt">&gt;</span>
                            <span class="nt">&lt;p&gt;&lt;/p&gt;</span>
                            <span class="nt">&lt;pre</span> <span class="na">id=</span><span class="s">"out"</span><span class="nt">&gt;</span>Click Predict to get the recognition result<span class="nt">&lt;/pre&gt;</span>
                            <span class="c">&lt;!--                &lt;pre id="out" style="width:320px;height:50px;line-height: 50px;margin-top:20px;"&gt;&lt;/pre&gt;--&gt;</span>
                        <span class="nt">&lt;/div&gt;</span>
                        <span class="c">&lt;!-- .row --&gt;</span>
                    <span class="nt">&lt;/div&gt;</span>
                    <span class="c">&lt;!-- .upcoming-events --&gt;</span>
                <span class="nt">&lt;/div&gt;</span>
                <span class="c">&lt;!-- .row --&gt;</span>
            <span class="nt">&lt;/div&gt;</span>
            <span class="c">&lt;!-- .container --&gt;</span>
        <span class="nt">&lt;/section&gt;</span>
</code></pre></div></div>

<p>Use the JavaScript that ships with the template; do not modify it.</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- All js here --&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/modernizr-3.5.0.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/jquery-1.12.4.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/popper.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/bootstrap.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/one-page-nav-min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/slick.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/wow.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/plugins.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/jquery.meanmenu.min.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
        <span class="nt">&lt;script </span><span class="na">src=</span><span class="s">"../static/js/main.js"</span><span class="nt">&gt;&lt;/script&gt;</span>
</code></pre></div></div>

<h2 id="访问">Access</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>http://localhost:1235/

Default
http://127.0.0.1:1235/
</code></pre></div></div>

<h2 id="界面展示">Interface showcase</h2>

<p>Select an image from the number folder to recognize which digit it shows.</p>

<p><img src="/images/posts/2026-03-22-Handwritten-Digit-Recognition-System/display1.png" alt="display1" /></p>

<p><img src="/images/posts/2026-03-22-Handwritten-Digit-Recognition-System/display2.png" alt="display2" /></p>

<h3 id="移动端">Mobile</h3>

<p><img src="/images/posts/2026-03-22-Handwritten-Digit-Recognition-System/display1-mobile.jpg" alt="display1-mobile" /></p>

<p><img src="/images/posts/2026-03-22-Handwritten-Digit-Recognition-System/display2-mobile.jpg" alt="display2-mobile" /></p>

<blockquote>
  <p>This implements photographing a digit and recognizing its value.</p>
</blockquote>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="System" /><category term="System" /><summary type="html"><![CDATA[A handwritten digit recognition system based on a CNN and the Flask framework]]></summary></entry><entry><title type="html">MRS Template Matching</title><link href="https://cendok.github.io/2026/03/16/MRS-BaseLine%E6%A8%A1%E6%9D%BF%E5%8C%B9%E9%85%8D" rel="alternate" type="text/html" title="MRS Template Matching" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/16/MRS-BaseLine%E6%A8%A1%E6%9D%BF%E5%8C%B9%E9%85%8D</id><content type="html" xml:base="https://cendok.github.io/2026/03/16/MRS-BaseLine%E6%A8%A1%E6%9D%BF%E5%8C%B9%E9%85%8D"><![CDATA[<h2 id="baseline模板匹配">Baseline Template Matching</h2>

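<p>The goal is to identify the TongGong System, the tonic pitch, the mode pattern, and the mode type from the audio signal, so that each piece is automatically labelled with its corresponding pentatonic mode.</p>

<p>The librosa library extracts the chroma features, which are summed over time into a 12-dimensional chroma vector that carries no octave information.</p>

<p>Build the TongGong System template for C, then cyclically shift it to obtain the remaining templates.</p>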
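<p>A minimal pure-Python sketch of the cyclic shift, assuming the C template (1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0) and pitch classes ordered from C to B. The helper names <code>roll</code> and <code>build_templates</code> are chosen for illustration only; <code>roll</code> is meant to behave like <code>numpy.roll</code> on a 1-D list.</p>

```python
PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
C_TONGGONG = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # pitch classes C, D, E, G, A


def roll(template, shift):
    """Cyclically shift a list to the right by `shift` positions."""
    shift %= len(template)
    if shift == 0:
        return list(template)
    return template[-shift:] + template[:-shift]


def build_templates(base=C_TONGGONG):
    """Return one template per pitch class, keyed by the name of its Gong note."""
    return {name: roll(base, i) for i, name in enumerate(PITCH_NAMES)}


templates = build_templates()
print(templates['D'])  # ones at D, E, F#, A, B
```

<p>Shifting right by two positions moves every marked pitch class up a whole tone, which is why the D template marks D, E, F#, A, and B.</p>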

<h3 id="system">System</h3>

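<p>For System, extract the 12-dimensional chroma vector of the unknown audio and compare it with the twelve existing templates by computing the Pearson correlation coefficient. The template with the largest coefficient is the best match and gives the System of the piece.</p>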
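<p>The matching step can be sketched in pure Python as follows. This is an illustrative sketch, not the post's implementation: <code>pearson</code> and <code>identify_system</code> are hypothetical helper names, and the chroma vector in the usage lines is made-up toy data.</p>

```python
import math

PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
C_TONGGONG = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def identify_system(chroma_sum):
    """Return the best-matching System name and all twelve correlation scores."""
    scores = {}
    for i, name in enumerate(PITCH_NAMES):
        # Right-rotate the C template by i positions (i = 0 leaves it unchanged).
        template = C_TONGGONG[-i:] + C_TONGGONG[:-i]
        scores[name] = pearson(chroma_sum, template)
    return max(scores, key=scores.get), scores


# Toy chroma vector with most energy on G, A, B, D, E (made-up numbers).
toy_chroma = [0.1, 0.1, 5, 0.1, 4, 0.1, 0.1, 6, 0.1, 3, 0.1, 4]
best, _ = identify_system(toy_chroma)
print(best)
```

<p>Because all twelve templates are rotations of the same pattern, they share the same mean and variance, so the ranking by correlation is the same as the ranking by how much chroma energy falls on each template's marked pitch classes.</p>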

<h3 id="tonic">Tonic</h3>

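<p>For Tonic, librosa extracts the 12-dimensional chroma features; the chroma of the last 500 frames is summed per pitch class, and the pitch class with the largest sum is taken as the tonic.</p>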
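<p>A small sketch of the tonic step, assuming the chroma matrix is given as 12 rows (pitch classes C to B) of per-frame energies. The function name <code>identify_tonic</code> and the toy matrix are illustrative assumptions.</p>

```python
PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def identify_tonic(chroma, last_frames=500):
    """Pick the pitch class with the most energy in the final frames.

    chroma: list of 12 rows, each a list of per-frame energies.
    If the audio has fewer than `last_frames` frames, all frames are used.
    """
    tail_energy = [sum(row[-last_frames:]) for row in chroma]
    tonic_index = max(range(12), key=tail_energy.__getitem__)
    return PITCH_NAMES[tonic_index]


# Toy matrix with 4 frames: C dominates the start, A dominates the end.
toy = [[0, 0, 0, 0] for _ in range(12)]
toy[0] = [9, 9, 0, 0]   # C
toy[9] = [0, 0, 5, 5]   # A
print(identify_tonic(toy, last_frames=2))
```

<p>Restricting the sum to the ending reflects the assumption behind this baseline that a piece tends to settle on its tonic at the end.</p>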

<h3 id="pattern">Pattern</h3>

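<p>Pattern is derived with the following inference rule.</p>

<p><strong>Pattern inference rule:</strong></p>

<p>t (Tonic), s (System)</p>

<p>When t = s, the pattern is Gong. When t is 2 semitones above s, it is Shang; 4 semitones above, Jue; 7 semitones above, Zhi; 9 semitones above, Yu.</p>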
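<p>The rule maps the interval from System to Tonic onto a pattern name. A minimal sketch follows, assuming the interval is measured modulo 12 (octaves ignored, as chroma carries no octave information); <code>infer_pattern</code> is an illustrative name.</p>

```python
PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Semitones by which the tonic lies above the System's Gong note.
PATTERN_BY_INTERVAL = {0: 'Gong', 2: 'Shang', 4: 'Jue', 7: 'Zhi', 9: 'Yu'}


def infer_pattern(tonic, system):
    """Return the pattern name, or None if the tonic is not a degree of the System."""
    interval = (PITCH_NAMES.index(tonic) - PITCH_NAMES.index(system)) % 12
    return PATTERN_BY_INTERVAL.get(interval)


print(infer_pattern('D', 'C'))  # D is 2 semitones above C
print(infer_pattern('C', 'F'))  # C is 7 semitones above F
```

<p>Returning <code>None</code> for the other seven intervals makes it visible when the System and Tonic estimates contradict each other, instead of silently forcing a pattern.</p>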

<h3 id="type">Type</h3>

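<p>For Type, the templates consist of 0s and 1s, and recognition works the same way as for System.</p>

<p>The basic steps of the audio analysis, using the librosa package to process the audio data:</p>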

<ol>
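  <li><strong>Extract chroma features</strong>: Use the librosa package to extract the chroma features of the whole audio. Each chroma frame is a 12-dimensional vector describing how energy is distributed across the pitch classes, with octave information discarded.</li>
  <li><strong>Sum the chroma vectors</strong>: Sum the chroma features over time to obtain a single 12-dimensional chroma vector that reflects the total energy of each pitch class across the whole audio.</li>
  <li><strong>Define the TongGong System templates</strong>: First define the template for the C TongGong System, (1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0); the templates for the other TongGong Systems are obtained by cyclically shifting it.</li>
  <li><strong>Compute Pearson correlation coefficients</strong>: Compute the Pearson correlation coefficient between the chroma vector and each TongGong System template to measure how well they match.</li>
  <li><strong>Identify the TongGong System</strong>: Select the template with the largest Pearson correlation coefficient; its TongGong System is the recognition result.</li>
  <li><strong>Identify the tonic pitch</strong>: Take the chroma features of the last 500 frames and sum them per pitch class; the pitch class with the largest sum is taken as the tonic.</li>
  <li><strong>Identify the mode type</strong>: Build a template of 0s and 1s from the scale of each mode type. As with TongGong System recognition, compute how well each template matches the chroma vector to identify the mode type.</li>
  <li><strong>Derive the mode pattern</strong>: Combine the recognized TongGong System and tonic pitch to determine the mode pattern of the audio.</li>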
</ol>

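<h3 id="输入">Input</h3>

<p>The one-dimensional audio is converted into a two-dimensional spectrogram. The whole spectrogram can be fed in for training, or it can be cut into segments that are fed in separately, with the per-segment results then combined.</p>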
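<p>The segment-then-combine option could look like the sketch below. The post does not say how the per-segment results are combined, so the majority vote here is only one possible choice; <code>split_frames</code> and <code>combine_by_vote</code> are hypothetical helpers, and the spectrogram is a toy list of rows (frequency bins) by columns (frames).</p>

```python
from collections import Counter


def split_frames(spectrogram, segment_width):
    """Cut a (bins x frames) spectrogram into consecutive segments along time.

    The last segment may be shorter than `segment_width`.
    """
    n_frames = len(spectrogram[0])
    return [
        [row[start:start + segment_width] for row in spectrogram]
        for start in range(0, n_frames, segment_width)
    ]


def combine_by_vote(segment_labels):
    """Majority vote over per-segment predictions (an assumed combination rule)."""
    return Counter(segment_labels).most_common(1)[0][0]


toy_spectrogram = [[1, 2, 3, 4, 5],
                   [6, 7, 8, 9, 10]]
segments = split_frames(toy_spectrogram, 2)
print(len(segments))
print(combine_by_vote(['Gong', 'Yu', 'Gong']))
```

<p>Averaging per-segment class probabilities is a common alternative to voting when the classifier outputs scores instead of hard labels.</p>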

<h3 id="结果评估">Evaluation</h3>

<p><strong>Seven accuracy metrics</strong> were developed to evaluate the recognition results.</p>

<p><img src="/images/posts/2026-3-16-MRS-BaseLine模板匹配/ACC.png" alt="image" /></p>

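<p>ACC1 is the accuracy for System, ACC2 the accuracy for Tonic, ACC3 the accuracy for Pattern, ACC4 the mean of the Tonic and Pattern accuracies, ACC5 the accuracy for Type, and ACC6 the mean of the Tonic, Pattern, and Type accuracies.</p>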
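<p>Read literally, the per-label metrics are fractions of correct predictions and the combined ones are plain means. The sketch below follows that reading; it is an assumption based on the sentence above, not on the formula image, and the function names and labels are illustrative.</p>

```python
def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the ground truth."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def combined_metrics(acc_tonic, acc_pattern, acc_type):
    """ACC4 and ACC6 as plain means of the individual accuracies."""
    return {
        'ACC4': (acc_tonic + acc_pattern) / 2,
        'ACC6': (acc_tonic + acc_pattern + acc_type) / 3,
    }


# Made-up labels for four pieces.
acc2 = accuracy(['C', 'D', 'E', 'G'], ['C', 'D', 'F', 'G'])
print(acc2)
print(combined_metrics(0.5, 1.0, 0.75))
```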

<h3 id="实现">Implementation</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># -*- coding: gb2312 -*-
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">librosa</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">pearsonr</span>

<span class="k">def</span> <span class="nf">analyze_audio</span><span class="p">(</span><span class="n">file_path</span><span class="p">):</span>
    <span class="n">y</span><span class="p">,</span> <span class="n">sr</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>
    <span class="n">chroma</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">feature</span><span class="p">.</span><span class="n">chroma_stft</span><span class="p">(</span><span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">)</span>
    <span class="n">chroma_sum</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">chroma</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

    <span class="c1"># System/Tonic templates
</span>    <span class="n">pitch_names</span> <span class="o">=</span> <span class="p">[</span><span class="s">'C'</span><span class="p">,</span> <span class="s">'C#'</span><span class="p">,</span> <span class="s">'D'</span><span class="p">,</span> <span class="s">'D#'</span><span class="p">,</span> <span class="s">'E'</span><span class="p">,</span> <span class="s">'F'</span><span class="p">,</span> <span class="s">'F#'</span><span class="p">,</span> <span class="s">'G'</span><span class="p">,</span> <span class="s">'G#'</span><span class="p">,</span> <span class="s">'A'</span><span class="p">,</span> <span class="s">'A#'</span><span class="p">,</span> <span class="s">'B'</span><span class="p">]</span>
    <span class="n">c_tonggong</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">])</span>  <span class="c1"># C TongGong体系System的模板
</span>    <span class="n">templates</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">c_tonggong</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">12</span><span class="p">)]</span>  <span class="c1"># System循环生成其他十二个模板TongGong体系的模板，即从C到B按顺序
</span>    <span class="n">correlations</span> <span class="o">=</span> <span class="p">[</span><span class="n">pearsonr</span><span class="p">(</span><span class="n">chroma_sum</span><span class="p">,</span> <span class="n">template</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">template</span> <span class="ow">in</span> <span class="n">templates</span><span class="p">]</span>
    <span class="n">tonggong_index</span> <span class="o">=</span> <span class="n">pitch_names</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">correlations</span><span class="p">)]</span><span class="c1"># 同宫系统映射关系，映射到标签，enumerate(pitch_names) 会生成 (0, 'C'), (1, 'D'), (2, 'E')
</span>    <span class="n">tonggong_system_mapping</span> <span class="o">=</span> <span class="p">{</span><span class="n">name</span><span class="p">:</span> <span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">pitch_names</span><span class="p">)}</span><span class="c1"># 映射到数字
</span>    <span class="n">System_number</span> <span class="o">=</span> <span class="n">tonggong_system_mapping</span><span class="p">[</span><span class="n">tonggong_index</span><span class="p">]</span>

<span class="c1"># Map the tonic to a number; the tonic is taken from the chroma features of the last 500 frames
</span>    <span class="n">tonic_index</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">chroma</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">500</span><span class="p">:],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span>
    <span class="n">tonic</span> <span class="o">=</span> <span class="n">pitch_names</span><span class="p">[</span><span class="n">tonic_index</span><span class="p">]</span>
    <span class="n">tonic_number</span> <span class="o">=</span> <span class="n">tonggong_system_mapping</span><span class="p">[</span><span class="n">tonic</span><span class="p">]</span><span class="c1"># Pitch of Tonic: same numbering rule as the TongGong System, so tonggong_system_mapping is reused to map it to a number
</span>
<span class="c1"># Type (mode type) templates
</span>    <span class="n">mode_templates_Type</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'Heptatonic Yanyue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
        <span class="s">'Heptatonic Qingyue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
        <span class="s">'Heptatonic Yayue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
        <span class="s">'Hexatonic (Biangong)'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
        <span class="s">'Hexatonic (Qingjue)'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]),</span>
        <span class="s">'Pentatonic'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
    <span class="p">}</span>
    <span class="n">type_mapping</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'Pentatonic'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="s">'Hexatonic (Qingjue)'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
        <span class="s">'Hexatonic (Biangong)'</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
        <span class="s">'Heptatonic Yayue'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
        <span class="s">'Heptatonic Qingyue'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span>
        <span class="s">'Heptatonic Yanyue'</span><span class="p">:</span> <span class="mi">5</span>
    <span class="p">}</span>  <span class="c1"># Mapping from mode Type name to numeric label
</span>    <span class="n">mode_correlations</span> <span class="o">=</span> <span class="p">{</span><span class="n">Type</span><span class="p">:</span> <span class="n">pearsonr</span><span class="p">(</span><span class="n">chroma_sum</span><span class="p">,</span> <span class="n">template</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">Type</span><span class="p">,</span> <span class="n">template</span> <span class="ow">in</span> <span class="n">mode_templates_Type</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>
    <span class="n">identified_Type</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">mode_correlations</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">mode_correlations</span><span class="p">.</span><span class="n">get</span><span class="p">)</span>
    <span class="n">identified_Type_number</span> <span class="o">=</span> <span class="n">type_mapping</span><span class="p">[</span><span class="n">identified_Type</span><span class="p">]</span>  <span class="c1"># Map the Type name to its numeric label
</span>
    <span class="c1"># Pattern computation
</span>    <span class="s">"""
    Mode pattern (Pattern) labels:
    0 -- Gong (宫)
    1 -- Shang (商)
    2 -- Jue (角)
    3 -- Zhi (徵)
    4 -- Yu (羽)
    """</span>
    <span class="n">half_tone_difference</span> <span class="o">=</span> <span class="p">(</span><span class="n">tonic_number</span> <span class="o">-</span> <span class="n">System_number</span><span class="p">)</span> <span class="o">%</span> <span class="mi">12</span>  <span class="c1"># Semitone distance from the System pitch class up to the tonic
</span>    <span class="k">if</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>  <span class="c1"># Choose the Pattern from the semitone distance
</span>        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">4</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">7</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">9</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">4</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">9</span>  <span class="c1"># 9 means the Pattern could not be determined
</span>
    <span class="k">return</span> <span class="n">System_number</span><span class="p">,</span> <span class="n">tonic_number</span><span class="p">,</span> <span class="n">identified_Type_number</span><span class="p">,</span><span class="n">pattern_number</span>

<span class="c1"># Input paths
</span><span class="n">folder_path</span> <span class="o">=</span> <span class="sa">r</span><span class="s">"E:\Code\CNPM_audio"</span>  <span class="c1"># Folder containing the audio files
</span><span class="n">true_labels_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="sa">r</span><span class="s">'E:\Code\label.csv'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>

<span class="n">correct_System</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">correct_Tonic</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">correct_Type</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">correct_Pattern</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">total_files</span> <span class="o">=</span> <span class="mi">0</span>

<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">true_labels_df</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="n">file_name</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">'File_Name'</span><span class="p">]</span>
    <span class="n">true_tonggong</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">'System'</span><span class="p">]</span>
    <span class="n">true_tonic</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span>
    <span class="n">true_Type</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">'Type'</span><span class="p">]</span>
    <span class="n">true_Pattern</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span>

    <span class="n">file_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">folder_path</span><span class="p">,</span> <span class="n">file_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">exists</span><span class="p">(</span><span class="n">file_path</span><span class="p">):</span>
        <span class="n">total_files</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="n">tonggong_index</span><span class="p">,</span> <span class="n">tonic</span><span class="p">,</span> <span class="n">identified_Type</span><span class="p">,</span><span class="n">identified_Pattern</span> <span class="o">=</span> <span class="n">analyze_audio</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">tonggong_index</span> <span class="o">==</span> <span class="n">true_tonggong</span><span class="p">:</span>
            <span class="n">correct_System</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">tonic</span> <span class="o">==</span> <span class="n">true_tonic</span><span class="p">:</span>
            <span class="n">correct_Tonic</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">identified_Type</span> <span class="o">==</span> <span class="n">true_Type</span><span class="p">:</span>
            <span class="n">correct_Type</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="n">identified_Pattern</span> <span class="o">==</span> <span class="n">true_Pattern</span><span class="p">:</span>
            <span class="n">correct_Pattern</span> <span class="o">+=</span> <span class="mi">1</span>

<span class="k">if</span> <span class="n">total_files</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC1(System accuracy): </span><span class="si">{</span><span class="n">correct_System</span> <span class="o">/</span> <span class="n">total_files</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC2(Tonic accuracy): </span><span class="si">{</span><span class="n">correct_Tonic</span> <span class="o">/</span> <span class="n">total_files</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC3(Pattern accuracy): </span><span class="si">{</span><span class="n">correct_Pattern</span> <span class="o">/</span> <span class="n">total_files</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC4(Tonic and Pattern Average accuracy): </span><span class="si">{</span><span class="p">(</span><span class="n">correct_Pattern</span><span class="o">+</span><span class="n">correct_Tonic</span><span class="p">)</span>  <span class="o">/</span> <span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">total_files</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC5(Type accuracy): </span><span class="si">{</span><span class="n">correct_Type</span> <span class="o">/</span> <span class="n">total_files</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC6(Tonic, Pattern and Type Average accuracy): </span><span class="si">{</span><span class="p">(</span><span class="n">correct_Pattern</span><span class="o">+</span><span class="n">correct_Tonic</span><span class="o">+</span><span class="n">correct_Type</span><span class="p">)</span>  <span class="o">/</span> <span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="n">total_files</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"No files were analyzed."</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="MRS" /><category term="MRS" /><summary type="html"><![CDATA[Baseline template matching]]></summary></entry><entry><title type="html">MRS Single-Task Convolutional Recurrent Neural Network</title><link href="https://cendok.github.io/2026/03/16/MRS-CRNN%E5%8D%95%E4%BB%BB%E5%8A%A1" rel="alternate" type="text/html" title="MRS Single-Task Convolutional Recurrent Neural Network" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/16/MRS-CRNN%E5%8D%95%E4%BB%BB%E5%8A%A1</id><content type="html" xml:base="https://cendok.github.io/2026/03/16/MRS-CRNN%E5%8D%95%E4%BB%BB%E5%8A%A1"><![CDATA[<h2 id="crnn">CRNN</h2>

<h3 id="单任务实现">Single-Task Implementation</h3>
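
<p>Shape check before the code: the CRNN below uses three <code>MaxPool2d</code> layers with kernel and stride <code>(2, 1)</code>, so only the frequency axis shrinks while the 100 time frames are preserved. The sketch below works through that arithmetic in plain Python. It assumes the 256 output channels and the remaining height are merged into one feature vector per time step before the LSTM; <code>pooled_size</code> and <code>lstm_input_size</code> are hypothetical helper names, not part of the original script.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def pooled_size(size, kernel, stride):
    """Output length of a max-pool along one axis (no padding, floor mode)."""
    return (size - kernel) // stride + 1


def lstm_input_size(input_height=168, input_width=100, channels=256, num_pools=3):
    """Per-time-step feature size, assuming channels and height are merged."""
    height, width = input_height, input_width
    for _ in range(num_pools):
        # The 3x3 convolutions use padding 1 and stride 1, so they keep the size.
        height = pooled_size(height, kernel=2, stride=2)  # frequency axis is halved
        width = pooled_size(width, kernel=1, stride=1)    # time axis is unchanged
    return channels * height, width


features, time_steps = lstm_input_size()
print(features, time_steps)  # 5376 100
</code></pre></div></div>

<p>With the default 168 × 100 input, the height goes 168, 84, 42, 21, which gives 256 × 21 = 5376 features for each of the 100 time steps. If the model merges channels and height in the same way, this is the value to expect in <code>_to_linear</code>.</p>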

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">torch.utils.data</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DataLoader</span>
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">librosa</span>
<span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="n">F</span>

<span class="k">def</span> <span class="nf">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">resample_rate</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
    <span class="n">y</span><span class="p">,</span> <span class="n">sr</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="n">cqt</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">cqt</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">fmin</span><span class="o">=</span><span class="n">librosa</span><span class="p">.</span><span class="n">note_to_hz</span><span class="p">(</span><span class="s">'C1'</span><span class="p">),</span> <span class="n">n_bins</span><span class="o">=</span><span class="mi">168</span><span class="p">,</span> <span class="n">bins_per_octave</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
    <span class="n">cqt_amplitude</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">cqt</span><span class="p">)</span>
    <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">resample</span><span class="p">(</span><span class="n">cqt_amplitude</span><span class="p">,</span> <span class="n">orig_sr</span><span class="o">=</span><span class="n">sr</span> <span class="o">/</span> <span class="mi">512</span><span class="p">,</span> <span class="n">target_sr</span><span class="o">=</span><span class="n">resample_rate</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># CQT frame rate is sr / hop_length (librosa default hop_length=512), not sr</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"cqt_resampled shape:"</span><span class="p">,</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="c1"># Pad or truncate along the time axis to max_length frames
</span>    <span class="k">if</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&lt;</span> <span class="n">max_length</span><span class="p">:</span>
        <span class="n">pad_width</span> <span class="o">=</span> <span class="n">max_length</span> <span class="o">-</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">pad</span><span class="p">(</span><span class="n">cqt_resampled</span><span class="p">,</span> <span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">pad_width</span><span class="p">)),</span> <span class="s">'constant'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&gt;</span> <span class="n">max_length</span><span class="p">:</span>
        <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="p">:</span><span class="n">max_length</span><span class="p">]</span>

    <span class="k">return</span> <span class="n">cqt_resampled</span>
<span class="c1"># Returns a CQT magnitude spectrogram of shape (n_bins, max_length)
</span>
<span class="k">class</span> <span class="nc">AudioDataset</span><span class="p">(</span><span class="n">Dataset</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">transform</span> <span class="o">=</span> <span class="n">transform</span>

    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">idx</span><span class="p">):</span>
        <span class="n">audio_path</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'audio'</span><span class="p">]</span>
        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">audio_path</span><span class="p">)</span>
        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"CQT Spectrogram shape:"</span><span class="p">,</span> <span class="n">spectrogram</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>  <span class="c1"># Check the shape; a single channel is expected
</span>        <span class="n">eps</span> <span class="o">=</span> <span class="mf">1e-10</span>  <span class="c1"># Avoid division by zero
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="p">(</span><span class="n">spectrogram</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>  <span class="c1"># Z-score normalization
</span>        <span class="n">label</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'System'</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">torch</span><span class="p">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">).</span><span class="nb">float</span><span class="p">(),</span><span class="n">label</span>

<span class="k">def</span> <span class="nf">custom_collate_fn</span><span class="p">(</span><span class="n">batch</span><span class="p">):</span>
    <span class="n">spectrograms</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">batch</span><span class="p">)</span>  <span class="c1"># Separate the spectrograms from the labels
</span>    <span class="n">spectrograms</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="p">.</span><span class="n">Tensor</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">spectrograms</span><span class="p">]</span>
    <span class="n">spectrograms_padded</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">rnn</span><span class="p">.</span><span class="n">pad_sequence</span><span class="p">(</span><span class="n">spectrograms</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">labels</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Batch shape:"</span><span class="p">,</span> <span class="n">spectrograms_padded</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>  <span class="c1"># Verify the final batch shape
</span>    <span class="k">return</span> <span class="n">spectrograms_padded</span><span class="p">,</span> <span class="n">labels</span>


<span class="k">class</span> <span class="nc">CRNN</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">,</span> <span class="n">input_height</span><span class="o">=</span><span class="mi">168</span><span class="p">,</span> <span class="n">input_width</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">rnn_hidden_size</span><span class="o">=</span><span class="mi">128</span><span class="p">,</span> <span class="n">rnn_num_layers</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">CRNN</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">input_height</span> <span class="o">=</span> <span class="n">input_height</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">input_width</span> <span class="o">=</span> <span class="n">input_width</span>
        <span class="c1"># CNN layers, adapted to the narrow input
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">out_channels</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">64</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">pool1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>  <span class="c1"># Pool along the height only; the width is unchanged
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">128</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">pool2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>  <span class="c1"># Same as above
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv3</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn3</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">256</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">pool3</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">stride</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span><span class="c1">#同上
</span>        <span class="c1">#确定展平后的尺
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">_to_linear</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">_forward_conv</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">autograd</span><span class="p">.</span><span class="n">Variable</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">input_height</span><span class="p">,</span> <span class="n">input_width</span><span class="p">)))</span>
        <span class="c1">#RNN层
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">lstm</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">LSTM</span><span class="p">(</span><span class="n">input_size</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">_to_linear</span><span class="p">,</span> <span class="n">hidden_size</span><span class="o">=</span><span class="n">rnn_hidden_size</span><span class="p">,</span> <span class="n">num_layers</span><span class="o">=</span><span class="n">rnn_num_layers</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">bidirectional</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="c1"># classifier
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">rnn_hidden_size</span> <span class="o">*</span> <span class="mi">2</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">)</span><span class="c1"># *2 because the LSTM is bidirectional
</span>
    <span class="k">def</span> <span class="nf">_forward_conv</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">pool1</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">))))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">pool2</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn2</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv2</span><span class="p">(</span><span class="n">x</span><span class="p">))))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">pool3</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn3</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv3</span><span class="p">(</span><span class="n">x</span><span class="p">))))</span>
        <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">_to_linear</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">_to_linear</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span><span class="c1"># computed dynamically from the dummy input
</span>        <span class="k">return</span> <span class="n">x</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="c1"># convolutional layers
</span>        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_forward_conv</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="c1"># reshape the conv output into RNN input
</span>        <span class="n">batch_size</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">_to_linear</span><span class="p">)</span>  <span class="c1"># (batch, seq_len, features)
</span>        <span class="c1"># RNN layer
</span>        <span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="n">h_n</span><span class="p">,</span> <span class="n">c_n</span><span class="p">)</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">lstm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="c1"># use only the LSTM output at the last time step
</span>        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span>
        <span class="k">return</span> <span class="n">x</span>

<span class="c1"># data processing starts here
</span><span class="n">file_path</span> <span class="o">=</span> <span class="s">'./label.csv'</span>
<span class="c1"># file_path = r"D:\0-2024英文文献\0-代码部分\Code\label.csv"
</span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'gbk'</span><span class="p">)</span>
<span class="n">audio_dir</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'File_Name'</span><span class="p">,</span> <span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]]</span>
<span class="n">df</span><span class="p">[</span><span class="s">'audio'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'File_Name'</span><span class="p">].</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">audio_dir</span><span class="p">,</span> <span class="s">'CNPM_audio'</span><span class="p">,</span> <span class="n">x</span><span class="p">))</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'audio'</span><span class="p">,</span> <span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]]</span>

<span class="n">train_df</span><span class="p">,</span> <span class="n">test_df</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
<span class="n">train_df</span><span class="p">,</span> <span class="n">val_df</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">train_df</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span><span class="c1"># 0.25 x 0.8 = 0.2, so the split is 60/20/20 train/val/test
</span>
<span class="n">train_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">train_df</span><span class="p">)</span>
<span class="n">val_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">val_df</span><span class="p">)</span>
<span class="n">test_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">test_df</span><span class="p">)</span>

<span class="n">train_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>
<span class="n">val_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">val_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>
<span class="n">test_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">test_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>

<span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda"</span> <span class="k">if</span> <span class="n">torch</span><span class="p">.</span><span class="n">cuda</span><span class="p">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s">"cpu"</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">CRNN</span><span class="p">(</span><span class="n">num_classes</span><span class="o">=</span><span class="mi">12</span><span class="p">).</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">optim</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.001</span><span class="p">)</span>
<span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">train_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="n">train</span><span class="p">()</span>
    <span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">data_loader</span><span class="p">:</span>
        <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">),</span> <span class="n">labels</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
        <span class="n">optimizer</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
        <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
        <span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">outputs</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
        <span class="n">loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>
        <span class="n">optimizer</span><span class="p">.</span><span class="n">step</span><span class="p">()</span>
        <span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="p">.</span><span class="n">item</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">total_loss</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">data_loader</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">evaluate_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">data_loader</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="nb">eval</span><span class="p">()</span>
    <span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">total_correct</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">total_samples</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">with</span> <span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">data_loader</span><span class="p">:</span>
            <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">),</span> <span class="n">labels</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
            <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
            <span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">outputs</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>
            <span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="p">.</span><span class="n">item</span><span class="p">()</span><span class="c1"># accuracy is computed below
</span>            <span class="n">_</span><span class="p">,</span> <span class="n">predicted</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">outputs</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span><span class="c1"># index of the highest-scoring class
</span>            <span class="n">total_correct</span> <span class="o">+=</span> <span class="p">(</span><span class="n">predicted</span> <span class="o">==</span> <span class="n">labels</span><span class="p">).</span><span class="nb">sum</span><span class="p">().</span><span class="n">item</span><span class="p">()</span>
            <span class="n">total_samples</span> <span class="o">+=</span> <span class="n">labels</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="n">average_loss</span> <span class="o">=</span> <span class="n">total_loss</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">data_loader</span><span class="p">)</span>
        <span class="n">accuracy</span> <span class="o">=</span> <span class="n">total_correct</span> <span class="o">/</span> <span class="n">total_samples</span>
    <span class="k">return</span> <span class="n">average_loss</span><span class="p">,</span> <span class="n">accuracy</span>

<span class="c1"># actual training and validation loop
</span><span class="n">epochs</span> <span class="o">=</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">):</span>
    <span class="n">train_loss</span> <span class="o">=</span> <span class="n">train_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">train_loader</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>
    <span class="n">val_loss</span><span class="p">,</span> <span class="n">val_accuracy</span> <span class="o">=</span> <span class="n">evaluate_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">val_loader</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Epoch </span><span class="si">{</span><span class="n">epoch</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s">, Train Loss: </span><span class="si">{</span><span class="n">train_loss</span><span class="si">}</span><span class="s">, Validation Loss: </span><span class="si">{</span><span class="n">val_loss</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"System Validation Accuracy: </span><span class="si">{</span><span class="n">val_accuracy</span><span class="si">:</span> <span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="MRS" /><category term="MRS" /><summary type="html"><![CDATA[Single-task convolutional recurrent neural network (CRNN)]]></summary></entry><entry><title type="html">MRS single-task residual network</title><link href="https://cendok.github.io/2026/03/16/MRS-ResNet%E5%8D%95%E4%BB%BB%E5%8A%A1" rel="alternate" type="text/html" title="MRS single-task residual network" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/16/MRS-ResNet%E5%8D%95%E4%BB%BB%E5%8A%A1</id><content type="html" xml:base="https://cendok.github.io/2026/03/16/MRS-ResNet%E5%8D%95%E4%BB%BB%E5%8A%A1"><![CDATA[<h2 id="resnet">ResNet</h2>

<p><a href="https://zhuanlan.zhihu.com/p/378037292">ResNet from Theory to Practice (1): ResNet Principles - Zhihu (zhihu.com)</a></p>

<p><img src="/images/posts/2026-3-16-MRS-ResNet单任务/Resnet.png" alt="image" /></p>

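<ul>
  <li>
    <p>1) Use two single-task models to predict the tonic and the pattern directly;</p>
  </li>
  <li>
    <p>2) Use two single-task models to predict the system and the tonic, then derive the pattern indirectly from those two results;</p>
  </li>
</ul>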

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Formula: System (same-gong system) + Tonic = Pattern (mode)
</code></pre></div></div>

<ul>
  <li>3) Use one multi-task model to recognize the system, tonic, and pattern, where the pattern can be obtained either directly or indirectly.</li>
</ul>

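<p><strong>Type recognition</strong>: because its data distribution is imbalanced, it is hard to recognize, and it has little relation to the other three labels, we predict it directly with a separate single model instead of adding it to the multi-task model.</p>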

<h3 id="resnet18单任务网络架构">ResNet18 single-task network architecture</h3>

<p><img src="/images/posts/2026-3-16-MRS-ResNet单任务/ResNet-Single-task.png" alt="image" /></p>

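<ol>
  <li><strong>Initial convolution layer (Conv)</strong>:
    <ul>
      <li>Kernel size (k) is 7x7, stride (s) is 1, and the number of output channels (c) is 64.</li>
      <li>This layer performs the initial feature extraction.</li>
    </ul>
  </li>
  <li><strong>Max pooling layer (Max Pooling)</strong>:
    <ul>
      <li>Pooling kernel size is 3x3 with stride 2.</li>
      <li>This layer reduces the feature dimensions and improves the model's spatial invariance.</li>
    </ul>
  </li>
  <li><strong>Residual block (Residual Block)</strong>:
    <ul>
      <li>Contains two 3x3 convolution layers, each followed by batch normalization (Batch Normalization) and a ReLU activation.</li>
      <li>The number of output channels (c) of each convolution layer is ci, which varies with the settings of the specific block.</li>
      <li>The stride (s) of the first convolution layer is 1 or 2, and that of the second is always 1; a stride of 2 is used for downsampling.</li>
      <li>At the end of each block, an addition (+) merges the outputs of the main path and the shortcut, and ReLU is then applied.</li>
    </ul>
  </li>
  <li><strong>Global average pooling layer (AdaptiveAvgPool)</strong>:
    <ul>
      <li>Reduces each feature map to 1x1, which cuts the number of parameters while preserving the features.</li>
    </ul>
  </li>
  <li><strong>Repeat</strong>:
    <ul>
      <li>Indicates how many times the residual block is repeated, here 8 times.</li>
    </ul>
  </li>
</ol>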
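
<p>To make the layer settings listed above concrete, the following sketch traces how the feature-map size changes through the network, using the standard output-size formula floor((n + 2p - k) / s) + 1 for convolution and pooling. The 168 x 100 input is assumed from the CQT settings (168 bins, 100 frames per segment); the padding values, the channel widths, and which blocks use stride 2 are assumptions borrowed from the usual ResNet18 layout and are not confirmed by the figure. The helper names <code>conv_out</code> and <code>trace_resnet18</code> are hypothetical.</p>

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_resnet18(height, width):
    """Return a list of (layer name, channels, height, width) tuples."""
    trace = []

    # Initial 7x7 convolution, stride 1, 64 channels (padding 3 is assumed).
    height, width = conv_out(height, 7, 1, 3), conv_out(width, 7, 1, 3)
    trace.append(("conv 7x7", 64, height, width))

    # 3x3 max pooling, stride 2 (padding 1 is assumed).
    height, width = conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1)
    trace.append(("max pool", 64, height, width))

    # 8 residual blocks; (channels, stride of the first conv) per block is
    # assumed to follow the standard ResNet18 layout.
    blocks = [(64, 1), (64, 1), (128, 2), (128, 1),
              (256, 2), (256, 1), (512, 2), (512, 1)]
    for index, (channels, stride) in enumerate(blocks, start=1):
        # The first 3x3 conv may downsample; the second always uses stride 1,
        # which with padding 1 leaves the size unchanged.
        height, width = conv_out(height, 3, stride, 1), conv_out(width, 3, stride, 1)
        height, width = conv_out(height, 3, 1, 1), conv_out(width, 3, 1, 1)
        trace.append((f"block {index}", channels, height, width))

    # Global average pooling collapses every feature map to 1x1.
    trace.append(("avg pool", channels, 1, 1))
    return trace


if __name__ == "__main__":
    for name, channels, h, w in trace_resnet18(168, 100):
        print(f"{name:10s} channels={channels:3d} size={h}x{w}")
```

<p>Under these assumptions the final residual block outputs 512 feature maps, and global average pooling turns them into a 512-dimensional vector for the classifier regardless of the input size.</p>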

<h3 id="实现">Implementation</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">torch.utils.data</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DataLoader</span>
<span class="kn">from</span> <span class="nn">torch.nn.utils.rnn</span> <span class="kn">import</span> <span class="n">pad_sequence</span>
<span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">datasets</span><span class="p">,</span> <span class="n">transforms</span><span class="p">,</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">librosa</span>
<span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="n">F</span>

<span class="k">def</span> <span class="nf">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">resample_rate</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">segment_duration</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="n">y</span><span class="p">,</span> <span class="n">sr</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="n">cqt</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">cqt</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">fmin</span><span class="o">=</span><span class="n">librosa</span><span class="p">.</span><span class="n">note_to_hz</span><span class="p">(</span><span class="s">'C1'</span><span class="p">),</span> <span class="n">n_bins</span><span class="o">=</span><span class="mi">168</span><span class="p">,</span> <span class="n">bins_per_octave</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
    <span class="n">cqt_amplitude</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">cqt</span><span class="p">)</span><span class="c1"># magnitude of the complex CQT
</span>    <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">resample</span><span class="p">(</span><span class="n">cqt_amplitude</span><span class="p">,</span> <span class="n">orig_sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">target_sr</span><span class="o">=</span><span class="n">resample_rate</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="c1"># downsample along the time axis (note: CQT frames arrive at sr / hop_length, hop_length=512 by default, not at sr)
</span>    <span class="n">samples_per_segment</span> <span class="o">=</span> <span class="n">resample_rate</span> <span class="o">*</span> <span class="n">segment_duration</span><span class="c1"># frames per segment: 5 * 20 = 100 time points
</span>    <span class="n">total_segments</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="n">samples_per_segment</span><span class="p">))</span>
    <span class="n">segments</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">total_segments</span><span class="p">):</span><span class="c1"># if a segment runs past the end of the CQT, zero-pad the remainder to the full segment length
</span>        <span class="n">start</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">samples_per_segment</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">start</span> <span class="o">+</span> <span class="n">samples_per_segment</span>
        <span class="k">if</span> <span class="n">end</span> <span class="o">&gt;</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="n">padding_length</span> <span class="o">=</span> <span class="n">end</span> <span class="o">-</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">padding</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">padding_length</span><span class="p">))</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="n">padding</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
        <span class="n">segments</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">segment</span><span class="p">)</span>
    <span class="n">segments</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">segments</span><span class="p">)</span>  <span class="c1"># stack into an array of shape (num_segments, n_bins, frames)
</span>    <span class="k">print</span><span class="p">(</span><span class="s">"segments shape:"</span><span class="p">,</span> <span class="n">segments</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">segments</span>

<span class="k">def</span> <span class="nf">generate_cqt_spectrogram_Tonic</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">resample_rate</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">segment_duration</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="n">y</span><span class="p">,</span> <span class="n">sr</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="n">cqt</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">cqt</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">fmin</span><span class="o">=</span><span class="n">librosa</span><span class="p">.</span><span class="n">note_to_hz</span><span class="p">(</span><span class="s">'C1'</span><span class="p">),</span> <span class="n">n_bins</span><span class="o">=</span><span class="mi">168</span><span class="p">,</span> <span class="n">bins_per_octave</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
    <span class="n">cqt_amplitude</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">cqt</span><span class="p">)</span>
    <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">resample</span><span class="p">(</span><span class="n">cqt_amplitude</span><span class="p">,</span> <span class="n">orig_sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">target_sr</span><span class="o">=</span><span class="n">resample_rate</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">500</span><span class="p">:</span>
        <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">500</span><span class="p">:]</span>  <span class="c1"># keep only the last 500 frames for tonic analysis</span>
    <span class="n">samples_per_segment</span> <span class="o">=</span> <span class="n">resample_rate</span> <span class="o">*</span> <span class="n">segment_duration</span>  <span class="c1"># 5 * 20 = 100
</span>    <span class="n">total_segments</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="n">samples_per_segment</span><span class="p">))</span>
    <span class="n">segments_Tonic</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">total_segments</span><span class="p">):</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">samples_per_segment</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">start</span> <span class="o">+</span> <span class="n">samples_per_segment</span>
        <span class="k">if</span> <span class="n">end</span> <span class="o">&gt;</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="n">padding_length</span> <span class="o">=</span> <span class="n">end</span> <span class="o">-</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">padding</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">padding_length</span><span class="p">))</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="n">padding</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
        <span class="n">segments_Tonic</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">segment</span><span class="p">)</span>
    <span class="n">segments_Tonic</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">segments_Tonic</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"segments_Tonic shape:"</span><span class="p">,</span> <span class="n">segments_Tonic</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">segments_Tonic</span>

<span class="k">class</span> <span class="nc">AudioDataset</span><span class="p">(</span><span class="n">Dataset</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">label_column</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">label_column</span> <span class="o">=</span> <span class="n">label_column</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">transform</span> <span class="o">=</span> <span class="n">transform</span>

    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">idx</span><span class="p">):</span>
        <span class="n">audio_path</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'audio'</span><span class="p">]</span>

        <span class="c1"># Choose the spectrogram function according to the task (label column)
</span>        <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">label_column</span> <span class="o">==</span> <span class="s">'Tonic'</span><span class="p">:</span>
            <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">generate_cqt_spectrogram_Tonic</span><span class="p">(</span><span class="n">audio_path</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">audio_path</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"CQT Spectrogram shape for </span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="n">label_column</span><span class="si">}</span><span class="s">:"</span><span class="p">,</span> <span class="n">spectrogram</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
        <span class="n">eps</span> <span class="o">=</span> <span class="mf">1e-10</span>  <span class="c1"># avoid division by zero
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">spectrogram</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>  <span class="c1"># scale to [0, 1] by dividing by the maximum
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># add a channel dimension
</span>        <span class="n">label</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="bp">self</span><span class="p">.</span><span class="n">label_column</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">torch</span><span class="p">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">).</span><span class="nb">float</span><span class="p">(),</span> <span class="n">label</span>

<span class="k">def</span> <span class="nf">custom_collate_fn</span><span class="p">(</span><span class="n">batch</span><span class="p">):</span>
    <span class="n">spectrograms</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">batch</span><span class="p">)</span>  <span class="c1"># split the batch into spectrograms and labels
</span>    <span class="n">spectrograms</span> <span class="o">=</span> <span class="p">[</span><span class="n">s</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">spectrograms</span><span class="p">]</span>  <span class="c1"># drop the leading channel dimension
</span>    <span class="n">spectrograms_padded</span> <span class="o">=</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">spectrograms</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># zero-pad the spectrograms to a common length (pad_sequence pads along the first dimension)
</span>    <span class="n">labels</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>  <span class="c1"># convert the labels to a tensor
</span>    <span class="k">print</span><span class="p">(</span><span class="s">"Batch shape after padding:"</span><span class="p">,</span> <span class="n">spectrograms_padded</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">spectrograms_padded</span><span class="p">,</span> <span class="n">labels</span>

<span class="k">class</span> <span class="nc">BasicBlock</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="n">expansion</span> <span class="o">=</span> <span class="mi">1</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="n">planes</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="n">planes</span><span class="p">)</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">()</span>

        <span class="k">if</span> <span class="n">stride</span> <span class="o">!=</span> <span class="mi">1</span> <span class="ow">or</span> <span class="n">in_planes</span> <span class="o">!=</span> <span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">:</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">),</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">)</span>
            <span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">bn2</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv2</span><span class="p">(</span><span class="n">out</span><span class="p">))</span>
        <span class="n">out</span> <span class="o">+=</span> <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">out</span>

<span class="k">class</span> <span class="nc">SingletaskResNet</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">SingletaskResNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>

        <span class="c1"># Each BasicBlock consists of two 3x3 convolutions, with stride 1 or 2.
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span> <span class="o">=</span> <span class="mi">64</span>  <span class="c1"># channels fed into layer1 (the output of conv1)
</span>        <span class="c1"># conv1 accepts a single-channel input and outputs 64 channels; bias=False means the layer has no bias term.
</span>        <span class="c1"># Undecided: stride=1 (no downsampling) or stride=2? The stride=1 variant is kept for reference:
</span>        <span class="c1"># self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=7, stride=1, padding=3, bias=False)
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">out_channels</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">64</span><span class="p">)</span>
        <span class="c1"># Same question for the pooling stride; the stride=1 variant is kept for reference:
</span>        <span class="c1"># self.maxpool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="c1"># Build the residual stages with _make_layer
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="c1"># Global average pooling at the end of the network reduces feature maps of any size to 1x1, which then feed the fully connected layer (self.fc)
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">AdaptiveAvgPool2d</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="o">*</span><span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_make_layer</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">stride</span><span class="p">):</span>
        <span class="n">strides</span> <span class="o">=</span> <span class="p">[</span><span class="n">stride</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">num_blocks</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">layers</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">stride</span> <span class="ow">in</span> <span class="n">strides</span><span class="p">:</span>
            <span class="n">layers</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">block</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">stride</span><span class="p">))</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span> <span class="o">=</span> <span class="n">planes</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span>
        <span class="k">return</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span><span class="o">*</span><span class="n">layers</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">flatten</span><span class="p">(</span><span class="n">out</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">fc</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">out</span>

<span class="k">def</span> <span class="nf">initialize_model</span><span class="p">(</span><span class="n">num_classes</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.001</span><span class="p">):</span>

    <span class="n">num_blocks</span> <span class="o">=</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
    <span class="c1"># num_blocks = [2, 2, 2, 2]: two residual blocks per stage, four stages, eight residual blocks in total.
</span>    <span class="n">model</span> <span class="o">=</span> <span class="n">SingletaskResNet</span><span class="p">(</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">).</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
    <span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span>
    <span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">optim</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="n">learning_rate</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span>

<span class="c1"># Train the model
</span><span class="k">def</span> <span class="nf">train_model</span><span class="p">(</span><span class="n">train_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">num_epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="n">train</span><span class="p">()</span>  <span class="c1"># training mode: BatchNorm uses batch statistics; the weights are updated by optimizer.step()
</span>    <span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_epochs</span><span class="p">):</span>
        <span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">train_loader</span><span class="p">:</span>
            <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">),</span> <span class="n">labels</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>  <span class="c1"># move inputs and labels to the same device
</span>            <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>

            <span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">outputs</span><span class="p">,</span> <span class="n">labels</span><span class="p">)</span>

            <span class="n">optimizer</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
            <span class="n">loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>
            <span class="n">optimizer</span><span class="p">.</span><span class="n">step</span><span class="p">()</span>
            <span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="p">.</span><span class="n">item</span><span class="p">()</span>
        <span class="n">avg_loss</span> <span class="o">=</span> <span class="n">total_loss</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">train_loader</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Epoch </span><span class="si">{</span><span class="n">epoch</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">num_epochs</span><span class="si">}</span><span class="s">, Average Loss: </span><span class="si">{</span><span class="n">avg_loss</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="c1"># Evaluate the model on the validation set
</span><span class="k">def</span> <span class="nf">evaluate_model</span><span class="p">(</span><span class="n">val_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="nb">eval</span><span class="p">()</span>  <span class="c1"># Switch the model to evaluation mode
</span>    <span class="n">total_correct</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">total_samples</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">with</span> <span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">val_loader</span><span class="p">:</span>
            <span class="n">inputs</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
            <span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
            <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
            <span class="n">predicted</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># Index of the highest score for each sample</span>
            <span class="n">total_samples</span> <span class="o">+=</span> <span class="n">labels</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
            <span class="n">total_correct</span> <span class="o">+=</span> <span class="p">(</span><span class="n">predicted</span> <span class="o">==</span> <span class="n">labels</span><span class="p">).</span><span class="nb">sum</span><span class="p">().</span><span class="n">item</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">total_correct</span> <span class="o">/</span> <span class="n">total_samples</span>

<span class="n">file_path</span> <span class="o">=</span> <span class="s">'./label.csv'</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'gbk'</span><span class="p">)</span>
<span class="n">audio_dir</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>

<span class="n">df</span><span class="p">[</span><span class="s">'audio'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'File_Name'</span><span class="p">].</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">audio_dir</span><span class="p">,</span> <span class="s">'CNPM_audio'</span><span class="p">,</span> <span class="n">x</span><span class="p">))</span>  <span class="c1"># x is a file name from the File_Name column; audio_dir + 'CNPM_audio' + x is the full path
</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'audio'</span><span class="p">,</span> <span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">,</span> <span class="s">'Type'</span><span class="p">]]</span> <span class="c1"># Keep the audio path and every label column
</span>
<span class="c1"># Transform pipeline
</span><span class="n">transform</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">Compose</span><span class="p">([</span><span class="n">transforms</span><span class="p">.</span><span class="n">ToTensor</span><span class="p">()])</span>

<span class="c1"># One classification task per label column; train_df and val_df are created further down
</span><span class="n">task_types</span> <span class="o">=</span> <span class="p">[</span><span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">,</span> <span class="s">'Type'</span><span class="p">]</span>
<span class="n">num_classes</span> <span class="o">=</span> <span class="p">{</span><span class="s">'System'</span><span class="p">:</span> <span class="mi">12</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">:</span> <span class="mi">12</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="s">'Type'</span><span class="p">:</span> <span class="mi">6</span><span class="p">}</span>

<span class="n">dataloaders</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">models</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">criterions</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">optimizers</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">accuracies</span> <span class="o">=</span> <span class="p">{}</span>

<span class="c1"># Single-task setup: every label gets its own model, so the preprocessed data is fed to a separate model per task
# Split df (audio paths plus labels) into a training set and a validation set
</span><span class="n">train_df</span><span class="p">,</span> <span class="n">val_df</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>  <span class="c1"># 80/20 split; the DataLoaders are built per task in the loop
</span>
<span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">task_types</span><span class="p">:</span>  <span class="c1"># Four iterations, one per entry in task_types
</span>    <span class="c1"># Build the dataset instances
</span>    <span class="n">train_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">train_df</span><span class="p">,</span> <span class="n">label_column</span><span class="o">=</span><span class="n">task</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="n">transform</span><span class="p">)</span>
    <span class="n">val_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">val_df</span><span class="p">,</span> <span class="n">label_column</span><span class="o">=</span><span class="n">task</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="n">transform</span><span class="p">)</span>
    <span class="c1"># Build the data loaders
</span>    <span class="n">train_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>
    <span class="n">val_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">val_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>
    <span class="n">dataloaders</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">train_loader</span><span class="p">,</span> <span class="n">val_loader</span><span class="p">)</span>

    <span class="c1"># Pick a device and initialize the model
</span>    <span class="n">device</span> <span class="o">=</span> <span class="s">"cuda"</span> <span class="k">if</span> <span class="n">torch</span><span class="p">.</span><span class="n">cuda</span><span class="p">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s">"mps"</span> <span class="k">if</span> <span class="n">torch</span><span class="p">.</span><span class="n">backends</span><span class="p">.</span><span class="n">mps</span><span class="p">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s">"cpu"</span>
    <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">initialize_model</span><span class="p">(</span><span class="n">num_classes</span><span class="p">[</span><span class="n">task</span><span class="p">],</span> <span class="n">device</span><span class="p">)</span>

    <span class="n">models</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">=</span> <span class="n">model</span>
    <span class="n">criterions</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">=</span> <span class="n">criterion</span>
    <span class="n">optimizers</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">=</span> <span class="n">optimizer</span>

    <span class="n">epochs</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="c1"># Train and evaluate the model
</span>    <span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">):</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Epoch </span><span class="si">{</span><span class="n">epoch</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="se">\n</span><span class="s">-------------------------------"</span><span class="p">)</span>
        <span class="c1"># Reuse the train/validation split made before the loop
</span>        <span class="n">train_model</span><span class="p">(</span><span class="n">train_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">num_epochs</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># One pass per call; the outer loop counts the epochs
</span>        <span class="n">accuracies</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">=</span> <span class="n">evaluate_model</span><span class="p">(</span><span class="n">val_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>  <span class="c1"># Already a fraction in [0, 1], so no further division is needed
</span>    <span class="k">if</span> <span class="n">task</span> <span class="o">==</span> <span class="s">"System"</span><span class="p">:</span>
        <span class="n">ACC1</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'System'</span><span class="p">]</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"ACC1:"</span><span class="p">,</span> <span class="n">ACC1</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">task</span> <span class="o">==</span> <span class="s">"Tonic"</span><span class="p">:</span>
        <span class="n">ACC2</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"ACC2:"</span><span class="p">,</span> <span class="n">ACC2</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">task</span> <span class="o">==</span> <span class="s">"Pattern"</span><span class="p">:</span>
        <span class="n">ACC3</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"ACC3:"</span><span class="p">,</span> <span class="n">ACC3</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">task</span> <span class="o">==</span> <span class="s">"Type"</span><span class="p">:</span>
        <span class="n">ACC5</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Type'</span><span class="p">]</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"ACC5:"</span><span class="p">,</span><span class="n">ACC5</span><span class="p">)</span>

<span class="n">ACC4</span> <span class="o">=</span> <span class="p">(</span><span class="n">accuracies</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span> <span class="o">+</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">])</span> <span class="o">/</span> <span class="mi">2</span>
<span class="k">print</span><span class="p">(</span><span class="s">"ACC4:"</span><span class="p">,</span><span class="n">ACC4</span><span class="p">)</span>
<span class="n">ACC6</span> <span class="o">=</span> <span class="p">(</span><span class="n">accuracies</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span> <span class="o">+</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span> <span class="o">+</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Type'</span><span class="p">])</span> <span class="o">/</span> <span class="mi">3</span>
<span class="k">print</span><span class="p">(</span><span class="s">"ACC6:"</span><span class="p">,</span><span class="n">ACC6</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC1(System Accuracy): </span><span class="si">{</span><span class="n">ACC1</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC2(Tonic Accuracy): </span><span class="si">{</span><span class="n">ACC2</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC3(Pattern Accuracy): </span><span class="si">{</span><span class="n">ACC3</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC4(Average Tonic and Pattern Accuracy): </span><span class="si">{</span><span class="n">ACC4</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC5(Type Accuracy): </span><span class="si">{</span><span class="n">ACC5</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC6(Average of Tonic, Pattern, and Type Accuracy): </span><span class="si">{</span><span class="n">ACC6</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Done!"</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="MRS" /><category term="MRS" /><summary type="html"><![CDATA[Single-task residual network (ResNet)]]></summary></entry><entry><title type="html">MRS multi-task residual network</title><link href="https://cendok.github.io/2026/03/16/MRS-ResNet%E5%A4%9A%E4%BB%BB%E5%8A%A1" rel="alternate" type="text/html" title="MRS multi-task residual network" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/16/MRS-ResNet%E5%A4%9A%E4%BB%BB%E5%8A%A1</id><content type="html" xml:base="https://cendok.github.io/2026/03/16/MRS-ResNet%E5%A4%9A%E4%BB%BB%E5%8A%A1"><![CDATA[<h2 id="resnet18多任务网络架构">ResNet18 multi-task network architecture</h2>

<p><img src="/images/posts/2026-3-16-MRS-ResNet多任务/ResNet-Multi-task.png" alt="image" /></p>

<ol>
  <li>
    <p><strong>Input layer</strong>:</p>

    <ul>
      <li>Takes the input image data, usually an image tensor that has already gone through some preprocessing.</li>
    </ul>
  </li>
  <li>
    <p><strong>Initial convolution layer (Conv)</strong>:</p>

    <ul>
      <li>Kernel size (k): 7x7</li>
      <li>Stride (s): 1</li>
      <li>Output channels (c): 64</li>
      <li>Purpose: extracts low-level features from the image.</li>
    </ul>
  </li>
  <li>
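    <p><strong>Max pooling layer (Max Pooling)</strong>:</p>

    <ul>
      <li>Pooling kernel size: 3x3</li>
      <li>Stride: 2</li>
      <li>Purpose: reduces the spatial dimensions of the features and makes them more robust to small changes in the input.</li>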
    </ul>
  </li>
  <li>
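    <p><strong>Residual blocks</strong>:</p>

    <ul>
      <li>Each block consists of two 3x3 convolution layers with stride (s) 1 or 2.</li>
      <li>Each convolution layer is followed by batch normalization and a ReLU activation.</li>
      <li>Output channels (c): depends on the stage the block belongs to.</li>
      <li>Repetitions: ResNet18 stacks 2, 2, 2, 2 blocks across its four stages.</li>
      <li>Residual connection: <strong>the output of each block is added to its input</strong>, and the sum is passed through ReLU.</li>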
    </ul>
  </li>
  <li>
    <p><strong>Global average pooling layer (AdaptiveAvgPool)</strong>:</p>

    <ul>
      <li>Shrinks each feature map to 1x1 in preparation for the fully connected layers.</li>
    </ul>
  </li>
  <li>
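    <p><strong>Multi-task branches</strong>:</p>

    <ul>
      <li>
        <p>Each task has its own fully connected layers and classifier.</p>
      </li>
      <li>
        <p>Branch 1:</p>

        <ul>
          <li>Fully connected layer (Linear): takes as many input features as the last ResNet18 layer outputs and produces 128 output features.</li>
          <li>Activation (ReLU): adds non-linearity.</li>
          <li>Second fully connected layer (Linear): outputs one feature per class of task 1. <strong>(12 classes: 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B')</strong></li>
          <li>Classifier (Softmax): turns the outputs into a probability distribution.</li>
        </ul>
      </li>
      <li>
        <p>Branch 2:</p>

        <ul>
          <li>Fully connected layer (Linear): same as above.</li>
          <li>Activation (ReLU): same as above.</li>
          <li>Second fully connected layer (Linear): outputs one feature per class of task 2. <strong>(12 classes: 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B')</strong></li>
          <li>Classifier (Softmax): same as above.</li>
        </ul>
      </li>
      <li>
        <p>Branch 3:</p>

        <ul>
          <li>Fully connected layer (Linear): same as above.</li>
          <li>Activation (ReLU): same as above.</li>
          <li>Second fully connected layer (Linear): outputs one feature per class of task 3. <strong>(5 classes: gong, shang, jue, zhi, yu, written 宫商角徵羽)</strong></li>
          <li>Classifier (Softmax): same as above.</li>
        </ul>
      </li>
    </ul>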
  </li>
</ol>
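
<p>To make the layer list above concrete, here is a minimal sketch that traces how the feature-map size changes through the trunk. It is only an illustration under stated assumptions: the helper names <code>conv_out</code> and <code>trace_resnet18_shapes</code> are made up for this example, the padding values (3 for the 7x7 convolution, 1 for every 3x3 layer) are assumed because the list does not give them, and stride 2 is assumed to sit on the first block of stages 2 to 4 as in the usual ResNet18 layout. The 168 x 100 input matches the CQT settings used in this post (168 bins, 100 time frames per segment).</p>

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18_shapes(height, width):
    """Return (name, channels, height, width) after each stage of the trunk."""
    shapes = []
    # Initial 7x7 convolution with stride 1, as listed above; padding 3 is an assumption.
    height, width = conv_out(height, 7, 1, 3), conv_out(width, 7, 1, 3)
    shapes.append(("conv1", 64, height, width))
    # 3x3 max pooling with stride 2; padding 1 is an assumption.
    height, width = conv_out(height, 3, 2, 1), conv_out(width, 3, 2, 1)
    shapes.append(("maxpool", 64, height, width))
    # Four stages of two residual blocks each. Only the first convolution of a
    # stage changes the spatial size, so one conv_out call per stage is enough.
    stages = [(64, 1), (128, 2), (256, 2), (512, 2)]
    for index, (channels, stride) in enumerate(stages, start=1):
        height, width = conv_out(height, 3, stride, 1), conv_out(width, 3, stride, 1)
        shapes.append((f"stage{index}", channels, height, width))
    # Global average pooling collapses the spatial axes to 1x1.
    shapes.append(("avgpool", 512, 1, 1))
    return shapes


# 168 CQT bins by 100 time frames per segment.
for name, channels, h, w in trace_resnet18_shapes(168, 100):
    print(f"{name:8s} {channels:4d} x {h:3d} x {w:3d}")
```

<p>Under these assumptions the trunk ends at 512 x 11 x 7 before pooling and 512 x 1 x 1 after it, so the first Linear layer of each branch would take 512 input features and map them to 128.</p>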

<h3 id="实现">Implementation</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">librosa</span>
<span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="n">F</span>
<span class="kn">from</span> <span class="nn">torch.utils.data</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DataLoader</span>
<span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">transforms</span>
<span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="kn">from</span> <span class="nn">torch.nn.utils.rnn</span> <span class="kn">import</span> <span class="n">pad_sequence</span>

<span class="k">def</span> <span class="nf">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">resample_rate</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">segment_duration</span><span class="o">=</span><span class="mi">20</span><span class="p">):</span>
    <span class="n">y</span><span class="p">,</span> <span class="n">sr</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
    <span class="n">hop_length</span> <span class="o">=</span> <span class="mi">512</span>  <span class="c1"># librosa.cqt's default hop, made explicit because the frame rate depends on it
</span>    <span class="n">cqt</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">cqt</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">sr</span><span class="o">=</span><span class="n">sr</span><span class="p">,</span> <span class="n">hop_length</span><span class="o">=</span><span class="n">hop_length</span><span class="p">,</span> <span class="n">fmin</span><span class="o">=</span><span class="n">librosa</span><span class="p">.</span><span class="n">note_to_hz</span><span class="p">(</span><span class="s">'C1'</span><span class="p">),</span> <span class="n">n_bins</span><span class="o">=</span><span class="mi">168</span><span class="p">,</span> <span class="n">bins_per_octave</span><span class="o">=</span><span class="mi">24</span><span class="p">)</span>
    <span class="n">cqt_amplitude</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">cqt</span><span class="p">)</span>  <span class="c1"># Magnitude of the complex CQT
</span>    <span class="c1"># CQT frames arrive at sr / hop_length per second, so resample from that rate down to resample_rate frames per second
</span>    <span class="n">cqt_resampled</span> <span class="o">=</span> <span class="n">librosa</span><span class="p">.</span><span class="n">resample</span><span class="p">(</span><span class="n">cqt_amplitude</span><span class="p">,</span> <span class="n">orig_sr</span><span class="o">=</span><span class="n">sr</span> <span class="o">/</span> <span class="n">hop_length</span><span class="p">,</span> <span class="n">target_sr</span><span class="o">=</span><span class="n">resample_rate</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

    <span class="c1"># Number of time frames in one segment
</span>    <span class="n">samples_per_segment</span> <span class="o">=</span> <span class="n">resample_rate</span> <span class="o">*</span> <span class="n">segment_duration</span>  <span class="c1"># 5 * 20 = 100 time frames
</span>    <span class="n">total_segments</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">/</span> <span class="n">samples_per_segment</span><span class="p">))</span>

    <span class="n">segments</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">total_segments</span><span class="p">):</span>  <span class="c1"># Zero-pad the last segment to full length when it runs past the end of the CQT
</span>        <span class="n">start</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">samples_per_segment</span>
        <span class="n">end</span> <span class="o">=</span> <span class="n">start</span> <span class="o">+</span> <span class="n">samples_per_segment</span>
        <span class="k">if</span> <span class="n">end</span> <span class="o">&gt;</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
            <span class="n">padding_length</span> <span class="o">=</span> <span class="n">end</span> <span class="o">-</span> <span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">padding</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">padding_length</span><span class="p">))</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">cqt_resampled</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]],</span> <span class="n">padding</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">segment</span> <span class="o">=</span> <span class="n">cqt_resampled</span><span class="p">[:,</span> <span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">]</span>
        <span class="n">segments</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">segment</span><span class="p">)</span>
    <span class="n">segments</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">segments</span><span class="p">)</span>  <span class="c1"># Stack into one array of shape (segments, bins, frames)
</span>    <span class="k">print</span><span class="p">(</span><span class="s">"segments shape:"</span><span class="p">,</span> <span class="n">segments</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">segments</span>

<span class="k">class</span> <span class="nc">AudioDataset</span><span class="p">(</span><span class="n">Dataset</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">transform</span> <span class="o">=</span> <span class="n">transform</span>

    <span class="k">def</span> <span class="nf">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">idx</span><span class="p">):</span>
        <span class="n">audio_path</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'audio'</span><span class="p">]</span>
        <span class="n">spectrograms</span> <span class="o">=</span> <span class="n">generate_cqt_spectrogram</span><span class="p">(</span><span class="n">audio_path</span><span class="p">)</span>

        <span class="c1"># Pick one segment for demonstration; in practice you would choose by some rule or use every segment
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">spectrograms</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>  <span class="c1"># Only the first segment is used in this example
</span>        <span class="n">eps</span> <span class="o">=</span> <span class="mf">1e-10</span>  <span class="c1"># Avoid division by zero
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">spectrogram</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">)</span> <span class="o">+</span> <span class="n">eps</span><span class="p">)</span>  <span class="c1"># Scale to [0, 1]
</span>        <span class="n">spectrogram</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># Add a leading channel axis</span>

        <span class="n">labels</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s">'System'</span><span class="p">:</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'System'</span><span class="p">]),</span>
            <span class="s">'Tonic'</span><span class="p">:</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'Tonic'</span><span class="p">]),</span>
            <span class="s">'Pattern'</span><span class="p">:</span> <span class="n">torch</span><span class="p">.</span><span class="n">tensor</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">df</span><span class="p">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">idx</span><span class="p">][</span><span class="s">'Pattern'</span><span class="p">]),</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="n">torch</span><span class="p">.</span><span class="n">from_numpy</span><span class="p">(</span><span class="n">spectrogram</span><span class="p">).</span><span class="nb">float</span><span class="p">(),</span> <span class="n">labels</span>

<span class="k">def</span> <span class="nf">custom_collate_fn</span><span class="p">(</span><span class="n">batch</span><span class="p">):</span>  <span class="c1"># Collate individual samples into one batch
</span>    <span class="n">spectrograms</span><span class="p">,</span> <span class="n">labels_batch</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">batch</span><span class="p">)</span>
    <span class="n">spectrograms_padded</span> <span class="o">=</span> <span class="n">pad_sequence</span><span class="p">(</span><span class="n">spectrograms</span><span class="p">,</span> <span class="n">batch_first</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">padding_value</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">labels</span> <span class="o">=</span> <span class="p">{</span><span class="n">task</span><span class="p">:</span> <span class="n">torch</span><span class="p">.</span><span class="n">stack</span><span class="p">([</span><span class="n">label</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">labels_batch</span><span class="p">])</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">labels_batch</span><span class="p">[</span><span class="mi">0</span><span class="p">]}</span>
    <span class="k">return</span> <span class="n">spectrograms_padded</span><span class="p">,</span> <span class="n">labels</span>

<span class="k">class</span> <span class="nc">BasicBlock</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>  <span class="c1"># Residual block
</span>    <span class="n">expansion</span> <span class="o">=</span> <span class="mi">1</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>  <span class="c1"># Convolution layer
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="n">planes</span><span class="p">)</span>  <span class="c1"># Batch normalization
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>  <span class="c1"># Convolution layer
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">bn2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="n">planes</span><span class="p">)</span>  <span class="c1"># Batch normalization
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">()</span>

        <span class="c1"># The shortcut defaults to an identity mapping; it is replaced below when the shapes differ
</span>        <span class="k">if</span> <span class="n">stride</span> <span class="o">!=</span> <span class="mi">1</span> <span class="ow">or</span> <span class="n">in_planes</span> <span class="o">!=</span> <span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">:</span>
            <span class="c1"># The shortcut path needs its channel count or stride adjusted
</span>            <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_planes</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">),</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">expansion</span> <span class="o">*</span> <span class="n">planes</span><span class="p">)</span>
            <span class="p">)</span>
            <span class="c1"># The 1x1 convolution adjusts the channel count and matches the stride of the main path.
</span>
    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>  <span class="c1"># First convolution, batch norm, then ReLU
</span>        <span class="n">out</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">bn2</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv2</span><span class="p">(</span><span class="n">out</span><span class="p">))</span>  <span class="c1"># Second convolution and batch norm; ReLU is applied after the residual add
</span>        <span class="n">out</span> <span class="o">+=</span> <span class="bp">self</span><span class="p">.</span><span class="n">shortcut</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">out</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">out</span>

<span class="k">class</span> <span class="nc">MultiTaskResNet</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">num_classes_dict</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">MultiTaskResNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span> <span class="o">=</span> <span class="mi">64</span>  <span class="c1"># The input layer below takes 1 channel and uses stride 1, so it does no downsampling
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="n">in_channels</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">out_channels</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">64</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

        <span class="c1"># num_blocks = [2, 2, 2, 2]: two residual blocks per layer, four layers, eight residual blocks in total.
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">AdaptiveAvgPool2d</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>

        <span class="c1"># One fully connected head per task (the Linear, ReLU, Linear, SoftMax part on the left of the multi-task diagram)
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">system_fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">num_classes_dict</span><span class="p">[</span><span class="s">'System'</span><span class="p">])</span>  <span class="c1"># num_classes_dict['System'] = 12
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">tonic_fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">num_classes_dict</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">])</span>  <span class="c1"># num_classes_dict['Tonic'] = 12
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">pattern_fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">num_classes_dict</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">])</span>  <span class="c1"># num_classes_dict['Pattern'] = 5
</span>
    <span class="k">def</span> <span class="nf">_make_layer</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">stride</span><span class="p">):</span>
        <span class="n">strides</span> <span class="o">=</span> <span class="p">[</span><span class="n">stride</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">num_blocks</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
        <span class="n">layers</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">stride</span> <span class="ow">in</span> <span class="n">strides</span><span class="p">:</span>
            <span class="n">layers</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">block</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span><span class="p">,</span> <span class="n">planes</span><span class="p">,</span> <span class="n">stride</span><span class="p">))</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">in_planes</span> <span class="o">=</span> <span class="n">planes</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span>
        <span class="k">return</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span><span class="o">*</span><span class="n">layers</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">flatten</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

        <span class="c1"># Task-specific predictions
</span>        <span class="n">system_pred</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">system_fc</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">tonic_pred</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">tonic_fc</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">pattern_pred</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">pattern_fc</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="k">return</span> <span class="p">{</span><span class="s">'System'</span><span class="p">:</span> <span class="n">system_pred</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">:</span> <span class="n">tonic_pred</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">:</span> <span class="n">pattern_pred</span><span class="p">}</span>

<span class="k">def</span> <span class="nf">initialize_model</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.001</span><span class="p">):</span>
    <span class="c1"># The label values do not start at 0, so shift them by subtracting the minimum
</span>    <span class="n">df</span><span class="p">[</span><span class="s">'System'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'System'</span><span class="p">]</span> <span class="o">-</span> <span class="n">df</span><span class="p">[</span><span class="s">'System'</span><span class="p">].</span><span class="nb">min</span><span class="p">()</span>
    <span class="n">df</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span> <span class="o">-</span> <span class="n">df</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">].</span><span class="nb">min</span><span class="p">()</span>
    <span class="n">df</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span> <span class="o">-</span> <span class="n">df</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">].</span><span class="nb">min</span><span class="p">()</span>

    <span class="c1"># Dictionary defined for multi-task learning, so each task can look up its own number of classes
</span>    <span class="n">num_classes_dict</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'System'</span><span class="p">:</span> <span class="mi">12</span><span class="p">,</span>
        <span class="s">'Tonic'</span><span class="p">:</span> <span class="mi">12</span><span class="p">,</span>
        <span class="s">'Pattern'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
    <span class="p">}</span>

    <span class="n">num_blocks</span> <span class="o">=</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">]</span>
    <span class="c1"># num_blocks = [2, 2, 2, 2]: two residual blocks per layer, four layers, eight residual blocks in total.
</span>    <span class="n">model</span> <span class="o">=</span> <span class="n">MultiTaskResNet</span><span class="p">(</span><span class="n">BasicBlock</span><span class="p">,</span> <span class="n">num_blocks</span><span class="p">,</span> <span class="n">num_classes_dict</span><span class="p">).</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
    <span class="n">criterion</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">CrossEntropyLoss</span><span class="p">()</span>
    <span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">optim</span><span class="p">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="o">=</span><span class="n">learning_rate</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span>

<span class="c1"># The training function computes a loss for each task and sums the losses to update the model
</span><span class="k">def</span> <span class="nf">train_model</span><span class="p">(</span><span class="n">train_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="n">train</span><span class="p">()</span>
    <span class="n">total_loss</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">train_loader</span><span class="p">:</span>
        <span class="n">inputs</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
        <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>

        <span class="n">loss</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">criterion</span><span class="p">(</span><span class="n">outputs</span><span class="p">[</span><span class="n">task</span><span class="p">],</span> <span class="n">labels</span><span class="p">[</span><span class="n">task</span><span class="p">].</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">))</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">labels</span><span class="p">)</span>

        <span class="c1"># The loss above is the sum of the per-task cross-entropy losses
</span>        <span class="n">optimizer</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
        <span class="n">loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>
        <span class="n">optimizer</span><span class="p">.</span><span class="n">step</span><span class="p">()</span>
        <span class="n">total_loss</span> <span class="o">+=</span> <span class="n">loss</span><span class="p">.</span><span class="n">item</span><span class="p">()</span>

    <span class="n">avg_loss</span> <span class="o">=</span> <span class="n">total_loss</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">train_loader</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Average Loss: </span><span class="si">{</span><span class="n">avg_loss</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">evaluate_model</span><span class="p">(</span><span class="n">val_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">device</span><span class="p">):</span>
    <span class="n">model</span><span class="p">.</span><span class="nb">eval</span><span class="p">()</span>
    <span class="n">correct</span> <span class="o">=</span> <span class="p">{</span><span class="n">task</span><span class="p">:</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="p">[</span><span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]}</span>
    <span class="n">total</span> <span class="o">=</span> <span class="p">{</span><span class="n">task</span><span class="p">:</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="p">[</span><span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]}</span>
    <span class="k">with</span> <span class="n">torch</span><span class="p">.</span><span class="n">no_grad</span><span class="p">():</span>
        <span class="k">for</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">labels</span> <span class="ow">in</span> <span class="n">val_loader</span><span class="p">:</span>
            <span class="n">inputs</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
            <span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">task</span><span class="p">,</span> <span class="n">preds</span> <span class="ow">in</span> <span class="n">outputs</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
                <span class="n">_</span><span class="p">,</span> <span class="n">predicted</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">preds</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
                <span class="n">correct</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">+=</span> <span class="p">(</span><span class="n">predicted</span> <span class="o">==</span> <span class="n">labels</span><span class="p">[</span><span class="n">task</span><span class="p">].</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)).</span><span class="nb">sum</span><span class="p">().</span><span class="n">item</span><span class="p">()</span>
                <span class="n">total</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">+=</span> <span class="n">labels</span><span class="p">[</span><span class="n">task</span><span class="p">].</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

    <span class="n">accuracies</span> <span class="o">=</span> <span class="p">{</span><span class="n">task</span><span class="p">:</span> <span class="n">correct</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="o">/</span> <span class="n">total</span><span class="p">[</span><span class="n">task</span><span class="p">]</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">total</span><span class="p">}</span>
    <span class="k">return</span> <span class="n">accuracies</span>

<span class="c1"># Data processing starts here
</span><span class="n">file_path</span> <span class="o">=</span> <span class="s">'./label.csv'</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'gbk'</span><span class="p">)</span>
<span class="n">audio_dir</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">file_path</span><span class="p">)</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'File_Name'</span><span class="p">,</span> <span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]]</span>
<span class="n">df</span><span class="p">[</span><span class="s">'audio'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">'File_Name'</span><span class="p">].</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">audio_dir</span><span class="p">,</span> <span class="s">'CNPM_audio'</span><span class="p">,</span> <span class="n">x</span><span class="p">))</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'audio'</span><span class="p">,</span> <span class="s">'System'</span><span class="p">,</span> <span class="s">'Tonic'</span><span class="p">,</span> <span class="s">'Pattern'</span><span class="p">]]</span>

<span class="c1"># Transform pipeline
</span><span class="n">transform</span> <span class="o">=</span> <span class="n">transforms</span><span class="p">.</span><span class="n">Compose</span><span class="p">([</span><span class="n">transforms</span><span class="p">.</span><span class="n">ToTensor</span><span class="p">()])</span>

<span class="c1"># Split into training and validation sets; each dataset loads the .wav files from their paths
</span><span class="n">train_df</span><span class="p">,</span> <span class="n">val_df</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
<span class="n">train_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">train_df</span><span class="p">)</span>
<span class="n">val_dataset</span> <span class="o">=</span> <span class="n">AudioDataset</span><span class="p">(</span><span class="n">val_df</span><span class="p">)</span>

<span class="n">train_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>
<span class="n">val_loader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">val_dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">collate_fn</span><span class="o">=</span><span class="n">custom_collate_fn</span><span class="p">)</span>

<span class="c1"># Select the device
</span><span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">device</span><span class="p">(</span><span class="s">"cuda"</span> <span class="k">if</span> <span class="n">torch</span><span class="p">.</span><span class="n">cuda</span><span class="p">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s">"cpu"</span><span class="p">)</span>
<span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span> <span class="o">=</span> <span class="n">initialize_model</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">device</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.001</span><span class="p">)</span>

<span class="c1"># Train and evaluate the model
</span><span class="n">epochs</span> <span class="o">=</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">epochs</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Epoch </span><span class="si">{</span><span class="n">t</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="se">\n</span><span class="s">-------------------------------"</span><span class="p">)</span>
    <span class="n">train_model</span><span class="p">(</span><span class="n">train_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">criterion</span><span class="p">,</span> <span class="n">optimizer</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>
    <span class="n">accuracies</span> <span class="o">=</span> <span class="n">evaluate_model</span><span class="p">(</span><span class="n">val_loader</span><span class="p">,</span> <span class="n">model</span><span class="p">,</span> <span class="n">device</span><span class="p">)</span>

    <span class="c1"># Report the accuracy of each task
</span>    <span class="n">ACC1</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'System'</span><span class="p">]</span>
    <span class="n">ACC2</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span>
    <span class="n">ACC3</span> <span class="o">=</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">]</span>
    <span class="n">ACC4</span> <span class="o">=</span> <span class="p">(</span><span class="n">accuracies</span><span class="p">[</span><span class="s">'Tonic'</span><span class="p">]</span> <span class="o">+</span> <span class="n">accuracies</span><span class="p">[</span><span class="s">'Pattern'</span><span class="p">])</span> <span class="o">/</span> <span class="mi">2</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC1(System Accuracy): </span><span class="si">{</span><span class="n">ACC1</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC2(Tonic Accuracy): </span><span class="si">{</span><span class="n">ACC2</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC3(Pattern Accuracy): </span><span class="si">{</span><span class="n">ACC3</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"ACC4(Average Tonic and Pattern Accuracy): </span><span class="si">{</span><span class="n">ACC4</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Done!"</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="MRS" /><category term="MRS" /><summary type="html"><![CDATA[多任务残差网络ResNet]]></summary></entry><entry><title type="html">MRS数据集及音乐理论</title><link href="https://cendok.github.io/2026/03/16/MRS%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8F%8A%E9%9F%B3%E4%B9%90%E7%90%86%E8%AE%BA" rel="alternate" type="text/html" title="MRS数据集及音乐理论" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/16/MRS%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8F%8A%E9%9F%B3%E4%B9%90%E7%90%86%E8%AE%BA</id><content type="html" xml:base="https://cendok.github.io/2026/03/16/MRS%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8F%8A%E9%9F%B3%E4%B9%90%E7%90%86%E8%AE%BA"><![CDATA[<h1 id="music-recommendation-system">Music-Recommendation-System</h1>

<h2 id="文献下载">Reference paper</h2>

<p><a href="https://archives.ismir.net/ismir2022/paper/000041.pdf">AUTOMATIC CHINESE NATIONAL PENTATONIC MODES
RECOGNITION USING CONVOLUTIONAL NEURAL NETWORK</a></p>

<h2 id="cnpm-database数据集">CNPM Database</h2>

<p>A database of Chinese national pentatonic modes for computational musicology.</p>

<p>Dataset: CNPM (Chinese National Pentatonic Modes) Dataset.</p>

<h3 id="hugging-face下载">Download from Hugging Face</h3>

<p>https://huggingface.co/datasets/ccmusic-database/CNPM</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git clone https://huggingface.co/datasets/ccmusic-database/CNPM
</code></pre></div></div>

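<p><strong>Audio</strong></p>

<p>The audio clips are trimmed and cover only the Jue (角) and Shang (商) modes; each clip is roughly 15-30 s long.</p>

<p>There are 26 Shang pieces and 43 Jue pieces.</p>

<p><strong>Dataset metadata structure</strong></p>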

<table>
  <thead>
    <tr>
      <th>曲名/Title</th>
      <th>演奏者/Artist</th>
      <th>专辑/Album</th>
      <th>调式全称/Mode Name</th>
      <th>文件名/File Name</th>
      <th>同宫系统/System</th>
      <th>主音音名/Tonic</th>
      <th>样式/Pattern</th>
      <th>种类/Type</th>
      <th>时长/Length</th>
      <th>备注/Note</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1级-小鸟朝凤</td>
      <td>儿童歌曲</td>
      <td> </td>
      <td>D宫清乐七声</td>
      <td>1级-小鸟朝凤 - 儿童歌曲.mp3</td>
      <td>2</td>
      <td>2</td>
      <td>0</td>
      <td>4</td>
      <td>0:01:53</td>
      <td> </td>
    </tr>
    <tr>
      <td>暗香</td>
      <td>纯音乐</td>
      <td>月满西楼</td>
      <td>G宫七声清乐</td>
      <td>暗香 - 纯音乐.mp3</td>
      <td>7</td>
      <td>7</td>
      <td>0</td>
      <td>4</td>
      <td>0:03:35</td>
      <td> </td>
    </tr>
  </tbody>
</table>

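<p>File naming format: jue1.wav, shang1.wav</p>

<p>All analysis is done on .wav files. The downloaded MP3 files are compressed, and files taken from streaming services may also be encrypted, so they are converted to .wav first, much like converting a .docx file to PDF.</p>

<p>The TongGong system to which a mode belongs is called the <strong>System</strong>, the pitch of the tonic is called the <strong>Tonic</strong>, the mode pattern is called the <strong>Pattern</strong>, and the mode type is called the <strong>Type</strong>. The main classification tasks are recognizing <strong>Pattern</strong> and <strong>Tonic</strong>, with <strong>System</strong> as an auxiliary task and <strong>Type</strong> classification as a secondary task. Given the tonic t and the system s, the pattern can be inferred: when t equals s, it is the Gong (宫) mode; when t is 2 semitones above s, it is the Shang (商) mode; 4 semitones above gives the Jue (角) mode, 7 semitones the Zhi (徵) mode, and 9 semitones the Yu (羽) mode.</p>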

<h3 id="完整版ccmusic-dataset申请">Requesting the full CCMUSIC dataset</h3>

<p>The dataset contains 287 recordings.</p>

<p><a href="https://github.com/ccmusic-database/ccmusic-database.github.io">ccmusic-database</a></p>

<p><a href="https://ccmusic-database.github.io/">CCMUSIC DATASET</a></p>

<p><a href="https://ccmusic-database.github.io/en/database/ccm.html">Multi-functional Music Database for MIR Research</a></p>

<p><a href="https://ccmusic-database.github.io/en/download.html">CCMUSIC DATASET</a></p>

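<p>Request the dataset by email; it is sent back as a compressed archive.</p>

<h3 id="扩充数据集">Extending the dataset</h3>

<p>The complete official dataset has 218 pieces; it was extended to 300.</p>

<p>The additional pieces were downloaded manually, trimmed to under 60 s, and converted to .wav.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MP3 file storage path: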

E:<span class="se">\1</span>五音项目整体<span class="se">\1</span>_五音乐曲汇总<span class="se">\M</span>P3<span class="se">\2</span>0230829有效五音乐曲116首
</code></pre></div></div>

<p>Once downloaded, all pieces are converted to .wav.</p>
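<p>The trimming and conversion step could be scripted roughly as follows. This is only a sketch: it assumes the <code class="language-plaintext highlighter-rouge">ffmpeg</code> command-line tool is installed, which the post does not state, and the helper names and file names are hypothetical.</p>

```python
import subprocess


def build_convert_command(src_path, dst_path, max_seconds=60):
    """Build an ffmpeg command that trims the input and writes a .wav file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src_path,          # input file (for example an MP3)
        "-t", str(max_seconds),  # keep at most this many seconds
        dst_path,                # output format is inferred from the .wav extension
    ]


def convert(src_path, dst_path, max_seconds=60):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_convert_command(src_path, dst_path, max_seconds), check=True)


# Hypothetical file names, only to show the resulting command line
command = build_convert_command("song.mp3", "song.wav")
print(" ".join(command))
```

<p>Calling <code class="language-plaintext highlighter-rouge">convert</code> for every file in the MP3 folder would produce the .wav set; encrypted downloads still need to be unlocked first.</p>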

<p><a href="https://mp.weixin.qq.com/s?__biz=Mzg3ODg2MzYyMQ==&amp;mid=2247484494&amp;idx=1&amp;sn=fcecc84e1e1234e64f2464c0b01cc8ff&amp;chksm=cf0c7024f87bf9321e70a4814b6bca4e1197d5c9d6e083bf13ec7df5272b3249daf1eb8959fa&amp;scene=178&amp;cur_album_id=2663172932113940480#rd">QQ音乐格式转换，解锁加密音乐，转mp3格式，酷狗、网易云也能用！</a></p>

<p><a href="https://pan.baidu.com/s/16Hni3qSj-y99eeRqfmU00w?pwd=fy4h#list/path=%2Fsharelink4139560028-1091669494308773%2F音乐解锁工具v1.10.3&amp;parentPath=%2Fsharelink4139560028-1091669494308773">音乐解锁工具v1.10.3_免费高速下载</a></p>

<p><strong>Download the extended 300-piece dataset</strong>
Link: <a href="https://pan.baidu.com/s/1oU8oHhHMHlrg9VrA6xLs4Q?pwd=v35s">https://pan.baidu.com/s/1oU8oHhHMHlrg9VrA6xLs4Q?pwd=v35s</a> (extraction code: v35s)</p>

<h2 id="理论介绍">Theory overview</h2>

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/Classification-basis.png" alt="image" /></p>

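<p>There appear to be four label categories, but they fall into two groups: System and Tonic form one, Pattern and Type the other. Pattern is obtained by comparing System and Tonic, and Type is obtained from Pattern plus the added auxiliary tones (偏音). So it is mainly System and Tonic that need to be recognized.</p>

<p>The main goal is to recognize Pattern and Tonic; System recognition is an auxiliary task, followed by Type classification.</p>

<p>There are 12 Systems, 12 Tonics, 5 Patterns, and 6 Types.</p>

<p><strong>The TongGong system a mode belongs to is called the System</strong>, <strong>the pitch of the tonic is called the Tonic</strong>, <strong>the mode pattern is called the Pattern</strong>, and <strong>the mode type is called the Type</strong>.</p>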

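<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>System: TongGong system (同宫系统)

Tonic: G A B D E

Pattern: Gong, Shang, Jue, Zhi, Yu (宫商角徵羽)

Type: pentatonic, hexatonic (五声、六声)

The scale degrees 1 2 3 5 6 in the labels correspond to G A B D E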
</code></pre></div></div>

<h3 id="system">System</h3>

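<p><strong>Feed in the whole piece and pick the most correlated of the 12 templates</strong></p>

<p>The templates are obtained by shifting a single base template.</p>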

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/System.png" alt="image" /></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>同宫系统/TongGong System：
0--C 
1--#C/bD 
2--D 
3--#D/bE
4--E
5--F
6--#F/bG
7--G
8--#G/bA
9--A
10--#A/bB
11--B
</code></pre></div></div>

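<p>Analyze the scale pattern of the piece to be recognized and compare it with each of the 12 templates: compute the Pearson correlation coefficient between the piece's chroma vector and every template, and take the template with the largest coefficient as the piece's TongGong system (System).</p>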

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>From the reference paper: the template of the C TongGong system is 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0; the others are obtained by cyclically shifting this template.

Each array is a scale template over the twelve pitch classes (C to B): 1 means the pitch class occurs in the scale and 0 means it does not. The templates are generated by cyclically shifting the base template (here, the C TongGong system) along the twelve pitch classes.

The arrays correspond to the pitch classes as follows:
C - <span class="o">[</span>1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]: the base template of the C TongGong system.
C# - <span class="o">[</span>0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]: the C template shifted right by 1 position.
D - <span class="o">[</span>0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1]: the C template shifted right by 2 positions.
D# - <span class="o">[</span>1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0]: the C template shifted right by 3 positions.
E - <span class="o">[</span>0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1]: the C template shifted right by 4 positions.
F - <span class="o">[</span>1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0]: the C template shifted right by 5 positions.
F# - <span class="o">[</span>0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0]: the C template shifted right by 6 positions.
G - <span class="o">[</span>0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]: the C template shifted right by 7 positions.
G# - <span class="o">[</span>1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]: the C template shifted right by 8 positions.
A - <span class="o">[</span>0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]: the C template shifted right by 9 positions.
A# - <span class="o">[</span>1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0]: the C template shifted right by 10 positions.
B - <span class="o">[</span>0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]: the C template shifted right by 11 positions.
Each shift by one position corresponds to moving up one semitone, so this method yields the template for any pitch class.
</code></pre></div></div>
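<p>The template shifting and Pearson matching described above can be sketched in plain Python. This is a minimal illustration rather than the paper's implementation: the function names are hypothetical, and <code class="language-plaintext highlighter-rouge">example_chroma</code> is a made-up 12-bin chroma vector, not real data.</p>

```python
from math import sqrt

PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
BASE_TEMPLATE = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # C TongGong system


def shift_right(template, k):
    """Rotate the template k positions to the right (k semitones up)."""
    k %= len(template)
    return template[-k:] + template[:-k] if k else list(template)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# One template per System label, 0 (C) to 11 (B)
TEMPLATES = [shift_right(BASE_TEMPLATE, k) for k in range(12)]


def detect_system(chroma_vector):
    """Return the index (0-11) of the template most correlated with the vector."""
    scores = [pearson(chroma_vector, t) for t in TEMPLATES]
    return max(range(12), key=lambda k: scores[k])


# Made-up chroma vector with most energy on D, E, G, A and B
example_chroma = [0.1, 0.0, 0.9, 0.1, 0.8, 0.0, 0.1, 1.0, 0.0, 0.7, 0.1, 0.6]
system_index = detect_system(example_chroma)
print(system_index, PITCH_NAMES[system_index])
```

<p>Because all twelve templates contain the same number of ones, the highest correlation goes to the template whose five tones collect the most chroma energy.</p>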

<h3 id="tonic">Tonic</h3>

<p><strong>Sum the chroma features of the last 500 frames per pitch class; the pitch name with the largest sum is the tonic</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tonic：
0--C 
1--#C/bD 
2--D 
3--#D/bE
4--E
5--F
6--#F/bG
7--G
8--#G/bA
9--A
10--#A/bB
11--B
</code></pre></div></div>

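<p>As for the tonic pitch, most of the music we analyze returns to the tonic at the end, so a simple method is used: sum the chroma features of the last 500 frames per pitch class and take the pitch name with the largest sum as the tonic.</p>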

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Pitch-class names, indexed 0-11 (chroma is assumed to have shape (12, n_frames))
</span><span class="n">pitch_names</span> <span class="o">=</span> <span class="p">[</span><span class="s">'C'</span><span class="p">,</span> <span class="s">'C#'</span><span class="p">,</span> <span class="s">'D'</span><span class="p">,</span> <span class="s">'D#'</span><span class="p">,</span> <span class="s">'E'</span><span class="p">,</span> <span class="s">'F'</span><span class="p">,</span> <span class="s">'F#'</span><span class="p">,</span> <span class="s">'G'</span><span class="p">,</span> <span class="s">'G#'</span><span class="p">,</span> <span class="s">'A'</span><span class="p">,</span> <span class="s">'A#'</span><span class="p">,</span> <span class="s">'B'</span><span class="p">]</span>
<span class="c1"># Sum the last 500 frames per pitch class; the largest sum marks the tonic
</span><span class="n">tonic_index</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">chroma</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">500</span><span class="p">:],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span>
<span class="n">tonic</span> <span class="o">=</span> <span class="n">pitch_names</span><span class="p">[</span><span class="n">tonic_index</span><span class="p">]</span>
</code></pre></div></div>

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/Tonic.png" alt="image" /></p>

<h3 id="type">Type</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0--五声/Pentatonic
1--六声（清角）/Hexatonic <span class="o">(</span>Qingjue<span class="o">)</span> 
2--六声（变宫）/Hexatonic <span class="o">(</span>Biangong<span class="o">)</span> 
3--七声雅乐/Heptatonic Yayue
4--七声清乐/Heptatonic Qingyue
5--七声燕乐/Heptatonic Yanyue
</code></pre></div></div>

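<p>The templates can be found in the paper. For mode type recognition, a template of 0s and 1s is built from the scale of each mode type, and the result is computed in the same way as for TongGong system recognition.</p>

<p>The principle is the same as for System recognition.</p>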

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/Type.png" alt="image" /></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># C-Gong templates built from the scale definitions (pentatonic C D E G A plus Qingjue F, Biangong B, Bianzhi F#, Run Bb); verify against the paper
</span>    <span class="s">'Heptatonic Yanyue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
    <span class="s">'Heptatonic Qingyue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]),</span>
    <span class="s">'Heptatonic Yayue'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]),</span>
    <span class="s">'Hexatonic (Biangong)'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]),</span>
    <span class="s">'Hexatonic (Qingjue)'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]),</span>
    <span class="s">'Pentatonic'</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">])</span>
</code></pre></div></div>

<h3 id="pattern">Pattern</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Pattern <span class="o">=</span> Tonic - System

调式样式/Mode Pattern：
0--宫/Gong
1--商/Shang
2--角/Jue
3--徵/Zhi
4--羽/Yu
</code></pre></div></div>

<p>Pattern is derived from System and Tonic.</p>

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/Pattern.png" alt="image" /></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">half_tone_difference</span> <span class="o">=</span> <span class="p">(</span><span class="n">tonic_number</span> <span class="o">-</span> <span class="n">System_number</span><span class="p">)</span> <span class="o">%</span> <span class="mi">12</span>  <span class="c1"># semitone distance from System up to Tonic
</span>    <span class="k">if</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>  <span class="c1"># map the distance to a Pattern: 0 Gong, 1 Shang, 2 Jue, 3 Zhi, 4 Yu
</span>        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">4</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">7</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="k">elif</span> <span class="n">half_tone_difference</span> <span class="o">==</span> <span class="mi">9</span><span class="p">:</span>
        <span class="n">pattern_number</span> <span class="o">=</span> <span class="mi">4</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="s">'Unable to determine Pattern'</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The array [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0] corresponds to Pentatonic because of how the pentatonic scale is built in music theory. Each number in the array states whether a particular pitch class occurs in the scale: 1 means it occurs and 0 means it does not.

The pentatonic scale consists of five tones. Taking the C major pentatonic scale of Western music as the usual example, its tones are C, D, E, G, A; F and B (the fourth and seventh degrees of C major) are omitted. Over the full set of twelve pitch classes this reads:

C <span class="o">(</span>present<span class="o">)</span>
C# <span class="o">(</span>absent<span class="o">)</span>
D <span class="o">(</span>present<span class="o">)</span>
D# <span class="o">(</span>absent<span class="o">)</span>
E <span class="o">(</span>present<span class="o">)</span>
F <span class="o">(</span>absent<span class="o">)</span>
F# <span class="o">(</span>absent<span class="o">)</span>
G <span class="o">(</span>present<span class="o">)</span>
G# <span class="o">(</span>absent<span class="o">)</span>
A <span class="o">(</span>present<span class="o">)</span>
A# <span class="o">(</span>absent<span class="o">)</span>
B <span class="o">(</span>absent<span class="o">)</span>
</code></pre></div></div>

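<p>The final goal is to train templates for the five classes of pieces (Pattern) from the existing dataset, then feed in an unknown piece and output its Pattern.</p>

<p><strong>Formulas</strong></p>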

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Tonic-System=&gt;Pattern

System+Pattern=&gt;Tonic

Tonic-Pattern=&gt;System
</code></pre></div></div>
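<p>Read as arithmetic modulo 12, the three formulas above can be sketched like this. The function names are hypothetical; the semitone offsets 0, 2, 4, 7, 9 are the ones given earlier for Gong, Shang, Jue, Zhi and Yu.</p>

```python
# Semitone offset of each Pattern above the System's Gong note,
# indexed by Pattern label: 0 Gong, 1 Shang, 2 Jue, 3 Zhi, 4 Yu
PATTERN_OFFSETS = [0, 2, 4, 7, 9]


def pattern_from(tonic, system):
    """Tonic - System: return the Pattern label, or None if the offset is not a mode degree."""
    diff = (tonic - system) % 12
    return PATTERN_OFFSETS.index(diff) if diff in PATTERN_OFFSETS else None


def tonic_from(system, pattern):
    """System + Pattern: return the Tonic label (0-11)."""
    return (system + PATTERN_OFFSETS[pattern]) % 12


def system_from(tonic, pattern):
    """Tonic - Pattern: return the System label (0-11)."""
    return (tonic - PATTERN_OFFSETS[pattern]) % 12


# Labels from the first table row above: System 2 (D), Tonic 2 (D), Pattern 0 (Gong)
print(pattern_from(tonic=2, system=2))
print(tonic_from(system=0, pattern=3))   # Zhi mode in the C system
print(system_from(tonic=9, pattern=4))   # Yu mode with tonic A
```

<p>Returning <code class="language-plaintext highlighter-rouge">None</code> for an offset outside the five mode degrees is a design choice of this sketch; it flags a System and Tonic pair that cannot both be right.</p>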

<h3 id="标签">Labels</h3>

<p><img src="/images/posts/2026-3-16-MRS数据集及音乐理论/Label.png" alt="image" /></p>

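<h2 id="识别调式策略">Mode recognition strategies</h2>

<p>System (TongGong system) and Tonic together determine Pattern.</p>

<p>Method 1: use two single-task models to predict <strong>Tonic</strong> and <strong>Pattern</strong> directly.</p>

<p>Method 2 (used by the baseline): use two single-task models to predict <strong>System</strong> and <strong>Tonic</strong>, then compute <strong>Pattern</strong> from them, since System and Tonic together determine Pattern.</p>

<p>Method 3: use one multi-task model to recognize <strong>System</strong>, <strong>Tonic</strong>, and <strong>Pattern</strong>.</p>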

<p>The TongGong system is called the <strong>System</strong>, the pitch of the tonic the <strong>Tonic</strong>, the mode pattern the <strong>Pattern</strong>, and the mode type the <strong>Type</strong>. The main classification tasks are recognizing <strong>Pattern</strong> and <strong>Tonic</strong>, with <strong>System</strong> as an auxiliary task and <strong>Type</strong> classification as a secondary task.</p>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="MRS" /><category term="MRS" /><summary type="html"><![CDATA[Covers the music recommendation system dataset and music theory]]></summary></entry><entry><title type="html">Database Notes</title><link href="https://cendok.github.io/2026/03/13/Database-Notes" rel="alternate" type="text/html" title="Database Notes" /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://cendok.github.io/2026/03/13/Database%20Notes</id><content type="html" xml:base="https://cendok.github.io/2026/03/13/Database-Notes"><![CDATA[<h2 id="数据库知识点">Database fundamentals</h2>

<h3 id="概念">Concepts</h3>

<p>DB: database</p>

<p>DBS: database system</p>

<p>DBMS: database management system</p>

<p>DBA: database administrator</p>

<h3 id="完整性约束">Integrity constraints</h3>

<p>NULL: empty, no value</p>

<p>UNIQUE constraint: each value may appear only once</p>

<p>CHECK constraint</p>

<p>PRIMARY KEY constraint</p>

<p>FOREIGN KEY constraint: values must match the referenced primary key</p>

<p>DEFAULT constraint: sets a default value</p>
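<p>The constraints listed above can be tried out with Python's built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> module. This is a small sketch on an in-memory database; the table and column names are made up for illustration, and SQLite is an assumption here, since the notes do not name a specific database engine.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE dept (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )""")
conn.execute("""
    CREATE TABLE emp (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0),
        status  TEXT DEFAULT 'active',
        dept_id INTEGER REFERENCES dept(id)
    )""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("INSERT INTO dept (id, name) VALUES (1, 'Research')")
conn.execute("INSERT INTO emp (id, name, salary, dept_id) VALUES (1, 'Ann', 100.0, 1)")

results = {
    "unique": violates("INSERT INTO dept (id, name) VALUES (2, 'Research')"),
    "check": violates("INSERT INTO emp (id, name, salary) VALUES (2, 'Bob', -5)"),
    "foreign_key": violates("INSERT INTO emp (id, name, salary, dept_id) VALUES (3, 'Cy', 1, 99)"),
    "not_null": violates("INSERT INTO emp (id, salary) VALUES (4, 1)"),
}
# The DEFAULT constraint filled in the status column that the insert omitted
default_status = conn.execute("SELECT status FROM emp WHERE id = 1").fetchone()[0]
print(results, default_status)
```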

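<h3 id="查询语句">Query statements</h3>

<h4 id="对表格">On table rows</h4>

<p>Create, delete, update, read:</p>

<p>Create: INSERT INTO</p>

<p>Delete: DELETE FROM</p>

<p>Update: UPDATE</p>

<p>Read: SELECT</p>

<h4 id="对数据库">On database objects</h4>

<p>Create: CREATE (and ADD inside ALTER TABLE)</p>

<p>Delete: DROP</p>

<p>Modify: ALTER</p>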
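<p>The difference between statements that change the schema and statements that change rows can be seen in a short <code class="language-plaintext highlighter-rouge">sqlite3</code> session. This is an illustrative sketch on an in-memory SQLite database with a made-up <code class="language-plaintext highlighter-rouge">student</code> table.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Object level: CREATE and ALTER change the schema
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN score INTEGER")

# Row level: INSERT, UPDATE, DELETE and SELECT change or read the data
conn.executemany(
    "INSERT INTO student (id, name, score) VALUES (?, ?, ?)",
    [(1, "Ann", 90), (2, "Bob", 75), (3, "Cy", 60)],
)
conn.execute("UPDATE student SET score = 80 WHERE name = 'Bob'")
conn.execute("DELETE FROM student WHERE id = 3")
rows = conn.execute("SELECT id, name, score FROM student ORDER BY id").fetchall()
print(rows)

# Object level again: DROP removes the table itself
conn.execute("DROP TABLE student")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```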

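<h3 id="常用sql命令">Common SQL commands</h3>

<p>SELECT: can assign values to several variables at once</p>

<p>SELECT: can output several variables at once</p>

<p>DECLARE: defines variables (T-SQL also accepts several, comma-separated, in one statement)</p>

<p>PRINT: outputs one value at a time</p>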

<h3 id="四大故障">Four types of failure</h3>

<p>Failure inside a transaction</p>

<p>System failure</p>

<p>Media failure</p>

<p>Computer virus</p>

<h3 id="事务的特性">Transaction properties</h3>

<p>ACID</p>

<p>Atomicity</p>

<p>Consistency</p>

<p>Isolation</p>

<p>Durability</p>
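<p>Atomicity is the easiest of the four properties to observe directly. The sketch below uses <code class="language-plaintext highlighter-rouge">sqlite3</code> with a made-up <code class="language-plaintext highlighter-rouge">account</code> table: a transfer either applies both updates or neither. SQLite is an assumption here, chosen because it ships with Python.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(amount):
    """Move money from alice to bob as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30)       # both updates are applied
failed = transfer(500)  # the second update breaks the CHECK, so the first is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

<p>In the failed transfer bob's balance is credited first and then rolled back, so no partial result survives; the CHECK constraint is also what keeps the data consistent.</p>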

<h2 id="数据库sql语句">SQL statements</h2>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>   <span class="c1">-- COUNT(), SUM(), AVG(), MAX(), MIN()</span>
<span class="k">FROM</span>     <span class="c1">-- JOIN ... ON, LEFT JOIN ... ON, RIGHT JOIN ... ON</span>
<span class="k">WHERE</span>    <span class="c1">-- IS NULL, IN, NOT IN</span>
<span class="k">GROUP</span> <span class="k">BY</span>
<span class="k">HAVING</span>   <span class="c1">-- condition applied to groups</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="c1">-- DESC: descending, ASC: ascending</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DESC (descending): descending order

ASC (ascending): ascending order
</code></pre></div></div>

<h3 id="例题1怎么查找在一个表内有而在另一个表内没有的数据">Example 1: How do you find rows that exist in one table but not in another?</h3>

<p>To find rows that exist in one table but not in another, combine SQL's <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> with <code class="language-plaintext highlighter-rouge">IS NULL</code>. Suppose two tables, <code class="language-plaintext highlighter-rouge">table1</code> and <code class="language-plaintext highlighter-rouge">table2</code>, share a common column <code class="language-plaintext highlighter-rouge">id</code>, and we want the rows that are in <code class="language-plaintext highlighter-rouge">table1</code> but not in <code class="language-plaintext highlighter-rouge">table2</code>. The following SQL statement does this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="o">*</span>
<span class="k">FROM</span> <span class="n">table1</span>
<span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">table2</span> <span class="k">ON</span> <span class="n">table1</span><span class="p">.</span><span class="n">id</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">id</span>
<span class="k">WHERE</span> <span class="n">table2</span><span class="p">.</span><span class="n">id</span> <span class="k">IS</span> <span class="k">NULL</span><span class="p">;</span>

</code></pre></div></div>

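<p><code class="language-plaintext highlighter-rouge">LEFT JOIN</code> is a SQL join operation that combines rows from two tables. It returns every row of the left (first) table, together with the matching rows of the right (second) table. Where the right table has no matching row, the right table's columns in the result contain NULL.</p>

<p>In detail, <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> behaves as follows:</p>

<ol>
  <li>Every record of the left table is returned, even if the right table has no matching record.</li>
  <li>If the right table has a matching record, the left and right records are merged into one result row.</li>
  <li>If the right table has no matching record, the left record is still shown with all the right table's columns, and those columns are NULL.</li>
</ol>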
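
<p>The query above can be tried end to end with Python's built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> module. This is only a minimal sketch: the sample rows and the extra <code class="language-plaintext highlighter-rouge">label</code> column are made up for illustration and are not part of the original notes.</p>

```python
import sqlite3

# Hypothetical sample data: ids 1-3 in table1, ids 2-3 in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, label TEXT);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2), (3);
""")

# Rows of table1 without a partner in table2 get NULL in table2.id,
# so filtering on IS NULL keeps exactly the unmatched rows.
only_in_table1 = conn.execute("""
    SELECT table1.*
    FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
""").fetchall()

print(only_in_table1)  # [(1, 'a')]
conn.close()
```

<p>Only id 1 survives the filter, because it is the only row of <code class="language-plaintext highlighter-rouge">table1</code> with no match in <code class="language-plaintext highlighter-rouge">table2</code>.</p>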

<h3 id="例题2升序降序怎么表示">Example 2: How are ascending and descending order expressed?</h3>

<p>In SQL, ascending and descending order are expressed with the keywords <code class="language-plaintext highlighter-rouge">ASC</code> (ascending) and <code class="language-plaintext highlighter-rouge">DESC</code> (descending). They are used with the <code class="language-plaintext highlighter-rouge">ORDER BY</code> clause to sort query results.</p>

<p>For example, suppose a table named <code class="language-plaintext highlighter-rouge">employees</code> holds employee information, including name (name) and salary (salary). To list all employees from lowest to highest salary:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">employees</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">salary</span> <span class="k">ASC</span><span class="p">;</span>
</code></pre></div></div>

<p>To list all employees from highest to lowest salary:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">employees</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">salary</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>

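<p>You can also sort by several columns at once. For example, to sort by department (department) in ascending order first and then by salary in descending order within each department:</p>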

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">employees</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">department</span> <span class="k">ASC</span><span class="p">,</span> <span class="n">salary</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>

<p>Here the rows are ordered by department first; salary only decides the order among rows that share the same department.</p>
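
<p>A quick way to check the multi-column ordering is to run it on a few rows with Python's <code class="language-plaintext highlighter-rouge">sqlite3</code> module. The four employees below are invented for this sketch.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 50), ("Bob", "Eng", 70), ("Cid", "Eng", 90), ("Dee", "Sales", 60)],
)

rows = conn.execute(
    "SELECT name FROM employees ORDER BY department ASC, salary DESC"
).fetchall()
ordered_names = [name for (name,) in rows]

print(ordered_names)  # ['Cid', 'Bob', 'Dee', 'Ann']
conn.close()
```

<p>"Eng" sorts before "Sales", and inside each department the higher salary comes first.</p>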

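<h3 id="例题3怎么连接两张表">Example 3: How do you join two tables?</h3>

<p>In SQL, two tables are joined with a JOIN. A JOIN combines tables on a column they share, so related data can be retrieved in a single query.</p>

<p>The common JOIN types are:</p>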

<ol>
  <li>
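    <p>INNER JOIN: returns only the rows that match in both tables. A row appears in the result only when the join column has a matching value on both sides.
Example: list employees with their departments, showing only employees that have a department.</p>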

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">departments</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_id</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
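    <p>LEFT JOIN: returns all rows of the left table plus the matching rows of the right table. Where the right table has no match, its columns in the result are NULL.
Example: list every department with its employees, even departments that have no employees.</p>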

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span><span class="p">,</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span>
<span class="k">FROM</span> <span class="n">departments</span>
<span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">employees</span> <span class="k">ON</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
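    <p>RIGHT JOIN: returns all rows of the right table plus the matching rows of the left table. Where the left table has no match, its columns in the result are NULL.
Example: list every department with its employees, even departments that have no employees (departments is the right-hand table in this query).</p>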

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">RIGHT</span> <span class="k">JOIN</span> <span class="n">departments</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_id</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
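    <p>FULL JOIN: returns all rows of both tables. Where one side has no match, that side's columns in the result are NULL.
Example: list all employees with their departments, including departments without employees and employees without a department.</p>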
  </li>
</ol>
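
<p>The difference between the join types shows up clearly on a tiny data set. The sketch below uses Python's <code class="language-plaintext highlighter-rouge">sqlite3</code> module with made-up rows: one employee without a department and one department without employees. Since some database engines do not support <code class="language-plaintext highlighter-rouge">FULL JOIN</code> directly, the full join is emulated here as the <code class="language-plaintext highlighter-rouge">UNION</code> of two <code class="language-plaintext highlighter-rouge">LEFT JOIN</code>s; that emulation is a choice made for this example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Bob has no department, Sales has no employees.
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Eng'), (2, 'Sales');
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', NULL);
""")

def run(sql):
    """Run a query and return its rows as a set (row order is not significant here)."""
    return set(conn.execute(sql).fetchall())

inner = run("""
    SELECT e.name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
""")
left = run("""
    SELECT e.name, d.department_name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
""")
# Full outer join emulated as the UNION of two LEFT JOINs.
full = run("""
    SELECT e.name, d.department_name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    UNION
    SELECT e.name, d.department_name
    FROM departments d
    LEFT JOIN employees e ON e.department_id = d.department_id
""")

print(len(inner), len(left), len(full))  # 1 2 3
conn.close()
```

<p>The inner join keeps only Ann, the left join adds Bob with a NULL department, and the emulated full join also adds the Sales department with a NULL employee.</p>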

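<h2 id="限制查询结果sql语句">SQL for restricting query results</h2>

<ol>
  <li>HAVING: filters the results after grouping. It is normally used with GROUP BY to keep only the groups that satisfy a condition.</li>
  <li>LIMIT: limits the number of rows returned. It can give a maximum row count, or return a number of rows starting from a given position.</li>
  <li>OFFSET: used with LIMIT to say which row to start from. For example, LIMIT 10 OFFSET 5 returns 10 rows starting at row 6.</li>
  <li>IN: takes a list of values; the result contains only rows whose column value is in the list.</li>
  <li>NOT IN: the opposite of IN; it excludes rows whose column value is in the list.</li>
  <li>EXISTS: checks whether a subquery returns at least one row; if it does, the condition is true.</li>
  <li>NOT EXISTS: the opposite of EXISTS; the condition is true when the subquery returns no rows.</li>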
</ol>
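
<p>A few of these can be checked together with Python's <code class="language-plaintext highlighter-rouge">sqlite3</code> module. This is a sketch on invented data: 20 employees with ids 1 to 20 and a hypothetical <code class="language-plaintext highlighter-rouge">bonuses</code> table that only references employee 3.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, department TEXT);
    CREATE TABLE bonuses (employee_id INTEGER);
""")
# Hypothetical data: ids 1..20; only employee 3 has a bonus row.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, "Eng" if i % 2 else "Sales") for i in range(1, 21)],
)
conn.execute("INSERT INTO bonuses VALUES (3)")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# LIMIT 10 OFFSET 5: skip 5 rows, then return the next 10 (rows 6 to 15).
page = ids("SELECT id FROM employees ORDER BY id LIMIT 10 OFFSET 5")

# IN: keep only ids that appear in the list (99 does not exist, so it is ignored).
in_list = ids("SELECT id FROM employees WHERE id IN (2, 4, 99) ORDER BY id")

# NOT EXISTS: employees for which the subquery finds no bonus row.
no_bonus_count = conn.execute("""
    SELECT COUNT(*) FROM employees e
    WHERE NOT EXISTS (SELECT 1 FROM bonuses b WHERE b.employee_id = e.id)
""").fetchone()[0]

print(page[0], page[-1], in_list, no_bonus_count)  # 6 15 [2, 4] 19
conn.close()
```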

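<h2 id="分组sql语句">SQL for grouping</h2>

<p>GROUP BY splits rows into groups, aggregate functions compute one value per group, and HAVING filters the groups afterwards. Some common grouping statements with examples:</p>

<ol>
  <li>COUNT(): counts the rows in each group.
Example: the number of employees in each department.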
    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">as</span> <span class="n">employee_count</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>SUM(): computes the total of a column within each group.
Example: the total salary of each department.</p>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">SUM</span><span class="p">(</span><span class="n">salary</span><span class="p">)</span> <span class="k">as</span> <span class="n">total_salary</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>AVG(): computes the average of a column within each group.
Example: the average salary of each department.
    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">AVG</span><span class="p">(</span><span class="n">salary</span><span class="p">)</span> <span class="k">as</span> <span class="n">average_salary</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>MIN(): returns the smallest value of a column within each group.
Example: the lowest salary in each department.
    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">MIN</span><span class="p">(</span><span class="n">salary</span><span class="p">)</span> <span class="k">as</span> <span class="n">min_salary</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>MAX(): returns the largest value of a column within each group.
Example: the highest salary in each department.
    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">MAX</span><span class="p">(</span><span class="n">salary</span><span class="p">)</span> <span class="k">as</span> <span class="n">max_salary</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p>HAVING: used after GROUP BY to keep only the groups that satisfy a condition.
Example: departments with more than 10 employees.</p>

    <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">department</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">as</span> <span class="n">employee_count</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">department</span>
<span class="k">HAVING</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div>    </div>
  </li>
</ol>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="Notes" /><category term="Notes" /><summary type="html"><![CDATA[Database notes]]></summary></entry><entry><title type="html">CNN Notes</title><link href="https://cendok.github.io/2023/10/27/CNN-Notes" rel="alternate" type="text/html" title="CNN Notes" /><published>2023-10-27T00:00:00+00:00</published><updated>2023-10-27T00:00:00+00:00</updated><id>https://cendok.github.io/2023/10/27/CNN%20Notes</id><content type="html" xml:base="https://cendok.github.io/2023/10/27/CNN-Notes"><![CDATA[<h1 id="cnn卷积神经网络">CNN: Convolutional Neural Networks</h1>

<p><a href="https://www.bilibili.com/video/BV1zF411V7xu?p=7&amp;spm_id_from=pageDriver&amp;vd_source=d7da92b687e2947a28aa75d0b809d2cd">7 - Feature map size calculation and parameter sharing (Bilibili)</a></p>

<p><img src="/images/posts/2023-10-27-CNN Notes/2023-10-27-CNN.png" alt="CNN" /></p>

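<h2 id="传统神经网络和cnn卷积神经网络的区别">Differences between a traditional neural network and a CNN:</h2>

<p>A traditional neural network takes one-dimensional input: just a vector.</p>

<p>A CNN takes three-dimensional input: height, width, depth.</p>

<p><strong>My understanding:</strong></p>

<p>Pour a batch of wood pulp into a pool (input layer), stir it with a stick (convolutional layer), let it slowly settle (pooling layer), then drain the water and press the sediment into shape (fully connected layer).</p>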

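<h2 id="卷积">Convolution</h2>

<p>Convolution is, in essence, feature extraction.</p>

<p>It divides the image into many small regions and computes one feature value for each region.</p>

<p>An image has RGB color channels, e.g. 32x32x3.</p>

<p>Feature values are computed for each channel separately, and the per-channel results are then merged by addition, e.g. 1+2+3=6.</p>

<p>The whole input is 7x7x3; split into R, G and B, each channel is 7x7x1.</p>

<p>The kernel size decides how large a region is reduced to a single feature value.</p>

<p>Each region is turned into one value by the filter's weight parameters.</p>

<p>The per-channel values are combined (merged) by summing them.</p>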

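<h3 id="卷积核内数值怎么设置的">How are the values inside a kernel used?</h3>

<p>For one channel, multiply the values at corresponding positions and sum them: one inner product.</p>

<p>Then add up the results of the three RGB channels to get one value of the final feature map.</p>

<p>Different kernels produce different feature maps, which gives a rich set of features.</p>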
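
<p>The computation for a single position can be sketched in plain Python. The patch and kernel values below are arbitrary numbers chosen for illustration, and the <code class="language-plaintext highlighter-rouge">bias</code> argument is an assumption of this sketch (it corresponds to the parameter b mentioned later in these notes).</p>

```python
def conv_at_position(patch, kernel, bias=0):
    """Feature value for one position: per-channel inner product, summed over channels."""
    total = bias
    for channel_patch, channel_kernel in zip(patch, kernel):
        for patch_row, kernel_row in zip(channel_patch, channel_kernel):
            total += sum(p * k for p, k in zip(patch_row, kernel_row))
    return total

# Hypothetical 3-channel (R, G, B) 2x2 image patch.
patch = [
    [[1, 0], [0, 1]],  # R
    [[2, 2], [0, 0]],  # G
    [[0, 1], [1, 0]],  # B
]
# Hypothetical kernel with one 2x2 slice of weights per channel.
kernel = [
    [[1, 1], [1, 1]],   # R weights -> inner product 2
    [[1, 0], [0, -1]],  # G weights -> inner product 2
    [[0, 2], [2, 0]],   # B weights -> inner product 4
]

feature_value = conv_at_position(patch, kernel, bias=1)
print(feature_value)  # 2 + 2 + 4 + 1 = 9
```

<p>Sliding the same kernel over every position of the image yields one feature map; a second kernel with different weights would yield a second map.</p>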

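<h3 id="注意">Note:</h3>

<p>Within one convolutional layer, the kernels may hold different values but must have the same dimensions, e.g. all of them 3x3 (the same spatial size applies to the R, G and B slices of a kernel).</p>

<p>Different convolutional layers may use different kernel sizes, e.g. 3x3 in one layer and 4x4 in another.</p>

<p>In an output volume of 3x3x2, the 2 means that two feature maps are stacked together.</p>

<p>Convolution is applied more than once, but not repeatedly to the same image: each convolution works on the feature maps produced by the previous one, a bit like assembling a jigsaw puzzle. Step by step, the values of non-feature regions are weakened and edges and contours stand out.</p>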

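<h2 id="涉及到的参数">Parameters involved:</h2>

<h3 id="步长">Stride:</h3>

<p>A small stride extracts features at fine granularity: detailed, but less efficient.</p>

<p>A large stride extracts features at coarse granularity.</p>

<h3 id="卷积核尺寸">Kernel size</h3>

<p>3x3 kernels are common. Smaller kernels are finer-grained, larger kernels coarser.</p>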

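<h3 id="边缘填充">Edge padding</h3>

<p>Edge padding: the gray border in the figure is filled entirely with 0.</p>

<p>It addresses the fact that, as the kernel slides, points on the border are inherently covered fewer times than interior points. Padding a ring of zeros around the edge makes the border points less marginal, so they take part in more computations and less information is lost (+pad 1). Zeros are used because they contribute nothing to the result.</p>

<p>The same idea applies to text: given one text of 100 words and another of 120 words, the 100-word text is padded with zeros up to 120 words.</p>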

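<h3 id="卷积核个数">Number of kernels:</h3>

<p>The number of kernels decides how many feature maps are produced: n kernels give n feature maps.</p>

<p><strong>My own understanding:</strong></p>

<p>output feature map size = (input size - kernel size + 2 * padding) / stride + 1</p>

<p>Example from the figure: 3 = (5 - 3 + 1*2)/2 + 1</p>

<p>The four inputs to the formula: input size, padding, kernel size, stride.</p>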
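
<p>The formula is easy to wrap in a small helper. The function name <code class="language-plaintext highlighter-rouge">conv_output_size</code> is made up for this sketch. Integer (floor) division is used, which is how frameworks such as PyTorch handle cases where the stride does not divide evenly.</p>

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output width/height of a convolution: (input - kernel + 2 * padding) // stride + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# The example from the notes: 5x5 input, 3x3 kernel, padding 1, stride 2.
print(conv_output_size(5, 3, padding=1, stride=2))   # 3

# A 5x5 kernel with padding 2 and stride 1 keeps the size unchanged.
print(conv_output_size(32, 5, padding=2, stride=1))  # 32
```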

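<h3 id="卷积参数共享">Parameter sharing in convolution</h3>

<p><strong>Setup:</strong> the input is 32x32x3, the kernel is 5x5x3, the stride is 1, and the edge padding is 2 rings.</p>

<p><strong>CNN:</strong> the same kernel is reused at every position of the image, across its R, G and B slices. This needs <strong>far fewer weight parameters</strong> than a <strong>traditional neural network</strong>, where every region would use a different kernel, as in the left-hand figure.</p>

<p>5x5x3x10+10=760</p>

<p>5x5 kernels, 3 channels, 10 kernels, plus 10 bias parameters b</p>

<p><strong>Traditional:</strong> switching to a different kernel at every position would need this many parameters for a single feature map:</p>

<p>(32-5+2x2)/1+1=32 (feature map size)</p>

<p>3x32x32x5x5=76800</p>

<p>3 RGB channels, a 32x32 feature map, a 5x5 kernel</p>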
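
<p>Both counts can be reproduced with a few lines of Python. The two helper names are invented for this sketch; the unshared count covers the weights of a single feature map and leaves out biases.</p>

```python
def shared_params(kernel, in_channels, num_kernels):
    """Parameters of a conv layer with weight sharing: weights plus one bias per kernel."""
    return kernel * kernel * in_channels * num_kernels + num_kernels

def unshared_weights_per_map(out_size, kernel, in_channels):
    """Weights needed for ONE feature map if every output position had its own kernel."""
    return out_size * out_size * kernel * kernel * in_channels

print(shared_params(kernel=5, in_channels=3, num_kernels=10))               # 760
print(unshared_weights_per_map(out_size=32, kernel=5, in_channels=3))       # 76800
```

<p>Ten shared kernels together need 760 parameters, while a single feature map without sharing already needs 76800 weights.</p>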

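<h2 id="池化层">Pooling layer:</h2>

<p>Compression, i.e. <strong>downsampling</strong> (pool).</p>

<p>Many features have been extracted, but not all of them are useful. Pooling drops some of the less important ones and keeps the important ones.</p>

<p>224x224x64-&gt;112x112x64</p>

<p>The number of feature values shrinks to one quarter.</p>

<p>Pooling only shrinks height and width; it cannot change the number of feature maps.</p>

<p><strong>Max pooling:</strong> no arithmetic, only selection: take the largest value. 1, 1, 5, 6 -&gt; 6</p>

<p><strong>Average pooling:</strong> average the feature values in each region. 1, 1, 5, 6 -&gt; 3.25 (drawback: rarely used, because <strong>the largest feature value is lost</strong>)</p>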
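
<p>A plain-Python sketch of 2x2 pooling with stride 2 makes the two variants concrete. The feature map values are made up, except that the top-left window is the 1, 1, 5, 6 example from the notes.</p>

```python
def pool2x2(feature_map, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 window (stride 2)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

# Hypothetical 4x4 feature map; the top-left window holds 1, 1, 5, 6.
fmap = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2x2(fmap, max)   # keeps the largest value of each window
avg_pooled = pool2x2(fmap, mean)  # averages each window

print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.25, 5.25], [2.0, 2.0]]
```

<p>The 4x4 map becomes 2x2 in both cases: a quarter of the original values.</p>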

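<h2 id="判断卷积神经网络层数">Counting the layers of a CNN:</h2>

<p>Only layers that have parameters count as layers.</p>

<p>conv layers have parameters, ReLU activation layers do not, pooling layers do not, and FC layers do. So the network in the figure below has 6+1=7 layers: 6 conv layers and 1 FC layer.</p>

<p>Every conv layer is followed by a ReLU activation; the two form a unit.</p>

<p>Two convolutions, then one pooling: extract, compress, extract, compress.</p>

<p><strong>How are the feature values turned into a 5-class result?</strong> (car, truck, airplane, ship, horse)</p>

<p>Through the FC layer,</p>

<p>FC[,5]</p>

<p>The first dimension stands for the features extracted so far. An FC layer cannot be connected to a three-dimensional volume such as 32x32x10, so the feature maps are first flattened into one feature vector (32x32x10 = 10240 values) and that vector is fed to the fully connected layer.</p>

<p>5 stands for the 5 classes.</p>

<p>The data is actually four-dimensional, because there is one more dimension b, the batch, which is 10 here.</p>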

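<h2 id="感受野">Receptive field:</h2>

<p>A feature value in a later layer can be traced back to the values it was computed from. The receptive field is the size of the region of the original input that it "sees". <strong>Trace back to the original size.</strong></p>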
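
<p>The trace-back can be written as a small calculation. The sketch below uses the usual recurrence (each layer enlarges the receptive field by (kernel size - 1) times the product of the earlier strides); the function name and the example layer stacks are assumptions made for illustration.</p>

```python
def receptive_field(layers):
    """Receptive field of one output value, given (kernel_size, stride) per layer, input first."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring values at this depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # 3: one 3x3 conv sees a 3x3 region
print(receptive_field([(3, 1), (3, 1)]))          # 5: two stacked 3x3 convs see 5x5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7: three stacked 3x3 convs see 7x7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: conv, 2x2 pool with stride 2, conv
```

<p>This is one reason small 3x3 kernels are popular: stacking a few of them covers the same input region as one large kernel.</p>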

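<h2 id="数据增强">Data augmentation:</h2>

<p>When there is not enough image data: <strong>mirror-flip</strong> each image so that one image becomes two. The amount of data matters a lot; feed in as much as possible.</p>

<p><strong>Rotating the image</strong></p>

<p><strong>Zooming in and out</strong></p>

<p><strong>Zooming combined with a mirror flip</strong></p>

<p>The key point: it is enough that the image's pixel matrix, and with it the feature points, changes.</p>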
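
<p>Two of these transformations are simple enough to sketch on a nested list standing in for a grayscale image. The tiny 2x3 "image" and the helper names are made up for this example; in practice a library such as torchvision would do this on real image data.</p>

```python
def mirror_flip(image):
    """Horizontal mirror: reverse every row."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(column) for column in zip(*image[::-1])]

# Hypothetical 2x3 grayscale image.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = mirror_flip(image)          # [[3, 2, 1], [6, 5, 4]]
rotated = rotate_90_clockwise(image)  # [[4, 1], [5, 2], [6, 3]]

# One original image has become three training samples.
augmented = [image, flipped, rotated]
print(len(augmented))  # 3
```

<p>Each variant has a different pixel matrix, which is all that is needed for it to count as a new sample.</p>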

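<p>Input images also have to be resized, because networks such as <strong>VGG and ResNet</strong> expect 224x224 input, while the supplied images are irregular and may be 1024x1024 or 256x256.</p>

<p>The three core modules of torchvision: transforms, datasets, models</p>

<p>The transforms module is used for data preprocessing.</p>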

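<h1 id="网络解读">Network walkthrough</h1>

<p>P14 (worth rewatching): a walkthrough of the code of a simple neural network.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nn.Linear(32*7*7,10)

(w, b): two dimensions. b is the number of output classes, e.g. 10 when sorting into 10 classes such as 10 vehicle types. w is the number of input features of the fully connected layer (one weight per input feature for each class).
</code></pre></div></div>

<p>Computing the value of w in [w, b]:</p>

<p>Input size (1, 28, 28): 1 means a single channel (grayscale rather than three RGB channels), 28 is the original image size.</p>

<p>out_channels = 16, kernel_size = 5 (5x5 kernel), stride = 1, padding = 2 (2 rings of edge padding)</p>

<p>Compute: (28-5+2*2)/1+1 = 28</p>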

<h2 id="卷积层1">Convolutional layer 1</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># conv1 output: (16, 28, 28)
</span></code></pre></div></div>

<p>16 means there are 16 different kernels, producing 16 feature maps; 28 is the size of the feature maps after the first convolution.</p>

<h2 id="最大池化层1">Max pooling layer 1</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>  <span class="c1"># halves height and width
</span>
<span class="c1"># output after ReLU and pooling: (16, 14, 14)</span>
</code></pre></div></div>

<h2 id="卷积层2">Convolutional layer 2</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>  <span class="c1"># in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2 (2 rings of edge padding)
</span>
<span class="c1"># conv2 output: (32, 14, 14)
</span></code></pre></div></div>

<p>Compute: (14-5+2*2)/1+1 = 14</p>

<h2 id="最大池化层2">Max pooling layer 2</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>  <span class="c1"># kernel_size=2, passed positionally
</span>
<span class="c1"># output after ReLU and pooling: (32, 7, 7)</span>
</code></pre></div></div>

<h2 id="全连接层">Fully connected layer</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">7</span><span class="n">x7x32</span>

<span class="p">[</span><span class="n">w</span><span class="p">,</span><span class="n">b</span><span class="p">]</span><span class="o">-&gt;</span><span class="p">[</span><span class="mi">1568</span><span class="p">,</span><span class="mi">10</span><span class="p">]</span>
</code></pre></div></div>
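
<p>The whole shape walkthrough above (1x28x28 in, 1568 features into the fully connected layer) can be re-derived with plain arithmetic. This is only a sketch: the helper <code class="language-plaintext highlighter-rouge">conv_out</code> is a made-up name, and each 2x2 pooling step is modelled as integer halving.</p>

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution: (size - kernel + 2 * padding) // stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1

channels, size = 1, 28                                               # input: (1, 28, 28)

channels, size = 16, conv_out(size, kernel=5, stride=1, padding=2)   # conv1 -> (16, 28, 28)
size = size // 2                                                     # pool1 -> (16, 14, 14)

channels, size = 32, conv_out(size, kernel=5, stride=1, padding=2)   # conv2 -> (32, 14, 14)
size = size // 2                                                     # pool2 -> (32, 7, 7)

fc_in_features = channels * size * size                              # flatten: 32 * 7 * 7
print(fc_in_features)  # 1568
```

<p>This is where the 32*7*7 in <code class="language-plaintext highlighter-rouge">nn.Linear(32*7*7,10)</code> comes from.</p>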

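<h1 id="经典网络-alexnet">Classic network: AlexNet</h1>

<p>The network has only 8 layers.</p>

<p>11x11 filters (kernel size); today 3x3 is the most common.</p>

<p>stride 4: the stride is 4</p>

<p>pad 0: no edge padding (the code below uses padding=2 in its first layer, as some implementations do)</p>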

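<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">torch.nn.functional</span> <span class="k">as</span> <span class="n">F</span>


<span class="k">class</span> <span class="nc">AlexNet</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>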
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">AlexNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="mi">192</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv3</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">192</span><span class="p">,</span> <span class="mi">384</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv4</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">384</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv5</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">256</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        
        <span class="bp">self</span><span class="p">.</span><span class="n">fc1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">256</span> <span class="o">*</span> <span class="mi">6</span> <span class="o">*</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">4096</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">fc2</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">4096</span><span class="p">,</span> <span class="mi">4096</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">fc3</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">4096</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span>  <span class="c1"># 1000 classes for ImageNet
</span>
    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">max_pool2d</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv2</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">max_pool2d</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv3</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv4</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">conv5</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">max_pool2d</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">fc1</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">training</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">F</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">fc2</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">fc3</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
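
<p>The <code class="language-plaintext highlighter-rouge">256 * 6 * 6</code> in <code class="language-plaintext highlighter-rouge">fc1</code> can be checked by tracing the spatial size through the layers of the code above. The sketch assumes a 3x224x224 input, which the notes do not state explicitly, and the helper name <code class="language-plaintext highlighter-rouge">out_size</code> is made up.</p>

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output size of a conv or pooling layer: (size - kernel + 2 * padding) // stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1

size = 224                                            # assumed input: 3 x 224 x 224
trace = []

size = out_size(size, 11, stride=4, padding=2); trace.append(size)  # conv1 -> 55
size = out_size(size, 3, stride=2);             trace.append(size)  # pool  -> 27
size = out_size(size, 5, padding=2);            trace.append(size)  # conv2 -> 27
size = out_size(size, 3, stride=2);             trace.append(size)  # pool  -> 13
for _ in range(3):                                                  # conv3, conv4, conv5
    size = out_size(size, 3, padding=1);        trace.append(size)  # each keeps 13
size = out_size(size, 3, stride=2);             trace.append(size)  # pool  -> 6

flattened = 256 * size * size   # conv5 has 256 output channels
print(trace)      # [55, 27, 27, 13, 13, 13, 13, 6]
print(flattened)  # 9216
```

<p>With a different input size the flattened length changes, and <code class="language-plaintext highlighter-rouge">fc1</code> would no longer match.</p>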

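<h1 id="经典网络-vgg">Classic network: VGG</h1>

<p>All kernels are 3x3, so features are extracted at fine granularity throughout.</p>

<p>The network comes in 16-layer and 19-layer versions.</p>

<p>It uses max pooling, and every pooling step loses some feature information. How is that made up for? After each pooling step, the next convolution doubles the number of feature maps.</p>

<p>Do more layers always give better results?</p>

<p>It turned out that 16 layers worked better than 30. Not every convolution helps: if one of them produces poor features, convolving those poor features further makes the result worse.</p>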

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>


<span class="k">class</span> <span class="nc">VGG</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">features</span><span class="p">,</span> <span class="n">num_classes</span><span class="o">=</span><span class="mi">1000</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">VGG</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">features</span> <span class="o">=</span> <span class="n">features</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">classifier</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span> <span class="o">*</span> <span class="mi">7</span> <span class="o">*</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">4096</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(</span><span class="bp">True</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Dropout</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">4096</span><span class="p">,</span> <span class="mi">4096</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(</span><span class="bp">True</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Dropout</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">4096</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">),</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">features</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">classifier</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
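
<p>The <code class="language-plaintext highlighter-rouge">512 * 7 * 7</code> input of the classifier follows from how the feature extractor shrinks the image. The sketch below assumes the standard VGG-16 layout (five stages of 64, 128, 256, 512 and 512 channels, each ending in a 2x2 max pool) and a 224x224 input; the code above receives <code class="language-plaintext highlighter-rouge">features</code> as an argument, so this layout is an assumption, not something the snippet itself fixes.</p>

```python
# Assumed standard VGG-16 stage widths; each stage ends with a 2x2 max pool.
stage_channels = [64, 128, 256, 512, 512]

size = 224  # assumed input: 3 x 224 x 224
sizes = []
for _ in stage_channels:
    # 3x3 convs with padding 1 keep the size; the 2x2 max pool halves it.
    size //= 2
    sizes.append(size)

classifier_in_features = stage_channels[-1] * size * size
print(sizes)                   # [112, 56, 28, 14, 7]
print(classifier_in_features)  # 25088
```

<p>The channel count doubles from stage to stage (up to 512) while the spatial size halves, which is the compensation for pooling described above.</p>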

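<h1 id="经典网络-resnet残差">Classic network: ResNet and residuals:</h1>

<p><strong>Going from 20 layers to 56 layers, some layer is bound to train poorly and drag the results down.</strong></p>

<p>How can the good features be stacked up without being hurt by the bad ones?</p>

<p>How do you identify a convolutional layer that turned out badly?</p>

<p>The proposed answer is identity mapping. Once a convolutional layer has been added it cannot be removed, but if it turns out to be unhelpful, training can push its weights, and with them the feature values it contributes, toward 0. The layer is there but effectively unused.</p>

<p>In practice: at some point after layer 20, apply two more convolutions, also carry the input over unchanged, and add the two together.</p>

<p>Many of the extra layers may end up doing nothing, but there is at least some gain and the result is no worse than before.</p>

<p>For research and competitions, ResNet is the first choice among deep networks.</p>

<p>Treat ResNet as a feature extractor rather than as a classification network, because whether the task is classification or regression decides the loss function and how the final (fully connected) layer is wired. It can be used for object detection, object tracking, classification, retrieval, recognition and more: a <strong>general-purpose neural network</strong>.</p>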
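
<p>The "carry the input over unchanged and add" step can be illustrated without any framework. In this sketch the convolutional branch is replaced by two hypothetical stand-in functions (<code class="language-plaintext highlighter-rouge">dead_branch</code> and <code class="language-plaintext highlighter-rouge">useful_branch</code>); a real residual block would use convolutions there.</p>

```python
def residual_block(x, branch):
    """y = F(x) + x: add the branch output to the unchanged input."""
    return [fx + xi for fx, xi in zip(branch(x), x)]

def dead_branch(x):
    """Stand-in for a branch whose weights were driven to 0: it contributes nothing."""
    return [0.0 for _ in x]

def useful_branch(x):
    """Stand-in for a branch that learned something (here: half of the input)."""
    return [0.5 * value for value in x]

x = [1.0, -2.0, 3.0]

print(residual_block(x, dead_branch))    # [1.0, -2.0, 3.0]  identity: no worse than before
print(residual_block(x, useful_branch))  # [1.5, -3.0, 4.5]  input plus the learned residual
```

<p>When the branch outputs zeros the block simply passes its input through, which is why adding such a block should not make the network worse.</p>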

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>


<span class="k">class</span> <span class="nc">ResNet</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">layers</span><span class="p">,</span> <span class="n">num_classes</span><span class="o">=</span><span class="mi">1000</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">ResNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span> <span class="o">=</span> <span class="mi">64</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="mi">64</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">relu</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">MaxPool2d</span><span class="p">(</span><span class="n">kernel_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        
        <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="n">layers</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="n">layers</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">_make_layer</span><span class="p">(</span><span class="n">block</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="n">layers</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">AdaptiveAvgPool2d</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">fc</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">num_classes</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_make_layer</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">block</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">blocks</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="n">downsample</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="k">if</span> <span class="n">stride</span> <span class="o">!=</span> <span class="mi">1</span> <span class="ow">or</span> <span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span> <span class="o">!=</span> <span class="n">out_channels</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">:</span>
            <span class="n">downsample</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="n">stride</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">),</span>
                <span class="n">nn</span><span class="p">.</span><span class="n">BatchNorm2d</span><span class="p">(</span><span class="n">out_channels</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span><span class="p">),</span>
            <span class="p">)</span>

        <span class="n">layers</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">layers</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">block</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">downsample</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span> <span class="o">=</span> <span class="n">out_channels</span> <span class="o">*</span> <span class="n">block</span><span class="p">.</span><span class="n">expansion</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">blocks</span><span class="p">):</span>
            <span class="n">layers</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">block</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">in_channels</span><span class="p">,</span> <span class="n">out_channels</span><span class="p">))</span>

        <span class="k">return</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span><span class="o">*</span><span class="n">layers</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">conv1</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">bn1</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">maxpool</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer1</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer2</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer3</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layer4</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">avgpool</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">x</span>
</code></pre></div></div>
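
<p>The ResNet class above takes a block class as an argument but does not define one. The following is a minimal sketch of a two-convolution residual block whose constructor matches the call block(in_channels, out_channels, stride, downsample) used in _make_layer. The name BasicBlock and the zero-initialisation demo at the end are assumptions added for illustration, not part of the original notes.</p>

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Two 3x3 convolutions plus a shortcut that adds the input back."""

    expansion = 1  # output channels = out_channels * expansion

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample  # reshapes the shortcut when sizes differ

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # the residual addition


# Illustration only: with the last scale parameter set to 0 the convolution
# branch outputs 0, so the block passes its (non-negative) input through.
block = BasicBlock(64, 64)
nn.init.zeros_(block.bn2.weight)
x = torch.rand(2, 64, 8, 8)
out = block(x)
identity_kept = torch.equal(out, x)
```

<p>Under these assumptions, ResNet(BasicBlock, [2, 2, 2, 2]) would give the usual 18-layer layout. The demo shows why an unhelpful block does no harm: when its branch contributes nothing, the shortcut still carries the input forward.</p>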

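<h1 id="较为复杂的图片识别神经网络鸢尾花集">A More Complex Image-Recognition Network: the Iris Flower Dataset</h1>

<p>torchvision has to be installed separately. It ships many ready-made components, such as the ResNet, VGG, and AlexNet models.</p>

<p>The three core modules of torchvision are transforms, datasets, and models.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Install with this command (run it in a terminal, not inside Python)
</span><span class="n">pip</span> <span class="n">install</span> <span class="n">torchvision</span>
<span class="c1"># After that, the three core torchvision modules are available
</span></code></pre></div></div>

<p>For example, to use the datasets module, copy the snippet from the documentation. The API is already written, so it only needs to be called.</p>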

<p><a href="https://pytorch.org/vision/stable/datasets.html">Datasets — Torchvision 0.16 documentation (pytorch.org)</a></p>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="Notes" /><category term="Notes" /><summary type="html"><![CDATA[CNN (convolutional neural networks)]]></summary></entry><entry><title type="html">Pandas Notes</title><link href="https://cendok.github.io/2023/09/04/Pandas-Notes" rel="alternate" type="text/html" title="Pandas Notes" /><published>2023-09-04T00:00:00+00:00</published><updated>2023-09-04T00:00:00+00:00</updated><id>https://cendok.github.io/2023/09/04/Pandas%20Notes</id><content type="html" xml:base="https://cendok.github.io/2023/09/04/Pandas-Notes"><![CDATA[<h1 id="pandas数据分析">Pandas Data Analysis</h1>

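<h2 id="前言">Preface</h2>

<p>Pandas is an open-source Python library for data analysis, data processing, and data visualization.</p>

<p>Plain Python can do the same work, but Pandas does it with much better performance.</p>

<p>It is much faster than hand-written for loops and works alongside other libraries, such as NumPy for numerical computation and scikit-learn for machine learning. Together they cover data analysis and machine learning well.</p>

<p>Anaconda comes with almost every library needed for machine learning preinstalled, and it also helps with environment problems.</p>

<p>Jupyter is interactive and exploratory: it suits going back to revise earlier steps and inspecting the result of each one.</p>

<p>PyCharm is a full-featured integrated development environment, suited to developing complex projects.</p>

<h2 id="读取数据将其他类型的文件读取成pandas数据结构">Reading Data: Loading Other File Types into Pandas Data Structures</h2>

<p>This covers reading tabular, two-dimensional data (rows and columns) into Pandas objects.</p>

<p>CSV files are comma-separated, TSV files are separated by \t, and plain .txt files may use any delimiter.</p>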

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">a</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>  <span class="c1"># comma-separated file
</span>
<span class="c1"># plain-text files: use read_csv with an explicit separator
</span><span class="n">a</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">"</span><span class="se">\t</span><span class="s">"</span><span class="p">)</span>

<span class="n">a</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>

<span class="n">a</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_sql</span><span class="p">(</span><span class="s">"select * from table_name"</span><span class="p">,</span> <span class="n">con</span><span class="o">=</span><span class="n">conn</span><span class="p">)</span>  <span class="c1"># conn is an open database connection
</span></code></pre></div></div>
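
<p>To try the separator handling without any files on disk, the sketch below reads the same small table once as CSV and once as TSV from in-memory text. The column names and values are made up for illustration.</p>

```python
import io

import pandas as pd

# Made-up sample data standing in for files on disk.
csv_text = "date,high,low\n2018-01-01,3,-6\n2018-01-02,2,-5\n"
tsv_text = csv_text.replace(",", "\t")

df_csv = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated text

print(df_csv.shape)
print(df_tsv.columns.tolist())
```

<p>Both calls should produce the same DataFrame, since only the delimiter differs.</p>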

<p>The database connection can be created with pymysql.connect from the pymysql library.</p>

<h3 id="csv文件逗号分割">CSV Files (Comma-Separated)</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">a</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>  <span class="c1"># first rows of the data (5 by default)
</span>
<span class="n">a</span><span class="p">.</span><span class="n">shape</span>  <span class="c1"># shape of the data as (rows, columns)
</span>
<span class="n">a</span><span class="p">.</span><span class="n">columns</span>  <span class="c1"># column names
</span>
<span class="n">a</span><span class="p">.</span><span class="n">index</span>  <span class="c1"># row index
</span></code></pre></div></div>

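<h2 id="pandas数据结构">Pandas Data Structures</h2>

<p>DataFrame and Series</p>

<p>Why have a separate Series type? One-dimensional data can be thought of as a dictionary, which makes it more convenient to handle than a two-dimensional DataFrame.</p>

<p>Both DataFrame and Series have an index.</p>

<h3 id="series的生成">Creating a Series</h3>

<h4 id="1通过转换列表得到">1. From a List</h4>

<p>Create a Series with a numeric index (the default).</p>

<p>Create a Series with a custom index.</p>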
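
<p>Both cases can be sketched in a few lines. The values and index labels here are arbitrary examples.</p>

```python
import pandas as pd

s_default = pd.Series([1, 2, 3])                        # index defaults to 0, 1, 2
s_custom = pd.Series([1, 2, 3], index=["a", "b", "c"])  # index is a, b, c

print(s_default.index.tolist())
print(s_custom.index.tolist())
```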

<h4 id="2通过字典生成">2. From a Dictionary</h4>
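
<p>A short sketch of the dictionary route, using a made-up dictionary: the keys become the index and the values become the data.</p>

```python
import pandas as pd

prices = {"apple": 3, "banana": 2, "cherry": 5}  # hypothetical data
s = pd.Series(prices)  # keys become the index, values become the data

print(s.index.tolist())
```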

<h3 id="series的读取">Reading from a Series</h3>
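
<p>As a minimal sketch with arbitrary example values, a Series can be read much like a dictionary: one label gives one value, and a list of labels gives a smaller Series.</p>

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

one_value = s["a"]      # a single label returns a scalar
subset = s[["a", "c"]]  # a list of labels returns a smaller Series
values = s.values       # the underlying NumPy array
```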

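<h2 id="pandas查询数据df为打开csv文件后创建的对象">Querying Data in Pandas (df is the object created by opening a .csv file)</h2>

<h2 id="loc和iloc的区别是什么">What Is the Difference Between .loc and .iloc?</h2>

<p>.loc and .iloc are both Pandas indexers for selecting rows (and, optionally, columns). loc stands for location, and the i in iloc stands for integer. They differ as follows:</p>

<ul>
  <li>loc: selects rows by row label (for example, tianqi).</li>
  <li>iloc: selects rows by integer position (0, 1, 2, 3).</li>
</ul>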
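
<p>The difference can be sketched on a tiny DataFrame. The column names and values below are made up and merely stand in for the weather data used in these notes.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [3, 2, 5], "low": [-6, -5, -3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],  # dates kept as plain strings
)

by_label = df.loc["2018-01-02"]  # row whose index label is "2018-01-02"
by_position = df.iloc[1]         # second row, counting from 0
```

<p>Here both expressions select the same row, once by its label and once by its position.</p>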

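<p><strong>Queries can reduce the dimensionality of the data:</strong></p>

<p>Selecting a single row or column from a DataFrame returns a Series, selecting a single element from a Series returns a scalar value, and a scalar cannot be reduced any further.</p>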
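
<p>A small sketch of this reduction, again on made-up data. Passing a list of columns instead of a single name is one way to keep the result two-dimensional.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [3, 2, 5], "low": [-6, -5, -3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

column = df["high"]           # DataFrame to Series
value = column["2018-01-02"]  # Series to scalar
still_2d = df[["high"]]       # a list of columns keeps a DataFrame
```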

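<p>The index defaults to integers starting from 0.</p>

<p>Data: beijing_tianqi_2018.csv</p>

<p>Dates are treated as ordinary strings.</p>

<h3 id="loc单标签查询">.loc Single-Label Queries</h3>

<h3 id="数值区间范围查询loc行列">Range Queries: .loc[rows, columns]</h3>

<p>Pass a row range and a single column.</p>

<p>Pass a column range and a single row.</p>

<p>Pass ranges for both rows and columns.</p>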
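
<p>The three cases can be sketched as follows. The DataFrame is made up for illustration. Note that label slices with .loc include both endpoints.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [3, 2, 5], "low": [-6, -5, -3], "wind": [1, 2, 3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

rows_one_col = df.loc["2018-01-01":"2018-01-02", "high"]  # row range, one column
cols_one_row = df.loc["2018-01-03", "high":"low"]         # one row, column range
both = df.loc["2018-01-02":"2018-01-03", "low":"wind"]    # ranges on both axes
```

<p>The first two results are Series and the third is a DataFrame, which matches the dimensionality reduction described earlier.</p>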

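<h3 id="条件表达式查询类似数据的查询语句编写查询语句loc行列">Conditional Queries, Written Like Database Query Statements: .loc[rows, columns]</h3>

<p>.loc[expression 1 (rows), expression 2 (columns)]</p>

<p>This returns the entries for which the conditional expression evaluates to True.</p>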
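
<p>A minimal sketch of a conditional query on made-up data: the comparison produces a boolean Series, and .loc keeps the rows where it is True.</p>

```python
import pandas as pd

df = pd.DataFrame(
    {"high": [3, 2, 5], "low": [-6, -5, -3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

warm = df["high"] > 2            # boolean Series, one True/False per row
warm_rows = df.loc[warm, :]      # all columns of the rows where warm is True
warm_lows = df.loc[warm, "low"]  # only the low column of those rows
```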

<p><strong>You can also write your own function and pass it in, defining your own query rule.</strong></p>]]></content><author><name>Cendok</name><email>Syi19691@gmail.com</email></author><category term="Notes" /><category term="Notes" /><summary type="html"><![CDATA[An open-source Python library]]></summary></entry></feed>