<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Multimodal Alignment | Tong Lu&#39;s Homepage</title>
    <link>https://lutong.space/tags/multimodal-alignment/</link>
      <atom:link href="https://lutong.space/tags/multimodal-alignment/index.xml" rel="self" type="application/rss+xml" />
    <description>Multimodal Alignment</description>
    <generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 01 Jan 2026 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://lutong.space/media/icon_hu_702a800cd775dbac.png</url>
      <title>Multimodal Alignment</title>
      <link>https://lutong.space/tags/multimodal-alignment/</link>
    </image>
    
    <item>
      <title>SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio</title>
      <link>https://lutong.space/publications/aaai2026-scimkg/</link>
      <pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://lutong.space/publications/aaai2026-scimkg/</guid>
      <description></description>
    </item>
    
    <item>
      <title>SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio</title>
      <link>https://lutong.space/post/aaai2026-scimkg/</link>
      <pubDate>Mon, 15 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://lutong.space/post/aaai2026-scimkg/</guid>
      <description>&lt;h2 id=&#34;-introduction&#34;&gt;🚀 Introduction&lt;/h2&gt;
&lt;p&gt;SciMKG is a large-scale multimodal educational knowledge graph (MEKG) covering text, images, videos, and audio for K-12 science education. It is automatically constructed using a novel LLM-powered pipeline for concept extraction and multimodal alignment.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Four modalities covered: text, image, video, audio&lt;/li&gt;
&lt;li&gt;1,356 knowledge points&lt;/li&gt;
&lt;li&gt;34,630 multimodal concepts&lt;/li&gt;
&lt;li&gt;403,400 triples&lt;/li&gt;
&lt;li&gt;10,527 images · 10,425 videos · 34,630 audio clips&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-framework&#34;&gt;🔥 Framework&lt;/h2&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;img src=&#34;scimkgFramework.png&#34; alt=&#34;SciMKG Framework&#34; width=&#34;800&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;SciMKG is built with an Extraction–Verification–Integration–Augmentation (EVIA) pipeline, followed by a multimodal alignment step:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Extraction:&lt;/strong&gt; Use multiple LLMs to extract K–12 science concepts from MOOC subtitles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Verification:&lt;/strong&gt; Apply self-feedback (SELF-REFINE) to prune ambiguous or irrelevant concepts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration:&lt;/strong&gt; Use self-consistency voting to merge the outputs of the different LLMs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Augmentation:&lt;/strong&gt; Expand concepts through ConceptNet and Wikipedia; generate rewritten text and audio.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multimodal Alignment:&lt;/strong&gt; Align images, videos, and audio to concepts using multimodal LLMs (e.g., GPT-4o, Gemini).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pipeline is designed for robustness, high precision, and semantic consistency across modalities.&lt;/p&gt;
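&lt;p&gt;To make the Integration step more concrete, here is a minimal sketch of self-consistency voting over concept lists returned by several extractors. The function name &lt;code&gt;merge_by_vote&lt;/code&gt;, the &lt;code&gt;min_votes&lt;/code&gt; threshold, and the lowercase normalization are illustrative assumptions; the actual voting rule used to build SciMKG is not described in this post.&lt;/p&gt;

```python
from collections import Counter


def merge_by_vote(extractions, min_votes=2):
    """Keep concepts proposed by at least `min_votes` extractors.

    Hypothetical helper: the threshold and the normalization are assumptions,
    not the rule used in the SciMKG pipeline.
    """
    votes = Counter()
    for concepts in extractions:
        # A set ensures each extractor votes at most once per concept.
        votes.update({c.strip().lower() for c in concepts})
    return sorted(c for c, n in votes.items() if n >= min_votes)


# Made-up outputs from three extractors for the same subtitle segment.
outputs = [
    ["Photosynthesis", "chlorophyll", "mitosis"],
    ["photosynthesis", "Chlorophyll"],
    ["photosynthesis", "osmosis"],
]
merged = merge_by_vote(outputs)
print(merged)
```

&lt;p&gt;Raising &lt;code&gt;min_votes&lt;/code&gt; trades recall for precision: only concepts that more extractors agree on survive the merge.&lt;/p&gt;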
&lt;h2 id=&#34;-installation--usage&#34;&gt;📦 Installation &amp;amp; Usage&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install scimkg
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;scimkg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;kg&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;scimkg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;video_path,pdf_path&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;triples&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;kg&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;build&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;subject&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;rdf&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;triples&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rdf&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;-dataset-statistics&#34;&gt;📊 Dataset Statistics&lt;/h2&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;th align=&#34;center&#34;&gt;Discipline&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Knowledge Points&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Concepts&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Exercises&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Triples&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Biology&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;526&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;16,839&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;255&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;191,928&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Physics&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;521&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;11,015&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;288&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;145,666&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Chemistry&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;309&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;6,776&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;220&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;65,806&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;th align=&#34;center&#34;&gt;Modality&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Items&lt;/th&gt;
    &lt;th align=&#34;center&#34;&gt;Concept Coverage&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Image&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;10,527&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;39%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Video&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;10,425&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;80%&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align=&#34;center&#34;&gt;Audio&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;34,630&lt;/td&gt;
    &lt;td align=&#34;center&#34;&gt;100%&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
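&lt;p&gt;The per-discipline rows can be cross-checked against the headline figures in the introduction. The short script below is a sketch that copies the numbers from the tables above into a plain dictionary (the variable names are illustrative) and sums them.&lt;/p&gt;

```python
# Figures copied from the per-discipline table above.
stats = {
    "Biology": {"knowledge_points": 526, "concepts": 16839, "exercises": 255, "triples": 191928},
    "Physics": {"knowledge_points": 521, "concepts": 11015, "exercises": 288, "triples": 145666},
    "Chemistry": {"knowledge_points": 309, "concepts": 6776, "exercises": 220, "triples": 65806},
}

columns = ("knowledge_points", "concepts", "exercises", "triples")
totals = {col: sum(row[col] for row in stats.values()) for col in columns}

# Audio is generated for every concept, so items / concepts should equal 1.0.
audio_items = 34630
audio_coverage = audio_items / totals["concepts"]

print(totals)
print(audio_coverage)
```

&lt;p&gt;The sums come to 1,356 knowledge points, 34,630 concepts, and 403,400 triples, matching the totals listed in the introduction.&lt;/p&gt;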
&lt;h2 id=&#34;-applications&#34;&gt;🧠 Applications&lt;/h2&gt;
&lt;p&gt;SciMKG enables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multimodal educational question answering&lt;/li&gt;
&lt;li&gt;Multimodal question generation&lt;/li&gt;
&lt;li&gt;Cross-modal knowledge retrieval&lt;/li&gt;
&lt;li&gt;Intelligent tutoring systems&lt;/li&gt;
&lt;li&gt;Science education agents&lt;/li&gt;
&lt;li&gt;Curriculum-level analytics&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-citation&#34;&gt;📄 Citation&lt;/h2&gt;
&lt;p&gt;If you use SciMKG or our construction framework, please cite:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bibtex&#34; data-lang=&#34;bibtex&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@article&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nl&#34;&gt;SciMKG2026&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;author&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{Tong Lu, Zhichun Wang, Yaoyu Zhou, Yiming Guan, Zhiyong Bai, Junsheng Du}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;year&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{2026}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;journal&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{AAAI}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
  </channel>
</rss>
