<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>
<channel>
	<title>Comments on: LLM.int8() and Emergent Features</title>
	<atom:link href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/feed/" rel="self" type="application/rss+xml" />
	<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/</link>
	<description>Making deep learning accessible.</description>
	<lastBuildDate>Tue, 06 Sep 2022 16:54:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.0.11</generator>
	<item>
		<title>By: Vivek Kumar</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111821</link>

		<dc:creator><![CDATA[Vivek Kumar]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 16:54:14 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111821</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111811&quot;&gt;Tim Dettmers&lt;/a&gt;.

Thank you, Tim!
We are looking at scattered sparsity to boost inference performance. And it does come with an additional penalty; anything above 20% sparsity is a bonus.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111811">Tim Dettmers</a>.</p>
<p>Thank you, Tim!<br />
We are looking at scattered sparsity to boost inference performance. And it does come with an additional penalty; anything above 20% sparsity is a bonus.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111814</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:35:13 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111814</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111484&quot;&gt;awfidius&lt;/a&gt;.

Thanks for pointing out the error!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111484">awfidius</a>.</p>
<p>Thanks for pointing out the error!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111813</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:31:58 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111813</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110730&quot;&gt;Samuel&lt;/a&gt;.

Thanks for pointing this out!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110730">Samuel</a>.</p>
<p>Thanks for pointing this out!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111812</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:31:35 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111812</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110699&quot;&gt;sva&lt;/a&gt;.

Thank you for noticing — fixed!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110699">sva</a>.</p>
<p>Thank you for noticing — fixed!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111811</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:30:57 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111811</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110744&quot;&gt;Vivek Kumar&lt;/a&gt;.

The sparsity is in the multiplication of the hidden state, so we can only prune dynamically with each example. Theoretically, that is possible, but it's difficult to do efficiently since the pattern is scattered and not well accelerated on hardware. However, while the sparsity can be very high (99%), some samples are relatively dense (40% sparsity).]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110744">Vivek Kumar</a>.</p>
<p>The sparsity is in the multiplication of the hidden state, so we can only prune dynamically with each example. Theoretically, that is possible, but it's difficult to do efficiently since the pattern is scattered and not well accelerated on hardware. However, while the sparsity can be very high (99%), some samples are relatively dense (40% sparsity).</p>
]]></content:encoded>
		
			</item>
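	<!-- A minimal Python sketch (an illustration, not code from the post) of the point made in
	     the reply above: the zero pattern of the hidden state depends on the input, so sparsity
	     can only be measured, and pruned, dynamically per example. The magnitude threshold and
	     tensor shapes here are assumptions.

	import torch

	def per_example_sparsity(h: torch.Tensor, threshold: float = 1e-6) -> torch.Tensor:
	    # h: hidden states of shape (batch, seq_len, dim).
	    # Count entries whose magnitude falls below the threshold as zeros and
	    # return the fraction of such entries for each example in the batch.
	    zeros = h.abs() < threshold
	    return zeros.float().mean(dim=(1, 2))

	# Some examples may come out around 99% sparse while others are closer to
	# 40%, which is why a fixed (static) pruning mask cannot capture the pattern.
	-->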
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111810</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:28:33 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111810</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110661&quot;&gt;Ben Harper&lt;/a&gt;.

Thank you :)]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110661">Ben Harper</a>.</p>
<p>Thank you 🙂</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Tim Dettmers</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111809</link>

		<dc:creator><![CDATA[Tim Dettmers]]></dc:creator>
		<pubDate>Tue, 06 Sep 2022 14:28:21 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111809</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110660&quot;&gt;Jonathan Kummerfeld&lt;/a&gt;.

Yes, it seems that is possible. In a current collaboration, we are altering the transformer architecture and seeing faster training. However, we have not analyzed the dynamics of the outliers.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110660">Jonathan Kummerfeld</a>.</p>
<p>Yes, it seems that is possible. In a current collaboration, we are altering the transformer architecture and seeing faster training. However, we have not analyzed the dynamics of the outliers.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: awfidius</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-111484</link>

		<dc:creator><![CDATA[awfidius]]></dc:creator>
		<pubDate>Wed, 31 Aug 2022 10:57:19 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-111484</guid>

					<description><![CDATA[Couple of buglets in the (very nice) I5-&#062;I3 example. 

Let’s do an example. Let’s say we have the vector [3, 1, 2, 3] in I5, and we want to quantize to I3.

Here the step-by-step recipe for quantization:

We find the absolute maximum value of the vector: [3, 1, 2, 3] -&#062; 3
Then we divide by that value: [3, 1, 2, 3] -&#062; [1, 0.33, 0.66, 1.0]
And now we multiple by the range of the target data type I3, which is 4: [1, 0.33, 0.66, 1.0] -&#062; [4.0, 1.33, 2.66, 4.0]
Now we round to the nearest value: [4.0, 1.33, 2.66, 4.0] -&#062; [4, 2**, 2, 4]
We now converted [3, 1, 2, 3**] in I5 to [4, 2**, 2, 4] in I3. To dequantize, we reverse this process.

Divide by 4: [4, 2*, 2, 4] -&#062; [1.0, 0.5, 0.5, 1.0]
Multiply by the absolute maximum: [1.0, 0.5, 0.5, 1.0] -&#062; [3.0, 1.5, 1.5, 3.0]
Now we round again: [3.0, 0.0, 1.5, 3.0] -&#062; [3, 2, 2, 3]
We see that our dequantization and quantization led to *an* error:
[3, 1, 2, 3] to [3, 2, 2, 3]
The second element changed from 1 to 2. This is a quantization error that leads to the loss of information in terms of how precise the information is encoded. If we have such errors and propagate them through many layers of a neural network, they accumulate, and they may change the result of a prediction and degrade the prediction quality.]]></description>
			<content:encoded><![CDATA[<p>Couple of buglets in the (very nice) I5-&gt;I3 example. </p>
<p>Let’s do an example. Let’s say we have the vector [3, 1, 2, 3] in I5, and we want to quantize to I3.</p>
<p>Here the step-by-step recipe for quantization:</p>
<p>We find the absolute maximum value of the vector: [3, 1, 2, 3] -&gt; 3<br />
Then we divide by that value: [3, 1, 2, 3] -&gt; [1, 0.33, 0.66, 1.0]<br />
And now we multiple by the range of the target data type I3, which is 4: [1, 0.33, 0.66, 1.0] -&gt; [4.0, 1.33, 2.66, 4.0]<br />
Now we round to the nearest value: [4.0, 1.33, 2.66, 4.0] -&gt; [4, 2**, 2, 4]<br />
We now converted [3, 1, 2, 3**] in I5 to [4, 2**, 2, 4] in I3. To dequantize, we reverse this process.</p>
<p>Divide by 4: [4, 2*, 2, 4] -&gt; [1.0, 0.5, 0.5, 1.0]<br />
Multiply by the absolute maximum: [1.0, 0.5, 0.5, 1.0] -&gt; [3.0, 1.5, 1.5, 3.0]<br />
Now we round again: [3.0, 0.0, 1.5, 3.0] -&gt; [3, 2, 2, 3]<br />
We see that our dequantization and quantization led to *an* error:<br />
[3, 1, 2, 3] to [3, 2, 2, 3]<br />
The second element changed from 1 to 2. This is a quantization error that leads to the loss of information in terms of how precise the information is encoded. If we have such errors and propagate them through many layers of a neural network, they accumulate, and they may change the result of a prediction and degrade the prediction quality.</p>
]]></content:encoded>
		
			</item>
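	<!-- A minimal Python sketch of the absmax quantization round trip discussed in the comment
	     above. The function names are illustrative; the target range of 4 corresponds to the I3
	     example, and the values shown assume Python's built-in rounding.

	def absmax_quantize(x, levels=4):
	    # Scale by the absolute maximum, then map onto the target integer range.
	    absmax = max(abs(v) for v in x)                  # [3, 1, 2, 3] gives absmax 3
	    return [round(v / absmax * levels) for v in x], absmax

	def absmax_dequantize(q, absmax, levels=4):
	    # Reverse the mapping: back to the unit range, then rescale and round.
	    return [round(v / levels * absmax) for v in q]

	q, m = absmax_quantize([3, 1, 2, 3])    # q = [4, 1, 3, 4]
	x = absmax_dequantize(q, m)             # x = [3, 1, 2, 3]; any element that does not
	                                        # match the input is a quantization error
	-->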
		<item>
		<title>By: Vivek Kumar</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110744</link>

		<dc:creator><![CDATA[Vivek Kumar]]></dc:creator>
		<pubDate>Fri, 19 Aug 2022 19:22:08 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-110744</guid>

					<description><![CDATA[Thanks for such a detailed blog post. Very helpful!
I have a question about the section &quot;How Emergent Features Emerge&quot;, topic #2, &quot;Attention layers become very sparse&quot;. Is this sparsity &#062; 20%? Could we just use pruning during training to handle the sparsity well? To improve performance during inference, is it good to just use H/W-supported sparsity (block or scattered)?
Any comments on improving inference performance would be helpful.
Thank you.]]></description>
			<content:encoded><![CDATA[<p>Thanks for such a detailed blog post. Very helpful!<br />
I have a question about the section &#8220;How Emergent Features Emerge&#8221;, topic #2, &#8220;Attention layers become very sparse&#8221;. Is this sparsity &gt; 20%? Could we just use pruning during training to handle the sparsity well? To improve performance during inference, is it good to just use H/W-supported sparsity (block or scattered)?<br />
Any comments on improving inference performance would be helpful.<br />
Thank you.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>By: Samuel</title>
		<link>https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/comment-page-1/#comment-110730</link>

		<dc:creator><![CDATA[Samuel]]></dc:creator>
		<pubDate>Fri, 19 Aug 2022 11:00:25 +0000</pubDate>
		<guid isPermaLink="false">https://timdettmers.com/?p=1093#comment-110730</guid>

					<description><![CDATA[Error in the first quantization example. 

The vector originally posited (first example) was [3,1,2,3], not [3,1,2,4]. This leads the author to erroneously cite two errors when there is only one.]]></description>
			<content:encoded><![CDATA[<p>Error in the first quantization example. </p>
<p>The vector originally posited (first example) was [3,1,2,3], not [3,1,2,4]. This leads the author to erroneously cite two errors when there is only one.</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
