<feed xmlns='http://www.w3.org/2005/Atom'>
<title>uClibc-alpine/libc/string/sh, branch master</title>
<subtitle>Where we track our uclibc patches
</subtitle>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/'/>
<entry>
<title>sh: fix memcpy saving/restoring FR12-FR15 registers</title>
<updated>2010-12-14T07:08:36+00:00</updated>
<author>
<name>Giuseppe Cavallaro</name>
<email>peppe.cavallaro@st.com</email>
</author>
<published>2010-12-13T10:39:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=640220faf25659eb4c15b78cf8869251dbadbd16'/>
<id>640220faf25659eb4c15b78cf8869251dbadbd16</id>
<content type='text'>
This patch fixes a bug in memcpy, which did not save/restore
the FR12-FR15 registers (callee-saved registers in the ST40 ABI) while
copying many cache lines with the FPU in single paired precision mode
and using all FPU registers (DR and XD).

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Reviewed-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;

Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fixes a bug in memcpy, which did not save/restore
the FR12-FR15 registers (callee-saved registers in the ST40 ABI) while
copying many cache lines with the FPU in single paired precision mode
and using all FPU registers (DR and XD).

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Reviewed-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;

Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh: update the memcpy adding a new loop with aggressive prefetching</title>
<updated>2010-09-15T10:51:04+00:00</updated>
<author>
<name>Salvatore Cro</name>
<email>salvatore.cro@st.com</email>
</author>
<published>2010-09-09T14:10:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=a27dd6924e7964d92b49f4d5ebe2e68cfb2742dd'/>
<id>a27dd6924e7964d92b49f4d5ebe2e68cfb2742dd</id>
<content type='text'>
After exploring different prefetch distance-degree combinations,
this update of the memcpy function adds a new loop
for moving many cache lines with an aggressive prefetching scheme.
Prefetching has been removed when moving only a few cache-line-aligned blocks.
As a result, this memcpy gives the same performance as before for small
sizes and better numbers for big copies.
On the SH4-300 CPU series, benchmarks show a gain of ~20% for sizes
from 4KiB to 256KiB.
On the SH4-200, there is a gain of ~40% for sizes bigger than
32KiB.

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After exploring different prefetch distance-degree combinations,
this update of the memcpy function adds a new loop
for moving many cache lines with an aggressive prefetching scheme.
Prefetching has been removed when moving only a few cache-line-aligned blocks.
As a result, this memcpy gives the same performance as before for small
sizes and better numbers for big copies.
On the SH4-300 CPU series, benchmarks show a gain of ~20% for sizes
from 4KiB to 256KiB.
On the SH4-200, there is a gain of ~40% for sizes bigger than
32KiB.

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh: move data without fetching cache block within the memset</title>
<updated>2010-09-15T10:42:09+00:00</updated>
<author>
<name>Salvatore Cro</name>
<email>salvatore.cro@st.com</email>
</author>
<published>2010-09-09T14:08:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=599c74a4d7e9bbe68b946d65aef2725821ea3fe9'/>
<id>599c74a4d7e9bbe68b946d65aef2725821ea3fe9</id>
<content type='text'>
With this patch the movca.l instruction is used within memset.
The current memset implementation only uses the FPU, which gives
a real gain for all sizes.
With movca.l added, the numbers are still always better than the generic code.
There is a big gain for sizes of 64 KiB and above, but the numbers are worse for
sizes up to 32 KiB compared with the implementation without movca.l.

            Time Memory Bandwidth (Mbytes)
-------------------------------------------------
Size          Generic       SH4          SH4
                           (FPU)    (FPU+movca.l)
-------------------------------------------------
512 bytes      1143         1998         1596
1 KiB          1273         2567         1915
2 KiB          1350         2993         2128
4-32KiB        1391         3262         2252
64KiB-16MiB     170          186        *830*

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With this patch the movca.l instruction is used within memset.
The current memset implementation only uses the FPU, which gives
a real gain for all sizes.
With movca.l added, the numbers are still always better than the generic code.
There is a big gain for sizes of 64 KiB and above, but the numbers are worse for
sizes up to 32 KiB compared with the implementation without movca.l.

            Time Memory Bandwidth (Mbytes)
-------------------------------------------------
Size          Generic       SH4          SH4
                           (FPU)    (FPU+movca.l)
-------------------------------------------------
512 bytes      1143         1998         1596
1 KiB          1273         2567         1915
2 KiB          1350         2993         2128
4-32KiB        1391         3262         2252
64KiB-16MiB     170          186        *830*

Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>use uniform form of C99 keywords</title>
<updated>2010-06-24T13:10:48+00:00</updated>
<author>
<name>Bernhard Reutner-Fischer</name>
<email>rep.dot.nop@gmail.com</email>
</author>
<published>2010-06-24T13:10:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=1dc2afe522b1c6d23c4d16b23e083cc38c69da55'/>
<id>1dc2afe522b1c6d23c4d16b23e083cc38c69da55</id>
<content type='text'>
Signed-off-by: Bernhard Reutner-Fischer &lt;rep.dot.nop@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Bernhard Reutner-Fischer &lt;rep.dot.nop@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh4: Fixes for SH-4 without an FPU</title>
<updated>2010-06-14T07:39:44+00:00</updated>
<author>
<name>Carmelo Amoroso</name>
<email>carmelo.amoroso@st.com</email>
</author>
<published>2010-06-14T07:39:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=3e0b19f27bba5c47cfe4ea836ef94aad687c14b4'/>
<id>3e0b19f27bba5c47cfe4ea836ef94aad687c14b4</id>
<content type='text'>
This patch disables SH-4 optimizations that rely on the FPU when
building for variants that don't have an FPU, such as SH-4AL.

Signed-off-by: Andrew Stubbs &lt;ams@codesourcery.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch disables SH-4 optimizations that rely on the FPU when
building for variants that don't have an FPU, such as SH-4AL.

Signed-off-by: Andrew Stubbs &lt;ams@codesourcery.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh: Add new optimisation to the SH4 memcpy</title>
<updated>2009-11-22T20:17:38+00:00</updated>
<author>
<name>Austin Foxley</name>
<email>austinf@cetoncorp.com</email>
</author>
<published>2009-11-22T20:17:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=5c9ef58ec4bcb2def9e30f0b156f9cfcb1d0d163'/>
<id>5c9ef58ec4bcb2def9e30f0b156f9cfcb1d0d163</id>
<content type='text'>
This optimization is based on prefetching and 64-bit data transfers via the FPU
(little-endian only).

Tests show that:

  ----------------------------------------
  Memory bandwidth    |        Gain
                      | sh4-300 | sh4-200
  ----------------------------------------
  512 bytes to 16KiB  | ~20%    | ~25%
  from 32KiB to 16MiB | ~190%   | ~5%
  ----------------------------------------

Signed-off-by: Austin Foxley &lt;austinf@cetoncorp.com&gt;
Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This optimization is based on prefetching and 64-bit data transfers via the FPU
(little-endian only).

Tests show that:

  ----------------------------------------
  Memory bandwidth    |        Gain
                      | sh4-300 | sh4-200
  ----------------------------------------
  512 bytes to 16KiB  | ~20%    | ~25%
  from 32KiB to 16MiB | ~190%   | ~5%
  ----------------------------------------

Signed-off-by: Austin Foxley &lt;austinf@cetoncorp.com&gt;
Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh: Fix up optimized SH-4 memcpy on big endian.</title>
<updated>2009-07-14T05:56:25+00:00</updated>
<author>
<name>Giuseppe Cavallaro</name>
<email>peppe.cavallaro@st.com</email>
</author>
<published>2009-07-13T15:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=9952af00f1cc08a5605d6c57578d09a89adb64f4'/>
<id>9952af00f1cc08a5605d6c57578d09a89adb64f4</id>
<content type='text'>
Signed-off-by: Hideo Saito &lt;saito@densan.co.jp&gt;
Signed-off-by: Paul Mundt &lt;lethal@linux-sh.org&gt;
Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;

See Linux Kernel commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e08b954c9a140f2062649faec72514eb505f18c3
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Hideo Saito &lt;saito@densan.co.jp&gt;
Signed-off-by: Paul Mundt &lt;lethal@linux-sh.org&gt;
Signed-off-by: Giuseppe Cavallaro &lt;peppe.cavallaro@st.com&gt;

See Linux Kernel commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e08b954c9a140f2062649faec72514eb505f18c3
Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add optimized memcpy implementation for sh4 (from Stuart Menefy @STMicroelectronics).</title>
<updated>2008-09-09T16:55:27+00:00</updated>
<author>
<name>Carmelo Amoroso</name>
<email>carmelo.amoroso@st.com</email>
</author>
<published>2008-09-09T16:55:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git-old.alpinelinux.org/uClibc-alpine/commit/?id=e4f55f33f69fce85099dd5936cc74856aa1b453d'/>
<id>e4f55f33f69fce85099dd5936cc74856aa1b453d</id>
<content type='text'>
This implementation is based on 'backward copying'.

Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This implementation is based on 'backward copying'.

Signed-off-by: Carmelo Amoroso &lt;carmelo.amoroso@st.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
