path: root/libc/string/sh/sh4/memcpy.S
Commit message | Author | Age | Files | Lines
* sh: fix memcpy saving/restoring FR12-FR15 registers | Giuseppe Cavallaro | 2010-12-14 | 1 | -3/+15
  This patch fixes a bug in memcpy, which did not save and restore the FR12-FR15 registers (callee-saved registers in the ST40 ABI) when copying many cache lines with the FPU in single paired-precision mode and using all FPU registers (DR and XD).

  Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
  Reviewed-by: Carmelo Amoroso <carmelo.amoroso@st.com>
  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
* sh: update the memcpy adding a new loop with aggressive prefetching | Salvatore Cro | 2010-09-15 | 1 | -21/+107
  After exploring different combinations of prefetch distance and degree, this update of the memcpy function adds a new loop that moves many cache lines with an aggressive prefetching scheme. Prefetching has been removed when moving only a few cache-line-aligned blocks. As a result, this memcpy gives the same performance as before for small sizes and better numbers for big copies. On the SH4-300 CPU series, benchmarks show a gain of ~20% for sizes from 4KiB to 256KiB. On the SH4-200, there is a gain of ~40% for sizes bigger than 32KiB.

  Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
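  The idea described in this commit (prefetch ahead while moving many cache lines, skip prefetching for short copies) can be sketched in C roughly as follows. This is only an illustration, not the assembly from memcpy.S: the name copy_with_prefetch, the prefetch distance, and the size threshold are hypothetical, the 32-byte line size is an assumption about the target, and the sketch copies forward for simplicity. __builtin_prefetch is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration only; the real code was tuned by benchmark. */
#define CACHE_LINE         32                  /* assumed cache line size in bytes */
#define PREFETCH_LINES     4                   /* hypothetical prefetch distance   */
#define PREFETCH_THRESHOLD (8 * CACHE_LINE)    /* hypothetical "many lines" cutoff */

/* Hypothetical helper: copy n bytes, prefetching the source ahead of use
 * only when the copy spans many cache lines. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= PREFETCH_THRESHOLD) {
        while (n >= CACHE_LINE) {
            /* Prefetch only while the target address is still inside the buffer. */
            if (n > (size_t)PREFETCH_LINES * CACHE_LINE)
                __builtin_prefetch(s + PREFETCH_LINES * CACHE_LINE, 0, 0);
            memcpy(d, s, CACHE_LINE);          /* move one cache line */
            d += CACHE_LINE;
            s += CACHE_LINE;
            n -= CACHE_LINE;
        }
    }
    memcpy(d, s, n);                           /* short copy or remaining tail, no prefetch */
    return dst;
}
```

  Keeping the short-copy path free of prefetches mirrors the commit's observation that prefetching does not help when only a few blocks are moved.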
* sh4: Fixes for SH-4 without an FPU | Carmelo Amoroso | 2010-06-14 | 1 | -1/+1
  This patch disables SH-4 optimizations that rely on the FPU when building for variants that don't have an FPU, such as SH-4AL.

  Signed-off-by: Andrew Stubbs <ams@codesourcery.com>
  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
* sh: Add new optimisation to the SH4 memcpy | Austin Foxley | 2009-11-22 | 1 | -10/+110
  This optimization is based on prefetching and 64-bit data transfer via the FPU (little-endian only).

  Tests show that:

  ------------------------------------------
  Memory bandwidth      | Gain
                        | sh4-300 | sh4-200
  ------------------------------------------
  512 bytes to 16KiB    | ~20%    | ~25%
  from 32KiB to 16MiB   | ~190%   | ~5%
  ------------------------------------------

  Signed-off-by: Austin Foxley <austinf@cetoncorp.com>
  Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
* sh: Fix up optimized SH-4 memcpy on big endian. | Giuseppe Cavallaro | 2009-07-14 | 1 | -12/+12
  Signed-off-by: Hideo Saito <saito@densan.co.jp>
  Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

  See Linux Kernel commit:
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e08b954c9a140f2062649faec72514eb505f18c3

  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
* Add optimized memcpy implementation for sh4 (from Stuart Menefy @STMicroelectronics). | Carmelo Amoroso | 2008-09-09 | 1 | -0/+808
  This implementation is based on 'backward copying'.

  Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
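
  For readers unfamiliar with the term, 'backward copying' means walking both buffers from their last byte toward their first. A minimal byte-wise sketch in C, assuming nothing about the actual assembly beyond the copy direction (the name backward_copy is hypothetical, and the real routine moves far larger units per step):

```c
#include <stddef.h>

/* Hypothetical illustration of backward copying: start just past the end of
 * both buffers and move toward the start, one byte at a time. */
void *backward_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst + n;
    const unsigned char *s = (const unsigned char *)src + n;

    while (n--)
        *--d = *--s;   /* pre-decrement so the last byte is copied first */

    return dst;
}
```

  A side effect of this direction in the sketch is that it tolerates overlap when dst lies above src, although memcpy callers must not rely on any overlap behaviour.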