path: root/community/xxhash/lift-XXH_FORCE_MEMORY_ACCESS-condition.patch
Commit message: community/xxhash: fix 20x speed degradation on x86*, upgrade to 0.6.5
Author: alpine-mips-patches
Date: 2018-12-18
Files: 1
Lines: -0/+14

Yes, it is 20 times slower on x86* than it should be, because xxhash.c always uses the "safe" memcpy()-based methods for unaligned memory access (XXH_readXX) regardless of input alignment, due to the x86-default XXH_FORCE_ALIGN_CHECK=0. This ends up with real memcpy() calls in the hot path (with -O2 too).

The bug affects Alpine x86* (not just edge, but at least 3.8 too -- i.e. this is not something introduced in 0.6.5) for both aligned and unaligned inputs. Other architectures are severely affected for unaligned inputs only.

The fix lifts the XXH_FORCE_MEMORY_ACCESS=1 condition to enable the XXH_readXX methods based on __attribute__((__packed__)) everywhere except ARMv6 (which is covered by its own case earlier). This is safe and fast because the compiler will either:
- use direct load/store instructions on capable architectures such as aarch64, armv7, ppc64le, s390x, x86*, regardless of input alignment;
- or use the relatively fast LWL/LWR instructions on mips* with unaligned input;
- or use byte loads/stores and shifts/ors on armel with unaligned input, which is still faster than a memcpy() call.

All aports that use xxhash.c are likely affected. For example, community/zstd suffers too, though less severely (~15% difference for "zstd -t" on a big archive), and main/lz4 is twice as slow on basic compression levels.

Other aport changes:
- modernize;
- enable check(); it is short and fast, so it is suitable for slow builders too.

The python part is left intact, though a newer version exists.
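
The two read strategies contrasted above can be sketched roughly as follows. This is an illustration only, not the code from xxhash.c or from the patch: the names read32_memcpy, read32_packed and unalign32 are hypothetical, the __may_alias__ attribute is an addition made here for strict-aliasing safety, and the packed variant assumes a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>
#include <string.h>

/* "Safe" variant: copy the bytes into a local variable.
 * Fast only if the compiler inlines memcpy(); otherwise every
 * read in the hot path becomes a real function call. */
static uint32_t read32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Packed variant (GCC/Clang extension): the packed attribute lowers
 * the type's alignment requirement to 1, so the compiler itself picks
 * the cheapest load sequence that is valid on the target.
 * __may_alias__ is an assumption added for this sketch. */
typedef union {
    uint32_t u32;
} __attribute__((__packed__, __may_alias__)) unalign32;

static uint32_t read32_packed(const void *p)
{
    return ((const unalign32 *)p)->u32;
}
```

Both functions return the same value for any readable address; they differ only in the instructions the compiler is allowed or likely to emit for the load.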