...
Updates
- Testing on the K230 board shows only a 17% runtime improvement, against a target of 50% for x264 vectorization (a 50% runtime reduction would double the SPEC score)
- However, it looks like the cost of a vector ALU op on that board is at least 3 × LMUL cycles
- So a performant uarch where vector ALU ops of reasonable size (128 bits) take 1 cycle should see the expected 50% runtime improvement.
- Considering this resolved.
- Dynamic instruction counts are cut by 47%, which is in the right ballpark for a 2X performance improvement
- x86 shows roughly an 88% improvement (i.e., runtime nearly cut in half)
- aarch64 shows roughly a 104% improvement (i.e., runtime cut by more than 50%)
- 47% reduction in dynamic instruction counts is in the right ballpark
- Need to do performance testing to reach closure
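
The percentages above mix two conventions: runtime reduction (17%, 50%) and performance improvement (88%, 104%). A minimal sketch of the conversion between them follows. It assumes the x86 and aarch64 figures are speedups (1.88X and 2.04X), which is consistent with the parenthetical runtime remarks, and that a 47% drop in dynamic instruction count translates to runtime only if IPC stays constant. The function names are illustrative and not part of any tool used here.

```python
def speedup_from_runtime_reduction(reduction: float) -> float:
    """Speedup factor (old_time / new_time) for a fractional runtime reduction."""
    return 1.0 / (1.0 - reduction)


def runtime_reduction_from_speedup(speedup: float) -> float:
    """Fractional runtime reduction for a given speedup factor."""
    return 1.0 - 1.0 / speedup


# Target: a 50% runtime reduction doubles the score.
print(f"target:  {speedup_from_runtime_reduction(0.50):.2f}x")   # 2.00x

# K230 measurement: 17% runtime reduction.
print(f"K230:    {speedup_from_runtime_reduction(0.17):.2f}x")   # 1.20x

# 47% fewer dynamic instructions, ASSUMING constant IPC.
print(f"icount:  {speedup_from_runtime_reduction(0.47):.2f}x")   # 1.89x

# x86 and aarch64, ASSUMING "88%" and "104%" are performance improvements.
print(f"x86:     {runtime_reduction_from_speedup(1.88):.1%}")    # 46.8%
print(f"aarch64: {runtime_reduction_from_speedup(2.04):.1%}")    # 51.0%
```

Under these assumptions the 47% instruction-count reduction corresponds to about 1.89X, close to the x86 result, while the K230 measurement corresponds to only about 1.20X.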
...