Software pipelining (SWP) is an effective loop-optimization technique and an important compiler technique for exploiting instruction-level parallelism (ILP) in loops.
Software pipelining is a loop-scheduling technique that extracts ILP by overlapping the execution of several consecutive iterations. It is one of the key compiler optimizations for exploiting ILP in loop code.
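The following minimal Python sketch makes the overlap concrete. It assumes a hypothetical three-stage loop body (`load`, `mul`, `store`) in which every stage takes one cycle and a new iteration starts every `ii` cycles (the initiation interval). It ignores dependences and resource limits, which a real modulo scheduler must respect when it chooses `ii`.

```python
"""Toy model of a software-pipelined loop schedule (hypothetical example)."""

STAGES = ["load", "mul", "store"]  # assumed 3-stage loop body, 1 cycle per stage


def pipelined_schedule(n_iters: int, ii: int = 1) -> dict[int, list[str]]:
    """Map cycle -> 'stage(i)' slots when iteration i starts at cycle i * ii."""
    table: dict[int, list[str]] = {}
    for i in range(n_iters):
        for s, stage in enumerate(STAGES):
            table.setdefault(i * ii + s, []).append(f"{stage}({i})")
    return table


def sequential_cycles(n_iters: int) -> int:
    # Without overlap, each iteration runs all stages before the next one starts.
    return n_iters * len(STAGES)


def pipelined_cycles(n_iters: int, ii: int = 1) -> int:
    # The last iteration starts at (n - 1) * ii and needs len(STAGES) cycles to drain.
    return (n_iters - 1) * ii + len(STAGES)


if __name__ == "__main__":
    n = 5
    for cycle, slots in sorted(pipelined_schedule(n).items()):
        print(cycle, slots)
    print("sequential:", sequential_cycles(n), "pipelined:", pipelined_cycles(n))
```

With `n = 5` and `ii = 1`, cycles 0 to 1 form the prologue, cycles 2 to 4 form the kernel (three iterations in flight at once, e.g. `store(0)`, `mul(1)`, `load(2)` in cycle 2), and cycles 5 to 6 form the epilogue. The loop finishes in 7 cycles instead of the 15 needed without overlap.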
EPIC (Explicitly Parallel Instruction Computing) defines a new style of architecture that enables higher levels of instruction-level parallelism (ILP) without unacceptable hardware complexity. Its central idea is to raise ILP through cooperation between the compiler and the processor.
Today, multimedia and network services of all kinds are flourishing. Exploiting traditional ILP alone is far from enough to meet the performance demands that these services place on microprocessors.
SMARCOF is built on the DLX simulator and extended with SMA-specific extensions and heuristic optimization rules. Simulation of SPEC code shows that these rules exploit hybrid parallelism effectively with rather low overhead, achieving deep exploitation of both instruction-level and thread-level parallelism.
State-of-the-art microprocessors exploit instruction-level parallelism (ILP) to achieve high performance. They search a dynamic window of instructions for independent instructions and execute them on a wide-issue pipeline. For the sequential programs that make up most current software, exploiting ILP is the main way a microprocessor improves performance.
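The sketch below is a toy Python model of issuing from a dynamic window. It rests on stated assumptions: register renaming removes WAR/WAW hazards, so only true (RAW) dependences delay issue. Each cycle, up to `width` ready instructions issue from the `window` oldest instructions that have not yet issued. This is a simplification, since a real window also holds issued but unretired instructions. `Instr`, `issue_schedule`, and the example program are hypothetical names.

```python
"""Toy model: wide issue from a dynamic instruction window (hypothetical)."""
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple[str, ...]
    latency: int = 1  # cycles until the result can be used


def issue_schedule(program: list[Instr], window: int, width: int) -> dict[str, int]:
    """Return the issue cycle of each instruction (RAW dependences only)."""
    # Resolve each source to its most recent earlier producer (RAW edges).
    deps: list[list[int]] = []
    last_writer: dict[str, int] = {}
    for idx, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = idx

    issued: dict[int, int] = {}  # instruction index -> issue cycle
    cycle = 0
    while len(issued) < len(program):
        # The window holds the oldest not-yet-issued instructions, in program order.
        pending = [i for i in range(len(program)) if i not in issued][:window]
        picked: list[int] = []
        for i in pending:
            if len(picked) == width:
                break
            if all(d in issued and issued[d] + program[d].latency <= cycle for d in deps[i]):
                picked.append(i)
        for i in picked:
            issued[i] = cycle
        cycle += 1
    return {program[i].name: c for i, c in issued.items()}


EXAMPLE = [
    Instr("ld r1", "r1", (), latency=3),  # long-latency load
    Instr("add r2", "r2", ("r1",)),       # waits for the load
    Instr("mul r3", "r3", ("r4",)),       # independent of the load
    Instr("sub r5", "r5", ("r3",)),
    Instr("or r6", "r6", ("r2", "r5")),
]
```

With `window=4, width=2`, `mul r3` and `sub r5` issue in cycles 0 and 1, while `add r2` waits for the load until cycle 3. The last instruction therefore issues in cycle 4. With `window=1` (effectively in-order issue), the same program's last instruction issues in cycle 6.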
A multithreaded microprocessor, in which several hardware contexts share one set of execution units, can flexibly and efficiently exploit both instruction-level and thread-level parallelism. This yields higher performance and a better performance/power ratio.
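The next sketch is a rough Python illustration of how sharing one core among hardware contexts hides latency. It assumes fine-grained multithreading on a single-issue core that picks round-robin among ready contexts. Simultaneous multithreading, which issues from several contexts in the same cycle, is not modeled, and the source does not say which scheme it means. Each thread is a hypothetical list of stall values: after an instruction with stall `s` issues in cycle `c`, that thread cannot issue again before cycle `c + 1 + s`.

```python
"""Toy model: several hardware contexts sharing one single-issue core (hypothetical)."""


def run(threads: list[list[int]]) -> int:
    """Return the number of cycles needed to issue every instruction of every thread."""
    n = len(threads)
    pc = [0] * n        # next instruction of each thread
    ready = [0] * n     # earliest cycle each thread may issue again
    remaining = sum(len(t) for t in threads)
    last = -1           # context that issued most recently (for round-robin)
    cycle = 0
    while remaining:
        # Try contexts in round-robin order, starting after the last one that issued.
        for k in range(1, n + 1):
            t = (last + k) % n
            if pc[t] < len(threads[t]) and ready[t] <= cycle:
                ready[t] = cycle + 1 + threads[t][pc[t]]
                pc[t] += 1
                last = t
                remaining -= 1
                break
        cycle += 1  # a cycle passes whether or not anything issued
    return cycle


A = [0, 4, 0, 4, 0]  # hypothetical thread: two of its five instructions stall for 4 cycles
```

`run([A])` takes 13 cycles for 5 instructions. `run([A, A])` issues 10 instructions in 16 cycles, compared with 26 cycles if the two threads ran back to back, because each thread's stalls are partly filled by the other thread's instructions.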
3) Using the Streaming SIMD Extensions (SSE), a set of extensions to the Intel Pentium III/4 processors, we implemented instruction-level parallel calculation of streamlines on 3D curvilinear grids for the first time. Compared with the conventional algorithm, the SSE-based algorithm improves performance by about 55% when coded with a vector class library and by about 75% when coded with inline assembly.
One of the key elements in achieving higher microprocessor performance is executing more instructions per cycle. However, dependences among instructions, the varying latencies of certain instructions, and execution-resource constraints limit this parallelism considerably. To exploit instruction-level parallelism, the processor must use data-dependence analysis to identify independent instructions that can execute in parallel. In current microprocessor architecture research, ILP techniques are the main means of raising processor performance. How fully ILP is exploited is crucial both for making use of the hardware's capabilities and for improving program performance.
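The Python sketch below shows the data-dependence analysis described above, under simplifying assumptions. Instructions are hypothetical `(dest, srcs)` register tuples. Every RAW, WAR, and WAW pair is treated as an ordering constraint with unit latency, and functional units are unlimited. Instructions that end up in the same group have no dependence between them and could execute in parallel. `dependences` and `asap_groups` are illustrative names, not an existing API.

```python
"""Pairwise data-dependence analysis and ASAP grouping (hypothetical sketch)."""

Instr = tuple[str, tuple[str, ...]]  # (destination register, source registers)


def dependences(instrs: list[Instr]) -> list[tuple[int, int, str]]:
    """Return (i, j, kind) for every pair i < j that must stay ordered.

    RAW: j reads what i writes; WAR: j writes what i reads; WAW: both write
    the same register. The check is pairwise and conservative: it does not
    drop dependences hidden behind an intervening write.
    """
    deps = []
    for j, (dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            if dest_i in srcs_j:
                deps.append((i, j, "RAW"))
            if dest_j in srcs_i:
                deps.append((i, j, "WAR"))
            if dest_j == dest_i:
                deps.append((i, j, "WAW"))
    return deps


def asap_groups(instrs: list[Instr]) -> list[list[int]]:
    """Group instructions by earliest cycle (unit latency, unlimited units)."""
    if not instrs:
        return []
    level = [0] * len(instrs)
    # deps are produced in increasing j, so level[i] is final before it is used.
    for i, j, _ in dependences(instrs):
        level[j] = max(level[j], level[i] + 1)
    groups: list[list[int]] = [[] for _ in range(max(level) + 1)]
    for idx, lvl in enumerate(level):
        groups[lvl].append(idx)
    return groups


EXAMPLE: list[Instr] = [
    ("r1", ("r2", "r3")),  # 0: r1 = r2 + r3
    ("r4", ("r1", "r5")),  # 1: r4 = r1 * r5
    ("r2", ("r6", "r7")),  # 2: r2 = r6 - r7
    ("r1", ("r8", "r9")),  # 3: r1 = r8 + r9
]
```

For this example, instructions 1 and 2 are independent and share a group: `[[0], [1, 2], [3]]`. Instruction 3 is held back only by WAW and WAR (name) dependences. With register renaming it would carry no RAW edge and could move up to the first group.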
An adaptive stack cache with a fast address-generation policy separates stack references from other data-cache references and takes advantage of how stack data is accessed. This improves instruction-level parallelism, reduces data-cache pollution, and lowers the data-cache miss ratio. The fast address-generation scheme proposed here also reduces stack access latency (the stack hit time).
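The Python sketch below illustrates only the decoupling idea and is entirely hypothetical. It assumes that references whose base register is `sp` or `fp` are stack references and routes them to their own direct-mapped cache. The adaptive sizing and the fast address-generation scheme are not modeled. The trace is contrived so that a heap word and a stack slot map to the same cache line.

```python
"""Toy model: routing stack references to a separate stack cache (hypothetical)."""

STACK_BASES = {"sp", "fp"}  # assumption: references addressed off SP/FP are stack references


class DirectMappedCache:
    def __init__(self, lines: int, block: int = 16) -> None:
        self.lines = lines
        self.block = block
        self.tags = [-1] * lines  # block number held by each line; -1 means empty
        self.misses = 0

    def access(self, addr: int) -> None:
        blk = addr // self.block
        idx = blk % self.lines
        if self.tags[idx] != blk:
            self.misses += 1      # miss: fetch the block, evicting whatever was there
            self.tags[idx] = blk


def misses(trace: list[tuple[str, int]], split: bool, lines: int = 4) -> int:
    """Total misses for a trace of (base register, effective address) pairs."""
    dcache = DirectMappedCache(lines)
    scache = DirectMappedCache(lines) if split else dcache
    for base, addr in trace:
        (scache if base in STACK_BASES else dcache).access(addr)
    return dcache.misses + (scache.misses if split else 0)


# Contrived trace: heap address 0x1000 and stack address 0x7F00 both map to line 0.
TRACE = [("r5", 0x1000), ("sp", 0x7F00)] * 4
```

In a single unified cache, the two streams evict each other, so all 8 accesses miss. With the split, each stream misses only once, for 2 misses in total. The split configuration also has twice the total capacity, so a fair comparison would need to account for that.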