Researcher on Compute System Reliability

17 May 2021

Enable future computing systems through fundamental research on MPSoC reliability modelling.

What you will do

System architecture innovations are key to positioning imec for success with the fast-evolving workloads of tomorrow and to differentiating imec’s process technology innovations with a system-level value proposition. The Compute System Architecture Unit at imec leads research into futuristic high-performance and highly secure RISC-V CPUs to extend imec’s semiconductor research leadership deep into the next decade. This unit is also researching accelerator-based architectures for next-generation Artificial Intelligence (AI), compute-in-memory architectures and heterogeneous memory systems. The team is responsible for architecture definition of new CPU and accelerator capabilities, analyzing emerging usage models, and building hardware and software prototypes for data-driven computing hardware.

As we expand our research to better optimize computing, we are looking for a Researcher on Compute System Reliability to contribute hands-on to the development of reliability models for HPC MPSoC compute systems. Lifetime reliability is an important and emerging concern: as technology advances, power density increases, and the resulting temperature variation accelerates wear-out, leading to system failures. The antagonistic relation between lifetime reliability and other design parameters of multiprocessor systems, such as power consumption and temperature, makes improving it more challenging. Lifetime reliability enhancement approaches are considered at different levels of abstraction and for various system components. We want to extract precise low-level lifetime reliability information from specific blocks of processing cores and use it at system level to study system state criticality at a low performance cost. This research engineer will be responsible for implementing and integrating system-level reliability modeling and mitigation solutions in close collaboration with other groups responsible for designing and developing these computer systems. He or she will also work closely with partners to identify and customize infrastructure and workloads to inform and influence future technology definition.

Responsibilities include:

  • System level reliability research for high performance MPSoC.
  • Developing a model for lifetime reliability management.
  • Mitigating the impact of compute system performance degradation using architectural or circuit-level innovation.
  • Building a framework to predict the aging - and thus the reliability - of heterogeneous HPC platforms based on physical characteristics and their utilization.
  • Working with the architecture, design and software teams to debug and resolve issues.
  • Generating and securing IP.
  • Keeping up-to-date on recent developments in the field. You do this by studying literature and interacting with your colleagues.
  • Good programming practice is a must (C/C++, Python).
  • Strong analytical and problem-solving skills combined with the ability to multitask.
  • Enthusiasm to work in a dynamic and team-oriented environment.
  • Familiarity with ASIC design flow, validation concepts, coverage, etc. is desirable.

What we do for you

We offer you the opportunity to join one of the world’s premier research centers in nanotechnology at its headquarters in Leuven, Belgium. With your talent, passion and expertise, you’ll become part of a team that makes the impossible possible. Together, we shape the technology that will determine the society of tomorrow.

We are proud of our open, multicultural, and informal working environment with ample possibilities to take initiative and show responsibility. We commit to supporting and guiding you in this process, not only with words but also with tangible actions. Through our corporate university, we actively invest in your development to further your technical and personal growth.

We are aware that your valuable contribution makes imec a top player in its field. Your energy and commitment are therefore appreciated by means of a competitive salary with many fringe benefits.

Who you are

You possess a Bachelor’s, Master’s or PhD degree in Computer Science or ECE, with 2-5 years of experience in related areas.

Additional qualifications include:

  • Knowledge of system level reliability modelling.
  • Clear understanding of reliable hardware architecture.
  • Prior experience with an architectural simulator (gem5, Sniper, ASIM).
  • Microprocessor architecture (any prior experience with RISC-V is highly desired).
  • Solid background in reliability modelling.

