What are the configurations?
- Enabling cache/MMU
- Enabling SCU
- Integrating with existing initialization for Raspberry Pi 1
Enabling cache/MMU
For Raspberry Pi 1
The caches can be enabled simply by setting the relevant bits in the CP15 Control Register.
Cache/MMU setup for the Raspberry Pi starts in
~/rtems/c/src/lib/libbsp/arm/shared/mminit.c
This file implements a single memory initialization function.
BSP_START_TEXT_SECTION void bsp_memory_management_initialize(void)
It contains ARM1176-specific code, so only part of the function will be the same for the Pi 2. A separate initialization function would be better.
The next step is setting up the translation table and turning on the caches/MMU through CP15.
This is supported in existing code in the file
~/rtems/c/src/lib/libbsp/arm/shared/include/arm-cp15-start.h
in function arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache
/* Enable MMU and cache */
ctrl |= ARM_CP15_CTRL_I | ARM_CP15_CTRL_C | ARM_CP15_CTRL_M;
arm_cp15_set_control(ctrl);
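As a rough illustration of what the snippet above does, the three flags correspond to the M (bit 0), C (bit 2) and I (bit 12) bits of the CP15 control register. The sketch below only recomputes the register value on the host. The EX_ macros and the helper function are hypothetical stand-ins, not the RTEMS definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the RTEMS ARM_CP15_CTRL_* macros. */
#define EX_CTRL_M (1U << 0)   /* MMU enable */
#define EX_CTRL_C (1U << 2)   /* data/unified cache enable */
#define EX_CTRL_I (1U << 12)  /* instruction cache enable */

/* Return the control register value with the MMU and both caches enabled.
 * All other bits of the input value are left untouched. */
static inline uint32_t ex_enable_mmu_and_caches(uint32_t ctrl)
{
  ctrl |= EX_CTRL_I | EX_CTRL_C | EX_CTRL_M;
  return ctrl;
}
```

On the target, the returned value would then be written back with arm_cp15_set_control(), as in the existing code.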
For Raspberry Pi 2
For the Cortex-A7, enabling the caches additionally requires setting the SMP bit in the CP15 Auxiliary Control Register. This is not currently done, so by default the caches and MMU are disabled.
This is similar to the Cortex-A9.
After this, enable the instruction and data caches and the MMU.
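A minimal sketch of setting the SMP bit could look like the following, assuming the SMP bit is bit 6 of the Auxiliary Control Register as documented for the Cortex-A7. The function names are hypothetical; the pure helper can be checked on the host, while the read-modify-write part is only compiled for an ARMv7 target.

```c
#include <stdint.h>

/* SMP bit of the Cortex-A7 Auxiliary Control Register (ACTLR), bit 6. */
#define EX_ACTLR_SMP (1U << 6)

/* Pure helper: compute the ACTLR value with the SMP bit set. */
static inline uint32_t ex_actlr_with_smp(uint32_t actlr)
{
  return actlr | EX_ACTLR_SMP;
}

#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* Target-only part: read-modify-write ACTLR (CP15 c1, c0, opc2 = 1).
 * This has to run before the caches/MMU are enabled. */
static inline void ex_enable_smp_bit(void)
{
  uint32_t actlr;

  __asm__ volatile ("mrc p15, 0, %0, c1, c0, 1" : "=r" (actlr));
  actlr = ex_actlr_with_smp(actlr);
  __asm__ volatile ("mcr p15, 0, %0, c1, c0, 1" : : "r" (actlr) : "memory");
  __asm__ volatile ("isb" : : : "memory");
}
#endif
```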
For reference, see the Xilinx Zynq BSP:
~/libbsp/arm/xilinx-zynq/startup/bspstartmmu.c
The function arm_cp15_start_setup_mmu_and_cache() will be required for the Cortex-A7.
Next, as for the Pi 1 (with the respective parameters), comes a call to arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache(), which will also be used for the A7.
As far as the MMU is concerned, what matters is the translation table and the access permissions. Here only the translation table concerns us.
The ARM MMU allows two types of translation: section-based (which requires a single level of translation) and page-based (which requires two levels of translation). As far as I can see, we use section-based translation, so there is a single translation table used by the MMU.
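To make the single-level scheme concrete, here is a small sketch of how a section-based table is indexed and filled, assuming the ARMv7 short-descriptor format with 1 MiB sections (4096 first-level entries, descriptor bits [1:0] = 0b10). All names are hypothetical and the mapping is a plain identity mapping; this is not the RTEMS implementation.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SECTION_SHIFT 20U    /* each entry maps 1 MiB */
#define EX_TTB_ENTRIES   4096U  /* 4 GiB / 1 MiB */
#define EX_SECTION_TYPE  0x2U   /* descriptor bits [1:0] = 0b10 */

/* Index of the first-level entry that translates a virtual address. */
static inline uint32_t ex_ttb_index(uint32_t va)
{
  return va >> EX_SECTION_SHIFT;
}

/* Build a section descriptor for a 1 MiB aligned physical base address. */
static inline uint32_t ex_section_descriptor(uint32_t pa, uint32_t attributes)
{
  assert((pa & ((1U << EX_SECTION_SHIFT) - 1U)) == 0U);
  return pa | attributes | EX_SECTION_TYPE;
}

/* Identity-map `count` sections starting at table index `first`. */
static inline void ex_map_sections(uint32_t *ttb, uint32_t first,
                                   uint32_t count, uint32_t attributes)
{
  uint32_t i;

  assert(first <= EX_TTB_ENTRIES && count <= EX_TTB_ENTRIES - first);
  for (i = first; i < first + count; ++i) {
    ttb[i] = ex_section_descriptor(i << EX_SECTION_SHIFT, attributes);
  }
}
```

With this layout a lookup never needs a second table: the top 12 bits of the virtual address select the entry, and the entry itself carries the physical base and the memory attributes.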
- The translation table is set up in arm-cp15-start.h by arm_cp15_start_set_translation_table_entries(ttb, &config_table[i]).
- The ARM memory configuration for the Raspberry Pi under RTEMS is provided by arm_cp15_start_mmu_config_table[] in mm_config_table.c.
- The memory attributes for ARM memory are controlled by several flags defined in arm-cp15.h. Each flag is a combination of bits, and these bits are present in every entry of the translation table. They follow the ARMv7 translation table descriptor format for sections.
- Relevant to the issue at hand are the bits controlling cacheability. These are the TEX[2:0], C and B bits.
- There are also settings that control the attributes of the translation table memory itself. These are set through the TTBR0 register (in the format that supports the multiprocessor extensions).
Existing settings include:
TEX[2:0]=1,1,1 & C,B=1,1 -> write back, write allocate, normal memory (for both inner and outer). No write allocate is costly.
This has been applied to the cacheable regions of memory (I identified these as the entries whose macro in mm_config_table has the CACHED suffix).
Without turning the caches/MMU on I get 83333 dhrystones/sec. After turning them on I get 76923 dhrystones/sec.
Tried changes:
- TEX[2:0]=1,0,1 & C,B=0,1 -> cacheable memory: write back, write allocate, normal memory. This significantly reduced performance (58823 dhrystones/sec).
- TEX[2:0]=0,0,1 & C,B=1,1 -> write back, write allocate, normal memory (region does not remain cacheable?). Here the performance was similar to that obtained after just enabling the caches, except that with this change the first run of dhrystone gives 83333 dhrystones/sec, whereas without it the first run gives 76923 dhrystones/sec. On subsequent runs of dhrystone the performance fluctuates between the two figures.
- Changes to TTBR0, which controls the translation table memory region attributes.
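For reference, the TEX/C/B combinations listed above can be turned into section descriptor bits as in the sketch below, assuming the ARMv7 section descriptor layout (TEX[2:0] in bits 14:12, C in bit 3, B in bit 2). The helper and macro names are hypothetical and are not taken from arm-cp15.h.

```c
#include <stdint.h>

#define EX_SECT_B         (1U << 2)  /* bufferable bit */
#define EX_SECT_C         (1U << 3)  /* cacheable bit */
#define EX_SECT_TEX_SHIFT 12U        /* TEX[2:0] occupies bits 14:12 */

/* Combine TEX[2:0], C and B into the cacheability bits of a section entry. */
static inline uint32_t ex_sect_cache_bits(uint32_t tex, uint32_t c, uint32_t b)
{
  return ((tex & 0x7U) << EX_SECT_TEX_SHIFT)
    | (c ? EX_SECT_C : 0U)
    | (b ? EX_SECT_B : 0U);
}
```

For example, ex_sect_cache_bits(0x7, 1, 1) gives the bits for the existing setting, while ex_sect_cache_bits(0x5, 0, 1) and ex_sect_cache_bits(0x1, 1, 1) give the bits for the two tried changes.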
Code
For now, I have not considered integration of the two Pi variants. I have replaced the existing
BSP_START_TEXT_SECTION void bsp_memory_management_initialize(void)
with the initialization required for the Pi 2.
The link below explains set up for ARM v7 architecture
- Invalidation of caches not done.
- Enable SMP bit (TRM for Cortex A7 mpcore section 4.3.27 System Control) before enabling caches/mmu or performing any cache and TLB maintenance operations.
- Call arm_cp15_start_setup_mmu_and_cache(). The branch prediction enable is commented out. (TRM for Cortex A7 mpcore section 4.3.27 System Control, description of the Z bit)
- Set up the translation table and set the caches/MMU enable bits. This is a call to the function arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache(), the same as for the Pi 1.
- In the subsequent call to arm_cp15_start_setup_translation_table(), I have added code to configure the bits in TTBR0 for inner and outer write-back, write-allocate, cacheable, shareable memory (see the v7 set up link above and https://github.com/mrvn/test/blob/master/mmu.cc).
- Set domain clients.
- Enable MMU/caches. Invalidate branch predictor (ARM v7 Architecture reference manual section B2.2.6 under Branch prediction maintenance operations)
Changed TTBR0 translation table memory attributes (using the multiprocessor extensions register format). (TRM for Cortex A7 mpcore section 5.2.1 Memory types and attributes).
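The TTBR0 change described above can be sketched as follows, assuming the ARMv7 TTBR0 format with the multiprocessor extensions (IRGN[1] in bit 0, S in bit 1, RGN in bits 4:3, IRGN[0] in bit 6) and a 16 KiB aligned translation table. The names are hypothetical and this is not the code I actually added to arm_cp15_start_setup_translation_table().

```c
#include <assert.h>
#include <stdint.h>

#define EX_TTBR_IRGN_WBWA (1U << 6)  /* IRGN[1:0] = 0b01: inner WB, WA */
#define EX_TTBR_S         (1U << 1)  /* table walks are shareable */
#define EX_TTBR_RGN_WBWA  (1U << 3)  /* RGN[1:0] = 0b01: outer WB, WA */
#define EX_TTB_ALIGN_MASK 0x3FFFU    /* 16 KiB first-level table */

/* Compose the TTBR0 value for a translation table at ttb_base with inner
 * and outer write-back, write-allocate, shareable table walks. */
static inline uint32_t ex_ttbr0_value(uint32_t ttb_base)
{
  assert((ttb_base & EX_TTB_ALIGN_MASK) == 0U);
  return ttb_base | EX_TTBR_IRGN_WBWA | EX_TTBR_RGN_WBWA | EX_TTBR_S;
}
```

The attribute bits add up to 0x4A, so a table at 0x00100000 would give a TTBR0 value of 0x0010004A.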
References
- Cortex A7 MPCore Technical Reference Manual
- ARM v7 Architecture Reference Manual
- ARM1176jzfs Technical Reference Manual
- Existing RTEMS code for Raspberry Pi, Xilinx-zynq, Realview-pbx-a9 BSPs