{"id":999999353,"date":"2023-05-02T21:50:36","date_gmt":"2023-05-02T21:50:36","guid":{"rendered":"https:\/\/alifsemi.com\/whitepaper\/alif-mcu-ram-regions-and-linker-files\/"},"modified":"2024-10-15T22:01:23","modified_gmt":"2024-10-15T22:01:23","slug":"alif-mcu-ram-regions-and-linker-files","status":"publish","type":"whitepaper","link":"https:\/\/alifsemi.com\/whitepaper\/alif-mcu-ram-regions-and-linker-files\/","title":{"rendered":"RAM Regions and Linker Files in Alif Microcontrollers"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\"><strong>Introduction<\/strong><\/h3>\n\n\n\n<p>Microcontrollers (MCUs) in the Alif Ensemble family contain up to 13.5 MB of total internal SRAM. Through the global memory map, all SRAM banks are accessible by any master in the system. This document will describe the SRAM banks and their differences so that they can be best utilized by software.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>RAM Regions<\/strong><\/h2>\n\n\n\n<p>Below is a table describing the size, address location, and operating frequency of each SRAM bank in 
Ensemble Rev A devices. These are the maximum sizes that can be available. The actual size and availability of each SRAM bank depend on the part number of the Ensemble device. You may refer to the part number decoder in the product\u2019s datasheet for more information on SRAM sizing options.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Rev A Silicon (A0, A1, A6)<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Name<\/strong><\/td><td><strong>Global Address<\/strong><\/td><td><strong>Size [KB]<\/strong><\/td><td><strong>Clock [MHz]<\/strong><\/td><\/tr><tr><td>SRAM0<\/td><td>0x02000000<\/td><td>4096<\/td><td>400<\/td><\/tr><tr><td>SRAM1<\/td><td>0x08000000<\/td><td>2560<\/td><td>400<\/td><\/tr><tr><td>SRAM2<\/td><td>0x50000000<\/td><td>256<\/td><td>400<\/td><\/tr><tr><td>SRAM3<\/td><td>0x50800000<\/td><td>1024<\/td><td>400<\/td><\/tr><tr><td>SRAM4<\/td><td>0x60000000<\/td><td>256<\/td><td>160<\/td><\/tr><tr><td>SRAM5<\/td><td>0x60800000<\/td><td>256<\/td><td>160<\/td><\/tr><tr><td>SRAM6<\/td><td>0x62000000<\/td><td>2048<\/td><td>160<\/td><\/tr><tr><td>SRAM7<\/td><td>0x63000000<\/td><td>512<\/td><td>160<\/td><\/tr><tr><td>SRAM8<\/td><td>0x63100000<\/td><td>2048<\/td><td>160<\/td><\/tr><tr><td>SRAM9<\/td><td>0x64000000<\/td><td>768<\/td><td>160<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Rev B Silicon (B0)<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Name<\/strong><\/td><td><strong>Global Address<\/strong><\/td><td><strong>Size [KB]<\/strong><\/td><td><strong>Clock 
[MHz]<\/strong><\/td><\/tr><tr><td>SRAM0<\/td><td>0x02000000<\/td><td>4096<\/td><td>400<\/td><\/tr><tr><td>SRAM1<\/td><td>0x08000000<\/td><td>2560<\/td><td>400<\/td><\/tr><tr><td>SRAM2<\/td><td>0x50000000<\/td><td>256<\/td><td>400<\/td><\/tr><tr><td>SRAM3<\/td><td>0x50800000<\/td><td>1024<\/td><td>400<\/td><\/tr><tr><td>SRAM4<\/td><td>0x58000000<\/td><td>256<\/td><td>160<\/td><\/tr><tr><td>SRAM5<\/td><td>0x58800000<\/td><td>256<\/td><td>160<\/td><\/tr><tr><td>SRAM6<\/td><td>0x62000000<\/td><td>2048<\/td><td>160<\/td><\/tr><tr><td>SRAM7<\/td><td>0x63000000<\/td><td>512<\/td><td>160<\/td><\/tr><tr><td>SRAM8<\/td><td>0x63100000<\/td><td>2048<\/td><td>160<\/td><\/tr><tr><td>SRAM9<\/td><td>0x60000000<\/td><td>768<\/td><td>160<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>SRAM0 and SRAM1<\/strong><\/h2>\n\n\n\n<p><br>These two SRAM banks reside in the high-performance region of the MCU. They are intended to be used as bulk SRAM, i.e. general-purpose memory by one or more cores. The memory can be used privately by a single core, or it can be used for shared data intended to be readable by multiple entities. The output framebuffers for an LCD panel or input framebuffers for a Camera module would be placed in these banks, for example.<\/p>\n\n\n\n<p><br>The two SRAMs run at 200MHz each in Rev A silicon, and at 400MHz each in Rev B silicon, served by a 400MHz main bus. With this approach, one core or DMA can utilize the full bandwidth of one SRAM bank and another core or DMA can utilize the full bandwidth of the second SRAM bank with no degradation of performance. This is useful in a multi-core and multi-DMA system.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>SRAM2, SRAM3, SRAM4, and SRAM5<\/strong><\/h2>\n\n\n\n<p><br>These four SRAM banks are not as general-purpose as the SRAM0 and SRAM1 banks. Rather, SRAM2 through SRAM5 operate as the Tightly Coupled Memory (TCM) for the M55 cores in the RTSS-HP and RTSS-HE subsystems. 
The TCM is a high-bandwidth and low-latency memory that is primarily used by the M55 core it is attached to.<\/p>\n\n\n\n<p><br>Two of the banks, SRAM2 and SRAM3, are clocked at the same frequency as the RTSS-HP domain and operate as the M55-HP core\u2019s high-speed ITCM and DTCM, respectively. The other two banks, SRAM4 and SRAM5, are clocked at the same frequency as the RTSS-HE domain and operate as the M55-HE core\u2019s high-speed ITCM and DTCM, respectively.<\/p>\n\n\n\n<p><br>The TCM access time is a single clock cycle with no wait states, so interrupt routines, general code, and data can be processed with minimal latency when located in the TCM. There is a single bus to access the ITCM, typically used for instruction memory, and four independent buses to access the DTCM, typically used for data memory. This multiple-bus architecture is meant to facilitate Single Instruction Multiple Data (SIMD) or vector operations.<\/p>\n\n\n\n<p><br>Each M55 core sees its own ITCM mapped at 0x00000000 and its DTCM mapped at 0x20000000. A core cannot access its own TCM through the TCM\u2019s global addresses. The utility functions LocalToGlobal and GlobalToLocal handle the address mapping when a potentially TCM-resident address needs to be passed to another part of the system.<\/p>\n\n\n\n<p><br>Since the TCM is accessible through the global address map, it is possible to configure peripherals and DMAs to read\/write data to\/from the TCM. This feature is meant to eliminate the need to copy data between \u201cworking memory\u201d close to the CPU and slower memory that other masters can access. A peripheral like I2S or the camera controller can use the DMA to place audio or image data directly in the high-speed TCM and notify the CPU that the data is ready to process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>SRAM6, SRAM7, SRAM8, SRAM9<\/strong><\/h2>\n\n\n\n<p><br>These four SRAM banks reside in the high-efficiency region of the MCU. 
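As a sketch of the local-to-global TCM mapping described above, assuming the M55-HE core with the Rev B addresses (SRAM4 as ITCM at 0x58000000, SRAM5 as DTCM at 0x58800000) and a 256 KB size for each: the actual SDK helper implementations, and the sizes on a given part number, may differ.

```c
#include <stdint.h>

/* Illustrative TCM address translation for the M55-HE core, Rev B addresses.
   Sizes depend on the device variant; 256 KB is assumed here. */
#define ITCM_LOCAL_BASE   0x00000000UL
#define ITCM_GLOBAL_BASE  0x58000000UL
#define ITCM_SIZE         0x00040000UL   /* 256 KB */
#define DTCM_LOCAL_BASE   0x20000000UL
#define DTCM_GLOBAL_BASE  0x58800000UL
#define DTCM_SIZE         0x00040000UL   /* 256 KB */

/* Convert a local TCM address to the global alias another master can use.
   Unsigned wrap-around makes each range check a single compare. */
static uint32_t LocalToGlobal(uint32_t addr)
{
    if (addr - ITCM_LOCAL_BASE < ITCM_SIZE)
        return addr - ITCM_LOCAL_BASE + ITCM_GLOBAL_BASE;
    if (addr - DTCM_LOCAL_BASE < DTCM_SIZE)
        return addr - DTCM_LOCAL_BASE + DTCM_GLOBAL_BASE;
    return addr;  /* not a TCM address: already global */
}

/* Convert a global TCM alias back to the core's local view. */
static uint32_t GlobalToLocal(uint32_t addr)
{
    if (addr - ITCM_GLOBAL_BASE < ITCM_SIZE)
        return addr - ITCM_GLOBAL_BASE + ITCM_LOCAL_BASE;
    if (addr - DTCM_GLOBAL_BASE < DTCM_SIZE)
        return addr - DTCM_GLOBAL_BASE + DTCM_LOCAL_BASE;
    return addr;
}
```

A DMA descriptor pointing at a DTCM buffer, for example, would be programmed with LocalToGlobal of the buffer address rather than the raw pointer value the core uses.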
Just like SRAM0 and SRAM1, they are intended to be used as low-power bulk SRAM. Although the banks are general-purpose, they are clocked lower than the high-performance SRAM banks, so a small performance penalty will be observed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>L1 Caches<\/strong><\/h2>\n\n\n\n<p><br>Beside each Cortex-M55 core there is 32 KB of L1 instruction cache and 32 KB of L1 data cache. The caches are fully hardware-managed based on the configuration programmed into the MPU. Using the MPU, your software can assign caching policies to one or more memory address ranges, which then affect the caching behavior of those addresses.<\/p>\n\n\n\n<p><br>Through the MPU configuration, code and data can be placed in the fast L1 cache after being read from a slower memory location such as MRAM or SRAM. All subsequent reads of this code and data will come from cache rather than from slower memory, improving overall performance. This is called a \u201cread-allocate\u201d cache policy. Data written to MRAM or SRAM can also be stored in cache first, allowing the core to continue with other operations rather than waiting for the write operation to complete in the target memory region. The data written, even if the destination address is in the global memory map, will stay in the data cache until the affected cache lines are cleaned or evicted. This is called a \u201cwrite-back with write-allocate\u201d policy, and cache clean operations are needed to push the data out from the cache to the actual memory. If data must always be written to MRAM or SRAM directly, use a \u201cwrite-through\u201d cache policy instead.<\/p>\n\n\n\n<p><br>There is also a \u201ctransient\u201d hint bit in the memory attributes \u2013 this hints that the data is unlikely to be re-read. Transient data is prioritized for eviction when new data needs to be cached. 
It makes sense to set this attribute for memory holding large frame buffers, which are typically processed in a linear fashion so that any given part won\u2019t be revisited until the next pass \u2013 this prevents a pass through the buffer from evicting more general program data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>Prefetch Unit<\/strong><\/h2>\n\n\n\n<p><br>The M55 has an automatic prefetch unit that identifies linear access patterns by monitoring cache misses and preloads data into the cache. This means that the access latency of the SRAM can be hidden in many \u201cdata streaming\u201d cases, so care should be taken to process large SRAM buffers in a linear fashion to take advantage of it.<br>It is also possible to insert manual prefetch hints into code using the __builtin_prefetch function, although this gains nothing where the automatic prefetch unit is already successful.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>TCM<\/strong><\/h2>\n\n\n\n<p><br>When the TCM of a core is accessed through its local address (0x0 for ITCM or 0x20000000 for DTCM), the access is single-cycle and no caching is involved; cache-related MPU attributes are ignored. When the TCM of another core is accessed through global memory addressing, the MPU caching behavior is applied. It is important to set the write-through bits as well as the read and write allocation bits appropriately.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>Memory Coherency<\/strong><\/h2>\n\n\n\n<p><br>There is no hardware cache coherency between the M55 cores and other bus masters. 
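Because coherency is software-managed, a typical buffer handoff to and from another master looks like the sketch below. It assumes the CMSIS-Core helpers SCB_CleanDCache_by_Addr and SCB_InvalidateDCache_by_Addr; the shared buffer, its size, and its placement are illustrative, and the helpers are stubbed out off-target (leave USE_CMSIS_CACHE_OPS undefined) so only the sequence itself is shown.

```c
#include <stdint.h>
#include <string.h>

#ifndef USE_CMSIS_CACHE_OPS
/* Host-build stubs; on target, include the device header and define
   USE_CMSIS_CACHE_OPS so the real CMSIS-Core cache helpers are used. */
static void SCB_CleanDCache_by_Addr(volatile void *addr, int32_t dsize)      { (void)addr; (void)dsize; }
static void SCB_InvalidateDCache_by_Addr(volatile void *addr, int32_t dsize) { (void)addr; (void)dsize; }
#endif

#define SHARED_SIZE 1024u
static uint8_t shared_buf[SHARED_SIZE];  /* imagine this placed in SRAM0, write-back cached */

/* Producer side: make CPU writes visible to the other master. */
void publish(const uint8_t *src, size_t n)
{
    memcpy(shared_buf, src, n);
    SCB_CleanDCache_by_Addr(shared_buf, (int32_t)n);   /* push dirty lines out to SRAM */
}

/* Consumer side: discard stale or speculatively fetched lines, then read
   what the other master wrote. */
void consume(uint8_t *dst, size_t n)
{
    SCB_InvalidateDCache_by_Addr(shared_buf, (int32_t)n);
    memcpy(dst, shared_buf, n);
}
```

Note the ordering: clean after the CPU writes, invalidate after the other master writes, which matches the speculative-read caveat discussed in this section.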
If a memory region is being accessed by an M55 core and another master, that region must either be marked non-cacheable or shareable, or the cache must be cleaned and\/or invalidated before or after the other master accesses the region.<\/p>\n\n\n\n<p><br>(If sharing a TCM, only the remote core needs to worry about caches \u2013 the local core always performs fast uncached accesses to it.)<\/p>\n\n\n\n<p><br>Non-cached access is very slow, so it\u2019s generally best to mark regions at least write-through cacheable and perform the maintenance operations as necessary. If a region is write-back cached (but not write-through), you must clean the relevant address range after writing and before another master reads. It is always necessary to invalidate the cache before reading data written by another master.<\/p>\n\n\n\n<p><br>Invalidate-only D-cache operations should be used with care, as they can potentially lose data that has only been written to the cache. Use them only when you are certain that the range in question holds nothing that needs to be written back. It is almost never safe to perform a global D-cache invalidate in normal operation.<\/p>\n\n\n\n<p><br>Note that the M55 can perform speculative reads to normal memory, so memory could be cached despite the code not having explicitly read it. Therefore, invalidate operations must be performed after the other master has finished writing.<\/p>\n\n\n\n<p><br>Tip: Ranged clean and\/or invalidate operations can be slow on a large range of addresses, because the operation must loop through every cache line in the range. If the range that needs to be worked on is larger than around 128 KiB, it can be more time-effective to perform a global clean or clean+invalidate operation, looping through only 32 KiB of cache sets and ways.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>Arm MPU Configuration Example<\/strong><\/h2>\n\n\n\n<p><br>In the mpu_table[], we have SRAM0, SRAM6, SRAM8, and MRAM regions defined and assigned to memory attribute 1. 
We also have SRAM1 region defined and assigned to memory attribute 2. Regions under attribute 1 have read and write cache allocation enabled as well as write-back mode enabled. This is best for general purpose code and data. Regions under attribute 2 only have read-allocation enabled and write-through mode is in use, together with the transient hint.<\/p>\n\n\n\n<p><strong>static<\/strong> <strong>void<\/strong> <strong>MPU_Load_Regions<\/strong>(<strong>void<\/strong>)<\/p>\n\n\n\n<p>{<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; <strong>static<\/strong> <strong>const<\/strong> ARM_MPU_Region_t mpu_table[] __STARTUP_RO_DATA_ATTRIBUTE = {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x02000000UL, ARM_MPU_SH_NON, 0UL, 1UL, 0UL),&nbsp;&nbsp;&nbsp; \/\/ RO, NP, XN<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x023FFFFFUL, 1UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ SRAM0<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x08000000UL, ARM_MPU_SH_NON, 0UL, 1UL, 0UL),&nbsp;&nbsp;&nbsp; \/\/ RO, NP, XN<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x0827FFFFUL, 2UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ SRAM1<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x70000000UL, ARM_MPU_SH_NON, 0UL, 1UL, 1UL),<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x71FFFFFFUL, 0UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ LP- Peripheral &amp; PINMUX Regions 
<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x62000000UL, ARM_MPU_SH_NON, 0UL, 1UL, 0UL),&nbsp;&nbsp;&nbsp; \/\/ RO, NP, XN<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x621FFFFFUL, 1UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ SRAM6<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x63100000UL, ARM_MPU_SH_NON, 0UL, 1UL, 0UL),&nbsp;&nbsp;&nbsp; \/\/ RO, NP, XN<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x632FFFFFUL, 1UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ SRAM8<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RBAR = ARM_MPU_RBAR(0x80000000UL, ARM_MPU_SH_NON, 1UL, 1UL, 0UL),&nbsp;&nbsp;&nbsp; \/\/ RO, NP, XN<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .RLAR = ARM_MPU_RLAR(0x8057FFFFUL, 1UL)&nbsp;&nbsp;&nbsp;&nbsp; \/\/ MRAM<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; };<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; \/* Define the possible Attribute regions *\/<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; ARM_MPU_SetMemAttr(0UL, ARM_MPU_ATTR_DEVICE);&nbsp;&nbsp;&nbsp; \/* Device Memory *\/<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; ARM_MPU_SetMemAttr(1UL, ARM_MPU_ATTR(&nbsp;&nbsp;&nbsp; \/* Normal Memory, Write-back, Read\/Write-Allocate 
*\/<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ARM_MPU_ATTR_MEMORY_(1,1,1,1),<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ARM_MPU_ATTR_MEMORY_(1,1,1,1)));<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; ARM_MPU_SetMemAttr(2UL, ARM_MPU_ATTR(&nbsp;&nbsp;&nbsp; \/* Normal Memory, Transient, Write-through, Read-Allocate *\/<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ARM_MPU_ATTR_MEMORY_(0,0,1,0),<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ARM_MPU_ATTR_MEMORY_(0,0,1,0)));<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; \/* Load the regions from the table *\/<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; ARM_MPU_Load(0U, &amp;mpu_table[0], <strong>sizeof<\/strong>(mpu_table)\/<strong>sizeof<\/strong>(ARM_MPU_Region_t));<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p>Note that there are default attributes for memory addresses not covered by the loaded table \u2013 see Chapter B8: The System Address Map in the Armv8-M Architecture Reference Manual. These attributes would make most Alif SRAM regions cacheable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>Memory Retention<\/strong><\/h2>\n\n\n\n<p><br>Some SRAMs in the high-efficiency domain of the SoC have low leakage properties allowing our MCU to offer memory retention modes. 
When the MCU enters low-power STANDBY or STOP mode, the SRAM banks can be configured to remain on and retain their contents while the rest of the MCU powers down.<br>The optional retentions in Rev A silicon are as follows:<br>\u2022 Option 1: 512 KB is retained by enabling the option on SRAM2 and SRAM3.<br>\u2022 Option 2: 2048 KB is retained by enabling the option on SRAM6.<br>The optional retentions in Rev B silicon are as follows:<br>\u2022 Option 1: 256 KB is retained by enabling the option on half of SRAM2 and half of SRAM3.<br>\u2022 Option 2: 512 KB is retained by enabling the option on SRAM2 and SRAM3 entirely.<br>\u2022 Option 3: 1920 KB is retained by enabling the option on SRAM6 and SRAM7 (a subset of the total).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br><strong>Reasoning within Use Cases<\/strong><\/h2>\n\n\n\n<p><br>The following section describes why one memory region is more suitable than another, depending on the usage.<\/p>\n\n\n\n<p><strong>Use Case 1:<\/strong><\/p>\n\n\n\n<p>M55 core is processing a stream of data<br>\u2022 Bad choice: process data in SRAM or use the M55 to fetch data<br>\u2022 Ideal choice: use DMA to place data into the M55\u2019s TCM<\/p>\n\n\n\n<p><br>Why: The M55 core\u2019s access time to TCM is a single clock cycle with no wait states. If the M55 needs to fetch data from slower memory regions, performance suffers. The M55 core\u2019s L1 cache helps to reduce accesses to slower memory regions containing code or data in other use cases, but this use case involves processing a stream of incoming data that cannot be cached. Since the TCM is accessible through the global address map, it is possible to configure peripherals and DMAs to read\/write data to\/from the TCM.<\/p>\n\n\n\n<p><br>Another reason to place data in TCM is to take advantage of the multiple-bus architecture when performing Single Instruction Multiple Data (SIMD) or vector operations. 
With four independent buses to DTCM and one to ITCM, it is possible to load a SIMD instruction and up to 2 data parameters in a single cycle.<\/p>\n\n\n\n<p><strong>Use Case 2:<\/strong><\/p>\n\n\n\n<p>Extremely large double-buffer graphics use case.<br>\u2022 Bad choice: application is loaded to SRAM or TCM \u2013 leaves no room for framebuffer<br>\u2022 Ideal choice: application should XIP from MRAM and rely on M55 caching<br>Why: The M55 core\u2019s L1 cache is very efficient at prefetching data from MRAM. There is very little performance degradation shown in our internal benchmarking with cache enabled. Using XIP means the application code and read-only data remains stored in MRAM and read-write data is stored in TCM. This configuration leaves 100% of the bulk SRAM available for use as the graphics framebuffer.<\/p>\n\n\n\n<p><strong>Use Case 3:<\/strong><\/p>\n\n\n\n<p>Small ML model.<br>\u2022 Good choice: model in MRAM, activation buffer in SRAM<br>\u2022 Faster choice: model in MRAM, activation buffer in nearest DTCM<br>Why: The model data is large, read-only and the Ethos-U55 makes relatively few accesses to it \u2013 it works perfectly well as XIP from MRAM. The activation buffer has far more random read and write accesses, and its speed significantly affects the overall inference time. If the buffer can fit in the DTCM of the neighboring M55, then the lower access latency can significantly boost performance. A 40% speed-up was observed doing this for keyword spotting in the high-efficiency subsystem.<br>Doing this requires version 22.11 or later of the Ethos-U core driver, as the ethosu_address_remap hook must be implemented to handle conversion of local TCM addresses that the software sees to global addresses that the Ethos-U55 sees.<\/p>\n\n\n\n<p><br><strong>Use Case 4:<\/strong><\/p>\n\n\n\n<p><br>In a multi-core example, one can set up a pipeline operation between two cores. 
In this use case, one core collects data from sensors and then applies a pre-processing algorithm to the collected data. Another core can then access the processed data from the first core via the global TCM interface. Doing so minimizes data copying and supports ping-pong buffering.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Code Examples<\/strong><\/h2>\n\n\n\n<p><br>The following code example shows how to define a buffer and assign it a section name. The syntax is the same across the supported compilers.<\/p>\n\n\n\n<p>uint8_t byte_buffer[BUFFER_SIZE] __attribute__((used, section(\"section_name\")));<\/p>\n\n\n\n<p>Note: the \u201cused\u201d attribute tells the toolchain not to discard this variable during optimization, even if it appears unreferenced.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Scatter Example<\/h4>\n\n\n\n<p><br>How to use an Arm Clang scatter file to place a named section into a specific memory region.<br>Below is an example of section names \u201clcd_frame_buf\u201d and \u201ccamera_frame_buf\u201d placed into SRAM banks 0 and 1, respectively.<\/p>\n\n\n\n<p>&nbsp; RW_SRAM0 SRAM0_BASE SRAM0_SIZE&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp; * (lcd_frame_buf)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ; LCD Frame Buffer<\/p>\n\n\n\n<p>&nbsp; }<\/p>\n\n\n\n<p>&nbsp; RW_SRAM1 SRAM1_BASE SRAM1_SIZE&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp; * (camera_frame_buf)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ; Camera Frame Buffer<\/p>\n\n\n\n<p>&nbsp; }<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Linker Example<\/h4>\n\n\n\n<p><br>Below is a GNU-style linker example of section names \u201clcd_frame_buf\u201d and \u201ccamera_frame_buf\u201d placed into SRAM banks 0 and 1, respectively. 
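As a sketch, the C definitions that feed these input sections might look as follows. The buffer dimensions and 16-bit pixel format are illustrative assumptions; only the section names come from the examples (the GNU-style `.bss.` prefixed names are shown here, while an Arm scatter file selects the bare names).

```c
#include <stdint.h>

/* Hypothetical frame dimensions, for illustration only. */
#define LCD_WIDTH   480u
#define LCD_HEIGHT  320u
#define CAM_WIDTH   640u
#define CAM_HEIGHT  480u

/* 16 bits per pixel assumed. The "used" attribute keeps the symbols
   through optimization even when nothing references them yet. */
uint8_t lcd_frame_buf[LCD_WIDTH * LCD_HEIGHT * 2u]
    __attribute__((used, section(".bss.lcd_frame_buf")));

uint8_t camera_frame_buf[CAM_WIDTH * CAM_HEIGHT * 2u]
    __attribute__((used, section(".bss.camera_frame_buf")));
```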
The NOLOAD keyword tells the linker that these read\/write regions are zero-initialized and therefore do not need to be part of the application binary.<\/p>\n\n\n\n<p>&nbsp; .bss.at_sram0 (NOLOAD) : ALIGN(8)<\/p>\n\n\n\n<p>&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; * (.bss.lcd_frame_buf)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \/* LCD Frame Buffer *\/<\/p>\n\n\n\n<p>&nbsp; } &gt; SRAM0<\/p>\n\n\n\n<p>&nbsp; .bss.at_sram1 (NOLOAD) : ALIGN(8)<\/p>\n\n\n\n<p>&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; * (.bss.camera_frame_buf)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \/* Camera Frame Buffer *\/<\/p>\n\n\n\n<p>&nbsp; } &gt; SRAM1<\/p>\n\n\n\n<p>A zero table is defined in the linker file as well. These zero table entries ensure that each region is zeroed out before use; that operation is part of the init code which runs before the main function starts.<\/p>\n\n\n\n<p>.zero.table :<\/p>\n\n\n\n<p>&nbsp; {<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; __zero_table_start__ = .;<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (ADDR(.bss))<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (SIZEOF(.bss)\/4)<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (ADDR(.bss.at_sram0))<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (SIZEOF(.bss.at_sram0)\/4)<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (ADDR(.bss.at_sram1))<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; LONG (SIZEOF(.bss.at_sram1)\/4)<\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp; __zero_table_end__ = .;<\/p>\n\n\n\n<p>&nbsp; } &gt; MRAM<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microcontrollers (MCUs) in the Alif Ensemble family contain up to 13.5 MB of total internal SRAM. Through the global memory map, all SRAM banks are accessible by any master in the system. 
This document will describe the SRAM banks and their differences so that they can be best utilized by software.<\/p>\n","protected":false},"featured_media":999999310,"template":"","whitepaper-category":[36,93],"class_list":["post-999999353","whitepaper","type-whitepaper","status-publish","has-post-thumbnail","hentry","whitepaper-category-system-architecture","whitepaper-category-technical-whitepapers"],"acf":[],"_links":{"self":[{"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/whitepaper\/999999353","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/whitepaper"}],"about":[{"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/types\/whitepaper"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/media\/999999310"}],"wp:attachment":[{"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/media?parent=999999353"}],"wp:term":[{"taxonomy":"whitepaper-category","embeddable":true,"href":"https:\/\/alifsemi.com\/wp-json\/wp\/v2\/whitepaper-category?post=999999353"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}