Enterprise AI Analysis
SKYLIGHT: A Scalable Hundred-Channel 3D Photonic In-Memory Tensor Core Architecture for Real-time AI Inference
Authors: Meng Zhang, Ziang Yin, Nicholas Gangi, Alexander Chen, Brett Bamfo, Tianle Xu, Jiaqi Gu, Zhaoran Rena Huang
The growing computational demands of artificial intelligence (AI) are challenging conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. By co-designing its topology, wavelength routing, accumulation, and programming in a 3D stack, SKYLIGHT overcomes key limitations. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust non-micro-ring resonator (MRR)-based wavelength-division multiplexing (WDM) component, hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT enables in-situ weight updates that support label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony [90] for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W, enabling ResNet-50 inference at 1212 FPS with ~27 mJ per image, and achieves 84.17 FPS/W end-to-end (1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU) under the same workload in real-time measurements. System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities (low-bit quantization and signal-proportional analog noise capturing modulation, PCM programming, and readout variations). With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
Executive Impact: Unlocking Scalable Photonic AI
SKYLIGHT targets two bottlenecks of photonic computing, scalability and reliability, with a 3D architecture and non-volatile weight memory. System-level modeling puts a single core at 342.1 TOPS and 23.7 TOPS/W for real-time AI inference, which positions it as a candidate for next-generation AI accelerators.
Deep Analysis & Enterprise Applications
3D Si/SiN Crossing-Free Topology
SKYLIGHT uses a novel 3D Si/SiN photonic crossbar architecture that distributes computation across vertically stacked layers, eliminating cascaded in-fabric crossings. This design enables low-loss scaling to large non-volatile arrays (up to 144 × 256), addressing a critical limitation of traditional 2D planar layouts where waveguide crossings lead to prohibitive insertion losses and routing congestion. The architecture features row waveguides in SiN and column bus waveguides in Si, with vertical escalators for inter-layer transfer, ensuring a crossing-free compute fabric.
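The scaling argument against cascaded crossings can be made concrete with a back-of-envelope calculation. The sketch below assumes that a signal in a planar crossbar passes roughly one crossing per column it traverses, and it uses a hypothetical per-crossing loss of 0.1 dB; neither figure comes from the paper, and both are labeled as assumptions in the code.

```python
def crossing_loss_db(n_crossings: int, loss_per_crossing_db: float) -> float:
    """Total insertion loss in dB from cascaded crossings (losses add in dB)."""
    return n_crossings * loss_per_crossing_db


def db_to_power_fraction(loss_db: float) -> float:
    """Fraction of optical power remaining after a loss given in dB."""
    return 10 ** (-loss_db / 10)


# ASSUMPTION: one crossing per column traversed in a 2D planar layout.
N_COLUMNS = 256
# ASSUMPTION: illustrative per-crossing loss, not a value from the paper.
LOSS_PER_CROSSING_DB = 0.1

planar_loss = crossing_loss_db(N_COLUMNS, LOSS_PER_CROSSING_DB)
remaining = db_to_power_fraction(planar_loss)
print(f"planar path: {planar_loss:.1f} dB, {remaining:.2%} of input power left")
```

Under these assumed numbers the worst-case planar path loses about 25.6 dB, which leaves well under 1% of the input power. A crossing-free fabric removes this term from the loss budget, although escalator and propagation losses (not modeled here) still apply.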
High Throughput and Energy Efficiency
SKYLIGHT delivers 342.1 TOPS at an energy efficiency of 23.7 TOPS/W. For ResNet-50 inference, it reaches 1212 FPS at approximately 27 mJ per image. End-to-end, it achieves 84.17 FPS/W, 1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload. The large core size (144 × 256) contributes directly to real-time performance: it executes more computations in parallel, reduces time-multiplexing overhead, and lowers energy per inference by amortizing fixed power components (e.g., comb generation, mixed-signal I/O) over more operations.
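The quoted throughput and per-watt figures can be cross-checked with simple division. The sketch below derives the power implied by each pair of numbers and the GPU efficiency implied by the 1.61× ratio; these derived values are arithmetic on the quoted figures, not results reported by the paper.

```python
# Figures quoted in the text above.
PEAK_TOPS = 342.1        # tera-operations per second
TOPS_PER_WATT = 23.7
RESNET50_FPS = 1212.0
FPS_PER_WATT = 84.17     # end-to-end
GPU_RATIO = 1.61         # SKYLIGHT FPS/W divided by GPU FPS/W


def implied_power_watts(rate: float, rate_per_watt: float) -> float:
    """Power implied by a throughput figure and its per-watt counterpart."""
    return rate / rate_per_watt


power_from_tops = implied_power_watts(PEAK_TOPS, TOPS_PER_WATT)
power_from_fps = implied_power_watts(RESNET50_FPS, FPS_PER_WATT)
gpu_fps_per_watt = FPS_PER_WATT / GPU_RATIO

print(f"power implied by TOPS figures: {power_from_tops:.1f} W")   # 14.4 W
print(f"power implied by FPS figures:  {power_from_fps:.1f} W")    # 14.4 W
print(f"implied GPU efficiency:        {gpu_fps_per_watt:.1f} FPS/W")  # 52.3 FPS/W
```

Both pairs of figures point to a power draw of roughly 14.4 W, so the TOPS/W and FPS/W numbers are mutually consistent at this level of precision.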
Non-Volatile PCM Weight Banks with Optical Programming
The architecture integrates scalable non-volatile phase-change material (PCM) weight banks directly into the photonic crossbar. The PCM cells act as absorptive attenuators for the in-memory dot product, so each weight is stored in the optical state of its cell. Programming uses vertically integrated VCSEL arrays (1064 nm wavelength) that deliver localized programming pulses through grating couplers. This optical programming minimizes heater-induced thermal crosstalk and enables precise refractive index modulation, supporting 7-bit multi-level precision with an endurance above 10^6 cycles. Compared with electrical methods, the non-volatile approach reduces programming energy and incurs near-zero static hold power during inference, which is crucial for energy-efficient, large-scale photonic cores.
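To illustrate what 7-bit multi-level precision means for a stored weight, the sketch below maps a target transmission to the nearest of 128 levels. Uniform level spacing and the `[t_min, t_max]` transmission range are illustrative assumptions; the paper's actual level placement and programming procedure are not described in this summary.

```python
from typing import Tuple

PCM_BITS = 7                  # multi-level precision quoted in the text
PCM_LEVELS = 2 ** PCM_BITS    # 128 programmable states


def quantize_transmission(
    t: float, t_min: float = 0.0, t_max: float = 1.0
) -> Tuple[int, float]:
    """Map a target transmission to the nearest PCM level.

    Returns (level index, transmission actually realized).
    ASSUMPTION: levels are uniformly spaced between t_min and t_max.
    """
    if not t_min <= t <= t_max:
        raise ValueError("target transmission outside programmable range")
    step = (t_max - t_min) / (PCM_LEVELS - 1)
    level = round((t - t_min) / step)
    return level, t_min + level * step


level, realized = quantize_transmission(0.3)
print(f"level {level} of {PCM_LEVELS - 1}, realized transmission {realized:.4f}")
```

With uniform spacing, the worst-case quantization error is half a step, or about 0.4% of the programmable range.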
In-Situ Learning and Robustness with Noise-Aware Training
SKYLIGHT is designed to support in-situ weight updates, enabling label-free, layer-local learning approaches like forward-forward local updates, in addition to inference. Through noise-aware quantization and training, the system maintains high task accuracy even under realistic hardware non-idealities. These non-idealities include low-bit quantization (INT6 inputs, INT7 weights, INT8 outputs) and signal-proportional analog noise (modulation, PCM programming, readout variations). Evaluations on diverse machine learning tasks (RF classification, ImageNet-1K, CIFAR-10, Flood mapping) demonstrate SKYLIGHT's robustness and its potential as a comprehensive solution for adaptive unsupervised learning in edge AI environments.
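A noise-aware forward pass of the kind described above can be sketched in a few lines. The version below quantizes inputs, weights, and outputs to the stated bit widths and multiplies each quantity by a Gaussian factor so that the noise scales with the signal. The symmetric signed quantizer, the single shared `rel_sigma`, and the `out_scale` parameter are assumptions made for illustration, not the paper's noise model.

```python
import random
from typing import Sequence


def fake_quantize(x: float, bits: int, x_max: float = 1.0) -> float:
    """Clip x to [-x_max, x_max] and snap it to a signed `bits`-bit uniform grid."""
    q_max = 2 ** (bits - 1) - 1
    clipped = max(-x_max, min(x_max, x))
    return round(clipped / x_max * q_max) / q_max * x_max


def add_proportional_noise(x: float, rel_sigma: float, rng: random.Random) -> float:
    """Multiplicative Gaussian noise: the standard deviation scales with |x|."""
    return x * (1.0 + rng.gauss(0.0, rel_sigma))


def noisy_dot(
    inputs: Sequence[float],
    weights: Sequence[float],
    rel_sigma: float,
    rng: random.Random,
    out_scale: float,
) -> float:
    """Dot product with INT6 inputs, INT7 weights, INT8 output, and analog noise."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        xq = add_proportional_noise(fake_quantize(x, 6), rel_sigma, rng)  # modulation
        wq = add_proportional_noise(fake_quantize(w, 7), rel_sigma, rng)  # PCM programming
        acc += xq * wq
    acc = add_proportional_noise(acc, rel_sigma, rng)                     # readout
    return fake_quantize(acc, 8, x_max=out_scale)


rng = random.Random(0)
x = [0.5, -0.25, 0.75]
w = [1.0, 0.5, -0.5]
print("noise-free:", noisy_dot(x, w, rel_sigma=0.0, rng=rng, out_scale=2.0))
print("2% noise:  ", noisy_dot(x, w, rel_sigma=0.02, rng=rng, out_scale=2.0))
```

In noise-aware training, a forward pass like this would replace the ideal dot product so that the learned weights tolerate the perturbations; the gradient handling (for example, a straight-through estimator for the rounding step) is outside this sketch.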
SKYLIGHT reaches 84.17 FPS/W end-to-end on ResNet-50, 1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload, and is reported to outperform prior photonic accelerators, which suits it to energy-constrained edge AI deployments.
SKYLIGHT 3D Photonic Core Workflow
SKYLIGHT integrates several innovations into a 3D stack, optimizing wavelength routing, accumulation, and programming. The process starts with encoding input signals using WDM, routing them through a crossing-free 3D crossbar with PCM-modulated weights, accumulating results hierarchically, and finally converting to digital for output.
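The workflow above can be mirrored in software as a tiled matrix-vector product: a layer larger than one core is split into 144 × 256 tiles, each tile is processed in one pass, and products are summed in two stages. Which matrix dimension maps to the 144 side, and the group size of 16 used for the first accumulation stage, are assumptions made for illustration.

```python
import math
from typing import List, Sequence

CORE_ROWS, CORE_COLS = 144, 256   # core size quoted in the text


def tile_count(n_rows: int, n_cols: int) -> int:
    """Number of core passes needed to cover an n_rows x n_cols weight matrix."""
    return math.ceil(n_rows / CORE_ROWS) * math.ceil(n_cols / CORE_COLS)


def hierarchical_sum(values: Sequence[float], group_size: int) -> float:
    """Two-stage accumulation: sum each group, then sum the group totals."""
    partial = [sum(values[i:i + group_size]) for i in range(0, len(values), group_size)]
    return sum(partial)


def tiled_matvec(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute y = W x tile by tile, as a time-multiplexed core would."""
    n_rows, n_cols = len(weights), len(x)
    y = [0.0] * n_rows
    for r0 in range(0, n_rows, CORE_ROWS):
        for c0 in range(0, n_cols, CORE_COLS):
            c1 = min(c0 + CORE_COLS, n_cols)
            for r in range(r0, min(r0 + CORE_ROWS, n_rows)):
                products = [weights[r][c] * x[c] for c in range(c0, c1)]
                # ASSUMPTION: group_size=16 is a hypothetical first-stage fan-in.
                y[r] += hierarchical_sum(products, group_size=16)
    return y


print(tile_count(1000, 2048))                                   # 56 passes
print(tiled_matvec([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, 0.0, -1.0]))  # [-2.0, -2.0]
```

The tile count makes the time-multiplexing cost explicit: a layer that fits in one tile needs a single pass, while a 1000 × 2048 layer needs 56 under this mapping.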
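| Feature | SKYLIGHT (Proposed) | Conventional Photonic |
|---|---|---|
| Crossbar Topology | 3D Si/SiN crossbar with a crossing-free compute fabric | 2D planar layout with waveguide crossings |
| Weight Storage | Non-volatile, optically programmed PCM | |
| WDM Routing | Thermally robust, non-MRR-based WDM | MRR-based WDM |
| Accumulation | Hierarchical accumulation with a multi-port PD | |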
SKYLIGHT's co-design approach yields significant advantages over conventional photonic architectures, particularly in managing optical loss, power consumption, and thermal stability at scale. The 3D crossing-free topology, non-volatile optical PCM, non-resonant WDM, and hierarchical accumulation are key differentiators.
Real-Time AI Task Performance
SKYLIGHT's results across four real-time machine learning tasks show that it tolerates hardware non-idealities and maintains high accuracy, including in the label-free, unsupervised learning scenario relevant to edge AI.
RF Signal Classification (8-class)
Achieved 0.873 accuracy with noise-aware training on CSPB-ML-2018R2, demonstrating robust high-throughput inference for edge autonomy.
Large-Scale Vision (ImageNet-1K)
Achieved 0.752 Top-1 accuracy with SparK-pretrained ResNet-50 at >1000 FPS, supporting high-parameter, memory-intensive inference.
Unsupervised Vision Learning (CIFAR-10)
Achieved 0.773 accuracy with noise-aware training for label-free, local learning, adapting to new environments without centralized supervision.
Flood Mapping Segmentation
Achieved 0.493 Mean IoU with noise-aware training on SpaceNet-8, enabling dense, pixel-level segmentation for situational awareness.
Your AI Implementation Roadmap
A phased approach to integrate SKYLIGHT's capabilities into your existing infrastructure, ensuring a smooth transition and maximum impact.
Phase 01: Strategic Assessment & Planning
Conduct a deep dive into your current AI/ML workloads and infrastructure. Identify key areas where SKYLIGHT's real-time inference and in-situ learning can provide the most significant competitive advantage. Define KPIs and a phased rollout strategy.
Phase 02: Pilot Integration & Proof-of-Concept
Implement a small-scale SKYLIGHT tensor core for a specific, high-impact workload. Validate performance metrics (TOPS, FPS, energy/image) and verify noise-aware training robustness on your custom datasets. Establish a baseline for future scaling.
Phase 03: Scaled Deployment & Optimization
Expand SKYLIGHT integration to larger clusters and diverse workloads. Implement multi-core chiplet architectures for aggregate throughput. Continuously monitor performance, optimize for specific application demands, and leverage in-situ learning for adaptive model refinement.