International Research Journal of Engineering and Technology (IRJET) | Volume: 08 Issue: 05 | May 2021 | www.irjet.net

## **AN EFFICIENT NETWORK ON CHIP ROUTER FOR DATA FLOW ARCHITECTURE WITH OPTIMIZED TOPOLOGY**

Dr. R. Poovendran<sup>1</sup>, R. Hariharan<sup>2</sup>, S. Jove Stannibal<sup>3</sup>, S. Keerthan Murali<sup>4</sup>, R. Mukilan<sup>5</sup>

<sup>1</sup>Associate Professor, <sup>2,3,4,5</sup>UG Student, Department of Electronics and Communication Engineering, Adhiyamaan College of Engineering, M.G.R Nagar, Hosur, Tamil Nadu, India.

poovendranr@gmail.com<sup>1</sup>, harisiddhu466@gmail.com<sup>2</sup>, jstannibal@gmail.com<sup>3</sup>, shreekeerthan@gmail.com<sup>4</sup>, mugilrockz24@gmail.com<sup>5</sup>

**Abstract** - Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, large amounts of data are frequently transferred among processing elements through the network-on-chip (NoC). Consequently, the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination; thus, it can transfer data with multiple destinations in a single transfer. Moreover, the router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.

*Key Words*: Multi-Destination, Router, Network-On-Chip, Dataflow Architecture, High-Performance Computing.

#### **1. INTRODUCTION**

Network on chip (NoC) refers to a communication subsystem on an integrated circuit that can achieve a high degree of parallelism, since all links in the NoC can operate simultaneously on different data packets. In dataflow processors, each processing element (PE) is much simpler than a general RISC core. Normally there is no branch predictor, out-of-order control, or other complex logic in the PEs of dataflow processors. The proportion of chip area occupied by function units in dataflow cores is much larger than that in general cores. Dataflow computing can achieve higher performance than general cores on specific parallel applications. In dataflow processors, there are usually many PEs and each PE executes several instructions.

#### **2. RELATED WORK**

[1] GPGPUs are gaining traction as mainstream processors for scientific computing [20, 28]. These processors deliver high computational throughput and are highly power efficient (in terms of FLOPs/Watt) [21]. Nevertheless, existing GPGPUs employ a von-Neumann compute engine and, therefore, suffer from the model's power inefficiencies. Control-based von-Neumann architectures are tuned to execute a dynamic, sequential stream of instructions that communicate through explicit storage (register file and memory). For GPGPUs, this means that each dynamic instruction must be fetched and decoded, even though programs mostly iterate over small static portions of the code. Furthermore, because explicit storage is the only channel for communicating data between instructions, intermediate results are transferred repeatedly between the functional units and the register file. These inefficiencies dramatically reduce the energy efficiency of modern GPGPUs (as well as that of general-purpose von-Neumann processors [15, 19]). For example, recent GPGPUs spend only about 10-20% of their dynamic energy on computing instructions but spend up to 30% of their power on the instruction pipeline and the register file.

[2] Systems on chip are moving toward multicore design to take advantage of technology scaling and to speed up system performance through increased parallelism, given that the power wall limits the increase of the clock frequency [1]-[3]. Networks on chip have been shown to be feasible and easy to scale for supporting a large number of processing elements, as opposed to point-to-point interconnect wires or shared buses [4]. A multicore system in which processors communicate through a 2-D mesh network of routers is shown in Fig. 1. Each router has five ports that connect to four neighboring routers and its local processor. A network interface (NI) sits between a processor and its router to transform processor messages into packets to be transferred on the network and vice versa.

In a typical router, each input port has an input buffer for temporarily storing packets in case the output channel is busy. This buffer can be a single queue, as in a wormhole (WH) router, or multiple queues in parallel, as in virtual channel (VC) routers.
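The input-buffer organization described above can be illustrated with a minimal Python sketch. The class `InputPort`, its parameters `num_queues` and `depth`, and the simple "first non-empty queue" forwarding policy are hypothetical simplifications introduced here for illustration; they are not taken from any of the cited router designs.

```python
from collections import deque


class InputPort:
    """Hypothetical model of one router input port.

    num_queues=1 mimics a wormhole-style single queue;
    num_queues>1 mimics parallel virtual-channel queues.
    """

    def __init__(self, num_queues=1, depth=4):
        self.queues = [deque() for _ in range(num_queues)]
        self.depth = depth  # assumed capacity of each queue, in packets

    def accept(self, packet, queue_id=0):
        """Buffer an arriving packet; return False if that queue is full."""
        queue = self.queues[queue_id]
        if len(queue) >= self.depth:
            return False
        queue.append(packet)
        return True

    def forward(self, output_busy):
        """Release one buffered packet unless the output channel is busy."""
        if output_busy:
            return None  # packet stays buffered until the channel frees up
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


wormhole_port = InputPort(num_queues=1)
vc_port = InputPort(num_queues=2)

wormhole_port.accept("p0")
held = wormhole_port.forward(output_busy=True)
sent = wormhole_port.forward(output_busy=False)
```

In this sketch a packet is held while the output is busy and released once it is free; a real VC router would additionally arbitrate among queues and track busy state per channel.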

#### **3. EXISTING SYSTEM**

Traditional dataflow architecture can be divided into two types: coarse-grained dataflow architectures and fine-grained dataflow architectures. Coarse-grained dataflow architectures, such as TeraFlux[29] and Runnemede[30], partition programs into many program blocks according to their dependences.

The dataflow graph is mapped onto the PE array; each PE executes only part of the dataflow graph and transfers its results directly to their consumers.
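As a rough illustration of this mapping, the following Python sketch places the instructions of a small dataflow graph on PEs and lists, for each producer, the remote PEs that must receive its result. The graph, the placement, and the function name `remote_destinations` are made-up examples, not the mapping used in the experiments.

```python
def remote_destinations(consumers, placement):
    """For each producer instruction, list the other PEs that consume its result.

    consumers: dict mapping an instruction to the instructions that use its result.
    placement: dict mapping each instruction to the PE it is mapped to.
    """
    remote = {}
    for producer, users in consumers.items():
        source_pe = placement[producer]
        # Consumers on the producer's own PE need no network transfer.
        destination_pes = {placement[user] for user in users} - {source_pe}
        remote[producer] = sorted(destination_pes)
    return remote


# Hypothetical four-instruction dataflow graph mapped onto three PEs.
consumers = {"a": ["b", "c", "d"], "b": ["d"], "c": [], "d": []}
placement = {"a": 0, "b": 0, "c": 1, "d": 2}

remote = remote_destinations(consumers, placement)
```

Here instruction `a` has consumers on PE 1 and PE 2, so its single result has two remote destinations, which is the situation a multi-destination router targets.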

#### **4. PROPOSED SYSTEM**

The proposed router supports multiple destinations: it can transfer data with multiple destinations in a single transfer. The router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. The proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
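The benefit of multi-destination support can be sketched by counting injected packets. The Python below is a simplified counting model under the assumption that a unicast-only router needs one packet per destination while a multi-destination router needs one packet per result; the function name `packets_injected` and the example destination lists are hypothetical.

```python
def packets_injected(destination_lists, multi_destination):
    """Count packets a PE injects for a sequence of results.

    destination_lists: one list of remote destination PEs per produced result.
    multi_destination: if True, one packet carries all destinations of a result;
                       if False, one unicast packet is sent per destination.
    """
    total = 0
    for destinations in destination_lists:
        if not destinations:
            continue  # result is consumed locally, nothing is injected
        total += 1 if multi_destination else len(destinations)
    return total


# Hypothetical results: three destinations, one, none, and two.
results = [[1, 2, 5], [3], [], [4, 6]]

unicast_packets = packets_injected(results, multi_destination=False)
multi_packets = packets_injected(results, multi_destination=True)
```

For this made-up trace the unicast model injects 6 packets and the multi-destination model injects 3; the actual reduction depends on how many consumers each result has in a given application.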

#### **5. METHODOLOGY**

In the control-flow many-core architecture, transfers in NoCs occur only when cache misses happen, so the injection rate of general cores is relatively low. In dataflow processors, however, each instruction produces a result, and the result is transferred through the NoC unless all of its destinations are in the same PE as the producer.



Fig -1: Architecture of the dataflow processor used as the experimental platform.

If a dataflow processor is to maintain relatively high performance, it has to execute instructions at every cycle. It will then produce results and send packets to the connected router at almost every cycle. Therefore, the injection rate of NoCs in dataflow architecture is usually much higher than that in control-flow architecture.
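A small Python sketch can make the injection behaviour concrete. It assumes, for illustration only, that the injection rate of a PE is the fraction of cycles in which it sends a packet to its router, and that a result is injected whenever at least one destination lies on another PE; the function names and the four-cycle trace are hypothetical.

```python
def injects_packet(source_pe, destination_pes):
    """A result enters the NoC unless every destination is on the producing PE."""
    return any(dest != source_pe for dest in destination_pes)


def injection_rate(source_pe, per_cycle_destinations):
    """Fraction of cycles in which the PE injects a packet (assumed definition).

    per_cycle_destinations: one entry per cycle; a list of destination PEs for
    the result produced in that cycle, or None if no instruction executed.
    """
    if not per_cycle_destinations:
        return 0.0
    injected = sum(
        1
        for destinations in per_cycle_destinations
        if destinations is not None and injects_packet(source_pe, destinations)
    )
    return injected / len(per_cycle_destinations)


# Hypothetical trace for PE 0: remote, local-only, mixed, idle.
trace = [[1], [0], [0, 2], None]
rate = injection_rate(0, trace)
```

In this trace two of the four cycles inject a packet, giving a rate of 0.5; a PE that executes an instruction with a remote consumer in nearly every cycle approaches a rate of 1.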



Fig -2: Dataflow graph and dataflow program

Considering the high injection rate of dataflow architecture, we directly adopt four mesh on-chip networks to evaluate the dataflow accelerator in the experiment. Table 1 shows the injection rate of three typical kernels on the dataflow processor with four mesh networks. The injection rate is collected in an ideal situation where the transfer delay of routers is set to zero, because the injection rate is meaningful only when the dataflow architecture achieves high performance.

The NoC consists of four $8 \times 8$ mesh networks, so the total number of routers is 256. Even with so many routers, the average injection rate of routers still reaches 51.5%. In many-core architectures, by contrast, the injection rate is generally under 5% [26]. The injection rate of routers in dataflow architecture is therefore much higher than that in control-flow many-core architecture.
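The figures quoted above can be checked with a short calculation, taking the 5% many-core value as an upper bound:

$$4 \times (8 \times 8) = 256 \ \text{routers}, \qquad \frac{51.5\%}{5\%} = 10.3$$

Since the many-core injection rate is stated as generally under 5%, the average injection rate in the dataflow case is at least roughly ten times higher.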

#### **6. EXPERIMENTAL RESULT**



Fig -3: CHART FOR BUFFER ALGORITHM

The modified network, in which the high-injection allocation scheme is implemented, is shown above. The adopted mechanisms can reduce the possibility of congestion in routers and decrease the packet transfer delay in dataflow architecture, as reflected in the average transfer delay in each router. The graph below shows that the performance of the network is improved by implementing the modified technique.



Fig -4: Graph Chart For Buffer Algorithm



Fig -5: Reduction of packet on nodes with counts through multi-destination mechanism



Fig -6: Performance of different routers on node with typical dataflow applications

#### **7. CONCLUSION**

In this work, we proposed a novel and efficient NoC router for dataflow architecture. The proposed router supports multiple destinations; thus, it can transfer data with multiple destinations in a single transfer. Moreover, the router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results showed that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.

#### **REFERENCES**

- [1] Chen T S, Du Z D, Sun N H, Wang J, Wu C Y, Chen Y J, Temam O. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proc. the 19<sup>th</sup> International Conference on Architectural Support for Programming Languages and Operating Systems, 2019.
- [2] Voitsechov D, Etsion Y. Single-graph multiple flows: Energy efficient design alternative for GPGPUs. In Proc. the 41<sup>st</sup> Int. Symp. Computer Architecture, 2019.
- [3] Tran A T, Baas B M. Achieving high-performance on-chip networks with shared-buffer routers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019.
- [4] Milutinovic V, Salom J, Trifunovic N, Giorgi R. Guide to Dataflow Supercomputing (1st edition). Springer International Publishing, 2018.
- [5] Wei L, Zhou L. An equilibrium partitioning method for multicast traffic in 3D NoC architecture. In Proc. the IFIP/IEEE International Conference on Very Large Scale Integration, 2018.

#### **BIOGRAPHY:**



Dr. R. Poovendran, Associate Professor, Electronics And Communication Engineering Department, Adhiyamaan College of Engineering, Anna University.