SaaS: Microsoft Fabric

Microsoft introduced Fabric in May 2023. It is an end-to-end analytics platform that integrates several technologies into a unified product, helping data and business professionals unlock more of their data's potential.

Fabric is currently in Preview. For the latest developments, visit https://blog.fabric.microsoft.com/en-US/blog

Each of the integrated components provides an “experience” (Microsoft's term) designed for a specific task:

– Data Factory -> a data integration service that can connect to more than 90 built-in data sources and can be used as an ETL/ELT tool (successor of SSIS). At the moment there are two options:
• Dataflow, a self-service data preparation tool that uses Azure Data Lake storage to stage the source data.
• Data pipeline, a cloud ETL service for serverless data integration and data transformation (also used in the Data Factory and Synapse products).
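To make the ETL/ELT distinction concrete, here is a minimal sketch in plain Python. It does not use any Fabric or Data Factory API: an in-memory SQLite database stands in for the target store, and the sample rows, table names, and function names are all made up for illustration. ETL transforms the rows before loading them, while ELT loads the raw rows first and transforms them inside the target with SQL.

```python
import sqlite3

# Hypothetical source rows (all text, as they might arrive from a CSV file).
SOURCE_ROWS = [
    ("2023-05-01", "widget", "3", "9.99"),
    ("2023-05-01", "gadget", "1", "24.50"),
    ("2023-05-02", "widget", "2", "9.99"),
]


def run_etl(conn):
    """ETL: transform in application code, then load the finished rows."""
    conn.execute("CREATE TABLE sales_etl (day TEXT, product TEXT, revenue REAL)")
    transformed = [
        (day, product, int(qty) * float(price))
        for day, product, qty, price in SOURCE_ROWS
    ]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", transformed)


def run_elt(conn):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    conn.execute(
        "CREATE TABLE raw_sales (day TEXT, product TEXT, qty TEXT, price TEXT)"
    )
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT day, product, "
        "CAST(qty AS INTEGER) * CAST(price AS REAL) AS revenue "
        "FROM raw_sales"
    )


def revenue_by_product(conn, table):
    """Aggregate revenue per product from one of the two result tables."""
    query = (
        f"SELECT product, ROUND(SUM(revenue), 2) FROM {table} "
        "GROUP BY product ORDER BY product"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
```

Calling revenue_by_product(conn, "sales_etl") and revenue_by_product(conn, "sales_elt") should give the same totals. Only the place where the transformation runs differs.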

– Synapse Data Engineering -> enables users to design, build, and maintain the infrastructure and systems needed to analyze large volumes of data. Available items are:
• Lakehouse, to store big data loaded via upload, a new data pipeline, a new dataflow, or a shortcut (e.g. to OneLake).
• Notebook, to run code in PySpark (Python), Spark (Scala), Spark SQL, and SparkR.
• Spark Job Definition, to schedule a recurring PySpark (Python), Spark (Scala/Java), or SparkR execution.
• Data pipeline, a cloud ETL service for serverless data integration and data transformation (the same item described under Data Factory).
• Import Notebook, to upload notebook code files from a local machine to the Power BI workspace.

– Synapse Data Warehousing provides the ability to build a virtual warehouse containing data from any source. Queries can be created through the [Visual Query editor] or, for those familiar with SSMS, the [SQL Query editor]. Data in the Warehouse is stored in Parquet files and published as Delta Lake logs. Additionally, Microsoft promises “Leading performance at scale”, a SQL engine running over an open data format, and distributed query processing.
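For readers unfamiliar with the Delta Lake format, the following is a heavily simplified sketch of the idea behind its transaction log: the data lives in Parquet files, and a sequence of JSON commit entries records which files were added to or removed from the table. Replaying the log yields the current set of files. The file names and the active_files helper are hypothetical, and real commits carry many more fields (sizes, statistics, schema metadata) than shown here.

```python
import json

# Simplified commits, oldest first. Each commit holds one JSON action per line.
# File names are invented for this example.
COMMITS = [
    '{"add": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0002.parquet"}}',
]


def active_files(commits):
    """Replay add/remove actions in order to find the table's current files."""
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)
```

Replaying only the first commit gives the table as it looked at that version, which is the mechanism that lets engines reading the open format agree on a consistent snapshot.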

– Synapse Data Science empowers users to complete end-to-end data science workflows using code in PySpark (Python), Spark (Scala), Spark SQL, and SparkR.

– Synapse Real-Time Analytics is a big data analytics platform optimized for streaming and time-series data. Available items are:
• Eventstream, for capturing, transforming, and routing real-time events to various destinations with a no-code experience.
• KQL database, for data storage and management. Data loaded into a KQL database can be accessed in OneLake and is exposed to other Fabric experiences.
• KQL queryset, to run queries and to view and customize query results on data. The KQL queryset allows you to save queries for future use, export and share queries with others, and includes the option to generate a Power BI report.
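Eventstream itself is a no-code experience, but the capture, transform, and route pattern it implements can be sketched in a few lines of Python. Everything below is hypothetical and unrelated to the actual Fabric API: the event shape, the route_events and to_celsius functions, and the destination names exist only to illustrate the flow.

```python
from collections import defaultdict


def route_events(events, transform, routes):
    """Transform each event, then copy it to every destination whose rule matches."""
    destinations = defaultdict(list)
    for event in events:
        shaped = transform(event)
        for name, predicate in routes.items():
            if predicate(shaped):
                destinations[name].append(shaped)
    return dict(destinations)


def to_celsius(event):
    """Example transformation: convert a Fahrenheit reading to Celsius."""
    celsius = round((event["temp_f"] - 32) * 5 / 9, 1)
    return {"device": event["device"], "temp_c": celsius}


# Hypothetical captured events and routing rules.
EVENTS = [
    {"device": "sensor-a", "temp_f": 68.0},
    {"device": "sensor-b", "temp_f": 212.0},
]
ROUTES = {
    "all_readings": lambda e: True,        # e.g. a storage destination
    "alerts": lambda e: e["temp_c"] > 50,  # e.g. a destination for hot readings
}

routed = route_events(EVENTS, to_celsius, ROUTES)
```

Here both readings reach the all_readings destination, while only the sensor-b reading is also routed to alerts.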

– Power BI is a product that helps you turn data into insights about your business. It consists of three basic parts: a Windows application [Power BI Desktop], a SaaS service [powerbi.com], and Power BI Mobile.