1. Data Ingestion (Transport/Format):
- Data arrives from various sources in different formats (structured or unstructured).
- Real-time streams or batch data may come from databases, telemetry devices, or video equipment.
- Cloud technologies facilitate data reception, processing, and storage.
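As a minimal sketch of the ingestion step, the snippet below parses two payloads that arrive in different formats into one list of records. The function name `parse_batch`, the format labels `jsonl` and `csv`, and the sample payloads are assumptions made for illustration; a real pipeline would typically receive this data through a managed transport service.

```python
import csv
import io
import json


def parse_batch(payload: str, fmt: str) -> list[dict]:
    """Parse a text payload into a list of records based on its declared format."""
    if fmt == "jsonl":
        # One JSON object per non-empty line, a common shape for streamed events.
        return [json.loads(line) for line in payload.splitlines() if line.strip()]
    if fmt == "csv":
        # The first row is treated as the header; all values stay strings.
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample payloads: a telemetry stream and a database export.
telemetry = '{"device": "t-1", "temp": 21.5}\n{"device": "t-2", "temp": 19.0}\n'
export = "id,name\n1,alice\n2,bob\n"

records = parse_batch(telemetry, "jsonl") + parse_batch(export, "csv")
```

Note that the CSV values remain strings here; type conversion is left to the transformation step.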
2. Transformation (ETL/Processing):
- Raw data often requires transformation for machine learning or business analytics.
- JSON data, for instance, can be transformed into tabular format.
- The goal is efficient data representation without altering its content.
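The JSON-to-tabular transformation mentioned above can be sketched as follows. This is one possible approach, not a prescribed one: nested objects are flattened into dotted column names, and fields missing from a document become `None`. The helper names `flatten` and `to_table` and the sample documents are hypothetical.

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level, joining keys with dots."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars (and lists) are kept as-is, so no content is altered.
            row[name] = value
    return row


def to_table(documents: list[dict]) -> tuple[list[str], list[list]]:
    """Return (column names, rows) with one row per input document."""
    rows = [flatten(doc) for doc in documents]
    columns = sorted({col for row in rows for col in row})
    return columns, [[row.get(col) for col in columns] for row in rows]


raw = '[{"id": 1, "user": {"name": "alice", "city": "Oslo"}}, {"id": 2, "user": {"name": "bob"}}]'
columns, table = to_table(json.loads(raw))
# columns == ["id", "user.city", "user.name"]
# table   == [[1, "Oslo", "alice"], [2, None, "bob"]]
```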
3. Staging Storage:
- Storing raw data in its original format separates reception from processing.
- Object Storage serves as a suitable repository.
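To illustrate how staging decouples reception from processing, the sketch below writes each payload unchanged under a key built from its source, arrival date, and a content hash. A local directory stands in for the object store, and the key layout, the `stage_raw` function, and the sample payload are assumptions for illustration only.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def stage_raw(payload: bytes, source: str, ext: str, root: Path,
              received_at: datetime) -> Path:
    """Store a payload byte-for-byte under <source>/<YYYY>/<MM>/<DD>/<hash>.<ext>."""
    # A content hash in the name makes repeated deliveries of the same payload idempotent.
    digest = hashlib.sha256(payload).hexdigest()[:16]
    target = root / source / received_at.strftime("%Y/%m/%d") / f"{digest}.{ext}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


# A temporary local directory plays the role of the object storage bucket.
staging_root = Path(tempfile.mkdtemp())
payload = b'{"device": "t-1", "temp": 21.5}'
staged = stage_raw(payload, "telemetry", "json", staging_root,
                   datetime(2024, 1, 15, tzinfo=timezone.utc))
```

Because the payload is stored as received, a processing job can be rerun or changed later without asking the source to resend the data.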
4. Storage:
Data storage encompasses a rich array of cloud-based systems. The choice depends on data formats and specific tasks. Let’s explore some of these technologies:
- ClickHouse: Ideal for analytical queries.
- PostgreSQL: Widely used for transactional queries.
- MongoDB: Suitable for storing data in JSON-like structures.
- Elasticsearch: Enables fast full-text search.
- Spark and HDFS: Distributed processing (Spark) and distributed file storage (HDFS) for handling large datasets and integrating machine learning.
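The idea that the storage choice follows from the task can be expressed as a simple lookup. The mapping below only restates the list above; the workload labels and the `choose_store` function are hypothetical names, and real selection criteria would usually involve more than a single label.

```python
# Workload label -> storage technology, mirroring the list above.
STORE_BY_WORKLOAD = {
    "analytics": "ClickHouse",
    "transactions": "PostgreSQL",
    "json_documents": "MongoDB",
    "full_text_search": "Elasticsearch",
    "large_datasets": "Spark and HDFS",
}


def choose_store(workload: str) -> str:
    """Return the storage technology listed for a workload label."""
    try:
        return STORE_BY_WORKLOAD[workload]
    except KeyError:
        known = ", ".join(sorted(STORE_BY_WORKLOAD))
        raise ValueError(f"unknown workload {workload!r}; expected one of: {known}") from None


store = choose_store("full_text_search")
```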
5. User Applications and Business Logic:
These applications can range from analytical reporting systems to search engines or high-throughput data processing apps.
Kubernetes manages containerized applications, ensuring resilience and scalability.
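As a rough sketch of what running such an application on Kubernetes involves, the snippet below builds a Deployment manifest as a Python dictionary and renders it as JSON, which Kubernetes accepts alongside YAML. The application name `report-api`, the image reference, and the replica count are placeholders, not values from any real system.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment manifest for one container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Several replicas let the application survive the loss of a single pod.
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Placeholder name and image, chosen only for this example.
manifest = deployment_manifest("report-api", "example.com/report-api:1.0", replicas=3)
rendered = json.dumps(manifest, indent=2)
```

Saving `rendered` to a file and passing it to `kubectl apply -f` would be the usual next step; scaling then comes down to changing `replicas`.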