RESEARCH

Image & Vision

This area investigates image data fusion techniques that combine image and track data from multiple sensors to achieve higher accuracy and more specific inferences than a single sensor alone can provide. Our aim is to explore state-of-the-art image processing algorithms for effective data fusion, as in:

 

Image-to-Text Generation for Active LLM

Author: Administrator | Date: 2024-06-10 15:06:44 | Views: 174

1. Motivation
- Current smart kiosk systems

  • Depend mainly on speech and touch input, without any visual information
  • Produce LLM responses of limited richness because speech is the only input
  • Operate passively, requiring the user to initiate interaction through the touchscreen

 

2. Research goal and issue
- Goal : Develop image-to-text conversion technology for an active LLM
- Issues

  • Current face detection has difficulty identifying users
  • Most image-to-text models require large computational resources

 

3. Approach
- User detection

  • Current face detection methods often overlook practical needs, such as identifying users who actually intend to use the system
  • Develop criteria for identifying users based on face detection methods
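As an illustration only, one possible user criterion can be sketched from face key points: a detected face counts as a candidate user when it is large enough in the frame and roughly frontal. The post does not state the actual criteria, so the `FaceKeyPoints` layout, the `frontalness` score, and both thresholds below are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels


@dataclass
class FaceKeyPoints:
    # Hypothetical key point layout; a real detector may return more points.
    left_eye: Point
    right_eye: Point
    nose: Point


def eye_span(kp: FaceKeyPoints) -> float:
    """Horizontal distance between the eyes, used as a proxy for face size."""
    return abs(kp.right_eye[0] - kp.left_eye[0])


def frontalness(kp: FaceKeyPoints) -> float:
    """Score in [0, 1]: 1.0 when the nose lies midway between the eyes
    horizontally, 0.0 when it is at or beyond one eye (side view)."""
    span = eye_span(kp)
    if span == 0:
        return 0.0
    mid_x = (kp.left_eye[0] + kp.right_eye[0]) / 2
    offset = abs(kp.nose[0] - mid_x) / (span / 2)
    return max(0.0, 1.0 - offset)


def is_candidate_user(kp: FaceKeyPoints,
                      min_eye_span: float = 40.0,
                      min_frontalness: float = 0.6) -> bool:
    """Assumed rule: the face is close enough and looking toward the camera."""
    return eye_span(kp) >= min_eye_span and frontalness(kp) >= min_frontalness


if __name__ == "__main__":
    facing = FaceKeyPoints(left_eye=(100, 100), right_eye=(160, 100), nose=(130, 120))
    passerby = FaceKeyPoints(left_eye=(100, 100), right_eye=(160, 100), nose=(158, 120))
    print(is_candidate_user(facing), is_candidate_user(passerby))
```

The thresholds would need tuning against real kiosk footage; the sketch only shows how a criterion of this kind could be expressed on top of a face detector's output.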

- Image-to-text generation

  • Image captioning is the task of describing the overall content of an image in words
  • Scene graph generation, which captures the relationships between objects, is better suited to this purpose
  • Develop a scene graph generation method with a lightweight architecture
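To make the scene graph idea concrete, the sketch below represents a scene graph as a list of (subject, predicate, object) relations and flattens it into text that could be passed to an LLM. It does not show the generation model itself; the `Relation` type, the `scene_graph_to_text` helper, and the example relations are hypothetical.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    # One edge of a scene graph: two detected objects and their relationship.
    subject: str
    predicate: str
    obj: str


def scene_graph_to_text(relations: List[Relation]) -> str:
    """Serialize relations into a compact, LLM-readable description."""
    return "; ".join(f"{r.subject} {r.predicate} {r.obj}" for r in relations)


if __name__ == "__main__":
    graph = [
        Relation("person", "standing in front of", "kiosk"),
        Relation("person", "holding", "cup"),
    ]
    print(scene_graph_to_text(graph))
    # prints: person standing in front of kiosk; person holding cup
```

Compared with a free-form caption, this form keeps each object relationship explicit, which is the property the approach above relies on.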

 

4. Result
- Face identification : Identify users by comparing against pre-registered face vectors using a pre-trained model
- Facial Expression Recognition : Developed a lightweight visual emotion recognition model. Emotion detection performance is suboptimal when the user is seen from the side
- Face Engagement : Estimating engagement is an essential preprocessing step for understanding user emotions. Engagement is determined using key points.
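The face identification step can be sketched as a nearest-match search over pre-registered vectors. The post does not name the similarity measure or the model, so cosine similarity, the `identify` helper, the 0.8 threshold, and the toy 3-dimensional vectors below are assumptions; real face embeddings would come from the pre-trained model and have many more dimensions.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query: Sequence[float],
             registered: Dict[str, Sequence[float]],
             threshold: float = 0.8) -> Optional[str]:
    """Return the registered name most similar to the query vector,
    or None when no similarity reaches the (assumed) threshold."""
    best_name, best_score = None, threshold
    for name, vector in registered.items():
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy vectors standing in for embeddings from a pre-trained model.
    registered = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(identify([0.9, 0.1, 0.0], registered))
    print(identify([0.5, 0.5, 0.7], registered))
```

Returning `None` for low-similarity queries lets the system treat unregistered visitors separately instead of forcing a match.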

 
