Leveraging the ILLA Cloud and Hugging Face Partnership for Endpoints

Discover how ILLA Cloud and Hugging Face are transforming audio-to-text conversion with advanced cloud computing and AI models. Learn how to leverage this powerful technology for improved productivity and new use cases in our latest blog post.

We are thrilled to announce the partnership between ILLA Cloud and Hugging Face, a collaboration that combines ILLA Cloud's application-building capabilities with Hugging Face's state-of-the-art natural language processing (NLP) models. It lets users work with both platforms together, for example by calling hosted models directly from an ILLA Cloud app. Beyond hosted models, Hugging Face also maintains open-source libraries such as PEFT (Parameter-Efficient Fine-Tuning), which adapts large pretrained models by training only a small number of additional parameters, and Accelerate, which helps run the same PyTorch training code across different hardware setups.

ILLA Cloud is a versatile platform that allows users to build applications by connecting various components and actions, while Hugging Face is a leading provider of NLP models, tools, and resources for developers, researchers, and businesses.

In this comprehensive blog post, we will demonstrate the benefits of this partnership by guiding you through the creation of an audio-to-text application in ILLA Cloud using Hugging Face's Inference Endpoints and the openai/whisper-base model. Additionally, we will discuss some possible use cases and applications for this technology.

Step 1: Build the front-end interface with components

The first step in creating your application is to design an intuitive interface using ILLA Cloud's components, such as a file upload and a button. This interface will enable users to upload audio files and initiate the transcription process easily.

Ensure that the interface is user-friendly and visually appealing. Consider incorporating clear instructions, so users understand how to use the application effectively.

Step 2: Add a Hugging Face resource

To add a Hugging Face resource, complete the required fields in the resource configuration form, such as the resource name and your Hugging Face access token.

This step establishes a connection between your ILLA Cloud application and the Hugging Face model, allowing for seamless integration and execution.

Step 3: Configure an Action

Next, configure the action to execute the Hugging Face model:

  1. Select the appropriate parameter type. For the openai/whisper-base model, choose Binary, since the model requires binary file input.
  2. Map the input file from the front-end interface to the action parameter.

Carefully configuring the action ensures that your application processes the audio input correctly and efficiently.
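
To make the binary input concrete, here is a minimal Python sketch of the kind of HTTP request such an action sends: the raw audio bytes go in the request body, with the token in an Authorization header. The function name, the example URL, and the default content type are assumptions for illustration; ILLA Cloud builds the real request for you, and your endpoint URL comes from your own Hugging Face account.

```python
import urllib.request


def build_transcription_request(
    endpoint_url: str,
    token: str,
    audio_bytes: bytes,
    content_type: str = "audio/flac",  # assumed default; match your file type
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    if not audio_bytes:
        raise ValueError("audio_bytes is empty; nothing to transcribe")
    return urllib.request.Request(
        endpoint_url,
        data=audio_bytes,  # binary body, not JSON
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
        method="POST",
    )


# Hypothetical placeholder URL and token, for illustration only.
request = build_transcription_request(
    "https://example.invalid/whisper-base", "hf_your_token", b"\x00\x01"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the sketch testable without network access or a real token.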

Step 4: Connect the components and actions

Now, establish connections between the components and actions in ILLA Cloud:

  1. Add an event handler to the button, which triggers the action run when clicked.
  2. Set the value of a text component to {{whisper.data[0].text}}. This displays the transcription result on the text component.

By connecting the components and actions, you create a seamless user experience, allowing users to witness the power of Hugging Face's NLP models in action.
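
The binding {{whisper.data[0].text}} reads the text field of the first item in the action's result. The Python sketch below mirrors that lookup and adds a fallback for empty results. The result shape is inferred from the binding expression itself, and the function name is hypothetical.

```python
from typing import Any


def extract_transcription(action_result: dict[str, Any]) -> str:
    """Mirror {{whisper.data[0].text}}, returning "" when no result exists."""
    data = action_result.get("data") or []
    if not data or not isinstance(data[0], dict):
        return ""
    return str(data[0].get("text", "")).strip()


# Assumed result shape, inferred from the binding expression.
sample = {"data": [{"text": " Hello from Whisper. "}]}
print(extract_transcription(sample))  # Hello from Whisper.
```

In the app itself, the same guard can be useful so that the text component stays empty instead of showing an error before the first transcription runs.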

Use Case 2: Generating images with Stable Diffusion

Here we demonstrate how to use Stable Diffusion v1.5 and Stable Diffusion v2.1 in ILLA Cloud through Hugging Face.

Step 1: Building a Front-end Interface

We construct a front-end interface by utilizing a drag-and-drop approach to place essential components such as input fields, buttons, images, and more. After adjusting the styles of the components, we obtain the following complete webpage.

Step 2: Creating Resources and Actions

We establish resources and actions by utilizing the Hugging Face Inference API to connect to the Stable Diffusion model. Two models can be utilized: runwayml/stable-diffusion-v1-5 and stabilityai/stable-diffusion-2-1.

Choose the "Hugging Face Inference API" for this purpose.

Provide a name for this resource and enter your token from the Hugging Face platform.

In the Action configuration panel, enter the Model ID and Parameter. The selected model comes from radioGroup1, so set the Model ID to {{radioGroup1.value}}. The prompt comes from the input field, so set the parameter to {{input1.value}}. The configuration should be as shown in the following image.
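
For readers who want to see what this configuration amounts to, the following Python sketch builds the equivalent request: the Model ID selects the URL and the prompt is sent as a JSON "inputs" field. The base URL follows the classic Inference API pattern and is an assumption here, as are the function and constant names; check the current Hugging Face documentation before relying on it.

```python
import json
import urllib.request

# The two model IDs named in this post.
ALLOWED_MODELS = {
    "runwayml/stable-diffusion-v1-5",
    "stabilityai/stable-diffusion-2-1",
}
# Assumed URL pattern of the classic Inference API.
API_BASE = "https://api-inference.huggingface.co/models"


def build_image_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-image request for the chosen model."""
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model_id}")
    if not prompt.strip():
        raise ValueError("prompt is empty")
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_image_request(
    "runwayml/stable-diffusion-v1-5",   # plays the role of {{radioGroup1.value}}
    "A mecha robot in a favela in expressionist style",  # {{input1.value}}
    "hf_your_token",                    # hypothetical placeholder token
)
print(request.full_url)
```

Restricting the Model ID to a known set mirrors what the radio group does in the interface: the user can only pick models the app was designed for.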

As a test, we enter "A mecha robot in a favela in expressionist style" in the input component and run the Action. The result is shown below. In the left panel, you can see the data available for use in other components, including base64binary and dataURI.

Step 3: Displaying Data on Components

To display the image obtained from Step 2, we modify the Image source of the image component to {{generateInput.fileData.dataURI}}. This will enable us to show the generated image.
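
A data URI is simply the image bytes encoded as base64 with a small prefix naming the media type, which is why an image component can render it directly. The sketch below shows the general construction in Python; the function name and the default media type are assumptions, and ILLA Cloud produces the dataURI field for you.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw bytes as a data URI that an image source can display."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Tiny stand-in payload; a real image would be much longer.
print(to_data_uri(b"hi", "image/png"))  # data:image/png;base64,aGk=
```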

Step 4: Running the Action with Components

To run the action created in Step 2 when the button component is clicked, add an event handler to the button component.

Step 5: Testing

Following the previous four steps, you can utilize additional components and data sources to complete other tasks and build a more comprehensive tool. For example, you can use other models to assist in generating prompts or store prompts in localStorage or a database. Let's take a look at the complete outcome when all the steps are implemented.
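
As one possible take on storing prompts in a database, here is a small Python sketch that keeps a prompt history in SQLite. The table layout and function names are hypothetical; in ILLA Cloud you would typically connect a database resource and run the equivalent queries as actions.

```python
import sqlite3


def open_prompt_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small prompt-history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prompts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "model_id TEXT NOT NULL, "
        "prompt TEXT NOT NULL)"
    )
    return conn


def save_prompt(conn: sqlite3.Connection, model_id: str, prompt: str) -> int:
    """Insert one prompt and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO prompts (model_id, prompt) VALUES (?, ?)",
            (model_id, prompt),
        )
    return cur.lastrowid


def recent_prompts(conn: sqlite3.Connection, limit: int = 5) -> list[str]:
    """Return the most recently saved prompts, newest first."""
    rows = conn.execute(
        "SELECT prompt FROM prompts ORDER BY id DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in rows]


conn = open_prompt_store()
save_prompt(conn, "runwayml/stable-diffusion-v1-5", "A mecha robot in a favela")
save_prompt(conn, "stabilityai/stable-diffusion-2-1", "A lighthouse at dusk")
print(recent_prompts(conn))
```

Parameterized queries (the ? placeholders) keep user-entered prompts from being interpreted as SQL.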

Use Cases and Applications

The audio-to-text application you've created using ILLA Cloud and Hugging Face's openai/whisper-base model has numerous potential use cases and applications, including:

  1. Meeting minutes: Automatically transcribe meeting recordings, saving time and effort while ensuring accurate documentation.
  2. Podcast transcription: Convert podcast episodes into text, making them more accessible and easily searchable.
  3. Interview transcription: Transcribe interviews for qualitative research, enabling researchers to analyze and code text-based data.
  4. Voice assistants: Enhance voice assistant capabilities by converting spoken user commands into text for further processing.

These use cases are just a few examples of the many possibilities enabled by this powerful partnership.

Expanding the Application

To further enhance your audio-to-text application, consider incorporating additional features such as:

  1. Language translation: Integrate a machine translation model to automatically translate the transcribed text into different languages, making your application more versatile and useful for a global audience.
  2. Sentiment analysis: Analyze the transcribed text for sentiment, allowing users to gauge the overall tone of the audio content.
  3. Keyword extraction: Implement a keyword extraction model to identify essential topics and concepts from the transcribed text, enabling users to quickly understand the primary focus of the audio content.
  4. Text summarization: Summarize the transcribed text using an abstractive or extractive summarization model, providing users with a condensed version of the content.

By adding these features, you can create a more comprehensive and powerful application that addresses a wider range of user needs. You may also want to explore other parts of the Hugging Face ecosystem, such as the Accelerate library, text-to-video models, and large language models (LLMs).
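
To illustrate how a post-processing step such as keyword extraction slots in after transcription, here is a deliberately naive Python sketch that ranks words by frequency. It is a stand-in for a real keyword extraction model, not a substitute for one; the stopword list and function name are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real application would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "for", "on", "we", "that", "this", "with", "as", "are", "be",
}


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword terms in a transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


transcript = (
    "The budget review covered the budget for marketing and the "
    "marketing hiring plan. Budget approval is next."
)
print(extract_keywords(transcript, top_n=2))  # ['budget', 'marketing']
```

In the application, the transcript would come from the Whisper action's result, and the frequency count would be replaced by a call to a hosted model.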

Encouraging Collaboration and Innovation

The partnership between ILLA Cloud and Hugging Face encourages collaboration and innovation in the field of natural language processing. By working together, both platforms can support the development of new, groundbreaking applications that push the boundaries of what's possible with NLP technology.

With the increasing demand for NLP solutions in various industries, this partnership paves the way for developers, researchers, and businesses to create and implement cutting-edge solutions that address real-world problems. As a result, the collaboration between ILLA Cloud and Hugging Face contributes to the continued growth and evolution of the NLP landscape.

In summary, the ILLA Cloud and Hugging Face partnership empowers users to build powerful applications that take advantage of state-of-the-art NLP models. By following this comprehensive tutorial and considering additional features and use cases, you can create an audio-to-text application that showcases the potential of this collaboration and opens the door to endless possibilities in the world of NLP.


The partnership between ILLA Cloud and Hugging Face provides users with a seamless and powerful way to build applications that leverage cutting-edge NLP models. By following this tutorial, you can quickly create an audio-to-text application that utilizes Hugging Face's Inference Endpoints in ILLA Cloud. This collaboration not only simplifies the application-building process but also opens up new possibilities for innovation and growth.

Join our Discord Community: discord.com/invite/illacloud

GitHub page: github.com/illacloud/illa-builder