numpy torch torchvision Pillow moviepy librosa transformers opencv-python-headless requests matplotlib scipy pyspellchecker







Welcome to the ComfyUI-Mana-Nodes project!
This collection of custom nodes is designed to supercharge text-based content creation within the ComfyUI environment.
Whether you’re working on dynamic captions, transcribing audio, or crafting engaging visual content, Mana Nodes has got you covered.
If you like Mana Nodes, give our repo a ⭐ Star and 👀 Watch our repository to stay updated.
You can install Mana Nodes via the ComfyUI-Manager.
Alternatively, clone the repo into the custom_nodes directory with this command:
git clone https://github.com/ForeignGods/ComfyUI-Mana-Nodes.git
and install the requirements using:
.\python_embed\python.exe -s -m pip install -r requirements.txt --user
If you are using a venv, make sure you have it activated before installation and use:
pip install -r requirements.txt
✒️ Text to Image Generator
font: To set the font and its styling, connect a 🆗 Font Properties node here.
canvas: To configure the canvas, connect a 🖼️ Canvas Properties node here.
text: Specifies the text to be rendered on the images. Supports multiline text input for rendering on separate lines. Framestamps (frame number and text pairs) can also be entered here:
```
"1": "Hello",
"10": "World",
"20": "End"
```
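One way to read framestamped text is as a mapping from a frame number to the text shown from that frame onward. The sketch below illustrates that reading only and is not the node's actual implementation: `parse_framestamps` and `text_at_frame` are hypothetical helper names, and the rule that an entry stays visible until the next framestamp is an assumption.

```python
import bisect
import json
from typing import Dict


def parse_framestamps(text: str) -> Dict[int, str]:
    """Parse lines such as '"1": "Hello",' into {1: "Hello"}.

    Assumption: the input is the body of a JSON object without the
    surrounding braces, as in the example above.
    """
    raw = json.loads("{" + text.strip().rstrip(",") + "}")
    return {int(frame): caption for frame, caption in raw.items()}


def text_at_frame(framestamps: Dict[int, str], frame: int) -> str:
    """Return the caption active at `frame`.

    Assumption: the active caption is the latest entry at or before the
    frame; before the first framestamp nothing is shown.
    """
    frames = sorted(framestamps)
    index = bisect.bisect_right(frames, frame) - 1
    return framestamps[frames[index]] if index >= 0 else ""


stamps = parse_framestamps('"1": "Hello",\n"10": "World",\n"20": "End"')
print(text_at_frame(stamps, 12))  # → World
```

Under these assumptions, frames 1 to 9 show "Hello", frames 10 to 19 show "World", and frame 20 onward shows "End".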
frame_count: Sets the number of frames this node will output.
transcription: Input the transcription output from the 🎤 Speech Recognition node here.
Based on this transcription data and the 🖼️ Canvas Properties and 🆗 Font Properties settings, the text is formatted so that lines of words build up until there is no space left on the canvas (transcription_mode: fill, line).
highlight_font: Input a secondary 🆗 Font Properties node that is used to highlight the active caption (transcription_mode: fill, line). When setting the text manually, the following syntax can be used to define which word or character is highlighted:
Hello <tag>World</tag>
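To make the tag syntax concrete, here is a minimal sketch of how such text could be split into plain and highlighted segments. It is an illustration only: `split_highlights` is a hypothetical helper, not a function of the node, and how the node itself parses the tags is not documented here.

```python
import re
from typing import List, Tuple

# Matches the <tag>...</tag> markup shown in the example above.
TAG_PATTERN = re.compile(r"<tag>(.*?)</tag>", re.DOTALL)


def split_highlights(text: str) -> List[Tuple[str, bool]]:
    """Split text into (segment, is_highlighted) pairs, in reading order."""
    segments = []
    position = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > position:
            # Plain text between the previous tag and this one.
            segments.append((text[position:match.start()], False))
        segments.append((match.group(1), True))
        position = match.end()
    if position < len(text):
        # Trailing plain text after the last tag.
        segments.append((text[position:], False))
    return segments


print(split_highlights("Hello <tag>World</tag>"))
# → [('Hello ', False), ('World', True)]
```

A renderer could then draw the `False` segments with the regular font and the `True` segments with the highlight_font.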
images: The generated images with the specified text and configuration, in the common ComfyUI format (compatible with other nodes).
transcription_framestamps: Framestamps formatted based on the canvas, font and transcription settings.
Useful for manually correcting errors made by the 🎤 Speech Recognition node.
Example: Save this output with 📝 Save/Preview Text -> manually correct mistakes -> remove the transcription input from the ✒️ Text to Image Generator node -> paste the corrected framestamps into the text input field of the ✒️ Text to Image Generator node.
🆗 Font Properties
font_file: Fonts located in the custom_nodes\ComfyUI-Mana-Nodes\font_files directory (e.g. example_font.ttf) or in system font directories (supports .ttf, .otf, .woff, .woff2).
font_size: Either set a single font_size value or input an animation definition via the ⏰ Scheduled Values node. (Convert font_size to input)
font_color: Either set a single color value (CSS3/Color/Extended color keywords) or input an animation definition via the 🌈 Preset Color Animations node. (Convert font_color to input)
x_offset, y_offset: Either set a single horizontal and vertical offset value or input an animation definition via the ⏰ Scheduled Values node. (Convert x_offset/y_offset to input)
rotation: Either set a single rotation value or input an animation definition via the ⏰ Scheduled Values node. (Convert rotation to input)
rotation_anchor_x, rotation_anchor_y: Horizontal and vertical offsets of the rotation anchor point, relative to the text's initial position.
kerning: Spacing between the characters of the font.
border_width: Width of the text border.
border_color: Either set a single color value (CSS3/Color/Extended color keywords) or input an animation definition via the 🌈 Preset Color Animations node. (Convert border_color to input)
shadow_color: Either set a single color value (CSS3/Color/Extended color keywords) or input an animation definition via the 🌈 Preset Color Animations node. (Convert shadow_color to input)
shadow_offset_x, shadow_offset_y: Horizontal and vertical offset of the text shadow.
font: Used as input on the ✒️ Text to Image Generator node for the font and highlight_font.
🖼️ Canvas Properties
height, width: Dimensions of the canvas.
background_color: Background color of the canvas. (CSS3/Color/Extended color keywords)
padding: Padding between the image border and the font.
line_spacing: Spacing between lines of text on the canvas.
images: Can be used to input images instead of using background_color.
canvas: Used as input on the ✒️ Text to Image Generator node to define the canvas settings.
⏰ Scheduled Values
frame_count: Sets the range of the x axis of the chart. (Always starts at 1)
value_range: Sets the range of the y axis of the chart. (Example: 25 gives a range from -25 to 25)
The visible range can be changed by zooming with the mouse wheel and will reset to the specified value if changed.
easing_type: Used to generate the values in between the keyframes added manually by the user when the Generate Values button is clicked.
The available easing functions are:
step_mode: The option single forces the chart to display every single tick/step.
The option auto automatically removes ticks/steps to prevent overlapping.
animation_reset: Specifies the reset behaviour of the animation.
scheduled_values:
Adding Values: Click on the chart to add keyframes at specific points.
Editing Values: Double-click on a keyframe to edit its frame and value.
Deleting Values: Click on the delete button associated with each keyframe to remove it.
Generating Values: Click on the “Generate Values” button to interpolate values between existing keyframes.
Deleting Generated Values: Click on the “Delete Generated” button to remove all interpolated values.
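As an illustration of what generating values between keyframes involves, the sketch below fills every frame between consecutive keyframes using an easing function. It is a minimal sketch under stated assumptions, not the node's code: `generate_values`, `linear` and `ease_in_quad` are hypothetical names, and the two easing functions are generic textbook examples rather than the node's own list.

```python
from typing import Callable, Dict


def linear(t: float) -> float:
    """Constant speed between two keyframes."""
    return t


def ease_in_quad(t: float) -> float:
    """Starts slowly and accelerates towards the next keyframe."""
    return t * t


def generate_values(
    keyframes: Dict[int, float],
    easing: Callable[[float], float] = linear,
) -> Dict[int, float]:
    """Return keyframes plus interpolated values for every frame in between."""
    frames = sorted(keyframes)
    result = dict(keyframes)
    for start, end in zip(frames, frames[1:]):
        v0, v1 = keyframes[start], keyframes[end]
        for frame in range(start + 1, end):
            # t runs from 0 at the first keyframe to 1 at the second one.
            t = (frame - start) / (end - start)
            result[frame] = v0 + (v1 - v0) * easing(t)
    return dict(sorted(result.items()))


print(generate_values({1: 0.0, 5: 20.0}))
# → {1: 0.0, 2: 5.0, 3: 10.0, 4: 15.0, 5: 20.0}
```

Swapping in `ease_in_quad` keeps the keyframes unchanged but bends the curve between them, so frame 3 gets 5.0 instead of 10.0 in the example above.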
scheduled_values: Outputs a list of frame and value pairs and the animation_reset option.
At the moment this output can be used to animate the following widgets (Convert property to input) of the 🆗 Font Properties node: font_size, x_offset, y_offset, rotation.
🌈 Preset Color Animations
color_preset: Currently the following color animation presets are available:
animation_duration: Sets the length of the animation, measured in frames.
animation_reset: Specifies the reset behaviour of the animation.
scheduled_colors: Outputs a list of frame and color definitions and the animation_reset option.
At the moment this output can be used to animate the following widgets (Convert property to input) of the 🆗 Font Properties node: font_color, border_color, shadow_color.
🎤 Speech Recognition
Converts spoken words in an audio file to text using a deep learning model.
audio: Audio file path or URL.
wav2vec2_model: The Wav2Vec2 model used for speech recognition. (https://huggingface.co/models?search=wav2vec2)
spell_check_language: Language for the spell checker.
framestamps_max_chars: Maximum number of characters allowed before a new framestamp line is created.
fps: Frames per second, used for synchronizing with video. (Default: 30)
transcription: Text transcription of the audio. (Should only be used as font2img transcription input)
raw_string: Raw string of the transcription without timestamps.
framestamps_string: Frame-stamped transcription.
timestamps_string: Transcription with timestamps.
raw_string: Returns the transcribed text as one line.
THE GREATEST TRICK THE DEVIL EVER PULLED WAS CONVINCING THE WORLD HE DIDN'T EXIST
framestamps_string: Depending on the framestamps_max_chars parameter, the sentence is cleared and starts to build up again each time max_chars is reached.
"27": "THE",
"31": "THE GREATEST",
"43": "THE GREATEST TRICK",
"73": "THE GREATEST TRICK THE",
"77": "DEVIL",
"88": "DEVIL EVER",
"94": "DEVIL EVER PULLED",
"127": "DEVIL EVER PULLED WAS",
"133": "CONVINCING",
"150": "CONVINCING THE",
"154": "CONVINCING THE WORLD",
"167": "CONVINCING THE WORLD HE",
"171": "DIDN'T",
"178": "DIDN'T EXIST",
timestamps_string: Returns all transcribed words with their start_time and end_time as a JSON-formatted string.
[
{
"word": "THE",
"start_time": 0.9,
"end_time": 0.98
},
{
"word": "GREATEST",
"start_time": 1.04,
"end_time": 1.36
},
{
"word": "TRICK",
"start_time": 1.44,
"end_time": 1.68
},
...
]
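The relationship between the timestamps and the framestamps shown earlier can be sketched in a few lines. This is a minimal sketch under assumptions, not the node's implementation: `build_framestamps` is a hypothetical helper, the frame number is assumed to be the word's start_time multiplied by fps and rounded, and a line is assumed to be cleared when adding the next word would exceed max_chars.

```python
from typing import Dict, List


def build_framestamps(words: List[dict], fps: int = 30, max_chars: int = 25) -> Dict[int, str]:
    """Build {frame: caption} entries from word timestamps."""
    framestamps = {}
    line = ""
    for entry in words:
        candidate = f"{line} {entry['word']}" if line else entry["word"]
        if len(candidate) > max_chars:
            # No room left: clear the line and start again with this word.
            candidate = entry["word"]
        line = candidate
        # Assumed conversion from seconds to a frame number.
        framestamps[round(entry["start_time"] * fps)] = line
    return framestamps


words = [
    {"word": "THE", "start_time": 0.9, "end_time": 0.98},
    {"word": "GREATEST", "start_time": 1.04, "end_time": 1.36},
    {"word": "TRICK", "start_time": 1.44, "end_time": 1.68},
]
print(build_framestamps(words, fps=30, max_chars=25))
# → {27: 'THE', 31: 'THE GREATEST', 43: 'THE GREATEST TRICK'}
```

With fps set to 30, the three words above land on frames 27, 31 and 43, which matches the first framestamps of the example. The value 25 for max_chars is chosen only for illustration.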
🎞️ Split Video
video: Path to the video file.
frame_limit: Maximum number of frames to extract from the video.
frame_start: Starting frame number for extraction.
filename_prefix: Prefix for naming the extracted audio file. (relative to .\ComfyUI\output)
frames: Extracted frames as image tensors.
frame_count: Total number of frames extracted.
audio_file: Path of the extracted audio file.
fps: Frames per second of the video.
height, width: Dimensions of the extracted frames.
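To show how frame_start and frame_limit interact, here is a hedged sketch of frame extraction with OpenCV (opencv-python-headless is listed in the requirements). Whether the node itself uses OpenCV for this step is not stated here; `frame_indices` and `extract_frames` are hypothetical helpers, and frame_start is assumed to be 0-based.

```python
def frame_indices(total_frames: int, frame_start: int, frame_limit: int) -> range:
    """Indices to extract: at most frame_limit frames, starting at frame_start.

    Assumption: frame_start is 0-based and clamped to the video length.
    """
    start = max(0, min(frame_start, total_frames))
    stop = min(total_frames, start + max(0, frame_limit))
    return range(start, stop)


def extract_frames(video_path: str, frame_start: int, frame_limit: int):
    """Read the selected frames; returns (frames, fps, width, height)."""
    import cv2  # imported lazily so frame_indices works without OpenCV

    capture = cv2.VideoCapture(video_path)
    try:
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        fps = capture.get(cv2.CAP_PROP_FPS)
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        indices = frame_indices(total, frame_start, frame_limit)
        # Jump to the first requested frame, then read sequentially.
        capture.set(cv2.CAP_PROP_POS_FRAMES, indices.start)
        frames = []
        for _ in indices:
            ok, frame = capture.read()
            if not ok:
                break
            # OpenCV decodes to BGR; convert to RGB for image pipelines.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        return frames, fps, width, height
    finally:
        capture.release()


print(list(frame_indices(total_frames=12, frame_start=10, frame_limit=5)))
# → [10, 11]
```

The number of frames actually returned corresponds to the frame_count output: it can be smaller than frame_limit when the video ends early.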
🎥 Combine Video
frames: Sequence of images to be used as video frames.
filename_prefix: Prefix for naming the video file. (relative to .\ComfyUI\output)
fps: Frames per second for the video.
audio_file: Audio file path or URL.
video_file: Path to the created video file.
📣 Generate Audio (experimental)
Converts text to speech and saves the output as an audio file.
text: The text to be converted into speech.
filename_prefix: Prefix for naming the audio file. (relative to .\ComfyUI\output)
This node uses a text-to-speech pipeline to convert input text into spoken words, saving the result as a WAV file. The generated audio file is named using the provided filename prefix and is stored relative to the .\ComfyUI-Mana-Nodes directory.
Model: https://huggingface.co/spaces/suno/bark
Bark supports various languages out-of-the-box and automatically determines language from input text. When prompted with code-switched text, Bark will even attempt to employ the native accent for the respective languages in the same voice.
Example:
Buenos días Miguel. Tu colega piensa que tu alemán es extremadamente malo. But I suppose your english isn't terrible.
Below is a list of some known non-speech sounds, but we are finding more every day.
[laughter]
[laughs]
[sighs]
[music]
[gasps]
[clears throat]
— or … for hesitations
♪ for song lyrics
capitalization for emphasis of a word
MAN/WOMAN: for bias towards speaker
Example:
" [clears throat] Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as... ♪ singing ♪."
Bark can generate all types of audio, and, in principle, doesn’t see a difference between speech and music. Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.
Example:
♪ In the jungle, the mighty jungle, the lion barks tonight ♪
You can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. Please note that these are not always respected, especially if a conflicting audio history prompt is given.
Example:
WOMAN: I would like an oatmilk latte please.
MAN: Wow, that's expensive!
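The prompt conventions above (speaker prefixes and music notes around lyrics) are plain text, so they can be assembled before the string is passed to the text input. The helper below is a hypothetical convenience sketch, not part of the node or of Bark; `build_bark_prompt` and its parameters are assumptions made for illustration.

```python
from typing import Optional


def build_bark_prompt(text: str, speaker: Optional[str] = None, sing: bool = False) -> str:
    """Compose a prompt string using the conventions described above."""
    prompt = text.strip()
    if sing:
        # Music notes around the lyrics nudge Bark towards singing.
        prompt = f"♪ {prompt} ♪"
    if speaker:
        # Speaker prompts such as MAN or WOMAN are written in capitals.
        prompt = f"{speaker.upper()}: {prompt}"
    return prompt


order = build_bark_prompt("I would like an oatmilk latte please.", speaker="woman")
lyrics = build_bark_prompt("In the jungle, the mighty jungle, the lion barks tonight", sing=True)
print(order)  # → WOMAN: I would like an oatmilk latte please.
```

As noted above, speaker prompts are not always respected, so the generated audio may still differ from what the prefix suggests.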
📝 Save/Preview Text
string: The string to be written to the file.
filename_prefix: Prefix for naming the text file. (relative to .\output)
| Demo 1 | Demo 2 | Demo 3 |
| --- | --- | --- |
||||
The values for the ⏰ Scheduled Values node cannot be imported yet (you have to add them yourself).
Turn on audio.
https://github.com/ForeignGods/ComfyUI-Mana-Nodes/assets/78089013/e5a39327-db61-46ad-abea-10e27e4551c1
Your contributions to improve Mana Nodes are welcome!
If you have suggestions or enhancements, feel free to fork this repository, apply your changes, and create a pull request. For significant modifications or feature requests, please open an issue first to discuss what you’d like to change.