StreamPot: Run FFmpeg as an API with fluent-ffmpeg compatibility, queues and S3 (github.com/streampot)
218 points by thunderbong on July 28, 2024 | 36 comments


Specification of ffmpeg commands can get quite complex, especially once you get to filters.

I never understood why it hasn't grown a DSL to specify the input/output/transformation. It could be JSON-based for all I care. Then we could have nice things like easy programmatic generation of these scripts and schema verification (for each particular ffmpeg version).
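
For illustration, here's a rough Python sketch of what such a JSON-based spec could look like and how it could be lowered to an ordinary ffmpeg command line. The schema ("inputs", "filters", "output") is entirely made up:

```python
import json
import shlex

# Hypothetical JSON spec for an ffmpeg invocation -- not a real
# ffmpeg feature, just a sketch of what such a DSL could look like.
spec = json.loads("""
{
  "inputs": ["in.mp4"],
  "filters": [
    {"name": "scale", "args": {"w": 1280, "h": 720}},
    {"name": "fps",   "args": {"fps": 30}}
  ],
  "output": "out.mp4"
}
""")

def to_cmd(spec):
    # Render each filter as name=key1=val1:key2=val2, then join the
    # chain with commas (ffmpeg's -vf filterchain syntax).
    chain = ",".join(
        f["name"] + "=" + ":".join(f"{k}={v}" for k, v in f["args"].items())
        for f in spec["filters"]
    )
    cmd = ["ffmpeg"]
    for src in spec["inputs"]:
        cmd += ["-i", src]
    cmd += ["-vf", chain, spec["output"]]
    return cmd

print(shlex.join(to_cmd(spec)))
# ffmpeg -i in.mp4 -vf scale=w=1280:h=720,fps=fps=30 out.mp4
```

Schema verification would then just be JSON Schema validation on the spec before lowering it.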

Also, when I looked at how LosslessCut (an Electron app) does the timeline preview thumbnails, it just calls the ffmpeg CLI however many times it needs (once per thumbnail). With the number of heavy users of ffmpeg, I'm wondering why nobody has yet started a project to run ffmpeg in-memory as a server and avoid the process startup cost, as I'm sure LosslessCut isn't the only app that does this.


What do you mean? FFmpeg supports filters via its own DSL; see https://ffmpeg.guide/ or https://ffmpeg.org/ffmpeg-filters.html

It's not quite JSON and I have to look up the syntax every time I use it, but it's well-defined.

The general syntax is:

    [input1][input2]... filterchain [output1];
    [input3] filterchain [output2];
    ...
to specify chains of filters that link input streams to output streams.

For example, the following:

    ffmpeg -i INPUT -vf "
      split [main][tmp];
      [tmp] crop=iw:ih/2:0:0, vflip [flip];
      [main][flip] overlay=0:H/2
    " OUTPUT
is doing something like this (python pseudocode):

    main,tmp = input.split(2)
    flip = (tmp
       .crop(iw, ih/2, 0, 0)
       .vflip())
    out = overlay([main, flip], 0, H/2)
Here's a more complicated example (visualization on https://ffmpeg.guide/graph/demo ):

    ffmpeg -i ./long-video.mp4 \
    -i ./background-music.mp3 \
    -filter_complex "[0:v]trim=duration=30[out0];
      [0:a]atrim=duration=15[out1];
      [out1][1:a]
         amix=inputs=2:duration=longest:
         dropout_transition=2:weights=1 1:
         normalize=1
      [out2]" \
    -map "[out0]" -map "[out2]" \
    ./shorter-video.mp4
Pseudocode:

    # input files (-i options)
    in0 = read("./long-video.mp4")
    in1 = read("./background-music.mp3")
    # filter graph
    out0 = in0.video.trim(duration=30)
    out1 = in0.audio.trim(duration=15)
    out2 = amix([out1, in1.audio],
        duration="longest", dropout_transition=2, 
        weights=[1,1], normalize=1)
    # output stream mapping
    out_stream[0] = out0
    out_stream[1] = out2


That'd be a great solution. I'm using ffmpeg in my personal media server app to generate thumbnails and transcode media on the fly, and I was wondering that too.

There are also a few things that I had to patch on top of ffmpeg (for example it doesn't do well with generating singular HLS packets on demand). Would be nice to have a pluggable architecture that can link multiple things together like that.

Might implement this myself if I end up having the time for that.


I only found this project focused on streaming, but it's unmaintained (some Summer of Code or thesis project, IIRC):

https://trac.ffmpeg.org/wiki/ffserver


Interesting that it uses a similar configuration syntax to Apache HTTPd.


Seems like writing a script wrapper around ffmpeg could easily solve the first problem. All it needs to do is to read in JSON/YAML/TOML/etc. then parse them into command line equivalents and call ffmpeg. That is a kind of script pretty much any coding LLM could generate.
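A minimal sketch of such a wrapper, assuming a made-up JSON schema with "inputs", "args" and "output" fields:

```python
import json
import subprocess
import sys

# Build the ffmpeg argv from a job description. The field names
# here are invented for illustration, not any real schema.
def build_cmd(job):
    cmd = ["ffmpeg", "-y"]
    for src in job["inputs"]:
        cmd += ["-i", src]
    cmd += job.get("args", [])   # passthrough for extra flags
    cmd.append(job["output"])
    return cmd

def run_job(path):
    with open(path) as f:
        job = json.load(f)
    # check=True raises CalledProcessError if ffmpeg exits non-zero
    subprocess.run(build_cmd(job), check=True)

if __name__ == "__main__":
    run_job(sys.argv[1])
```

Keeping `build_cmd` pure makes the JSON-to-argv translation easy to unit test without invoking ffmpeg at all.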

I was watching a streamer trying to use ffmpeg in a streaming configuration and it didn't seem to work very well in that case. However, if someone was belligerent enough, I would guess it would be possible to use libavformat/libavcodec directly to parse out frames from video. For example, the streamer "sphaerophoria" posted a short series where he built a video editor from scratch [1]. I believe the code he wrote could be used as the basis for a custom-built editor to extract thumbnails from videos.

1. https://www.youtube.com/watch?v=l_hD99zpPBY&ab_channel=sphae...


If they are such a heavy user why don't they just use libavcodec directly?


Seeing the need for polling the job status is off-putting, especially for a JS API that is already async. If you _need_ to use polling, at least provide a convenience method I can just await. Better yet, add some signaling to the HTTP API via EventSource or something.


Thank you! We do have one - `runAndWait`. I will shortly update the docs and I agree that using SSE would be more efficient than polling. Will add that next!


Long polling would be cool!


Highly prefer and recommend websocket connections for polling. This is what I ended up doing for my side project's video encoding needs[1]. It lets me get silky smooth progress indicators on multiple encoding resolutions at once.

[1] https://github.com/jjcm/nonio-video-cdn/blob/master/route/en...


Do you mean websockets *over* polling (instead of *for* polling)? Polling (repeatedly asking the server) and pushing (the server telling you directly) are two different things; polling over WebSocket seems sort of weird.


A push over a websocket is still fundamentally a polling behaviour; it’s just happening further down the stack, below the awareness of application code on the client and implemented internally on the server. This does save you an explicit round-trip in your own code though.

Ultimately the only element that isn’t polling in a classical machine architecture is the interrupt handler in the top half of the kernel (or system equivalent).


> A push over a websocket is still fundamentally a polling behaviour; it’s just happening further down the stack, below the awareness of application code on the client and implemented internally on the server.

Not necessarily. The server may internally use interprocess signalling or sockets or something to find out when the process is ready instead of busy-waiting. For example, if ffmpeg is run as a child process, the parent process can sleep a thread waiting for the child to terminate.

Just about any sort of busy-wait polling behaviour is a code smell in my book. Stop wasting cycles. Let the thing you're waiting on tell you when it's ready. If it can't, fix it until it can. This is all open-source code.
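
For example (Python, with `sleep` standing in for a long-running ffmpeg job), a blocking `wait()` parks the thread instead of burning cycles:

```python
import subprocess
import time

# Blocking wait vs busy polling on a child process. `sleep` is a
# stand-in here for a long-running ffmpeg job.
child = subprocess.Popen(["sleep", "1"])

# The busy-polling anti-pattern would look like:
#   while child.poll() is None:
#       time.sleep(0.1)
# i.e. wake up ten times a second just to ask "done yet?".

# Blocking wait instead: the OS parks this thread (via waitpid)
# and wakes it exactly once, when the child actually exits.
start = time.monotonic()
rc = child.wait()
elapsed = time.monotonic() - start

print(rc, elapsed >= 1.0)  # 0 True
```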


That will also eventually be polling, within a kernel subsystem or library function that one has merely chosen to perceive as a black box for convenience of reasoning.

Seriously, unless you know we’re delivering an interrupt or a signal, assume it’s polling all the way down. And I’m not so sure about the signals.


When my program is waiting to read from a network stream, or from a USB device, it uses 0% CPU and responds instantly when the signal comes in. That is how software should be.

Maybe there are cases in the kernel where polling is unavoidable. But polling should happen as little as possible, for the sake of both latency and efficiency. I don't spend much time thinking about the kernel. In userland, polling is almost never necessary. I'd love it if just about all software that manages data that changes over time were written such that downstream consumers can be notified when that data changes. This should be the case for filesystems, databases, message queues, web servers, device lists over USB and so on.


It isn’t.


Servers doing polling internally and sending you periodic status updates over WebSocket is relatively common. I don't think polling over WebSocket is a good idea in this situation though (as in sending periodic requests to the server asking for a reply).


Eh, adding websockets is a significant complexity jump. In my experience it's hard to write websocket client code that doesn't ever end up in a bad state despite connection breaks, and I still haven't figured out how to avoid random long delays in creating a websocket in my own projects. If you only need a completion callback and not continuous progress updates, long polling is an excellent solution that adds very little complexity compared to websockets.
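
For what it's worth, long-poll semantics are easy to sketch in-process (Python, with a threading primitive standing in for the HTTP layer): the "server" holds the request open until the job finishes or the wait window expires, and the client simply re-issues the request on timeout:

```python
import threading

# In-process sketch of long polling. Job names and timings are
# invented for illustration.
class Job:
    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def finish(self, result):
        self.result = result
        self._done.set()

    def long_poll(self, wait_seconds):
        # Block until the job completes or the window expires.
        # Returns the result, or None -- the client then retries.
        if self._done.wait(timeout=wait_seconds):
            return self.result
        return None

job = Job()
# Simulate the encode finishing 0.2 s from now.
threading.Timer(0.2, job.finish, args=["transcoded.mp4"]).start()

print(job.long_poll(0.05))  # None -- window expired, client retries
print(job.long_poll(5))     # transcoded.mp4
```

The client-side loop is just "request, and on timeout request again", which is why it stays so much simpler than managing a websocket's connection state.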


Can you elaborate on what you mean by signaling to the http api?


Jack from StreamPot here. So happy to see it shared here. We'd love you to try it out and give us any feedback or requests.


Looks really nice, great work!


Is there any way I can run this in a fully local setup? Curious if a config for nginx could be provided?


Looks like you could potentially point this at MinIO's local S3-compatible storage via env vars?

https://docs.streampot.io/installation


Isn't the local version just running ffmpeg as a command line?


This is great. I don't have a concrete use case right now but can definitely see myself returning to this in the future. One thing that would be handy is a way to either accept an ffmpeg CLI command, or convert from a CLI command to the TypeScript syntax. In my experience with ffmpeg, you often do a lot of copying and pasting of commands from documentation or random guides, so it'd save a bit of time if you didn't need to transcribe the commands into StreamPot's syntax.


ChatGPT has been a game changer for my ffmpeg usage. Instead of cobbling together commands from StackOverflow questions and guides, I just describe what I want in plain English and it's got a pretty damn good success rate for giving me what I need.


Thank you! We'll add it to our roadmap.


Somebody please bring a fully functional ffmpeg to Android via F-Droid, one which is not compiled against /storage/emulated/0/ like Termux etc. are. Many users rely on multiple profiles and don't run anything in the primary profile. I know of nothing like this right now. No, FFShare doesn't work for this purpose, as outlined.

Bonus if all of Termux can be fixed for secondary Android profiles.


I see there's a paid version of Streampot. My understanding is the ffmpeg license is pretty restrictive, and doesn't allow for any commercial use unless you sign a contract with them. Is that something Streampot has done? Just wondering because I have looked into using ffmpeg in a commercial project and the license stopped me from doing so.


I am looking at https://www.ffmpeg.org/legal.html. It doesn't look like FFmpeg themselves have any problem with commercial use, unless you end up using tech that falls under the MPEG patents, based on what I'm reading at the link above. Right? Or are there other issues I'm missing?


I didn't even look at the website to see they offered a paid service. Looking now -- even aside from potential license issues -- at risk of making yet another HN Dropbox comment, I'm curious how viable that is as a business model. What's being offered here? Running ffmpeg on their servers?

You'd figure that anyone with a project with enough throughput for their "most popular" tier (5TB/month, 1000 requests/minute) and a budget of $165/month for video processing could wire up something on their own servers without much effort. Even setting aside the cost of the service, I imagine it'd be cheaper to run that processing on your own server than to upload thousands of videos to another service and then download them again.

Am I missing something?


You're missing the fact that video encoding is a whole different ballgame when it comes to what kind of CPU or VM you need. Any crappy oversubscribed cloud VM can host a website. Meanwhile ffmpeg wants at least dedicated cores, preferably entire dedicated servers.

Streampot lets you send 300 requests per minute for the cost of the cheapest dedicated server instance on hetzner.


It's not ffmpeg, it's the codecs you have to worry about - if you're posting the video publicly, that is.


Looks really cool. I've thought about building a business around ffmpeg myself. Btw, your self hosting link is broken.


Thanks! Oops, good spot. Updated it. This is where it was supposed to go: https://docs.streampot.io/installation.html



