SNES De-blur

This topic contains 13 replies, has 4 voices, and was last updated by  paulb_nl 8 months ago.

Viewing 14 posts - 1 through 14 (of 14 total)
  • #8206

    paulb_nl
    Participant

    I have made a page where you can see the de-blur in action. You can use the sliders to change the lpf value to see how it looks.
    http://pbnl.byethost7.com/snes/snes-reverse-lpf.html

    It works by estimating the original pixel from the current pixel and the previous pixel, inverting a generic low-pass filter function.

    On the 256×240 optimized mode capture it works very well. It also works on the generic linetriple capture, but that capture had noise, which gets sharpened as well.

    It would be a nice feature for the non 1-CHIP consoles. 🙂

    snes de-blur

    #8238

    BuckoA51
    Keymaster

    I think this would be possible but would require a lot of work.

    #8244

    paulb_nl
    Participant

    I think the tricky part will be the reverse lpf formula, because I have read that division is difficult to do on an FPGA. This is the formula I am using:

    originalData = PrevData - ((PrevData - Data) / lpfValue)

    I could change it to multiplication with a lookup table for the different lpf values if that’s needed.
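As a rough illustration of the formula above, here is a minimal Python sketch. It assumes the blur behaves like a first-order low-pass filter, Data = PrevData + (original - PrevData) * lpfValue, with lpfValue between 0 and 1. That forward model, the function names and the sample values are assumptions for illustration and are not taken from the demo page.

```python
def blur(pixels, lpf_value, start=0.0):
    """Assumed forward model: first-order low-pass filter along a scanline."""
    out, prev = [], start
    for original in pixels:
        prev = prev + (original - prev) * lpf_value
        out.append(prev)
    return out


def deblur(blurred, lpf_value, start=0.0):
    """Reverse formula from the post: PrevData - ((PrevData - Data) / lpfValue)."""
    out, prev = [], start
    for data in blurred:
        out.append(prev - (prev - data) / lpf_value)
        prev = data  # the previous *blurred* sample is used for the next pixel
    return out


original = [0, 255, 255, 0, 128, 128]
blurred = blur(original, 0.6)
restored = deblur(blurred, 0.6)
print([round(v, 1) for v in blurred])
print([round(v, 1) for v in restored])
```

Under this assumed model the round trip recovers the input up to floating-point error. On a real capture the filter strength is not known exactly and the samples are quantized and noisy, so the result is only an approximation.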

    #8260

    marqs
    Participant

    Floating point division on Cyclone IV would be slow but probably sufficient for the 5.37MHz pixel clock when using 256×240 optimized mode with SNES/NES. That could be tested after getting more important/generic features out of the way.

    #8270

    paulb_nl
    Participant

    Ok thanks. I have updated the page with a faster multiplication version in case division would be too slow for normal linedouble or generic linetriple mode.

    a = (PrevData - Data) * lpf_values[lpfv];
    b = PrevData * 100;
    originalData = ((b - a) * 656) >> 16; // (b - a) / 100
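The `* 656 >> 16` step replaces a division by 100 with a multiply and a shift, since 656 / 65536 is slightly more than 1 / 100. The Python sketch below checks how close the two are, assuming the intermediate value (b - a) has already been limited to 0..25500 (that is, 255 * 100). The range and the helper names are assumptions for illustration.

```python
def div100_exact(x):
    """Reference: integer division by 100."""
    return x // 100


def div100_shift(x):
    """Approximation from the post: multiply by 656, then shift right by 16."""
    return (x * 656) >> 16


# Compare both versions over the assumed input range 0..25500.
diffs = [div100_shift(x) - div100_exact(x) for x in range(0, 25501)]
worst = max(diffs)
least = min(diffs)

# Exact multiples of 100 map back to the 8-bit value they came from.
exact_on_multiples = all(div100_shift(100 * p) == p for p in range(256))

print(worst, least, exact_on_multiples)
```

Over that range the approximation is never below the exact quotient and is at most one step above it, which is probably acceptable for 8-bit video data.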

    #11005

    paulb_nl
    Participant

    I have been working on implementing this on the OSSC. This is the function I am using now:

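    // Reverse low-pass filter: estimate the unfiltered 8-bit sample from the
    // current and previous samples. lpfv is the gain in 1/128 steps (128 = no change).
    function [7:0] apply_reverse_lpf;
        input enable;
        input [7:0] data;
        input [7:0] data_prev;
        input [8:0] lpfv;
        int a, b, c;
    
        begin
            a = data_prev << 7;             // previous sample scaled by 128
            b = (data_prev - data) * lpfv;  // scaled difference, negative when data > data_prev
            c = (a < b ? 0 : (a - b));      // clamp at zero
    
            if (enable)
                apply_reverse_lpf = (c > (255 << 7)) ? 8'hFF : (c >> 7); // clamp at 255, scale back
            else
                apply_reverse_lpf = data;
        end
    endfunction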
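To make the arithmetic of the function above easier to follow, here is a small Python model of it. It assumes the `int` temporaries hold the signed intermediate values without overflow, and the sample inputs are made up for illustration. It is a software sketch, not a replacement for simulating the Verilog.

```python
def apply_reverse_lpf(enable, data, data_prev, lpfv, shift=7):
    """Software model of the posted function (8-bit samples, gain = lpfv / 2**shift)."""
    if not enable:
        return data
    a = data_prev << shift          # previous sample in fixed point
    b = (data_prev - data) * lpfv   # scaled difference, may be negative
    c = 0 if a < b else a - b       # clamp at the bottom
    return 0xFF if c > (255 << shift) else c >> shift  # clamp at the top


print(apply_reverse_lpf(True, 100, 50, 128))   # 100: lpfv = 128 leaves the pixel unchanged
print(apply_reverse_lpf(True, 100, 50, 256))   # 150: the step from 50 to 100 is doubled
print(apply_reverse_lpf(True, 200, 0, 256))    # 255: clamped at the top
print(apply_reverse_lpf(True, 0, 200, 256))    # 0: clamped at the bottom
print(apply_reverse_lpf(False, 100, 50, 256))  # 100: filter disabled
```

With a shift of 7, lpfv = 128 corresponds to a gain of 1, and the 9-bit lpfv input allows gains up to just under 4.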

    The function compares the current pixel to the previous pixel. In the 256×240 optimized mode the source pixels are repeated, so the previous pixel should only be stored at the last repeat of each source pixel.

    I check when linebuf_hoffset changes and it seems to change at the third repeat. So I use lpf_last_hoffset to store the previous pixel at the following cycle.

    I added this after calling the reverse_lpf function:

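    // In line-triple optimized mode each source pixel is read several times,
    // so the previous pixel is stored only once per source pixel.
    if ((V_MULTMODE == V_MULTMODE_3X) && (H_MULTMODE == H_MULTMODE_OPTIMIZED))
        begin
            // One cycle after linebuf_hoffset changed: store the previous pixel.
            if (lpf_last_hoffset == 1'b1)
                begin
                    R_prev <= R_act;
                    G_prev <= G_act;
                    B_prev <= B_act;
                end
    
            // Flag a change of linebuf_hoffset for the following cycle.
            if (linebuf_hoffset_prev != linebuf_hoffset)
                lpf_last_hoffset <= 1'b1;
            else
                lpf_last_hoffset <= 1'b0;
    
            linebuf_hoffset_prev <= linebuf_hoffset;
        end
    else
        begin
            // Other modes: update the previous pixel every cycle.
            R_prev <= R_act;
            G_prev <= G_act;
            B_prev <= B_act;
        end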
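The reason for holding the previous pixel can be shown with a simplified Python model. It assumes every source pixel is repeated a fixed number of times and that the repeat position is known from a counter. The real code detects the boundary from linebuf_hoffset instead, so the `repeat` parameter and the helper names here are hypothetical.

```python
def reverse_lpf(data, data_prev, lpfv, shift=7):
    """Same fixed-point arithmetic as apply_reverse_lpf, modeled in software."""
    a = data_prev << shift
    b = (data_prev - data) * lpfv
    c = 0 if a < b else a - b
    return 0xFF if c > (255 << shift) else c >> shift


def sharpen_stream(samples, repeat, lpfv, hold_prev):
    """Filter a stream in which each source pixel appears `repeat` times."""
    out = []
    prev = samples[0]  # assumed start value, avoids an artificial edge at the start
    for i, data in enumerate(samples):
        out.append(reverse_lpf(data, prev, lpfv))
        last_repeat = (i % repeat) == repeat - 1
        # Either update every cycle (naive) or only on the last repeat (held).
        if not hold_prev or last_repeat:
            prev = data
    return out


samples = [50, 50, 50, 100, 100, 100]  # two source pixels, each repeated three times
held = sharpen_stream(samples, repeat=3, lpfv=256, hold_prev=True)
naive = sharpen_stream(samples, repeat=3, lpfv=256, hold_prev=False)
print(held)   # [50, 50, 50, 150, 150, 150]
print(naive)  # [50, 50, 50, 150, 100, 100]
```

When the previous pixel is updated on every cycle, only the first repeat of a source pixel gets sharpened. Holding it until the last repeat keeps all repeats of the same source pixel identical.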

    Any improvement suggestions are welcome. There is probably a better way to check the input pixel offset.

    Here is a picture of it in action in lineX3 optimized:

    #11017

    marqs
    Participant

    That looks nice! Along with scaling filters and line4x/5x, this should be exciting news to SNES owners.

    Does apply_reverse_lpf have any effect on timing, i.e. does pclk3x Fmax still stay around 150MHz? Effective linetriple pixel clock is only 80MHz, but an ideal Line5x implementation (not using HDMI_TX pixel replication as is done now) would need to run at 160MHz, in which case apply_reverse_lpf might need to be pipelined.

    #11019

    Harrumph
    Participant

    Exciting news indeed! Really looking forward to trying this out! 😀

    #11022

    paulb_nl
    Participant

    If I am looking at it correctly, the Timing analyzer reports pclk3x Fmax to be around 60MHz now. That can’t be good 😀

    #11023

    marqs
    Participant

    Ok, that’s what I suspected. The good news is that in optimized modes there are several clock cycles available per input pixel, so a pipelined implementation should easily pass timing requirements.

    #11028

    marqs
    Participant

    Alternatively, you could organize the code so that output of apply_reverse_lpf is captured to registers after N (>=2) cycles from the moment when its inputs are changed, and then constrain respective paths to multicycle (set_multicycle_path) to avoid timing violations. The registers driving apply_reverse_lpf inputs must remain stable for the same N cycles, but that should be a given in 256/320 column optimized modes.

    #11032

    paulb_nl
    Participant

    I am new to Verilog/FPGA so my knowledge is very limited. I have no idea how to change it to a pipelined version. Would it still work in linedouble mode then?

    For the alternative, do you mean something like this? Compare R_act to R_prev, and once it has changed, increment a counter register every cycle; when the counter reaches 2, do R_pp1 <= apply_reverse_lpf. I guess that wouldn’t work with linedouble/generic linetriple.

    Are you willing to make the pipelined version? With your knowledge it will probably take you a few minutes 🙂

    #11142

    marqs
    Participant

    A multicycle implementation would only work in modes that have the required number of cycles per read from the linebuffer (i.e. limited to optimized modes), while a pipelined implementation could work regardless of line multiplication mode. I’ll take a look into this after getting the upcoming fw release done.
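For readers unfamiliar with pipelining, the idea can be sketched as a cycle-by-cycle Python model. The split used here (multiply in the first stage, subtract, clamp and shift in the second) is only one possible arrangement chosen for illustration. It is an assumption, not the design that was later implemented, and the names are hypothetical.

```python
SHIFT = 7


def stage1(data, data_prev, lpfv):
    """First stage: the shift and the multiplication."""
    return (data_prev << SHIFT, (data_prev - data) * lpfv)


def stage2(a, b):
    """Second stage: subtract, clamp and scale back to 8 bits."""
    c = 0 if a < b else a - b
    return 0xFF if c > (255 << SHIFT) else c >> SHIFT


def combinational(data, data_prev, lpfv):
    """Everything in one step, like the single function."""
    return stage2(*stage1(data, data_prev, lpfv))


def pipelined(inputs, lpfv):
    """One loop iteration models one clock cycle with a register between the stages."""
    reg_ab = (0, 0)  # assumed reset value of the pipeline register
    outputs = []
    for data, data_prev in inputs:
        outputs.append(stage2(*reg_ab))         # uses the values registered on the previous cycle
        reg_ab = stage1(data, data_prev, lpfv)  # captured for the next cycle
    return outputs


inputs = [(50, 50), (100, 50), (100, 100), (0, 100), (255, 0)]  # (data, data_prev) pairs
direct = [combinational(d, p, 256) for d, p in inputs]
piped = pipelined(inputs, 256)
print(direct)  # [50, 150, 100, 0, 255]
print(piped)   # [0, 50, 150, 100, 0]
```

The pipelined version produces the same values one cycle later. Each stage does less work per clock, which is what allows a higher Fmax, and a new input can still be accepted on every cycle, so it does not depend on pixels being repeated.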

    #11161

    paulb_nl
    Participant

    Thanks! The apply_reverse_lpf function can be optimized a bit by shifting by 4 bits instead of 7 and using shortint instead of int. The lpfvalue range is then 16 (off) to 63. Fmax was around 80MHz that way.
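A quick way to see why the narrower version fits in shortint (16-bit signed in SystemVerilog) is to brute-force the largest intermediate value. The Python sketch below does that for both variants, assuming lpfv can reach 511 with the 9-bit input and a shift of 7, and 63 with a shift of 4. The helper name is hypothetical.

```python
INT16_MAX = 32767  # largest value a 16-bit signed shortint can hold


def worst_case(shift, lpfv_max):
    """Largest magnitude of a, b or (a - b) over all 8-bit sample pairs."""
    worst = 0
    for data_prev in range(256):
        for data in range(256):
            a = data_prev << shift
            b = (data_prev - data) * lpfv_max
            worst = max(worst, abs(a), abs(b), abs(a - b))
    return worst


wide = worst_case(shift=7, lpfv_max=511)
narrow = worst_case(shift=4, lpfv_max=63)
print(wide, wide <= INT16_MAX)      # 130305 False
print(narrow, narrow <= INT16_MAX)  # 16065 True
```

The trade-off is resolution: with a shift of 4 the gain changes in steps of 1/16 instead of 1/128, and 16 takes over the role that 128 had as the "off" value.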
