As you can see, comparing AAAA... with A is instant, but comparing A with AAAA... takes a lot of time:
In [5]: %timeit extractor.describe_dna('A' * 10000, 'A')
10000 loops, best of 3: 129 µs per loop
In [6]: %timeit extractor.describe_dna('A', 'A' * 10000)
1 loops, best of 3: 1.13 s per loop
Perhaps more importantly, memory usage also skyrockets. I couldn't run this test with a 50 Kbp sample sequence on a machine with 4 GB of memory: it completely froze my machine for half a minute. I would like to prevent this from happening on the server.
I didn't look into this further, but I suspect it tries to find the inserted sequence in the original sequence, which of course is not possible. Could this be an easy case to optimize?
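To make the suggestion concrete, here is a minimal sketch of the kind of shortcut I have in mind. It is not based on the extractor's actual internals (which I haven't read); `trim_common_affixes` and `is_pure_insertion` are hypothetical helper names, not part of the library. The idea is to strip the shared prefix and suffix first, and if nothing of the reference remains, the change is a plain insertion and no further searching should be needed.

```python
def trim_common_affixes(reference, sample):
    """Strip the prefix and suffix shared by both sequences.

    Hypothetical helper, not part of the extractor library.
    Returns (prefix_length, reference_core, sample_core).
    """
    limit = min(len(reference), len(sample))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and reference[prefix] == sample[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[-1 - suffix] == sample[-1 - suffix]):
        suffix += 1

    return (prefix,
            reference[prefix:len(reference) - suffix],
            sample[prefix:len(sample) - suffix])


def is_pure_insertion(reference, sample):
    """True if the sample is the reference plus one inserted stretch."""
    _, reference_core, sample_core = trim_common_affixes(reference, sample)
    return not reference_core and bool(sample_core)
```

For the slow case above, `is_pure_insertion('A', 'A' * 10000)` is true after a single linear pass, so the expensive comparison could be skipped entirely. Whether this fits into how `describe_dna` is structured is something I can't judge.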