Performance problems with large insertions #9

@martijnvermaat


As you can see, comparing AAAA... with A is instant, but comparing A with AAAA... takes a lot of time:

In [5]: %timeit extractor.describe_dna('A' * 10000, 'A')
10000 loops, best of 3: 129 µs per loop

In [6]: %timeit extractor.describe_dna('A', 'A' * 10000)
1 loops, best of 3: 1.13 s per loop

Perhaps more importantly, memory usage also skyrockets. I couldn't run this test with a 50 kbp sample sequence on a machine with 4 GB of memory: the attempt completely froze my machine for half a minute. I would like to prevent this from happening on the server.

I didn't look into this further, but I suspect it tries to find the inserted sequence in the original sequence, which of course is not possible. Could this be an easy case to optimize?
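
To make the suggestion concrete, here is a rough sketch of the kind of shortcut I have in mind. It is not taken from the extractor code, and the names `trim_common_affixes` and `classify_simple_change` are hypothetical. The idea is to strip the shared prefix and suffix first; if nothing is left of the reference, the change is a pure insertion and no search through the reference is needed.

```python
def trim_common_affixes(reference, sample):
    """Return (prefix_len, suffix_len) shared by both sequences.

    Hypothetical helper, not part of the extractor API.
    """
    limit = min(len(reference), len(sample))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and reference[prefix] == sample[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[len(reference) - 1 - suffix]
           == sample[len(sample) - 1 - suffix]):
        suffix += 1

    return prefix, suffix


def classify_simple_change(reference, sample):
    """Classify trivial cases before running a full comparison.

    Returns (kind, offset, sequence), where kind is one of
    "identical", "insertion", "deletion" or "complex".
    """
    prefix, suffix = trim_common_affixes(reference, sample)
    ref_rest = reference[prefix:len(reference) - suffix]
    sample_rest = sample[prefix:len(sample) - suffix]

    if not ref_rest and not sample_rest:
        return ("identical", prefix, "")
    if not ref_rest:
        # Nothing of the reference remains: a pure insertion.
        return ("insertion", prefix, sample_rest)
    if not sample_rest:
        # Nothing of the sample remains: a pure deletion.
        return ("deletion", prefix, ref_rest)
    # Anything else still needs the full algorithm.
    return ("complex", prefix, sample_rest)


kind, offset, inserted = classify_simple_change('A', 'A' * 10000)
print(kind, offset, len(inserted))
```

The check is linear in the length of the shorter sequence, so it would be cheap to run before the real comparison. Whether the result should then be reported as an insertion or as a duplication is a separate question that this sketch does not address.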
