Skip to content Skip to sidebar Skip to footer

How To Find The First Index Of Any Of A Set Of Characters In A String

I'd like to find the index of the first occurrence of any “special” character in a string, like so: >>> 'Hello world!'.index([' ', '!']) 5 …except that's not valid

Solution 1:

You can use enumerate and next with a generator expression, getting the first match or returning None if no character appears in s:

s = "Hello world!"

st = {"!"," "}
ind = next((i for i, ch  inenumerate(s) if ch in st),None)
print(ind)

You can pass any value you want to next as a default return value if there is no match.

If you want to use a function and raise a ValueError:

deffirst_index(s, characters):
    st = set(characters)
    ind = next((i for i, ch inenumerate(s) if ch in st), None)
    if ind isnotNone:
        return ind
    raise ValueError

For smaller inputs using a set won't make much if any difference but for large strings it will be a more efficient.

Some timings:

In the string, last character of character set:

In [40]: s = "Hello world!" * 100    
In [41]: string = s    
In [42]: %%timeit
st = {"x","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 1.71 µs per loop    
In [43]: %%timeit
specials = ['x', 'y', '!']
min(map(lambda x: (string.index(x) if (x instring) elselen(string)), specials))
   ....: 
100000 loops, best of 3: 2.64 µs per loop

Not in the string, larger character set:

In [44]: %%timeit
st = {"u","v","w","x","y","z"}
next((i for i, ch inenumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 1.49 µs per loop

In [45]: %%timeit
specials = ["u","v","w","x","y","z"]
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))
   ....: 
100000 loops, best of 3: 5.48 µs per loop

In string an very first character of character set:

In [47]: %%timeit
specials = ['H', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))
   ....: 
100000 loops, best of 3: 2.02 µs per loop

In [48]: %%timeit
st = {"H","y","!"}
next((i for i, ch inenumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 903 ns per loop

Solution 2:

I would favour the re module as it's built in and already tested. It's also optimised for exactly this kind of thing.

>>>import re>>>re.search(r'[ !]', 'Hello World!').start()
5

You probably want to check that a match was found though or catch the exception when it's not.

There are reasons for not using re, but I would want to see a good comment justifying the rational. Thinking that you can "do it better" is generally unnecessary, makes it harder for others to read the code and is less maintainable.

Solution 3:

Use a gen-exp and the find method.

>>>a = [' ', '!']>>>s = "Hello World!">>>min(s.find(i) for i in a)
5

To remove -1s if they occur, you can have a filter within the list comp

>>>a = [' ', '!','$']>>>s = "Hello World!">>>min(s.find(i) for i in a if i in s)
5

or you can substitute None

>>> min(s.find(i) if i in s else None for i in a)
5

Adding the timeit results

$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(s.find(i) for i in a if i in s)"1000000 loops, best of 3: 0.902 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';next((i for i, ch  in enumerate(s) if ch in a),None)"1000000 loops, best of 3: 1.25 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))"1000000 loops, best of 3: 1.12 usec per loop

In your Example case, Padraic's beautiful solution is a little slow. However in large test cases, It is definitely a winner. (It's a little surprising that alfasin's "Not as optimized" is also faster here)

Added the implementation details

>>> deftake1(s,a):
... min(s.find(i) for i in a if i in s)
... >>> import dis
>>> dis.dis(take1)
  20 LOAD_GLOBAL              0 (min)
              3 LOAD_CLOSURE             0 (s)
              6 BUILD_TUPLE              19 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e961b0, file "<stdin>", line 2>)
             12 MAKE_CLOSURE             015 LOAD_FAST                1 (a)
             18 GET_ITER            
             19 CALL_FUNCTION            122 CALL_FUNCTION            125 POP_TOP             
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE        
>>> deftake2(s,a):
... next((i for i, ch  inenumerate(s) if ch in a),None)
... >>> dis.dis(take2)
  20 LOAD_GLOBAL              0 (next)
              3 LOAD_CLOSURE             0 (a)
              6 BUILD_TUPLE              19 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e96e30, file "<stdin>", line 2>)
             12 MAKE_CLOSURE             015 LOAD_GLOBAL              1 (enumerate)
             18 LOAD_FAST                0 (s)
             21 CALL_FUNCTION            124 GET_ITER            
             25 CALL_FUNCTION            128 LOAD_CONST               0 (None)
             31 CALL_FUNCTION            234 POP_TOP             
             35 LOAD_CONST               0 (None)
             38 RETURN_VALUE        
>>> deftake3(s,a):
... min(map(lambda x: (s.index(x) if (x in s) elselen(s)), a))
... >>> dis.dis(take3)
  20 LOAD_GLOBAL              0 (min)
              3 LOAD_GLOBAL              1 (map)
              6 LOAD_CLOSURE             0 (s)
              9 BUILD_TUPLE              112 LOAD_CONST               1 (<code object <lambda> at 0x7fa622e44eb0, file "<stdin>", line 2>)
             15 MAKE_CLOSURE             018 LOAD_FAST                1 (a)
             21 CALL_FUNCTION            224 CALL_FUNCTION            127 POP_TOP             
             28 LOAD_CONST               0 (None)
             31 RETURN_VALUE        

As you can clearly see in Padraic's case the loading of the global functions next and enumerate are the ones which are killing the time along with None at the end. In alfasin's solution the primary slowing down is the lambda function.

Solution 4:

Not as optimized as Padraic Cunningham's solution, but still a one liner:

string = "Hello world!"
specials = [' ', '!', 'x']
min(map(lambda x: (string.index(x) if (x in string) elselen(string)), specials))

Post a Comment for "How To Find The First Index Of Any Of A Set Of Characters In A String"